Documentation
Technical documentation and guides.
Monitoring & Data Collection
FleetManager continuously collects three signal types from your edge devices via a secured HTTPS API.
Heartbeats
Regular heartbeat signals from your software instances, including version number, uptime, PID, and IP addresses. Automatic detection of overdue devices with a configurable timeout threshold (default: 5 minutes).
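The overdue check can be pictured as a simple timestamp comparison. The following is a minimal sketch, not FleetManager's actual implementation; the function name `is_overdue` and its parameters are illustrative, and only the 5-minute default comes from the text above.

```python
from datetime import datetime, timedelta, timezone

# Default taken from the description above; everything else is illustrative.
DEFAULT_TIMEOUT = timedelta(minutes=5)

def is_overdue(last_heartbeat: datetime, now: datetime,
               timeout: timedelta = DEFAULT_TIMEOUT) -> bool:
    """Return True when no heartbeat arrived within the timeout window."""
    return now - last_heartbeat > timeout

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(minutes=2)   # heartbeat seen 2 minutes ago
stale = now - timedelta(minutes=7)    # heartbeat seen 7 minutes ago
```

With the default threshold, `recent` counts as healthy and `stale` as overdue; passing a shorter `timeout` tightens the check.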
Diagnostics
Structured error, warning, and info messages with diagnostic code, module name, and severity level (0=Info/Green, 1=Warning/Yellow, 2=Error/Red). Enables targeted filtering and searching across large device fleets.
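The severity scale maps naturally onto an integer enum. This sketch assumes hypothetical field names (`code`, `module`, `severity`, `message`) for a diagnostic record; only the three levels and their colors come from the description above.

```python
from enum import IntEnum

class Severity(IntEnum):
    INFO = 0      # Green
    WARNING = 1   # Yellow
    ERROR = 2     # Red

COLOR = {Severity.INFO: "green", Severity.WARNING: "yellow", Severity.ERROR: "red"}

# Hypothetical diagnostic records; the field names are assumptions.
diagnostics = [
    {"code": 1001, "module": "auth", "severity": 2, "message": "login failed"},
    {"code": 3002, "module": "updater", "severity": 1, "message": "patch pending"},
    {"code": 10, "module": "core", "severity": 0, "message": "started"},
]

def at_least(records, minimum: Severity):
    """Keep only records whose severity is at or above the given minimum."""
    return [r for r in records if r["severity"] >= minimum]
```

Because `IntEnum` members compare as integers, `at_least(diagnostics, Severity.WARNING)` works directly against the numeric severity stored in each record.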
Status Updates
Explicit state transitions (running, degraded, error, stopped) with optional snapshot as structured JSON data. Ideal for periodic state reports and configuration reconciliation.
HTTPS Ingest API
Secure data transmission via HTTPS with bearer token authentication. Automatic registration of new instances on first contact (UPSERT pattern). Support for standard and legacy payload formats.
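A device-side request with bearer token authentication might be built as shown below. This is a sketch under stated assumptions: the host `fleet.example.com`, the `/ingest` path, and the payload fields are placeholders, not the documented endpoint or format. The request is only constructed here, not sent.

```python
import json
import urllib.request

def build_ingest_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying a JSON payload and a bearer token."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ingest",   # placeholder path (assumption)
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_ingest_request(
    "https://fleet.example.com",                # placeholder host (assumption)
    "example-token",
    {"type": "heartbeat", "uptime": 3600},      # hypothetical payload fields
)
# urllib.request.urlopen(req, timeout=10) would transmit the request.
```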
Rejection Logging
Failed ingest requests are logged with payload and rejection reason. Significantly simplifies debugging of misconfigured devices.
Traffic Light Dashboard & Aggregation
The central dashboard displays the health of your entire device fleet as a hierarchical traffic light view.
Three-Level Hierarchy
Customer → Machine → Software: Each level aggregates the status of its children. Spot problems at a glance — from overall customer health down to individual software instances.
Worst-Case Propagation
The traffic light color of a parent always reflects the worst state of its children. A single red device turns the entire machine and customer red.
Traffic Light Colors
Green (healthy), Yellow (warnings), Red (critical). Determined by: heartbeat presence, current status, diagnostic severity, and unacknowledged critical alerts.
Recursive Customer Hierarchy
The dashboard supports arbitrarily nested customer structures. Parent customers automatically see the aggregated status of all sub-units.
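Worst-case propagation over a nested hierarchy amounts to a recursive maximum. The sketch below illustrates the idea with a hypothetical `Node` type and assumes the ordering green < yellow < red; it is not the dashboard's actual code.

```python
from dataclasses import dataclass, field
from typing import List

# Ordering assumption: red is worse than yellow, which is worse than green.
RANK = {"green": 0, "yellow": 1, "red": 2}

@dataclass
class Node:
    name: str
    own_color: str = "green"                      # color of the node itself
    children: List["Node"] = field(default_factory=list)

def aggregate(node: Node) -> str:
    """Return the worst color found in the node or any of its descendants."""
    colors = [node.own_color] + [aggregate(child) for child in node.children]
    return max(colors, key=RANK.__getitem__)

machine_a = Node("machine-a", children=[Node("sw1", "green"), Node("sw2", "red")])
machine_b = Node("machine-b", children=[Node("sw3", "yellow")])
customer = Node("customer", children=[machine_a, machine_b])
```

Here the single red instance `sw2` turns both `machine-a` and `customer` red, while `machine-b` stays yellow.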
Alert Management & Acknowledgment
Critical events remain visible until a responsible person explicitly acknowledges them.
Sticky Alerts
Diagnostics with severity ≥ 2 (Red) remain marked as "sticky" until manually acknowledged. Prevents subsequent clean batches from masking critical issues.
Single & Bulk Acknowledgment
Acknowledge individual alerts or all open alerts for an instance in one step. Every acknowledgment is logged with username and timestamp.
Audit Trail
Complete traceability: Who acknowledged which alert and when? Essential for compliance evidence and internal quality processes.
Active Alerts Query
Query all unacknowledged alerts per instance — as a basis for dashboards, reports, or external integrations.
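The sticky behavior, bulk acknowledgment, and the active alerts query can be sketched together. The `Alert` type and the function names are hypothetical; the sketch only assumes that an alert is sticky while its severity is 2 or higher and nobody has acknowledged it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Alert:
    code: int
    severity: int
    acknowledged_by: Optional[str] = None       # audit trail: who
    acknowledged_at: Optional[datetime] = None  # audit trail: when

    @property
    def is_sticky(self) -> bool:
        return self.severity >= 2 and self.acknowledged_by is None

def active_alerts(alerts):
    """Return all alerts that are still waiting for acknowledgment."""
    return [a for a in alerts if a.is_sticky]

def acknowledge_all(alerts, username: str, when: datetime) -> int:
    """Acknowledge every open alert and record user and timestamp."""
    count = 0
    for alert in active_alerts(alerts):
        alert.acknowledged_by = username
        alert.acknowledged_at = when
        count += 1
    return count

alerts = [Alert(1001, 2), Alert(3002, 1), Alert(5001, 2)]
open_before = len(active_alerts(alerts))
acked = acknowledge_all(alerts, "alice", datetime(2024, 1, 1, tzinfo=timezone.utc))
```

The yellow diagnostic (`3002`) is never sticky, so it is neither counted as open nor touched by the bulk acknowledgment.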
Log Search & History
All incoming data is stored in three separate logs and is fully searchable.
Heartbeat Log
Chronological record of all heartbeats with version, uptime, PID, and IP addresses. Enables analysis of restarts, version changes, and network modifications.
Diagnostics Log
All diagnostic messages with code, severity, timestamp, and acknowledgment status. Indexed for high-performance queries on severity, timestamp, and code.
Status Log
Complete history of all state transitions with optional snapshot data (JSON). Ideal for analyzing outage patterns.
Advanced Search
Cross-instance search with filters: severity (exact or minimum), code range, message text (free-text search), time range, acknowledgment status. Pagination with up to 1,000 entries per page.
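To show how the filters combine, here is an in-memory sketch of a search with a capped page size. The real search runs against the database; the function signature and the entry fields are assumptions, and only the 1,000-entry limit comes from the text above.

```python
MAX_PAGE_SIZE = 1000  # upper bound stated in the description

def search(entries, *, min_severity=None, code_range=None, text=None,
           acknowledged=None, page=1, page_size=100):
    """Apply all given filters (logical AND) and return one page of hits."""
    def matches(e):
        if min_severity is not None and e["severity"] < min_severity:
            return False
        if code_range is not None and not (code_range[0] <= e["code"] <= code_range[1]):
            return False
        if text is not None and text.lower() not in e["message"].lower():
            return False
        if acknowledged is not None and e["acknowledged"] != acknowledged:
            return False
        return True

    page_size = max(1, min(page_size, MAX_PAGE_SIZE))  # clamp to the limit
    hits = [e for e in entries if matches(e)]
    start = (max(page, 1) - 1) * page_size
    return hits[start:start + page_size]

# Hypothetical log entries.
entries = [
    {"code": 1001, "severity": 2, "message": "Login failed for root", "acknowledged": False},
    {"code": 2004, "severity": 1, "message": "Firewall rule changed", "acknowledged": False},
    {"code": 1002, "severity": 2, "message": "Login failed for admin", "acknowledged": True},
    {"code": 10, "severity": 0, "message": "Service started", "acknowledged": False},
]
```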
Tenant Filtering
Users only see logs from their assigned customers and sub-units — automatic tenant isolation at the database level.
Multi-Tenancy & Access Control
Enterprise-grade multi-tenant architecture with role-based access control.
Customer Hierarchies
Unlimited nesting via parent-child relationships (adjacency list model). Recursive visibility: accessing a customer automatically includes the entire subtree.
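In an adjacency list model each customer stores only a reference to its parent, and the visible subtree is found by walking downward from the accessed customer. The sketch below uses made-up customer ids and an in-memory dictionary in place of a database table.

```python
# customer id -> parent id (None for a root); the ids are made up.
PARENT = {
    "acme": None,
    "acme-eu": "acme",
    "acme-de": "acme-eu",
    "acme-us": "acme",
    "globex": None,
}

def subtree(root, parent_of):
    """Return the root plus every customer nested below it."""
    children = {}
    for customer_id, parent_id in parent_of.items():
        children.setdefault(parent_id, []).append(customer_id)

    visible, stack = set(), [root]
    while stack:
        current = stack.pop()
        visible.add(current)
        stack.extend(children.get(current, []))
    return visible
```

Access to `acme` yields `acme`, `acme-eu`, `acme-de`, and `acme-us`, while the unrelated root `globex` stays invisible. In a relational database the same traversal is commonly written as a recursive common table expression.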
Role Model
Four roles with graduated permissions: System Admin (full access), Customer Admin (own customer + subtree), Manager (manage sub-customers), Viewer (read-only access).
JWT Authentication
Login, token refresh, and logout via secure JWT bearer tokens with configurable expiration. Session-based token invalidation for immediate logout.
Two-Factor Authentication
TOTP-based 2FA (compatible with Google Authenticator, Authy, etc.). Setup via QR code, confirmation by code, recovery codes as backup. Encrypted TOTP secret storage.
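For readers unfamiliar with TOTP, the code generation defined in RFC 6238 fits in a few lines of standard-library Python. This is a generic sketch of the algorithm with the common defaults (SHA-1, 30-second step, 6 digits), not FleetManager's server code; the secret used is the published RFC test key.

```python
import base64
import hashlib
import hmac
import struct

def totp(secret_b32: str, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = struct.pack(">Q", unix_time // step)        # 8-byte time counter
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                            # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def verify(secret_b32: str, submitted: str, unix_time: int) -> bool:
    """Compare in constant time to avoid leaking digits through timing."""
    return hmac.compare_digest(totp(secret_b32, unix_time), submitted)

# RFC 6238 test key, Base32-encoded as authenticator apps expect.
SECRET = base64.b32encode(b"12345678901234567890").decode("ascii")
```

A production verifier would typically also accept the adjacent time steps to tolerate clock drift.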
User Management
Create, edit, and deactivate users per customer. Email verification, token-based password reset, admin-initiated password reset.
Token Management
Bearer tokens secure communication between your edge devices and FleetManager.
Token Lifecycle
Create, list, and revoke tokens. Optional expiration date with configurable auto-rotation. Token prefix (first 8 characters) for easy identification.
Customer Binding
Each token is bound to a specific customer slug. Devices authenticate via bearer token and can only submit data for the assigned customer.
Usage Tracking
Last usage is automatically recorded. Expiring tokens can be queried in advance (e.g., "all tokens expiring within 30 days").
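The token lifecycle can be illustrated with a small sketch. Keeping the first 8 characters as a prefix and querying tokens that expire within 30 days follow the description above; storing only a SHA-256 hash of the token and all field names are assumptions made for illustration.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

def create_token(customer_slug, expires_at=None):
    """Return the raw token (shown once) and the record kept server-side."""
    raw = secrets.token_urlsafe(32)
    record = {
        "prefix": raw[:8],                                   # for identification
        "sha256": hashlib.sha256(raw.encode()).hexdigest(),  # assumption: hash only
        "customer_slug": customer_slug,
        "expires_at": expires_at,
        "revoked": False,
    }
    return raw, record

def expiring_within(records, now, days=30):
    """List active tokens whose expiration falls inside the next `days` days."""
    limit = now + timedelta(days=days)
    return [
        r for r in records
        if not r["revoked"]
        and r["expires_at"] is not None
        and now <= r["expires_at"] <= limit
    ]

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
raw, soon = create_token("acme", now + timedelta(days=10))
_, later = create_token("acme", now + timedelta(days=90))
_, never = create_token("acme")
```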
Audit Trail
Who created the token? Who revoked it? Complete traceability for compliance requirements.
Notifications
Automatic notifications on traffic light changes via multiple channels.
Webhook Notifications
HMAC-SHA256 signed webhooks with JSON payload on traffic light changes. Integrates with Slack, Microsoft Teams, PagerDuty, or custom systems. Delivery log for every message.
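A webhook receiver should verify the signature before trusting the payload. The sketch below shows generic HMAC-SHA256 signing and verification; the hex encoding of the signature and the payload fields are assumptions, so check the delivery format of your installation before relying on them.

```python
import hashlib
import hmac
import json

def sign(secret: bytes, body: bytes) -> str:
    """Compute the HMAC-SHA256 signature of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time comparison of expected and received signature."""
    return hmac.compare_digest(sign(secret, body), signature_hex)

secret = b"shared-webhook-secret"                        # placeholder secret
body = json.dumps({"instance": "sw1", "color": "red"}).encode("utf-8")  # hypothetical payload
signature = sign(secret, body)
```

Verification must run over the raw bytes as received; re-serializing the parsed JSON can change whitespace or key order and invalidate the signature.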
Email Alerts
Configurable email notifications via SMTP. Support for TLS encryption and custom sender addresses.
Trigger Policies
Configurable per customer: red alerts only, yellow and red, or every change. Cooldown mechanism prevents notification storms from unstable devices.
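How a trigger policy and a cooldown interact can be sketched as follows. The policy names, the per-instance cooldown, and the `Notifier` class are hypothetical; the text above only states that three policies exist and that a cooldown suppresses notification storms.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy names mapped to the colors that trigger a notification.
POLICIES = {
    "red_only": {"red"},
    "yellow_and_red": {"yellow", "red"},
    "every_change": {"green", "yellow", "red"},
}

class Notifier:
    def __init__(self, policy: str, cooldown: timedelta):
        self.allowed = POLICIES[policy]
        self.cooldown = cooldown
        self.last_sent = {}  # instance id -> time of the last notification

    def should_notify(self, instance_id: str, new_color: str, now: datetime) -> bool:
        if new_color not in self.allowed:
            return False                      # filtered out by the policy
        last = self.last_sent.get(instance_id)
        if last is not None and now - last < self.cooldown:
            return False                      # still cooling down
        self.last_sent[instance_id] = now
        return True

t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
notifier = Notifier("yellow_and_red", cooldown=timedelta(minutes=10))
```

With this setup a device flapping between yellow and red produces at most one notification per 10 minutes.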
Delivery Log
View all sent notifications per configuration — for debugging and traceability.
Whitelist Rules
Selectively suppress or downgrade known, non-critical messages.
Rule-Based Filtering
Match on diagnostic code, module name, machine name, or message content (regex support). Multiple criteria can be combined for precise matching.
Actions
Downgrade severity (e.g., Red → Yellow) or suppress the message entirely. Original severity is preserved for audit purposes.
Expiration Date
Rules can have an expiration date — ideal for temporary exceptions during planned maintenance windows.
Match Statistics
Automatic counting of rule matches. Helps evaluate whether a rule is still relevant.
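The pieces of a whitelist rule (combined criteria, action, expiration, match counter) fit together as in the sketch below. The `Rule` fields, the first-match-wins evaluation order, and the `suppressed` flag are assumptions for illustration, not the product's data model.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Rule:
    action: str                            # "downgrade" or "suppress"
    code: Optional[int] = None
    module: Optional[str] = None
    message_regex: Optional[str] = None
    downgrade_to: int = 1
    expires_at: Optional[datetime] = None
    match_count: int = 0

    def matches(self, diag: dict, now: datetime) -> bool:
        """All criteria that are set must match; expired rules never match."""
        if self.expires_at is not None and now >= self.expires_at:
            return False
        if self.code is not None and diag["code"] != self.code:
            return False
        if self.module is not None and diag["module"] != self.module:
            return False
        if self.message_regex is not None and not re.search(self.message_regex, diag["message"]):
            return False
        return True

def apply_rules(diag: dict, rules, now: datetime) -> dict:
    """Apply the first matching rule and keep the original severity."""
    for rule in rules:
        if not rule.matches(diag, now):
            continue
        rule.match_count += 1
        result = {**diag, "original_severity": diag["severity"]}
        if rule.action == "suppress":
            result["suppressed"] = True
        else:
            result["severity"] = min(diag["severity"], rule.downgrade_to)
        return result
    return diag

maintenance_end = datetime(2024, 2, 1, tzinfo=timezone.utc)
rules = [
    Rule(action="downgrade", code=3002, downgrade_to=1),
    Rule(action="suppress", module="updater",
         message_regex=r"retry \d+ of \d+", expires_at=maintenance_end),
]
```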
Data Retention & Cleanup
Configurable retention policies keep your database lean.
Separate Retention Periods
Heartbeat logs, diagnostics logs, and status logs each have independently configurable retention periods (in days). Snapshot data has its own retention period, configured in hours.
Per-Customer Overrides
Define custom retention periods per customer — e.g., longer storage for regulated industries.
Automatic Cleanup
A background process automatically deletes expired entries every 6 hours. Unacknowledged alerts are retained regardless of retention policy.
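The deletion decision made by such a cleanup job can be sketched as a pure function. The retention values, the customer name, and the field names below are illustrative, and the sketch assumes that an unacknowledged alert means a diagnostic with severity 2 or higher that nobody has acknowledged.

```python
from datetime import datetime, timedelta, timezone

# Illustrative defaults and overrides (in days); real values are configurable.
DEFAULT_RETENTION_DAYS = {"heartbeat": 30, "diagnostics": 90, "status": 90}
CUSTOMER_OVERRIDES = {"regulated-co": {"diagnostics": 365}}

def retention_days(log: str, customer: str) -> int:
    """A per-customer override wins over the global default."""
    return CUSTOMER_OVERRIDES.get(customer, {}).get(log, DEFAULT_RETENTION_DAYS[log])

def should_delete(entry: dict, log: str, now: datetime) -> bool:
    # Unacknowledged alerts are kept regardless of their age.
    if log == "diagnostics" and entry["severity"] >= 2 and not entry["acknowledged"]:
        return False
    cutoff = now - timedelta(days=retention_days(log, entry["customer"]))
    return entry["timestamp"] < cutoff

now = datetime(2024, 6, 1, tzinfo=timezone.utc)

def diag(customer, age_days, severity, acknowledged):
    """Helper that builds a hypothetical diagnostics entry of a given age."""
    return {"customer": customer, "timestamp": now - timedelta(days=age_days),
            "severity": severity, "acknowledged": acknowledged}
```

For example, a 100-day-old warning is deleted for a customer on the default policy but kept for `regulated-co`, whose diagnostics override is 365 days.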
Security Monitoring
Extends FleetManager into a central security and incident logger for your device fleet.
Modular Security Agents
Agents for Linux and Windows that capture security events and report them as standardized diagnostics to FleetManager.
Standardized Code Taxonomy
Diagnostic code ranges for security events: Authentication (1xxx), Firewall/Network (2xxx), Patch Status (3xxx), File Integrity (4xxx), Malware Detection (5xxx), Configuration Drift (6xxx).
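Mapping a diagnostic code to its security category is a matter of looking at the thousands digit. This sketch assumes that "1xxx" denotes the four-digit range 1000 to 1999, and so on for the other categories; the function name is illustrative.

```python
CATEGORIES = {
    1: "Authentication",
    2: "Firewall/Network",
    3: "Patch Status",
    4: "File Integrity",
    5: "Malware Detection",
    6: "Configuration Drift",
}

def security_category(code: int):
    """Return the category name, or None if the code is outside the taxonomy."""
    if not 1000 <= code <= 9999:
        return None
    return CATEGORIES.get(code // 1000)
```

Codes outside the listed ranges return `None`, so non-security diagnostics pass through unclassified.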
CRA/NIS2 Compliance
Audit trails and configurable retention policies as the foundation for demonstrating continuous security monitoring under the EU Cyber Resilience Act (CRA) and the NIS2 Directive.
Existing Infrastructure Integration
Leverages the existing FleetManager platform: traffic light dashboard, alert management, notifications, and log search — no additional infrastructure required.
Python Client Library
Ready-made library for quick integration into your existing software.
FleetMonitorClient
Python class for easy connection to the FleetManager Ingest API. Automatic heartbeat thread (configurable interval, default: 60 seconds).
Thread-Safe API
Diagnostics and status updates can be sent from any thread. Built-in retry logic and reconnect on connection interruptions.
Zero Configuration
Start and stop statuses are sent automatically when the client is created or destroyed. SSL certificate validation for production environments.
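The behavior described in this section (background heartbeat thread, thread-safe sending, automatic start and stop status) can be pictured with the following sketch. It is not the FleetMonitorClient API, which this page does not show: the class `SketchClient`, its methods, and the injected `send` callable are hypothetical, and retry logic and HTTPS transport are left out.

```python
import threading

class SketchClient:
    """Illustrative stand-in for a monitoring client; not the real library."""

    def __init__(self, send, heartbeat_interval: float = 60.0):
        self._send = send                     # callable(kind, body), e.g. an HTTP POST
        self._interval = heartbeat_interval
        self._lock = threading.Lock()         # serializes sends across threads
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._heartbeat_loop, daemon=True)
        self._emit("status", "running")       # automatic start status
        self._thread.start()

    def _emit(self, kind, body):
        with self._lock:
            self._send(kind, body)

    def _heartbeat_loop(self):
        while True:
            self._emit("heartbeat", None)
            if self._stop.wait(self._interval):   # returns True once stopped
                break

    def diagnostic(self, code: int, severity: int, message: str):
        self._emit("diagnostic", {"code": code, "severity": severity, "message": message})

    def close(self):
        self._stop.set()
        self._thread.join()
        self._emit("status", "stopped")       # automatic stop status

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

sent = []
with SketchClient(lambda kind, body: sent.append((kind, body))) as client:
    client.diagnostic(1001, 2, "login failed")
```

Waiting on a `threading.Event` instead of calling `time.sleep` lets `close()` interrupt the heartbeat loop immediately rather than after a full interval.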