Architecture

GridNMS uses a collector-first architecture. All active monitoring happens in collectors; the control plane is an API and broker that holds state and serves the UI. Collectors connect outbound to the control plane — there are no inbound connections to GridNMS.

┌─────────────────────────────────────────────────────────────┐
│                           Browser                           │
│                       React SPA (UI)                        │
└──────────────────────────────┬──────────────────────────────┘
                               │ REST /api/* + WebSocket tunnel
┌──────────────────────────────▼──────────────────────────────┐
│                   server/ (control plane)                   │
│  Express REST API · Session auth · WebSocket tunnel broker  │
│  License/cloud connector · Graph service · Retention        │
│  (self-hosted: also bakes & serves the client SPA)          │
└───────┬───────────────────────────────┬─────────────────────┘
        │                               │ WebSocket tunnel (outbound from collector)
┌───────▼───────┐          ┌────────────▼─────────────────┐
│  PostgreSQL   │          │  collector/ (data plane)     │
│  TimescaleDB  │          │                              │
│  Memgraph     │          │  monitor · poller            │
│               │          │  syslog · snmptrap           │
│  Stores:      │          │                              │
│  • topology   │          │  Connects outbound via the   │
│  • alarm state│          │  WebSocket tunnel; receives  │
│  • device up/ │          │  a config snapshot, then     │
│    down       │          │  polls/listens accordingly.  │
└───────────────┘          └──────────────────────────────┘

The control plane:

  • Serves the web UI and the REST API (/api/*).
  • Brokers the WebSocket tunnel that collectors connect to.
  • Runs the cloud connector for licensing, notices, and updates.
  • Holds topology in the graph database, and alarm/event state and device up/down status in PostgreSQL.
  • Parses and interprets all raw data that collectors ship.

Collectors:

  • Run close to your devices and perform all active monitoring: ICMP reachability, SNMP/SSH polling, syslog reception (514/udp), and SNMP-trap reception (162/udp).
  • A collector runs work only after a config snapshot arrives over the tunnel; there is no standalone local-DB polling.
  • A collector may only initiate outbound connections to IPs within its explicitly assigned networks plus the control plane. This network assignment is the permission boundary.
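
The two collector rules above (no work before a config snapshot, and outbound connections limited to assigned networks plus the control plane) can be illustrated with a small Python sketch. It is not GridNMS code: the `ConfigSnapshot` shape, its field names, and the `Collector.may_connect` method are hypothetical, chosen only to show how such a check could be expressed with the standard `ipaddress` module.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class ConfigSnapshot:
    """Hypothetical shape of the snapshot a collector receives over the tunnel."""
    control_plane_ip: str
    assigned_networks: tuple[str, ...]  # CIDR strings, e.g. "10.0.1.0/24"


class Collector:
    def __init__(self) -> None:
        # No snapshot yet, so no polling or probing is allowed.
        self.snapshot: Optional[ConfigSnapshot] = None

    def apply_snapshot(self, snapshot: ConfigSnapshot) -> None:
        """Store the latest snapshot; work may start only after this call."""
        self.snapshot = snapshot

    def may_connect(self, target_ip: str) -> bool:
        """Return True only if an outbound connection to target_ip is permitted."""
        if self.snapshot is None:
            return False  # nothing runs before a snapshot arrives
        target = ip_address(target_ip)
        if target == ip_address(self.snapshot.control_plane_ip):
            return True  # the control plane is always reachable
        # Otherwise the target must fall inside an assigned network.
        return any(
            target in ip_network(cidr)
            for cidr in self.snapshot.assigned_networks
        )
```

In this sketch, a collector assigned only `10.0.1.0/24` would be allowed to poll `10.0.1.5` but not `10.0.2.5`, and would refuse both until `apply_snapshot` has been called.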

| Layer | Technology | What it holds |
|---|---|---|
| Hot relational state | PostgreSQL 16 / TimescaleDB | Devices, interfaces, VLANs, nodes, neighbors, users, packs, open + recent events, metric samples |
| Topology graph | Memgraph (Bolt) | Device nodes and CONNECTS_TO edges for the network graph, blast-radius, and upstream-path queries |
| Cold event archive | SQLite (default) or ClickHouse | Full long-term syslog + SNMP-trap history, beyond the hot retention window |
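
To give a feel for what a blast-radius query over CONNECTS_TO edges computes, here is a plain-Python sketch that runs on an in-memory edge list rather than on Memgraph. It assumes one possible definition: the devices that can be reached from a chosen root only while a given device is up. The function names, the undirected treatment of edges, and that definition are all assumptions for illustration; the actual graph queries may differ.

```python
from collections import deque


def reachable(edges, root, removed=None):
    """Breadth-first search over undirected edges, skipping `removed`."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if root == removed:
        return set()
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def blast_radius(edges, root, failed):
    """Devices that lose their path to `root` when `failed` goes down."""
    before = reachable(edges, root)
    after = reachable(edges, root, removed=failed)
    return before - after - {failed}
```

With edges core-dist1, core-dist2, dist1-acc1, dist1-acc2, and dist2-acc2, a failure of dist1 isolates only acc1, because acc2 still has a path through dist2.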

The PostgreSQL Events table is kept lean: it holds only recent, active, and recently cleared events (a configurable hot-retention window, default 30 days). Older events age out into the cold archive:

  • SQLite (default) — one database file per calendar day under the archive directory. Zero extra infrastructure.
  • ClickHouse (optional) — a high-volume backend for NOC-scale deployments.

The hot store powers the live UI; the cold archive powers Event History search. See The event system for the full lifecycle.
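
The hot/cold split and the one-file-per-day SQLite layout can be sketched in Python as follows. The 30-day default comes from the text; everything else is an assumption made for illustration, including the function names, the `events-YYYY-MM-DD.sqlite` file naming, and the choice of UTC for the calendar-day boundary.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

HOT_RETENTION = timedelta(days=30)  # default hot-retention window


def belongs_in_hot_store(
    active: bool,
    last_activity: datetime,
    now: datetime,
    retention: timedelta = HOT_RETENTION,
) -> bool:
    """Active events stay hot; others stay hot only inside the window."""
    return active or (now - last_activity) <= retention


def archive_path(archive_dir: Path, event_time: datetime) -> Path:
    """Pick the per-day archive file for an event (hypothetical naming).

    Expects a timezone-aware datetime; the day boundary is taken in UTC.
    """
    day = event_time.astimezone(timezone.utc).date()
    return archive_dir / f"events-{day.isoformat()}.sqlite"
```

Under these assumptions, an event cleared 45 days ago would be moved out of PostgreSQL and into the file for its calendar day, while an event that is still active stays in the hot store regardless of age.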

MetricSamples is backed by TimescaleDB: 1-day chunks, ~90% compression on chunks older than 7 days, and a 5-minute continuous aggregate so long-range dashboards stay fast. It degrades gracefully to plain PostgreSQL if TimescaleDB isn’t available.
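
A setup along these lines could be generated by a helper like the one below. This is a sketch, not the GridNMS schema: the `metric_samples` table name, its columns, the `metric_samples_5m` view, and the `compress_segmentby` choice are all hypothetical. The TimescaleDB calls used (`create_hypertable`, `add_compression_policy`, `time_bucket`, and the `timescaledb.continuous` view option) are standard, but the exact options GridNMS passes are not stated in this document.

```python
# Query a caller could run first to decide which branch to take.
TIMESCALE_AVAILABLE_SQL = (
    "SELECT 1 FROM pg_available_extensions WHERE name = 'timescaledb'"
)


def metric_schema_statements(timescale_available: bool) -> list[str]:
    """Return DDL for the metrics table, with or without TimescaleDB."""
    statements = [
        # Hypothetical table and column names.
        """CREATE TABLE IF NOT EXISTS metric_samples (
            ts TIMESTAMPTZ NOT NULL,
            device_id BIGINT NOT NULL,
            metric TEXT NOT NULL,
            value DOUBLE PRECISION
        )""",
    ]
    if not timescale_available:
        # Graceful fallback: an ordinary table with a time index.
        statements.append(
            "CREATE INDEX IF NOT EXISTS metric_samples_ts_idx "
            "ON metric_samples (ts DESC)"
        )
        return statements
    statements += [
        "CREATE EXTENSION IF NOT EXISTS timescaledb",
        # 1-day chunks.
        "SELECT create_hypertable('metric_samples', 'ts', "
        "chunk_time_interval => INTERVAL '1 day', if_not_exists => TRUE)",
        # Enable compression, then compress chunks older than 7 days.
        "ALTER TABLE metric_samples SET (timescaledb.compress, "
        "timescaledb.compress_segmentby = 'device_id, metric')",
        "SELECT add_compression_policy('metric_samples', "
        "INTERVAL '7 days', if_not_exists => TRUE)",
        # 5-minute continuous aggregate for long-range dashboards.
        """CREATE MATERIALIZED VIEW metric_samples_5m
           WITH (timescaledb.continuous) AS
           SELECT time_bucket(INTERVAL '5 minutes', ts) AS bucket,
                  device_id, metric, avg(value) AS avg_value
           FROM metric_samples
           GROUP BY bucket, device_id, metric
           WITH NO DATA""",
    ]
    return statements
```

The single boolean keeps the fallback explicit: the same table exists in both cases, and only the hypertable, compression, and aggregate statements are skipped when the extension is missing.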