Architecture
GridNMS uses a collector-first architecture. All active monitoring happens in collectors; the control plane is an API and broker that holds state and serves the UI. Collectors connect outbound to the control plane — there are no inbound connections to GridNMS.
The two planes
┌───────────────────────────────────────────────────────────┐
│                          Browser                          │
│                      React SPA (UI)                       │
└───────────────────────┬───────────────────────────────────┘
                        │  REST /api/* + WebSocket tunnel
┌───────────────────────▼───────────────────────────────────┐
│                  server/ (control plane)                  │
│ Express REST API · Session auth · WebSocket tunnel broker │
│ License/cloud connector · Graph service · Retention       │
│ (self-hosted: also bakes & serves the client SPA)         │
└──────┬───────────────────────────────┬────────────────────┘
       │                               │  WebSocket tunnel (outbound from collector)
  ┌────▼──────────┐         ┌──────────▼───────────────────┐
  │  PostgreSQL   │         │  collector/ (data plane)     │
  │  TimescaleDB  │         │                              │
  │  Memgraph     │         │  monitor · poller            │
  │               │         │  syslog · snmptrap           │
  │  Stores:      │         │                              │
  │  • topology   │         │  Connects outbound via the   │
  │  • alarm state│         │  WebSocket tunnel; receives  │
  │  • device up/ │         │  a config snapshot, then     │
  │    down       │         │  polls/listens accordingly.  │
  └───────────────┘         └──────────────────────────────┘
Control plane (the server)
- Serves the web UI and the REST API (/api/*).
- Brokers the WebSocket tunnel that collectors connect to.
- Runs the cloud connector for licensing, notices, and updates.
- Holds topology in the graph DB (Memgraph), and alarm/event state and device up/down in PostgreSQL.
- Does all parsing and interpretation of the raw data collectors ship.
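As one illustration of what parsing raw collector data on the control plane might involve, the sketch below decodes the priority field of a syslog line that a collector forwarded untouched. The function name `parse_syslog_pri` and the returned dict shape are hypothetical and not taken from GridNMS; only the PRI encoding (facility × 8 + severity) comes from the syslog RFCs.

```python
import re

# Matches the leading "<PRI>" field of a syslog message (RFC 3164 / RFC 5424).
_PRI_RE = re.compile(r"^<(\d{1,3})>")

# Severity names indexed by their numeric code (0 = most severe).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_syslog_pri(raw: str) -> dict:
    """Hypothetical helper: split a raw syslog line into facility, severity, rest."""
    match = _PRI_RE.match(raw)
    if match is None:
        raise ValueError("no <PRI> field at start of message")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23 * 8 + severity 7
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return {
        "facility": facility,
        "severity": SEVERITIES[severity],
        "message": raw[match.end():],
    }
```

Keeping this step on the server side is consistent with the split described above: the collector only needs to receive and ship bytes, while interpretation lives in one place.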
Data plane (collectors)
- Run close to your devices and perform all active monitoring: ICMP reachability, SNMP/SSH polling, syslog reception (514/udp), and SNMP-trap reception (162/udp).
- A collector only runs work after a config snapshot arrives over the tunnel — there is no standalone local-DB polling.
- A collector may only initiate outbound connections to IPs within its explicitly assigned networks plus the control plane. This network assignment is the permission boundary.
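The two collector rules above (no work before a config snapshot, and no connections outside the assigned networks) can be modelled in a few lines. This is a minimal sketch, not the GridNMS implementation: the class name `CollectorSketch`, the snapshot shape (`networks` and `targets` keys), and identifying the control plane by a single IP are all assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class CollectorSketch:
    """Hypothetical model of a collector's gating rules."""

    control_plane_ip: str
    assigned_networks: list = field(default_factory=list)
    targets: list = field(default_factory=list)
    has_snapshot: bool = False

    def apply_snapshot(self, snapshot: dict) -> None:
        # Assumed snapshot shape: {"networks": [CIDR, ...], "targets": [IP, ...]}
        self.assigned_networks = [
            ipaddress.ip_network(cidr) for cidr in snapshot["networks"]
        ]
        self.targets = list(snapshot["targets"])
        self.has_snapshot = True

    def may_connect(self, ip: str) -> bool:
        """Permission boundary: the control plane or an assigned network."""
        addr = ipaddress.ip_address(ip)
        if addr == ipaddress.ip_address(self.control_plane_ip):
            return True
        return any(addr in net for net in self.assigned_networks)

    def runnable_targets(self) -> list:
        """Targets the collector may poll right now."""
        if not self.has_snapshot:
            return []  # no snapshot yet, so no work at all
        return [t for t in self.targets if self.may_connect(t)]
```

A short usage example under the same assumptions:

```python
collector = CollectorSketch(control_plane_ip="203.0.113.10")
collector.runnable_targets()  # empty until a snapshot arrives

collector.apply_snapshot(
    {"networks": ["10.0.0.0/24"], "targets": ["10.0.0.5", "192.168.1.1"]}
)
collector.runnable_targets()  # only 10.0.0.5 is inside an assigned network
```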
Storage layers
| Layer | Technology | What it holds |
|---|---|---|
| Hot relational state | PostgreSQL 16 / TimescaleDB | Devices, interfaces, VLANs, nodes, neighbors, users, packs, open + recent events, metric samples |
| Topology graph | Memgraph (Bolt) | Device nodes and CONNECTS_TO edges for the network graph, blast-radius, and upstream-path queries |
| Cold event archive | SQLite (default) or ClickHouse | Full long-term syslog + SNMP-trap history, beyond the hot retention window |
Two-tier event archive
The PostgreSQL Events table is kept lean: it holds only recent, active, and
recently-cleared events (a configurable hot-retention window, default 30 days).
Older events age out into the cold archive:
- SQLite (default) — one database file per calendar day under the archive directory. Zero extra infrastructure.
- ClickHouse (optional) — a high-volume backend for NOC-scale deployments.
The hot store powers the live UI; the cold archive powers Event History search. See The event system for the full lifecycle.
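To make the hot/cold split concrete, here is a rough sketch of how a lookup could decide which tier holds an event and, for the SQLite backend, which daily file to open. The file naming scheme (`events-YYYY-MM-DD.sqlite`), the function names, and the use of UTC calendar days are assumptions, not documented GridNMS behaviour. The sketch also routes purely by age, whereas the real hot store additionally keeps events that are still active.

```python
from datetime import date, datetime, timedelta, timezone
from pathlib import Path

HOT_RETENTION_DAYS = 30  # default from the text; configurable in practice


def archive_file_for(day: date, archive_dir: Path) -> Path:
    # Hypothetical naming scheme: one SQLite file per calendar day.
    return archive_dir / f"events-{day.isoformat()}.sqlite"


def store_for(event_time: datetime, now: datetime, archive_dir: Path):
    """Return ("hot", None) or ("cold", path_to_daily_file) for an event."""
    cutoff = now - timedelta(days=HOT_RETENTION_DAYS)
    if event_time >= cutoff:
        return ("hot", None)
    return ("cold", archive_file_for(event_time.date(), archive_dir))


now = datetime(2024, 3, 31, 12, 0, tzinfo=timezone.utc)
old = datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc)
tier, path = store_for(old, now, Path("/var/lib/gridnms/archive"))  # path is illustrative
```

One file per day keeps each archive small and makes expiring old history a matter of deleting files, which may be part of why the default backend needs no extra infrastructure.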
Metrics at scale
MetricSamples is backed by TimescaleDB: 1-day chunks, ~90% compression on
chunks older than 7 days, and a 5-minute continuous aggregate so long-range
dashboards stay fast. It degrades gracefully to plain PostgreSQL if TimescaleDB
isn’t available.
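Conceptually, a 5-minute aggregate groups raw samples into fixed 5-minute buckets and stores one summary row per bucket, so a long-range dashboard reads far fewer rows. The sketch below shows that idea in plain Python. It is not how TimescaleDB computes it (that happens inside the database), and the choice of the mean as the summary is an assumption, since the text does not say which aggregates GridNMS stores.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its 5-minute bucket."""
    return ts - ((ts - EPOCH) % BUCKET)


def five_minute_averages(samples):
    """samples: iterable of (timestamp, value); returns {bucket_start: mean}."""
    groups = defaultdict(list)
    for ts, value in samples:
        groups[bucket_start(ts)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(groups.items())}


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
samples = [
    (t0 + timedelta(minutes=1), 10.0),
    (t0 + timedelta(minutes=4), 20.0),
    (t0 + timedelta(minutes=6), 30.0),
]
averages = five_minute_averages(samples)  # two buckets: 12:00 and 12:05
```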
Where to go next
- Collectors & source IPs: deployment modes and the NAT gotcha.
- The event system: collection, transformation, archival, and alerting.
- Self-hosting overview: what’s in the stack and how to run it.