The Event System

Every signal GridNMS collects — a device going down, a syslog message, an SNMP trap, a threshold breach — becomes an event. This page describes the event lifecycle: collection, the transformation rules engine, the two-tier store, and notification delivery.

Events originate in one of four collector services:

| Source | What it does |
| --- | --- |
| monitor | Pings every monitored device via ICMP. Raises PING_DOWN (critical) when a device goes unreachable and PING_UP (info) when it returns, which clears the open down event. |
| poller | Runs pack collectors on a per-device interval, records metrics, and raises/clears events when an interface bandwidth threshold is crossed. |
| syslog | Receives RFC 3164 / RFC 5424 UDP syslog on 514/udp. |
| snmptrap | Receives SNMP v1/v2c/v3 traps and informs on 162/udp. |

Syslog and SNMP-trap packets are written to a durable per-collector queue the instant they arrive, then processed in batches — so no messages are dropped during bursts or while the database is busy.
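The queue-then-batch pattern described above can be sketched as follows. This is a minimal illustration only: it assumes a SQLite-backed queue, and the table layout and the names `open_queue`, `enqueue`, and `drain_batch` are hypothetical rather than GridNMS internals.

```python
import sqlite3

def open_queue(path: str) -> sqlite3.Connection:
    """Open (or create) a queue; a file path makes it survive restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "received_at REAL NOT NULL, "
        "payload BLOB NOT NULL)"
    )
    conn.commit()
    return conn

def enqueue(conn: sqlite3.Connection, payload: bytes, received_at: float) -> None:
    """Persist the raw packet as soon as it arrives, before any parsing."""
    conn.execute(
        "INSERT INTO queue (received_at, payload) VALUES (?, ?)",
        (received_at, payload),
    )
    conn.commit()

def drain_batch(conn: sqlite3.Connection, handler, batch_size: int = 100) -> int:
    """Hand the oldest rows to handler; delete them only if it succeeds."""
    rows = conn.execute(
        "SELECT id, received_at, payload FROM queue ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    if not rows:
        return 0
    handler(rows)  # if this raises, the rows stay queued for a later attempt
    conn.execute("DELETE FROM queue WHERE id <= ?", (rows[-1][0],))
    conn.commit()
    return len(rows)
```

Because rows are deleted only after the handler returns, a busy or failing database leaves the batch in the queue rather than losing it.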

All severities across the system use a single 5-level scale (see the severity reference). Incoming syslog priorities and SNMP trap types are mapped onto it:

Syslog priority → GridNMS severity

| Syslog priority | GridNMS severity |
| --- | --- |
| 0 Emergency, 1 Alert, 2 Critical | 1 Critical |
| 3 Error | 2 Error |
| 4 Warning | 3 Warning |
| 5 Notice, 6 Informational | 4 Info |
| 7 Debug | 5 Debug |
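As a rough illustration, the table can be expressed as a lookup. In RFC 3164 and RFC 5424 the PRI field on the wire encodes `facility * 8 + severity`, so the sketch extracts the severity first. The names `SYSLOG_TO_GRIDNMS` and `severity_from_pri` are hypothetical.

```python
# Syslog severity (0-7) -> GridNMS severity (1-5), copied from the table above.
SYSLOG_TO_GRIDNMS = {
    0: 1, 1: 1, 2: 1,  # Emergency, Alert, Critical -> 1 Critical
    3: 2,              # Error -> 2 Error
    4: 3,              # Warning -> 3 Warning
    5: 4, 6: 4,        # Notice, Informational -> 4 Info
    7: 5,              # Debug -> 5 Debug
}

def severity_from_pri(pri: int) -> int:
    """Map a syslog PRI value (facility * 8 + severity) to the 5-level scale."""
    if not 0 <= pri <= 191:  # 24 facilities x 8 severities
        raise ValueError("PRI out of range: %r" % pri)
    return SYSLOG_TO_GRIDNMS[pri % 8]
```

For example, `<34>` (facility 4, severity 2) maps to 1 Critical, and `<165>` (facility 20, severity 5) maps to 4 Info.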

SNMPv1 generic trap → tag / severity

| Generic | Tag | Severity |
| --- | --- | --- |
| 0 coldStart | SNMP_COLD_START | 2 Error |
| 1 warmStart | SNMP_WARM_START | 3 Warning |
| 2 linkDown | SNMP_LINK_DOWN | 1 Critical |
| 3 linkUp | SNMP_LINK_UP | 4 Info |
| 4 authFailure | SNMP_AUTH_FAIL | 2 Error |
| 5 egpNeighborLoss | SNMP_EGP_LOSS | 3 Warning |
| 6 enterpriseSpecific | SNMP_TRAP | 3 Warning |

SNMPv2c/v3 traps derive an SNMP_TRAP_<oid> tag with a default Warning severity, which transformations can override.
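The trap classification above could be sketched like this. The function names are hypothetical, and two details are assumptions rather than documented behaviour: that the dotted OID is appended to the tag verbatim, and that an unrecognised generic value falls back to SNMP_TRAP.

```python
from typing import Tuple

# SNMPv1 generic-trap number -> (tag, GridNMS severity), from the table above.
GENERIC_TRAPS = {
    0: ("SNMP_COLD_START", 2),
    1: ("SNMP_WARM_START", 3),
    2: ("SNMP_LINK_DOWN", 1),
    3: ("SNMP_LINK_UP", 4),
    4: ("SNMP_AUTH_FAIL", 2),
    5: ("SNMP_EGP_LOSS", 3),
    6: ("SNMP_TRAP", 3),
}
DEFAULT_SEVERITY = 3  # Warning

def classify_v1(generic: int) -> Tuple[str, int]:
    """Look up an SNMPv1 generic trap; unknown values fall back (assumption)."""
    return GENERIC_TRAPS.get(generic, ("SNMP_TRAP", DEFAULT_SEVERITY))

def classify_v2(trap_oid: str) -> Tuple[str, int]:
    """Build the SNMP_TRAP_<oid> tag for a v2c/v3 trap (OID format assumed)."""
    return "SNMP_TRAP_" + trap_oid, DEFAULT_SEVERITY

# 1.3.6.1.6.3.1.1.5.3 is the standard linkDown notification OID.
print(classify_v1(2))
print(classify_v2("1.3.6.1.6.3.1.1.5.3"))
```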

Syslog and trap events carry two timestamps: the event time parsed from the message itself (so the original device time is preserved even when delivery is delayed) and the receive time recorded when the collector got the packet. Both are kept in the cold archive.

Before an event is stored, it passes through two enrichment stages:

  1. Vendor parser chain (syslog only) — vendor-specific formats are normalised into meaningful tags and severities. For example, Palo Alto PAN-OS CSV logs become PAN_TRAFFIC, PAN_THREAT, PAN_SYSTEM, and so on.

  2. Transformation rules engine — admin-defined rules match on a message regex, tag, device, class, or severity and apply an action:

    | Action | Effect |
    | --- | --- |
    | setTag | Replace the event’s tag |
    | setSeverity | Override the severity |
    | suppress | Drop the event entirely (nothing is stored) |
    | closeExisting | Auto-close matching open events |

Rules are evaluated in order; the first matching suppress rule drops the event.
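The in-order evaluation can be sketched as below. The `Event` and `Rule` shapes and the `apply_rules` function are hypothetical. The sketch also makes three assumptions the text does not confirm: that later rules see changes made by earlier ones, that a closeExisting rule names the tag whose open events should be closed, and that a suppressed event discards any pending closes.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

@dataclass
class Event:
    message: str
    tag: str
    severity: int
    device: str = ""
    device_class: str = ""

@dataclass
class Rule:
    action: str                      # setTag | setSeverity | suppress | closeExisting
    value: Optional[object] = None   # new tag, new severity, or tag to close
    message_regex: Optional[str] = None
    tag: Optional[str] = None
    device: Optional[str] = None
    device_class: Optional[str] = None
    severity: Optional[int] = None

    def matches(self, ev: Event) -> bool:
        """A rule matches when every criterion it sets agrees with the event."""
        if self.message_regex is not None and not re.search(self.message_regex, ev.message):
            return False
        for name in ("tag", "device", "device_class", "severity"):
            wanted = getattr(self, name)
            if wanted is not None and wanted != getattr(ev, name):
                return False
        return True

def apply_rules(ev: Event, rules: List[Rule]) -> Tuple[Optional[Event], List[str]]:
    """Return (event or None if suppressed, tags whose open events to close)."""
    ev = replace(ev)  # work on a copy so the caller's event is untouched
    to_close: List[str] = []
    for rule in rules:
        if not rule.matches(ev):
            continue
        if rule.action == "suppress":
            return None, []  # first matching suppress drops the event
        if rule.action == "setTag":
            ev.tag = str(rule.value)
        elif rule.action == "setSeverity":
            ev.severity = int(rule.value)
        elif rule.action == "closeExisting":
            to_close.append(str(rule.value))
        else:
            raise ValueError("unknown action: %s" % rule.action)
    return ev, to_close
```

A rule such as `Rule("setSeverity", 1, message_regex=r"link down")` would raise any matching syslog line to Critical before it is stored.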

Once enriched, an event is written to the hot store (PostgreSQL Events) and is immediately visible in the UI. Syslog and trap events are simultaneously written to the cold archive (a fire-and-forget dual write), so full history is available in Event History right away.

  • Hot store — recent, active, and recently-cleared events only, within a configurable retention window (default 30 days). Kept small for UI speed.
  • Cold archive — long-term history in daily SQLite files (default) or ClickHouse. An hourly scheduler ages old events out of the hot store.
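One pass of the hourly aging job might look like the sketch below. The dict-based event shape and the name `select_for_aging` are hypothetical, and the sketch assumes that only closed events older than the retention window are removed, so open and acknowledged events stay in the hot store regardless of age.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # documented default retention window
CLOSED = 2                      # Status value for closed events

def select_for_aging(events, now, retention=RETENTION):
    """Return the events one aging pass would remove from the hot store."""
    cutoff = now - retention
    return [
        e for e in events
        if e["status"] == CLOSED and e["last_seen"] < cutoff
    ]

now = datetime(2024, 3, 1, tzinfo=timezone.utc)
events = [
    {"id": 1, "status": 2, "last_seen": now - timedelta(days=40)},  # aged out
    {"id": 2, "status": 2, "last_seen": now - timedelta(days=5)},   # recently cleared
    {"id": 3, "status": 0, "last_seen": now - timedelta(days=90)},  # still open
]
print([e["id"] for e in select_for_aging(events, now)])
```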

Key event fields:

| Field | Meaning |
| --- | --- |
| Severity | 1 Critical … 5 Debug |
| Tag | Short machine-readable event tag (e.g. PING_DOWN) |
| Source | Originating service: monitor, poller, syslog, snmptrap |
| Status | 0 open · 1 acknowledged · 2 closed |
| Count | Times the event has fired without clearing |
| FirstSeen / LastSeen | First and most-recent occurrence |
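To show how Count, FirstSeen, and LastSeen relate, here is a small sketch of folding repeat occurrences into one open event. The in-memory `store`, the `record` function, and the choice of `(device, tag)` as the deduplication key are assumptions made for illustration.

```python
from dataclasses import dataclass

OPEN, ACKNOWLEDGED, CLOSED = 0, 1, 2  # Status values from the table above

@dataclass
class Event:
    device: str
    tag: str
    severity: int
    source: str
    first_seen: float
    last_seen: float
    status: int = OPEN
    count: int = 1

def record(store: dict, device: str, tag: str, severity: int,
           source: str, ts: float) -> Event:
    """Bump the count on an uncleared event, or start a new one."""
    key = (device, tag)  # assumed dedup key
    ev = store.get(key)
    if ev is not None and ev.status != CLOSED:
        ev.count += 1      # fired again without clearing
        ev.last_seen = ts  # first_seen is left unchanged
        return ev
    ev = Event(device, tag, severity, source, first_seen=ts, last_seen=ts)
    store[key] = ev
    return ev
```

Two PING_DOWN occurrences for the same device would yield one event with Count 2, while an occurrence after the event is closed starts again at Count 1.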

After an event is written, it’s matched against per-user subscriptions. Subscriptions filter by device, class, tag, severity threshold, and message regex. A match queues a notification, and a worker delivers it:

| Channel | Transport |
| --- | --- |
| Email | SMTP |
| Webhook | HTTP POST (JSON, custom headers supported) |
| Slack | Incoming webhook |
| PagerDuty | Events API v2 |
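Subscription matching can be sketched as a set of optional filters that must all pass. The `Subscription` shape and field names are hypothetical. Since 1 is the most severe level, the sketch assumes the severity threshold means "this severity or more severe", that is, a numerically lower or equal value.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Subscription:
    user: str
    channel: str                         # email | webhook | slack | pagerduty
    device: Optional[str] = None
    device_class: Optional[str] = None
    tag: Optional[str] = None
    severity_threshold: Optional[int] = None
    message_regex: Optional[str] = None

def matches(sub: Subscription, event: dict) -> bool:
    """True when the event passes every filter the subscription sets."""
    if sub.device is not None and sub.device != event["device"]:
        return False
    if sub.device_class is not None and sub.device_class != event["device_class"]:
        return False
    if sub.tag is not None and sub.tag != event["tag"]:
        return False
    # Lower number = more severe, so notify at or above the threshold.
    if sub.severity_threshold is not None and event["severity"] > sub.severity_threshold:
        return False
    if sub.message_regex is not None and not re.search(sub.message_regex, event["message"]):
        return False
    return True

def queue_notifications(subs, event):
    """Return (user, channel) pairs for every matching subscription."""
    return [(s.user, s.channel) for s in subs if matches(s, event)]
```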

Failed deliveries are retried up to three times before being marked failed.
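A delivery worker's retry loop might look like this sketch. It reads "retried up to three times" as one initial attempt plus up to three retries. The exponential backoff between attempts, the `deliver` function, and its parameters are assumptions, not documented behaviour.

```python
import time
from typing import Callable

def deliver(send: Callable[[dict], None], notification: dict,
            max_retries: int = 3, base_delay: float = 1.0,
            sleep: Callable[[float], None] = time.sleep) -> str:
    """Try to send; retry on any exception, then give up as 'failed'."""
    for attempt in range(1 + max_retries):
        try:
            send(notification)
            return "delivered"
        except Exception:
            if attempt < max_retries:
                # Assumed backoff: 1s, 2s, 4s with the default base_delay.
                sleep(base_delay * 2 ** attempt)
    return "failed"
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.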

Event History is queried from the cold archive with filters for device, time range, severity, source, and full-text search on the message and tag — with a time-bucketed severity histogram for the selected range.
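The time-bucketed severity histogram can be illustrated with a short sketch. The function name, the `(timestamp, severity)` input shape, and the fixed bucket width are hypothetical, and a real deployment would presumably compute this in the archive query rather than in application code.

```python
from collections import Counter, defaultdict

def severity_histogram(events, start, end, bucket_seconds):
    """Count events per severity in fixed-width buckets over [start, end)."""
    buckets = defaultdict(Counter)
    for ts, severity in events:
        if start <= ts < end:
            # Snap the timestamp down to the start of its bucket.
            bucket_start = start + ((ts - start) // bucket_seconds) * bucket_seconds
            buckets[bucket_start][severity] += 1
    return {b: dict(c) for b, c in sorted(buckets.items())}

events = [(0, 1), (10, 1), (70, 3), (130, 1), (500, 2)]
print(severity_histogram(events, start=0, end=180, bucket_seconds=60))
# prints {0: {1: 2}, 60: {3: 1}, 120: {1: 1}}
```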