Metrics

Metrics now includes multi-panel dashboards at both the tenant level and the individual agent level.

Where to Find It

  • Open Monitoring > Metrics for tenant-wide dashboards.

  • Open an individual agent and select the Metrics tab for detailed per-agent dashboards.

Dashboard Coverage

Tenant Metrics (Monitoring > Metrics)

The tenant metrics view includes tabs for:

  • Event

  • HTTP

  • Cron

  • Internal API

On the Event, HTTP, and Cron tabs, you can expand each row to show per-instance CPU and memory charts.

Agent Metrics (Agent > Metrics)

  • Event agents include Event Metrics and System Metrics.

  • HTTP agents include HTTP Metrics and System Metrics.

  • Cron agents include System Metrics.

Time Range, Rollup, and Auto Refresh

  • Default settings are Last 24 Hours, 1m rollup, and max aggregation.

  • The available rollup options depend on the selected range, which keeps the number of points per chart manageable.

  • The default rollup grows as the range widens (for example, 24h uses 1m, 3d uses 5m, and 7d uses 15m).

  • Time range, rollup, aggregation, and selected tab are saved in browser local storage.

  • Time ranges also sync to the URL parameters last, from, and to, so a shared link opens with the same range.

  • Auto refresh runs every 60 seconds for preset ranges and is disabled for custom date ranges.
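The range, rollup, and refresh rules above can be modeled roughly as follows. This is a minimal sketch, not the product's implementation: the rollup defaults mirror the documented examples, while the 1h fallback for ranges over 7 days, the MAX_POINTS cap, the value formats of last/from/to, and every function name are assumptions.

```typescript
// Hypothetical model of the time-range controls; names and limits not stated
// in the article are assumptions.
type Rollup = "1m" | "5m" | "15m" | "1h";

const MINUTE_MS = 60_000;
const DAY_MS = 24 * 60 * MINUTE_MS;

const ROLLUP_MS: Record<Rollup, number> = {
  "1m": MINUTE_MS,
  "5m": 5 * MINUTE_MS,
  "15m": 15 * MINUTE_MS,
  "1h": 60 * MINUTE_MS,
};

// Assumed cap on buckets per series; the real limit is not documented.
const MAX_POINTS = 1_500;

// Follows the documented examples (24h -> 1m, 3d -> 5m, 7d -> 15m);
// the 1h default for longer ranges is an assumption.
function defaultRollup(rangeMs: number): Rollup {
  if (rangeMs <= DAY_MS) return "1m";
  if (rangeMs <= 3 * DAY_MS) return "5m";
  if (rangeMs <= 7 * DAY_MS) return "15m";
  return "1h";
}

// Offers only rollups whose bucket count stays within MAX_POINTS,
// keeping the coarsest option so the list is never empty.
function allowedRollups(rangeMs: number): Rollup[] {
  const all = Object.keys(ROLLUP_MS) as Rollup[];
  const fit = all.filter((r) => Math.ceil(rangeMs / ROLLUP_MS[r]) <= MAX_POINTS);
  return fit.length > 0 ? fit : ["1h"];
}

type TimeRange =
  | { kind: "preset"; last: string } // e.g. "24h" (format assumed)
  | { kind: "custom"; from: number; to: number }; // epoch ms (assumed)

// Auto refresh every 60 s for presets; null means disabled (custom ranges).
function refreshIntervalMs(range: TimeRange): number | null {
  return range.kind === "preset" ? 60_000 : null;
}

// Serializes the range into the documented URL parameters last, from, to.
function toQuery(range: TimeRange): string {
  const params = new URLSearchParams();
  if (range.kind === "preset") {
    params.set("last", range.last);
  } else {
    params.set("from", String(range.from));
    params.set("to", String(range.to));
  }
  return params.toString();
}
```

With these assumptions, a 7-day range offers only 15m and 1h, because finer rollups would exceed the point cap, and its default (15m) is always among the allowed options.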

Linked Charts and Presentation Behavior

  • Charts in the same dashboard are linked for synchronized hover and zoom windows.

  • Zooming any chart pauses auto refresh and shows a pause state in the refresh control.

  • Restore returns all linked charts to the full selected range.

  • Selecting Use Zoom Window converts the current zoom selection into a custom time range.

  • Full screen mode hides navigation and header chrome for a focused dashboard view.

  • In full screen mode, the browser Back button exits full screen before navigating away from the page.

  • Layout is responsive across breakpoints (wide multi-column down to single-column).
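The linked-chart behavior above amounts to one shared state per dashboard. The sketch below is hypothetical (the class and method names are not from the product) and assumes that Restore resumes auto refresh, which the article does not state explicitly.

```typescript
// Hypothetical model of one dashboard's linked charts; names are illustrative.
interface TimeWindow {
  from: number; // epoch ms
  to: number; // epoch ms
}

class LinkedDashboard {
  private range: TimeWindow; // the selected time range
  private zoom: TimeWindow | null = null; // shared zoom window, if any
  private hoverTime: number | null = null; // shared crosshair timestamp

  constructor(range: TimeWindow) {
    this.range = range;
  }

  // Every linked chart renders the same window: the zoom if set, else the full range.
  visibleWindow(): TimeWindow {
    return this.zoom ?? this.range;
  }

  // Hover is shared, so all charts draw a crosshair at the same timestamp.
  hover(time: number | null): void {
    this.hoverTime = time;
  }

  crosshair(): number | null {
    return this.hoverTime;
  }

  // Zooming any chart zooms all of them (clamped to the range) and pauses refresh.
  zoomTo(w: TimeWindow): void {
    this.zoom = {
      from: Math.max(w.from, this.range.from),
      to: Math.min(w.to, this.range.to),
    };
  }

  isRefreshPaused(): boolean {
    return this.zoom !== null;
  }

  // Restore: all charts return to the full selected range.
  // Resuming auto refresh here is an assumption.
  restore(): void {
    this.zoom = null;
  }

  // Use Zoom Window: the zoom becomes the new custom time range. Auto refresh
  // then stays off because custom ranges disable it (not modeled here).
  useZoomWindow(): TimeWindow | null {
    if (this.zoom === null) return null;
    this.range = this.zoom;
    this.zoom = null;
    return this.range;
  }
}
```

Keeping zoom and hover in one object, rather than per chart, is what makes every panel move together.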

Event Agent Dashboard Metrics

Event agents include Pulsar-driven panels:

  • Backlog Count

  • Storage Backlog Size (KB)

  • Messages Received (Per Sec)

  • Data Received (KB/Sec)

These metrics help correlate backlog growth with message volume and payload throughput.
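One way to correlate these panels is to derive a few values from samples aligned on the same rollup buckets. This is a sketch under stated assumptions: the sample shape and field names are hypothetical, and the "implied drain" figure assumes a single subscription whose backlog changes only through incoming and acknowledged messages (it ignores message expiry and multiple subscriptions).

```typescript
// Hypothetical samples from the Event panels, aligned on the same buckets.
interface EventSample {
  t: number; // bucket start, epoch seconds (assumed)
  backlogCount: number; // Backlog Count
  msgsInPerSec: number; // Messages Received (Per Sec)
  kbInPerSec: number; // Data Received (KB/Sec)
}

interface Derived {
  t: number;
  backlogGrowthPerSec: number; // backlog change per second since the previous sample
  impliedDrainPerSec: number; // inflow minus backlog growth (single-subscription assumption)
  avgMessageKb: number | null; // KB per message; null when no messages arrived
}

function derive(samples: EventSample[]): Derived[] {
  const out: Derived[] = [];
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1];
    const cur = samples[i];
    const dt = cur.t - prev.t;
    if (dt <= 0) continue; // skip out-of-order or duplicate buckets
    const growth = (cur.backlogCount - prev.backlogCount) / dt;
    out.push({
      t: cur.t,
      backlogGrowthPerSec: growth,
      impliedDrainPerSec: cur.msgsInPerSec - growth,
      avgMessageKb: cur.msgsInPerSec > 0 ? cur.kbInPerSec / cur.msgsInPerSec : null,
    });
  }
  return out;
}
```

If the backlog grows while Messages Received stays flat, the implied drain rate has dropped, which points at consumers rather than producers; a rising KB-per-message value instead suggests larger payloads.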

System Metrics (All Agent Types)

System metrics include:

  • Terminated Instances (including OOMKilled and other stop reasons)

  • Instance Count over time

  • Network I/O (KB and KB/Sec) by receive/transmit direction

  • CPU Usage (%) and CPU Usage (mCore) with request/limit references

  • Memory Usage (%) and Memory Usage (GB) with Usage, RSS, Working Set, and request/limit references
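The request/limit references make it possible to read usage as a fraction of what an instance asked for and what it may use. The article does not say which reference the (%) charts divide by, so this hypothetical helper computes both; the 1 core = 1000 mCore conversion is the standard Kubernetes millicore convention. The same helper works for memory if usage, request, and limit share a unit.

```typescript
// Hypothetical reading for one instance; usage, request, and limit share a unit
// (mCore for CPU, bytes or GB for memory).
interface ResourceReading {
  usage: number;
  request: number | null; // null when no request is set
  limit: number | null; // null when no limit is set
}

// 1 CPU core = 1000 millicores (mCore).
function coresToMCore(cores: number): number {
  return cores * 1000;
}

// Percentage of a reference value, or null when the reference is missing or zero.
function percentOf(value: number, reference: number | null): number | null {
  return reference !== null && reference > 0 ? (value / reference) * 100 : null;
}

function summarize(r: ResourceReading) {
  return {
    pctOfRequest: percentOf(r.usage, r.request),
    pctOfLimit: percentOf(r.usage, r.limit),
    overRequest: r.request !== null && r.usage > r.request,
  };
}
```

For example, 250 mCore of usage against a 500 mCore request and a 1000 mCore limit is 50% of the request but only 25% of the limit, so which reference a chart uses changes how close to saturation it looks.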

The Instance Count chart overlays termination and watchdog markers. Hovering over a marker shows the stop reason, exit code, restart count, pod instance, image, start and termination timestamps, and runtime.
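The marker details can be pictured as a small record, with runtime computed from the start and termination timestamps. The field names and the "2h 5m 30s" runtime format below are hypothetical; only the list of fields comes from the description above.

```typescript
// Hypothetical termination marker; field names are illustrative.
interface TerminationMarker {
  stopReason: string; // e.g. "OOMKilled"
  exitCode: number;
  restartCount: number;
  podInstance: string;
  image: string;
  startedAt: Date;
  terminatedAt: Date;
}

// Runtime is the span between start and termination, e.g. "2h 5m 30s".
function formatRuntime(startedAt: Date, terminatedAt: Date): string {
  const secs = Math.max(0, Math.floor((terminatedAt.getTime() - startedAt.getTime()) / 1000));
  const h = Math.floor(secs / 3600);
  const m = Math.floor((secs % 3600) / 60);
  const s = secs % 60;
  const parts: string[] = [];
  if (h > 0) parts.push(`${h}h`);
  if (h > 0 || m > 0) parts.push(`${m}m`);
  parts.push(`${s}s`);
  return parts.join(" ");
}

// Builds the hover text, one detail per line.
function tooltip(marker: TerminationMarker): string {
  return [
    `Stop reason: ${marker.stopReason} (exit code ${marker.exitCode})`,
    `Restarts: ${marker.restartCount}`,
    `Instance: ${marker.podInstance}`,
    `Image: ${marker.image}`,
    `Started: ${marker.startedAt.toISOString()}`,
    `Terminated: ${marker.terminatedAt.toISOString()}`,
    `Runtime: ${formatRuntime(marker.startedAt, marker.terminatedAt)}`,
  ].join("\n");
}
```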

HTTP and Internal API Dashboards

  • HTTP dashboards include endpoint counts, latency, and status-code trends, with matching aggregate tables.

  • Internal API dashboards include request volume, latency slices, rate-limit delays, and rate-limit errors by service/user/method/path.
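An aggregate table of this kind can be built by grouping request records by endpoint. The sketch below is hypothetical: the record shape, the choice of average and max latency, and the status-class buckets (2xx, 4xx, 5xx) are illustrative assumptions, not the dashboard's documented pipeline. The same grouping would apply to Internal API rows keyed by service, user, method, or path.

```typescript
// Hypothetical request record.
interface RequestRecord {
  path: string;
  status: number;
  latencyMs: number;
}

interface EndpointRow {
  path: string;
  count: number;
  avgLatencyMs: number;
  maxLatencyMs: number;
  byStatusClass: Record<string, number>; // e.g. { "2xx": 10, "5xx": 1 }
}

// Maps a status code to its class, e.g. 404 -> "4xx".
function statusClass(status: number): string {
  return `${Math.floor(status / 100)}xx`;
}

function aggregate(records: RequestRecord[]): EndpointRow[] {
  const rows = new Map<string, EndpointRow & { totalLatency: number }>();
  for (const r of records) {
    let row = rows.get(r.path);
    if (!row) {
      row = { path: r.path, count: 0, avgLatencyMs: 0, maxLatencyMs: 0, byStatusClass: {}, totalLatency: 0 };
      rows.set(r.path, row);
    }
    row.count += 1;
    row.totalLatency += r.latencyMs;
    row.maxLatencyMs = Math.max(row.maxLatencyMs, r.latencyMs);
    const cls = statusClass(r.status);
    row.byStatusClass[cls] = (row.byStatusClass[cls] ?? 0) + 1;
  }
  // Drop the running total and compute the average per endpoint.
  return Array.from(rows.values()).map(({ totalLatency, ...row }) => ({
    ...row,
    avgLatencyMs: totalLatency / row.count,
  }));
}
```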

Interpreting Rollups and Gaps

  • Larger rollups smooth short spikes because each point represents an aggregated bucket.

  • A chart can peak below the max value shown in a summary when a short spike is averaged into a wider bucket.

  • Flat or inactive series can show sparse points or gaps when no new source metrics are emitted.
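The effect of rollups and missing data can be reproduced with a small bucketing function. This is an illustrative sketch, not the product's query engine: it groups raw samples into fixed buckets, aggregates each with max (the documented default) or avg (included to show the averaging effect described above), and returns null for empty buckets, which a chart would draw as a gap.

```typescript
// Raw sample: [timestamp ms, value].
type Sample = [number, number];
type Agg = "avg" | "max";

// Groups samples into fixed buckets of rollupMs between from (inclusive)
// and to (exclusive), then aggregates each bucket. Empty buckets yield null.
function rollup(samples: Sample[], from: number, to: number, rollupMs: number, agg: Agg): (number | null)[] {
  const n = Math.ceil((to - from) / rollupMs);
  const buckets: number[][] = Array.from({ length: n }, () => []);
  for (const [t, v] of samples) {
    if (t < from || t >= to) continue;
    buckets[Math.floor((t - from) / rollupMs)].push(v);
  }
  return buckets.map((b) => {
    if (b.length === 0) return null;
    return agg === "max" ? Math.max(...b) : b.reduce((sum, x) => sum + x, 0) / b.length;
  });
}
```

For five one-minute samples of 10, 10, 100, 10, 10, a single 5m bucket averages to 28 while max keeps 100, which is why an averaged chart can sit well below a max summary value.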
