Metrics
Metrics now includes multi-panel dashboards at both the tenant level and the individual agent level.
Where to Find It
Open Monitoring > Metrics for tenant-wide dashboards.
Open an individual agent and select the Metrics tab for detailed per-agent dashboards.
Dashboard Coverage
Tenant Metrics (Monitoring > Metrics)
The tenant metrics view includes tabs for:
Event
HTTP
Cron
Internal API
On the Event, HTTP, and Cron tabs, each row can be expanded to show per-instance CPU and memory charts.
Agent Metrics (Agent > Metrics)
Event agents include Event Metrics and System Metrics.
HTTP agents include HTTP Metrics and System Metrics.
Cron agents show System Metrics.
Time Range, Rollup, and Auto Refresh
Default settings are Last 24 Hours, 1m rollup, and max aggregation.
The available rollup options depend on the selected range, which keeps the number of plotted points per chart under control.
Rollup defaults increase as ranges widen (for example: 24h -> 1m, 3d -> 5m, 7d -> 15m).
Time range, rollup, aggregation, and selected tab are saved in browser local storage.
Time ranges also sync to URL parameters for shareable links (last, from, to).
Auto refresh runs every 60 seconds for preset ranges and is disabled for custom date ranges.
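The effect of the default rollups on point density, and the shape of a shareable link, can be sketched roughly as follows. This is a minimal illustration, not the product's implementation: the function names are hypothetical, only the three range/rollup pairs given as examples above are included, and the value formats used for the last, from, and to parameters (such as "24h" or epoch seconds) are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode

# Default rollups taken from the examples above; other ranges are not covered here.
DEFAULT_ROLLUP_SECONDS = {"24h": 60, "3d": 300, "7d": 900}  # 1m, 5m, 15m
RANGE_SECONDS = {"24h": 24 * 3600, "3d": 3 * 24 * 3600, "7d": 7 * 24 * 3600}


def points_per_series(range_key: str) -> int:
    """Number of rollup buckets one series spans at the default rollup."""
    return RANGE_SECONDS[range_key] // DEFAULT_ROLLUP_SECONDS[range_key]


def share_query(last: Optional[str] = None,
                frm: Optional[str] = None,
                to: Optional[str] = None) -> str:
    """Build a query string from the documented parameter names.

    Assumption: a preset range uses `last`, a custom range uses `from` and `to`.
    The value formats are hypothetical.
    """
    params = {}
    if last is not None:
        params["last"] = last
    if frm is not None:
        params["from"] = frm
    if to is not None:
        params["to"] = to
    return urlencode(params)


for key in ("24h", "3d", "7d"):
    print(key, points_per_series(key))   # 24h 1440, 3d 864, 7d 672

print(share_query(last="24h"))           # last=24h
```

With these defaults, a wider range produces fewer points per series than the 24-hour view, which is the intent of raising the rollup as the range grows.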
Linked Charts and Presentation Behavior
Charts in the same dashboard are linked for synchronized hover and zoom windows.
Zooming any chart pauses auto refresh and shows a pause state in the refresh control.
Restore returns all linked charts to the full selected range.
Use Zoom Window converts the current zoom selection into a custom time range.
Full screen mode hides navigation and header chrome for a focused dashboard view.
The browser Back button exits full screen before navigating away from the page.
Layout is responsive across breakpoints (wide multi-column down to single-column).
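The interaction between zoom, Restore, Use Zoom Window, and auto refresh can be modeled as a small state object. The sketch below is only a reading aid built from the behavior described on this page; the class and attribute names are hypothetical and do not reflect the dashboard's actual code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Window = Tuple[int, int]  # (start, end), assumed here to be epoch seconds


@dataclass
class DashboardState:
    preset: Optional[str] = "24h"    # None once a custom range is active
    custom: Optional[Window] = None  # custom time range, if any
    zoom: Optional[Window] = None    # one zoom window shared by all linked charts

    @property
    def auto_refresh_active(self) -> bool:
        # Refresh runs only for preset ranges and pauses while a zoom is applied.
        return self.preset is not None and self.zoom is None

    def zoom_to(self, window: Window) -> None:
        """Zooming any chart applies the same window to every linked chart."""
        self.zoom = window

    def restore(self) -> None:
        """Restore: return all linked charts to the full selected range."""
        self.zoom = None

    def use_zoom_window(self) -> None:
        """Use Zoom Window: turn the current zoom into a custom time range."""
        if self.zoom is None:
            return
        self.custom, self.preset, self.zoom = self.zoom, None, None


state = DashboardState()
state.zoom_to((1_700_000_000, 1_700_003_600))
print(state.auto_refresh_active)   # False: zoom pauses auto refresh
state.use_zoom_window()
print(state.custom, state.auto_refresh_active)  # custom range set, refresh stays off
```

In this model, Restore is the only path that resumes auto refresh after a zoom, because Use Zoom Window ends in a custom range, where auto refresh is disabled.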
Event Agent Dashboard Metrics
Event agents include Pulsar-driven panels:
Backlog Count
Storage Backlog Size (KB)
Messages Received (Per Sec)
Data Received (KB/Sec)
These metrics help correlate backlog growth with message volume and payload throughput.
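As one way to read these panels together, the arithmetic below derives an average payload size and a backlog growth rate from values read off the charts. It is an illustrative sketch: the function names and sample numbers are hypothetical, and the dashboards do not necessarily compute or display these derived values.

```python
def avg_message_size_kb(data_kb_per_sec: float, messages_per_sec: float) -> float:
    """Average payload size: Data Received (KB/Sec) / Messages Received (Per Sec)."""
    if messages_per_sec <= 0:
        return 0.0  # no traffic in this interval, so no meaningful average
    return data_kb_per_sec / messages_per_sec


def backlog_growth_per_sec(backlog_start: int, backlog_end: int, seconds: float) -> float:
    """Change in Backlog Count per second between two chart readings."""
    return (backlog_end - backlog_start) / seconds


# Hypothetical readings taken one minute apart.
print(avg_message_size_kb(512.0, 256.0))       # 2.0 KB per message
print(backlog_growth_per_sec(1000, 1600, 60))  # 10.0 messages per second
```

A positive growth rate while Messages Received stays flat would suggest consumers slowing down rather than producers speeding up; a growth rate that rises with Messages Received points toward increased inbound volume.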
System Metrics (All Agent Types)
System metrics include:
Terminated Instances (including OOMKilled and other stop reasons)
Instance Count over time
Network I/O (KB and KB/Sec) by receive/transmit direction
CPU Usage (%) and CPU Usage (mCore) with request/limit references
Memory Usage (%) and Memory Usage (GB) with Usage, RSS, Working Set, and request/limit references
Instance Count overlays termination/watchdog markers. Hovering over markers shows stop reason, exit code, restart count, pod instance, image, start/termination timestamps, and runtime.
HTTP and Internal API Dashboards
HTTP dashboards include endpoint counts, latency, and status-code trends, with matching aggregate tables.
Internal API dashboards include request volume, latency slices, rate-limit delays, and rate-limit errors by service/user/method/path.
Interpreting Rollups and Gaps
Larger rollups smooth short spikes because each point represents an aggregated bucket.
A chart can appear lower than the max summary values when a short, high spike is averaged with the other samples in its bucket.
Flat or inactive series can show sparse points or gaps when no new source metrics are emitted.
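The difference between max and average rollups can be seen with a small worked example. This is a generic sketch of bucketed aggregation with made-up samples, not the dashboard's query logic; the rollup helper is hypothetical.

```python
from statistics import mean


def rollup(samples, bucket_size, agg):
    """Aggregate consecutive samples into fixed-size buckets using agg."""
    return [agg(samples[i:i + bucket_size])
            for i in range(0, len(samples), bucket_size)]


# Ten one-minute samples with a single short spike.
samples = [10, 10, 90, 10, 10, 10, 10, 10, 10, 10]

max_points = rollup(samples, 5, max)   # 5m buckets, max aggregation
avg_points = rollup(samples, 5, mean)  # 5m buckets, average aggregation

print(max_points)  # [90, 10]
print(avg_points)  # [26, 10]
```

The spike of 90 survives a max rollup but is flattened to 26 when the same bucket is averaged, which is why an averaged chart can sit well below a reported maximum.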