# Metrics

Metrics now includes multi-panel dashboards at both the tenant level and the individual agent level.

## Where to Find It

* Open **Monitoring > Metrics** for tenant-wide dashboards.
* Open an individual agent and select the **Metrics** tab for detailed per-agent dashboards.

## Dashboard Coverage

### Tenant Metrics (Monitoring > Metrics)

The tenant metrics view includes tabs for:

* **Event**
* **HTTP**
* **Cron**
* **Internal API**

On the Event, HTTP, and Cron tabs, each row can be expanded to show per-instance CPU and memory charts.

### Agent Metrics (Agent > Metrics)

* **Event agents** include Event Metrics and System Metrics.
* **HTTP agents** include HTTP Metrics and System Metrics.
* **Cron agents** include System Metrics.

## Time Range, Rollup, and Auto Refresh

* Default settings are **Last 24 Hours**, **1m rollup**, and **max aggregation**.
* The available rollup options depend on the selected time range, which limits how many points each chart has to draw.
* Rollup defaults increase as ranges widen (for example: 24h -> 1m, 3d -> 5m, 7d -> 15m).
* Time range, rollup, aggregation, and selected tab are saved in browser local storage.
* Time ranges also sync to URL parameters for shareable links (`last`, `from`, `to`).
* Auto refresh runs every 60 seconds for preset ranges and is disabled for custom date ranges.
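
The range-dependent behavior above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names are hypothetical, the value formats (`24h`, ISO timestamps) are assumptions, and only the three rollup defaults listed above (24h, 3d, 7d) come from this page. How other ranges map to a rollup is a guess.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode

# (maximum range in hours, default rollup), taken from the examples on this page.
DEFAULT_ROLLUPS = [
    (24, "1m"),    # up to 24 hours
    (72, "5m"),    # up to 3 days
    (168, "15m"),  # up to 7 days
]


def default_rollup(range_hours: float) -> str:
    """Return the default rollup for a range; wider ranges get coarser rollups."""
    for max_hours, rollup in DEFAULT_ROLLUPS:
        if range_hours <= max_hours:
            return rollup
    # Assumption: ranges wider than 7 days reuse the coarsest listed default.
    return DEFAULT_ROLLUPS[-1][1]


def time_range_query(last: Optional[str] = None,
                     start: Optional[str] = None,
                     end: Optional[str] = None) -> str:
    """Build the query string for a shareable link.

    Preset ranges use `last`; custom ranges use `from` and `to`.
    """
    if last is not None:
        return urlencode({"last": last})
    return urlencode({"from": start, "to": end})


def auto_refresh_enabled(query: str) -> bool:
    """Auto refresh applies to preset ranges only, i.e. links carrying `last`."""
    return "last" in parse_qs(query)
```

For example, `time_range_query(last="24h")` yields `last=24h`, which `auto_refresh_enabled` treats as a preset range, while a query built from `start` and `end` is treated as a custom range with auto refresh off.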

## Linked Charts and Presentation Behavior

* Charts in the same dashboard are linked: hovering or zooming on one chart applies the same cursor position or zoom window to the others.
* Zooming any chart pauses auto refresh and shows a pause state in the refresh control.
* **Restore** returns all linked charts to the full selected range.
* **Use Zoom Window** converts the current zoom selection into a custom time range.
* Full screen mode hides navigation and header chrome for a focused dashboard view.
* The browser Back button exits full screen mode before navigating away.
* Layout is responsive across breakpoints (wide multi-column down to single-column).
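
One way to picture how zoom, **Restore**, and **Use Zoom Window** interact with auto refresh is a small state model. The class and method names below are hypothetical. The sketch also assumes that **Restore** resumes auto refresh on a preset range, which this page does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Window = Tuple[int, int]  # (start, end) as epoch seconds


@dataclass
class DashboardState:
    selected_range: Window
    is_preset: bool = True          # False once a custom range is in use
    zoom: Optional[Window] = None   # shared by all linked charts

    @property
    def auto_refresh_active(self) -> bool:
        # Auto refresh runs only for preset ranges and pauses while zoomed.
        return self.is_preset and self.zoom is None

    def zoom_to(self, window: Window) -> None:
        """Zooming any chart sets the shared window and pauses auto refresh."""
        self.zoom = window

    def restore(self) -> None:
        """Restore: return all linked charts to the full selected range."""
        self.zoom = None

    def use_zoom_window(self) -> None:
        """Use Zoom Window: turn the current zoom into a custom time range."""
        if self.zoom is None:
            return
        self.selected_range = self.zoom
        self.is_preset = False  # custom ranges do not auto refresh
        self.zoom = None
```

Keeping a single `zoom` value on the dashboard, rather than one per chart, is what makes the charts move together in this model.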

## Event Agent Dashboard Metrics

Event agents include Pulsar-driven panels:

* **Backlog Count**
* **Storage Backlog Size (KB)**
* **Messages Received (Per Sec)**
* **Data Received (KB/Sec)**

These metrics help correlate backlog growth with message volume and payload throughput.

## System Metrics (All Agent Types)

System metrics include:

* **Terminated Instances** (including OOMKilled and other stop reasons)
* **Instance Count** over time
* **Network I/O (KB and KB/Sec)** by receive/transmit direction
* **CPU Usage (%)** and **CPU Usage (mCore)** with request/limit references
* **Memory Usage (%)** and **Memory Usage (GB)** with Usage, RSS, Working Set, and request/limit references

The Instance Count chart overlays termination and watchdog markers. Hovering over a marker shows the stop reason, exit code, restart count, pod instance, image, start and termination timestamps, and runtime.

## HTTP and Internal API Dashboards

* HTTP dashboards include endpoint counts, latency, and status-code trends, with matching aggregate tables.
* Internal API dashboards include request volume, latency slices, rate-limit delays, and rate-limit errors by service/user/method/path.

## Interpreting Rollups and Gaps

* Larger rollups smooth short spikes because each point represents an aggregated bucket.
* A chart can show lower values than the max summary figures when a short spike is averaged with quieter samples in the same bucket.
* Flat or inactive series can show sparse points or gaps when no new source metrics are emitted.
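
The smoothing effect can be checked with a little arithmetic. The sketch below uses made-up one-minute samples and a hypothetical `rollup` helper. It only illustrates bucketed aggregation in general, not how the dashboards compute their series.

```python
from statistics import mean


def rollup(samples, bucket_size, aggregate):
    """Split samples into consecutive buckets and aggregate each bucket."""
    return [
        aggregate(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]


# Thirty one-minute samples: steady at 10 with a single spike to 100.
samples = [10] * 14 + [100] + [10] * 15

avg_15m = rollup(samples, 15, mean)  # spike diluted: (14 * 10 + 100) / 15 = 16
max_15m = rollup(samples, 15, max)   # spike preserved in its bucket

print(avg_15m)  # first bucket is 16, second is 10
print(max_15m)  # first bucket is 100, second is 10
```

With a 15-minute rollup, the averaged series peaks at 16 while the max series still reports 100. This is why a max summary value can exceed anything visible on an averaged chart.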
