# Metrics

Metrics now includes multi-panel dashboards at both the tenant level and the individual agent level.

## Where to Find It

* Open **Monitoring > Metrics** for tenant-wide dashboards.
* Open an individual agent and select the **Metrics** tab for detailed per-agent dashboards.

## Dashboard Coverage

### Tenant Metrics (Monitoring > Metrics)

The tenant metrics view includes tabs for:

* **Event**
* **HTTP**
* **Cron**
* **Internal API**
* **AI Gateway**

On the Event, HTTP, and Cron tabs, each row can be expanded to show per-instance CPU and memory charts.

### Agent Metrics (Agent > Metrics)

* **Event agents** include Event Metrics and System Metrics.
* **HTTP agents** include HTTP Metrics and System Metrics.
* **Cron agents** show System Metrics.

## Time Range, Rollup, and Auto Refresh

* Default settings are **Last 24 Hours**, **1m rollup**, and **max aggregation**.
* The available rollup options depend on the selected range, which keeps the number of plotted points manageable.
* Rollup defaults increase as ranges widen (for example: 24h -> 1m, 3d -> 5m, 7d -> 15m).
* Time range, rollup, aggregation, and selected tab are saved in browser local storage.
* Time ranges also sync to URL parameters for shareable links (`last`, `from`, `to`).
* Auto refresh runs every 60 seconds for preset ranges, and is disabled for custom date ranges.
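
As a rough illustration of why rollup defaults grow with the range, the sketch below computes how many points each documented default produces and builds a shareable link. The table holds only the three range/rollup pairs listed above. The helper names, the placeholder base URL, and the value formats passed to `last`, `from`, and `to` are assumptions made for illustration, not the product's documented format.

```python
from typing import Optional
from urllib.parse import urlencode

# Default rollups listed above: range in hours -> rollup in minutes.
DEFAULT_ROLLUP_MINUTES = {24: 1, 72: 5, 168: 15}


def points_per_series(range_hours: int) -> int:
    """Number of buckets one series contains at the default rollup."""
    return range_hours * 60 // DEFAULT_ROLLUP_MINUTES[range_hours]


def share_url(base_url: str, last: Optional[str] = None,
              start: Optional[str] = None, end: Optional[str] = None) -> str:
    """Build a link using `last` for presets or `from`/`to` for custom ranges."""
    # The parameter names come from this page; the value formats are assumed.
    params = {"last": last} if last else {"from": start, "to": end}
    return f"{base_url}?{urlencode(params)}"


for hours in DEFAULT_ROLLUP_MINUTES:
    print(hours, "h ->", points_per_series(hours), "points")
```

With these defaults, a wider range yields fewer points per series (1440, 864, and 672 respectively), which is consistent with the goal of controlling point density.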

## Linked Charts and Presentation Behavior

* Charts in the same dashboard are linked for synchronized hover and zoom windows.
* Zooming any chart pauses auto refresh and shows a pause state in the refresh control.
* **Restore** returns all linked charts to the full selected range.
* **Use Zoom Window** converts the current zoom selection into a custom time range.
* Full screen mode hides navigation and header chrome for a focused dashboard view.
* The browser Back button exits full screen mode before navigating away.
* The layout adapts to screen width, from multiple columns on wide screens down to a single column.
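
The auto refresh rules described on this page (60-second refresh for preset ranges, disabled for custom ranges, paused while zoomed) can be summarized as a small decision function. This is only a sketch: the state names are hypothetical, and the precedence of "disabled" over "paused" is an assumption.

```python
from enum import Enum

REFRESH_INTERVAL_SECONDS = 60  # documented interval for preset ranges


class RefreshState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    DISABLED = "disabled"


def refresh_state(is_custom_range: bool, is_zoomed: bool) -> RefreshState:
    """Return the refresh state for the current range and zoom status."""
    if is_custom_range:
        # Custom date ranges never auto refresh.
        return RefreshState.DISABLED
    if is_zoomed:
        # Zooming any linked chart pauses refresh until the zoom is restored.
        return RefreshState.PAUSED
    return RefreshState.RUNNING
```

Under this model, selecting **Use Zoom Window** moves the dashboard from paused to disabled, because the zoom selection becomes a custom range.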

## Event Agent Dashboard Metrics

Event agents include Pulsar-driven panels:

* **Backlog Count**
* **Storage Backlog Size (KB)**
* **Messages Received (Per Sec)**
* **Data Received (KB/Sec)**

These metrics help correlate backlog growth with message volume and payload throughput.

## System Metrics (All Agent Types)

System metrics include:

* **Terminated Instances** (including OOMKilled and other stop reasons)
* **Instance Count** over time
* **Network I/O (KB and KB/Sec)** by receive/transmit direction
* **CPU Usage (%)** and **CPU Usage (mCore)** with request/limit references
* **Memory Usage (%)** and **Memory Usage (GB)** with Usage, RSS, Working Set, and request/limit references

The Instance Count chart overlays termination and watchdog markers. Hovering over a marker shows the stop reason, exit code, restart count, pod instance, image, start and termination timestamps, and runtime.

## HTTP and Internal API Dashboards

* HTTP dashboards include endpoint counts, latency, and status-code trends, with matching aggregate tables.
* Internal API dashboards include request volume, latency slices, rate-limit delays, and rate-limit errors by service/user/method/path.

## Interpreting Rollups and Gaps

* Larger rollups smooth short spikes because each point represents an aggregated bucket.
* A chart can peak below the max summary value when a short, high spike is averaged with the other samples in its bucket.
* Flat or inactive series can show sparse points or gaps when no new source metrics are emitted.
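
The effect of rollup on spikes can be reproduced with a few numbers. The sketch below uses hypothetical one-minute samples and groups them into five-minute buckets, once with an average and once with a max. It illustrates the arithmetic only and is not the dashboard's actual aggregation code.

```python
def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def rollup(samples, bucket_size, aggregate):
    """Split samples into consecutive buckets and aggregate each one."""
    return [
        aggregate(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]


# Hypothetical one-minute CPU samples with a single spike to 90.
samples = [10, 10, 90, 10, 10, 10, 10, 10, 10, 10]

avg_5m = rollup(samples, 5, average)  # the spike is diluted by its bucket
max_5m = rollup(samples, 5, max)      # the spike is preserved
print(avg_5m, max_5m)
```

The averaged series peaks at 26 while the max series still peaks at 90, which is why a chart and a max summary value can disagree.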

## AI Gateway Dashboard

The AI Gateway tab provides visibility into LLM provider usage, cost, latency, and reliability across all AI routes in the tenant. It supports time ranges of up to **90 days**, while the other tabs are capped at 30 days. Switching from the AI Gateway tab to another tab while a 90-day range is selected automatically clamps the range to 30 days.
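
The clamping behavior amounts to taking the smaller of the selected range and the tab's cap. In the sketch below, the tab keys are hypothetical identifiers, and applying the clamp to any range longer than 30 days (not only a 90-day range) is an assumption.

```python
DEFAULT_MAX_DAYS = 30
# Only the AI Gateway tab has a larger documented cap.
MAX_DAYS_BY_TAB = {"ai-gateway": 90}


def clamp_range_days(tab: str, selected_days: int) -> int:
    """Return the range that remains in effect after switching to `tab`."""
    return min(selected_days, MAX_DAYS_BY_TAB.get(tab, DEFAULT_MAX_DAYS))
```

For example, `clamp_range_days("http", 90)` yields 30, while the same range on the AI Gateway tab is left unchanged.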

### Summary Cards

Six cards at the top of the dashboard display at-a-glance metrics for the selected time period. Each card shows a trend indicator comparing the current period to the prior equivalent window.

* **AI Calls** — Total number of provider API calls made. Each call corresponds to a single request sent to an LLM provider endpoint.
* **Tokens Processed** — Combined count of input and output tokens consumed across all calls.
* **Estimated Cost** — Total spend for the period. An info icon (ⓘ) on the card opens a breakdown of cost by token category: Input (uncached), Cache Read, Cache Write, Text Output, and Reasoning. The tooltip also shows when model rates were last synced and what percentage of calls have pricing data.
  * A **warning icon** replaces the info icon when the cost estimate may be unreliable. This occurs in two cases:
    * **Stale pricing data** — model rates have not been synced in the last 24 hours.
    * **Low coverage** — fewer than 90% of calls in the selected time range have pricing data available. This typically indicates a recently added model whose rates have not yet been configured.
* **Avg Latency** — Mean end-to-end response time across all calls.
* **Error Rate** — Percentage of calls that resulted in an error.
* **Cache Hit Rate** — Percentage of calls served from the provider's prompt cache.
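
The two conditions that trigger the cost warning icon can be expressed as a simple check. This is a minimal sketch using the thresholds stated above (24 hours and 90%); the function name, argument names, and warning labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=24)  # rates older than this count as stale
MIN_COVERAGE = 0.90                # minimum share of calls with pricing data


def cost_warnings(last_synced, now, priced_calls, total_calls):
    """Return the reasons, if any, that a cost estimate may be unreliable."""
    warnings = []
    if now - last_synced > STALE_AFTER:
        warnings.append("stale_pricing")
    if total_calls > 0 and priced_calls / total_calls < MIN_COVERAGE:
        warnings.append("low_coverage")
    return warnings


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(cost_warnings(now - timedelta(hours=30), now, 85, 100))
```

An empty list corresponds to the normal info icon; any entry corresponds to the warning icon.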

### Time Series Charts

Six charts are displayed in a two-column grid below the summary cards. All charts share synchronized hover and zoom interactions.

* **Request Volume & Failures** — Call volume shown as bars with a failure count overlay. Use this to spot spikes in traffic or error bursts.
* **Token Usage** — Stacked area chart broken down by token type: Input, Cache Read, Cache Write, Output, and Reasoning. A total token line is overlaid for quick trend comparison.
* **Latency Trend** — P50 (median) and P95 latency over time. The gap between the two lines indicates how variable response times are — a widening gap suggests intermittent slowdowns even when the median looks healthy.
* **Cost Breakdown** — Stacked bars by token category (Input, Cache Read, Cache Write, Output, Reasoning) with a total cost trend line overlaid.
* **Route Health** — Horizontal stacked bar chart showing request outcomes per route. See [Route Health Chart](#route-health-chart) below.
* **Finish Reasons** — Donut chart showing how requests terminated: `stop`, `tool-calls`, `length`, `content_filter`, or `error`. The center displays total request count.
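
To see why the gap between P50 and P95 is informative, consider a set of latencies where most calls are fast and one is slow. The sketch below uses the nearest-rank percentile method on hypothetical data; the method the dashboard uses to compute percentiles is not documented here, so treat this as an illustration only.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value covering pct% of the samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: nine normal calls and one slow call.
latencies_ms = [120, 130, 125, 128, 122, 127, 131, 124, 126, 900]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(p50, p95, p95 - p50)
```

Here the median stays at 126 ms while P95 jumps to 900 ms, so the median alone would hide the intermittent slowdown.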

### Breakdown Table

A sortable table below the charts groups metrics by a user-selected dimension. Use the dropdown at the top-right of the table to switch grouping.

**Group by options:**

| Option           | Description                                             |
| ---------------- | ------------------------------------------------------- |
| Provider / Model | Breaks down metrics by LLM provider and model (default) |
| Route            | Groups by configured AI route                           |
| Agent            | Groups by the calling agent                             |
| Node             | Groups by workflow node                                 |
| Finish Reason    | Groups by how the request terminated                    |

**Columns:** Calls, Input Tokens, Output Tokens, Cache Hit, Avg Latency, P95 Latency, Errors, Error Rate, Est. Cost. All columns are sortable.

Rows with errors show a red drill-down icon next to the error count (not available when grouped by Provider / Model). Click it to open the [Error Detail dialog](#error-detail-dialog).
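
Conceptually, each grouping option re-aggregates the same call records by a different key. The following sketch shows that idea with hypothetical call records and field names; it covers only a few of the table's columns and is not the product's query logic.

```python
from collections import defaultdict

# Hypothetical call records; the field names are illustrative only.
calls = [
    {"route": "chat", "agent": "support-bot", "error": False, "latency_ms": 800},
    {"route": "chat", "agent": "support-bot", "error": True, "latency_ms": 1200},
    {"route": "chat", "agent": "triage-bot", "error": False, "latency_ms": 1000},
    {"route": "summarize", "agent": "triage-bot", "error": False, "latency_ms": 500},
]


def breakdown(records, dimension):
    """Group records by `dimension` and compute per-group summary columns."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record)
    rows = {}
    for key, members in groups.items():
        errors = sum(1 for m in members if m["error"])
        rows[key] = {
            "calls": len(members),
            "errors": errors,
            "error_rate": errors / len(members),
            "avg_latency_ms": sum(m["latency_ms"] for m in members) / len(members),
        }
    return rows


by_route = breakdown(calls, "route")
by_agent = breakdown(calls, "agent")
```

Switching the dropdown from Route to Agent corresponds to calling `breakdown` with a different `dimension` over the same records.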

### Tool Calls Table

A separate table at the bottom of the page shows metrics for tool invocations made during agent execution. Columns: Tool Name, Calls, Avg Latency, P95 Latency, Errors, Error Rate.

***

### Route Health Chart

The Route Health chart gives a per-route breakdown of how requests resolved across the selected time period. Each route appears as a horizontal stacked bar with three segments:

* **Succeeded** (green) — Requests that completed successfully on the first attempt, with no fallback needed.
* **Recovered** (orange) — Requests that initially failed on the primary model but succeeded after falling back to one or more alternative models. These requests produced a successful response but required extra attempts to do so.
* **Failed** (red) — Requests where all attempts were exhausted — including any configured fallbacks — and no successful response was returned.

Hovering over a bar segment shows a tooltip with the exact count and percentage for each status, the total request and call counts for the route, the primary model, any configured fallback models, and the total wasted cost if applicable.

Clicking any bar segment opens the **Fallback Paths dialog** for that route.
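
The three segment definitions can be expressed as a classification over the ordered attempts of each request. The sketch below assumes each request is represented as a list of booleans (one per attempt, `True` for success); that representation and the function names are hypothetical.

```python
from collections import Counter


def classify_request(attempt_outcomes):
    """Classify one request from its ordered attempt outcomes."""
    if attempt_outcomes and attempt_outcomes[0]:
        return "succeeded"   # first attempt worked, no fallback needed
    if any(attempt_outcomes[1:]):
        return "recovered"   # primary failed, a fallback succeeded
    return "failed"          # every attempt failed


def route_health(requests):
    """Return count and percentage per status for one route."""
    counts = Counter(classify_request(outcomes) for outcomes in requests)
    total = len(requests)
    return {
        status: {"count": counts[status], "percent": 100 * counts[status] / total}
        for status in ("succeeded", "recovered", "failed")
    }


# Hypothetical requests for a single route.
health = route_health([[True], [False, True], [False, False], [True]])
print(health)
```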

#### Fallback Paths Dialog

The Fallback Paths dialog shows how requests moved through the model chain for a given route, aggregated into distinct path sequences.

The **summary line** at the top shows the number of requests recovered by fallback, the number that ultimately failed, and the total wasted cost.

**Fallback paths table** — Each row represents a unique sequence of model attempts:

| Column      | Description                                                                                                                                                                                                                    |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Path        | The sequence of models attempted, shown as chips connected by arrows (→). Intermediate models are outlined; the final model is filled green if the path succeeded for any request, or red if all requests on that path failed. |
| Requests    | Number of requests that followed this exact path.                                                                                                                                                                              |
| Outcome     | Count of succeeded requests (green checkmark) and failed requests (red error icon) for this path.                                                                                                                              |
| Wasted Cost | Cost incurred on attempts that did not produce a successful response, shown in red.                                                                                                                                            |
| Total Cost  | Total cost of all attempts across all requests on this path.                                                                                                                                                                   |

Rows where the primary model failed with no fallback attempted appear as a separate entry with the primary model chip in red and the label *(no fallback attempted)*.
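
The aggregation behind the fallback paths table can be approximated by grouping requests on their exact sequence of attempted models. In this sketch the model names are placeholders, costs are integers in an arbitrary unit to avoid rounding noise, and the data layout is an assumption.

```python
from collections import defaultdict

# Each request is an ordered list of (model, succeeded, cost) attempts.
requests = [
    [("model-a", False, 2), ("model-b", True, 1)],
    [("model-a", False, 2), ("model-b", True, 1)],
    [("model-a", False, 2), ("model-b", False, 1)],
    [("model-a", False, 2)],  # primary failed, no fallback attempted
]


def fallback_paths(reqs):
    """Aggregate requests into one row per distinct model path."""
    rows = defaultdict(lambda: {"requests": 0, "succeeded": 0, "failed": 0,
                                "wasted_cost": 0, "total_cost": 0})
    for attempts in reqs:
        path = tuple(model for model, _, _ in attempts)
        row = rows[path]
        row["requests"] += 1
        outcome = "succeeded" if any(ok for _, ok, _ in attempts) else "failed"
        row[outcome] += 1
        # Wasted cost: cost of every attempt that did not succeed.
        row["wasted_cost"] += sum(cost for _, ok, cost in attempts if not ok)
        row["total_cost"] += sum(cost for _, _, cost in attempts)
    return dict(rows)


paths = fallback_paths(requests)
```

Note that a recovered request still contributes wasted cost, because its failed primary attempt was paid for.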

***

### Error Detail Dialog

The Error Detail dialog opens from the drill-down icon on any breakdown table row that has errors (except when grouped by Provider / Model). It provides a detailed view of errors for that grouping within the current time range and filters.

The **summary bar** at the top shows three stats for the selected group: total **Errors**, **Error Rate**, and **Est. Cost**.

#### Failure Breakdown Table

Shows which provider/model combinations generated errors:

| Column       | Description                                                                                                                              |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Provider     | LLM provider name                                                                                                                        |
| Model        | Model identifier                                                                                                                         |
| Calls        | Total attempts made to this model                                                                                                        |
| Failed       | Number of failed attempts (shown in red)                                                                                                 |
| Failure Rate | Percentage of calls to this model that failed (shown in red)                                                                             |
| Wasted Cost  | Cost incurred on failed attempts                                                                                                         |
| Top Errors   | Up to three error message chips showing the most common error messages for this model. Messages longer than 60 characters are truncated. |
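
The Top Errors column can be thought of as a frequency count followed by truncation. The sketch below uses the limits stated in the table (three chips, 60 characters); the `...` truncation marker and the function name are assumptions.

```python
from collections import Counter

MAX_CHIPS = 3         # at most three error chips per model
MAX_CHIP_LENGTH = 60  # longer messages are truncated


def top_error_chips(messages):
    """Return the most common error messages, truncated for display."""
    chips = []
    for message, _count in Counter(messages).most_common(MAX_CHIPS):
        if len(message) > MAX_CHIP_LENGTH:
            message = message[:MAX_CHIP_LENGTH] + "..."
        chips.append(message)
    return chips


# Hypothetical error messages for one model.
errors = ["rate limit exceeded"] * 3 + ["timeout"] * 2 + ["x" * 80]
chips = top_error_chips(errors)
```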

#### Recent Fallback Chains

Below the table, the 10 most recent fallback chains are shown in full detail. Each chain represents a single end-user request and the sequence of model attempts made to fulfill it.

**Chain header:** Request ID (first 12 characters), attempt count, total duration, wasted cost (if any), and an **exhausted** badge if all attempts failed.

**Each attempt shows:**

* Attempt number
* Success or failure icon
* Provider and model (e.g. `openai/gpt-4o`)
* Finish reason as a colored chip: red for errors, purple for `tool-calls`, green for normal completions
* Error message, if the attempt failed
* Duration and cost for that attempt
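
The chain header fields can be derived from the attempts in the chain. This is a sketch with a hypothetical attempt structure; in particular, treating total duration as the sum of attempt durations is an assumption, and costs are integers in an arbitrary unit.

```python
def chain_header(request_id, attempts):
    """Summarize one fallback chain from its ordered attempts."""
    return {
        "request_id": request_id[:12],  # only the first 12 characters are shown
        "attempts": len(attempts),
        "total_duration_ms": sum(a["duration_ms"] for a in attempts),
        "wasted_cost": sum(a["cost"] for a in attempts if not a["succeeded"]),
        "exhausted": not any(a["succeeded"] for a in attempts),
    }


# Hypothetical chain: the primary model fails, the fallback succeeds.
header = chain_header("req_0123456789abcdef", [
    {"succeeded": False, "duration_ms": 1200, "cost": 3},
    {"succeeded": True, "duration_ms": 800, "cost": 2},
])
```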

