How it works
Cerebrium automatically pushes metrics from your applications to your monitoring platform every 60 seconds using the OpenTelemetry Protocol (OTLP). You provide an OTLP endpoint and authentication credentials through the Cerebrium dashboard, and Cerebrium handles the rest: collecting resource usage and execution data, formatting it as OpenTelemetry metrics, and delivering it to your platform.

- Metrics are pushed every 60 seconds
- Failed pushes are retried 3 times with exponential backoff
- If pushes fail 10 consecutive times, export is automatically paused to avoid noise (you can re-enable at any time from the dashboard)
- Your credentials are stored encrypted and are never returned in API responses
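
The retry and pause behavior described above can be sketched as follows. This is an illustration only, not Cerebrium's implementation: the base delay of 1 second, the doubling schedule, the choice to count a push as failed only after its retries are exhausted, and the names `push_with_retries` and `Exporter` are all assumptions made for the example.

```python
import time
from typing import Callable

MAX_RETRIES = 3                  # from the text: failed pushes are retried 3 times
PAUSE_AFTER_FAILED_PUSHES = 10   # from the text: 10 consecutive failures pause export
BASE_DELAY_SECONDS = 1.0         # assumption: the real backoff schedule is not documented


def push_with_retries(send: Callable[[], bool],
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Attempt one push; on failure retry up to MAX_RETRIES times, doubling the delay."""
    for attempt in range(MAX_RETRIES + 1):
        if send():
            return True
        if attempt < MAX_RETRIES:
            sleep(BASE_DELAY_SECONDS * 2 ** attempt)  # 1s, 2s, 4s under the assumed schedule
    return False


class Exporter:
    """Hypothetical exporter that pauses after too many consecutive failed pushes."""

    def __init__(self, send: Callable[[], bool],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.send = send
        self.sleep = sleep
        self.consecutive_failures = 0
        self.paused = False

    def tick(self) -> None:
        """Run once per export interval (every 60 seconds in the text)."""
        if self.paused:
            return  # export stays paused until it is re-enabled
        if push_with_retries(self.send, self.sleep):
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= PAUSE_AFTER_FAILED_PUSHES:
                self.paused = True
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays, for example by supplying a function that only records the requested wait times.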
Supported destinations
- Grafana Cloud — Primary supported destination
- Datadog — Via OTLP endpoint
- Prometheus — Self-hosted with OTLP receiver enabled
- Custom — Any OTLP-compatible endpoint (New Relic, Honeycomb, etc.)
What metrics are exported?
Resource Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| `cerebrium_cpu_utilization_cores` | Gauge | cores | CPU cores actively in use per app |
| `cerebrium_memory_usage_bytes` | Gauge | bytes | Memory actively in use per app |
| `cerebrium_gpu_memory_usage_bytes` | Gauge | bytes | GPU VRAM in use per app |
| `cerebrium_gpu_compute_utilization_percent` | Gauge | percent | GPU compute utilization (0-100) per app |
| `cerebrium_containers_running_count` | Gauge | count | Number of running containers per app |
| `cerebrium_containers_ready_count` | Gauge | count | Number of ready containers per app |
Execution Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| `cerebrium_run_execution_time_ms` | Histogram | ms | Time spent executing user code |
| `cerebrium_run_queue_time_ms` | Histogram | ms | Time spent waiting in queue |
| `cerebrium_run_coldstart_time_ms` | Histogram | ms | Time for container cold start |
| `cerebrium_run_response_time_ms` | Histogram | ms | Total end-to-end response time |
| `cerebrium_run_total_total` | Counter | - | Total run count |
| `cerebrium_run_successes_total` | Counter | - | Successful run count |
| `cerebrium_run_errors_total` | Counter | - | Failed run count |
Labels
Every metric includes the following labels for filtering and grouping:

| Label | Description | Example |
|---|---|---|
| `project_id` | Your Cerebrium project ID | `p-abc12345` |
| `app_id` | Full application identifier | `p-abc12345-my-model` |
| `app_name` | Human-readable app name | `my-model` |
| `region` | Deployment region | `us-east-1` |
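
As an illustration of filtering by these labels, the sketch below builds a PromQL-style selector string from a metric name and label values. It assumes your monitoring platform exposes the metric and label names unchanged, which depends on how that platform translates OTLP metrics; the helper name `selector` is hypothetical.

```python
def _escape(value: str) -> str:
    """Escape backslashes and double quotes inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def selector(metric: str, **labels: str) -> str:
    """Build a selector of the form metric{label="value",...}."""
    if not labels:
        return metric
    pairs = ",".join(
        '{}="{}"'.format(name, _escape(value))
        for name, value in sorted(labels.items())
    )
    return metric + "{" + pairs + "}"


# Example values taken from the label table above.
query = selector("cerebrium_run_errors_total", app_name="my-model", region="us-east-1")
print(query)
```

The resulting string can be pasted into a query editor that accepts PromQL, such as the Explore view of a Prometheus data source.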
Setup Guide
Step 1: Get your platform credentials
Before heading to the Cerebrium dashboard, you’ll need an OTLP endpoint and authentication credentials from your monitoring platform.

Grafana Cloud
- Sign in to Grafana Cloud
- Go to your stack → Connections → Add new connection
- Search for “OpenTelemetry” and click Configure
- Copy the OTLP endpoint, which will match your stack’s region:
  - US: https://otlp-gateway-prod-us-east-0.grafana.net/otlp
  - EU: https://otlp-gateway-prod-eu-west-0.grafana.net/otlp
  - Other regions will show their specific URL on the configuration page
- On the same page, generate an API token with the MetricsPublisher role
- The page will show you an Instance ID and the generated token. In your terminal, Base64-encode the string INSTANCE_ID:TOKEN to create the Basic auth string.
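
For reference, a Basic auth string is the Base64 encoding of `INSTANCE_ID:TOKEN`. A minimal Python sketch of that encoding is shown below; the function name `basic_auth_string` and the placeholder values are illustrative only, so substitute the Instance ID and token from your own Grafana Cloud page.

```python
import base64


def basic_auth_string(instance_id: str, token: str) -> str:
    """Return the Base64 encoding of 'instance_id:token'."""
    raw = f"{instance_id}:{token}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Placeholder values: replace with your real Instance ID and API token.
encoded = basic_auth_string("YOUR_INSTANCE_ID", "YOUR_API_TOKEN")
print(f"Basic {encoded}")
```

The printed line has the same shape as the `Authorization` header value entered in Step 2.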
Step 2: Configure in the Cerebrium dashboard
- In the Cerebrium dashboard, go to your project → Integrations → Metrics Export
- Paste your OTLP endpoint from Step 1
- Add your authentication headers:
  Grafana Cloud:
  - Header name: `Authorization`
  - Header value: `Basic YOUR_BASE64_STRING` (the output from the terminal command in Step 1)
- Click Save & Enable
Step 3: Verify the connection
Click Test Connection in the dashboard to verify Cerebrium can reach your monitoring platform. You’ll see a success or failure message with details. If the test fails, double-check your endpoint URL and credentials from Step 1.
Viewing Metrics
Once connected, metrics will appear in your monitoring platform within a minute.

Grafana Cloud
- Go to your Grafana Cloud dashboard → Explore
- Select your Prometheus data source — it will be named something like grafanacloud-yourstack-prom (you can find it under Connections → Data sources if you’re unsure)
- Search for metrics starting with `cerebrium_`
Managing Metrics Export
You can manage your metrics export configuration from the dashboard at any time by going to Integrations → Metrics Export.

- Disable export: Toggle the switch off. Your configuration is preserved, so you can re-enable at any time without reconfiguring.
- Update credentials: Enter new authentication headers and click Save Changes. Useful when rotating API keys.
- Change endpoint: Update the OTLP endpoint field and click Save Changes.
- Check status: The dashboard shows whether export is connected, the time of the last successful export, and any error messages.
API Reference
You can also manage metrics export programmatically. Find your Cerebrium API key in the dashboard under Settings → API Keys.
| Method | Endpoint | Description |
|---|---|---|
| GET | `/v2/metrics-export/{project_id}/config` | Get current export configuration |
| PUT | `/v2/metrics-export/{project_id}/config` | Update export configuration |
| POST | `/v2/metrics-export/{project_id}/test` | Test connection to your monitoring platform |

Typical operations are enabling export with an endpoint and credentials, testing the connection, and disabling export.

The `authHeaders` field is a map of header name → header value. These are stored encrypted and never returned in API responses.
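
The sketch below shows how requests to these endpoints might be constructed in Python. Only the paths, HTTP methods, and the `authHeaders` field name come from this page. The base URL (`API_BASE`), the bearer-token `Authorization` scheme, and the `enabled` and `endpoint` body fields are assumptions made for illustration, so check them against the actual API before use. The code only builds request objects and does not send anything.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.example.invalid"  # hypothetical: replace with the real API base URL


def build_request(method: str, path: str, api_key: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(API_BASE + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")  # assumption: bearer auth
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def update_config_request(project_id: str, api_key: str,
                          endpoint: str, auth_headers: dict) -> urllib.request.Request:
    """PUT the export configuration; 'enabled' and 'endpoint' are hypothetical field names."""
    body = {"enabled": True, "endpoint": endpoint, "authHeaders": auth_headers}
    return build_request("PUT", f"/v2/metrics-export/{project_id}/config", api_key, body)


def connection_test_request(project_id: str, api_key: str) -> urllib.request.Request:
    """POST to the test endpoint to check connectivity to the monitoring platform."""
    return build_request("POST", f"/v2/metrics-export/{project_id}/test", api_key)


req = update_config_request(
    "p-abc12345",
    "YOUR_CEREBRIUM_API_KEY",
    "https://otlp-gateway-prod-us-east-0.grafana.net/otlp",
    {"Authorization": "Basic YOUR_BASE64_STRING"},
)
print(req.get_method(), req.full_url)
```

Once the assumptions are replaced with real values, a request built this way could be sent with `urllib.request.urlopen(req)`.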