Metrics

Auth-O-Tron exposes Prometheus-compatible metrics on a dedicated port for monitoring and observability, following the ECMWF Codex Observability guidelines.

Endpoint

Metrics are served at:

GET /metrics

By default, this endpoint is available on port 9090. Responses use the Prometheus text exposition format.
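To inspect the endpoint without a full Prometheus setup, the text exposition format can be parsed by hand. The following is a minimal sketch, not part of Auth-O-Tron: the `parse_metrics` and `fetch_metrics` helpers and the `http://localhost:9090` base URL are assumptions for illustration, and escaped characters inside label values are left as they appear on the wire.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, then the value.
# A trailing timestamp, if present, is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(base_url="http://localhost:9090"):
    """Fetch and parse /metrics. The base URL is an assumed local deployment."""
    with urlopen(f"{base_url}/metrics", timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running instance returns one tuple per sample, which makes it easy to filter by metric name or by a label such as `realm`.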

Metric Families

| Metric Name | Type | Labels | Description |
|---|---|---|---|
| auth_requests_total | Counter | result, realm | Total authentication requests |
| auth_duration_seconds | Histogram | result, realm | Authentication latency distribution |
| auth_provider_attempts_total | Counter | provider_name, provider_type, realm, result | Attempts per authentication provider |
| auth_provider_duration_seconds | Histogram | provider_name, provider_type, realm | Provider-specific latency |
| augmenter_attempts_total | Counter | augmenter_name, augmenter_type, realm, result | Token augmentation attempts |
| augmenter_duration_seconds | Histogram | augmenter_type, realm | Augmentation latency |

Label Values

result (auth_requests_total): success, no_auth_header, invalid_header, all_failed

result (auth_provider_attempts_total): success, error, timeout

result (augmenter_attempts_total): success, error

realm: The configured authentication realm name, or unknown when the request fails before realm resolution

provider_name: Identifier for the authentication provider

provider_type: Type of provider (plain, jwt, ecmwf-api, efas-api, openid-offline, ecmwf-token-generator)

augmenter_name: Identifier for the token augmenter

augmenter_type: Type of augmenter

PromQL Examples

Request Rate

rate(auth_requests_total[5m])
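As a rough illustration of what `rate()` computes over a counter such as `auth_requests_total`, the sketch below derives a per-second rate from raw samples and treats any decrease as a counter reset. It is a simplification: Prometheus additionally extrapolates to the edges of the range window, which this sketch omits. The `simple_rate` name and the input shape are assumptions for illustration.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples or no elapsed time is available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # A drop means the counter restarted from zero (e.g. process restart),
            # so the whole current value counts as new increase.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

For example, three scrapes 60 seconds apart with values 10, 40 and 70 give an increase of 60 over 120 seconds, so `simple_rate` returns 0.5 requests per second.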

Authentication Latency (99th Percentile)

histogram_quantile(0.99, rate(auth_duration_seconds_bucket[5m]))
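The value returned by `histogram_quantile` is an estimate interpolated from cumulative bucket counts, so its precision depends on the bucket boundaries. The sketch below mirrors the interpolation for classic histograms under simplifying assumptions: it requires `0 < q <= 1`, and the bucket boundaries used in the usage note are illustrative, not Auth-O-Tron's configured buckets.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs, including a
    final bucket whose upper bound is math.inf (the le="+Inf" bucket).
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total  # position of the requested observation
    prev_bound, prev_count = 0.0, 0.0
    for index, (bound, count) in enumerate(buckets):
        if count >= rank:
            if bound == math.inf:
                # The quantile lies beyond the highest finite bucket.
                return buckets[-2][0]
            if index == 0 and bound <= 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

With cumulative counts of 50, 90, 100 and 100 for upper bounds 0.1, 0.5, 1.0 and `+Inf`, the 95th percentile falls in the 0.5 to 1.0 bucket and is interpolated to 0.75 seconds. Any observation above the highest finite bound is reported as that bound, so a 99th percentile that sits exactly at the top bucket boundary may be hiding slower requests.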

Error Rate by Realm

sum by (realm) (rate(auth_requests_total{result!="success"}[5m]))

Slow Providers (95th Percentile)

histogram_quantile(0.95, rate(auth_provider_duration_seconds_bucket[5m]))

Provider Success Rate

rate(auth_provider_attempts_total{result="success"}[5m])

Alerting Recommendations

Consider alerting on:

  • High error rates: rate(auth_requests_total{result!="success"}[5m]) > 0.1
  • Elevated latency: histogram_quantile(0.95, rate(auth_duration_seconds_bucket[5m])) > 1.0
  • Provider failures: rate(auth_provider_attempts_total{result=~"error|timeout"}[5m]) > 0
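The three expressions above can be tried out ad hoc through the Prometheus HTTP API (`GET /api/v1/query`) before they are turned into alerting rules. The following is a sketch under stated assumptions: the alert names, the `firing_series` and `check_alerts` helpers, and the Prometheus server URL passed in by the caller are all hypothetical, and a production deployment would more commonly define these expressions as Prometheus alerting rules.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Alert names are hypothetical; the expressions are the ones recommended above.
ALERT_EXPRESSIONS = {
    "high_error_rate": 'rate(auth_requests_total{result!="success"}[5m]) > 0.1',
    "elevated_latency": (
        "histogram_quantile(0.95, "
        "rate(auth_duration_seconds_bucket[5m])) > 1.0"
    ),
    "provider_failures": (
        'rate(auth_provider_attempts_total{result=~"error|timeout"}[5m]) > 0'
    ),
}


def firing_series(response_body):
    """Return (labels, value) pairs from an instant-query JSON response.

    A comparison such as `> 0.1` filters the result, so every series
    returned is one that currently exceeds the threshold.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


def check_alerts(prometheus_url):
    """Evaluate each expression and return only the alerts with firing series."""
    firing = {}
    for name, expression in ALERT_EXPRESSIONS.items():
        url = f"{prometheus_url}/api/v1/query?{urlencode({'query': expression})}"
        with urlopen(url, timeout=10) as response:
            series = firing_series(response.read().decode("utf-8"))
        if series:
            firing[name] = series
    return firing
```

An empty dictionary from `check_alerts` means none of the thresholds is currently exceeded. The thresholds themselves are starting points and should be tuned to the traffic of each deployment.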