Resiliency metrics

Adds metrics to track the resiliency verticals: retries, circuit breaking (CB), timeouts, and rate limiting.

  1. Timeout

| Metric name | Description |
| --- | --- |
| istio_requests_total | Request counter. Timed-out requests are identified by filtering on the response code (typically 504 for an upstream request timeout). |

  2. Retries

| Metric name | Description |
| --- | --- |
| upstream_rq_retry | Total request retries |
| upstream_rq_retry_success | Total request retry successes |
| upstream_rq_retry_overflow | Total requests not retried due to circuit breaking or exceeding the retry budget |

  3. Circuit Breaker

| Metric name | Description |
| --- | --- |
| ejections_detected_consecutive_5xx | Number of detected consecutive 5xx ejections (even if unenforced) |
| ejections_enforced_total | Number of enforced ejections due to any outlier type |

  4. Rate limit

| Metric name | Description |
| --- | --- |
| istio_requests_total | Request counter filtered by HTTP response code (e.g., 201, 302); in our case 429, which marks rate-limited requests. |
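
Because the timeout and rate limit entries both rely on filtering `istio_requests_total` by response code, a short sketch of how such a filter could be expressed as a PromQL query may help. This is illustrative only and not taken from the MR: the helper name `rate_query`, the 5m window, and the choice of 504 as the timeout code are assumptions, and it presumes the metric carries Istio's standard `response_code` label.

```python
# Minimal sketch: build PromQL rate queries for the response-code-based metrics.
# `rate_query` is a hypothetical helper, not part of the MR or of any library.

def rate_query(metric: str, labels: dict[str, str], window: str = "5m") -> str:
    """Return a PromQL expression summing the per-second rate of `metric`
    over `window`, restricted to series matching `labels`."""
    # Sort labels so the generated query text is deterministic.
    selector = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"sum(rate({metric}{{{selector}}}[{window}]))"


# Assumption: timeouts surface as HTTP 504 on istio_requests_total.
TIMEOUT_QUERY = rate_query("istio_requests_total", {"response_code": "504"})

# Rate-limited requests are the ones answered with HTTP 429.
RATE_LIMIT_QUERY = rate_query("istio_requests_total", {"response_code": "429"})

if __name__ == "__main__":
    print(TIMEOUT_QUERY)
    print(RATE_LIMIT_QUERY)
```

The same helper could be pointed at the retry and circuit breaker counters, but the exact exported names of those Envoy stats depend on how the proxy and Telegraf are configured, so they are left out here.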

Reference (Envoy cluster statistics): https://www.envoyproxy.io/docs/envoy/latest/configuration/upstream/cluster_manager/cluster_stats.html?highlight=circuit_breakers

MR for the rate limit and timeout metrics: https://dev.azure.com/OpenEnergyPlatform/Open%20Energy%20Platform/_git/oep-helm-charts?path=/telegraf/values.yaml&version=GBresiliencymetrics&line=126&lineEnd=127&lineStartColumn=6&lineEndColumn=73&lineStyle=plain&_a=contents

Edited by SHEFFALI JAIN