Adding resiliency metrics
Adds metrics to track the resiliency verticals: retries, circuit breaking (CB), timeouts, and rate limiting.
- Timeout
Metrics Name | Description |
---|---|
istio_requests_total | Total requests, filtered by the `response_code` label to count requests that timed out waiting for a response (Envoy returns 504 on a request timeout). |
- Retries
Metrics Name | Description |
---|---|
upstream_rq_retry | Total request retries |
upstream_rq_retry_success | Total request retry successes |
upstream_rq_retry_overflow | Total requests not retried due to circuit breaking or exceeding the retry budget |
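
The three retry counters are easiest to read as a ratio. Below is a minimal sketch, assuming the counters are read from the plain-text output of the Envoy admin `/stats` endpoint, where each line has the form `cluster.<cluster name>.<stat>: <value>`. The sample text, cluster names, and the helper names `sum_retry_stats` and `retry_success_ratio` are hypothetical and only illustrate the idea.

```python
# Sketch: aggregate Envoy retry counters across clusters and derive a success ratio.
# The sample input and function names are illustrative assumptions.

RETRY_STATS = (
    "upstream_rq_retry",
    "upstream_rq_retry_success",
    "upstream_rq_retry_overflow",
)

SAMPLE_STATS = """\
cluster.outbound|8080||svc-a.default.svc.cluster.local.upstream_rq_retry: 8
cluster.outbound|8080||svc-a.default.svc.cluster.local.upstream_rq_retry_success: 6
cluster.outbound|8080||svc-a.default.svc.cluster.local.upstream_rq_retry_overflow: 1
cluster.outbound|8080||svc-b.default.svc.cluster.local.upstream_rq_retry: 2
cluster.outbound|8080||svc-b.default.svc.cluster.local.upstream_rq_retry_success: 0
cluster.outbound|8080||svc-b.default.svc.cluster.local.upstream_rq_total: 40
"""


def sum_retry_stats(stats_text):
    """Sum each retry counter over every cluster found in the stats text."""
    totals = {name: 0 for name in RETRY_STATS}
    for line in stats_text.splitlines():
        name, sep, value = line.rpartition(": ")
        if not sep:
            continue  # not a "name: value" line
        # Cluster names contain dots, so only the last segment is the stat name.
        stat = name.rsplit(".", 1)[-1]
        if stat in totals:
            totals[stat] += int(value)
    return totals


def retry_success_ratio(totals):
    """Return successful retries / total retries, or None if nothing was retried."""
    retries = totals["upstream_rq_retry"]
    if retries == 0:
        return None
    return totals["upstream_rq_retry_success"] / retries


if __name__ == "__main__":
    totals = sum_retry_stats(SAMPLE_STATS)
    print(totals, retry_success_ratio(totals))
```

A rising `upstream_rq_retry_overflow` alongside a falling ratio would suggest that retries are being cut off by circuit breaking or the retry budget rather than succeeding.
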
- Circuit Breaker
Metrics Name | Description |
---|---|
ejections_detected_consecutive_5xx | Number of detected consecutive 5xx ejections (even if unenforced) |
ejections_enforced_total | Number of enforced ejections due to any outlier type |
- Rate limit
Metrics Name | Description |
---|---|
istio_requests_total | Total requests, filtered by the `response_code` label. Any HTTP response code (e.g., 201, 302) can be selected; in our case, 429 for rate-limited requests. |
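
Both the timeout and the rate limit metrics are the same counter, `istio_requests_total`, split by its `response_code` label. As a minimal sketch of that filtering, assuming the metrics are available in Prometheus text exposition format, the following groups the counter by response code. The sample lines, service names, and the helper name `requests_by_response_code` are hypothetical.

```python
# Sketch: group istio_requests_total samples by the response_code label.
# The sample input and function name are illustrative assumptions.
import re
from collections import Counter

# Matches: istio_requests_total{<labels>} <value>
SAMPLE_LINE = re.compile(r'^istio_requests_total\{(?P<labels>.*)\}\s+(?P<value>\S+)')
RESPONSE_CODE = re.compile(r'response_code="(\d+)"')

SAMPLE_METRICS = """\
# TYPE istio_requests_total counter
istio_requests_total{destination_service="svc-a.default.svc.cluster.local",response_code="200"} 120
istio_requests_total{destination_service="svc-a.default.svc.cluster.local",response_code="504"} 3
istio_requests_total{destination_service="svc-a.default.svc.cluster.local",response_code="429"} 7
istio_requests_total{destination_service="svc-b.default.svc.cluster.local",response_code="429"} 2
"""


def requests_by_response_code(metrics_text):
    """Return a Counter mapping response code (as a string) to the summed request count."""
    counts = Counter()
    for line in metrics_text.splitlines():
        match = SAMPLE_LINE.match(line)
        if not match:
            continue  # comment lines and other metrics are skipped
        code = RESPONSE_CODE.search(match.group("labels"))
        if code:
            counts[code.group(1)] += float(match.group("value"))
    return counts


if __name__ == "__main__":
    counts = requests_by_response_code(SAMPLE_METRICS)
    print("timeouts (504):", counts["504"])
    print("rate limited (429):", counts["429"])
```

Because a `Counter` returns 0 for missing keys, a response code that never occurred reads as zero instead of raising an error.
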
Reference link: https://www.envoyproxy.io/docs/envoy/latest/configuration/upstream/cluster_manager/cluster_stats.html?highlight=circuit_breakers
MR link for rate limit and timeout: https://dev.azure.com/OpenEnergyPlatform/Open%20Energy%20Platform/_git/oep-helm-charts?path=/telegraf/values.yaml&version=GBresiliencymetrics&line=126&lineEnd=127&lineStartColumn=6&lineEndColumn=73&lineStyle=plain&_a=contents