Retry amplification¶
TL;DR.
Pulse-Retry-Depthbaggage + a metric that fires before a three-deep retry chain becomes a 27× incident on the leaf service.
A three-deep retry chain with three retries at each level produces twenty-seven attempts on the leaf service. By the time you see the cascade in your dashboard, it's already in flight.
Pulse counts retry depth as it crosses every hop, so you can alert on a storm forming before it becomes an incident.
What you get¶
Any non-zero result is a request chain that's already retrying deeper than
your configured threshold — usually the leading symptom of a cascading
failure. The shipped alert (PulseRetryAmplification) fires here.
Turn it on¶
Nothing. On by default with a depth threshold of 3.
To raise or lower the threshold:
What it adds¶
| Where | Key / metric | Meaning |
|---|---|---|
| HTTP / Kafka header | Pulse-Retry-Depth |
Current retry depth in the chain |
| OTel baggage | pulse.retry.depth |
Same value, propagated to downstreams |
| Metric | pulse.retry.amplification (tag endpoint) |
Calls that exceeded the threshold |
| Span event | pulse.retry.amplification |
Marks the span where the threshold tripped |
Combined with timeout-budget propagation, these are the two leading indicators of a cascade in flight.
When to skip it¶
If your platform already enforces a retry-budget mechanism (Envoy retry
budgets, gRPC retry-policy with max_attempts):
Source: io.github.arun0009.pulse.resilience ·
Status: Stable since 1.0.0