Timeout-budget propagation
TL;DR. The remaining deadline travels with the request across RestTemplate, RestClient, WebClient, OkHttp, Apache HttpClient 5, and Kafka. Doomed downstream calls fail fast instead of holding connections through the next retry storm.
The platform default timeout — 30 seconds, set once and forgotten — is what every downstream service uses. The original caller may have already given up after 2 seconds. The chain doesn't know, holds connections open, and feeds the retry storm that takes the cluster down.
Pulse propagates the deadline, not the timeout. Each hop sees the real remaining budget, fails fast when there's no time left, and never makes a doomed call against a dying downstream.
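The difference between a timeout and a deadline can be made concrete with a small sketch. It is illustrative only and does not use Pulse's own classes: `DeadlineVsTimeout` and its methods are hypothetical names. A fixed timeout is the same at every hop, while a deadline yields only the time that is actually left.

```java
import java.time.Duration;
import java.time.Instant;

/** Contrast between a fixed per-hop timeout and a propagated deadline (illustrative only). */
final class DeadlineVsTimeout {

    /** A fixed timeout ignores how long the request has already been in flight. */
    static Duration fixedTimeout(Duration platformDefault) {
        return platformDefault;
    }

    /** A deadline yields whatever time is actually left, clamped at zero. */
    static Duration remaining(Instant deadline, Instant now) {
        Duration left = Duration.between(now, deadline);
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        Instant deadline = start.plusMillis(2000); // the caller gives up after 2 seconds
        Instant lateHop = start.plusMillis(2500);  // a hop that starts 2.5 seconds in

        // The fixed timeout would still allow a 30-second wait.
        System.out.println(fixedTimeout(Duration.ofSeconds(30)).toMillis()); // prints 30000
        // The deadline shows there is nothing left, so the call can fail fast.
        System.out.println(remaining(deadline, lateHop).toMillis());         // prints 0
    }
}
```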
What you get
When the chain is healthy, the deadline shrinks at each hop:
Caller ──POST /orders Pulse-Timeout-Ms: 2000──▶ Edge
Edge ──GET /stock Pulse-Timeout-Ms: 1850──▶ Inventory (300ms elapsed)
Edge ──POST /charge Pulse-Timeout-Ms: 1500──▶ Payment (350ms elapsed)
When something is slow and the budget runs out, Pulse aborts the next call before it goes out, and a single Prometheus query on the pulse.timeout_budget.exhausted counter lights up.
This is the leading indicator of a cascading failure. With the shipped
PulseTimeoutBudgetExhausted alert, you see it minutes before the user does.
Turn it on
Nothing. It's on by default with a 2-second budget per request.
To set a different default, or to use a different header name that matches an existing convention:
pulse:
  timeout-budget:
    default-budget: 5s              # used when no inbound header is present
    inbound-header: X-Deadline-Ms   # match your gateway's convention
To read the remaining budget from your own code:
TimeoutBudget.current().ifPresent(budget -> {
    if (budget.remaining().toMillis() < 500) {
        // skip the optional enrichment call: not enough time
    }
});
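Another common use of the remaining budget is to cap a client's own timeout. The following is a minimal sketch, not part of Pulse: `EffectiveTimeout` is a hypothetical helper that takes the remaining budget as an `Optional<Duration>` and returns the smaller of that value and the client's configured timeout.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical helper: never wait longer than the request has left. */
final class EffectiveTimeout {

    /**
     * Returns the client's own timeout when no budget is present,
     * otherwise the smaller of the two, clamped at zero.
     */
    static Duration of(Optional<Duration> remainingBudget, Duration clientTimeout) {
        if (remainingBudget.isEmpty()) {
            return clientTimeout; // no budget established for this request
        }
        Duration remaining = remainingBudget.get();
        if (remaining.isNegative()) {
            return Duration.ZERO; // deadline already passed
        }
        return remaining.compareTo(clientTimeout) < 0 ? remaining : clientTimeout;
    }
}
```

Assuming `remaining()` returns a `java.time.Duration`, as the snippet above suggests, a call site might look like `EffectiveTimeout.of(TimeoutBudget.current().map(b -> b.remaining()), Duration.ofSeconds(30))`.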
What it adds
| Where | Key | Value |
|---|---|---|
| HTTP header (in / out) | Pulse-Timeout-Ms | Milliseconds remaining on the deadline |
| OTel baggage | pulse.timeout.deadline_ms | Absolute epoch-millis deadline |
| MDC (logs) | timeout_remaining_ms | Snapshot at log time |
| Metric | pulse.timeout_budget.exhausted (tag: transport) | Aborted outbound calls |
The metric is tagged by transport (resttemplate, restclient, webclient,
okhttp, apache-hc5, kafka), so you can see which client gave up.
When to skip it
Disable it when your platform already enforces a request budget end-to-end (Envoy timeouts, Istio request timeouts, gRPC deadlines you trust) and you don't want a parallel mechanism.
If you run an API gateway in front, configure it to set Pulse-Timeout-Ms
based on the gateway's own request timeout. Otherwise the first hop uses the
2-second default and only later hops see the propagated value.
Conditional gating
To skip the budget filter for some requests (synthetic probes, internal
admin traffic) without disabling the feature, use the shared
enabled-when block:
pulse:
  timeout-budget:
    enabled-when:
      header-not-equals:
        x-pulse-synthetic: "true"
      path-excludes:
        - /actuator
When the matcher rejects, no budget is established on baggage and
downstream calls see TimeoutBudget.current() == Optional.empty() —
your code already handles that.
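To make the gating semantics easier to picture, here is a rough sketch of how such a matcher could be evaluated. It is not Pulse's implementation: `EnabledWhenSketch` is a hypothetical class, and it assumes that all rules must pass, that `path-excludes` matches by prefix, and that header names arrive lower-cased.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical evaluation of an enabled-when block (assumed semantics, see lead-in). */
final class EnabledWhenSketch {
    private final Map<String, String> headerNotEquals;
    private final List<String> pathExcludes;

    EnabledWhenSketch(Map<String, String> headerNotEquals, List<String> pathExcludes) {
        this.headerNotEquals = Map.copyOf(headerNotEquals);
        this.pathExcludes = List.copyOf(pathExcludes);
    }

    /** Returns true when the budget should be established for this request. */
    boolean matches(String path, Map<String, String> lowerCasedHeaders) {
        // header-not-equals: reject when a listed header carries the listed value
        for (Map.Entry<String, String> rule : headerNotEquals.entrySet()) {
            String actual = lowerCasedHeaders.get(rule.getKey().toLowerCase(Locale.ROOT));
            if (rule.getValue().equals(actual)) {
                return false;
            }
        }
        // path-excludes: reject when the path starts with an excluded prefix (assumption)
        for (String prefix : pathExcludes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

With the configuration above, a request to /actuator/health or one carrying x-pulse-synthetic: true would be rejected by this sketch, while an ordinary request to /orders would get a budget.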
Under the hood
Three pieces work together:
- A filter on the way in reads the deadline header (or applies the default), places it on a thread-local and on OTel baggage, and clears it in a finally.
- An interceptor on the way out, wired into every supported HTTP and Kafka client, computes remaining − safety-margin and writes the outbound header.
- If the remaining budget is below minimum-budget before the call fires, Pulse aborts the call, increments the exhausted counter, and emits a WARN log.
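The three pieces above can be sketched as plain arithmetic. This is a simplified illustration under stated assumptions, not the code in the linked source files: `BudgetFlowSketch` is a hypothetical class, the 50 ms safety margin and 100 ms minimum budget are made-up values, the fallback on a malformed header is assumed, and the thread-local, baggage, metric, and logging steps are left out.

```java
import java.util.OptionalLong;

/** Illustrative arithmetic for the inbound filter and outbound interceptor. */
final class BudgetFlowSketch {
    static final long DEFAULT_BUDGET_MS = 2000; // the documented default budget
    static final long SAFETY_MARGIN_MS = 50;    // assumed value, for illustration only
    static final long MINIMUM_BUDGET_MS = 100;  // assumed value, for illustration only

    /** Inbound: turn the header (remaining millis) into an absolute epoch-millis deadline. */
    static long inboundDeadline(String headerValue, long nowEpochMs) {
        long budgetMs = DEFAULT_BUDGET_MS;
        if (headerValue != null) {
            try {
                budgetMs = Long.parseLong(headerValue.trim());
            } catch (NumberFormatException ignored) {
                // malformed header: fall back to the default (assumed behaviour)
            }
        }
        return nowEpochMs + budgetMs;
    }

    /** Outbound: the header value to send, or empty when the call should be aborted. */
    static OptionalLong outboundHeader(long deadlineEpochMs, long nowEpochMs) {
        long remainingMs = deadlineEpochMs - nowEpochMs;
        if (remainingMs < MINIMUM_BUDGET_MS) {
            // a real interceptor would increment the exhausted counter and log a WARN here
            return OptionalLong.empty();
        }
        return OptionalLong.of(remainingMs - SAFETY_MARGIN_MS);
    }
}
```

Under these assumed values, a request arriving with a 2000 ms budget that makes an outbound call 100 ms later would send 1850 ms downstream, and a call attempted with less than 100 ms left would be aborted instead of sent.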
The header name follows RFC 6648 (no X- prefix). inbound-header and
outbound-header can be configured separately if you need to bridge between
two conventions.
Source: TimeoutBudget.java ·
TimeoutBudgetFilter.java ·
TimeoutBudgetOutboundInterceptor.java ·
Runbook: Timeout-budget exhausted ·
Status: Stable since 1.0.0