Kafka time-based consumer lag
TL;DR.
now() − record.timestamp() per consumed record, in seconds. The only consumer-lag number that matters when your SLO is freshness.
Offset lag is a vanity metric. Time lag is roughly offset lag divided by the topic's produce rate, so a consumer that's 500k offsets behind a high-volume topic may be only seconds behind real time, while one that's 10k offsets behind a low-volume topic might be eight minutes behind. Time lag is the SLO. Offset lag is not.
Pulse measures now() − record.timestamp() on every consumed record and
exposes it as a single metric in seconds — the only number that matters when
your SLO is freshness.
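The formula above can be sketched in a few lines of Java. This is a minimal illustration, not Pulse's actual implementation: the class and method names are hypothetical, and the timestamp is assumed to be epoch milliseconds, which is what Kafka's `ConsumerRecord.timestamp()` returns.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; not part of Pulse.
final class TimeLagSketch {
    private TimeLagSketch() {}

    /**
     * Time lag in seconds: "now" minus the record's timestamp.
     * recordTimestampMs is assumed to be epoch milliseconds.
     */
    static double timeLagSeconds(long recordTimestampMs, Instant now) {
        Instant produced = Instant.ofEpochMilli(recordTimestampMs);
        return Duration.between(produced, now).toMillis() / 1000.0;
    }
}
```

In a real consumer the first argument would be `record.timestamp()` and the second `Instant.now()`. The result can be negative if producer and consumer clocks disagree; this sketch does not correct for that.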
What you get
Visibility into any consumer that falls more than five minutes behind real
time, regardless of topic volume. The shipped alert
(PulseKafkaConsumerFallingBehind) fires at that threshold.
Turn it on
Nothing to do. Time lag is on by default whenever Pulse's Kafka record interceptor is registered, which is also the default.
What it adds
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| pulse.kafka.consumer.time_lag | Gauge (seconds) | group, topic, partition | now() − record.timestamp() on the most recent consumed record (per partition) |
Prometheus normalises this to pulse_kafka_consumer_time_lag_seconds.
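The "most recent consumed record (per partition)" semantics of the gauge can be pictured with the sketch below. It is an assumption-laden illustration rather than Pulse's code: the class, its methods, and the in-memory map are hypothetical stand-ins for a real metrics registry, and the key simply mirrors the group, topic, and partition tags.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a gauge registry; not part of Pulse.
final class PartitionLagSketch {
    // One entry per (group, topic, partition); each new record overwrites the last value.
    private final Map<List<Object>, Double> latest = new ConcurrentHashMap<>();

    /** Record the time lag (seconds) of the record just consumed from this partition. */
    void onRecord(String group, String topic, int partition,
                  long recordTimestampMs, long nowMs) {
        double lagSeconds = (nowMs - recordTimestampMs) / 1000.0;
        latest.put(List.of(group, topic, partition), lagSeconds);
    }

    /** Latest observed lag for a partition, or empty if nothing was consumed yet. */
    OptionalDouble lagSeconds(String group, String topic, int partition) {
        Double value = latest.get(List.of(group, topic, partition));
        return value == null ? OptionalDouble.empty() : OptionalDouble.of(value);
    }
}
```

Because each value is overwritten on every record, a partition's reading in this sketch only changes when that partition delivers another record.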
When to skip it
If you're already capturing time lag from a Kafka exporter or Burrow:
Source: PulseKafkaRecordInterceptor.java ·
Status: Stable since 1.0.0