Skip to content

Kafka time-based consumer lag

TL;DR. now() − record.timestamp() per consumed record, in seconds. The only consumer-lag number that matters when your SLO is freshness.

Offset lag is a vanity metric. A consumer that's 500k offsets behind a low-volume topic is fine; one that's 10k offsets behind a high-volume topic might be eight minutes behind real time. Time lag is the SLO. Offset lag is not.

Pulse measures now() − record.timestamp() on every consumed record and exposes it as a single metric in seconds — the only number that matters when your SLO is freshness.

What you get

max by (topic, group) (pulse_kafka_consumer_time_lag_seconds) > 300

Any consumer falling more than five minutes behind real time, regardless of topic volume. The shipped alert (PulseKafkaConsumerFallingBehind) fires here.

Turn it on

Nothing. On by default whenever Pulse's Kafka record interceptor is registered (also default).

What it adds

Metric Type Tags Meaning
pulse.kafka.consumer.time_lag Gauge (seconds) group, topic, partition now() − record.timestamp on the most recent consumed record (per partition)

Prometheus normalises this to pulse_kafka_consumer_time_lag_seconds.

When to skip it

If you're already capturing time lag from a Kafka exporter or burrow:

pulse:
  kafka:
    consumer-time-lag-enabled: false

Source: PulseKafkaRecordInterceptor.java · Status: Stable since 1.0.0