Skip to content

Features

Every feature below is on by default, and every one is opt-out via pulse.<feature>.enabled=false. Each page is short on jargon and answers the same four questions:

  • What problem does this solve? (Lead sentence)
  • What you get — the metric, log line, or dashboard you actually see
  • Turn it on — usually nothing; the right defaults are already there
  • When to skip it — honest opt-out reasons + the exact YAML

All 29 features at a glance

Scan the table for the feature that solves your current problem; click through for the full page. Every feature is on by default and opt-out via pulse.<feature>.enabled=false (column omitted to save space).

Feature What it gives you Skip when
Cardinality firewall Hard cap per (meter, tag); one bad tag can't 100× the metrics bill You already enforce this in the OTel Collector
Timeout-budget propagation The remaining deadline travels with the request across RestTemplate / WebClient / Kafka You manage budgets at the gateway
Context propagation MDC + OTel context restored on every TaskExecutor, TaskScheduler, Kafka listener You only run synchronous request handlers
Trace-context guard pulse.trace.received vs pulse.trace.missing per route + alert Your platform already enforces ingress trace headers
Structured logs OTel-aligned JSON; deploy/commit/pod stamped; PII masked You already ship a custom JSON layout
Stable exception fingerprints SHA-256 over (type + top frames) so the same bug groups across deploys You only run a single service and grep logs
Conditional features (enabled-when) Skip any Pulse feature for synthetic monitoring / smoke tests / trusted callers You don't need per-request gating
Dependency health map Per-downstream RED metrics from the caller; health flips DEGRADED on error spikes You only call one downstream and own its health endpoint
Retry amplification Pulse-Retry-Depth baggage + alert before the retry storm becomes an incident You don't use Resilience4j retries
Multi-tenant context Tenant id from header / JWT / subdomain → MDC + baggage + outbound headers + opt-in meter tag You're not multi-tenant
Request priority Pulse-Priority on MDC + baggage + outbound; RequestPriority.current() for shedding You don't load-shed by request class
Container-aware memory cgroup v1/v2 reader, OOM-kill counter, accurate memory metrics in containers You run on bare metal
Kafka time-based lag now() − record.timestamp() as the SLO, not raw offset lag You don't run Kafka consumers
Request fan-out pulse.request.fan_out per endpoint so you spot routes that call thirty services You only call one downstream
SLO-as-code Declare SLOs in YAML; /actuator/pulse/slo emits a multi-burn-rate PrometheusRule You don't run Prometheus
Resilience4j auto-instrumentation Circuit breaker / retry / bulkhead events → metrics + span events with no annotations You don't use Resilience4j
Background-job observability RED metrics + in-flight gauge + health indicator per @Scheduled job You don't run scheduled jobs
Database (N+1, slow query) Hibernate hook counts statements per request; alerts on threshold cross You don't use Hibernate / JPA
Cache observability (Caffeine) Every CaffeineCacheManager bound to Micrometer with hit/miss/eviction counts You don't use Caffeine
OpenFeature correlation Every flag evaluation lands on MDC + span event You don't use OpenFeature SDKs
Continuous-profiling correlation Every span carries profile.id, pyroscope.profile_id, deep link You don't run a continuous profiler
Wide-event API events.emit("order.placed", attrs) → span event + counter + structured log in one call You already standardised on a custom event helper
Graceful drain + OTel flush Readiness drain visible; JVM exit blocks until the last span batch is exported You don't care about losing the last second of traces on rolling deploys
Fleet config-drift detection Deterministic hash of resolved pulse.* tree; alert fires when one deployment has > 1 hash You only run a single replica
Live diagnostic actuator /actuator/pulse (JSON) + /actuator/pulseui (HTML) — what's on, off, why You forbid actuator endpoints in production
Sampling management.tracing.sampling.probability (Boot's standard knob) + pulse.sampling.prefer-sampling-on-error upgrades errors past the head sampler You enforce sampling in the OTel Collector
Dry-run / enforcement mode POST /actuator/pulse/enforcement flips Pulse process-wide between ENFORCING and DRY_RUN You're confident every Pulse feature is correctly tuned
Profile presets One-line spring.config.import for tuned dev / prod / test / canary configurations You manage all configuration centrally
Configuration validation JSR-380 constraints on every pulse.* property; typos fail at startup, not 3 AM You template all config and pre-validate it

Day-one drivers

Six things every Spring app should have on day one. None of them ship in Spring Boot or the OTel Java agent, all cost less than a microsecond on the hot path, and all of them decide whether observability actually works at 3 AM.

  • Cardinality firewall


    Hard cap per (meter, tag) with overflow bucket. Stops one bad tag from 100×-ing your metrics bill. ~17 ns/op cached.

  • Timeout-budget propagation


    The deadline travels with the request. RestTemplate, WebClient, OkHttp, Kafka all forward the remaining budget. Fail fast instead of holding doomed connections.

  • Context propagation


    Every TaskExecutor and TaskScheduler is wrapped automatically. Kafka listeners restored from record headers. No MDC.getCopyOfContextMap() boilerplate.

  • Trace-context guard


    pulse.trace.received vs pulse.trace.missing per route, with shipped alert. Find the upstream that's stripping traceparent.

  • Structured logs


    OTel-aligned JSON on every line. Deploy / commit / pod / cloud region stamped automatically. PII masking on by default.

  • Stable exception fingerprints


    SHA-256 over (type + top frames) so the same bug groups across deploys. On the response, the active span, the metric, and the log line.

Cross-cutting

Patterns that apply across multiple features rather than a single one.

  • Conditional features (enabled-when) — skip any Pulse feature for specific requests (synthetic monitoring, smoke tests, trusted internal callers) without setting enabled: false globally.

Distributed-systems extras

What you need when the system has more than two services.

  • Dependency health map — per-downstream RED metrics from the caller side, plus a health indicator that flips DEGRADED when error rates spike.
  • Retry amplificationPulse-Retry-Depth baggage + a metric that fires before a retry storm becomes an incident.
  • Multi-tenant context — extract the tenant from header, JWT, or subdomain; thread it through MDC, baggage, outbound headers, and (opt-in) configured meters.
  • Request priorityPulse-Priority header on MDC, baggage, and outbound headers; user-code load shedders read RequestPriority.current().
  • Container-aware memory — cgroup v1/v2 reader (no agent), accurate memory metrics in containers, OOM-kill counter, readiness health.
  • Kafka time-based lagnow() − record.timestamp() as the SLO, not raw offset lag.
  • Request fan-outpulse.request.fan_out per endpoint so you know which routes accidentally call thirty services.

Spring extras

Things every Spring shop eventually builds, badly.

  • SLO-as-code — declare objectives in YAML; /actuator/pulse/slo emits multi-window, multi-burn-rate PrometheusRule YAML.
  • Resilience4j auto-instrumentation — circuit breaker, retry, and bulkhead events become metrics + span events automatically.
  • Background-job observability — every @Scheduled job gets RED metrics, in-flight gauge, and a health indicator that flips DOWN when a job hasn't succeeded inside the grace period.
  • Database (N+1, slow query) — Hibernate hook counts statements per request; alerts when a route crosses a threshold.
  • Cache observability (Caffeine) — every CaffeineCacheManager bound to Micrometer with hit/miss/eviction counts.
  • OpenFeature correlation — every flag evaluation lands on MDC and a span event.
  • Continuous-profiling correlation — every span carries profile.id, pyroscope.profile_id, and a deep link.
  • Wide-event APIevents.emit("order.placed", attrs) writes a span event, increments a bounded counter, and emits a structured log line in one ~25 ns call.
  • Graceful drain + OTel flush — readiness drain observability + JVM exit blocks until the last span batch is exported.
  • Fleet config-drift detection — deterministic hash of the resolved pulse.* tree at startup; alert fires when a deployment has more than one hash.
  • Live diagnostic actuator/actuator/pulse (JSON) and /actuator/pulseui (HTML) for "what's running and what won."
  • Sampling — Spring Boot's standard management.tracing.sampling.probability for the head rate, plus Pulse's best-effort pulse.sampling.prefer-sampling-on-error for spans the head sampler would have dropped.

Operational levers

Adoption-shortcut features that exist purely to make Pulse safer and easier to roll out across an existing fleet.

  • Dry-run / enforcement mode — process-wide enforce-vs-observe lever (ENFORCING, DRY_RUN); flip via POST /actuator/pulse/enforcement without a redeploy. Roll Pulse out in observe-only mode first, flip enforcing once dashboards confirm impact.
  • Profile presets — one-line spring.config.import for dev / prod / test / canary profiles tuned for that environment.
  • Configuration validation — JSR-380 constraints on every pulse.* property; typos fail fast at startup with a Pulse-friendly message.

Coverage status

All features ship with full feature pages. The six day-one drivers have the deepest coverage; the remaining subsystems document what they do, the metrics they emit, and how to turn them off. Expanded sections land in patch releases on the 1.0.x line — track [issue

1](https://github.com/arun0009/pulse/issues) for the order.