Features¶

Every feature below is on by default, and every one is opt-out via pulse.<feature>.enabled=false. Each page is short on jargon and answers the same four questions:

What problem does this solve? (Lead sentence)
What you get — the metric, log line, or dashboard you actually see
Turn it on — usually nothing; the right defaults are already there
When to skip it — honest opt-out reasons + the exact YAML

All 29 features at a glance¶

Scan the table for the feature that solves your current problem; click through for the full page. Every feature is on by default and opt-out via pulse.<feature>.enabled=false (column omitted to save space).

Feature	What it gives you	Skip when
Cardinality firewall	Hard cap per `(meter, tag)`; one bad tag can't 100× the metrics bill	You already enforce this in the OTel Collector
Timeout-budget propagation	The remaining deadline travels with the request across `RestTemplate` / `WebClient` / Kafka	You manage budgets at the gateway
Context propagation	MDC + OTel context restored on every `TaskExecutor`, `TaskScheduler`, Kafka listener	You only run synchronous request handlers
Trace-context guard	`pulse.trace.received` vs `pulse.trace.missing` per route + alert	Your platform already enforces ingress trace headers
Structured logs	OTel-aligned JSON; deploy/commit/pod stamped; PII masked	You already ship a custom JSON layout
Stable exception fingerprints	SHA-256 over `(type + top frames)` so the same bug groups across deploys	You only run a single service and grep logs
Conditional features (`enabled-when`)	Skip any Pulse feature for synthetic monitoring / smoke tests / trusted callers	You don't need per-request gating
Dependency health map	Per-downstream RED metrics from the caller; health flips DEGRADED on error spikes	You only call one downstream and own its health endpoint
Retry amplification	`Pulse-Retry-Depth` baggage + alert before the retry storm becomes an incident	You don't use Resilience4j retries
Multi-tenant context	Tenant id from header / JWT / subdomain → MDC + baggage + outbound headers + opt-in meter tag	You're not multi-tenant
Request priority	`Pulse-Priority` on MDC + baggage + outbound; `RequestPriority.current()` for shedding	You don't load-shed by request class
Container-aware memory	cgroup v1/v2 reader, OOM-kill counter, accurate memory metrics in containers	You run on bare metal
Kafka time-based lag	`now() − record.timestamp()` as the SLO, not raw offset lag	You don't run Kafka consumers
Request fan-out	`pulse.request.fan_out` per endpoint so you spot routes that call thirty services	You only call one downstream
SLO-as-code	Declare SLOs in YAML; `/actuator/pulse/slo` emits a multi-burn-rate `PrometheusRule`	You don't run Prometheus
Resilience4j auto-instrumentation	Circuit breaker / retry / bulkhead events → metrics + span events with no annotations	You don't use Resilience4j
Background-job observability	RED metrics + in-flight gauge + health indicator per `@Scheduled` job	You don't run scheduled jobs
Database (N+1, slow query)	Hibernate hook counts statements per request; alerts on threshold cross	You don't use Hibernate / JPA
Cache observability (Caffeine)	Every `CaffeineCacheManager` bound to Micrometer with hit/miss/eviction counts	You don't use Caffeine
OpenFeature correlation	Every flag evaluation lands on MDC + span event	You don't use OpenFeature SDKs
Continuous-profiling correlation	Every span carries `profile.id`, `pyroscope.profile_id`, deep link	You don't run a continuous profiler
Wide-event API	`events.emit("order.placed", attrs)` → span event + counter + structured log in one call	You already standardised on a custom event helper
Graceful drain + OTel flush	Readiness drain visible; JVM exit blocks until the last span batch is exported	You don't care about losing the last second of traces on rolling deploys
Fleet config-drift detection	Deterministic hash of resolved `pulse.*` tree; alert fires when one deployment has > 1 hash	You only run a single replica
Live diagnostic actuator	`/actuator/pulse` (JSON) + `/actuator/pulseui` (HTML) — what's on, off, why	You forbid actuator endpoints in production
Sampling	`management.tracing.sampling.probability` (Boot's standard knob) + `pulse.sampling.prefer-sampling-on-error` upgrades errors past the head sampler	You enforce sampling in the OTel Collector
Dry-run / enforcement mode	`POST /actuator/pulse/enforcement` flips Pulse process-wide between ENFORCING and DRY_RUN	You're confident every Pulse feature is correctly tuned
Profile presets	One-line `spring.config.import` for tuned `dev` / `prod` / `test` / `canary` configurations	You manage all configuration centrally
Configuration validation	JSR-380 constraints on every `pulse.*` property; typos fail at startup, not 3 AM	You template all config and pre-validate it

Day-one drivers¶

Six things every Spring app should have on day one. None of them ship in Spring Boot or the OTel Java agent, all cost less than a microsecond on the hot path, and all of them decide whether observability actually works at 3 AM.

Cardinality firewall

Hard cap per (meter, tag) with overflow bucket. Stops one bad tag from 100×-ing your metrics bill. ~17 ns/op cached.
Timeout-budget propagation

The deadline travels with the request. RestTemplate, WebClient, OkHttp, Kafka all forward the remaining budget. Fail fast instead of holding doomed connections.
Context propagation

Every TaskExecutor and TaskScheduler is wrapped automatically. Kafka listeners restored from record headers. No MDC.getCopyOfContextMap() boilerplate.
Trace-context guard

pulse.trace.received vs pulse.trace.missing per route, with shipped alert. Find the upstream that's stripping traceparent.
Structured logs

OTel-aligned JSON on every line. Deploy / commit / pod / cloud region stamped automatically. PII masking on by default.
Stable exception fingerprints

SHA-256 over (type + top frames) so the same bug groups across deploys. On the response, the active span, the metric, and the log line.

Cross-cutting¶

Patterns that apply across multiple features rather than a single one.

Conditional features (enabled-when) — skip any Pulse feature for specific requests (synthetic monitoring, smoke tests, trusted internal callers) without setting enabled: false globally.

Distributed-systems extras¶

What you need when the system has more than two services.

Dependency health map — per-downstream RED metrics from the caller side, plus a health indicator that flips DEGRADED when error rates spike.
Retry amplification — Pulse-Retry-Depth baggage + a metric that fires before a retry storm becomes an incident.
Multi-tenant context — extract the tenant from header, JWT, or subdomain; thread it through MDC, baggage, outbound headers, and (opt-in) configured meters.
Request priority — Pulse-Priority header on MDC, baggage, and outbound headers; user-code load shedders read RequestPriority.current().
Container-aware memory — cgroup v1/v2 reader (no agent), accurate memory metrics in containers, OOM-kill counter, readiness health.
Kafka time-based lag — now() − record.timestamp() as the SLO, not raw offset lag.
Request fan-out — pulse.request.fan_out per endpoint so you know which routes accidentally call thirty services.

Spring extras¶

Things every Spring shop eventually builds, badly.

SLO-as-code — declare objectives in YAML; /actuator/pulse/slo emits multi-window, multi-burn-rate PrometheusRule YAML.
Resilience4j auto-instrumentation — circuit breaker, retry, and bulkhead events become metrics + span events automatically.
Background-job observability — every @Scheduled job gets RED metrics, in-flight gauge, and a health indicator that flips DOWN when a job hasn't succeeded inside the grace period.
Database (N+1, slow query) — Hibernate hook counts statements per request; alerts when a route crosses a threshold.
Cache observability (Caffeine) — every CaffeineCacheManager bound to Micrometer with hit/miss/eviction counts.
OpenFeature correlation — every flag evaluation lands on MDC and a span event.
Continuous-profiling correlation — every span carries profile.id, pyroscope.profile_id, and a deep link.
Wide-event API — events.emit("order.placed", attrs) writes a span event, increments a bounded counter, and emits a structured log line in one ~25 ns call.
Graceful drain + OTel flush — readiness drain observability + JVM exit blocks until the last span batch is exported.
Fleet config-drift detection — deterministic hash of the resolved pulse.* tree at startup; alert fires when a deployment has more than one hash.
Live diagnostic actuator — /actuator/pulse (JSON) and /actuator/pulseui (HTML) for "what's running and what won."
Sampling — Spring Boot's standard management.tracing.sampling.probability for the head rate, plus Pulse's best-effort pulse.sampling.prefer-sampling-on-error for spans the head sampler would have dropped.

Operational levers¶

Adoption-shortcut features that exist purely to make Pulse safer and easier to roll out across an existing fleet.

Dry-run / enforcement mode — process-wide enforce-vs-observe lever (ENFORCING, DRY_RUN); flip via POST /actuator/pulse/enforcement without a redeploy. Roll Pulse out in observe-only mode first, flip enforcing once dashboards confirm impact.
Profile presets — one-line spring.config.import for dev / prod / test / canary profiles tuned for that environment.
Configuration validation — JSR-380 constraints on every pulse.* property; typos fail fast at startup with a Pulse-friendly message.

Coverage status¶

All features ship with full feature pages. The six day-one drivers have the deepest coverage; the remaining subsystems document what they do, the metrics they emit, and how to turn them off. Expanded sections land in patch releases on the 1.0.x line — track [issue

1](https://github.com/arun0009/pulse/issues) for the order.¶