Runbook: HikariCP Connection Pool Saturation
Alerts: PulseHikariCpExhausted, PulseHikariCpConnectionAcquireSlow, PulseHikariCpConnectionTimeoutsRising
Severity: page (exhausted, timeouts) / ticket (acquire-slow)
TL;DR
The HikariCP connection pool is the choke point. Either (a) every connection is checked out
right now and threads are queuing in getConnection(), or (b) callers wait longer than
connectionTimeout and fail before the pool can hand them a connection. Both translate to
user-visible failures within seconds.
Triage in this order:
- Is a slow downstream dependency holding connections open? (most common)
- Is there a connection leak in application code?
- Is the pool simply undersized for current traffic?
What Pulse already did for you
- Charts active vs max utilization, pending acquires, p50/p95/p99 acquire latency, and acquire timeouts in the Pulse → Saturation — HikariCP dashboard row.
- Shipped three Prometheus alerts (page on exhaustion / timeouts, ticket on slow acquires).
- Tagged every panel by the pool label so multi-pool services (read-replica + primary) are separable.
Pulse does not add a new metric here: Spring Boot already publishes
hikaricp.connections.* via Micrometer when Micrometer is present (for example through
spring-boot-starter-actuator) and the DataSource is a HikariCP pool. Pulse's value
is the pre-built dashboard, alerts, and this runbook.
Diagnose
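```shell
# Active vs max right now. Without a tag filter the endpoint sums all pools;
# add ?tag=pool:<pool-name> to scope to one pool (the unfiltered response
# lists pool names under availableTags).
curl -s "http://<host>/actuator/metrics/hikaricp.connections.active?tag=pool:<pool-name>" | jq .
curl -s "http://<host>/actuator/metrics/hikaricp.connections.max?tag=pool:<pool-name>" | jq .
curl -s "http://<host>/actuator/metrics/hikaricp.connections.pending?tag=pool:<pool-name>" | jq .
```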
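The service may also expose the Prometheus endpoint. In that case, one scrape shows active, max and pending for every pool side by side. The sketch below rests on three assumptions. First, micrometer-registry-prometheus is on the classpath. Second, the prometheus actuator endpoint is exposed over HTTP. Third, label values contain no spaces. hikari_utilization is a hypothetical helper name and is not part of Pulse or Spring Boot.

```shell
# Hypothetical helper: per-pool utilization (active / max) from the
# Prometheus text format, where samples look like
#   hikaricp_connections_active{pool="HikariPool-1",} 10.0
# Usage: curl -s "http://<host>/actuator/prometheus" | hikari_utilization
hikari_utilization() {
  awk '
    # Keep only active/max/pending sample lines; HELP and TYPE lines start with "#".
    /^hikaricp_connections_(active|max|pending)[{]/ {
      metric = $1
      sub(/[{].*/, "", metric)                   # drop the label set
      sub(/^hikaricp_connections_/, "", metric)  # leaves active, max or pending
      pool = $1
      sub(/.*pool="/, "", pool)                  # text after pool="
      sub(/".*/, "", pool)                       # cut at the closing quote
      val[pool, metric] = $2
      pools[pool] = 1
    }
    END {
      for (p in pools) {
        util = (val[p, "max"] > 0) ? 100 * val[p, "active"] / val[p, "max"] : 0
        printf "%s active=%d max=%d pending=%d util=%.0f%%\n", p, val[p, "active"], val[p, "max"], val[p, "pending"], util
      }
    }'
}
```

A pool that shows util=100% with pending above zero is the "Active = max, pending > 0" case in the table below.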
Pull a thread dump and look for threads parked in HikariPool.getConnection. Their count
should roughly match hikaricp_connections_pending (the dump and the gauge are sampled at
slightly different moments); those threads are your queued requests.
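In a large dump, counting those threads by hand is error-prone. This is a minimal sketch. It assumes a HotSpot-style dump, such as the output of jcmd <pid> Thread.print, in which a blank line separates each thread's stack. count_hikari_waiters is a hypothetical helper name. It counts threads rather than matching lines, because a single waiting thread may show more than one HikariPool.getConnection frame.

```shell
# Hypothetical helper: count threads whose stack is inside HikariPool.getConnection.
# Usage: jcmd <pid> Thread.print | count_hikari_waiters
count_hikari_waiters() {
  # RS = "" switches awk to paragraph mode: each blank-line-separated block
  # (roughly one thread) is one record, so a thread counts once even if the
  # getConnection frame appears several times in its stack.
  awk 'BEGIN { RS = "" } /HikariPool\.getConnection/ { n++ } END { print n + 0 }'
}
```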
Fix patterns
| You see | Likely cause | Fix |
|---|---|---|
| Active = max, pending > 0, downstream DB latency rising | A query (or DB) is slow | Find the slow query — JdbcTemplate traces, DB slow log |
| Active = max, downstream is fine | Connection leak: code paths that return or throw without closing the connection | Audit try-with-resources usage and transaction boundaries |
| Active < max but pending > 0 | Pool is still growing and new connections are slow to open | Check connection creation time (hikaricp.connections.creation), the network between app and DB, and DB-side connection limits |
| Acquire p95 spiked, no leak, no slow query | Traffic exceeded provisioned capacity | Bump spring.datasource.hikari.maximum-pool-size |
| Timeouts rising | Acquire is exceeding connectionTimeout (default 30s) | All of the above, immediately |
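
For the capacity row, you can raise the pool ceiling without editing application.properties. This sketch relies on Spring Boot's relaxed binding of environment variables: dots become underscores, dashes are dropped, and everything is upper-cased. The value 20 is only an example.

```shell
# Relaxed binding maps this variable onto spring.datasource.hikari.maximum-pool-size.
# 20 is an example value; size it with the capacity rule of thumb, not by guesswork.
export SPRING_DATASOURCE_HIKARI_MAXIMUMPOOLSIZE=20

# Equivalent one-off form as a command-line argument (app.jar is a placeholder):
#   java -jar app.jar --spring.datasource.hikari.maximum-pool-size=20
```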
Capacity rule of thumb
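A reasonable starting point is pool_size = (cpu_cores * 2) + effective_spindle_count,
where cpu_cores counts the database server's physical cores (not hyper-threads) and
effective_spindle_count is 0 when the working set is fully cached (see HikariCP's pool-sizing wiki).
For a 4-core database server backed by SSD, that's ~10. Raising the pool past 20–30 rarely
helps; by then the database itself is usually the bottleneck.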
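Here is the rule of thumb as shell arithmetic. pool_size is a hypothetical helper, and both inputs are estimates you supply.

```shell
# Starting point from HikariCP's pool-sizing wiki:
#   connections = (core_count * 2) + effective_spindle_count
# $1: physical cores on the database server (exclude hyper-threads)
# $2: effective spindle count (0 if the working set is fully cached)
pool_size() {
  echo $(( $1 * 2 + $2 ))
}

pool_size 4 1   # 4-core server, one effective spindle: prints 9
```

The wiki rounds this example up to 10.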
Long-term
- Trace every JDBC call. Pulse's standard histograms cover
jdbc.query; aggregate by query name to spot the slow one.
- If your service makes outbound HTTP calls inside a DB transaction, that's an anti-pattern: the connection is held for the duration of the HTTP round-trip.
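- Set sensible idleTimeout and maxLifetime values so connections are retired before the database or network infrastructure drops them, instead of lingering in the pool as dead connections; HikariCP recommends keeping maxLifetime several seconds shorter than any such limit.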
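maxLifetime is easiest to get right when you derive it from the shortest connection limit in the path. That limit may be set by the database or by anything in between, such as a firewall, proxy, or load balancer. HikariCP's documentation says to keep maxLifetime a few seconds below that limit, and it gives 30000 ms as the minimum allowed value. The helper below is a hypothetical sketch. Its default 30-second margin is an assumption, not a HikariCP default.

```shell
# Hypothetical helper: maxLifetime in ms, a safety margin below the shortest
# connection limit enforced by the DB or the network path.
# $1: that limit in seconds; $2: margin in seconds (assumed default 30)
max_lifetime_ms() {
  local limit_s=$1
  local margin_s=${2:-30}
  local ms=$(( (limit_s - margin_s) * 1000 ))
  # Clamp to HikariCP's documented minimum of 30000 ms.
  if [ "$ms" -lt 30000 ]; then
    ms=30000
  fi
  echo "$ms"
}

# Example: a proxy that drops connections after one hour.
export SPRING_DATASOURCE_HIKARI_MAXLIFETIME="$(max_lifetime_ms 3600)"
```

With a one-hour limit this yields 3570000. Note that idleTimeout only takes effect when minimumIdle is set below maximumPoolSize.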