Time-Series Storage (Metrics)
What it is
Time-series databases and pipelines store (timestamp, value, labels) samples—metrics from hosts, apps, and custom events. They optimize for append-heavy writes, time-range queries, and aggregation over windows.
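For concreteness, a toy sketch of that model (a hypothetical `TinyTSDB`, not any real library): one ordered list of samples per unique label set, with binary search for time-range reads.

```python
import bisect
from collections import defaultdict

# Hypothetical in-memory store illustrating the (timestamp, value, labels)
# model; real TSDBs add compression, a WAL, and inverted label indexes.
class TinyTSDB:
    def __init__(self):
        # One sorted list of (timestamp, value) per unique label set ("series").
        self.series = defaultdict(list)

    def append(self, name, labels, ts, value):
        key = (name, tuple(sorted(labels.items())))
        self.series[key].append((ts, value))  # assumes mostly in-order ingest

    def range_query(self, name, labels, start, end):
        key = (name, tuple(sorted(labels.items())))
        points = self.series[key]
        lo = bisect.bisect_left(points, (start, float("-inf")))
        hi = bisect.bisect_right(points, (end, float("inf")))
        return points[lo:hi]

db = TinyTSDB()
for t in range(0, 60, 15):                      # one sample every 15s
    db.append("cpu_usage", {"host": "web-1"}, t, 0.5 + t / 1000)
window = db.range_query("cpu_usage", {"host": "web-1"}, 0, 30)
print(sum(v for _, v in window) / len(window))  # mean over the window
```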
Metrics retention
- High-resolution raw data kept for short periods (hours to days)—bounded disk cost.
- Lower-resolution rollups kept longer (months to years) for trends and SLO reporting.
- Cardinality explosion (unbounded label values) is a primary ops risk—control label sets.
- now-15m: raw 15s resolution
- now-30d: 5m rollups
- now-1y: 1h rollups
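The cardinality risk above is multiplicative: the series count is the product of distinct values per label, so one high-churn label multiplies everything. A back-of-envelope sketch with made-up numbers:

```python
# Series count is the product of distinct values per label.
# Adding one high-churn label (e.g. user_id) multiplies everything.
labels = {
    "host": 500,         # assumed fleet size
    "endpoint": 50,
    "status_code": 10,
}
series = 1
for name, distinct in labels.items():
    series *= distinct
print(series)                 # 250,000 series: manageable
print(series * 1_000_000)     # add a user_id label: 250 billion, game over
```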
Downsampling
Downsampling aggregates older buckets (mean, max, min; percentiles with care, since averages of percentiles are misleading). It reduces storage and speeds long-range queries at the cost of losing spike detail in old data.
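A minimal downsampling sketch (illustrative, not any specific TSDB's rollup logic): collapse raw samples into fixed-width buckets, keeping max alongside mean so spikes stay visible after rollup.

```python
# Collapse raw (ts, value) samples into fixed-width buckets, keeping
# mean AND max: a mean-only rollup would hide the spike at ts=30.
def downsample(samples, bucket_seconds):
    buckets = {}
    for ts, value in samples:
        b = ts - ts % bucket_seconds          # bucket start time
        buckets.setdefault(b, []).append(value)
    return [
        (b, sum(vs) / len(vs), max(vs))       # (bucket_ts, mean, max)
        for b, vs in sorted(buckets.items())
    ]

raw = [(0, 1.0), (15, 1.2), (30, 9.0), (45, 1.1), (300, 1.0)]
print(downsample(raw, 300))   # [(0, 3.075, 9.0), (300, 1.0, 1.0)]
```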
Common stack pieces: Prometheus (pull, local TSDB), InfluxDB, TimescaleDB, Datadog-style SaaS, OpenTelemetry for ingestion.
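For concreteness, a single sample in Prometheus's text exposition format, rendered by a simplified sketch (no HELP/TYPE lines or label-value escaping):

```python
# Render one sample in the Prometheus text exposition format (simplified).
def expose(name, labels, value):
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

print(expose("http_requests_total", {"host": "web-1", "code": "200"}, 1027))
# http_requests_total{code="200",host="web-1"} 1027
```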
When to use
- Observability: dashboards, alerts, capacity planning.
- IoT device telemetry with heavy ingest (may overlap with streaming—see message-queue-vs-stream).
Alternatives
- A general-purpose SQL database for small metric volumes: simpler, but worse at ingest scale and compression.
- Logging systems (ELK) for events: not optimized for numeric series the way a TSDB is.
Failure modes
- Cardinality and label churn blow up memory and index size.
- Clock skew across emitters distorts ordering; use bounded skew handling (sketched after this list) or trust ingestion time.
- Alert fatigue from noisy metrics without good rollups and SLO windows.
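One shape bounded skew handling can take (an assumption, not any specific product's behavior): accept an emitter's timestamp only if it falls within a tolerance window of ingestion time.

```python
import time

# Bounded skew handling sketch: trust the emitter's timestamp only within
# a tolerance window around ingestion time, else fall back to ingest time.
MAX_SKEW_SECONDS = 30.0

def effective_timestamp(emitter_ts, ingest_ts=None):
    ingest_ts = ingest_ts if ingest_ts is not None else time.time()
    if abs(emitter_ts - ingest_ts) <= MAX_SKEW_SECONDS:
        return emitter_ts          # emitter clock looks sane: trust it
    return ingest_ts               # too skewed: trust ingestion time

now = 1_700_000_000.0
print(effective_timestamp(now - 5, now))     # within bound: kept
print(effective_timestamp(now - 3600, now))  # an hour off: replaced
```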
Interview talking points
- Explicit retention and downsampling policy; tie alerts to SLIs and error budgets.
- Ingest path: samples/sec, label cardinality, replication; do the back-of-envelope math (see the sketch after this list).
- Read-path latency vs. throughput: query fan-out for global dashboards.
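A worked back-of-envelope for the ingest path; every input below is a made-up, illustrative number:

```python
# Back-of-envelope ingest sizing (all inputs are illustrative assumptions).
hosts = 1_000
series_per_host = 200            # metrics x label combinations
scrape_interval_s = 15

samples_per_sec = hosts * series_per_host / scrape_interval_s
bytes_per_sample = 2             # typical with TSDB compression; raw is ~16
replication = 3

daily_bytes = samples_per_sec * 86_400 * bytes_per_sample * replication
print(f"{samples_per_sec:,.0f} samples/sec")               # ~13,333
print(f"{daily_bytes / 1e9:.1f} GB/day with replication")  # ~6.9 GB/day
```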