THN Interview Prep

Time-Series Storage (Metrics)

What it is

Time-series databases and pipelines store (timestamp, value, labels) samples—metrics from hosts, apps, and custom events. They optimize for append-heavy writes, time-range queries, and aggregation over windows.
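The sample model above can be sketched as a minimal in-memory series: one sorted list of (timestamp, value) pairs per label set. This is a hypothetical toy (the `Series` class and its methods are illustrative, not any real TSDB's API); production systems add compression, inverted label indexes, and WAL-backed persistence.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class Series:
    """Toy series: fixed label set plus an ordered list of samples."""
    labels: dict
    samples: list = field(default_factory=list)  # sorted (ts, value) pairs

    def append(self, ts, value):
        # Append-heavy write path: samples normally arrive in ts order.
        self.samples.append((ts, value))

    def range_query(self, start, end):
        # Time-range read: binary-search the sorted timestamps.
        lo = bisect.bisect_left(self.samples, (start,))
        hi = bisect.bisect_right(self.samples, (end, float("inf")))
        return self.samples[lo:hi]
```

The sorted-by-time layout is why TSDBs are fast at window queries: a range scan is two binary searches plus a sequential read.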

Metrics retention

  • High-resolution raw data kept for short periods (hours to days)—bounded disk cost.
  • Lower-resolution rollups kept longer (months to years) for trends and SLO reporting.
  • Cardinality explosion (unbounded label values) is a primary ops risk—control label sets.
  Example retention tiers:
    now-15m:   raw 15s resolution
    now-30d:   5m rollups
    now-1y:    1h rollups
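Those tiers can be encoded as a lookup from sample age to stored resolution. This is a sketch under the tier values above; the `RETENTION_TIERS` table and `resolution_for_age` helper are hypothetical names, not a real system's config.

```python
# (max_age_seconds, resolution_seconds), checked in order.
RETENTION_TIERS = [
    (15 * 60,         15),       # last 15m: raw 15s samples
    (30 * 24 * 3600,  5 * 60),   # last 30d: 5m rollups
    (365 * 24 * 3600, 3600),     # last 1y: 1h rollups
]

def resolution_for_age(age_seconds):
    """Return the resolution a sample of this age is stored at, or None once expired."""
    for max_age, resolution in RETENTION_TIERS:
        if age_seconds <= max_age:
            return resolution
    return None  # older than the last tier: data has been dropped
```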

Downsampling

Downsampling aggregates older buckets (mean, max, min; percentiles need care, since averages of percentiles are misleading). It reduces storage and speeds up long-range queries at the cost of losing spike detail in old data.
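A minimal downsampling pass looks like this: group raw (ts, value) samples into fixed windows and keep a few aggregates per bucket. The function name is illustrative; note that percentiles are deliberately absent, since they must be computed from raw data or mergeable sketches (e.g. t-digest), not averaged across buckets.

```python
from collections import defaultdict

def downsample(samples, window_seconds):
    """Aggregate (ts, value) samples into windows of window_seconds."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its window.
        buckets[ts - ts % window_seconds].append(value)
    return {
        start: {
            "mean": sum(vals) / len(vals),
            "max": max(vals),
            "min": min(vals),
        }
        for start, vals in buckets.items()
    }
```

Keeping min and max alongside the mean preserves some spike visibility that a mean-only rollup would erase.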

Common stack pieces: Prometheus (pull, local TSDB), InfluxDB, TimescaleDB, Datadog-style SaaS, OpenTelemetry for ingestion.

When to use

  • Observability: dashboards, alerts, capacity planning.
  • IoT device telemetry with heavy ingest (may overlap with streaming—see message-queue-vs-stream).

Alternatives

  • General SQL for small metric volume: simpler; worse at ingest scale and compression.
  • Logging systems (ELK) for events—not optimized like TSDB for numeric series.

Failure modes

  • Cardinality and churn blow memory and index size.
  • Clock skew across emitters distorts ordering—use bounded skew handling or trust ingestion time.
  • Alert fatigue from noisy metrics without good rollups and SLO windows.
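The cardinality failure mode above is just multiplication: total series count is the product of distinct values per label, so one unbounded label multiplies everything. The label sets below are made-up numbers for illustration.

```python
def series_count(label_cardinalities):
    """Total distinct series = product of distinct values per label."""
    total = 1
    for n in label_cardinalities.values():
        total *= n
    return total

bounded = {"host": 500, "endpoint": 40, "status": 5}
unbounded = dict(bounded, user_id=100_000)  # one unbounded label added
# series_count(bounded)   -> 100_000 series: manageable
# series_count(unbounded) -> 10_000_000_000 series: catastrophic
```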

Interview talking points

  • Explicit retention and downsampling policy; tie alerts to SLIs and error budgets.
  • Ingest path: samples/sec, label cardinality, replication factor; do the back-of-envelope math.
  • Read-path latency vs. throughput: query fan-out for global dashboards.
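A back-of-envelope for the ingest path can be sketched as below. The 2 bytes/sample figure assumes Gorilla-style compression (real numbers vary by workload) and the 3x replication factor is an assumption, not a given.

```python
def daily_storage_gb(samples_per_sec, bytes_per_sample=2, replication=3):
    """Estimate raw-ingest storage per day in GB (decimal)."""
    bytes_per_day = samples_per_sec * 86_400 * bytes_per_sample * replication
    return bytes_per_day / 1e9

# e.g. 1M samples/sec at 2 B/sample with 3x replication:
# 1e6 * 86400 * 2 * 3 = 518.4 GB/day
```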
