THN Interview Prep

Message Queue vs Stream

What it is

  • Message queue (mental model: SQS, RabbitMQ classic queues): producers send messages; consumers pull or receive push; messages are often deleted after successful processing. Good for job dispatch and decoupling services with at-least-once delivery.
  • Event stream / log (mental model: Kafka, Pulsar, Kinesis): messages are append-only partitions; consumers track offsets and can replay history. Good for analytics, event sourcing, and multiple independent consumer groups.

Queue versus stream consumer model showing competing queue workers, delete-on-ack behavior, retained stream partitions, consumer-group offsets, and replay.

When to use

NeedFavor
Simple task queue, per-message visibility timeoutQueue (SQS-like)
Fan-out to many services, replay, stream processingKafka-like stream
Strict ordering for one entitySingle partition key / single shard
Global orderingHard—usually avoid; partition by key

Concrete examples

  • Queue: image resize jobs after upload. Each image needs one worker, retries are useful, and the message can disappear after successful processing.
  • Stream: order lifecycle events. Billing, fulfillment, analytics, and search indexing all consume the same retained history at their own pace.

Processing workflow

  1. Producer writes business state, ideally using a database outbox when the event must match the transaction.
  2. Broker stores the message or event and exposes retry, retention, or offset mechanics.
  3. Consumer processes idempotently using a message id, business key, or dedupe table.
  4. Consumer acknowledges, deletes, or commits offset only after the side effect is durable.
  5. Poison messages move to a DLQ with enough metadata to replay or inspect later.

Ordering guarantees

  • Single partition / single queue with one consumer: FIFO processing (often exactly one effective consumer at a time).
  • Multiple consumers: ordering not global unless you assign partition key so related events share one partition (Kafka) or use FIFO queues with limited throughput (AWS FIFO).
  • Queues: ordering between unrelated messages is weak; visibility timeout can cause redelivery and reordering perception.
Loading diagram…

Alternatives

  • Synchronous RPC for critical path when you need immediate consistency (simpler operationally, tighter coupling).
  • Database outbox pattern: transactional writes + relay to broker—stronger than “fire and forget”.

Failure modes

  • Poison messages: infinite retries; use DLQ (dead-letter queue) and alerts.
  • Consumer lag: scale consumers; partition bottleneck for streams.
  • Duplicate processing: design idempotent handlers (at-least-once is default for many systems).
  • Back-pressure: unbounded queues hide overload until catastrophic lag—monitor depth.

Common mistakes

  • Assuming "exactly once" means no duplicates anywhere; most designs still need idempotent side effects.
  • Committing offsets before the database write succeeds; crashes then lose work.
  • Using one partition for perfect order, then discovering throughput is capped by that partition.
  • Letting queues grow without an overload policy; lag becomes user-visible long before the broker fails.

Interview talking points

  • Clarify delivery semantics: at-most-once vs at-least-once vs exactly-once (often “exactly-once” is effectively via idempotency + offsets).
  • Compare ordering scope vs throughput (more partitions → more parallelism, weaker global order).
  • Mention replay for streams vs delete-on-ack for queues.
  • Relate consumer throughput and end-to-end latency to latency and throughput.

Interview answer shape

  1. Name the workload: job dispatch, fan-out, replay, ordering, retention, and expected throughput.
  2. Choose queue or stream and state the reason in one sentence.
  3. Define delivery semantics and idempotency before discussing scale.
  4. Pick the ordering key and explain what ordering you are giving up.
  5. Add operational controls: DLQ, retry backoff, lag alerts, partition count, and replay plan.

Common follow-ups: offset commit timing, visibility timeouts, consumer rebalancing, outbox pattern, and how to recover from a bad deploy that processed messages incorrectly.

Mark this page when you finish learning it.

Last updated on

Spotted something unclear or wrong on this page?

On this page