THN Interview Prep

Streams in Node.js: A Comprehensive Guide for Senior Developers

Streams are one of the most powerful and characteristic features of Node.js. They provide an elegant, memory-efficient way to handle continuous data flows - whether reading from files, writing to network sockets, processing large datasets, or transforming data in real time.

Core Concept

A stream is an abstraction for processing data chunk by chunk as it arrives, rather than loading the entire content into memory at once.

This approach is particularly important when dealing with:

  • Large files (> available RAM)
  • Real-time network data (HTTP requests/responses, WebSockets)
  • Data transformation pipelines (compression, encryption, parsing)
  • High-throughput systems where memory pressure must be minimized
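As a minimal sketch of chunk-by-chunk processing, the helper below counts the bytes in a file by iterating over a read stream, so only one chunk is held in memory at a time. The name countBytes is a hypothetical helper introduced here for illustration, not something from the guide.

```javascript
const fs = require("node:fs");

// Hypothetical helper: counts the bytes in a file without ever
// holding the whole file in memory. Each chunk is a Buffer.
async function countBytes(filePath) {
  let total = 0;
  // Readable streams are async iterables; the loop pulls one chunk at a time.
  for await (const chunk of fs.createReadStream(filePath)) {
    total += chunk.length;
  }
  return total;
}
```

The same loop shape works for hashing, parsing, or uploading: the per-chunk work changes, but memory use stays bounded by the chunk size rather than the file size.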

Four Fundamental Stream Types

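Type      | Purpose                          | Readable? | Writable? | Example Use Cases
----------|----------------------------------|-----------|-----------|-----------------------------------------------
Readable  | Source of data                   | Yes       | No        | File read, HTTP request, process.stdin
Writable  | Destination for data             | No        | Yes       | File write, HTTP response, process.stdout
Duplex    | Both readable and writable       | Yes       | Yes       | TCP sockets, WebSocket connections
Transform | Duplex stream that modifies data | Yes       | Yes       | zlib compression, JSON parsing, line splitting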
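To make the four types concrete, here is a small sketch that wires a Readable, a Transform (the built-in PassThrough), and a Writable together, and checks how the classes relate. The variable names are illustrative assumptions.

```javascript
const { Readable, Writable, Duplex, PassThrough } = require("node:stream");

// Readable: a source built from an in-memory array.
const source = Readable.from(["a", "b"]);

// Writable: a sink that records every chunk it receives.
const received = [];
const sink = new Writable({
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    callback();
  },
});

// Transform: PassThrough forwards each chunk unchanged.
const passthrough = new PassThrough();

// A Transform is a Duplex, and a Duplex passes both instanceof checks.
const isDuplex = passthrough instanceof Duplex;
const isReadable = passthrough instanceof Readable;
const isWritable = passthrough instanceof Writable;

source.pipe(passthrough).pipe(sink);
```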

Key Stream Modes

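  1. Flowing mode ("push" mode)

    • Data is read from the source automatically and delivered through 'data' events as soon as it is available
    • If nothing is consuming the stream while it is flowing, chunks are dropped rather than buffered
    • Triggered by calling .resume(), attaching a 'data' listener, or calling .pipe()
  2. Paused mode (the initial state / "pull" mode)

    • Every Readable starts paused; data is buffered internally, up to highWaterMark, until the consumer requests it
    • Controlled by listening for 'readable' and calling .read()
    • The consumer sets the pace, so the source is never read faster than it is consumed

Modern Node.js code should rarely switch modes by hand: prefer pipeline() or async iteration (for await...of), which manage flow and backpressure for you.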
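The difference between the two modes can be sketched with two small collectors. Both function names are hypothetical; one pulls with read(), the other receives 'data' events.

```javascript
const { Readable } = require("node:stream");

// Paused mode: the consumer pulls chunks with read() when 'readable' fires.
function collectPaused(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on("readable", () => {
      let chunk;
      // read() returns null once the internal buffer is drained.
      while ((chunk = readable.read()) !== null) {
        chunks.push(chunk);
      }
    });
    readable.on("end", () => resolve(chunks));
    readable.on("error", reject);
  });
}

// Flowing mode: attaching a 'data' listener makes the stream push chunks.
function collectFlowing(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on("data", (chunk) => chunks.push(chunk));
    readable.on("end", () => resolve(chunks));
    readable.on("error", reject);
  });
}
```

A freshly created Readable reports readableFlowing as null until one of these consumption mechanisms is attached.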

Most Important Stream Events & Methods

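Stream Type | Key Events                                  | Key Methods                                       | Purpose
------------|---------------------------------------------|---------------------------------------------------|------------------------------------
Readable    | 'data', 'readable', 'end', 'error', 'close' | read(), pause(), resume(), pipe()                 | Consume data, control flow
Writable    | 'drain', 'finish', 'error'                  | write(), end(), cork(), uncork()                  | Send data, handle backpressure
All         | 'error', 'close'                            | destroy()                                         | Error handling & cleanup
Transform   | Same as Readable and Writable               | _transform(chunk, encoding, callback), _flush(cb) | Implement data transformation logic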

Backpressure - The Most Critical Concept

When a writable stream receives data faster than it can process it, it signals backpressure:

  1. write() returns false once the internal buffer reaches highWaterMark (the chunk is still accepted and buffered)
  2. The Writable emits 'drain' when the buffer has emptied and it is ready for more data
  3. The producer should stop writing until 'drain' is received
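The three steps above can be sketched by hand as follows. The helper name writeAll and its wait counter are assumptions made for illustration; in practice pipeline() does this work for you.

```javascript
const { once } = require("node:events");
const { finished } = require("node:stream/promises");

// Hypothetical helper: writes every chunk, pausing whenever write()
// returns false and resuming only after the writable emits 'drain'.
async function writeAll(writable, chunks) {
  let waits = 0;
  for (const chunk of chunks) {
    if (!writable.write(chunk)) {
      waits += 1; // backpressure was signalled for this chunk
      await once(writable, "drain");
    }
  }
  writable.end();
  await finished(writable); // resolves on 'finish', rejects on error
  return waits;
}
```

Ignoring the return value of write() does not lose data, but it lets the internal buffer grow without bound when the destination is slow.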

Piping automatically handles backpressure - this is the primary reason stream composition is preferred over manual 'data' event handling. For multi-step production pipelines, prefer pipeline() or stream/promises.pipeline() so completion and errors are coordinated in one place.

import { pipeline } from "node:stream/promises";

await pipeline(readable, transform1, transform2, writable);

Practical Patterns & Best Practices (2025-2026)

// 1. Classic file copy with .pipe() - concise, but errors are not forwarded between streams
const fs = require("fs");

fs.createReadStream("input.txt")
  .pipe(fs.createWriteStream("output.txt"))
  .on("finish", () => console.log("Copy completed"));

// 2. Real-world transform pipeline
const { Transform } = require("stream");

const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  },
});

fs.createReadStream("data.csv")
  .pipe(upperCase)
  .pipe(fs.createWriteStream("data-upper.csv"));

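// 3. Handling errors in a pipeline correctly
// CommonJS has no top-level await, so the call is wrapped in an async function.
const { pipeline } = require("node:stream/promises");

async function run(readable, transform, writable) {
  try {
    // On failure, pipeline() destroys every stream and rejects with the error.
    await pipeline(readable, transform, writable);
  } catch (err) {
    console.error("Pipeline failed:", err);
  }
}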

// 4. Object mode streams (very common in modern Node.js)
const objectStream = new Transform({
  objectMode: true,
  transform(chunk, encoding, callback) {
    // chunk is already an object
    callback(null, { processed: chunk.value * 2 });
  },
});

When to Implement Custom Streams

You should create custom streams when you need to:

  • Transform data in a reusable way (CSV → JSON, compression, encryption)
  • Aggregate or split streams (line-by-line processing, multiplexing)
  • Bridge incompatible APIs (promise → stream, callback → stream)
  • Implement protocol parsers (HTTP/2 frames, WebSocket frames, custom binary protocols)
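As one example of a reusable custom stream, here is a minimal sketch of a line splitter. The class name LineSplitter is hypothetical; it accepts bytes on the writable side and emits one string per line on the readable side.

```javascript
const { Transform } = require("node:stream");
const { StringDecoder } = require("node:string_decoder");

// Hypothetical LineSplitter: bytes in, one string per line out.
class LineSplitter extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    // StringDecoder keeps multi-byte characters intact across chunk boundaries.
    this.decoder = new StringDecoder("utf8");
    this.remainder = "";
  }

  _transform(chunk, encoding, callback) {
    const lines = (this.remainder + this.decoder.write(chunk)).split("\n");
    this.remainder = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) this.push(line);
    callback();
  }

  _flush(callback) {
    // Emit whatever is left once the input has ended.
    const rest = this.remainder + this.decoder.end();
    if (rest.length > 0) this.push(rest);
    callback();
  }
}
```

The key design point is the carried-over remainder: chunk boundaries are arbitrary, so a line can start in one chunk and finish in the next.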

Modern Recommendations (2026)

  • Prefer stream/promises API when working with async/await
  • Use pipeline() from stream/promises for safer, promise-based pipelines
  • Know that pipeline() destroys streams on error; handle HTTP response/socket cases deliberately
  • Consider third-party libraries only when core streams are insufficient (very rare nowadays)
  • Be cautious with objectMode streams in high-throughput scenarios - per-chunk overhead is higher, and highWaterMark counts objects rather than bytes, so memory use is harder to bound

Production checklist

  • Set request/body/file size limits before starting expensive work.
  • Prefer byte streams for high-throughput data; use object mode when object boundaries are worth the overhead.
  • Use highWaterMark deliberately when memory and latency tradeoffs matter.
  • Attach cancellation to client disconnects and deadlines.
  • Watch RSS, heap, external memory, throughput, and slow-destination errors.
  • Test failure paths: source error, transform error, destination close, client abort.
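The cancellation item can be sketched with an AbortController passed to pipeline() through its signal option. The helper name pipelineWithDeadline is an assumption; the same signal could equally be aborted from a client-disconnect handler.

```javascript
const { pipeline } = require("node:stream/promises");

// Hypothetical helper: runs a pipeline but aborts it after `ms` milliseconds.
async function pipelineWithDeadline(streams, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    // When the signal aborts, pipeline() destroys the streams and rejects.
    await pipeline(...streams, { signal: controller.signal });
  } finally {
    clearTimeout(timer); // do not keep the process alive after completion
  }
}
```

Callers can tell a deadline apart from other failures by checking whether the rejected error is named AbortError.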

Interview answer structure

“Streams keep memory bounded by processing chunks and honoring backpressure. I avoid readFile for large payloads, compose transforms with pipeline(), set size limits and timeouts, and test what happens when the source, transform, destination, or client connection fails.”

Follow-ups to expect:

  • Why can .pipe() still be risky if errors are not handled?
  • What does write() returning false mean?
  • When is object mode a bad idea?
  • How does pipeline() behave when one stream errors?
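For the first and last follow-ups, a small experiment makes the difference visible. This is a sketch with hypothetical helper names: the same failing source is connected once with .pipe() and once with pipeline(), and the state of each destination is reported.

```javascript
const { Readable, Writable, pipeline } = require("node:stream");

// A source that fails as soon as it is asked for data.
function makeFailingSource() {
  return new Readable({
    read() {
      this.destroy(new Error("source failed"));
    },
  });
}

function makeSink() {
  return new Writable({
    write(chunk, encoding, callback) {
      callback();
    },
  });
}

function compare() {
  return new Promise((resolve) => {
    // .pipe(): the error stays on the source and the sink is left open.
    const pipedSink = makeSink();
    makeFailingSource().on("error", () => {}).pipe(pipedSink);

    // pipeline(): the error reaches the callback and the sink is destroyed.
    const pipelinedSink = makeSink();
    pipeline(makeFailingSource(), pipelinedSink, (err) => {
      resolve({
        message: err.message,
        pipedSinkDestroyed: pipedSink.destroyed,
        pipelinedSinkDestroyed: pipelinedSink.destroyed,
      });
    });
  });
}
```

With a file or socket as the destination, the sink left open by .pipe() corresponds to a leaked file descriptor or a hung connection.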

Summary - Quick Reference Table

Goal                      | Recommended Approach                              | Avoid Doing This
--------------------------|---------------------------------------------------|----------------------------------------
Copy large file           | pipeline(createReadStream(), createWriteStream()) | readFile() + writeFile()
Transform large file      | pipeline() through Transform stream               | Load entire file → transform → write
Parallel processing       | Multiple pipeline() calls + worker threads        | Single thread + synchronous processing
Error handling in pipeline | pipeline() with explicit failure path            | Only listening on final destination
Promise-based pipeline    | stream/promises.pipeline()                        | Manual event juggling

Mastering streams is one of the clearest differentiators between intermediate and senior Node.js developers. When used correctly, they enable applications to process gigabytes of data with minimal memory footprint and predictable backpressure behavior.
