THN Interview Prep

271. Encode and Decode Strings

Quick Identifier

  • Topic: arrays-hashing
  • Pattern: Length-Prefix Encoding (Serialization)
  • Hint: When you need to serialize a list of strings where the strings themselves may contain any characters (including your delimiter), prepend each string with its length followed by a special separator (e.g., #). This guarantees unambiguous round‑tripping.
  • Difficulty: Medium
  • Companies: Amazon, Meta, Google, Microsoft, Bloomberg
  • LeetCode: 271

Quick Sights

  • Time Complexity: O(L) where L is the total number of characters across all strings. We scan each character exactly once during encode and once during decode.
  • Space Complexity: O(L) for the output string (the encoded representation) plus O(k) auxiliary for the list of decoded strings (k = number of strings). No extra data structures beyond these are needed.
  • Core Mechanism: For each string s in the input list, output len(s) + "#" + s. During decoding, read digits until # to obtain the length, then read exactly that many characters as the next payload.

Concept Explanation

As a senior engineer, you recognise that any delimiter‑based protocol fails when the delimiter can appear inside the payload. The length‑prefix approach sidesteps this entirely: the length tells you exactly where the payload ends, regardless of its contents.

  1. The Over‑Simplistic Trap: Simply join(strings, ',') collapses the list, but any string containing a comma breaks the round‑trip.
  2. The Length‑Prefix Invariant: By emitting len + "#" + payload, you create a self‑describing segment. The # is merely a sentinel that separates the length field from the payload; it never appears inside the length field because the length is a series of digits.
  3. Robustness Under Edge Cases: Empty strings become 0#. A list of empty strings becomes a series of 0# segments, which decode cleanly. Very long strings are supported as long as the length integer fits within the language’s integer limits (LeetCode guarantees this).

Diagram

The following flowchart visualises the encode‑decode cycle.

Loading diagram…

Study Pattern (SDE3+)

  • 0–3 min: State the need for unambiguous serialization. Mention the failure of naive delimiter approaches and introduce length‑prefix.
  • Implementation pass: Write a small helper readNumber that stops at #. Emphasise early exit on malformed input (e.g., missing #).
  • 5 min extension: Discuss alternatives such as escaping special characters (e.g., \), or using Base64 encoding of each string. Highlight why length‑prefix is preferred for interview clarity: O(L) time, O(1) extra space besides output.

Approaches

  • Naïve delimiterjoin(strings, ','). Time: O(L), Space: O(L). Fails when payload contains delimiter.
  • Escaping delimiter — Replace , with \, inside strings. Messy, prone to bugs, O(L) but higher constant factor.
  • Length‑Prefix (Optimal) — Encode each string as len + "#" + payload. Time: O(L), Space: O(L). (Always pick this)

Optimal Solution

Go

type Codec struct{}

func (c *Codec) Encode(words []string) string {
	var encoded strings.Builder
	for _, w := range words {
		encoded.WriteString(strconv.Itoa(len(w)))
		encoded.WriteByte('#')
		encoded.WriteString(w)
	}
	return encoded.String()
}

func (c *Codec) Decode(encoded string) []string {
	var words []string
	i := 0
	for i < len(encoded) {
		// Read length prefix
		j := i
		for j < len(encoded) && encoded[j] != '#' {
			j++
		}
		length, _ := strconv.Atoi(encoded[i:j])
		j++ // skip '#'
		word := encoded[j : j+length]
		words = append(words, word)
		i = j + length
	}
	return words
}

JavaScript

class Codec {
    encode(words) {
        let encoded = '';
        for (const w of words) {
            encoded += `${w.length}#${w}`;
        }
        return encoded;
    }

    decode(encoded) {
        const words = [];
        let i = 0;
        while (i < encoded.length) {
            // Find the next '#'
            let j = i;
            while (encoded[j] !== '#') j++;
            const length = Number(encoded.slice(i, j));
            const start = j + 1;
            const word = encoded.slice(start, start + length);
            words.push(word);
            i = start + length;
        }
        return words;
    }
}

Walkthrough

Input list: ["hello", "5#world", ""]

Encode

  • "hello" → 5#hello
  • "5#world" → 7#5#world
  • "" → 0# Result: 5#hello7#5#world0#

Decode

StepPosition startLength parsedExtracted word
105hello
2675#world
3140`` (empty)

Round‑trip succeeds.

Edge Cases

  • Empty list → "" (encoded string empty). Decoding loop exits immediately, returns [].
  • Strings containing # or digits → No issue because length tells us exactly how many characters to read.
  • Very long strings → Length integer may be multiple digits; the while loop correctly reads all digits until #.

Pitfalls

  • Forgetting to increment past the # after reading the length; you would incorrectly include the delimiter in the payload.
  • Using parseInt on a substring that includes non‑digit characters (e.g., if the delimiter is missing). Always guard against malformed input in production code.

Similar Problems

  • [438. Find All Anagrams in a String] – also uses a sliding‑window + frequency map, but here we focus on serialization.
  • [271. Encode and Decode Strings] – the exact problem we just solved.
  • [394. Decode String] – nested encoded patterns, a different format but shares the idea of parsing length‑prefixed or pattern‑based streams.

Mind‑Map Tags

#strings #serialization #length‑prefix #encoding #hash‑map

Mark this page when you finish learning it.

Last updated on

Spotted something unclear or wrong on this page?

On this page