271. Encode and Decode Strings

Quick Identifier

Topic: arrays-hashing
Pattern: Length-Prefix Encoding (Serialization)
Hint: When you need to serialize a list of strings where the strings themselves may contain any characters (including your delimiter), prepend each string with its length followed by a special separator (e.g., #). This guarantees unambiguous round‑tripping.
Difficulty: Medium
Companies: Amazon, Meta, Google, Microsoft, Bloomberg
LeetCode: 271

Quick Sights

Time Complexity: O(L) where L is the total number of characters across all strings. We scan each character exactly once during encode and once during decode.
Space Complexity: O(L) for the output string (the encoded representation) plus O(k) auxiliary for the list of decoded strings (k = number of strings). No extra data structures beyond these are needed.
Core Mechanism: For each string s in the input list, output len(s) + "#" + s. During decoding, read digits until # to obtain the length, then read exactly that many characters as the next payload.

As a senior engineer, you recognise that any delimiter‑based protocol fails when the delimiter can appear inside the payload. The length‑prefix approach sidesteps this entirely: the length tells you exactly where the payload ends, regardless of its contents.

The Over‑Simplistic Trap: Simply join(strings, ',') collapses the list, but any string containing a comma breaks the round‑trip.
The Length‑Prefix Invariant: By emitting len + "#" + payload, you create a self‑describing segment. The # is merely a sentinel that separates the length field from the payload; it never appears inside the length field because the length is a series of digits.
Robustness Under Edge Cases: Empty strings become 0#. A list of empty strings becomes a series of 0# segments, which decode cleanly. Very long strings are supported as long as the length integer fits within the language’s integer limits (LeetCode guarantees this).

Diagram

The following flowchart visualises the encode‑decode cycle.

Loading diagram…

Study Pattern (SDE3+)

0–3 min: State the need for unambiguous serialization. Mention the failure of naive delimiter approaches and introduce length‑prefix.
Implementation pass: Write a small helper readNumber that stops at #. Emphasise early exit on malformed input (e.g., missing #).
5 min extension: Discuss alternatives such as escaping special characters (e.g., \), or using Base64 encoding of each string. Highlight why length‑prefix is preferred for interview clarity: O(L) time, O(1) extra space besides output.

Approaches

Naïve delimiter — join(strings, ','). Time: O(L), Space: O(L). Fails when payload contains delimiter.
Escaping delimiter — Replace , with \, inside strings. Messy, prone to bugs, O(L) but higher constant factor.
Length‑Prefix (Optimal) — Encode each string as len + "#" + payload. Time: O(L), Space: O(L). (Always pick this)

Optimal Solution

Go

type Codec struct{}

func (c *Codec) Encode(words []string) string {
	var encoded strings.Builder
	for _, w := range words {
		encoded.WriteString(strconv.Itoa(len(w)))
		encoded.WriteByte('#')
		encoded.WriteString(w)
	}
	return encoded.String()
}

func (c *Codec) Decode(encoded string) []string {
	var words []string
	i := 0
	for i < len(encoded) {
		// Read length prefix
		j := i
		for j < len(encoded) && encoded[j] != '#' {
			j++
		}
		length, _ := strconv.Atoi(encoded[i:j])
		j++ // skip '#'
		word := encoded[j : j+length]
		words = append(words, word)
		i = j + length
	}
	return words
}

JavaScript

class Codec {
    encode(words) {
        let encoded = '';
        for (const w of words) {
            encoded += `${w.length}#${w}`;
        }
        return encoded;
    }

    decode(encoded) {
        const words = [];
        let i = 0;
        while (i < encoded.length) {
            // Find the next '#'
            let j = i;
            while (encoded[j] !== '#') j++;
            const length = Number(encoded.slice(i, j));
            const start = j + 1;
            const word = encoded.slice(start, start + length);
            words.push(word);
            i = start + length;
        }
        return words;
    }
}

Walkthrough

Input list: ["hello", "5#world", ""]

Encode

"hello" → 5#hello
"5#world" → 7#5#world
"" → 0# Result: 5#hello7#5#world0#

Decode

Step	Position start	Length parsed	Extracted word
1	0	5	`hello`
2	6	7	`5#world`
3	14	0	`` (empty)

Round‑trip succeeds.

Edge Cases

Empty list → "" (encoded string empty). Decoding loop exits immediately, returns [].
Strings containing # or digits → No issue because length tells us exactly how many characters to read.
Very long strings → Length integer may be multiple digits; the while loop correctly reads all digits until #.

Pitfalls

Forgetting to increment past the # after reading the length; you would incorrectly include the delimiter in the payload.
Using parseInt on a substring that includes non‑digit characters (e.g., if the delimiter is missing). Always guard against malformed input in production code.

Mind‑Map Tags

#strings #serialization #length‑prefix #encoding #hash‑map