271. Encode and Decode Strings
Quick Identifier
- Topic: arrays-hashing
- Pattern: Length-Prefix Encoding (Serialization)
- Hint: When you need to serialize a list of strings where the strings themselves may contain any characters (including your delimiter), prepend each string with its length followed by a special separator (e.g.,
#). This guarantees unambiguous round‑tripping. - Difficulty: Medium
- Companies: Amazon, Meta, Google, Microsoft, Bloomberg
- LeetCode: 271
Quick Sights
- Time Complexity:
O(L)whereLis the total number of characters across all strings. We scan each character exactly once during encode and once during decode. - Space Complexity:
O(L)for the output string (the encoded representation) plusO(k)auxiliary for the list of decoded strings (k= number of strings). No extra data structures beyond these are needed. - Core Mechanism: For each string
sin the input list, outputlen(s) + "#" + s. During decoding, read digits until#to obtain the length, then read exactly that many characters as the next payload.
Concept Explanation
As a senior engineer, you recognise that any delimiter‑based protocol fails when the delimiter can appear inside the payload. The length‑prefix approach sidesteps this entirely: the length tells you exactly where the payload ends, regardless of its contents.
- The Over‑Simplistic Trap: Simply
join(strings, ',')collapses the list, but any string containing a comma breaks the round‑trip. - The Length‑Prefix Invariant: By emitting
len + "#" + payload, you create a self‑describing segment. The#is merely a sentinel that separates the length field from the payload; it never appears inside the length field because the length is a series of digits. - Robustness Under Edge Cases: Empty strings become
0#. A list of empty strings becomes a series of0#segments, which decode cleanly. Very long strings are supported as long as the length integer fits within the language’s integer limits (LeetCode guarantees this).
Diagram
The following flowchart visualises the encode‑decode cycle.
Loading diagram…
Study Pattern (SDE3+)
- 0–3 min: State the need for unambiguous serialization. Mention the failure of naive delimiter approaches and introduce length‑prefix.
- Implementation pass: Write a small helper
readNumberthat stops at#. Emphasise early exit on malformed input (e.g., missing#). - 5 min extension: Discuss alternatives such as escaping special characters (e.g.,
\), or using Base64 encoding of each string. Highlight why length‑prefix is preferred for interview clarity: O(L) time, O(1) extra space besides output.
Approaches
- Naïve delimiter —
join(strings, ','). Time: O(L), Space: O(L). Fails when payload contains delimiter. - Escaping delimiter — Replace
,with\,inside strings. Messy, prone to bugs, O(L) but higher constant factor. - Length‑Prefix (Optimal) — Encode each string as
len + "#" + payload. Time: O(L), Space: O(L). (Always pick this)
Optimal Solution
Go
type Codec struct{}
func (c *Codec) Encode(words []string) string {
var encoded strings.Builder
for _, w := range words {
encoded.WriteString(strconv.Itoa(len(w)))
encoded.WriteByte('#')
encoded.WriteString(w)
}
return encoded.String()
}
func (c *Codec) Decode(encoded string) []string {
var words []string
i := 0
for i < len(encoded) {
// Read length prefix
j := i
for j < len(encoded) && encoded[j] != '#' {
j++
}
length, _ := strconv.Atoi(encoded[i:j])
j++ // skip '#'
word := encoded[j : j+length]
words = append(words, word)
i = j + length
}
return words
}JavaScript
class Codec {
encode(words) {
let encoded = '';
for (const w of words) {
encoded += `${w.length}#${w}`;
}
return encoded;
}
decode(encoded) {
const words = [];
let i = 0;
while (i < encoded.length) {
// Find the next '#'
let j = i;
while (encoded[j] !== '#') j++;
const length = Number(encoded.slice(i, j));
const start = j + 1;
const word = encoded.slice(start, start + length);
words.push(word);
i = start + length;
}
return words;
}
}Walkthrough
Input list: ["hello", "5#world", ""]
Encode
- "hello" →
5#hello - "5#world" →
7#5#world - "" →
0#Result:5#hello7#5#world0#
Decode
| Step | Position start | Length parsed | Extracted word |
|---|---|---|---|
| 1 | 0 | 5 | hello |
| 2 | 6 | 7 | 5#world |
| 3 | 14 | 0 | `` (empty) |
Round‑trip succeeds.
Edge Cases
- Empty list →
""(encoded string empty). Decoding loop exits immediately, returns[]. - Strings containing
#or digits → No issue because length tells us exactly how many characters to read. - Very long strings → Length integer may be multiple digits; the
whileloop correctly reads all digits until#.
Pitfalls
- Forgetting to increment past the
#after reading the length; you would incorrectly include the delimiter in the payload. - Using
parseInton a substring that includes non‑digit characters (e.g., if the delimiter is missing). Always guard against malformed input in production code.
Similar Problems
- [438. Find All Anagrams in a String] – also uses a sliding‑window + frequency map, but here we focus on serialization.
- [271. Encode and Decode Strings] – the exact problem we just solved.
- [394. Decode String] – nested encoded patterns, a different format but shares the idea of parsing length‑prefixed or pattern‑based streams.
Mind‑Map Tags
#strings #serialization #length‑prefix #encoding #hash‑map
Mark this page when you finish learning it.
Last updated on
Spotted something unclear or wrong on this page?