Trade-off Thinking & Approaches
In system design, there are no perfect solutions, only trade-offs. Senior engineers are evaluated not by their ability to draw box diagrams, but by how they navigate competing constraints and justify their engineering decisions.
⚖️ 1. Core Architectural Trade-offs
Every technical choice has a cost. Below are the foundational trade-offs you must balance.
Consistency vs. Availability (CAP & PACELC)
When a network partition occurs, a distributed system must choose between Consistency (CP) and Availability (AP).
- CP (Consistency/Partition Tolerance): Refuses a request if it cannot guarantee that the response reflects the most recent write. Use when correctness is non-negotiable (e.g., financial transactions, inventory counts).
- AP (Availability/Partition Tolerance): Returns a stale response or accepts a write that may conflict later. Use when uptime is critical (e.g., social feeds, shopping carts).
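- Beyond Partition (PACELC): If there is a Partition (P), choose between Availability (A) and Consistency (C); Else (E), when the system is running normally, choose between Latency (L) and Consistency (C).
- Example: MongoDB (commonly described as CP, and EC under PACELC) prioritizes consistency under normal conditions, paying network latency for write replication. Cassandra (commonly described as AP, and EL under PACELC) prioritizes write latency under normal conditions: at low consistency levels it acknowledges a write once a single replica has accepted it and propagates the update to the other replicas asynchronously. Both databases are tunable, so these labels describe typical configurations rather than fixed properties.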
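The CP/AP choice above can be illustrated with a toy model. This is a minimal sketch, not how any real database implements replication: the `Replica` class, the `Unavailable` exception, and the `replicate` helper are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP replica cannot prove it holds the latest write."""


@dataclass
class Replica:
    mode: str                      # "CP" or "AP" (hypothetical setting)
    data: dict = field(default_factory=dict)
    partitioned: bool = False      # True when cut off from the leader

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm freshness, so refuse rather than risk a stale answer.
            raise Unavailable(f"cannot confirm latest value for {key!r}")
        # AP replica, or a healthy one: answer with whatever is stored locally.
        return self.data.get(key)


def replicate(leader_data, replicas):
    """Copy leader state to every replica that is still reachable."""
    for replica in replicas:
        if not replica.partitioned:
            replica.data = dict(leader_data)


leader = {"stock": 5}
cp, ap = Replica("CP"), Replica("AP")
replicate(leader, [cp, ap])

# A partition isolates both replicas, then the leader accepts a new write.
cp.partitioned = ap.partitioned = True
leader["stock"] = 4
replicate(leader, [cp, ap])        # reaches neither replica

stale = ap.read("stock")           # AP stays available but serves the old value
try:
    cp.read("stock")
    cp_refused = False
except Unavailable:
    cp_refused = True              # CP gives up availability to stay correct
```

The AP replica keeps answering with `5` while the leader already holds `4`; the CP replica raises instead. Which behavior is acceptable depends on the cost of a wrong answer versus the cost of no answer.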
Latency vs. Throughput
- Optimize for Latency: Aim to minimize the duration of a single request. (Approaches: CDN caching, index lookups, HTTP keep-alive, TCP tuning).
- Optimize for Throughput: Aim to maximize the number of requests processed per second. (Approaches: request batching, asynchronous background workers, streaming pipelines).
- The Trade-off: Batching increases throughput but adds latency to individual requests, which must wait for the batch to fill (or for a flush timeout to fire).
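The batching trade-off can be put into rough numbers. The sketch below assumes a simplified cost model that is not from the original text: each batch pays a fixed overhead plus a per-item cost, and requests arrive at a steady rate. The function name, parameters, and the sample figures are all illustrative.

```python
def batch_stats(batch_size, arrival_rate, fixed_cost_s, per_item_cost_s):
    """Return (capacity in requests/s, average wait in s for the batch to fill).

    Assumed model: processing a batch costs fixed_cost_s + batch_size * per_item_cost_s,
    and requests arrive evenly at arrival_rate per second.
    """
    capacity = batch_size / (fixed_cost_s + batch_size * per_item_cost_s)
    # The first request in a batch waits for batch_size - 1 later arrivals,
    # the last one waits for none, so the average is half of the maximum.
    avg_fill_wait = (batch_size - 1) / (2 * arrival_rate)
    return capacity, avg_fill_wait


# Illustrative numbers: 10 ms fixed overhead, 1 ms per item, 500 arrivals/s.
cap_1, wait_1 = batch_stats(1, 500, 0.010, 0.001)
cap_100, wait_100 = batch_stats(100, 500, 0.010, 0.001)

for size, cap, wait in [(1, cap_1, wait_1), (100, cap_100, wait_100)]:
    print(f"batch={size:>3}  capacity={cap:7.1f} req/s  avg fill wait={wait * 1000:5.1f} ms")
```

Under these assumptions, larger batches amortize the fixed overhead across more requests, while the time each request spends waiting for its batch grows roughly linearly with batch size.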
Read-Heavy vs. Write-Heavy Optimizations
- Read-Heavy Systems: Optimized by caching hot data in memory, introducing read-replicas, or pre-computing feeds.
- Write-Heavy Systems: Optimized by buffering writes in memory (LSM-trees), using message queues or event streams (Kafka/RabbitMQ) to absorb spikes, or sharding the database.
- The Trade-off: Pre-computing feeds makes reads cheap (a single lookup) but increases write amplification, because each new post is copied into every follower's feed, and adds complexity.
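The feed trade-off above is often described as fan-out on write versus fan-out on read. Here is a minimal in-memory sketch of both; the `followers` graph, the function names, and the data layout are hypothetical and chosen only to make the write amplification visible.

```python
from collections import defaultdict

# Hypothetical social graph: author -> list of followers.
followers = {"alice": ["bob", "carol"], "bob": ["carol"]}
posts_by_author = defaultdict(list)   # source of truth
feed_cache = defaultdict(list)        # pre-computed feed per reader


def publish(author, text):
    """Fan-out on write: store the post, then copy it into each follower's feed."""
    posts_by_author[author].append(text)
    writes = 1
    for reader in followers.get(author, []):
        feed_cache[reader].append((author, text))
        writes += 1                   # one extra write per follower
    return writes


def read_feed_precomputed(reader):
    """Cheap read: a single lookup in the pre-computed feed."""
    return list(feed_cache[reader])


def read_feed_on_demand(reader):
    """Fan-out on read: gather posts from every followed author at read time."""
    feed = []
    for author, fans in followers.items():
        if reader in fans:
            feed.extend((author, text) for text in posts_by_author[author])
    return feed


writes_alice = publish("alice", "hello")   # 1 post + 2 follower feeds = 3 writes
writes_bob = publish("bob", "hi")          # 1 post + 1 follower feed  = 2 writes
carol_fast = read_feed_precomputed("carol")
carol_slow = read_feed_on_demand("carol")
```

Both read paths return the same posts for `carol`, but the pre-computed path moved the work to publish time. For an author with millions of followers that write cost dominates, which is why some systems mix the two approaches.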
Stateful vs. Stateless Architectures
- Stateful Services: Maintain client context (e.g. server-side sessions, open socket connections).
- Trade-off: Low latency for contextual actions, but hard to scale horizontally (each client must be routed to the server that holds its state), and that state is lost if the server crashes.
- Stateless Services: Outsource state to external stores (e.g. databases, Redis).
- Trade-off: Easy to scale horizontally (any instance can serve any request, so you can just spin up new containers), but every request pays a network round trip to retrieve session data.
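The difference shows up as soon as two server instances handle the same session. The sketch below is illustrative only: `SessionStore` is a hypothetical in-memory stand-in for an external store such as Redis, and it counts round trips instead of making real network calls.

```python
class SessionStore:
    """Stand-in for an external store such as Redis; counts simulated round trips."""

    def __init__(self):
        self._data = {}
        self.round_trips = 0

    def get(self, session_id):
        self.round_trips += 1
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self.round_trips += 1
        self._data[session_id] = dict(session)


class StatefulServer:
    """Keeps sessions in process memory: fast, but invisible to other instances."""

    def __init__(self):
        self.sessions = {}

    def add_to_cart(self, session_id, item):
        session = self.sessions.setdefault(session_id, {"cart": []})
        session["cart"].append(item)
        return list(session["cart"])


class StatelessServer:
    """Holds no session data itself; every request goes through the shared store."""

    def __init__(self, store):
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.put(session_id, {"cart": cart})
        return cart


# Two stateful instances: the second one never sees the first one's cart.
s1, s2 = StatefulServer(), StatefulServer()
s1.add_to_cart("sess-1", "book")
stateful_cart = s2.add_to_cart("sess-1", "pen")

# Two stateless instances sharing one store: either can continue the session.
store = SessionStore()
a, b = StatelessServer(store), StatelessServer(store)
a.add_to_cart("sess-1", "book")
stateless_cart = b.add_to_cart("sess-1", "pen")
```

The stateful pair loses the first item unless a load balancer pins the client to one instance, while the stateless pair keeps the full cart at the cost of two store round trips per request.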
🧠 2. Engineering Mental Models
Use these frameworks to reason through engineering challenges.
1. Cost vs. Scale vs. Complexity
Do not build a multi-region sharded database system for an application with 1,000 active users.
- Over-Engineering: Adding components (like Kafka, Kubernetes, or vector databases) without a workload to justify them increases maintenance overhead, testing difficulty, and hosting costs.
- Under-Engineering: Storing a high-throughput time-series metrics feed in a single relational database instance without a partitioning strategy eventually leads to CPU saturation and table bloat.
2. Build vs. Buy (Self-Hosted vs. Managed SaaS)
- Managed Services (e.g., AWS RDS, DynamoDB, Confluent Kafka):
- Pros: Much lower maintenance overhead, managed scaling, automated backups, and built-in replication.
- Cons: Vendor lock-in, higher infrastructure costs, less granular optimization control.
- Self-Hosted (running open-source Kafka/Postgres on raw EC2 VMs):
- Pros: Complete control over hardware configuration, cheaper at extreme scale.
- Cons: Massive engineering labor overhead (requires dedicated SRE teams for updates, backups, and failovers).
3. Phased Approaches & Migration Strategies
When transitioning architectures, you must maintain system uptime.
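- Dual-Write Strategy: When migrating databases, (1) write to both the old and new databases while still reading from the old one, (2) backfill historical data into the new database, (3) verify that the two agree, (4) switch reads to the new database, and (5) stop writing to the old database and decommission it.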
- Feature Flag Rollout: Deploy new microservices behind feature toggles, routing 1% of traffic to the new service and monitoring error rates before scaling to 100%.
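Both strategies above can be sketched in a few lines. This is a simplified illustration under stated assumptions: `DualWriteStore` and `in_rollout` are hypothetical names, plain dictionaries stand in for the two databases, and real dual writes are not atomic, so a production migration also needs reconciliation for writes that succeed on one side only.

```python
import hashlib


class DualWriteStore:
    """Wraps an old and a new store (plain dicts here) during a migration."""

    def __init__(self, old, new):
        self.old, self.new = old, new
        self.read_from_new = False        # flipped once the backfill is verified

    def write(self, key, value):
        # Write to both stores so the new one stays current.
        self.old[key] = value
        self.new[key] = value

    def backfill(self):
        # Copy historical rows, never overwriting data already dual-written.
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def read(self, key):
        source = self.new if self.read_from_new else self.old
        return source.get(key)


def in_rollout(user_id, percent):
    """Deterministically map a user to one of 100 buckets; low buckets get the feature."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % 100 < percent


# Migration walk-through: dual-write, backfill, verify, then switch reads.
store = DualWriteStore(old={"u1": "legacy-row"}, new={})
store.write("u2", "fresh-row")
store.backfill()
verified = store.old == store.new
store.read_from_new = verified
migrated_value = store.read("u1")

# Rollout: the same user always lands in the same bucket, so raising the
# percentage only ever adds users and never flips existing ones back.
users = [f"user-{i}" for i in range(1000)]
at_1 = {u for u in users if in_rollout(u, 1)}
at_10 = {u for u in users if in_rollout(u, 10)}
```

Hashing the user ID instead of picking randomly per request keeps each user's experience stable across requests, which makes error rates easier to attribute to the new code path.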
💬 3. How to Present Trade-offs in Interviews
Use this three-step structure to demonstrate senior-level architectural thinking:
Step 1: State the Alternative
Never present a design as "the only way." Explicitly call out the alternative.
Phrase: "We could design this using a document store like MongoDB to support a flexible schemaless profile structure..."
Step 2: Analyze the Drawbacks of the Alternative
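Show that you understand the limits of that choice.
Phrase: "...however, because this system requires multi-row atomic transactions across billing and inventory, we would be leaning heavily on MongoDB's multi-document transactions, which add overhead and are not what a document model is optimized for; falling back to application-level checks instead would introduce race condition risks."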
Step 3: Justify the Final Choice
Explain how the chosen technology's advantages outweigh its drawbacks for this specific problem statement.
Phrase: "Therefore, I chose PostgreSQL. While it requires a rigid schema and manual sharding at scale, the native ACID compliance guarantees financial transaction safety, which is our primary non-functional requirement."