Core Concepts (System Components)
To build robust, scalable, and fault-tolerant architectures, you must master the fundamental building blocks of distributed systems. This page serves as a comprehensive reference guide for each core component.
💾 1. Databases
Choosing the right data store is one of the most critical design decisions. You must justify your database selection based on the data model, access patterns, and consistency requirements.
Relational (SQL) Databases
- Key Systems: PostgreSQL, MySQL, Oracle.
- Core Characteristics: Structured schemas, tabular format, relations defined by foreign keys, strong ACID compliance.
- Storage Engines: B-Trees (optimized for read-heavy workloads with rapid range queries).
- Best Used For: Financial transactions, user profile management, ordering systems, or any domain requiring complex joins and multi-row transactions.
- Scaling Limit: Difficult to scale horizontally. Requires manual sharding or complex replication models.
Non-Relational (NoSQL) Databases
- Key-Value Stores:
- Examples: Redis, DynamoDB.
- Characteristics: Super high-speed lookup by key. Minimal structure.
- Use Cases: Session caching, user preferences, rate limiting.
- Document Databases:
- Examples: MongoDB, CouchDB.
- Characteristics: Stores semi-structured data as JSON/BSON documents. Schemaless.
- Use Cases: Catalog management, content management systems, user profiles.
- Column-Oriented (Wide-Column) Stores:
- Examples: Apache Cassandra, ScyllaDB, HBase.
- Characteristics: Data stored by column families rather than rows. Uses LSM-Trees (Log-Structured Merge-Trees) which append writes to memory and flush them sequentially to disk, offering extremely fast write throughput.
- Use Cases: High-volume telemetry, logging, time-series data, chat histories.
- Graph Databases:
- Examples: Neo4j, Amazon Neptune.
- Characteristics: Stores data in nodes (entities) and edges (relationships). Optimized for traverse-heavy queries.
- Use Cases: Social networks, recommendation engines, fraud detection.
⚡ 2. Caching
Caching stores copy data in fast, volatile memory to reduce database load and improve response latencies.
Cache Eviction Policies
- LRU (Least Recently Used): Discards the least recently accessed items first. The standard default.
- LFU (Least Frequently Used): Discards items with the lowest access count. Useful for identifying static hot keys.
- FIFO (First-In, First-Out): Evicts items in the order they were added.
- TTL (Time-To-Live): Evicts items once they reach a predefined expiration duration.
Caching Patterns
- Cache-Aside (Lazy Loading):
- Flow: Application checks cache. If hit, returns data. If miss, queries database, updates cache, and returns data.
- Trade-off: Cache only contains requested data, but a cache miss incurs a double hop latency penalty.
- Write-Through:
- Flow: Application writes to cache. Cache synchronously updates database.
- Trade-off: Data is never stale, but write latency is higher because of the synchronous write to DB.
- Write-Back (Write-Behind):
- Flow: Application writes to cache. Cache immediately acknowledges write, and updates the database asynchronously (in batches).
- Trade-off: Extremely fast writes, but high risk of data loss if the cache server crashes before flushing to the database.
- Refresh-Ahead:
- Flow: Cache automatically refreshes hot keys before they expire.
- Trade-off: Eliminates latency for hot keys, but requires complex predictive logic.
⚖️ 3. Load Balancing
Load balancers distribute incoming network traffic across multiple servers to prevent any single server from becoming a bottleneck.
Layer 4 (L4) vs. Layer 7 (L7) Load Balancing
- Layer 4 Load Balancing:
- Operation: Works at the Transport layer (TCP/UDP).
- Mechanism: Makes routing decisions based on IP address and port number without reading packet contents.
- Pros/Cons: Extremely fast and low overhead, but cannot inspect headers or route based on URL path.
- Layer 7 Load Balancing:
- Operation: Works at the Application layer (HTTP/HTTPS).
- Mechanism: Inspects packet payloads (HTTP headers, cookies, URL paths).
- Pros/Cons: Supports smart routing (e.g., sending
/imagesto image servers), SSL termination, and session stickiness, but requires more CPU power.
Load Balancing Algorithms
- Consistent Hashing: critical for distributed systems. Uses a hash ring to map both servers and request keys. When a server is added or removed, only a fraction of keys ($1/N$) are remapped.
🌐 4. Content Delivery Networks (CDNs)
A CDN is a geographically distributed network of proxy servers that cache content close to users.
Push vs. Pull CDNs
- Pull CDNs:
- Mechanism: The CDN edge server pulls content from the origin server on a cache miss.
- Use Case: High-traffic websites with dynamic content.
- Push CDNs:
- Mechanism: The origin server proactively pushes content to the CDN.
- Use Case: Large, static assets (game patch files, video libraries, software installers).
✉️ 5. Message Queues & Event Streaming
Decoupling services via asynchronous message flows is essential for scalability.
Pub/Sub vs. Point-to-Point Queueing
- Point-to-Point (Queue): Messages are delivered to exactly one consumer (e.g., AWS SQS).
- Pub/Sub (Publish/Subscribe): Messages are broadcasted to all active subscribers of a topic (e.g., Apache Kafka).
Partitioning & Order Guarantees
- Strict ordering is guaranteed only within a single partition. Events are routed to partitions using a partition key (e.g.,
user_id).
Message Delivery Guarantees
- At-most-once: Messages might get lost, but they are never delivered twice.
- At-least-once: Messages are guaranteed to be delivered, but they may be duplicated. Requires consumers to be idempotent.
- Exactly-once: Messages are delivered exactly once. Achieved using transactional coordinates between producers, message brokers, and consumers.
🛑 6. Rate Limiting
Rate limiting controls the rate of traffic sent or received by a network interface or service.
Algorithms
- Token Bucket: A bucket holds a maximum number of tokens. Tokens refill at a constant rate. Requests consume tokens. Supports traffic bursts.
- Leaky Bucket: Requests enter a queue. Requests are processed at a constant, smooth rate. Smooths out traffic spikes.
- Fixed Window Counter: Tracks requests within static time windows (e.g., 100/minute). Vulnerable to boundary bursts (up to 200 requests within a second around the boundary).
- Sliding Window Counter: Blends the count of the previous window with the current window progress to calculate a moving average. Low memory footprint, prevents boundary spikes.
🛡️ 7. Fault Tolerance & Resilience
Distributed systems will fail. Your design must degrade gracefully.
- Circuit Breaker: Monitors downstream failure rates. If failures exceed a threshold, it trips to Open, immediately failing requests without calling the downstream service, letting it recover.
- Retries with Exponential Backoff & Jitter: Prevents a "thundering herd" by increasing delays exponentially and adding random noise (jitter) to the retry times.
- Bulkhead: Segregates resources (like thread pools or host processes) so that a failure in one subsystem does not consume all system resources.
Mark this page when you finish learning it.
Last updated on
Spotted something unclear or wrong on this page?