Blob / Object Storage
What it is
Object storage (S3, GCS, Azure Blob, MinIO) stores blobs addressed by bucket + key over HTTP(S); objects are normally replaced whole rather than edited in place. It is optimized for large sequential reads and writes, durability (replication, erasure coding), and low cost per GB, not for POSIX filesystem semantics.
Core concepts
- Bucket: namespace and policy boundary.
- Object key: logical path string (implementation may shard by prefix).
- Multipart upload: split a large object into parts uploaded in parallel; the complete call assembles the object from the list of part numbers and ETags, and a failed part can be retried on its own. Essential for large uploads and unreliable networks.
- Lifecycle rules: transition to infrequent / archive tiers; expire old versions or incomplete multipart uploads after abandonment.
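The multipart concept above starts with deciding how to cut the object into parts. The following is a minimal sketch of that planning step, not any SDK's API: `plan_parts` is a hypothetical helper, and the 5 MiB minimum part size and 10,000-part cap are S3's documented limits, assumed here; other providers use different values.

```python
MIN_PART = 5 * 1024 * 1024   # assumed: S3's minimum part size (the last part may be smaller)
MAX_PARTS = 10_000           # assumed: S3's maximum number of parts per upload


def plan_parts(object_size: int, part_size: int = 8 * 1024 * 1024):
    """Return a list of (part_number, offset, length) tuples; part numbers start at 1."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size when needed so the part count stays within MAX_PARTS.
    min_for_count = -(-object_size // MAX_PARTS)  # integer ceiling division
    part_size = max(part_size, MIN_PART, min_for_count)

    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

A client would upload each planned byte range under its part number and record the ETag returned for it; on resume, only the part numbers with no recorded ETag need to be sent again before the complete call.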
Client -- initiate multipart --> get uploadId
Client -- upload part N --> store part
Client -- complete --> assembled object visible
When to use
- User-generated content: images, video, PDFs, backups.
- Data lake files (Parquet, CSV) for analytics.
- Static assets served behind a reverse proxy or CDN.
Alternatives
- Block storage (EBS): attached volumes for databases—not shared blob semantics.
- Shared file storage (NFS): POSIX semantics; harder to operate at cloud scale for internet-facing blob workloads.
Failure modes
- Public bucket misconfiguration: data leak; use bucket policies, block public access defaults.
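- Incomplete multipart uploads: abandoned parts keep consuming billed space; add a lifecycle rule that aborts them after a set number of days.
- LIST-heavy workloads over huge prefixes: slow and expensive; use the provider's inventory report or a metadata DB as the catalog.
- Hot-prefix throttling on some providers: spread load by distributing keys across prefixes, for example with a short hash prefix.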
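One common way to distribute key prefixes is to prepend a short hash-derived shard to each key. The sketch below assumes a provider that partitions load by key prefix; `sharded_key` and the shard count of 16 are hypothetical choices for illustration, not a provider requirement.

```python
import hashlib


def sharded_key(key: str, shards: int = 16) -> str:
    """Prefix the key with a stable shard id derived from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    shard = int(digest[:8], 16) % shards
    # The same key always maps to the same shard, so reads can recompute the prefix.
    return f"{shard:02x}/{key}"


if __name__ == "__main__":
    # Hypothetical keys that would otherwise share one date-based prefix.
    for name in ("a.json", "b.json", "c.json"):
        print(sharded_key(f"logs/2024-01-01/{name}"))
```

The trade-off is that keys no longer list in their natural order under a single prefix, which is one more reason to keep the catalog in an inventory report or metadata DB rather than relying on LIST.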
Interview talking points
- Presigned URLs: the client uploads or downloads directly to object storage, bypassing app servers; the API gateway or app tier only authenticates the caller and issues the URL.
- Durability vs latency: time to first byte is higher than on local SSD or block storage; design for large sequential access patterns.
- Estimate storage and egress cost back-of-envelope; egress often dominates the bill for globally distributed users, so pair with a CDN.
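The back-of-envelope estimate in the last point can be sketched as simple arithmetic. The per-GB prices below are placeholder assumptions chosen only for illustration, not quotes from any provider, and `monthly_cost` is a hypothetical helper; request charges and CDN delivery fees are left out.

```python
def monthly_cost(stored_gb: float, egress_gb: float,
                 storage_price_per_gb: float = 0.023,   # assumed placeholder price
                 egress_price_per_gb: float = 0.09):    # assumed placeholder price
    """Rough monthly bill split into storage and egress components (USD)."""
    storage = stored_gb * storage_price_per_gb
    egress = egress_gb * egress_price_per_gb
    return {"storage": storage, "egress": egress, "total": storage + egress}


if __name__ == "__main__":
    # Hypothetical workload: 10 TB stored, 50 TB served to users per month.
    cost = monthly_cost(stored_gb=10_000, egress_gb=50_000)
    print({name: round(value, 2) for name, value in cost.items()})
```

With these assumed numbers, storage comes to about $230 a month against about $4,500 for egress, which is the kind of ratio that motivates serving reads through a CDN.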