AWS compute patterns — ECS, EKS & Lambda

Figure: AWS compute decision path comparing Lambda, ECS Fargate, ECS on EC2, and EKS by workload shape, VPC data access, IAM, latency, capacity, and operational complexity.

Core details

Problem this solves: choose the runtime that fits workload shape, latency, cost, operability, and team maturity instead of defaulting to the service you last used.

Choice	Sweet spot	Watch
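Lambda	event-driven, bursty, short executions	cold starts, 15-minute timeout cap, VPC ENI setup latency (largely mitigated now that ENIs are created at configuration time, not per cold start)
ECS Fargate	containers without managing nodes	per-task vCPU/memory pricing, platform version differences
ECS EC2	custom AMIs, GPUs, lower cost at steady high load	you own capacity: patching, scaling, and bin-packing instances
EKS	Kubernetes as the standard, multi-team platform	per-cluster control plane fee, operational complexity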

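Keep RDS and other data stores in private subnets. Tasks and functions that run in the VPC need security groups that allow them to reach the database, plus a NAT gateway or VPC endpoints for outbound calls, depending on the design.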

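ALB → target group → ECS service or EKS Ingress. Point health checks at an endpoint that reflects application readiness, so the load balancer routes only to tasks or pods that can actually serve traffic.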
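One way to align a health check with readiness is to answer 200 only when every dependency the app needs is usable. The sketch below is a minimal, framework-free version of that idea; the function name `readiness_response` and the check names are hypothetical, and it assumes the target group treats 200 as healthy (the ALB default success code).

```python
def readiness_response(checks):
    """Return (status_code, body) for a readiness endpoint.

    `checks` maps a dependency name to a zero-argument callable that
    returns True when that dependency is usable. The names and the
    response shape are illustrative assumptions, not an AWS API.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            ok = False
        if not ok:
            failed.append(name)

    if failed:
        # 503 tells the load balancer to stop routing to this target.
        return 503, {"status": "unavailable", "failed": sorted(failed)}
    return 200, {"status": "ready", "failed": []}
```

Design caveat: if every task checks the same shared database, a brief database blip can mark all targets unhealthy at once, so keep the checks cheap and limited to what the task truly cannot serve without.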

Understanding

Lambda suits transforms and orchestration glue. Long-lived HTTP APIs often fit ECS or EKS better, because a long-running process can hold connection pools and deliver predictable latency. Interviews reward honest boundaries.

Decision lens

Workload signal	First choice to consider	Why
Spiky event handler, seconds long	Lambda	pay per use, simple scaling
Long-lived HTTP API with pools	ECS Fargate	container runtime, simpler ops than Kubernetes
Multi-team platform already on Kubernetes	EKS	shared Kubernetes abstractions and ecosystem
High steady utilization and custom hosts	ECS EC2 or EKS managed nodes	better cost control, more node control
Queue worker with predictable concurrency	ECS/Fargate or Lambda	choose by run duration, cold-start tolerance, and dependency pooling

Tie the choice to the bottleneck. Lambda can be excellent for bursty glue but awkward for connection-heavy workloads. Kubernetes can be powerful for platform teams but excessive for one service with simple scaling.
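The decision lens can be read as a small rule set. The sketch below encodes one possible ordering of those rules; the `Workload` fields, the rule order, and the returned labels are assumptions made for illustration, not an official decision procedure. The only hard number is Lambda's 15-minute timeout.

```python
from dataclasses import dataclass

LAMBDA_MAX_SECONDS = 15 * 60  # Lambda's maximum timeout


@dataclass(frozen=True)
class Workload:
    # All fields are hypothetical signals chosen for this sketch.
    max_runtime_seconds: int
    bursty: bool
    holds_connection_pool: bool
    needs_custom_hosts: bool        # custom AMIs, GPUs
    steady_high_utilization: bool
    org_runs_kubernetes: bool


def first_choice(w: Workload) -> str:
    """Return the first option to consider, not a final verdict."""
    if w.org_runs_kubernetes:
        return "EKS"
    if w.needs_custom_hosts or w.steady_high_utilization:
        return "ECS on EC2"
    if (
        w.bursty
        and not w.holds_connection_pool
        and w.max_runtime_seconds <= LAMBDA_MAX_SECONDS
    ):
        return "Lambda"
    # Default: containers without node management.
    return "ECS Fargate"
```

For example, a bursty handler that runs for a few seconds and holds no pool lands on Lambda, while the same handler with a one-hour runtime falls through to ECS Fargate.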

Senior understanding

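Provisioned concurrency (Lambda) and always-on tasks (ECS/EKS) trade money for lower tail latency. Give each task or function its own least-privilege IAM role. Load secrets from Secrets Manager or SSM Parameter Store at runtime, rather than passing them as environment variables that can leak into CI logs.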
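As an illustration of least privilege per task role, the sketch below builds an IAM policy document that lets one service read only its own secrets. The helper name `secret_read_policy` and the example ARN are hypothetical; the JSON shape (`Version`, `Statement`, `Effect`, `Action`, `Resource`) and the `secretsmanager:GetSecretValue` action are standard IAM.

```python
import json


def secret_read_policy(secret_arns):
    """Build a policy document that allows reading only the listed secrets.

    Rejects '*' in any ARN, since a wildcard resource is exactly the
    broad grant this pattern is meant to avoid.
    """
    if not secret_arns:
        raise ValueError("at least one secret ARN is required")
    if any("*" in arn for arn in secret_arns):
        raise ValueError("wildcard ARNs defeat least privilege")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOwnSecretsOnly",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": sorted(secret_arns),
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical ARN using the AWS documentation example account ID.
    arn = "arn:aws:secretsmanager:us-east-1:111122223333:secret:orders-api/db-AbCdEf"
    print(json.dumps(secret_read_policy([arn]), indent=2))
```

Attach a document like this to the task role (what the application may do), not the task execution role (what ECS needs to pull images and write logs), unless the secret is injected by ECS at container start.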

Probe	Strong answer
"Lambda cold starts?"	quantify p95/p99 impact; use provisioned concurrency only for paths that need it
"ECS or EKS?"	ECS for simpler container ops; EKS when Kubernetes platform value outweighs complexity
"Private RDS access?"	VPC subnets, security groups, DNS, pool sizing, NAT/endpoints for outbound dependencies
"Cost control?"	right-size tasks, autoscaling policy, scheduled capacity, spot where safe, reserved baseline

Failure modes

Choosing Lambda for a connection-heavy API and exhausting DB connections during bursts.
Putting tasks in private subnets without a NAT gateway or VPC endpoints for required outbound calls.
Reusing broad IAM roles across services, turning one service compromise into account-wide access.
Scaling workers faster than the downstream database, queue, or third-party API can absorb.
Treating EKS as "managed everything" while still owning upgrades, ingress, policy, and cluster add-ons.
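The first and fourth failure modes come down to arithmetic: concurrent workers times connections per worker must stay under what the database can accept. A minimal sketch, with hypothetical parameter names and numbers:

```python
def max_safe_concurrency(db_max_connections, reserved_for_other_clients,
                         connections_per_instance):
    """Largest number of concurrent workers the database can absorb.

    db_max_connections:         connection limit configured on the database
    reserved_for_other_clients: headroom for admin tools, other services
    connections_per_instance:   pool size per Lambda environment or ECS task
    """
    if connections_per_instance <= 0:
        raise ValueError("connections_per_instance must be positive")
    available = db_max_connections - reserved_for_other_clients
    # Integer division: a partial worker cannot hold a partial pool.
    return max(0, available // connections_per_instance)
```

The result is a candidate cap, for example a Lambda reserved concurrency value or an ECS service maximum task count. With a limit of 200 connections, 50 reserved, and a pool of 5 per task, the cap is 30 tasks.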

Interview drill

Question: "You need to run a Node.js API, a thumbnail worker, and a nightly reconciliation job. What AWS compute choices do you make?"

Model answer structure:

Node API: ECS Fargate behind ALB if the team does not need Kubernetes; EKS only if the platform already exists.
Thumbnail worker: Lambda for small bounded images; ECS worker if processing can exceed Lambda limits or needs native tooling.
Nightly job: scheduled ECS task or EventBridge-triggered Lambda depending on duration and dependencies.
Shared controls: per-task and per-function IAM roles, Secrets Manager, VPC routing, logs/metrics/traces, autoscaling guardrails.
Cost and reliability: scale by queue depth, cap concurrency to protect dependencies, and measure tail latency/cold starts.
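"Scale by queue depth, cap concurrency" can be sketched as a backlog-per-task calculation clamped between a floor and a ceiling. The function and parameter names below are hypothetical; in practice the same idea is usually expressed as a target-tracking policy on a backlog-per-task metric rather than hand-written code.

```python
import math


def desired_task_count(visible_messages, target_backlog_per_task,
                       min_tasks, max_tasks):
    """Worker count needed to keep backlog per task near the target.

    max_tasks is the guardrail that protects downstream dependencies
    (database, third-party APIs) from being overrun during a burst.
    """
    if target_backlog_per_task <= 0:
        raise ValueError("target_backlog_per_task must be positive")
    if min_tasks > max_tasks:
        raise ValueError("min_tasks must not exceed max_tasks")
    wanted = math.ceil(visible_messages / target_backlog_per_task)
    return max(min_tasks, min(max_tasks, wanted))
```

With a target of 100 messages per task, a floor of 1, and a ceiling of 20, a backlog of 950 asks for 10 tasks, while a backlog of 10,000 is capped at 20.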

Follow-ups to expect:

"How do you avoid DB connection storms?"
"What if the worker needs GPUs?"
"How do you deploy safely across regions?"