AWS Architecture Design

Production-Grade Cloud Infrastructure

Expert, AWS, Terraform, CI/CD, Security

About This Prompt

This comprehensive architectural prompt is designed for cloud solution architects building enterprise-grade, production-ready applications on AWS. It covers everything from VPC design and security to CI/CD pipelines and cost optimization strategies.

The prompt asks for high availability, security best practices, multi-tenancy, observability, and scalability in an application targeting ~10,000 concurrent users, with a clear environment promotion strategy.

Security & Compliance

IAM least privilege, VPC security groups, secrets management, and tenant isolation

Scalability

Auto-scaling ECS services, Multi-AZ deployments, and Redis caching for high performance

Observability

Elastic/OpenSearch logging, CloudWatch metrics, and comprehensive alerting strategies

Data Layer

Aurora MySQL with multi-tenancy, ElastiCache Redis with cost-optimization techniques

The Prompt

## Prompt: Production-Grade AWS Full-Stack Architecture Design

You are a **cloud solution architect** designing a **production-ready full-stack application architecture**.

### Application Context

* **Frontend**: React (SPA), hosted as static assets
* **Backend**: Spring Boot REST APIs
* **Datastores**:
  * MySQL (Aurora MySQL / RDS)
  * Redis (ElastiCache for Redis) for caching
* **Authentication**: Firebase (Google) Identity Provider using JWT
* **Target scale**: ~10,000 concurrent users
* **Cloud provider**: AWS
* **Deployment model**: Containerized (Docker) with ECS (EC2 launch type preferred)
* **Infrastructure as Code**: Terraform
* **CI/CD**: AWS CodePipeline + CodeBuild
* **Logging & Monitoring**: Elastic/OpenSearch + CloudWatch
* **Security & Secrets**: AWS Secrets Manager, IAM least privilege
* **Quality & Security Scanning**: SonarQube, dependency and container scanning
* **CDN & Storage**: S3 + CloudFront
* **Goal**: Highly available, secure, scalable, cost-optimized system

---

### 1. Goals and Non-Goals

**Goals**

* Highly available, secure, and scalable architecture for ~10k users
* Low-latency reads using Redis caching
* Clear environment promotion flow: **Dev → QA → Prod**
* Strong observability:
  * Centralized, searchable logs in Elastic/OpenSearch
  * Metrics and alarms in CloudWatch
* Secure authentication and authorization:
  * Firebase JWT validation
  * Tenant isolation
  * Least-privilege IAM
* Infrastructure fully managed using Terraform
* Cost optimization, with special focus on Redis and networking

**Non-Goals**

* Deep application-level code or domain modeling
* Vendor comparison or selection for Elastic/OpenSearch

---

### 2. Target AWS Architecture (Logical)

Design a logical architecture including:

* Optional Route 53 → CloudFront
* CloudFront:
  * Serves React static assets from S3
  * Forwards API traffic to ALB
* Application Load Balancer (ALB):
  * Routes traffic to ECS services (or to an EC2 Auto Scaling group if ECS is not used)
* ECS (EC2 launch type):
  * Spring Boot containers
  * Horizontal autoscaling based on CPU, request count, and latency
* Database:
  * Aurora MySQL / RDS MySQL (Multi-AZ, backups, optional read replicas)
* Cache:
  * ElastiCache Redis
  * Cluster mode on/off based on throughput
  * Multi-AZ replication group
* Logging:
  * Application logs → FireLens / Fluent Bit → Elastic/OpenSearch
  * Metrics and alarms → CloudWatch
* Secrets:
  * AWS Secrets Manager for DB, Redis, Firebase config
* CI/CD:
  * CodePipeline + CodeBuild → ECR → ECS deployment
* Security scanning:
  * SonarQube quality gates
  * Dependency and container scanning

---

### 3. Network and Security (VPC Design)

Design a **per-environment VPC** with:

* VPC CIDR (e.g., /16)
* Public subnets (2–3 AZs):
  * ALB
  * NAT Gateway (HA or cost-optimized per env)
* Private application subnets (2–3 AZs):
  * ECS tasks / EC2 instances
* Private data subnets (2–3 AZs):
  * RDS MySQL
  * ElastiCache Redis
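
The subnet layout above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, assuming a `10.0.0.0/16` VPC, three AZs, and `/20` subnets. The CIDR, the AZ labels, the prefix length, and the `plan_subnets` helper are all hypothetical choices, not values from the design.

```python
import ipaddress

def plan_subnets(vpc_cidr, azs, tiers=("public", "app", "data"), new_prefix=20):
    """Carve one equal-sized subnet per (tier, AZ) pair out of the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of non-overlapping blocks
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[(tier, az)] = next(blocks)
    return plan

# Assumed example values: a /16 VPC split across three AZs.
plan = plan_subnets("10.0.0.0/16", ["a", "b", "c"])
for (tier, az), net in plan.items():
    print(f"{tier:<6} az-{az}  {net}")
```

Nine of the sixteen available `/20` blocks are used, which leaves room for additional tiers or AZs later.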

**Security Groups**

* ALB SG: inbound 443 from CloudFront / internet, outbound to app SG
* App SG: inbound app port from ALB SG, outbound to DB, Redis, Elastic, AWS APIs
* RDS SG: inbound 3306 from app SG only
* Redis SG: inbound 6379 from app SG only

**VPC Endpoints**

* S3 (Gateway)
* ECR (api + dkr)
* CloudWatch Logs
* Secrets Manager
* SSM
* STS

These endpoints are used to reduce NAT cost and improve security.

---

### 4. Authentication & Authorization (Firebase / Google IdP)

Design the auth flow:

* User authenticates via Firebase (Google / email)
* Client receives Firebase ID token (JWT)
* Client calls backend with `Authorization: Bearer <token>`
* Spring Boot:
  * Validates JWT signature using Firebase JWKS
  * Verifies issuer, audience, expiry
  * Extracts custom claims (tenant, roles)
  * Builds Spring Security principal

**Authorization**

* RBAC using roles and permissions
* Tenant isolation enforced at API and DB layers

**Custom Claims Strategy**

* tenant_id
* roles[]
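
The claim checks and principal construction described above might look roughly like the following. It is a language-neutral sketch in Python rather than Spring Security code, and it assumes the token signature has already been verified by a JWT library. The `Principal` type and `principal_from_claims` function are hypothetical names. The issuer and audience formats follow Firebase's documented ID token conventions.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Principal:
    user_id: str
    tenant_id: str
    roles: Tuple[str, ...]

def principal_from_claims(claims: dict, project_id: str, now: Optional[float] = None) -> Principal:
    """Validate standard claims on an already signature-verified token, then build a principal."""
    now = time.time() if now is None else now
    if claims.get("iss") != f"https://securetoken.google.com/{project_id}":
        raise ValueError("unexpected issuer")
    if claims.get("aud") != project_id:
        raise ValueError("unexpected audience")
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    tenant_id = claims.get("tenant_id")  # custom claim from the strategy above
    if not tenant_id:
        raise ValueError("missing tenant_id claim")
    return Principal(
        user_id=claims["sub"],
        tenant_id=tenant_id,
        roles=tuple(claims.get("roles", [])),  # custom claim; defaults to no roles
    )
```

Rejecting tokens that lack `tenant_id` keeps tenant isolation from silently falling back to an unscoped request.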

---

### 5. Application Runtime on ECS

* Dockerized Spring Boot images stored in ECR
* Separate ECS services per environment:
  * app-dev
  * app-qa
  * app-prod
* Autoscaling:
  * CPU (50–60%)
  * ALB RequestCountPerTarget
  * Optional custom latency metric
* Health checks:
  * /actuator/health endpoint
  * Include DB/Redis dependency checks only where the service cannot degrade gracefully without that dependency
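
To make the CPU target concrete, the proportional rule behind target-tracking scaling can be approximated as follows. This is a simplified sketch: the managed scaling policy also applies cooldowns and scales in more conservatively. The function name and the task limits are assumptions.

```python
import math

def desired_task_count(current_tasks, metric_value, target_value, min_tasks, max_tasks):
    """Scale the task count in proportion to how far the metric is from its target."""
    raw = math.ceil(current_tasks * metric_value / target_value)
    return max(min_tasks, min(max_tasks, raw))

# 4 tasks averaging 82% CPU against a 55% target, clamped to the range 2..20.
print(desired_task_count(4, 82.0, 55.0, 2, 20))  # prints 6
```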

---

### 6. MySQL and Multi-Tenant Enforcement

* Aurora MySQL / RDS MySQL (Multi-AZ)
* Automated backups and point-in-time recovery (PITR)
* Optional read replicas
* Parameter group tuning

**Multi-Tenant Model**

* Shared database
* tenant_id column in all business tables
* Composite indexes (tenant_id, business_key)
* Enforcement via:
  * Spring Security context
  * Repository-level filters
  * DB constraints where applicable
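
A repository-level tenant filter can be illustrated with a small, self-contained sketch. It uses SQLite from the Python standard library as a stand-in for MySQL, and the `orders` table and `OrderRepository` class are hypothetical. The point is that the tenant comes from the authenticated context and is applied to every query, never taken from request input.

```python
import sqlite3

class OrderRepository:
    """Binds a tenant at construction time so every query is tenant-scoped."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id  # taken from the security context, not the request body

    def find_by_key(self, business_key):
        return self._conn.execute(
            "SELECT tenant_id, business_key, total FROM orders "
            "WHERE tenant_id = ? AND business_key = ?",
            (self._tenant_id, business_key),
        ).fetchone()

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint doubles as the composite (tenant_id, business_key) index.
conn.execute(
    "CREATE TABLE orders ("
    "tenant_id TEXT NOT NULL, business_key TEXT NOT NULL, total INTEGER NOT NULL, "
    "UNIQUE (tenant_id, business_key))"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("t1", "A-1", 100), ("t2", "A-1", 250)],
)

print(OrderRepository(conn, "t1").find_by_key("A-1"))
print(OrderRepository(conn, "t2").find_by_key("A-1"))
```

Two tenants can hold the same business key without collision, and neither repository instance can read the other tenant's row.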

---

### 7. Redis Caching Design

**What to Cache**

* Read-heavy entities (profiles, configs, permissions)
* Derived/computed views
* Avoid identity/session caching (Firebase handles identity)

**Caching Pattern**

* Cache-aside
* TTL-based invalidation
* Write-through only if justified

**Key Strategy**

* t:{tenantId}:entity:{id}

**TTL**

* Hot data: 30s–5m
* Reference data: 1h–24h
* Add TTL jitter to avoid stampedes
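
The cache-aside pattern, key strategy, and TTL jitter can be combined in a short sketch. An in-memory `TtlCache` stands in for Redis here so the example runs without a server. The helper names, the 10% jitter fraction, and reading `entity` in the key template as the entity type are assumptions.

```python
import random
import time

class TtlCache:
    """In-memory stand-in for Redis GET and SET with expiry."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily expire stale entries
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

def cache_key(tenant_id, entity, entity_id):
    """Tenant-prefixed key, so one tenant's entries never collide with another's."""
    return f"t:{tenant_id}:{entity}:{entity_id}"

def jittered_ttl(base_seconds, jitter_fraction=0.1):
    """Spread expiry times so keys written together do not all expire together."""
    spread = base_seconds * jitter_fraction
    return base_seconds + random.uniform(-spread, spread)

def get_profile(cache, load_from_db, tenant_id, profile_id, base_ttl=60.0):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = cache_key(tenant_id, "profile", profile_id)
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(tenant_id, profile_id)
    cache.set(key, value, jittered_ttl(base_ttl))
    return value
```

With a real Redis client, `TtlCache.get` and `TtlCache.set` would map onto the client's get and set-with-expiry calls.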

**HA**

* Multi-AZ replication group
* Snapshots only if the cached data is critical or expensive to rebuild

---

### 8. Cost-Optimized Redis Techniques

Include practical strategies such as:

* Separate caches by workload
* Right-size nodes
* TTL + eviction policy alignment
* Payload compression for large objects
* Memory-efficient DTOs
* Avoid unbounded lists
* Dev/QA cost controls:
  * Smaller nodes
  * No Multi-AZ in dev
  * Scheduled scaling
* Use VPC endpoints to reduce NAT costs
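
Payload compression for large objects could be sketched as below. The one-byte marker scheme, the 1 KiB threshold, and the `encode` and `decode` helpers are illustrative assumptions. Small values are stored uncompressed because compression overhead can outweigh the savings.

```python
import json
import zlib

RAW, ZLIB = b"\x00", b"\x01"  # one-byte marker recording how the body was stored

def encode(value, threshold_bytes=1024):
    """Serialize to compact JSON and compress only when the payload is large."""
    raw = json.dumps(value, separators=(",", ":")).encode("utf-8")
    if len(raw) < threshold_bytes:
        return RAW + raw
    return ZLIB + zlib.compress(raw)

def decode(blob):
    """Reverse encode(), using the marker byte to decide whether to decompress."""
    marker, body = blob[:1], blob[1:]
    if marker == ZLIB:
        body = zlib.decompress(body)
    return json.loads(body.decode("utf-8"))

small = {"id": 1}
large = {"items": ["x" * 50] * 200}
print(len(json.dumps(large)), len(encode(large)))  # raw size vs stored size
```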

---

### 9. Logging and Observability

**Logging**

* ECS → Fluent Bit → Elastic/OpenSearch
* Structured JSON logs
* Required fields:
  * timestamp, level, service, env, version
  * trace_id, span_id, tenant_id, user_id
  * request_path, status, latency_ms
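
As an illustration of the required fields, a structured JSON log line could be produced like this. The sketch uses Python's standard `logging` module rather than a Spring Boot logging stack, and the static `service`, `env`, and `version` values are placeholders.

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    # Placeholder values; in practice these would come from deployment config.
    STATIC = {"service": "app", "env": "dev", "version": "0.0.0"}
    CONTEXT_FIELDS = (
        "trace_id", "span_id", "tenant_id", "user_id",
        "request_path", "status", "latency_ms",
    )

    def format(self, record):
        doc = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            **self.STATIC,
            "message": record.getMessage(),
        }
        # Context fields arrive through logging's `extra` argument; missing ones become null.
        for field in self.CONTEXT_FIELDS:
            doc[field] = getattr(record, field, None)
        return json.dumps(doc)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app.request")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "request handled",
    extra={"tenant_id": "t1", "request_path": "/orders", "status": 200, "latency_ms": 12},
)
```

Emitting every field on every line, even when null, keeps the index mapping in Elastic/OpenSearch stable.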

**Metrics & Alarms**

* ALB: 4XX/5XX, latency, healthy hosts
* ECS: CPU, memory, task count, restarts
* RDS: CPU, connections, slow queries
* Redis: memory, evictions, hit rate, replication lag

---

### 10. CI/CD: Dev → QA → Prod

**Pipeline Stages**

* Source
* Build + unit tests
* SonarQube scan + quality gate
* Build Docker image + tagging (git_sha, env-latest)
* Push to ECR
* Deploy to Dev
* Smoke tests
* Manual approval
* Deploy to QA
* Performance tests (optional)
* Manual approval
* Deploy to Prod
* Post-deploy smoke tests

**Deployment Strategy**

* Blue/Green with CodeDeploy for Prod
* Rolling updates for Dev/QA
* Automated rollback on:
  * Health check failures
  * Latency spikes
  * Error rate thresholds
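
The rollback triggers can be expressed as a small decision function. In a real pipeline these conditions would more likely be CloudWatch alarms attached to the deployment; the sketch only makes the logic explicit. The thresholds and type names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RollbackPolicy:
    # Hypothetical thresholds; tune them per service and environment.
    max_failed_health_checks: int = 0
    max_p95_latency_ms: float = 500.0
    max_error_rate: float = 0.01

@dataclass(frozen=True)
class DeploymentSample:
    failed_health_checks: int
    p95_latency_ms: float
    error_rate: float

def rollback_reasons(sample, policy):
    """Return every breached condition; an empty list means the deployment may proceed."""
    reasons = []
    if sample.failed_health_checks > policy.max_failed_health_checks:
        reasons.append("health check failures")
    if sample.p95_latency_ms > policy.max_p95_latency_ms:
        reasons.append("latency spike")
    if sample.error_rate > policy.max_error_rate:
        reasons.append("error rate threshold")
    return reasons

print(rollback_reasons(DeploymentSample(0, 820.0, 0.002), RollbackPolicy()))
```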

---

### 11. Secrets and Configuration

* Secrets stored in AWS Secrets Manager:
  * DB credentials
  * Redis auth
  * App secrets
* ECS task roles scoped per environment
* Fetch secrets at startup and cache in memory
* Non-secret config via SSM Parameter Store
* Separate namespaces per environment
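
Fetching secrets at startup and caching them in memory might be sketched as follows. The fetch function is injected so the example runs without AWS access. With boto3 it could wrap the Secrets Manager client's `get_secret_value(SecretId=...)` call and return the `SecretString` field. The `SecretCache` class and the `{env}/app/{name}` naming scheme are assumptions.

```python
from typing import Callable, Dict, Iterable

class SecretCache:
    """Loads each secret once per process and serves later reads from memory."""

    def __init__(self, env: str, fetch: Callable[[str], str]):
        self._env = env
        self._fetch = fetch
        self._values: Dict[str, str] = {}

    def secret_id(self, name: str) -> str:
        # Per-environment namespace, so dev code cannot address prod secrets by accident.
        return f"{self._env}/app/{name}"

    def get(self, name: str) -> str:
        if name not in self._values:
            self._values[name] = self._fetch(self.secret_id(name))
        return self._values[name]

    def preload(self, names: Iterable[str]) -> None:
        """Call at startup so a missing secret fails fast rather than on first use."""
        for name in names:
            self.get(name)

# Fake backend standing in for Secrets Manager.
fake_store = {"dev/app/db": "db-password", "dev/app/redis": "redis-token"}
secrets = SecretCache("dev", fake_store.__getitem__)
secrets.preload(["db", "redis"])
print(secrets.secret_id("db"))
```

A process-lifetime cache does not pick up rotated secrets until restart, so rotation schedules and task restarts need to be coordinated.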

---

### 12. Terraform Structure

Design Terraform modules:

modules/
  vpc/
  alb/
  ecs_cluster/
  ecs_service/
  rds_mysql/
  elasticache_redis/
  s3_cloudfront/
  iam/
  codepipeline/
  monitoring/

envs/
  dev/
  qa/
  prod/

* Remote state in S3
* DynamoDB lock table
* Separate state per environment

---

### 13. Performance Considerations (10k Users)

* Connection pooling (HikariCP)
* Tenant-aware DB indexing
* Aggressive Redis usage for hot reads
* Pagination everywhere
* Autoscaling based on request count / latency
* Monitor slow queries and scale DB read replicas if needed
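
A rough capacity estimate can ground the autoscaling and pooling settings. The sketch below applies Little's law (in-flight requests = arrival rate × mean latency). Every input is an assumed figure for illustration, not a measurement: one request per user every 10 seconds, 100 ms mean latency, 200 requests per second per task, and 30% headroom.

```python
import math

def size_for_load(concurrent_users, seconds_between_requests, mean_latency_s,
                  rps_per_task, headroom=0.3):
    """Estimate arrival rate, concurrent in-flight requests, and task count."""
    arrival_rps = concurrent_users / seconds_between_requests
    in_flight = arrival_rps * mean_latency_s  # Little's law: L = lambda * W
    tasks = math.ceil(arrival_rps * (1 + headroom) / rps_per_task)
    return arrival_rps, in_flight, tasks

arrival_rps, in_flight, tasks = size_for_load(10_000, 10, 0.1, 200)
print(f"{arrival_rps:.0f} req/s, ~{in_flight:.0f} in flight, {tasks} tasks")
```

The in-flight figure is also a starting point for total database connections across all tasks, since each in-flight request holds at most one pooled connection at a time.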

---

### 14. Environment Strategy

* Prefer separate AWS accounts per environment
* Or isolated VPCs per environment
* Never share prod data/secrets
* Use sanitized data in QA

---

### 15. Final Deliverables

Produce:

* Architecture diagrams (logical + network)
* Terraform module structure
* Security model
* CI/CD flow
* Redis cost optimization plan
* Logging & monitoring standards
* Environment promotion strategy

What This Prompt Delivers

Complete AWS Architecture

Full logical and network architecture diagrams with VPC design, subnets, security groups, and service interconnections

Terraform Infrastructure as Code

Modular Terraform structure with separate environments, remote state management, and DynamoDB locking

CI/CD Pipeline Design

Complete deployment pipeline with quality gates, security scanning, environment promotion, and automated rollback strategies

Security & Compliance Model

Firebase authentication integration, IAM roles, tenant isolation, secrets management, and security best practices

Cost Optimization Strategy

Redis cost optimization techniques, VPC endpoint usage, NAT Gateway strategies, and environment-specific sizing

Observability Framework

Centralized logging with Elastic/OpenSearch, CloudWatch metrics, structured log standards, and comprehensive alarming

Tips for Using This Prompt

1. Customize the scale parameters (10k users) based on your actual expected load and growth projections

2. Adjust the authentication provider if not using Firebase: specify your IdP (Auth0, Cognito, Okta)

3. Specify compliance requirements (HIPAA, PCI DSS, SOC 2) if applicable to get tailored security recommendations

4. Modify the tech stack (Spring Boot, React) to match your actual application framework choices

5. Request specific AWS service alternatives if preferred (e.g., Fargate instead of EC2 launch type for ECS)