# Infrastructure

Terraform IaC, Docker, and CI/CD for the monolith. All services run on AWS ECS Fargate.

## Directory Layout

```
infrastructure/
├── env/                          # Per-environment Terraform configs
│   ├── dev/
│   ├── stg/
│   ├── prd/
│   ├── testslot1/
│   └── testslot2/
└── modules/                      # Reusable Terraform modules
    ├── services/                 # Service-specific (one per NODE_TYPE)
    │   ├── app/
    │   ├── backoffice/
    │   ├── public-api/
    │   ├── super-admin/
    │   ├── grpc/
    │   ├── worker/
    │   ├── event-handler/
    │   └── monolith-cdc/
    ├── rest_server/              # Base module for REST services
    ├── grpc_server/              # Base module for gRPC services
    ├── worker_server/            # Base module for worker services
    ├── cross_services/           # Shared resources (S3 notifications, SNS, SQS, Lambda)
    ├── messages/                 # SNS topics
    ├── sns_sqs_service/          # SQS queues, DLQs, EventBridge schedulers
    ├── debezium_connector/       # MSK Connect for CDC
    ├── transcoding/              # MediaConvert, SQS, EventBridge
    └── datadog/                  # APM and logging integration
```

## Environments

| Environment | AWS Account | Trigger |
|-------------|-------------|---------|
| dev | 923929101992 | Push to `develop` branch |
| stg | 302630094508 | Manual dispatch |
| prd | 887841176879 | GitHub release |
| testslot1/2 | - | Manual dispatch |

Migrations run automatically on dev and stg deploys. Production migrations are manual.

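
The migration rule above can be expressed as a small lookup, shown here as a minimal sketch. `migration_mode` is a hypothetical helper name and is not part of the workflows; the section does not state the behavior for `testslot1`/`testslot2`, so the sketch reports those as unspecified rather than guessing.

```bash
# Hypothetical helper: maps an environment name to its migration policy
# as stated in the Environments section. Not taken from the real workflows.
migration_mode() {
  case "$1" in
    dev|stg) echo "automatic" ;;    # migrations run on every deploy
    prd)     echo "manual" ;;       # production migrations are run by hand
    *)       echo "unspecified" ;;  # e.g. testslot1/2: not documented here
  esac
}

migration_mode stg   # prints: automatic
```
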
## AWS Resources Managed

| Module | Resources |
|--------|-----------|
| `services/*` | ECS Fargate task definitions, services, autoscaling, IAM roles, CloudWatch logs |
| `rest_server` | ECS cluster, load balancer integration, Datadog sidecar, S3 config |
| `worker_server` | ECS service with SQS-based autoscaling (queue depth) |
| `cross_services` | S3 bucket notifications, SNS topics, SQS queues, Lambda triggers |
| `sns_sqs_service` | SQS queues, DLQs, EventBridge schedulers, IAM |
| `debezium_connector` | MSK Connect connector, security groups, IAM |
| `transcoding` | MediaConvert roles, SQS queues, EventBridge rules |
| `datadog` | Agent config, CloudWatch log subscriptions |

## Environment Config Pattern

Each environment folder (`env/{name}/`) contains:

- `main.tf`: Module instantiations with env-specific values (VPC IDs, subnets, cluster names, scaling config, Fargate Spot ratios)
- `versions.tf`: Terraform and provider versions
- `outputs.tf`: Output values
- `ssm.tf`: AWS SSM Parameter Store resources

## CI/CD Pipeline

GitHub Actions workflows in `.github/workflows/`:

- `dev.yml`, `stg.yml`, `prd.yml`, `testslot1.yml`, `testslot2.yml`: per-environment triggers
- `deployment-apply.yml`: reusable workflow called by all of the above (runs `terraform apply` on the full env, so it picks up new service modules automatically)
- `patch.yml`: manual surgical deploy of a single ECS service (or `all`). **Has a hard-coded `service` enum** that must be updated whenever a new ECS service is added (see "Adding a New ECS Service" below).

Deployment flow:

1. **ImageUpload**: Build ARM64 Docker image → push to ECR (skips if image already exists)
2. **Deploy**: VPN connect → Terraform init → Terraform apply with `docker_image` and `image_tag` vars

Auth: AWS OIDC. Terraform version: 1.12.2.

## Adding a New ECS Service

When you create a new service module under `modules/services/{name}/`, the following places must be kept in sync.
Steps 1, 2, and 4 are infra; step 3 is CI/CD glue that's easy to forget because it lives outside this directory.

1. **Service module** at `modules/services/{name}/`: `main.tf`, `variables.tf`, `iam.tf`, `lb.tf` (for REST), `versions.tf`. The ECS service name comes from `local.service` (e.g. `humand-super-admin-api`) and ends up as both the ECS service name and the task definition family.
2. **Wire it into all 5 env files**: `env/{dev,stg,prd,testslot1,testslot2}/main.tf`. Each env passes `docker_image`, autoscaling vars, secrets overrides, etc. Without this, `terraform apply` (run by `deployment-apply.yml`) won't create the service.
3. **Update `.github/workflows/patch.yml`** in three places (keep them in sync):
   - `inputs.service.options`: add the new service name to the dropdown.
   - "Resolve services to deploy" → `all)` branch: append the new service to the JSON array.
   - "Resolve cluster": add the new service to the correct `case` branch (`public-${env}-services` for ALB-facing REST services like `humand-app`, `humand-super-admin-api`; `private-${env}-services` for `humand-grpc`, `humand-event-handler`, `humand-worker`, `monolith-cdc`). The cluster is determined by whether the service module sets `cluster_name = var.env_config.public_cluster_name` or `private_cluster_name`.
4. **Add the new node type to the `rest_server`/`worker_server`/`grpc_server` `node_type` validator** if your service uses one of those base modules and introduces a new `NODE_TYPE`.

If you skip step 3, the per-env deploys (`dev.yml`, etc.) still ship the new service via `terraform apply`, but the on-call `patch.yml` flow won't be able to redeploy that service surgically.

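
The cluster mapping described for the "Resolve cluster" step can be sketched as a shell `case` statement. This is an illustration built only from the service names listed above, not a copy of `patch.yml`; `resolve_cluster` is a hypothetical function name, and the real step may be structured differently.

```bash
# Illustrative sketch only; the actual "Resolve cluster" step may differ.
# Usage: resolve_cluster <env> <service>
resolve_cluster() {
  case "$2" in
    humand-app|humand-super-admin-api)
      # ALB-facing REST services run on the public cluster.
      printf 'public-%s-services\n' "$1"
      ;;
    humand-grpc|humand-event-handler|humand-worker|monolith-cdc)
      # Internal services run on the private cluster.
      printf 'private-%s-services\n' "$1"
      ;;
    *)
      # A service missing from both branches cannot be patched surgically.
      printf 'unknown service: %s\n' "$2" >&2
      return 1
      ;;
  esac
}

resolve_cluster dev humand-app   # prints: public-dev-services
```

A new service that is not added to one of the two branches falls through to the error case, which is the failure mode that skipping step 3 produces.
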
## Docker

- `Dockerfile`: Production multi-stage build (ARM64), uses CodeArtifact for npm auth
- `Dockerfile.dev`: Development image
- `compose.yml`: Local dev with PostgreSQL, Redis, Kafka, LocalStack

## Terraform Commands

```bash
export TERRAFORM_ROOT=infrastructure/env/dev
terraform -chdir=$TERRAFORM_ROOT init
terraform -chdir=$TERRAFORM_ROOT plan -var="docker_image=$IMAGE_NAME" -var="image_tag=$IMAGE_TAG"
terraform -chdir=$TERRAFORM_ROOT apply -var="docker_image=$IMAGE_NAME" -var="image_tag=$IMAGE_TAG"
```

## Linting and Docs

```bash
tflint --force --recursive --minimum-failure-severity=warning  # Lint Terraform
terraform fmt -recursive -check                                # Check formatting
./generate-terraform-docs.sh                                   # Generate module docs
```

## Debezium CDC: Adding a New Table

Debezium (MSK Connect) watches PostgreSQL tables and publishes changes to Kafka topics. The topic name convention is `monolith.cdc.{TableName}` (e.g. `monolith.cdc.InstanceSamlConfig`).

**Use the `add-table-to-debezium` skill.** It handles everything: updating all 5 environment Terraform files, running `terraform fmt`, deciding REPLICA IDENTITY mode (DEFAULT vs FULL), and generating the migration when FULL is needed.

**No application code changes are needed** if there is no consumer in the monolith. Consumers (CDC services extending `BaseCDCService`) live in the relevant module under `src/api/modules/{module}/business/services/{module}CDCService.ts` and are registered in `src/api/modules/changeDataCapture/business/services/mainCDCService.ts`. If you only need the Kafka topic (e.g. for an external service to consume), adding the table to Terraform is sufficient.

## Guidelines

- Always `plan` before `apply`. Understand what will be created, modified, or destroyed.
- Infrastructure changes affect real AWS resources in production. Confirm with the user before modifying any Terraform module or environment config.
- Follow the modular pattern: base modules (`rest_server`, `worker_server`, `grpc_server`) provide common infra, service modules (`services/*`) customize per NODE_TYPE.
- New services should follow the existing pattern: create a service module under `modules/services/`, instantiate it in each environment's `main.tf`.
- Keep secrets in SSM Parameter Store, never in Terraform state or code.
- Run `tflint` and `terraform fmt` before committing Terraform changes.

See also: extracted application packages under `humand-packages/` (e.g. `humand-packages/scheduled-actions/AGENTS.md`).