# Infrastructure

Terraform IaC, Docker, and CI/CD for the monolith. All services run on AWS ECS Fargate.

## Directory Layout

```
infrastructure/
├── env/                        # Per-environment Terraform configs
│   ├── dev/
│   ├── stg/
│   ├── prd/
│   ├── testslot1/
│   └── testslot2/
└── modules/                    # Reusable Terraform modules
    ├── services/               # Service-specific (one per NODE_TYPE)
    │   ├── app/
    │   ├── backoffice/
    │   ├── public-api/
    │   ├── super-admin/
    │   ├── grpc/
    │   ├── worker/
    │   ├── event-handler/
    │   └── monolith-cdc/
    ├── rest_server/            # Base module for REST services
    ├── grpc_server/            # Base module for gRPC services
    ├── worker_server/          # Base module for worker services
    ├── cross_services/         # Shared resources (S3 notifications, SNS, SQS, Lambda)
    ├── messages/               # SNS topics
    ├── sns_sqs_service/        # SQS queues, DLQs, EventBridge schedulers
    ├── debezium_connector/     # MSK Connect for CDC
    ├── transcoding/            # MediaConvert, SQS, EventBridge
    └── datadog/                # APM and logging integration
```

## Environments

| Environment | AWS Account | Trigger |
|-------------|-------------|---------|
| dev | 923929101992 | Push to `develop` branch |
| stg | 302630094508 | Manual dispatch |
| prd | 887841176879 | GitHub release |
| testslot1/2 | — | Manual dispatch |

Migrations run automatically on dev and stg deploys. Production migrations are manual.

## AWS Resources Managed

| Module | Resources |
|--------|-----------|
| `services/*` | ECS Fargate task definitions, services, autoscaling, IAM roles, CloudWatch logs |
| `rest_server` | ECS cluster, load balancer integration, Datadog sidecar, S3 config |
| `worker_server` | ECS service with SQS-based autoscaling (queue depth) |
| `cross_services` | S3 bucket notifications, SNS topics, SQS queues, Lambda triggers |
| `sns_sqs_service` | SQS queues, DLQs, EventBridge schedulers, IAM |
| `debezium_connector` | MSK Connect connector, security groups, IAM |
| `transcoding` | MediaConvert roles, SQS queues, EventBridge rules |
| `datadog` | Agent config, CloudWatch log subscriptions |

## Environment Config Pattern

Each environment folder (`env/{name}/`) contains:
- `main.tf` — Module instantiations with env-specific values (VPC IDs, subnets, cluster names, scaling config, Fargate Spot ratios)
- `versions.tf` — Terraform and provider versions
- `outputs.tf` — Output values
- `ssm.tf` — AWS SSM Parameter Store resources
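
The file list above can be checked mechanically. The following is a minimal sketch, assuming it is run from the repository root; `check_env_layout` is a hypothetical helper name, not something that exists in the repo.

```bash
# Sketch: report which of the expected files are missing from one
# environment folder. The function name is illustrative only.
check_env_layout() {
  local env_dir="$1"
  local missing=0
  local f
  for f in main.tf versions.tf outputs.tf ssm.tf; do
    if [ ! -f "$env_dir/$f" ]; then
      echo "missing: $env_dir/$f"
      missing=1
    fi
  done
  # Exit status 0 when all four files are present, 1 otherwise.
  return "$missing"
}

# Example: check every environment from the directory layout.
# for env in dev stg prd testslot1 testslot2; do
#   check_env_layout "infrastructure/env/$env"
# done
```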

## CI/CD Pipeline

GitHub Actions workflows in `.github/workflows/`:
- `dev.yml`, `stg.yml`, `prd.yml`, `testslot1.yml`, `testslot2.yml` — per-environment triggers
- `deployment-apply.yml` — reusable workflow called by all above (runs `terraform apply` on the full env, so it picks up new service modules automatically)
- `patch.yml` — manual surgical deploy of a single ECS service (or `all`). **Has a hard-coded `service` enum** that must be updated whenever a new ECS service is added — see "Adding a new ECS service" below.

Deployment flow:
1. **ImageUpload**: Build ARM64 Docker image → push to ECR (skips if image already exists)
2. **Deploy**: VPN connect → Terraform init → Terraform apply with `docker_image` and `image_tag` vars

Auth: AWS OIDC. Terraform version: 1.12.2.
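
Step 1 of the deployment flow (skip the build when the image already exists) could look roughly like the sketch below. It is not taken from the workflow files: the function names and the `registry`, `repo`, and `tag` arguments are hypothetical, and it assumes Docker is already logged in to ECR. Note that `aws ecr describe-images` also exits non-zero on auth or network errors, so a real step may want to distinguish those from "tag not found".

```bash
# Sketch of ImageUpload: build and push only when the tag is absent from ECR.
image_exists() {
  # describe-images exits non-zero when the tag is not in the repository.
  aws ecr describe-images \
    --repository-name "$1" \
    --image-ids "imageTag=$2" >/dev/null 2>&1
}

upload_image() {
  local registry="$1"
  local repo="$2"
  local tag="$3"
  if image_exists "$repo" "$tag"; then
    echo "skip: $repo:$tag already in ECR"
    return 0
  fi
  # Build for ARM64 and push in a single step.
  docker buildx build --platform linux/arm64 \
    -t "$registry/$repo:$tag" --push .
}

# Example (all values are placeholders):
# upload_image "<account>.dkr.ecr.<region>.amazonaws.com" "my-repo" "$IMAGE_TAG"
```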

## Adding a New ECS Service

When you create a new service module under `modules/services/{name}/`, keep the following places in sync. Steps 1, 2, and 4 are infra; step 3 is CI/CD glue that is easy to forget because it lives outside this directory.

1. **Service module** at `modules/services/{name}/` — `main.tf`, `variables.tf`, `iam.tf`, `lb.tf` (for REST), `versions.tf`. The ECS service name comes from `local.service` (e.g. `humand-super-admin-api`) and ends up as both the ECS service name and the task definition family.
2. **Wire it into all 5 env files** — `env/{dev,stg,prd,testslot1,testslot2}/main.tf`. Each env passes `docker_image`, autoscaling vars, secrets overrides, etc. Without this, `terraform apply` (run by `deployment-apply.yml`) won't create the service.
3. **Update `.github/workflows/patch.yml`** in three places (keep them in sync):
   - `inputs.service.options` — add the new service name to the dropdown.
   - "Resolve services to deploy" → `all)` branch — append the new service to the JSON array.
   - "Resolve cluster" — add the new service to the correct `case` branch (`public-${env}-services` for ALB-facing REST services like `humand-app`, `humand-super-admin-api`; `private-${env}-services` for `humand-grpc`, `humand-event-handler`, `humand-worker`, `monolith-cdc`). The cluster comes from whether the service module sets `cluster_name = var.env_config.public_cluster_name` or `private_cluster_name`.
4. **Add the new node type to the `node_type` validator** in `rest_server`, `worker_server`, or `grpc_server` if your service uses one of those base modules and introduces a new `NODE_TYPE`.

If you skip step 3, the per-env deploys (`dev.yml`, etc.) still ship the new service via `terraform apply`, but the on-call `patch.yml` flow won't be able to redeploy that service surgically.
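
The "Resolve cluster" mapping from step 3 can be pictured as a shell `case` statement. The sketch below is not a copy of `patch.yml`. It only covers the services named in this document, and `resolve_cluster` is a hypothetical function name.

```bash
# Sketch of the service-to-cluster mapping described in step 3.
resolve_cluster() {
  local service="$1"
  local env="$2"
  case "$service" in
    humand-app | humand-super-admin-api)
      echo "public-${env}-services" ;;
    humand-grpc | humand-event-handler | humand-worker | monolith-cdc)
      echo "private-${env}-services" ;;
    *)
      # A service missing from both branches cannot be patched surgically.
      echo "unknown service: $service" >&2
      return 1 ;;
  esac
}

# Example:
# resolve_cluster humand-worker stg
```

A new service that is not added to one of the branches falls through to the error case, which is the failure mode described above.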

## Docker

- `Dockerfile` — Production multi-stage build (ARM64), uses CodeArtifact for npm auth
- `Dockerfile.dev` — Development image
- `compose.yml` — Local dev with PostgreSQL, Redis, Kafka, LocalStack

## Terraform Commands

```bash
export TERRAFORM_ROOT=infrastructure/env/dev
# IMAGE_NAME and IMAGE_TAG must already be set in the shell.
terraform -chdir="$TERRAFORM_ROOT" init
terraform -chdir="$TERRAFORM_ROOT" plan -var="docker_image=$IMAGE_NAME" -var="image_tag=$IMAGE_TAG"
terraform -chdir="$TERRAFORM_ROOT" apply -var="docker_image=$IMAGE_NAME" -var="image_tag=$IMAGE_TAG"
```
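
One way to make sure the applied changes are exactly the ones that were reviewed is to save the plan to a file and apply that file. This is a sketch for local use, not the CI behaviour: `plan_and_apply`, the `tfplan` filename, and the confirmation prompt are all illustrative choices.

```bash
# Sketch: plan to a file, ask for confirmation, then apply that exact plan.
plan_and_apply() {
  local root="$1"
  local image="$2"
  local tag="$3"
  local answer
  terraform -chdir="$root" init || return 1
  # With -chdir, the relative path "tfplan" resolves inside $root.
  terraform -chdir="$root" plan \
    -var="docker_image=$image" -var="image_tag=$tag" \
    -out=tfplan || return 1
  printf 'Apply this plan? [y/N] '
  read -r answer
  [ "$answer" = "y" ] || { echo "aborted"; return 1; }
  # Applying a saved plan needs no -var flags; they are stored in the plan.
  terraform -chdir="$root" apply tfplan
}

# Example:
# plan_and_apply infrastructure/env/dev "$IMAGE_NAME" "$IMAGE_TAG"
```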

## Linting and Docs

```bash
tflint --force --recursive --minimum-failure-severity=warning    # Lint Terraform
terraform fmt -recursive -check                                   # Check formatting
./generate-terraform-docs.sh                                      # Generate module docs
```

## Debezium CDC — Adding a New Table

Debezium (MSK Connect) watches PostgreSQL tables and publishes changes to Kafka topics. The topic name convention is `monolith.cdc.{TableName}` (e.g. `monolith.cdc.InstanceSamlConfig`).

**Use the `add-table-to-debezium` skill** — it handles everything: updating all 5 environment Terraform files, running `terraform fmt`, deciding REPLICA IDENTITY mode (DEFAULT vs FULL), and generating the migration when FULL is needed.

**No application code changes are needed** if there is no consumer in the monolith. Consumers (CDC services extending `BaseCDCService`) live in the relevant module under `src/api/modules/{module}/business/services/{module}CDCService.ts` and are registered in `src/api/modules/changeDataCapture/business/services/mainCDCService.ts`. If you only need the Kafka topic (e.g. for an external service to consume), adding the table to Terraform is sufficient.
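
For scripts that need to refer to a CDC topic, the naming convention above reduces to a one-line helper. This is a minimal sketch, and `cdc_topic_name` is a hypothetical name that does not come from the repo or the skill.

```bash
# Sketch: derive the Kafka topic name for a table from the
# monolith.cdc.{TableName} convention.
cdc_topic_name() {
  if [ -z "$1" ]; then
    echo "usage: cdc_topic_name <TableName>" >&2
    return 1
  fi
  echo "monolith.cdc.$1"
}

# Example:
# cdc_topic_name InstanceSamlConfig
```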

## Guidelines

- Always `plan` before `apply`. Understand what will be created, modified, or destroyed.
- Infrastructure changes affect real AWS resources in production. Confirm with the user before modifying any Terraform module or environment config.
- Follow the modular pattern: base modules (`rest_server`, `worker_server`, `grpc_server`) provide common infra, service modules (`services/*`) customize per NODE_TYPE.
- New services should follow the existing pattern: create a service module under `modules/services/`, instantiate it in each environment's `main.tf`.
- Keep secrets in SSM Parameter Store, never in Terraform state or code.
- Run `tflint` and `terraform fmt` before committing Terraform changes.

See also: extracted application packages under `humand-packages/` (e.g. `humand-packages/scheduled-actions/AGENTS.md`).