# Disaster Recovery
GTIN1 maintains offsite backups in an isolated AWS account to protect against total data loss, account compromise, or accidental deletion.
## Architecture Overview

```
Production Account                   DR Account (Isolated)
┌─────────────────┐                  ┌──────────────────────┐
│ Aurora DB       │   weekly dump    │ S3 Bucket            │
│ (PostgreSQL)    │ ───────────────► │ (Object Lock)        │
│                 │   via ECS task   │                      │
│ S3 Media Bucket │   real-time      │ database/            │
│ (gtin1-media)   │ ───────────────► │   weekly/monthly/    │
│                 │  S3 replication  │   yearly backups     │
│                 │                  │ media/               │
│                 │                  │   replicated objects │
└─────────────────┘                  └──────────────────────┘
```
Key properties:
- **Isolated account** - the DR account has no compute, no VPC, and no inbound network access. Even if production is fully compromised, the backups are protected.
- **Object Lock (GOVERNANCE mode)** - objects cannot be deleted for 365 days; GOVERNANCE-mode retention can only be bypassed by principals explicitly granted `s3:BypassGovernanceRetention` in the DR account.
- **Write-only access** - production can write to DR but never delete. Only the DR account root can override.
- **Glacier Deep Archive** - media backups are stored at ~$0.00099/GB/month.
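The write-only grant might look roughly like the following bucket policy (a sketch only: `PRODUCTION_ACCOUNT_ID` is a placeholder, and the exact principal and action lists are assumptions, not the deployed policy):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowProductionWrites",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::PRODUCTION_ACCOUNT_ID:root"},
      "Action": ["s3:PutObject", "s3:PutObjectTagging"],
      "Resource": "arn:aws:s3:::closient-disaster-recovery-backup/*"
    },
    {
      "Sid": "DenyProductionDeletes",
      "Effect": "Deny",
      "Principal": {"AWS": "arn:aws:iam::PRODUCTION_ACCOUNT_ID:root"},
      "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
      "Resource": "arn:aws:s3:::closient-disaster-recovery-backup/*"
    }
  ]
}
```

The explicit `Deny` ensures that even if a production role accumulates broader permissions over time, deletes are still blocked.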
## What's Backed Up

| Data | Method | Frequency | RPO |
|---|---|---|---|
| Database (Aurora PostgreSQL) | pg_dump via ECS task | Weekly (Sunday 2 AM UTC) | 1 week |
| Media files (S3) | S3 cross-account replication | Near real-time | Minutes |
> **Aurora automatic backups:** Aurora also maintains 7-day automatic backups within the production account. The DR backups are a second layer of protection that survives account compromise.
## Database Backup Retention
| Tier | Duration | Selection |
|---|---|---|
| Weekly | 28 days (4 weeks) | Every Sunday backup |
| Monthly | 365 days (12 months) | First Sunday of each month |
| Yearly | Forever | First Sunday of each year |
Retention is enforced by S3 lifecycle rules using object tags. The weekly backup workflow automatically determines the correct tier.
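The tier selection can be sketched as a small helper along these lines (the function name and date handling are illustrative, not the actual workflow code; GNU `date` assumed):

```shell
# Hypothetical sketch of the tier-selection logic the weekly backup
# workflow applies: first Sunday of January -> yearly, first Sunday of
# any other month -> monthly, otherwise weekly.
backup_tier() {
  local backup_date="$1"               # Sunday backup date, YYYY-MM-DD
  local day month
  day=$(date -d "$backup_date" +%-d)   # day of month, no leading zero
  month=$(date -d "$backup_date" +%-m) # month, no leading zero
  if [ "$month" -eq 1 ] && [ "$day" -le 7 ]; then
    echo yearly      # first Sunday of the year
  elif [ "$day" -le 7 ]; then
    echo monthly     # first Sunday of the month
  else
    echo weekly
  fi
}
```

A Sunday falls in the first seven days of a month exactly when it is that month's first Sunday, which is all the date arithmetic this needs.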
## Recovery Procedures

### Recovering Media Files
Media files are stored in Glacier Deep Archive. Recovery requires a restore request followed by download.
**Estimated recovery time:** 12-48 hours (Glacier Deep Archive retrieval)
```shell
# 1. List available media backups
aws s3 ls s3://closient-disaster-recovery-backup/media/ --recursive | head -20

# 2. Initiate restore from Glacier Deep Archive (bulk retrieval, 12-48 hours)
aws s3api restore-object \
  --bucket closient-disaster-recovery-backup \
  --key media/path/to/file.jpg \
  --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}}'

# 3. After restore completes, download the file
aws s3 cp s3://closient-disaster-recovery-backup/media/path/to/file.jpg ./restored/

# For bulk restore of all media files, use S3 Batch Operations
```
### Recovering Database
Database backups are pg_dump custom format files that can be restored with pg_restore.
**Estimated recovery time:** 1-4 hours (depending on database size)
```shell
# 1. List available database backups
aws s3 ls s3://closient-disaster-recovery-backup/database/gtin1/ --recursive | grep manifest

# 2. Download the backup
aws s3 cp s3://closient-disaster-recovery-backup/database/gtin1/monthly/gtin1-dr-20260101-020000-abc1234/backup.dump ./

# 3. Check the manifest for metadata
aws s3 cp s3://closient-disaster-recovery-backup/database/gtin1/monthly/gtin1-dr-20260101-020000-abc1234/manifest.json - | jq .

# 4. Restore to a new database
# (if the target cluster lacks the original roles, consider adding
#  --no-owner --no-privileges)
createdb gtin1_restored
pg_restore --dbname=gtin1_restored --verbose --jobs=4 backup.dump
```
### Full Account Compromise Recovery
If the production AWS account is compromised:
1. **Isolate the production account** - remove it from AWS Organizations if necessary
2. **Provision new infrastructure** - use Terraform to recreate the production account
3. **Restore database** - download the latest backup from DR and restore it to a new Aurora cluster
4. **Restore media** - initiate a Glacier Deep Archive bulk restore and copy to a new S3 bucket
5. **Update DNS** - point the domain to the new infrastructure
6. **Rotate all credentials** - new database passwords, API keys, Stripe keys, etc.
## Backup Verification
> **Monthly verification recommended:** Backups that aren't tested are not backups. Verify monthly that backups can be restored.
Monthly verification runbook:
1. Download the latest monthly backup to a testing environment
2. Restore it to a temporary database
3. Run basic integrity checks (row counts, recent data present)
4. Delete the temporary database
5. Document the verification in the Linear issue
```shell
# Quick verification: check backup exists and has reasonable size
aws s3 ls s3://closient-disaster-recovery-backup/database/gtin1/monthly/ --recursive \
  | sort -k1,2 | tail -5
```
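For scripted checks, the same listing can be fed through a small freshness test (a sketch, not part of the official runbook; the listing text and current time are passed in as arguments so the logic is deterministic, and GNU `date` is assumed):

```shell
# Given `aws s3 ls` output as text, check that the newest backup is no
# older than max_age_days. now_epoch is injected for testability, e.g.
# "$(date +%s)" in real use.
backup_is_fresh() {
  local listing="$1" max_age_days="$2" now_epoch="$3"
  local newest_date newest_epoch
  # Listing lines start with YYYY-MM-DD, so a lexicographic sort is a
  # chronological sort; take the date field of the newest line.
  newest_date=$(sort -k1,1 <<<"$listing" | tail -1 | awk '{print $1}')
  newest_epoch=$(date -d "$newest_date" +%s)
  [ $(( (now_epoch - newest_epoch) / 86400 )) -le "$max_age_days" ]
}
```

A monthly tier older than ~35 days would mean the workflow has silently stopped producing backups, which is exactly the failure this check is meant to catch.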
## Manual Steps (Not Automated)

### MFA Delete
S3 MFA Delete adds an additional layer of protection, requiring MFA to delete object versions or change the bucket's versioning state. It must be enabled manually with root account credentials via the CLI or API; the S3 console does not expose it.
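Assuming root credentials and an MFA device are configured locally, enabling it is a single call (the device ARN is a placeholder for the DR account's root MFA device, and `123456` stands for a current token code):

```shell
# Must be run with root account credentials; IAM users cannot enable MFA Delete.
aws s3api put-bucket-versioning \
  --bucket closient-disaster-recovery-backup \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::DR_ACCOUNT_ID:mfa/root-account-mfa-device 123456"
```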
### S3 Batch Replication
When replication is first enabled, only new objects replicate automatically. Existing objects require a one-time S3 Batch Replication job to backfill.
### First Backup Verification
After initial setup, manually trigger the GTIN1 DR Backup workflow and verify the dump appears in the DR bucket with the correct retention tag.
## Cost
**Estimated monthly cost:** $6-8
- Glacier Deep Archive storage: ~$1/TB/month
- S3 replication requests: ~$1-2/month
- CloudTrail data events: ~$2-3/month
- Budget alert threshold: $50/month
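The storage line item can be sanity-checked with quick arithmetic (storage only; real bills also include request, retrieval, and 180-day minimum-duration charges):

```shell
# Rough Deep Archive storage cost at ~$0.00099/GB-month. Storage-only
# estimate; ignores requests, retrievals, and minimum storage duration.
deep_archive_monthly_cost() {
  awk -v gb="$1" 'BEGIN { printf "%.2f", gb * 0.00099 }'
}
```

At roughly $1/TB/month, even several terabytes of media stays well under the $50 budget alert threshold.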