Environment Strategy

Development, staging, and production environments, connected by a promotion pipeline.

Environment Overview

Aspect	Dev	Staging	Production
Purpose	Development & testing	QA & integration testing	Live users
Users	Dev team	QA + stakeholders	End users
Data	Synthetic	Production-like	Real customer data
Scale	Low (100 RPS)	Medium (1K RPS)	High (10K+ RPS)
Uptime SLA	None	95%	99.9%
Cost	~$300/mo	~$500/mo	~$2000/mo
Deployment	Manual/auto	Automatic from main, gated by checks	Manual trigger, approval-gated

Development Environment

Purpose

Fast iteration, experimentation, testing new features before staging.

Characteristics

Minimal infrastructure - Single-instance databases, no redundancy
Frequent changes - Deploy multiple times daily
Loose constraints - No strict security or compliance requirements
Easy reset - Can destroy and recreate anytime

Resources

Dev Database: PostgreSQL single instance (2GB)
Dev Cache: Redis single node
Dev App: 1-2 instances (auto-scale disabled)
Dev Storage: 10GB logs (auto-purge after 7 days)

Access

All developers can deploy
No approval required
Full database access for testing
Local development also encouraged

Data Management

Synthetic test data only
Can reset to clean state anytime
PII masking not required
Public test credentials OK

Monitoring

Basic health checks only
Errors logged but no alerts
Manual inspection of logs
Performance not critical

Staging Environment

Purpose

Production-like testing before release. QA, integration testing, performance validation.

Characteristics

Production-like scale - Similar infrastructure to prod
Gated deployments - Require an approved PR and passing checks before deploying
Real scenarios - Use production-like data volumes
Metrics tracked - Performance and reliability validated

Resources

Staging Database: PostgreSQL HA (multi-AZ capable)
Staging Cache: Redis with replication
Staging App: 2-3 instances with auto-scaling
Staging Storage: 100GB logs (30-day retention)

Access

Developers deploy by merging to main
QA team has full environment access
Product team can test features
Limited access to production data (if needed)

Data Management

Anonymized copy of production data (weekly)
Alternatively: High-volume synthetic data
PII must be masked or removed
Test data reset before major releases
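
The masking rule above could be sketched as follows. This is a minimal illustration, not the project's actual refresh job: the field names in PII_FIELDS and the salt are hypothetical, and salted hashing is pseudonymization, so stricter compliance regimes may require removing the fields entirely.

```python
import hashlib

# Hypothetical column names; the real schema is not specified in this document.
PII_FIELDS = {"email", "full_name", "phone"}


def mask_value(value: str, salt: str) -> str:
    """Replace a PII value with a stable, non-reversible token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return f"masked_{digest[:12]}"


def anonymize_row(row: dict, salt: str) -> dict:
    """Return a copy of the row with PII fields masked and other fields untouched."""
    return {
        key: mask_value(str(val), salt) if key in PII_FIELDS and val is not None else val
        for key, val in row.items()
    }


if __name__ == "__main__":
    row = {"id": 42, "email": "jane@example.com", "full_name": "Jane Doe", "plan": "pro"}
    print(anonymize_row(row, salt="staging-refresh-salt"))
```

Because the token is deterministic for a given salt, joins on masked columns still line up within one weekly copy.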

Deployment Process

Automatic from main: Code merged to main → auto-deploys to staging
Gated by tests: All tests must pass
Gated by security: CodeQL, SAST, dependency checks must pass
Monitoring: Error budgets tracked
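
As a rough sketch of the gates above, a deploy decision can be reduced to "every check passed". The CheckResults type and its field names are hypothetical; in practice the CI system would supply these results.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CheckResults:
    """Outcome of the checks named in this section (hypothetical structure)."""
    tests_passed: bool
    codeql_passed: bool
    sast_passed: bool
    dependency_check_passed: bool


def failed_gates(results: CheckResults) -> list[str]:
    """List the names of the gates that did not pass."""
    return [name for name, ok in asdict(results).items() if not ok]


def can_deploy_to_staging(results: CheckResults) -> bool:
    """Allow the staging deploy only when no gate failed."""
    return not failed_gates(results)


if __name__ == "__main__":
    results = CheckResults(True, True, False, True)
    print(can_deploy_to_staging(results), failed_gates(results))
```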

Testing in Staging

Full test suite passes
Performance within SLA
Load test (1-5x expected traffic)
Chaos engineering (kill pods, latency injection)
Data migration testing (if applicable)
QA manual testing (happy paths + edge cases)
Performance profiling
Security scanning results reviewed

Monitoring

Full observability enabled
Performance metrics tracked
Error budgets calculated
Alerts configured but non-critical

Production Environment

Purpose

Serve real users with high reliability, security, and performance.

Characteristics

High availability - Multi-AZ, auto-failover
Strict deployments - Planned, reviewed, staged
Security hardened - Encryption, WAF, rate limiting
Observable - Comprehensive monitoring & alerting
Optimized - Performance tuned for scale

Resources

Production Database: PostgreSQL HA multi-AZ (16GB+)
Production Cache: Redis cluster with replication & failover
Production App: 4-6 instances, auto-scale 2-20
Production Storage: 1TB logs (2-year retention)
Production CDN: Global content delivery

Access

Code: Only via approved PRs to main
Deployment: Release engineer only
Database: Read-only for debugging (very limited)
Secrets: Via Key Vault (audit logged)
SSH: Emergency only (bastion host + approval)

Data Management

Real customer data - Encrypted at rest
Backups: Daily + point-in-time recovery
Retention: Per compliance requirements
PII: Encrypted, access logged
GDPR: Data deletion honored immediately

Deployment Process

Tag Release: Version created in code
Build Artifact: Docker image built and scanned
Approval Gate: Release manager approves
Deploy to Staging: Automated
Smoke Tests: Automated validation
Blue/Green Deploy: New version alongside old
Traffic Switch: Gradual shift (1% → 10% → 50% → 100%)
Monitoring: Watch error rates, latency
Verification: Business metrics validated
Rollback Ready: Previous version available

Total time: 30 minutes to 2 hours (depending on size)

Deployment Approval Checklist

Monitoring

Real-time dashboards:

Request rate, latency (p50/p95/p99)
Error rate by endpoint
Database performance
Cache hit rate
Resource utilization

Alerting:

Critical: Page on-call immediately
High: Create incident ticket
Medium: Log for investigation
Low: Weekly report
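
The severity levels above map naturally to a lookup table. A minimal sketch, assuming the action names are placeholders for whatever paging and ticketing integrations are in use:

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Action names are hypothetical labels for the four responses listed above.
ROUTES = {
    Severity.CRITICAL: "page_on_call",
    Severity.HIGH: "create_incident_ticket",
    Severity.MEDIUM: "log_for_investigation",
    Severity.LOW: "weekly_report",
}


def route_alert(severity: Severity) -> str:
    """Return the action to take for an alert of the given severity."""
    return ROUTES[severity]


if __name__ == "__main__":
    for level in Severity:
        print(level.value, "->", route_alert(level))
```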

SLOs:

Availability: 99.9%
Latency p99: < 200ms
Error rate: < 0.1%
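
To make the availability target concrete: 99.9% over a 30-day window leaves roughly 43 minutes of allowed downtime. The sketch below shows the arithmetic; the 30-day window is an assumption, since this document does not state the measurement period.

```python
def downtime_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime allowed by an availability SLO over the window."""
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining_minutes(slo: float, downtime_so_far: float, window_days: int = 30) -> float:
    """Remaining error budget; a negative value means the SLO is already missed."""
    return downtime_budget_minutes(slo, window_days) - downtime_so_far


if __name__ == "__main__":
    print(f"budget: {downtime_budget_minutes(0.999):.1f} min")
    print(f"remaining after a 12 min outage: {budget_remaining_minutes(0.999, 12):.1f} min")
```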

Promotion Pipeline

Workflow

Local Development
       ↓
   [git push to feature branch]
       ↓
GitHub PR (automated checks)
       ↓
   [PR approved, merged to main]
       ↓
Automatic Deploy to Staging
       ↓
   [Automated + manual testing]
       ↓
Manual Deploy to Production
       ↓
   [Release tagged, blue/green deploy with gradual traffic shift]
       ↓
Production Stable

Environment Variables

Each environment has separate configuration:

Dev:
- LOG_LEVEL=debug
- CACHE_ENABLED=true
- API_TIMEOUT=30s
- DB_POOL_SIZE=5

Staging:
- LOG_LEVEL=info
- CACHE_ENABLED=true
- API_TIMEOUT=10s
- DB_POOL_SIZE=20

Production:
- LOG_LEVEL=warn
- CACHE_ENABLED=true
- API_TIMEOUT=5s
- DB_POOL_SIZE=50
- ENABLE_METRICS=true
- ALERT_ON_ERRORS=true

Stored in Key Vault per environment.
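
One way an application might consume these values is sketched below: per-environment defaults taken from the lists above, overridable by process environment variables. The Settings class and load_settings function are hypothetical, and defaulting ENABLE_METRICS and ALERT_ON_ERRORS to off outside production is an assumption, since they are only listed for production.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    log_level: str
    cache_enabled: bool
    api_timeout_seconds: int
    db_pool_size: int
    enable_metrics: bool = False   # assumption: off unless listed
    alert_on_errors: bool = False  # assumption: off unless listed


DEFAULTS = {
    "dev": Settings("debug", True, 30, 5),
    "staging": Settings("info", True, 10, 20),
    "production": Settings("warn", True, 5, 50, enable_metrics=True, alert_on_errors=True),
}


def parse_seconds(value: str) -> int:
    """Parse values such as '30s' or '30' into whole seconds."""
    return int(value.strip().removesuffix("s"))


def load_settings(env_name: str, environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Start from the environment's defaults and apply any overrides found."""
    environ = os.environ if environ is None else environ
    base = DEFAULTS[env_name]
    return replace(
        base,
        log_level=environ.get("LOG_LEVEL", base.log_level),
        api_timeout_seconds=parse_seconds(
            environ.get("API_TIMEOUT", f"{base.api_timeout_seconds}s")
        ),
        db_pool_size=int(environ.get("DB_POOL_SIZE", base.db_pool_size)),
    )


if __name__ == "__main__":
    print(load_settings("staging", {}))
```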

Blue/Green Deployments

Process

Blue (current) - Version 2.0.0 running, receiving 100% traffic
Green (new) - Version 2.1.0 deployed, receiving 0% traffic
Test Green - Run smoke tests against new version
Switch Traffic - Gradually shift traffic to green
- 1% for 5 minutes (watch errors)
- 10% for 10 minutes (watch metrics)
- 50% for 15 minutes (traffic split evenly between old and new versions)
- 100% (all users on new version)
Monitor - Keep blue running for 1 hour after the switch as an immediate fallback
Finalize - Keep blue as a standby for 24 hours, then retire it
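
The traffic shift can be expressed as a small loop over the schedule. In this sketch the three callables (set_green_weight, is_healthy, wait) are hypothetical hooks standing in for the load balancer, the monitoring check, and a sleep; they are not part of any specific tool.

```python
from typing import Callable, NamedTuple


class Step(NamedTuple):
    green_percent: int
    hold_minutes: int


# Schedule taken from the process above; 100% has no further hold.
SHIFT_PLAN = [Step(1, 5), Step(10, 10), Step(50, 15), Step(100, 0)]


def run_shift(
    set_green_weight: Callable[[int], None],
    is_healthy: Callable[[], bool],
    wait: Callable[[int], None],
) -> bool:
    """Walk the shift plan; send all traffic back to blue on the first failed check."""
    for step in SHIFT_PLAN:
        set_green_weight(step.green_percent)
        wait(step.hold_minutes * 60)  # seconds to observe this step
        if not is_healthy():
            set_green_weight(0)  # revert: 100% of traffic back on blue
            return False
    return True


if __name__ == "__main__":
    applied = []
    ok = run_shift(applied.append, lambda: True, lambda seconds: None)
    print(ok, applied)
```

The holds add up to 30 minutes before green takes all traffic, which lines up with the lower bound of the deployment time quoted earlier.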

Rollback

If issues detected during traffic shift:

Immediately stop traffic to green
Revert to 100% traffic to blue
Investigate issues
Fix and try again

Rollback time: < 5 minutes
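
"Issues detected" needs a concrete definition to be automatable. One possible sketch reuses the production SLO thresholds (error rate under 0.1%, p99 under 200ms) as rollback triggers; treating the SLOs as triggers is an assumption, and the WindowMetrics type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowMetrics:
    """Metrics for green over one observation window (hypothetical structure)."""
    requests: int
    errors: int
    p99_latency_ms: float


MAX_ERROR_RATE = 0.001  # 0.1%, from the production SLOs
MAX_P99_MS = 200.0      # from the production SLOs


def rollback_reasons(metrics: WindowMetrics) -> list[str]:
    """Return the violated thresholds; an empty list means keep shifting."""
    reasons = []
    if metrics.requests > 0 and metrics.errors / metrics.requests >= MAX_ERROR_RATE:
        reasons.append("error_rate")
    if metrics.p99_latency_ms >= MAX_P99_MS:
        reasons.append("p99_latency")
    return reasons


if __name__ == "__main__":
    print(rollback_reasons(WindowMetrics(requests=10_000, errors=50, p99_latency_ms=180.0)))
```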

Database Migrations

Strategy

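Migrations run before app deployment and must stay backward compatible, so the old app version keeps working while the new one rolls out. Schema changes are therefore split across releases: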

1. Add new column (nullable) to database
2. Update app code to use new column
3. Backfill data for old records
4. Drop old column (later release)

Timeline Example

Release 2.1.0:

Migration: Add user_id column (nullable) to transactions table
App code: Start writing to both the old and new columns
Data backfill: Fill user_id for existing records

Release 2.2.0:

Migration: Make user_id NOT NULL (backfill complete)
App code: Remove writes to old column

Release 2.3.0:

Migration: Drop old column
App code: No longer uses old column
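
The release 2.1.0 steps can be tried end to end against an in-memory SQLite database. This is only a sketch: production uses PostgreSQL, the old column name account_ref is hypothetical (the document does not name it), and the later steps are shown as PostgreSQL statements in comments because SQLite cannot add NOT NULL to an existing column.

```python
import sqlite3


def create_legacy_schema(conn: sqlite3.Connection) -> None:
    """Schema before release 2.1.0; account_ref is a hypothetical old column."""
    conn.execute(
        "CREATE TABLE transactions ("
        "id INTEGER PRIMARY KEY, account_ref TEXT NOT NULL, amount INTEGER NOT NULL)"
    )


def expand(conn: sqlite3.Connection) -> None:
    """Release 2.1.0 migration: add the new column as nullable."""
    conn.execute("ALTER TABLE transactions ADD COLUMN user_id INTEGER")


def dual_write(conn: sqlite3.Connection, account_ref: str, amount: int) -> None:
    """Release 2.1.0 app code: write both the old and the new column."""
    conn.execute(
        "INSERT INTO transactions (account_ref, user_id, amount) VALUES (?, ?, ?)",
        (account_ref, int(account_ref), amount),
    )


def backfill(conn: sqlite3.Connection) -> int:
    """Fill user_id for rows written before the new column existed."""
    cur = conn.execute(
        "UPDATE transactions SET user_id = CAST(account_ref AS INTEGER) "
        "WHERE user_id IS NULL"
    )
    return cur.rowcount


def ready_for_not_null(conn: sqlite3.Connection) -> bool:
    """Precondition for release 2.2.0: no row is missing user_id."""
    (missing,) = conn.execute(
        "SELECT COUNT(*) FROM transactions WHERE user_id IS NULL"
    ).fetchone()
    return missing == 0


# Later releases, in PostgreSQL syntax:
#   2.2.0: ALTER TABLE transactions ALTER COLUMN user_id SET NOT NULL;
#   2.3.0: ALTER TABLE transactions DROP COLUMN account_ref;

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_legacy_schema(conn)
    conn.executemany(
        "INSERT INTO transactions (account_ref, amount) VALUES (?, ?)",
        [("7", 100), ("9", 250)],
    )
    expand(conn)
    dual_write(conn, "7", 50)
    print("backfilled rows:", backfill(conn))
    print("ready for NOT NULL:", ready_for_not_null(conn))
```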

Testing Migrations

Run migration on staging database (anonymized copy of production)
Verify data integrity (counts, checksums)
Test rollback procedure
Verify app handles both old and new schema
Performance test (large tables)

Secrets Management

Per-Environment Secrets

Each environment has separate secrets in Key Vault:

Environment: dev
- POSTGRES_PASSWORD: devpassword123
- JWT_SECRET: dev-signing-key
- API_KEY: demo-key-12345

Environment: staging
- POSTGRES_PASSWORD: [unique strong password]
- JWT_SECRET: [staging specific key]
- API_KEY: [staging key]

Environment: production
- POSTGRES_PASSWORD: [HSM-encrypted, rotated monthly]
- JWT_SECRET: [HSM-encrypted, rotated every 6 months]
- API_KEY: [production key, audit logged access]

Secret Rotation

Database passwords: Every 30 days
API keys: Every 90 days
Signing keys: Every 6 months
Certificates: Automated renewal 30 days before expiry
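
A rotation schedule like this is easy to check mechanically. The sketch below is illustrative only: the secret kind names are hypothetical, and "every 6 months" is approximated as 180 days.

```python
from datetime import date, timedelta

# Intervals from the schedule above; 6 months approximated as 180 days.
ROTATION_DAYS = {
    "database_password": 30,
    "api_key": 90,
    "signing_key": 180,
}

CERT_RENEWAL_LEAD_DAYS = 30


def rotation_due(kind: str, last_rotated: date, today: date) -> bool:
    """True once the secret has reached its rotation interval."""
    return today - last_rotated >= timedelta(days=ROTATION_DAYS[kind])


def certificate_renewal_due(expires_on: date, today: date) -> bool:
    """True when the certificate is within the renewal lead time of expiry."""
    return expires_on - today <= timedelta(days=CERT_RENEWAL_LEAD_DAYS)


if __name__ == "__main__":
    today = date(2024, 3, 1)
    print(rotation_due("database_password", date(2024, 1, 15), today))
    print(certificate_renewal_due(date(2024, 3, 20), today))
```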

Cost Allocation

Budget by Environment

Dev: $300/month - Flexible, experimentation encouraged
Staging: $500/month - Scaled testing
Production: $2000/month - High availability

Cost Optimization

Kill unused resources immediately
Schedule shutdown during off-hours
Use cheaper instance types in dev/staging
Reserved instances for prod (33% savings)
Archive old logs to cold storage
