Six pillars.
Reviewed every 90 days.
The Well-Architected Framework is AWS's six-pillar checklist for cloud workloads. Most mid-market AWS shops run an internal review once at launch, then forget about it. Our cadence is quarterly, with concrete remediation tickets per gap. Below: what each pillar is, what we check for, and where mid-market shops consistently have gaps.
Operational Excellence
Run and monitor systems to deliver business value, continuously improve.
What we check for
- Infrastructure as Code (Terraform / CDK / CloudFormation) for everything in production
- CI/CD pipelines with automated test gates, not manual deploy buttons
- CloudWatch dashboards + alarms wired to your on-call rotation
- Runbooks for the 5 most common incident types, tested quarterly
- Post-incident reviews with blameless format and tracked action items
Typical gap at mid-market
The most common gap: 'we have monitoring' that turns out to be a default CloudWatch dashboard nobody looks at. The fix is wiring alarms to a paging system + naming an owner per service.
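Whether alarms actually reach a person is easy to verify mechanically. Here is a minimal sketch. It assumes alarm records shaped like the `MetricAlarms` entries that CloudWatch's `DescribeAlarms` API returns (`AlarmName`, `AlarmActions`, `ActionsEnabled`). The service-to-owner map is a hypothetical stand-in for wherever ownership is recorded, such as a tag, a service catalog, or a spreadsheet.

```python
from __future__ import annotations

from typing import Any


def alarms_without_paging(alarms: list[dict[str, Any]]) -> list[str]:
    """Names of alarms that can fire without notifying anyone.

    Each dict is assumed to look like an entry in the "MetricAlarms" list
    returned by CloudWatch DescribeAlarms.
    """
    silent = []
    for alarm in alarms:
        has_actions = bool(alarm.get("AlarmActions"))  # e.g. an SNS topic feeding the pager
        enabled = alarm.get("ActionsEnabled", True)     # actions can be switched off per alarm
        if not (has_actions and enabled):
            silent.append(alarm["AlarmName"])
    return silent


def services_without_owner(services: list[str], owners: dict[str, str]) -> list[str]:
    """Services with no named owner in a (hypothetical) service -> owner map."""
    return [svc for svc in services if not owners.get(svc)]


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID
    topic = "arn:aws:sns:us-east-1:123456789012:oncall"
    sample = [
        {"AlarmName": "api-5xx", "AlarmActions": [topic], "ActionsEnabled": True},
        {"AlarmName": "db-cpu", "AlarmActions": [], "ActionsEnabled": True},
        {"AlarmName": "queue-depth", "AlarmActions": [topic], "ActionsEnabled": False},
    ]
    print(alarms_without_paging(sample))  # ['db-cpu', 'queue-depth']
    print(services_without_owner(["api", "billing"], {"api": "platform-team"}))  # ['billing']
```

In practice, the input would come from paginating `describe_alarms` with boto3. Every name in the first list is an alarm that may appear on a dashboard but wakes nobody up.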
Security
Protect information, systems, and assets while delivering business value.
What we check for
- IAM Identity Center (SSO) for all human access; no individual IAM users for engineers
- Service Control Policies (SCPs) at OU level preventing destructive actions
- GuardDuty + Security Hub + Config + Inspector enabled organization-wide
- KMS-managed encryption everywhere (S3, RDS, EBS, Secrets Manager)
- Block Public Access enabled by default at the S3 account level
- AWS WAF + Shield Advanced for public-facing workloads
Typical gap at mid-market
The IAM-sprawl gap is universal. Most mid-market AWS shops grew from 5 to 50 engineers without ever transitioning off IAM users to Identity Center. The transition is a one-week project that closes most of the audit findings.
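The IAM credential report makes it straightforward to measure the size of the IAM-sprawl gap. Here is a minimal sketch. It assumes the report CSV has already been fetched and decoded to text, for example with the IAM client's `generate_credential_report` and `get_credential_report` calls in boto3. The sample rows are invented and include only a subset of the report's columns.

```python
from __future__ import annotations

import csv
import io

ROOT_ROW = "<root_account>"  # the report's row for the account root user


def _rows(report_csv: str) -> list[dict[str, str]]:
    """Parse the credential report, skipping the root-user row."""
    return [r for r in csv.DictReader(io.StringIO(report_csv)) if r["user"] != ROOT_ROW]


def console_iam_users(report_csv: str) -> list[str]:
    """IAM users with a console password: usually humans who belong in Identity Center."""
    return [r["user"] for r in _rows(report_csv) if r.get("password_enabled") == "true"]


def users_with_active_keys(report_csv: str) -> list[str]:
    """IAM users holding at least one active long-lived access key."""
    return [
        r["user"]
        for r in _rows(report_csv)
        if "true" in (r.get("access_key_1_active"), r.get("access_key_2_active"))
    ]


if __name__ == "__main__":
    sample = (
        "user,arn,password_enabled,access_key_1_active,access_key_2_active\n"
        "<root_account>,arn:aws:iam::123456789012:root,not_supported,false,false\n"
        "alice,arn:aws:iam::123456789012:user/alice,true,true,false\n"
        "ci-deploy,arn:aws:iam::123456789012:user/ci-deploy,false,true,false\n"
    )
    print(console_iam_users(sample))       # ['alice']
    print(users_with_active_keys(sample))  # ['alice', 'ci-deploy']
```

The first list is effectively the Identity Center migration backlog. The second list usually contains both human users and CI or service accounts, so it is worth triaging separately.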
Reliability
Recover from failures, manage demand changes, mitigate disruptions.
What we check for
- Auto-scaling groups + load balancing for the stateless tier
- Multi-AZ deployment for stateful services (RDS, ElastiCache, etc.)
- Cross-region failover tested for tier-1 workloads
- Backup + restore procedures tested quarterly, not just configured
- Documented RTO/RPO per workload, with the architecture matching the requirements
Typical gap at mid-market
Backups that have never been restore-tested. We've found organizations whose 'backup strategy' was a CRON job that broke 18 months ago and nobody noticed because nobody checked. Restore-test cadence is the single most important reliability check.
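Once each workload's numbers are written down, the restore-test and RPO checks reduce to simple comparisons. Here is a minimal sketch. It assumes a hypothetical inventory record per workload that holds the documented RPO, the backup interval, and the date of the last successful restore test. The 90-day limit mirrors the quarterly cadence above. RTO would require a measured restore duration, so this sketch leaves it out.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

RESTORE_TEST_MAX_AGE = timedelta(days=90)  # quarterly restore-test cadence


@dataclass
class Workload:
    name: str
    rpo_hours: float                   # documented recovery point objective
    backup_interval_hours: float       # how often a recoverable backup is taken
    last_restore_test: Optional[date]  # None means never restore-tested


def reliability_findings(workloads: list[Workload], today: date) -> list[str]:
    findings = []
    for w in workloads:
        # Worst-case data loss is roughly one backup interval, so it must not exceed the RPO.
        if w.backup_interval_hours > w.rpo_hours:
            findings.append(
                f"{w.name}: backups every {w.backup_interval_hours}h cannot meet a {w.rpo_hours}h RPO"
            )
        if w.last_restore_test is None:
            findings.append(f"{w.name}: never restore-tested")
        elif today - w.last_restore_test > RESTORE_TEST_MAX_AGE:
            age = (today - w.last_restore_test).days
            findings.append(f"{w.name}: last restore test {age} days ago")
    return findings


if __name__ == "__main__":
    inventory = [
        Workload("orders-db", rpo_hours=1, backup_interval_hours=24, last_restore_test=date(2024, 1, 1)),
        Workload("reports", rpo_hours=24, backup_interval_hours=24, last_restore_test=None),
        Workload("api-cache", rpo_hours=4, backup_interval_hours=1, last_restore_test=date(2024, 5, 1)),
    ]
    for finding in reliability_findings(inventory, today=date(2024, 6, 1)):
        print(finding)
```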
Performance Efficiency
Use computing resources efficiently to meet system requirements.
What we check for
- Right-sized compute instances reviewed quarterly via Compute Optimizer
- Caching layers (ElastiCache, CloudFront) where access patterns warrant
- Async patterns (SQS, SNS, EventBridge) for non-real-time work
- Database choice matched to workload (RDS vs DynamoDB vs Aurora Serverless)
- Performance baselines + load testing for production-bound features
Typical gap at mid-market
Over-provisioning is more common than under-provisioning at mid-market. Engineers default to the next-larger instance size 'to be safe.' Compute Optimizer typically finds 20-30% of instances over-sized.
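Compute Optimizer does the real analysis, but the core question is simple: is peak utilization well below what the instance provides? Here is a minimal sketch of that question as a heuristic. It is not Compute Optimizer's algorithm. It assumes p95 utilization figures already pulled from CloudWatch (memory metrics require the CloudWatch agent) and uses a hypothetical 40% threshold.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

DOWNSIZE_THRESHOLD_PCT = 40.0  # hypothetical: p95 below this suggests one size down would fit


@dataclass
class InstanceStats:
    instance_id: str
    instance_type: str
    p95_cpu_pct: float
    p95_mem_pct: Optional[float] = None  # None when memory metrics are not collected


def is_oversized(s: InstanceStats) -> bool:
    if s.p95_cpu_pct >= DOWNSIZE_THRESHOLD_PCT:
        return False
    # Without memory data the call rests on CPU alone, which is weaker evidence.
    return s.p95_mem_pct is None or s.p95_mem_pct < DOWNSIZE_THRESHOLD_PCT


def oversized_share(fleet: list[InstanceStats]) -> float:
    """Fraction of instances flagged as downsizing candidates."""
    if not fleet:
        return 0.0
    return sum(is_oversized(s) for s in fleet) / len(fleet)
```

The 40% figure follows from how instance sizes usually scale. In most families, one size down halves vCPU and memory, so a p95 of 40% would land near 80% after the move.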
Cost Optimization
Avoid unnecessary costs.
What we check for
- Cost Explorer + Budgets + Cost Anomaly Detection enabled with alerting
- Tagging policy enforced via SCPs (untagged resources get auto-deleted in non-prod)
- Compute Savings Plans (not legacy RIs) covering 60-80% of baseline
- S3 lifecycle policies on every long-retention bucket
- EBS volume reviews (gp2 → gp3, snapshot cleanup)
- Spot instances for batch / dev / non-critical workloads
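The EBS review is mostly list filtering. Here is a minimal sketch. It assumes volume records shaped like the `Volumes` entries from EC2 `DescribeVolumes` (`VolumeId`, `VolumeType`, `State`, and `Size` in GiB), and it produces three lists:

- gp2 volumes to migrate.
- Unattached volumes (state `available`) to clean up.
- Large gp2 volumes whose baseline IOPS exceed the gp3 default. gp2 provides 3 IOPS per GiB, so any gp2 volume over 1,000 GiB gets more than gp3's 3,000-IOPS baseline and may need provisioned IOPS after the switch.

```python
from __future__ import annotations

from typing import Any

GP2_IOPS_PER_GIB = 3      # gp2 baseline performance
GP3_BASELINE_IOPS = 3000  # gp3 baseline included in the price


def ebs_review(volumes: list[dict[str, Any]]) -> dict[str, Any]:
    gp2 = [v for v in volumes if v["VolumeType"] == "gp2"]
    unattached = [v for v in volumes if v["State"] == "available"]
    return {
        "migrate_to_gp3": [v["VolumeId"] for v in gp2],
        # gp2 volumes whose baseline IOPS would drop if moved to gp3 defaults
        "gp3_needs_iops_review": [
            v["VolumeId"] for v in gp2 if v["Size"] * GP2_IOPS_PER_GIB > GP3_BASELINE_IOPS
        ],
        "unattached": [v["VolumeId"] for v in unattached],
        "unattached_gib": sum(v["Size"] for v in unattached),
    }
```

The migration itself is typically an in-place `ModifyVolume` call (`modify_volume(VolumeId=..., VolumeType="gp3")` in boto3). It does not require detaching the volume.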
Typical gap at mid-market
Tagging discipline is the foundation of FinOps. Without consistent tags you cannot allocate costs to teams or workloads, which means you cannot make informed decisions. Most cost-optimization projects start with a tagging cleanup.
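Tag coverage is worth measuring in dollars rather than resource counts, because dollars decide whether costs can be allocated. Here is a minimal sketch. It assumes a hypothetical required-tag set (`team`, `workload`, `env`). It also assumes per-resource monthly costs have already been joined from billing data, which in practice comes from Cost Explorer or the Cost and Usage Report.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any

REQUIRED_TAGS = {"team", "workload", "env"}  # hypothetical tagging policy


def missing_tags(resource: dict[str, Any]) -> set[str]:
    return REQUIRED_TAGS - set(resource.get("tags", {}))


def allocatable_share(resources: list[dict[str, Any]]) -> float:
    """Fraction of spend on resources carrying every required tag."""
    total = sum(r["monthly_cost"] for r in resources)
    tagged = sum(r["monthly_cost"] for r in resources if not missing_tags(r))
    return tagged / total if total else 0.0


def cost_by_team(resources: list[dict[str, Any]]) -> dict[str, float]:
    """Roll spend up by the `team` tag; anything without one is unallocated."""
    totals: dict[str, float] = defaultdict(float)
    for r in resources:
        totals[r.get("tags", {}).get("team", "UNALLOCATED")] += r["monthly_cost"]
    return dict(totals)
```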
Sustainability
Minimize environmental impact, prefer efficient designs.
What we check for
- Region selection that prefers AWS regions with renewable energy commitments
- Right-sized infrastructure (under-utilized resources waste energy AND money)
- Modern instance families (Graviton processors are more energy-efficient)
- Lifecycle policies that delete unused resources rather than letting them idle
Typical gap at mid-market
Often de-prioritized in mid-market reviews. The work overlaps significantly with cost optimization: most sustainability wins are also cost wins. Worth flagging for ESG-reporting clients.
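Graviton adoption is a quick sustainability (and cost) metric to track between reviews. Here is a minimal sketch that estimates it from instance type names. It relies on the EC2 naming convention: a `g` in the attribute suffix after the generation number marks Graviton (`m7g`, `c6gn`), while a leading `g` is the GPU family (`g5`). This is a heuristic. The authoritative source is the `arm64` architecture reported by EC2 `DescribeInstanceTypes`.

```python
from __future__ import annotations

import re

# family letters, generation digits, then attribute letters (e.g. "c6gn" -> "c", "6", "gn")
_FAMILY_RE = re.compile(r"^([a-z]+)(\d+)([a-z-]*)$")


def is_graviton(instance_type: str) -> bool:
    family = instance_type.split(".", 1)[0]
    if family == "a1":
        return True  # first-generation Graviton, which predates the "g" suffix
    m = _FAMILY_RE.match(family)
    return bool(m) and "g" in m.group(3)


def graviton_share(instance_types: list[str]) -> float:
    """Fraction of the running fleet on Graviton, by instance count."""
    if not instance_types:
        return 0.0
    return sum(is_graviton(t) for t in instance_types) / len(instance_types)
```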
Reviewed every 90 days, not once at launch.
Most AWS environments fail their first Well-Architected review. That's the point: the review surfaces the work. Our job is to make the work tractable, ticketed, and prioritized so the next quarterly review is dramatically cleaner.
