AWS Well-Architected Framework · operator lens

Six pillars.
Reviewed every 90 days.

The Well-Architected Framework is AWS’s six-pillar checklist for cloud workloads. Most mid-market AWS shops run an internal review once at launch, then forget about it. Our cadence is quarterly, with concrete remediation tickets per gap. Below: what each pillar is, what we check for, and where mid-market shops consistently have gaps.

Pillar 1 of 6

Operational Excellence

Run and monitor systems to deliver business value, and continuously improve them.

What we check for

Infrastructure as Code (Terraform / CDK / CloudFormation) for everything in production
CI/CD pipelines with automated test gates, not manual deploy buttons
CloudWatch dashboards + alarms wired to your on-call rotation
Runbooks for the 5 most common incident types, tested quarterly
Post-incident reviews with blameless format and tracked action items

Typical gap at mid-market

The most common gap: 'we have monitoring' that turns out to be a default CloudWatch dashboard nobody looks at. The fix is wiring alarms to a paging system + naming an owner per service.
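A minimal sketch of how that alarm-wiring gap can be checked, assuming alarm records shaped like the `MetricAlarms` entries returned by CloudWatch's DescribeAlarms call (`AlarmName`, `ActionsEnabled`, `AlarmActions`). The `oncall-` SNS topic naming convention, the account ID, and the function name are hypothetical, not something the review prescribes.

```python
from typing import Iterable, List, Tuple


def alarms_without_paging(alarms: Iterable[dict], paging_arn_prefixes: Tuple[str, ...]) -> List[str]:
    """Return the names of alarms that would not page anyone when they fire."""
    unwired = []
    for alarm in alarms:
        actions = alarm.get("AlarmActions", [])
        enabled = alarm.get("ActionsEnabled", True)
        # An alarm only pages if actions are enabled AND at least one action
        # targets a paging topic (str.startswith accepts a tuple of prefixes).
        pages = any(action.startswith(paging_arn_prefixes) for action in actions)
        if not (enabled and pages):
            unwired.append(alarm["AlarmName"])
    return sorted(unwired)


# Hypothetical convention: paging topics are SNS topics named "oncall-<service>".
PAGING_TOPICS = ("arn:aws:sns:us-east-1:111122223333:oncall-",)

sample_alarms = [
    {"AlarmName": "api-5xx", "ActionsEnabled": True,
     "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:oncall-api"]},
    {"AlarmName": "db-cpu", "ActionsEnabled": True, "AlarmActions": []},
    {"AlarmName": "queue-depth", "ActionsEnabled": False,
     "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:oncall-api"]},
]

print(alarms_without_paging(sample_alarms, PAGING_TOPICS))
```

In a real account the input would come from the CloudWatch API rather than a hard-coded list; the check itself stays the same.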

Pillar 2 of 6

Security

Protect information, systems, and assets while delivering business value.

What we check for

IAM Identity Center (SSO) for all human access — no individual IAM users for engineers
Service Control Policies (SCPs) at OU level preventing destructive actions
GuardDuty + Security Hub + Config + Inspector enabled organization-wide
KMS-managed encryption everywhere (S3, RDS, EBS, Secrets Manager)
Block Public Access enabled by default at the S3 account level
AWS WAF + Shield Advanced for public-facing workloads
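To make the SCP item concrete, here is a small sketch of a guardrail policy built as a Python dictionary. The choice of denied actions and the `Sid` names are illustrative assumptions; a real policy would be tailored to the organization and attached at the OU level through AWS Organizations.

```python
import json


def build_guardrail_scp() -> dict:
    """Build an example SCP that denies a few hard-to-reverse actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Stop member accounts from detaching themselves from the org.
                "Sid": "DenyLeavingOrganization",
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            },
            {
                # Keep the audit trail running.
                "Sid": "DenyDisablingAuditTrail",
                "Effect": "Deny",
                "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
                "Resource": "*",
            },
            {
                # Keep threat detection in place.
                "Sid": "DenyDeletingGuardDuty",
                "Effect": "Deny",
                "Action": "guardduty:DeleteDetector",
                "Resource": "*",
            },
        ],
    }


policy = build_guardrail_scp()
print(json.dumps(policy, indent=2))
```

SCPs only restrict what member accounts can do; they grant no permissions by themselves, so a policy like this sits alongside normal IAM permissions rather than replacing them.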

Typical gap at mid-market

The IAM-sprawl gap is universal. Most mid-market AWS shops grew from 5 to 50 engineers without ever transitioning off IAM users to Identity Center. The transition is a one-week project that closes most of the audit findings.
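One way to size the IAM-sprawl gap before the migration is to count IAM users that still have a console password. The sketch below assumes input in the format of the IAM credential report CSV, trimmed here to four of its columns; the user names and account ID are invented for illustration.

```python
import csv
import io
from typing import List


def human_console_users(credential_report_csv: str) -> List[str]:
    """Return IAM users (excluding root) that have a console password enabled."""
    reader = csv.DictReader(io.StringIO(credential_report_csv))
    return sorted(
        row["user"]
        for row in reader
        # The root row reports "not_supported"; service users typically have no password.
        if row["user"] != "<root_account>" and row["password_enabled"] == "true"
    )


# Trimmed, made-up sample; the real report carries many more columns.
sample_report = """user,arn,password_enabled,mfa_active
<root_account>,arn:aws:iam::111122223333:root,not_supported,true
alice,arn:aws:iam::111122223333:user/alice,true,false
ci-deployer,arn:aws:iam::111122223333:user/ci-deployer,false,false
bob,arn:aws:iam::111122223333:user/bob,true,true
"""

print(human_console_users(sample_report))
```

Each name in the result is a candidate for moving to Identity Center; users without a password are more likely automation identities that need a separate plan.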

Pillar 3 of 6

Reliability

Recover from failures, manage demand changes, mitigate disruptions.

What we check for

Auto-scaling groups + load balancing for stateless tier
Multi-AZ deployment for stateful services (RDS, ElastiCache, etc.)
Cross-region failover tested for tier-1 workloads
Backup + restore procedures tested quarterly, not just configured
Documented RTO/RPO per workload, with the architecture matching the requirements

Typical gap at mid-market

Backups that have never been restore-tested. We've found organizations whose 'backup strategy' was a cron job that broke 18 months ago and nobody noticed because nobody checked. Restore-test cadence is the single most important reliability check.
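A restore-test cadence is easy to track mechanically. The following sketch flags workloads whose last successful restore test is older than 90 days, or that have never been tested. The workload names, the dates, and the idea of keeping this record in a simple mapping are assumptions for illustration.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Quarterly cadence expressed as a maximum age for the last restore test.
MAX_AGE = timedelta(days=90)


def stale_restore_tests(last_tested: Dict[str, Optional[date]], today: date) -> List[str]:
    """Return workloads with no restore test, or one older than MAX_AGE."""
    stale = []
    for workload, tested_on in last_tested.items():
        if tested_on is None or today - tested_on > MAX_AGE:
            stale.append(workload)
    return sorted(stale)


# Hypothetical record of the last successful restore test per workload.
sample_record = {
    "orders-db": date(2024, 5, 20),
    "billing-db": date(2022, 12, 1),
    "search-index": None,  # backups configured, never restored
}

print(stale_restore_tests(sample_record, today=date(2024, 6, 30)))
```

Anything the function returns would become a remediation ticket: schedule a restore, time it, and compare the result against the workload's documented RTO/RPO.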

Pillar 4 of 6

Performance Efficiency

Use computing resources efficiently to meet system requirements.

What we check for

Right-sized compute instances reviewed quarterly via Compute Optimizer
Caching layers (ElastiCache, CloudFront) where access patterns warrant
Async patterns (SQS, SNS, EventBridge) for non-real-time work
Database choice matched to workload (RDS vs DynamoDB vs Aurora Serverless)
Performance baselines + load testing for production-bound features

Typical gap at mid-market

Over-provisioning is more common than under-provisioning at mid-market. Engineers default to the next-larger instance size 'to be safe.' Compute Optimizer typically finds 20-30% of instances over-sized.
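The over-sized share can be computed directly from Compute Optimizer output. This sketch assumes each recommendation is a dictionary with a `finding` field; because the exact label format can differ between the API and exported reports, the comparison ignores case and underscores. The instance IDs and the helper name are hypothetical.

```python
from typing import List


def overprovisioned_share(recommendations: List[dict]) -> float:
    """Return the fraction of instances flagged as over-provisioned."""
    if not recommendations:
        return 0.0

    def is_over(rec: dict) -> bool:
        # Normalize "Overprovisioned" / "OVER_PROVISIONED" style labels.
        label = str(rec.get("finding", "")).replace("_", "").lower()
        return label == "overprovisioned"

    return sum(1 for rec in recommendations if is_over(rec)) / len(recommendations)


# Made-up sample of four instances.
sample_recommendations = [
    {"instanceId": "i-0aaa", "finding": "Overprovisioned"},
    {"instanceId": "i-0bbb", "finding": "Optimized"},
    {"instanceId": "i-0ccc", "finding": "Underprovisioned"},
    {"instanceId": "i-0ddd", "finding": "Optimized"},
]

share = overprovisioned_share(sample_recommendations)
print(f"{share:.0%} of instances are over-provisioned")
```

Tracking this number from one quarterly review to the next shows whether right-sizing tickets are actually being closed.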

Pillar 5 of 6

Cost Optimization

Avoid unnecessary costs.

What we check for

Cost Explorer + Budgets + Cost Anomaly Detection enabled with alerting
Tagging policy enforced via SCPs, which deny creating resources without the required tags (untagged resources get auto-deleted in non-prod)
Compute Savings Plans (not legacy RIs) covering 60-80% of baseline
S3 lifecycle policies on every long-retention bucket
EBS volume reviews (gp2 → gp3, snapshot cleanup)
Spot instances for batch / dev / non-critical workloads
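As an illustration of the S3 lifecycle item, the sketch below builds a lifecycle configuration in the dictionary shape that S3's PutBucketLifecycleConfiguration call accepts. The 30/90/365-day thresholds, the rule ID, and the function name are assumptions chosen for the example, not recommended retention periods.

```python
def build_lifecycle_configuration(ia_after_days: int = 30,
                                  glacier_after_days: int = 90,
                                  expire_after_days: int = 365) -> dict:
    """Build a tiering-then-expiry lifecycle rule for a long-retention bucket."""
    # S3 does not allow transitions to STANDARD_IA before day 30, and each
    # later step has to come after the previous one.
    if not (30 <= ia_after_days < glacier_after_days < expire_after_days):
        raise ValueError("expected 30 <= ia < glacier < expire")
    return {
        "Rules": [
            {
                "ID": "tier-then-expire",   # hypothetical rule name
                "Filter": {"Prefix": ""},   # empty prefix applies to the whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
                # Also clean up abandoned multipart uploads, which accrue cost silently.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    }


lifecycle = build_lifecycle_configuration()
print(lifecycle["Rules"][0]["Transitions"])
```

The expiry threshold in particular should come from the data-retention requirements of the workload, since expired objects are deleted.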

Typical gap at mid-market

Tagging discipline is the foundation of FinOps. Without consistent tags you cannot allocate costs to teams or workloads, which means you cannot make informed decisions. Most cost-optimization projects start with a tagging cleanup.
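A tagging cleanup usually starts with a gap report. The sketch below assumes resource records shaped like the `ResourceTagMappingList` entries from the Resource Groups Tagging API (`ResourceARN` plus a list of `Key`/`Value` tags). The required tag keys and the sample ARNs are hypothetical; each organization defines its own set.

```python
from typing import Dict, Iterable, List

# Hypothetical required tag keys; tag keys are case-sensitive in AWS.
REQUIRED_TAGS = {"team", "workload", "environment"}


def missing_tags(resources: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each non-compliant resource ARN to the required tags it lacks."""
    report = {}
    for resource in resources:
        present = {tag["Key"] for tag in resource.get("Tags", [])}
        missing = REQUIRED_TAGS - present
        if missing:
            report[resource["ResourceARN"]] = sorted(missing)
    return report


# Made-up sample resources.
sample_resources = [
    {"ResourceARN": "arn:aws:s3:::reports-archive",
     "Tags": [{"Key": "team", "Value": "data"},
              {"Key": "workload", "Value": "reporting"},
              {"Key": "environment", "Value": "prod"}]},
    {"ResourceARN": "arn:aws:ec2:us-east-1:111122223333:instance/i-0abc",
     "Tags": [{"Key": "team", "Value": "platform"}]},
    {"ResourceARN": "arn:aws:sqs:us-east-1:111122223333:jobs", "Tags": []},
]

for arn, gaps in missing_tags(sample_resources).items():
    print(arn, "is missing", gaps)
```

Resources that appear in the report cannot be allocated to a team or workload, so they are the natural first batch of cleanup tickets.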

Pillar 6 of 6

Sustainability

Minimize environmental impact, prefer efficient designs.

What we check for

Region selection that prefers AWS regions with renewable energy commitment
Right-sized infrastructure (under-utilized resources waste energy AND money)
Modern instance families (Graviton processors are more energy-efficient)
Lifecycle policies that delete unused resources rather than letting them idle

Typical gap at mid-market

Often de-prioritized in mid-market reviews. The work overlaps significantly with cost optimization — most sustainability wins are also cost wins. Worth flagging for ESG-reporting clients.

Reviewed every 90 days, not once at launch.

Most AWS environments fail their first Well-Architected review. That’s the point: the review surfaces the work. Our job is to make the work tractable, ticketed, and prioritized so the second review 90 days later is dramatically cleaner.

