Kootechnikel
AWS Well-Architected Framework · operator lens

Six pillars.
Reviewed every 90 days.

The Well-Architected Framework is AWS's six-pillar checklist for cloud workloads. Most mid-market AWS shops run an internal review once at launch, then forget about it. Our cadence is quarterly, with concrete remediation tickets per gap. Below: what each pillar is, what we check for, and where mid-market shops consistently have gaps.

Pillar 1 of 6

Operational Excellence

Run and monitor systems to deliver business value, continuously improve.

What we check for

  • Infrastructure as Code (Terraform / CDK / CloudFormation) for everything in production
  • CI/CD pipelines with automated test gates, not manual deploy buttons
  • CloudWatch dashboards + alarms wired to your on-call rotation
  • Runbooks for the 5 most common incident types, tested quarterly
  • Post-incident reviews with blameless format and tracked action items

Typical gap at mid-market

The most common gap: 'we have monitoring' that turns out to be a default CloudWatch dashboard nobody looks at. The fix is wiring alarms to a paging system + naming an owner per service.
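
A minimal sketch of that check, assuming alarm records shaped like the `MetricAlarms` entries returned by CloudWatch's `DescribeAlarms` API (`AlarmName`, `ActionsEnabled`, `AlarmActions`). The owner map, the function name, and the sample alarms are hypothetical; in practice ownership might come from tags or a service catalog.

```python
from typing import Dict, List


def audit_alarms(alarms: List[dict], owners: Dict[str, str]) -> List[str]:
    """Return one finding per alarm that cannot page anyone or has no owner.

    `alarms` uses the AlarmName / ActionsEnabled / AlarmActions keys from
    CloudWatch DescribeAlarms. `owners` is a hypothetical alarm-name -> owner
    map; it is an assumption, not something CloudWatch provides.
    """
    findings = []
    for alarm in alarms:
        name = alarm["AlarmName"]
        # An alarm with actions disabled or an empty action list notifies nobody.
        if not alarm.get("ActionsEnabled", False) or not alarm.get("AlarmActions"):
            findings.append(f"{name}: no alarm action, nobody gets paged")
        if not owners.get(name):
            findings.append(f"{name}: no named owner")
    return findings


# Hypothetical sample data for illustration only.
sample_alarms = [
    {"AlarmName": "api-5xx", "ActionsEnabled": True,
     "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:oncall-page"]},
    {"AlarmName": "db-cpu", "ActionsEnabled": True, "AlarmActions": []},
]
sample_owners = {"api-5xx": "platform-team"}

for finding in audit_alarms(sample_alarms, sample_owners):
    print(finding)
```

With the sample data, only `db-cpu` is flagged, once for the missing action and once for the missing owner.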

Pillar 2 of 6

Security

Protect information, systems, and assets while delivering business value.

What we check for

  • IAM Identity Center (SSO) for all human access; no individual IAM users for engineers
  • Service Control Policies (SCPs) at OU level preventing destructive actions
  • GuardDuty + Security Hub + Config + Inspector enabled organization-wide
  • KMS-managed encryption everywhere (S3, RDS, EBS, Secrets Manager)
  • Block Public Access enabled by default at the S3 account level
  • AWS WAF + Shield Advanced for public-facing workloads

Typical gap at mid-market

The IAM-sprawl gap is universal. Most mid-market AWS shops grew from 5 to 50 engineers without ever transitioning off IAM users to Identity Center. The transition is a one-week project that closes most of the audit findings.
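
One way to size the migration is to count IAM users that still have a console password. The sketch below assumes the CSV layout of the IAM credential report (`user` and `password_enabled` columns, with a `<root_account>` row). Treating "has a console password" as a proxy for "human user" is our assumption, and the sample report is hypothetical and trimmed to a few columns.

```python
import csv
import io
from typing import List


def human_iam_users(credential_report_csv: str) -> List[str]:
    """List IAM users that still have a console password enabled.

    Assumption: a console password indicates a human login that should
    move to Identity Center. The root account row is skipped.
    """
    reader = csv.DictReader(io.StringIO(credential_report_csv))
    return [
        row["user"]
        for row in reader
        if row["user"] != "<root_account>" and row["password_enabled"] == "true"
    ]


# Hypothetical, trimmed-down report; a real one has many more columns.
sample_report = """user,password_enabled,mfa_active,access_key_1_active
<root_account>,not_supported,true,false
alice,true,false,true
ci-deployer,false,false,true
bob,true,true,false
"""

print(human_iam_users(sample_report))
```

Here `alice` and `bob` are the migration candidates; `ci-deployer` has no console password and would be reviewed separately as a machine identity.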

Pillar 3 of 6

Reliability

Recover from failures, manage demand changes, mitigate disruptions.

What we check for

  • Auto-scaling groups + load balancing for the stateless tier
  • Multi-AZ deployment for stateful services (RDS, ElastiCache, etc.)
  • Cross-region failover tested for tier-1 workloads
  • Backup + restore procedures tested quarterly, not just configured
  • Documented RTO/RPO per workload, with the architecture matching the requirements

Typical gap at mid-market

Backups that have never been restore-tested. We've found organizations whose 'backup strategy' was a cron job that broke 18 months ago and nobody noticed because nobody checked. Restore-test cadence is the single most important reliability check.
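
A restore-test cadence check can be very small. The sketch below is illustrative only: the `BackupRecord` type, the workload names, and the one-day backup threshold are hypothetical, and the 90-day restore-test window simply mirrors the quarterly cadence described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class BackupRecord:
    workload: str
    last_backup: Optional[date]        # last successful backup, if any
    last_restore_test: Optional[date]  # last successful restore test, if any


def stale_backups(
    records: List[BackupRecord],
    today: date,
    max_backup_age: timedelta = timedelta(days=1),   # hypothetical threshold
    max_test_age: timedelta = timedelta(days=90),    # quarterly restore test
) -> List[str]:
    """Flag workloads whose backup or restore test is missing or too old."""
    findings = []
    for r in records:
        if r.last_backup is None or today - r.last_backup > max_backup_age:
            findings.append(f"{r.workload}: no recent successful backup")
        if r.last_restore_test is None or today - r.last_restore_test > max_test_age:
            findings.append(
                f"{r.workload}: restore not tested in the last {max_test_age.days} days"
            )
    return findings


# Hypothetical sample data for illustration only.
sample_records = [
    BackupRecord("orders-db", date(2024, 5, 31), date(2024, 4, 15)),
    BackupRecord("reports-db", date(2022, 12, 1), None),
]

for finding in stale_backups(sample_records, today=date(2024, 6, 1)):
    print(finding)
```

With the sample data, `orders-db` passes and `reports-db` is flagged twice: its backup job stopped long ago and it has never been restore-tested.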

Pillar 4 of 6

Performance Efficiency

Use computing resources efficiently to meet system requirements.

What we check for

  • Right-sized compute instances reviewed quarterly via Compute Optimizer
  • Caching layers (ElastiCache, CloudFront) where access patterns warrant
  • Async patterns (SQS, SNS, EventBridge) for non-real-time work
  • Database choice matched to workload (RDS vs DynamoDB vs Aurora Serverless)
  • Performance baselines + load testing for production-bound features

Typical gap at mid-market

Over-provisioning is more common than under-provisioning at mid-market. Engineers default to the next-larger instance size 'to be safe.' Compute Optimizer typically finds 20-30% of instances over-sized.
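
To track that share between reviews, something like the sketch below works on an exported list of (instance ID, finding) pairs. The exact spelling of the finding labels in a Compute Optimizer export is an assumption here, so the code normalizes case and underscores before counting; the instance IDs and findings are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def normalize(finding: str) -> str:
    """Fold label variants such as 'Overprovisioned' and 'OVER_PROVISIONED'."""
    return finding.replace("_", "").lower()


def overprovisioned_share(findings: Iterable[Tuple[str, str]]) -> float:
    """Fraction of instances whose finding is over-provisioned (0.0 if empty)."""
    counts = Counter(normalize(finding) for _, finding in findings)
    total = sum(counts.values())
    return counts["overprovisioned"] / total if total else 0.0


# Hypothetical sample data for illustration only.
sample_findings = [
    ("i-0aaa", "Optimized"),
    ("i-0bbb", "OVER_PROVISIONED"),
    ("i-0ccc", "Optimized"),
    ("i-0ddd", "Underprovisioned"),
]

print(f"{overprovisioned_share(sample_findings):.0%} of instances over-provisioned")
```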

Pillar 5 of 6

Cost Optimization

Avoid unnecessary costs.

What we check for

  • Cost Explorer + Budgets + Cost Anomaly Detection enabled with alerting
  • Tagging policy enforced via SCPs that deny untagged resource creation (in non-prod, untagged resources get auto-deleted)
  • Compute Savings Plans (not legacy RIs) covering 60-80% of baseline
  • S3 lifecycle policies on every long-retention bucket
  • EBS volume reviews (gp2 → gp3, snapshot cleanup)
  • Spot instances for batch / dev / non-critical workloads

Typical gap at mid-market

Tagging discipline is the foundation of FinOps. Without consistent tags you cannot allocate costs to teams or workloads, which means you cannot make informed decisions. Most cost-optimization projects start with a tagging cleanup.
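
A tagging cleanup usually starts by measuring coverage. The sketch below is a rough illustration: the required tag keys, resource IDs, and function names are hypothetical, and keys are matched exactly because AWS tag keys are case-sensitive.

```python
from typing import Dict, Set

# Hypothetical tagging policy; substitute your own required keys.
REQUIRED_TAGS = {"team", "workload", "environment"}


def missing_tags(resource_tags: Dict[str, str],
                 required: Set[str] = REQUIRED_TAGS) -> Set[str]:
    """Required keys that are absent or have a blank value."""
    present = {key for key, value in resource_tags.items() if value.strip()}
    return {tag for tag in required if tag not in present}


def tag_coverage(resources: Dict[str, Dict[str, str]]) -> float:
    """Fraction of resources carrying every required tag (0.0 if empty)."""
    if not resources:
        return 0.0
    compliant = sum(1 for tags in resources.values() if not missing_tags(tags))
    return compliant / len(resources)


# Hypothetical sample data for illustration only.
sample_resources = {
    "i-0aaa": {"team": "payments", "workload": "checkout", "environment": "prod"},
    "i-0bbb": {"team": "payments"},
    "vol-0ccc": {},
}

print(f"coverage: {tag_coverage(sample_resources):.0%}")
for resource_id, tags in sample_resources.items():
    gaps = missing_tags(tags)
    if gaps:
        print(resource_id, "missing:", ", ".join(sorted(gaps)))
```

Only fully tagged resources can be allocated to a team or workload, so the coverage number is a reasonable first metric to put on a remediation ticket.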

Pillar 6 of 6

Sustainability

Minimize environmental impact, prefer efficient designs.

What we check for

  • Region selection that prefers AWS regions with renewable energy commitment
  • Right-sized infrastructure (under-utilized resources waste energy AND money)
  • Modern instance families (Graviton processors are more energy-efficient)
  • Lifecycle policies that delete unused resources rather than letting them idle

Typical gap at mid-market

Often de-prioritized in mid-market reviews. The work overlaps significantly with cost optimization; most sustainability wins are also cost wins. Worth flagging for ESG-reporting clients.

Reviewed every 90 days, not once at launch.

Most AWS environments fail their first Well-Architected review. That's the point: the review surfaces the work. Our job is to make the work tractable, ticketed, and prioritized so the second review, 90 days later, is dramatically cleaner.