πŸ‡¨πŸ‡¦VancouverπŸ‡¨πŸ‡¦TorontoπŸ‡ΊπŸ‡ΈLos AngelesπŸ‡ΊπŸ‡ΈOrlandoπŸ‡ΊπŸ‡ΈMiami
1-855-KOO-TECH
KootechnikelKootechnikel
AWS in production Β· mid-market patterns

What actually goes wrong on AWS.
And what makes it go right.

Six anonymized failure patterns we've seen at mid-market shops on AWS, and three wins that show what good looks like. The patterns repeat: single-account blast radius, untagged spend, public S3, IAM sprawl, RI lock-in, single-region dependency. The fixes are well understood. The work is operational discipline.

Failures

Six AWS failure patterns we see at mid-market shops.

Each obliquely references a real engagement (or a composite of several). The cards are written for the buyer who suspects they're running into one of these and wants to see the fix. Where a fix is concrete enough to sketch in code, a short illustrative example follows the list.

  • AWS · architecture

    The single-account blast radius

    The mid-market SaaS that ran prod, staging, and the founder's experiments in one AWS account, and lost a week of customer data when an IAM policy was broadened too far during a Friday deploy.

    The lesson. Multi-account architecture from day one. Production isolated from staging, staging isolated from sandbox. SCPs at the OU level prevent the single-policy-edit blast radius. (A guardrail sketch follows this list.)

  • AWS · cost

    The $40,000 weekend

    The startup whose untagged Lambda function ran in an infinite loop over a long weekend, generating an AWS bill the founders had to explain to the CFO at 2am on a Tuesday.

    The lesson. Budget alerts at the account level, hard alarm thresholds, and cost anomaly detection enabled before the first deploy. Tagging is a Day-1 requirement, not a future cleanup project. (The budget and anomaly wiring is sketched after this list.)

  • AWS · data exposure

    The S3 bucket that wasn't supposed to be public

    Pick a year between 2017 and 2026 β€” every year has a marquee S3-misconfiguration breach. The pattern: a bucket policy or ACL that 'temporarily' allowed public access, never reverted, and got indexed by a search engine.

    The lesson. Block Public Access enabled at the account level by default. IAM Access Analyzer running (Access Analyzer for S3 in the console). CSPM tooling (AWS Security Hub, Wiz, Prisma Cloud) catching policy drift in CI/CD before the deploy lands. (The account-level lockdown is sketched after this list.)

  • AWS · permissions sprawl

    The 47 IAM users with AdministratorAccess

    The mid-market company that grew from 5 to 50 engineers, granting AdministratorAccess to everyone for speed, then discovering during the SOC 2 audit that 47 active human IAM users could delete the entire production environment.

    The lesson. IAM Identity Center (formerly AWS SSO) for human access. Permission boundaries on every role. Quarterly access reviews. The transition from IAM users to Identity Center + role assumption is the single biggest IAM win for mid-market AWS shops. (An audit sketch that finds these users follows this list.)

  • AWS · cost

    The Reserved Instance debt

    The company that committed to 3-year RIs in 2022 for an instance family they migrated off in 2024, leaving them paying until late 2025 for compute they no longer use.

    The lesson. Compute Savings Plans (flexible across instance families) instead of family-locked RIs. Quarterly portfolio rebalancing. Treat AWS commitments as a financial portfolio, not a one-time procurement.

  • AWS · vendor concentration

    The us-east-1 dependency

    The SaaS with all production workloads in a single AWS region that lost 14 hours of revenue during the AWS regional incident in late 2024, because cross-region failover had been on the roadmap for two years.

    The lesson. For tier-1 workloads: cross-region failover tested quarterly. For everything else: at minimum a documented degradation runbook for AWS-region incidents. Single-region dependency is a debt, not a default. (A DNS failover sketch follows this list.)
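
Sketches for the lessons above. These are illustrative Python (boto3), not drop-in automation: account ids, OU ids, emails, and thresholds are placeholders, and every call assumes credentials with the right permissions.

For the single-account blast radius: a minimal OU-level SCP that denies the destructive edits a single policy change could otherwise make. The OU id and the exact deny list are assumptions; adapt both to your organization.

    import json
    import boto3

    org = boto3.client("organizations")

    # Hypothetical guardrail: member accounts in the prod OU cannot turn off
    # logging or detection, and cannot loosen account-level S3 public access.
    scp = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "ProtectAuditAndGuardrails",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "guardduty:DeleteDetector",
                "s3:PutAccountPublicAccessBlock",
            ],
            "Resource": "*",
        }],
    }

    policy = org.create_policy(
        Name="prod-guardrails",
        Description="Deny destructive logging/detection changes in prod",
        Type="SERVICE_CONTROL_POLICY",
        Content=json.dumps(scp),
    )
    org.attach_policy(
        PolicyId=policy["Policy"]["PolicySummary"]["Id"],
        TargetId="ou-xxxx-placeholder",  # placeholder prod OU id
    )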
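
For the $40,000 weekend: a sketch of the budget alarm plus cost anomaly detection, with a placeholder monthly limit, impact threshold, and email.

    import boto3

    ACCOUNT_ID = "111111111111"          # placeholder
    ALERT_EMAIL = "finops@example.com"   # placeholder

    # Hard monthly ceiling with an alarm at 80% of the limit.
    boto3.client("budgets").create_budget(
        AccountId=ACCOUNT_ID,
        Budget={
            "BudgetName": "monthly-ceiling",
            "BudgetLimit": {"Amount": "5000", "Unit": "USD"},  # placeholder
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[{
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": ALERT_EMAIL}],
        }],
    )

    # Per-service anomaly detection with a daily digest for impacts over $100.
    ce = boto3.client("ce")
    monitor = ce.create_anomaly_monitor(AnomalyMonitor={
        "MonitorName": "per-service",
        "MonitorType": "DIMENSIONAL",
        "MonitorDimension": "SERVICE",
    })
    ce.create_anomaly_subscription(AnomalySubscription={
        "SubscriptionName": "daily-anomaly-digest",
        "MonitorArnList": [monitor["MonitorArn"]],
        "Subscribers": [{"Type": "EMAIL", "Address": ALERT_EMAIL}],
        "Frequency": "DAILY",
        "ThresholdExpression": {"Dimensions": {
            "Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE",
            "Values": ["100"],
            "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
        }},
    })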
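
For the public S3 bucket: account-level Block Public Access plus an IAM Access Analyzer, both one-call setups.

    import boto3

    ACCOUNT_ID = "111111111111"  # placeholder

    # All four account-level Block Public Access settings on: a 'temporary'
    # bucket ACL or policy can no longer make data public.
    boto3.client("s3control").put_public_access_block(
        AccountId=ACCOUNT_ID,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

    # IAM Access Analyzer flags resource policies that grant outside access.
    boto3.client("accessanalyzer").create_analyzer(
        analyzerName="account-analyzer",
        type="ACCOUNT",
    )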
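
For IAM sprawl: the audit that surfaces the 47-admins problem. This checks directly attached policies only; a real quarterly review also walks group memberships and inline policies.

    import boto3

    ADMIN_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"
    iam = boto3.client("iam")

    admins = []
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            attached = iam.list_attached_user_policies(UserName=user["UserName"])
            if any(p["PolicyArn"] == ADMIN_ARN for p in attached["AttachedPolicies"]):
                admins.append(user["UserName"])

    print(f"{len(admins)} IAM users with AdministratorAccess: {admins}")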
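
For the us-east-1 dependency: the DNS half of failover, a Route 53 primary/secondary record pair gated by a health check. The zone id, names, and health check id are placeholders, and DNS is the easy part; real failover also needs replicated state and the quarterly test.

    import boto3

    r53 = boto3.client("route53")
    r53.change_resource_record_sets(
        HostedZoneId="ZPLACEHOLDER",
        ChangeBatch={"Changes": [
            # Primary answer, served only while its health check passes.
            {"Action": "UPSERT", "ResourceRecordSet": {
                "Name": "api.example.com", "Type": "CNAME", "TTL": 60,
                "SetIdentifier": "primary", "Failover": "PRIMARY",
                "HealthCheckId": "placeholder-health-check-id",
                "ResourceRecords": [{"Value": "api-primary.example.com"}],
            }},
            # Standby region takes over when the primary check fails.
            {"Action": "UPSERT", "ResourceRecordSet": {
                "Name": "api.example.com", "Type": "CNAME", "TTL": 60,
                "SetIdentifier": "secondary", "Failover": "SECONDARY",
                "ResourceRecords": [{"Value": "api-standby.example.com"}],
            }},
        ]},
    )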

Wins

Three wins that show what good looks like.

The wins below come from real engagements, anonymized. The pattern is consistent: small operational investments unlock large outcomes when the foundations are right.

  • AWS engagement

    Multi-account landing zone deployed in 2 weeks

    A 200-person fintech replaced its single-account AWS environment with a Control Tower landing zone (5 OUs, SCPs, GuardDuty org-wide) in 14 working days β€” closing the SOC 2 finding that had blocked their next round.

    What made it work. AWS Control Tower + Account Factory + Customizations for Control Tower (CfCT) makes landing zones a 2-3 week project, not a 6-month one. The blocker is usually appetite, not capability.

  • AWS engagement

    22% AWS bill reduction in one quarter

    A mid-market healthcare SaaS cut their AWS bill 22% in one quarter β€” without touching application code β€” through Compute Savings Plans rebalancing, EBS gp2β†’gp3 migration, S3 lifecycle policies, and decommissioning 18 dormant resources.

    What made it work. First-quarter FinOps wins are almost always low-hanging fruit. Cost Explorer + Trusted Advisor + Compute Optimizer surface 80% of the opportunities. The work is operational discipline, not architecture. (The gp2→gp3 pass is sketched after this list.)

  • AWS engagement

    GuardDuty caught the foothold attempt at 2am

    A mid-market SaaS's GuardDuty deployment detected a credential-stuffing pattern against an exposed staging environment at 2am on a Tuesday. Automated response disabled the affected IAM user before any production data was touched.

    What made it work. GuardDuty + Security Hub + EventBridge wired to automated IAM response gives you 24/7 detection and response without staffing a 24/7 SOC. The configuration is a few hours of work and prevents the scenario nobody wants to be in at 2am. (The response Lambda is sketched after this list.)
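
Two of these wins are concrete enough to sketch, again in illustrative Python (boto3). First, the gp2→gp3 pass from the cost engagement: modify_volume converts in place, and the dry-run flag lets you review the list before acting. Run it per region.

    import boto3

    DRY_RUN = True  # flip after reviewing the printed list

    ec2 = boto3.client("ec2")
    for page in ec2.get_paginator("describe_volumes").paginate(
            Filters=[{"Name": "volume-type", "Values": ["gp2"]}]):
        for vol in page["Volumes"]:
            print(f"gp2 volume {vol['VolumeId']} ({vol['Size']} GiB)")
            if not DRY_RUN:
                # In-place migration; gp3 baseline matches gp2 performance for
                # most volume sizes at roughly 20% lower storage cost.
                ec2.modify_volume(VolumeId=vol["VolumeId"], VolumeType="gp3")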
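
Second, the shape of the 2am response: an EventBridge rule matches GuardDuty findings and invokes a Lambda like the one below, which deactivates the affected IAM user's access keys. The severity cutoff and the disable-keys response are choices, not the only sensible ones.

    import boto3

    iam = boto3.client("iam")

    def handler(event, context):
        """EventBridge target for GuardDuty findings (event detail is the finding)."""
        finding = event["detail"]
        if finding.get("severity", 0) < 7:  # cutoff is a tuning choice
            return
        details = finding.get("resource", {}).get("accessKeyDetails", {})
        user = details.get("userName")
        if not user or details.get("userType") != "IAMUser":
            return
        # Deactivate every access key for the user; a human re-enables after triage.
        for key in iam.list_access_keys(UserName=user)["AccessKeyMetadata"]:
            iam.update_access_key(UserName=user,
                                  AccessKeyId=key["AccessKeyId"],
                                  Status="Inactive")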

The pattern: AWS rewards operational discipline.

AWS gives you the building blocks to do almost anything. The failures above all share a missing foundation β€” multi-account architecture, IAM Identity Center, GuardDuty + Security Hub, tagging discipline, FinOps cadence. The wins all started with getting those foundations right.