Learning from the AWS Outage: Designing Cloud Strategies That Withstand Failure
October 27, 2025
Earlier this week, a major Amazon Web Services (AWS) outage rippled across the UK and beyond, leaving organisations unable to log in, connect, or carry out basic business functions. It was a reminder that even the most trusted hyperscalers aren't invincible, and that resilience in the cloud needs to be designed, not assumed.
In a conversation following the outage, FourNet's infrastructure experts, Joe James and Robert Brown, discussed what resilience really means in today's cloud-first world, and how organisations can make sure their operations stay running even when the unexpected happens.
Understanding What Resilience Really Means
Resilience isn't just about having a disaster recovery plan written down somewhere. As Rob explains, it's the ability to keep business and technical functions running under pressure, whether that's from heavy data loads or a temporary loss of availability in a specific region.
It's not enough to assume continuity or to rely entirely on your provider's redundancy. True resilience starts with understanding exactly where your systems sit, how they interact, and where the potential weak points lie. That means asking the practical questions:
- Which platforms handle authentication?
- Where is your data physically stored?
- What happens if a particular region fails?
When AWS's US-EAST-1 region went down, many organisations discovered they didn't actually know the answers. "The key issue wasn't just the outage itself," Rob notes. "It was that people didn't understand how reliant they were on those zones."
The Real Risk in Cloud-First Strategies
For most organisations, cloud-first is no longer just an ambition; it's the backbone of their digital transformation. The scalability, flexibility and cost efficiency it brings are too valuable to ignore. But resilience doesn't come automatically with a move to the cloud.
As Rob puts it, "Cloud is just a data centre you don't own." Every system still needs careful architecture and ongoing review. The biggest risk is putting everything in one place, with one provider, and assuming that scale equals immunity. Vendor lock-in can quietly create a single point of failure. If every core function, from collaboration tools to payroll, sits with one cloud provider, even a short outage can have far-reaching consequences.
This isn't about stepping back from cloud adoption. It's about building a cloud environment that can adapt when one part fails, using multi-region design, hybrid connectivity, or cross-cloud services to keep operations steady.
The Hidden Dependencies That Outages Expose
This week's outage also revealed how modern systems depend on one another in ways that aren't always obvious. Even companies whose core applications were fine found themselves locked out because their identity provider was hosted on AWS. "They were blind," Rob explains. "They couldn't log in to monitor their platforms, even though everything was running under the hood."
Dependencies like authentication, APIs, or third-party integrations can easily become invisible vulnerabilities. Mapping them, testing them, and having fallback routes in place is what separates inconvenience from real disruption. A secondary login route or backup monitoring system can make all the difference when things go wrong.
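One way to make this kind of mapping concrete is to record dependencies as a simple graph and ask what a single failure would take with it. The sketch below is illustrative only: the service names and the `DEPENDS_ON` inventory are hypothetical, not a description of any real estate, and a real inventory would usually come from a CMDB or infrastructure-as-code rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical inventory: each service lists what it directly depends on.
DEPENDS_ON = {
    "monitoring-dashboard": ["identity-provider"],
    "payroll": ["identity-provider", "payroll-db"],
    "identity-provider": ["aws:us-east-1"],
    "payroll-db": ["private-dc"],
    "core-app": ["private-dc"],
}


def impacted_by(failed, depends_on):
    """Return every service that depends on `failed`, directly or transitively."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    impacted = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for service in dependents.get(node, ()):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return impacted


print(sorted(impacted_by("aws:us-east-1", DEPENDS_ON)))
# ['identity-provider', 'monitoring-dashboard', 'payroll']
```

In this made-up example the core application is untouched by the regional failure, yet the monitoring dashboard is lost because it logs in through an identity provider hosted in that region, which mirrors the "blind" scenario Rob describes.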
Practical Steps Towards Resilient Cloud Infrastructure
When the next outage hits, organisations that have prepared will recover faster and with less impact. Rob's advice is to start simple:
- Map what you have
- Identify your business-critical systems
- Establish where they live and how they connect
- Prioritise what really matters to keeping services running
Testing those plans is just as important as writing them. "Run tabletop exercises," Rob says. "Sit around a table and say, 'AWS has gone down. What have we lost?' Then work through the impact from there." That kind of review helps uncover hidden dependencies and highlight where backup plans need to be stronger.
From there, resilience comes down to design. Build systems that can "fail gracefully", not catastrophically. That might mean using multiple regions or providers, maintaining some on-premise recovery capability, or designing for quick restoration when services return.
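As a rough illustration of failing gracefully, the sketch below tries a list of sources in order and, if all of them fail, serves a cached value in a clearly labelled degraded mode instead of crashing. The function name `fetch_with_fallback`, the source names, and the two stand-in region functions are all assumptions made for this example; real failover would also need timeouts, health checks, and care over how stale the cached data is allowed to be.

```python
def fetch_with_fallback(sources, cached=None):
    """Try each (name, fetch) pair in order; fall back to `cached` if all fail.

    Returns a (value, served_by) tuple so callers can tell when they are
    running in a degraded mode.
    """
    errors = []
    for name, fetch in sources:
        try:
            return fetch(), name
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    if cached is not None:
        return cached, "cache (degraded)"
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Hypothetical stand-ins for calls to two regions or providers.
def primary_region():
    raise ConnectionError("region unavailable")


def secondary_region():
    return {"status": "ok"}


value, served_by = fetch_with_fallback(
    [("primary", primary_region), ("secondary", secondary_region)]
)
print(served_by)  # secondary
```

Returning the name of the source that answered is a deliberate choice here: it lets monitoring show that the service is up but running on its fallback route.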
How FourNet Builds for Continuity
At FourNet, resilience is built in from the start. "We design everything to be secure by design," Rob explains. "We map where services are deployed and how connectivity is maintained."
For customers who need the highest levels of availability, that means designing for five or six nines uptime, with dual links, redundant networks, and intelligent routing. "We always try to build that way," Rob adds, "but we also make sure it's proportionate and cost-effective."
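To put those figures in context, "five nines" means 99.999% availability and "six nines" means 99.9999%. The short calculation below converts each into a yearly downtime budget, and also shows why dual links help. It is a back-of-the-envelope sketch: it assumes a 365-day year, and the combined figure assumes the two links fail independently, which shared power, carriers, or routes can undermine in practice.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def allowed_downtime_seconds(nines):
    """Yearly downtime budget for an availability of `nines` nines."""
    return SECONDS_PER_YEAR * 10 ** -nines


def combined_availability(a, b):
    """Availability of two redundant links, assuming independent failures."""
    return 1 - (1 - a) * (1 - b)


for n in (5, 6):
    print(f"{n} nines: {allowed_downtime_seconds(n):.1f} seconds per year")
# Five nines allows about 5.26 minutes a year; six nines about 31.5 seconds.

# Two independent 99.9% links behave like roughly 99.9999% together.
print(f"{combined_availability(0.999, 0.999):.6f}")
```

The gap between the two targets is why proportionality matters: each extra nine cuts the downtime budget by a factor of ten, and the cost of achieving it tends to rise accordingly.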
Even internally, FourNet applies the same approach. Platforms are distributed across our own private data centres and cloud environments, ensuring that no single outage can take everything offline.
Planning for the Inevitable
The takeaway from the AWS incident isn't to question cloud strategies, but to strengthen them. As Rob summarises, "Don't hope the hyperscalers won't fail. Plan for the fact that they will."
Resilience isn't about retreating from the cloud; it's about knowing your dependencies, designing intelligently, and building the flexibility to recover quickly when something goes wrong. Outages are unavoidable, but disruption doesn't have to be.