Cloudflare, one of the most vital internet infrastructure firms serving tens of thousands of businesses, saw its services fail today, taking down many well-known sites, including X, Spotify, and the site you are visiting now.
After several hours of disruption, Cloudflare issued a statement saying the outage was not caused by an intrusion or hack but by something more mundane: a bug in its bot management feature that caused the module to balloon in size, bringing the whole kit and caboodle down with it. A rollback to a previous version fixed the immediate problem, and engineers have apparently modified the service so that a similar event would not crater the internet again.
The entire fiasco highlighted the fragility of key services that everyone takes for granted. Cloudflare would arguably have been better served by a contingency plan that provided a backup service.
Thomas Gillan, CEO at BR-DGE, shared his thoughts with CI, saying the outage is another reminder of how calamitous a fault in the internet can be, as a single point of failure can tear rapidly through global systems.
“When a core infrastructure provider goes down, everything connected to it feels the impact. When an outage like this occurs, it’s not just a single site going offline, but potentially all the dependent services, from checkout pages to payment APIs and token services, that fail together. For merchants that rely on a single payment provider plus a single hosting or edge layer, the risk compounds: an infrastructure outage can cascade into a payments disruption, revenue loss and stalled expansion,” said Gillan.
He added that BR-DGE's research indicates that 92% of enterprise ecommerce merchants have suffered payments outages in the past year, losing millions of pounds in the process.
“Events like today underline why businesses can’t afford to rely on one provider or one route to keep revenue flowing. Resilience isn’t a ‘nice to have’, it’s a strategic necessity.”
He advises that merchants need infrastructure that can reroute traffic automatically, preventing a technical issue anywhere in the chain “from becoming a business-critical outage.”
Perhaps Cloudflare could consider offering a service like this.
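For readers curious what automatic rerouting might look like in practice, here is a minimal sketch in Python. It is not BR-DGE's or Cloudflare's implementation; the provider functions and names (`primary_provider`, `backup_provider`, `route_with_failover`) are hypothetical and stand in for real payment or edge services. The idea is simply to try each route in order and fall through to the next when one fails.

```python
from typing import Callable, List, Tuple


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def route_with_failover(
    request: str,
    providers: List[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, response) from the first that succeeds."""
    errors = []
    for name, send in providers:
        try:
            return name, send(request)
        except Exception as exc:  # any failure triggers a reroute to the next provider
            errors.append(f"{name}: {exc}")
    # Only reached if no provider could handle the request
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical providers for illustration only
def primary_provider(request: str) -> str:
    # Simulates the kind of upstream outage described in the article
    raise ConnectionError("edge layer unavailable")


def backup_provider(request: str) -> str:
    return f"processed {request}"


if __name__ == "__main__":
    used, response = route_with_failover(
        "order-123",
        [("primary", primary_provider), ("backup", backup_provider)],
    )
    print(f"Handled by {used}: {response}")
```

A production setup would likely add health checks, timeouts and retry limits, but the principle is the same: no single provider sits alone on the path between a customer and a completed transaction.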