
A routine security measure turned into a significant service disruption when a Cloudflare employee accidentally disabled the entire R2 Gateway service while attempting to block a phishing URL. The incident lasted 59 minutes yesterday, from 08:10 to 09:09 UTC, and affected multiple Cloudflare services.
Impact Overview:
– R2 Object Storage: Complete service failure
– Stream service: 100% failure in video operations
– Images service: Total disruption in image handling
– Cache Reserve: Complete operational failure
– Vectorize: 75% query failures, 100% failure in data operations
– Log Delivery: Up to 13.6% data loss for R2-related logs
– Key Transparency Auditor: Complete service disruption
Secondary Effects:
– Durable Objects: 0.09% error rate increase
– Cache Purge: 1.8% error spike and increased latency
– Workers & Pages: Minor deployment issues for R2-linked projects
Root Cause Analysis:
The outage stemmed from human error combined with insufficient system safeguards. Instead of blocking a specific phishing endpoint, the entire R2 Gateway service was mistakenly disabled during abuse remediation.
Remediation Steps:
Cloudflare has implemented immediate fixes and outlined further measures, including:
– Removing system shutdown capabilities from the abuse review interface
– Restricting service disablement through the Admin API
– Planning improved account provisioning
– Implementing stricter access controls
– Establishing two-party approval for high-risk actions
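The remediation list describes policy controls rather than code, but the core idea (keeping a targeted block from escalating into a service-wide shutdown) can be illustrated as a guard. The following is a minimal, hypothetical Python sketch, not Cloudflare's implementation: `Scope`, `RemediationRequest`, `approve`, and `execute` are invented names, and it assumes every action carries an explicit scope and that the caller knows whether the request originates from the abuse review interface.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    """Hypothetical blast-radius levels for an abuse-remediation action."""
    ENDPOINT = "endpoint"  # block a single URL or path
    SERVICE = "service"    # disable an entire product, such as a gateway


class PolicyError(Exception):
    """Raised when an action violates the (hypothetical) safety policy."""


@dataclass
class RemediationRequest:
    target: str
    scope: Scope
    requested_by: str
    approvals: set = field(default_factory=set)


def approve(req: RemediationRequest, approver: str) -> None:
    # Two-party approval: the requester cannot serve as their own second party.
    if approver == req.requested_by:
        raise PolicyError("requester cannot approve their own action")
    req.approvals.add(approver)


def execute(req: RemediationRequest, from_abuse_ui: bool) -> str:
    if req.scope is Scope.SERVICE:
        # Service-wide shutdown is simply not offered from the abuse review tool.
        if from_abuse_ui:
            raise PolicyError("service disablement is unavailable from the abuse review UI")
        # High-risk action: require at least one approver besides the requester.
        if not req.approvals:
            raise PolicyError("service disablement requires a second approver")
        return f"disabled service {req.target}"
    # Endpoint-scoped blocks are low risk and proceed directly.
    return f"blocked endpoint {req.target}"


if __name__ == "__main__":
    phish = RemediationRequest("https://example.com/phish", Scope.ENDPOINT, "alice")
    print(execute(phish, from_abuse_ui=True))
```

In this sketch, a reviewer blocking a phishing URL goes through the endpoint path unimpeded, while any attempt to disable a whole service from the abuse tool fails outright, and a service-level action from elsewhere still needs a distinct second approver.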
This incident follows a previous outage in November 2024 that resulted in significant log data loss, highlighting the need for robust safety measures in system operations.