On December 11, 2024, between 4:30 PM and 7:00 PM UTC, a single customer experienced a service disruption affecting their internal traffic routing to Adobe Experience Manager. The incident was caused by a configuration change in our content delivery infrastructure that interacted unexpectedly with the customer's network security setup (zScaler). The issue was identified and resolved, with no impact to other Adobe Experience Manager customers.
Other zScaler customers were not affected by the issue.
The incident began at 4:29 PM UTC when our team implemented a configuration change to enable optional mTLS on the Edge Delivery Services CDN. At 4:52 PM UTC, Adobe observed a service disruption and informed the affected customer. An incident response team was formed from 5:40 PM UTC and began a thorough investigation.
At 7:05 PM UTC the customer disabled zScaler’s traffic inspection and while traffic resumed, no root cause has been identified yet. At 7:43 PM UTC the initial configuration change on mTLS was rolled back, but root cause analysis was not yet completed.
After analysis, at 9:45 PM UTC, the team identified the root cause as an unexpected interaction between our new mTLS configuration and the customer's zScaler security setup. We then worked with the customer to verify service restoration, confirmed at 9:50 PM UTC.
For 2.5 hours, Edge Delivery Services experienced a global traffic drop of about 1%. This was not due to service unavailability or degraded performance, but a single customer ceasing requests due to internal network configuration blocking.
Enabling Mutual Transport Layer Security (mTLS) authentication on our Edge Delivery Services (aem.live) infrastructure triggered the incident.
Although the CDN didn’t enforce mutual TLS, only clients with a client certificate would interact with the new security layer. However, this created an unexpected interaction with zScaler’s traffic inspection that cannot inspect traffic using mTLS, even if unenforced. This limitation was not documented in zScaler’s architecture guide to TLS inspection and unknown to the Adobe team.
An unconventional setup combined zScaler inspection and enforcement, an on-site HTTP server proxying Edge Delivery Services, and a little-known mTLS support limitation. This led to a severe impact for internal users of a single customer and localized sites using the same HTTP server as a translation service origin.
Our priority was establishing a joint incident response team with Adobe engineers and the customer's network operations staff. This collaboration was crucial for quick problem resolution.
We identified two potential paths to restore service. The customer's team temporarily disabled their TLS inspection - a security feature that checks data for threats as it passes through their network - as a workaround, while our infrastructure team prepared to roll back the mTLS configuration changes. To minimize downtime, we pursued both in parallel.
After implementing these changes, we conducted thorough verification testing with the customer's team. This included monitoring traffic patterns and confirming functionality across all affected services until we confirmed full service restoration.
A simultaneous outage of our Single-Sign-On (SSO) infrastructure hindered our response and analysis of the incident.
We have identified the following key action items to prevent similar incidents and improve our response capabilities:
Monitoring and alerting improvements
Documentation and change management
Security and access management
Technical remediation
Communication and process improvements
We’ve assigned owners and deadlines to each action item to ensure timely completion of these improvements.