Reason for Outage
Incident Date: 28th June 2022
Incident Time: 09:46 AM - 10:48 AM
Services Affected
Inbound and outbound calls made or received through the CircleLoop web applications.
Impact
Inbound and outbound call failures for the majority of CircleLoop customers using the
Windows/Mac and mobile applications. SIP devices were unaffected.
Notification
The first sign of an issue was an alert from the CircleLoop monitoring system at 9:48 AM.
At this point, calls were functioning intermittently. From 10:05 AM, the CircleLoop
Operations team and customers reported being unable to make or receive calls using the
applications; by this time, calls were consistently failing to connect.
Diagnosis & Cause
Upon investigation, it was determined that two components of the CircleLoop platform
were unhealthy, with their application processes continually restarting and failing.
The root cause was a routine change deployed to the CircleLoop platform. This change had
the unforeseen consequence of causing an error in the Live Services component of the
platform, which began returning 400 responses to all requests.
SIP device users were unaffected as they do not use Live Services in their call flows.
Resolution
Several attempts were made to restore service while the incident was in progress. The
first was a reboot of the Live Services component, which initially appeared successful,
but the component quickly reverted to an unhealthy state.
The issue was resolved by restoring the previous configuration, bringing Live Services
back to a healthy state.
Mitigation
The logic in Live Services has since been made more robust and has been tested to confirm
that it handles both expected and unexpected errors gracefully, so that this issue does not recur.