Calls Failing
Incident Report for CircleLoop
Postmortem

Reason for Outage
Incident Date: 28th Jun 2022
Incident Time: 09:46 AM - 10:48 AM

Services Affected
Inbound & Outbound calls to the CircleLoop web applications.
Impact
Inbound & Outbound call failures for the majority of CircleLoop customers using the
Windows/Mac & Mobile applications. SIP Devices were unaffected.

Notification
The first sign of an issue was an alert from the CircleLoop monitoring system at 9:48 AM.
At that point calls were functioning intermittently. From 10:05 AM, the CircleLoop
Operations team and customers reported problems making and receiving calls through
the applications; by then calls were consistently failing to connect.

Diagnosis & Cause
Investigation determined that two components of the CircleLoop platform were
unhealthy, with their application processes continually restarting and failing.
The root cause of this was the deployment of a routine change to the CircleLoop
platform. This had the unforeseen consequence of causing an error in the Live Services
component of the platform, which began returning 400 responses to all requests.
SIP device users were unaffected as they do not use Live Services in their call flows.

Resolution
Several attempts were made to restore service while the incident was in progress.
Rebooting the Live Services component initially seemed successful, but the component
quickly reverted to an unhealthy state.
The issue was resolved by restoring the previous configuration, bringing Live Services
back to a healthy state.
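
For illustration only, the restore step described above can be sketched as a deployment that falls back to the previous configuration when health checks fail. This is not CircleLoop's deployment tooling: the function name, the `apply` and `is_healthy` callbacks, and the `checks` parameter are all hypothetical. The health check is repeated because, as noted above, the component looked healthy briefly after a reboot before failing again.

```python
from typing import Any, Callable, Dict

Config = Dict[str, Any]


def deploy_with_rollback(
    previous: Config,
    candidate: Config,
    apply: Callable[[Config], None],
    is_healthy: Callable[[], bool],
    checks: int = 3,
) -> Config:
    """Apply `candidate`; restore `previous` if any health check fails.

    All names here are hypothetical. Returns the configuration left in place.
    """
    apply(candidate)
    # Check several times: a single passing check can be misleading if the
    # service recovers briefly and then becomes unhealthy again.
    for _ in range(checks):
        if not is_healthy():
            apply(previous)  # restore the last known-good configuration
            return previous
    return candidate
```

A real rollout would normally wait between health checks; the delay is left out here to keep the sketch short.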

Mitigation
The logic in Live Services has now been made more robust and has been tested to
confirm that it gracefully handles errors, expected or otherwise, so that the issue does not recur.
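
As a minimal sketch of what handling errors gracefully can look like in general (an assumption for illustration, not the actual Live Services code), a request handler can be wrapped so that expected failures produce a client error response and unexpected failures produce a server error response, instead of an exception escaping and taking the process down. The names `handle_safely` and `Response`, and the choice of which exceptions count as expected, are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

# (HTTP-style status code, response body); a hypothetical shape for this sketch.
Response = Tuple[int, Dict[str, Any]]


def handle_safely(
    handler: Callable[[Dict[str, Any]], Dict[str, Any]],
    request: Dict[str, Any],
) -> Response:
    """Run `handler` and convert any failure into an error response."""
    try:
        return 200, handler(request)
    except (KeyError, ValueError) as exc:
        # Expected failure: the request is missing a field or holds a bad value.
        return 400, {"error": f"bad request: {exc}"}
    except Exception:
        # Unexpected failure: report a server fault rather than crashing.
        return 500, {"error": "internal error"}
```

Separating the two cases keeps a fault inside the service from being reported as a problem with the caller's request.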

Posted Jun 29, 2022 - 08:44 UTC

Resolved
This incident is now resolved.

A full post-mortem will follow shortly.
Posted Jun 28, 2022 - 15:14 UTC
Update
Calls appear to be fully operational but we are still monitoring the issue and identifying the overall cause.

A full post-mortem will be posted once the issue is deemed completely resolved.
Posted Jun 28, 2022 - 11:37 UTC
Update
Thanks for your patience.

Call operations appear to be mostly back to normal, and we are continuing to monitor the incident.

We will update those concerned with a full post-mortem when the incident is deemed completely resolved.
Posted Jun 28, 2022 - 11:06 UTC
Monitoring
The fix has been implemented and we are monitoring the results.

Call performance now appears to be improved, but bear in mind that it may still be partially degraded.
Posted Jun 28, 2022 - 10:21 UTC
Update
Thanks for your patience.

We are moving closer to a fix and you may begin to see improved performance.

We can confirm that only app-based users are currently affected; those using SIP devices such as deskphones or ATAs should be unaffected.

Apologies again for the inconvenience.

Another update will follow shortly.
Posted Jun 28, 2022 - 10:03 UTC
Update
Thank you for your ongoing patience.

Unfortunately, the issue remains unchanged and performance is still degraded.

We will continue to update you as the incident develops.
Posted Jun 28, 2022 - 09:51 UTC
Update
We are continuing to work on a fix for this issue.
Posted Jun 28, 2022 - 09:49 UTC
Update
Our deepest apologies for this inconvenience.

A fix is still being implemented and performance is still degraded.

We will continue to keep you updated in the meantime.
Posted Jun 28, 2022 - 09:37 UTC
Update
Thanks for your patience.

A fix is still in the process of being implemented and therefore call performance is still degraded.

Please subscribe for further updates.
Posted Jun 28, 2022 - 09:20 UTC
Identified
We have identified an issue affecting both incoming and outgoing calls.

The cause has been identified and a fix is currently being implemented.

Apologies for any inconvenience caused. We are working to resolve this as soon as possible.
Posted Jun 28, 2022 - 09:04 UTC
This incident affected: Inbound calls, Outbound calls, API, and SMS.