We would like to acknowledge the outage that disrupted service for some of our clients earlier today. We understand that Borealis is an important part of your day-to-day activities, so to ensure transparency, we are providing a detailed analysis of the situation.
We are publishing this article to reassure our North American clients that the issues have been resolved. The service is now running at its target performance levels, and we do not anticipate any further disruptions. We apologize for any inconvenience this may have caused.
What happened?
Two related issues occurred on May 9, 2023, for clients whose instances are hosted in North America.
- Early in the morning, our clients hosted in this region experienced some difficulties creating communications and individuals in Borealis. Our technical team investigated the issue and made the decision to deactivate the service causing the problem, to allow day-to-day activities to continue while they examined the problem’s deeper roots. Any linked communications and individuals that were impacted by the issue will be reprocessed by Borealis to ensure that they are created in the system as expected, and to minimize the impact on users.
- Later in the day, some clients experienced intermittent downtime on their Borealis instance between 1 PM and 4 PM (EST). Although the two incidents affected the same region on the same day, they were independent of one another. Fixes were completed at 3:45 PM (EST).
Our network administrators identified an issue that impacted the high-availability chain, leading to the intermittent downtime. Once they had determined the cause, they rebooted the network to restore full functionality.
How did we respond?
- We still consider our decision to deactivate the service that was causing errors when creating records to have been the best approach. That decision was not related to the outage that occurred during the afternoon. Deactivating the service allowed users to continue using Borealis and logging their communications as usual.
- Following the detection of the second issue, specific servers experienced some instability, which caused the intermittent downtime described above. We isolated the problem to minimize the impact on instance availability.
What are we doing to prevent situations like this in the future?
Our system administrators will continue to prioritize resolution of all issues that cause unplanned downtime, in keeping with the highest industry standards.