Optus outage review flags process, protocol gaps; Singtel expects recommendations to kick in ‘swiftly’
The report finds that ‘challenges in Optus’ culture’ affected decision-making and response times
[SINGAPORE] An independent review into Optus’ major September outage has uncovered gaps in process, accountability, and escalation and information protocols.
The Sep 18 incident, in which calls to Triple Zero (the emergency number used in life-threatening situations) were blocked by an outage following a network upgrade, led to two deaths, according to the review.
The report, which Optus’ board released on Thursday (Dec 18), found that “challenges in Optus’ culture” affected decision-making and response times.
It also reinforced the need for “further work to address the issues facing emergency call services and device behaviour, and the importance of industry-wide collaboration”, said Optus, a Singtel subsidiary and Australia’s second-largest telco provider.
Optus said that its board accepted all 21 recommendations put forth by the report and agreed to “move swiftly with their implementation”.
Singtel group chief executive Yuen Kuan Moon noted: “We welcome Dr Kerry Schott’s independent and comprehensive review. We are deeply sorry for Optus’ September outage and will continue supporting Optus as it works to rebuild its network resilience and reliability as a critical infrastructure provider.”
He added: “With the board’s oversight, we expect CEO Stephen Rue and his team to move swiftly with the implementation of the recommendations which will accelerate the transformation programme already under way.”
Optus said that the recommendations build on the changes it introduced to address shortcomings that were identified during the initial response to the incident, as well as its multi-year transformation plans that are already being implemented.
The independent review conducted a comprehensive examination of the causes of the incident that led to a failure of Triple Zero services in South Australia, Western Australia, the Northern Territory and part of New South Wales.
It was led by Dr Schott, who has experience chairing rail-related panels and reviews. She previously led a short review into incident management at Sydney Trains for the New South Wales Government Cabinet Office.
What caused the outage?
First, the review found that mistakes made in planning and implementing a firewall upgrade led to the disconnection of Triple Zero calls from the network shortly after midnight on Sep 18. Services were restored more than 14 hours later.
At least 10 actions were not properly executed, the report noted. Incorrect instructions about the upgrade’s implementation were provided, which appeared to be due to a lack of attention by firewall network engineers.
Second, the response was severely delayed as Optus failed to recognise the problem for around 13 hours despite receiving two earlier alerts. Thus, no corrective action was taken immediately.
The report noted that information about the issue was made available to Optus’ contractor, Nokia, around 13 hours before the incident was recognised.
It added that Nokia checked the first alert but did not conduct further investigations beyond noting that it was related to the firewall upgrade. Optus was notified of the second alert, which was also noted as being related to the upgrade.
Customers also informed the Optus call centre about the Triple Zero call blockage, but there was no immediate follow-up.
The Singtel subsidiary commenced investigations into the problem only at 1.15 pm, when the South Australia Ambulance Service informed the Optus Enterprise division that Triple Zero calls were being blocked.
Had the alerts received “more than cursory attention”, the issue could have been corrected about 30 minutes into the outage, the report said.
“The fact that alerts can be overlooked because they are related to ongoing equipment upgrade work is astounding, when the reason for those alerts may be unanticipated problems caused by that work,” it added.
Third, only 150 of the 605 attempts to reach emergency call services were connected; the remaining 75 per cent did not go through, even though emergency calls are supposed to divert automatically to other available networks when a network fails, the report stated.
Recommendations
The report provided recommendations pertaining to Optus’ board, networks, call centre, general management, communications as well as the Triple Zero system.
It proposed that Optus ensure controls are in place to support robust execution of the correct processes and procedures when making routine changes within the networks area.
Network staff should be encouraged to escalate issues beyond their immediate group if they have doubts, and could benefit from more experienced oversight, it noted. Incident management exercises should be conducted to improve judgment about when to escalate incidents, it added.
In terms of general management, the report highlighted a need for improved risk management to avoid further incidents. While there was no firm conclusion on whether contract management dynamics played a role in the incident, the report recommended a more cooperative and productive approach to managing contracts for more complex work, including Optus’ contract with Nokia.
It also noted that the siloed nature of the company’s work, which led to poor information flow internally, should be addressed to facilitate a shift towards a more cooperative, company-wide way of working. It acknowledged, however, that such cultural shifts could take time.
Regarding communications, the report recommended frequent updates to the communications teams’ contact list details and the establishment of closer relationships with state and territory governments above the operational level of emergency services.
It proposed that the board consider the adequacy of its skill base and make changes if necessary, monitor efforts to upgrade risk management and ensure that the CEO and executive team are equipped to manage the company’s reform.
The report also noted that improvement processes under way at the call centre should be continued, and encouraged investigations into whether a data-enabled Triple Zero system should be implemented.
Shares of Singtel ended Wednesday 0.2 per cent or S$0.01 lower at S$4.55.
Copyright SPH Media. All rights reserved.