FCC post-mortem on AT&T outage uncovers similar QA issues to those that plagued CrowdStrike

Mitigation and recommendations

In light of the incident, AT&T has taken “numerous steps” to strengthen its QA and avoid similar slip-ups in the future, including additional checks to confirm that “required peer reviews have been completed” before any maintenance work is deployed.

Within 48 hours of the incident, the provider also implemented technical controls to scan the network “for any network elements lacking the controls that would have prevented the outage,” so those controls could be put in place. AT&T is continuing its forensic investigation of the incident and has also enhanced its network for “robustness and resilience,” according to the report.

The FCC also recommended that only previously approved network changes developed “pursuant to internal procedures and industry best practices” should be deployed on the AT&T production network in the future. “It should not be possible to load changes that fail to meet those criteria,” the FCC said in the report.
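To illustrate what such a gate might look like in a deployment pipeline, here is a minimal sketch in Python. It is not taken from the FCC report or from AT&T's tooling: `ChangeRequest`, `unmet_criteria`, `deploy`, and `DeploymentBlocked` are hypothetical names, and the two checks (approval and completed peer review) are assumptions standing in for whatever criteria an operator's internal procedures actually define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    """Hypothetical record describing a proposed network change."""
    change_id: str
    approved: bool               # signed off through the internal procedure
    peer_review_completed: bool  # required peer reviews are done


class DeploymentBlocked(Exception):
    """Raised when a change fails the pre-deployment criteria."""


def unmet_criteria(change: ChangeRequest) -> list[str]:
    """Return the names of the criteria this change does not meet."""
    problems = []
    if not change.approved:
        problems.append("not approved")
    if not change.peer_review_completed:
        problems.append("peer review incomplete")
    return problems


def deploy(change: ChangeRequest) -> str:
    """Refuse to load any change that fails the gate."""
    problems = unmet_criteria(change)
    if problems:
        raise DeploymentBlocked(f"{change.change_id}: " + ", ".join(problems))
    # Placeholder for the real rollout step, which is out of scope here.
    return f"{change.change_id}: deployed"
```

The point of the sketch is that the check lives inside the deployment path itself, so an unapproved or unreviewed change is rejected rather than merely flagged for someone to notice later.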

Indeed, proper peer review also could have helped avoid the scenario that befell CrowdStrike on Friday, when “a defect found in a Falcon content update for Windows hosts” delivered the infamous Blue Screen of Death across millions of Windows systems worldwide, resulting in missed flights, closed call centers, and cancelled surgeries.

However, these reviews “are not adequate for the implementation of code at this level of hardware/software risk,” noted Marcus Merrell, principal test strategist at Sauce Labs.

“‘Peer reviews’ imply that a peer is looking over code, to make sure it’s high quality,” he said. “It rarely, if ever, involves actually executing said code on the target hardware in the target environment.”
