What the Twitter outage says about overzealous downsizing
The hallmark of Twitter’s outage last Wednesday was an error message telling users, “You are over the daily limit for sending Tweets.” A spokesperson for network intelligence firm Ookla, which owns outage monitoring site Downdetector, said that on Feb. 8, starting at 10 a.m. UTC, about 50,000 Twitter users reported access issues.
While the outage affected comparatively few Twitter users, it may carry a larger lesson for organizations mulling big workforce cuts: deep cuts can endanger not just operations but also security.
By some reports, Twitter now has just 1,300 active staff, roughly 80% fewer than the approximately 8,000 people on its payroll before Elon Musk’s October 2022 takeover. Among his early decisions after taking the helm were shutting down one of Twitter’s data centers and firing half the workforce.
Cut staff now, pay later
Many of the Twitter employees who were let go or who walked out voluntarily in recent months were reportedly working on projects fundamental to company operations. Former staffers and outside observers alike predicted that the firings would lead to exactly the kinds of outages the company is now experiencing.
Justin Cappos, professor of computer science at the NYU Tandon School of Engineering, developer of the in-toto supply chain security framework and a member of the Linux Advisory Team, offered a sports-friendly analogy:
“Imagine someone buys a professional sports team then looks around and says ‘You know, we need these coaches over here because they call the plays, but we don’t need the strength coach, the conditioning coach and we don’t need the nutritionist.’ So, when that team goes out and plays next week, they will play about as well as they did last week, and a week later maybe similar, but a month later they start to take a hit, and then the wheels start to fall off. That’s what’s happening; he has fired people who are doing the work that keeps this large distributed service running.”
Adam Marrè, chief information security officer at cybersecurity operations firm Arctic Wolf, agreed that the outage suggests there are now likely too many empty IT seats at the blue bird’s command center.
“If an understaffed team is trying to change things quickly, that can be a recipe for unintended consequences with downstream or ancillary dependencies to code you are changing,” Marrè said. “They will not have the capacity to manage access provisions and offboard users in a timely fashion, and in cases like an outage, get systems back up and running quickly.
“With an under-resourced team, the maintenance of tools across the enterprise stack may fall by the wayside, as priorities shift and adjust to reflect a team’s limited bandwidth.”
Twitter: Both outlier and emblem of job cuts in tech
Twitter’s staff cuts stand out because of the unusually high share of the company’s workforce that was offboarded, but the company is not alone. TrueUp’s Tech Layoff Tracker found that more than 400 tech companies have laid off employees in 2023, affecting 127,359 people. Complicating matters, security firms including Okta, SecureWorks, Snyk, Sophos, Lacework and OneTrust have also slimmed their ranks over the past several months.
At the same time, the U.S. Bureau of Labor Statistics projects that employment of information security analysts will grow 35% between 2021 and 2031, with about 19,500 openings for the role projected each year (Figure A).
Figure A
Marrè said layoffs may, to some extent, constitute an adjustment after a hiring spree during the COVID-19 pandemic.
“Actually many companies, including tech companies, are still hiring,” Marrè said. “Set against the backdrop of massive hiring that was done during the years of the pandemic, the general job cuts across the tech industry do not seem as significant — of course, job cuts are always significant for those directly affected.
“The good news is there are still many unfilled job openings out there for tech workers, so optimistically, this will end up being more of a reshuffling than a massive downsizing.”
With GitHub downsizing, security automation taking up slack?
Among recently announced tech cuts, Microsoft’s GitHub unit and its competitor GitLab plan to reduce staff by 10% and 7%, respectively. GitHub, which reportedly has about 3,000 employees, will also go fully remote, according to initial coverage in Fortune. Separately, Microsoft’s CEO announced in January plans to cut 10,000 jobs, or about 5% of the company’s workforce, through fiscal 2023.
The roughly 300 jobs GitHub plans to cut are relatively few in the scheme of things, but the code hub is used by more than 100 million developers and claims to host more than 372 million repositories that software builders rely on worldwide.
Although relying on open-source code carries numerous security implications, Cappos said the rise of DevSecOps has improved the security landscape, making it easier for developers to move fast in cloud environments like AWS without sacrificing security. That takes some pressure off staff who may, at least in the short term, have fewer colleagues on hand.
“The DevSecOps paradigm started with lightweight containerization and microservice architecture because of Kubernetes,” Cappos said. “The way security caught up is that people have done a lot of work to make things like Kubernetes not as easy to misconfigure.
“There are a lot of really great software projects and security projects in that space, and Kubernetes has a very good security team working on this. They have made it more difficult to shoot oneself in the foot; they have defined better tooling around it so that people who do DevOps work can do security as part of that.”
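Cappos’ point about making Kubernetes harder to misconfigure can be illustrated with the kind of automated guardrail a DevOps team might run before deployment. The following is a minimal, hypothetical sketch, not Kubernetes’ own admission logic: it inspects a Pod manifest (assumed to be already parsed from YAML into a Python dict) for a few risky settings. The field names (`hostNetwork`, `securityContext`, `privileged`, `allowPrivilegeEscalation`, `runAsNonRoot`) come from the Kubernetes Pod spec; the function name and the particular set of checks are illustrative assumptions.

```python
"""Hypothetical pre-deployment check for risky Kubernetes Pod settings.

The manifest is a plain dict, as if parsed from YAML. Field names follow the
Kubernetes Pod and container securityContext schema; the checks themselves
are an illustrative subset, not an official policy.
"""


def find_misconfigurations(pod: dict) -> list[str]:
    problems = []
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})

    # Sharing the node's network namespace weakens isolation.
    if spec.get("hostNetwork"):
        problems.append("pod uses hostNetwork")

    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})

        if ctx.get("privileged"):
            problems.append(f"{name}: runs privileged")

        # This sketch treats an unset value as permissive, since no_new_privs
        # is only applied when the field is explicitly false.
        if ctx.get("allowPrivilegeEscalation", True):
            problems.append(f"{name}: privilege escalation not disabled")

        # A container-level runAsNonRoot overrides the pod-level setting.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        if not run_as_non_root:
            problems.append(f"{name}: may run as root")

    return problems


if __name__ == "__main__":
    example_pod = {
        "spec": {
            "securityContext": {"runAsNonRoot": True},
            "containers": [
                {"name": "web", "securityContext": {"allowPrivilegeEscalation": False}},
                {"name": "debug", "securityContext": {"privileged": True}},
            ],
        }
    }
    for problem in find_misconfigurations(example_pod):
        print(problem)
```

Kubernetes’ built-in Pod Security Admission enforces similar rules at the cluster level; a script like this simply shows how security expertise can be encoded once and then run automatically by a smaller team.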
Martin Mao, co-founder and CEO of cloud-native data and metrics company Chronosphere, pointed out that the open-source monitoring system Prometheus is the de facto standard for Kubernetes monitoring today.
“We work with Julius Volz, one of that project’s creators,” Mao said. “I do think investments in open source are here to stay, and I think every company will continue to recognize that they need to be aware of issues and continue to address them.”
Judging by recent months’ tech layoffs, almost no team within a company is sacrosanct, and Mao argues that, ultimately, most companies would like to automate more of their manual processes for scale and efficiency.
“It’s important to remember, though, that moving to DevOps or DevSecOps or platform engineering means that you are purposefully transferring complexity from one solution to another,” Mao said.
He said that, ideally, security staff would gain the same benefits as other teams from working in a DevOps or DevSecOps model: less low-level work, less firefighting and more time to be proactive about their company’s security posture.
Former staffers as attack vectors
Do staffing cuts increase security risk, especially when combined with poor organizational hygiene? Marrè said yes, pointing to the potential for insider threats after the so-called Great Resignation and to the need for proper protocols for deprovisioning departing users.
“People who have been laid off may become the next target or vehicle to deploy ransomware attacks,” Marrè said. “Bad actors will most likely continue to offer ex-employees money in exchange for user credentials to gain access to critical systems and infrastructures or offer them money in exchange for information about the company which can be used to attack it.
“Insider threat is always a risk, but large-scale layoffs and widespread employee dissatisfaction increase that risk significantly.”
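Part of Marrè’s deprovisioning point is a bookkeeping problem: every account a departed employee still holds is a potential entry point for an attacker who buys or steals those credentials. As a minimal, hypothetical sketch (the `Account` record, the roster and the system names are illustrative assumptions, not any vendor’s API), a scheduled reconciliation job might compare an identity inventory against the HR roster and flag accounts that are still enabled:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical record from an identity inventory (not a real vendor schema)."""
    user_id: str
    system: str
    enabled: bool


def stale_accounts(accounts, active_user_ids):
    """Return enabled accounts whose owner is no longer on the active roster."""
    active = set(active_user_ids)
    return [a for a in accounts if a.enabled and a.user_id not in active]


if __name__ == "__main__":
    # Illustrative data: "carol" and "dave" have left the company.
    roster = {"alice", "bob"}
    inventory = [
        Account("alice", "vpn", True),
        Account("carol", "vpn", True),
        Account("carol", "git", False),  # already disabled, so not flagged
        Account("dave", "cloud-console", True),
    ]
    for acct in stale_accounts(inventory, roster):
        print(f"revoke {acct.system} access for {acct.user_id}")
```

In practice, the roster and inventory would come from an HR system and an identity provider. Running the comparison on a schedule, rather than relying only on a manual offboarding checklist, is meant to catch the accounts an understaffed team may not get to in a timely fashion.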
Transparency is key to incident response
Marrè suggests that companies facing outages, whether in their cloud operations, on-premises systems or customer engagement platforms, should:
- Communicate clearly and effectively with customers about the problem, the status and the in-progress solution.
- Plan for the heavier per-employee workload that comes with maintaining the same infrastructure and systems they ran when fully staffed.
He added that preventing disruptions requires retaining people in key positions with institutional knowledge of infrastructure and operations, including security operations.
“This can allow organizations to maintain uptime without significant outages and remain resilient in the face of incidents,” Marrè said. “Cuts across those roles can have an asymmetrically impactful effect on quality of service as compared to other roles in the company.”
The risks of doing more with less
Mao said that, across the board, his firm is seeing engineering teams at many tech companies being asked to do more with less, and that companies need to pay attention.
“I think that the message here is companies need to understand how much work and complexity is being absorbed by employees running around with their hair on fire,” Mao said. “Every outage has a root cause, but during an outage, it comes down to employees who have to find, understand and fix the problem.”
Chronosphere recently conducted research showing that developers and engineers spend at least a quarter of their work time performing low-level troubleshooting tasks.
“If a company is asking fewer employees to monitor more systems, then there is a higher likelihood of an issue slipping past undetected and spiraling into a much bigger problem,” Mao said. “And, unfortunately, many of the systems in place today are ill-equipped to lend a helping hand.”