The real risk of AI in network operations


OK, you used to worry about nuclear war, then pandemics, then maybe an asteroid hitting earth or the sun going nova. Now, some want you to add AI to the list of things to worry about, and yes, you should probably do that. I’d hold off on worrying that AI will end life on earth, but users themselves tell me that AI does pose some risks, particularly the current ultra-hot “generative AI” that ChatGPT popularized. That’s particularly true for those who want to apply it to network operations.

I got input from 197 senior IT and network professionals over the last month, and none of them believed that AI could lead to the mass extinction of humanity. Well over half said that they hadn’t seen any crippling long-term downsides to AI use, and all of them said that their company used AI “somewhere.” Thirty-four offered real insight into the use of AI in network operations, and I think this group offers us the best look at AI in network missions.

The hottest part of AI these days is the generative AI technology popularized by ChatGPT. None of the 197 enterprises reported using it for operations automation, but 57 said they'd considered it for that mission and quickly abandoned the idea, for two reasons. First, they found actual errors in the results, sometimes serious enough to have caused a major problem had the results been acted on. Second, it was nearly impossible to understand how the AI reached its conclusions, which made validating them before acting on them very difficult.

The accuracy problem was highlighted in a recent article in Lawfare. A researcher used ChatGPT to research himself and got an impressive list of papers he'd written and conference presentations he'd given. The problem was that these references were totally wrong; he'd never done what was claimed. Enterprise IT pros who tried the same thing on operations issues said they were often treated to highly credible-sounding results that were actually completely wrong.

One operations manager who tried generative AI on his company's own historical network data said that it suggested a configuration change that, had it been made, would have broken the entire network. “The results were wrong a quarter of the time, and very wrong maybe an eighth of the time,” he said. “I can’t act on that kind of accuracy.” He also said that it took more time to test the outcomes than it would have taken his staff to do their own professional analysis of the same data and take action on the results.

That brings up my second point: the lack of detail on how AI reached a conclusion. I’ve had generative AI give me wrong answers that I recognized because they were illogical, but suppose you didn’t have a benchmark result to test against? If you understood how the conclusion was reached, you’d have a chance of picking out a problem. Users told me that this would be essential if they were to consider generative AI a useful tool. They don’t think, nor do I, that the current generative AI state of the art is there yet.

What about other, non-generative AI models? There are well over two dozen operations toolkits out there that claim AI or AI/ML capability. Users are more positive about these, largely because they have a limited scope of action and leave a trail of decision-making steps that can be checked quickly. Even a quick scan of the way results were determined is enough, according to some users, to pick out questionable results and avoid acting on them. Even these tools, though, present problems for users, and the biggest is what we could call “loss of situational awareness.”

Your network or IT operations center is staffed with professionals who have to respond to problems. Most AI operations tools aren’t used to take automatic action; rather, they’re used to diagnose problems that are then acted on. In most cases, this has the effect of filtering the events that the operations staff must handle, which is something that event/fault correlation and root cause analysis also do. Unloading unnecessary work from ops professionals is a good thing, up to a point. That point is reached when the staff “loses the picture” and can no longer contextualize what’s happening in order to know what to do and when. The move toward AI is really a move toward more automation, and with it comes a greater risk that the staff is sheltered from too much and so loses touch with the network.
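To make that filtering effect concrete, here is a minimal, hypothetical sketch of a correlation layer that only escalates the first event per suspected root cause and suppresses the rest. The event fields, the suppression-ratio metric, and the device names are all invented for illustration, not drawn from any specific product; the point is simply that tracking how much is hidden is one way to notice when automation has taken away most of the picture.

```python
from dataclasses import dataclass

@dataclass
class Event:
    device: str       # e.g. "edge-07" (hypothetical name)
    symptom: str      # e.g. "link-down"
    root_device: str  # device the correlation engine blames

class CorrelationFilter:
    """Toy event/fault correlation: only the first event per suspected
    root cause reaches the ops console; the rest are suppressed."""

    def __init__(self):
        self.seen_roots = set()
        self.escalated = 0
        self.suppressed = 0

    def handle(self, event: Event) -> bool:
        """Return True if the event should reach a human."""
        if event.root_device in self.seen_roots:
            self.suppressed += 1
            return False
        self.seen_roots.add(event.root_device)
        self.escalated += 1
        return True

    def suppression_ratio(self) -> float:
        total = self.escalated + self.suppressed
        return self.suppressed / total if total else 0.0

# Usage: a burst of downstream alarms collapses to a single escalation.
f = CorrelationFilter()
events = [Event("edge-07", "link-down", "core-sw-01"),
          Event("edge-08", "link-down", "core-sw-01"),
          Event("edge-09", "bgp-flap", "core-sw-01")]
visible = [e for e in events if f.handle(e)]
print(len(visible), f.suppression_ratio())  # 1 event shown, ~0.67 of events hidden
```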

OK, you’re thinking, all this is bad news for AI-driven operations. It is, sort of, but there are two good-news counterpoints.

First, none of the users who had issues with AI were abandoning it completely. They could see the good even through the bad, and they were working to help that good shine through. Second, most of the problems reported were the AI equivalent of miscommunications, often caused by human errors in devising what used to be called the “inference engine,” the software at the center of most AI implementations that uses rules and a knowledge base to make deductions. The developers of these tools are hearing the same stories and working hard to correct the problems.

How do you, as a prospective user of AI operations tools, get the most out of AI? The users I chatted with had some tips.

Look for contained missions for AI in operations

The broader the AI mission, the more difficult it is to support a hand-off to operations professionals when needed, and the more difficult it is to validate the assessments an AI tool offers or the steps it wants to take. Parts of your network can almost surely be managed with the aid of AI, but with the current state of the art, managing all of it will likely prove very challenging. Also, narrowing the mission could enable the use of “closed-loop” systems that take action rather than suggest it. Almost 80% of users who employ closed-loop technologies do so for limited missions.
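As a rough illustration of what a contained, closed-loop mission might look like, here is a hypothetical sketch in which automation is allowed to take exactly one narrow class of action and hands everything else to a human. The allowed action, the diagnosis format, and the helper functions are all invented for the example; real products will differ.

```python
# Hypothetical closed-loop handler with a deliberately narrow scope:
# it may restart a flapping access port, and nothing else.

ALLOWED_ACTIONS = {"restart_access_port"}   # the entire closed-loop scope

def restart_interface(device: str, port: str) -> None:
    print(f"[auto] restarting {device}/{port}")   # stand-in for a real controller API call

def open_ticket(summary: str) -> None:
    print(f"[human] ticket opened: {summary}")    # stand-in for ITSM integration

def handle_diagnosis(diagnosis: dict) -> None:
    """diagnosis is whatever the AI tool emits, e.g.
    {"action": "restart_access_port", "device": "acc-sw-12", "port": "Gi1/0/7"}"""
    action = diagnosis.get("action")
    if action in ALLOWED_ACTIONS:
        restart_interface(diagnosis["device"], diagnosis["port"])
    else:
        # Anything outside the contained mission is suggested, never executed.
        open_ticket(f"AI suggested '{action}' on {diagnosis.get('device')}; review required")

handle_diagnosis({"action": "restart_access_port", "device": "acc-sw-12", "port": "Gi1/0/7"})
handle_diagnosis({"action": "change_bgp_policy", "device": "core-rtr-01"})
```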

Try to pick AI packages that have been on the market for at least nine to twelve months 

That’s enough time for the earliest and most egregious problems to have surfaced and been fixed. If you absolutely cannot do that, then run a six-month in-house trial in which you parallel the AI processes with traditional operations tools and check the two against each other. Most users recommended that trial period even for packages with a long installed history, because it helps acquaint your organization with the ways AI will change ops practices and reduces that situational awareness problem.
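One way to structure that parallel trial, sketched below with invented field names and a made-up diagnosis format (nothing here reflects a specific product's output), is to feed the same events to both pipelines and log every disagreement for human review. The disagreement rate over the trial window gives a rough, if crude, measure of how far the AI output can be trusted.

```python
# Hypothetical parallel-trial comparison: same events, two pipelines,
# disagreements logged for human review.

def compare_diagnoses(ai_results: dict, baseline_results: dict) -> dict:
    """Both inputs map an event ID to a diagnosis string, e.g.
    {"evt-1001": "link-down on core-sw-01"}."""
    disagreements = {}
    for event_id, baseline in baseline_results.items():
        ai = ai_results.get(event_id, "<no AI diagnosis>")
        if ai != baseline:
            disagreements[event_id] = {"ai": ai, "baseline": baseline}
    return disagreements

# Usage with made-up trial data:
ai = {"evt-1": "link-down on core-sw-01", "evt-2": "config drift on edge-07"}
baseline = {"evt-1": "link-down on core-sw-01", "evt-2": "optics degradation on edge-07"}

diff = compare_diagnoses(ai, baseline)
rate = len(diff) / len(baseline)
print(f"disagreement rate: {rate:.0%}", diff)
```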

Be very methodical in evaluating AI tools and vendors

User satisfaction with AI tools varies from over 90% to as little as 15%, and the favorite tools for some users are given the worst marks by others. It’s clear that the capabilities of AI vary as much as the possible missions, and getting the two aligned is going to take careful evaluation. You can’t simply take recommendations, even if they’re from another user who seems to have similar requirements.

Don’t believe AI extremism

This final point is simple. What AI is doing is really nothing more than applying human processes without the humans, processes we’re teaching it. AI doesn’t “know,” doesn’t “think,” and doesn’t “care.” Dodging AI faults is pretty much like dodging human error. If human intelligence is the goal of AI, then the risk of AI is like human risk. Our biggest risk with AI isn’t that it’s getting too powerful. It’s our believing AI is better than we are and failing to apply the controls to it that we’d apply to human processes. So, if you want to fend off the end of civilization, keep dodging those plummeting asteroids (made you look!), wear a lot of sunblock, and maybe buy a pandemic-proof bubble. What’s most likely to come for you won’t be AI.

Copyright © 2023 IDG Communications, Inc.
