The Journey to Automating Network Troubleshooting with Machine Reasoning – Cisco Blogs


In my role at Cisco as a Distinguished Engineer, I’ve been working on how to simplify network management for our customers. Networks are becoming too complex to operate manually. Troubleshooting a network problem involves managing and reasoning over very large volumes of data. It is predominantly a high-touch, time-consuming, and error-prone endeavor. It’s a task reserved for “masters of complexity”: network administrators who, over decades of work, have amassed the expert knowledge needed to diagnose symptoms, identify root causes, and perform the remediation steps that resolve network issues.

Surely, there is room for simplification through automation, especially with all the strides made in the field of Artificial Intelligence (AI). This led to an idea: what if we apply Machine Reasoning (MR) to automate network troubleshooting? Machine Reasoning is a branch of AI that relies on formal human knowledge capture and symbolic inference to achieve automation. It is the perfect technology to have in the AI toolkit to address problems that require expert knowledge. Network troubleshooting is clearly one such problem!
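To make "formal knowledge capture and symbolic inference" a little more concrete, here is a minimal forward-chaining sketch in Python. It is purely illustrative and is not how the Machine Reasoning Engine is implemented: the fact names, the rules, and the `forward_chain` helper are all hypothetical, invented for this example. The idea is that expert knowledge is written down as explicit if-then rules, and an inference loop keeps applying them to observed symptoms until no new conclusions can be drawn.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    """An if-then rule: when all premises are known, conclude something new."""
    premises: FrozenSet[str]
    conclusion: str


# Hypothetical expert knowledge, captured as explicit rules (assumed for illustration).
RULES: List[Rule] = [
    Rule(frozenset({"interface_down", "cable_connected"}), "suspect_port_fault"),
    Rule(frozenset({"suspect_port_fault", "errors_on_port"}), "root_cause_bad_transceiver"),
    Rule(frozenset({"root_cause_bad_transceiver"}), "action_replace_transceiver"),
]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply the rules repeatedly until no rule adds a new fact (a fixed point)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # A rule fires when every premise is already known.
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


# Hypothetical symptoms observed on the network.
observed = {"interface_down", "cable_connected", "errors_on_port"}
derived = forward_chain(observed, RULES) - observed
print(sorted(derived))
```

Note that the conclusions come from the rules rather than from patterns in the input data, which is the kind of explicit reasoning the paragraph above describes.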

The industry has been looking broadly at AI for network management, but Machine Learning (ML) has been stealing the spotlight. While ML excels at a number of automation problems, it falls short with problems that require explicit reasoning using knowledge that is not part of the input data. Clearly, MR was the (almost) forgotten half of AI, providing us with an opportunity to create a differentiating solution for troubleshooting automation for our customers. And so, we set to work…

Building a Team with Cisco Innovation Fund

Initially it was a team of one engineer, yours truly, ferociously prototyping the first proof of concept. Soon after my manager saw the first prototype in action, he arranged for a developer to join the project on an interim basis. And then another engineer volunteered to implement one use-case that was interesting to his team. However, it became clear that this grassroots, cycle-stealing approach was only going to enable us to build primitive prototypes. But we did learn a lot in the process about the technology, its strengths and limitations, and how domain experts expect to interact with the system.

At this stage we had enough data to apply to the Cisco Innovation Fund – think of Shark Tank minus all the drama but with all the high stakes at play. The proposal was approved, and the project was funded to continue incubation of the technology. A dedicated team was quickly assembled, and we hit the keyboards (no, not the musical ones, but the QWERTY type). The team worked relentlessly for close to two years, building software infrastructure, exploring use-cases, conducting proofs of concept, and engaging with product teams to secure an eventual path towards productization.

About half-way through that incubation phase, when we had a good handle on the technology, we started reaching out to customers for feedback. With the help of the Product Management team, we surveyed multiple customers, asked them about their challenges in network troubleshooting, and gathered their input on what the automation system should do and provide. Those customer discussions provided invaluable insights to the team.

After the completion of the first incubation phase, and based on internal and external customer feedback, we realized that we had to pivot away from some of the initial design choices. This required that we rework part of our implementation, and by that time our funding was depleted. So, we applied for another round of Innovation Funding within our Business Unit and made the case for why we wanted to continue work on this endeavor. The funding was approved, giving us an additional runway of three months to complete our work.

As this second phase of the project came to an end, we presented the final outcomes to the engineering executive team. This was the moment of truth: either the project would cross the chasm to product, or it would stop. Given the value that MR brings to our customers and the maturity of the technology, we were given the executive go-ahead to proceed with productization. A productization engineering team was formed, and we started the product journey, with a focus on quality and customer delivery.

Throughout this project, we had the unwavering support of our leadership team at Cisco. They provided the team with the space to innovate and encouraged a risk-taking culture. They fueled our passion for the technology, saw its potential to solve customer pain points, and provided us with the time and resources to succeed in turning an idea into a product.

This journey culminated in the release of the Machine Reasoning Engine (MRE), as part of the Cisco AI Network Analytics application in Cisco DNA Center. MRE is active in over 4,700 customer deployments—and growing. The early incubation team, engineering team, product management and our internal partners all played a part in turning the concept into reality.

For more on applying AI and Machine Reasoning in network assurance, refer to this blog post. To learn more about the Machine Reasoning Engine, refer to the following links:

Cisco DNA Assurance User Guide

Cisco Champion Radio: The Cisco DNA Center Machine Reasoning Engine

Machine Reasoning is the new AI/ML technology that will save you time and facilitate offsite NetOps

 
