An In-Depth Look at the Cisco CCDE-AI Infrastructure Certification
Since OpenAI’s mic-drop moment at the end of last year, it seems that AI—and generative AI in particular—is suddenly everywhere. For network engineers, we see two big areas of change. The first is AI in the network: By integrating AI into networks, we can make those networks more secure, resilient, and higher-performing. The second is AI on the network. The networks that run AI workloads and support the training of generative AI models need to be highly scalable, highly resilient, and capable of pushing vast amounts of data at tremendous speed.
AI on the network, in particular, will require new skills on the part of network engineers. And the stakes couldn’t be higher. Various forms of AI will permeate our lives in ways we can only guess at today. Even before the current boom in generative AI, other forms of artificial intelligence were being used in everything from criminal justice to supply chain optimization. If the networks that run AI are not robust and secure, and if the models running on them are not similarly protected, the opportunities for identity theft, misinformation, and bias—already concerning—will only multiply.
Existing networks are already feeling the strain. In our most recent survey of expert-level certification holders, 25% of respondents said that AI demands were having a “significant” or “transformative” effect on their networks. That’s especially notable because the Cisco AI Readiness Index shows that most organizations are still in the early stages of generative AI deployment.
To better prepare IT professionals to build, run, and secure the networks that support AI, we announced a new area of expertise within the CCDE certification, called CCDE-AI Infrastructure, at Cisco Live. The process of designing this certification started with an extensive job role analysis, which helped us better understand which skills are most needed. Then we consulted with partners across the AI ecosystem to understand their needs as this exciting technology matures and AI use cases continue to multiply. While most organizations will not need networks that can support the training of large language models, the vast majority will need to consider the privacy, security, and cost implications—at the very least—of running generative AI applications.
Here are just some of the factors we considered and how we considered them when designing the blueprint, tutorials, hands-on exercises, and the test.
Networking
Fast, reliable ethernet, enabled with new protocols such as RoCEv2, is key to accessing data quickly and consistently enough to train large language models. Memory needed for in-process computation is often distributed when working with generative AI, but RoCEv2 is designed to provide direct memory access, allowing data to be delivered as if it were on the mainboard. Without this access, information is copied repeatedly, increasing latency.
Security
From a data security point of view, many of the challenges inherent in running AI workloads are qualitatively similar to the challenges of running other workloads. The concepts of data at rest and data in motion remain the same. The difference lies in the sheer volume and variety of data that is accessed and moved, especially when training a model. Some data may not need to be encrypted – anonymization might be an efficient alternative. Obviously, this is a choice that needs to be made carefully; and one that depends greatly on the specific use case.
Generative AI adds another consideration: the model itself needs to be secured. OWASP has compiled a top ten list of vulnerability types for AI applications built on large language models. The CCDE-AI Infrastructure exam will include a task on protection against malicious use cases. We want candidates to be proactive about security and understand the signs that a model may have been compromised.
Data gravity
Data gravity is intertwined with security, resilience, and speed. As data sets become larger and more complex, they acquire gravity—they tend to attract other applications and services, in an effort to decrease latency. And they become increasingly difficult to copy or move. With AI, we don’t yet have the ability to do training and processing in the cloud while the data is on-premises. In some cases, the data may be so sensitive or so difficult to move that it makes sense to bring the model to the data. In other cases, it may make sense to run the model in the cloud, and send the data to the model.
Again, these choices will vary greatly by use case, because some use cases won’t require massive amounts of data to be moved quickly. To build an online medical portal, for instance, it might not be necessary to have all the data in a centralized store, because the algorithm can fetch the data as it needs it.
In the CCDE-AI Infrastructure certification, we cover hosting implications with respect to security. When do you need a connected AI data center? When could training take place in an air-gapped environment? Like other exam questions, these are asked in the context of hypothetical scenarios. All of the answers might be “right,” but only one will fit the environment and constraints of the scenario.
Accelerators
High-speed networks increase the demands on CPUs. These networks can boost processing loads significantly, decreasing the number of cycles available for application processing. Luckily, there are a wide variety of specialized hardware components designed to relieve some of the pressure on CPUs: GPUs, DPUs, FPGAs, and ASICs all can offload specific tasks from CPUs and get these tasks accomplished quickly and efficiently.
For IT professionals, it’s not enough to be able to describe each of these alternatives and know their capabilities. Those who are building, running, and securing the networks that support AI need to be able to balance each of these potential choices against business constraints such as cost, power, and physical space.
Sustainability
The technology industry is broadly aware of the sustainability challenges – with regard to both power and water—raised by AI, but a reckoning is yet to take place. Sustainability makes up just a small part of the current exam, but we believe these concerns will only become more important over time.
Hopefully, this discussion has also helped to answer another common question: Why is this new certification positioned at the expert level? There are a few reasons. One is that this area of expertise specifically addresses network design, so it fits neatly into the CCDE certification. Another is that the optimal design for an AI infrastructure is tightly bound to the business context in which that infrastructure exists.
We’re not asking candidates to show they can design a secure, fast, resilient network by starting from scratch in a perfect world. Instead, the exam lays out hypothetical scenarios and asks candidates to address them. After all, that’s closer to the environment our certification holders are likely to walk into: there’s an existing network in place, and the job is to make it better support AI workloads or training. There isn’t an unlimited budget and unlimited power, and the network may already be using equipment and software that, in another context, wouldn’t be the first choice.
That’s also why this certification is vendor-agnostic. A professional at the expert level has to be able to walk into any environment and, frankly, make a difference. We know that’s a big ask, as do hiring managers. We also know that historically, Cisco Certified Experts have been up to the task—and then some.
We’re excited to see that continue as we work together to find the best use cases and build the best networks for this exciting new technology. Get started with one of our free AI tutorials at Cisco U.
Sign up for Cisco U. | Join the Cisco Learning Network today for free.
Follow Cisco Learning & Certifications
X | Threads | Facebook | LinkedIn | Instagram | YouTube
Use #CiscoU and #CiscoCert to join the conversation.
Read next:
Cisco Helps Build AI Workforce With New Skills Certification
Share: