How to manage scripts that manage network automation


Most major network outages result from human error rather than equipment failure: mistakes in the settings themselves, missed steps in a sequence, steps taken out of order, and so on. Automation through scripting is meant not only to speed up network operations but, just as importantly, to reduce the chance of such mistakes by ensuring consistency. A script executes the same steps, in the same order, every time.

Ad hoc, scripted, or programmatic automation doesn’t eliminate the possibility of error, of course. It does confine mistakes to the programs themselves, and robust testing should uncover most of them before they reach production. And should a mistake get through and leave a batch of switches misconfigured, there is one place to fix it, the script, which also provides the means of correcting the problem at machine speed.

Of course, this implies that there is testing and that there is a single place to go fix the mistake, namely a single authoritative version of the script. And those things are not, as it were, automatic. They are the result of deliberate process choices, particularly making the decision to properly manage the automation that manages the network.

Network automation is production software

Network admins who undertake ad hoc automation for their networks become developers, in important ways no different from those who build user-facing applications. It is therefore reasonable for network automators to follow some of the same good practices in managing their programs that application developers follow: code management for network management scripts; change management for network scripts used in the production environment; proper annotation of what the programs or scripts do and how; and programming standards for naming programs, procedures, configuration files, variables, and so on.

Code management

At a minimum, code management means having a single source of authoritative code and mechanisms for tracking when new versions are added. It should support check-out of code, so everyone knows what is in development and by whom, and check-in, to maintain accountability for code in use. If a mistake is found or one admin has trouble understanding what another admin’s code is doing, knowing who created or last modified it is a huge advantage.
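
One small way to enforce the idea of a single authoritative version is to check that the copy of a script about to be run matches the approved one. The Python sketch below does that by comparing SHA-256 digests. The function names are hypothetical, and the sketch assumes the approved digest is recorded somewhere trusted; it illustrates the principle and is not a substitute for a version-control system.

```python
import hashlib


def script_digest(script_text: str) -> str:
    """Return the SHA-256 hex digest of a script's text."""
    return hashlib.sha256(script_text.encode("utf-8")).hexdigest()


def matches_authoritative(local_text: str, authoritative_digest: str) -> bool:
    """Report whether a local copy is identical to the approved version.

    authoritative_digest is assumed to come from the team's single
    source of truth (hypothetical here).
    """
    return script_digest(local_text) == authoritative_digest
```

A wrapper could refuse to run any script for which matches_authoritative returns False, so that locally edited copies never touch production devices unnoticed.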

Change management

The reliability and accountability benefits of code management can be enhanced by change management. The enterprise network will be subject to fewer script-driven mistakes when new versions of production scripts are tested before being deployed, per good change management process, and when deployment of a new script is announced beforehand. Change-management best practice also includes having a plan in place to roll back to a previous version of the code if the new version fails to perform as expected, again with the goal of minimizing the disruption resulting from a mistake.
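
The rollback idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the running dictionary stands in for real device state, and validate is whatever post-change check a team chooses. The point is only the sequence: snapshot, apply, verify, and restore on failure.

```python
from typing import Callable, Dict


def deploy_with_rollback(
    device: str,
    running: Dict[str, str],
    new_config: str,
    validate: Callable[[str], bool],
) -> bool:
    """Apply new_config to a device and restore the old one if checks fail.

    running is a hypothetical in-memory stand-in for device state.
    Returns True if the new configuration was kept, False if rolled back.
    """
    previous = running.get(device)  # snapshot before changing anything
    running[device] = new_config    # apply the change

    if validate(running[device]):
        return True

    # Validation failed: put the device back the way it was.
    if previous is None:
        del running[device]
    else:
        running[device] = previous
    return False
```

In practice the snapshot would be a saved configuration and the validation a set of reachability or state checks, but the shape of the procedure stays the same.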

Annotation

It may seem trivial to some, but annotation is basic good programming practice that is far from trivial in its benefits across time and in large team environments. When a network codebase is many years deep or has many people using and contributing to it, chances are that there will be important scripts in use that no one has examined closely in recent memory. Even the original author, if still working there, would have trouble understanding them without some guidance. Putting notes in the code to explain what it is supposed to do, and to some extent how it does it, increases the odds that someone other than the original author will be able to use it properly, debug it when needed, and extend or modify its behavior as needs change. It also increases the odds that the original author, coming back to the code after a long time away from it, will remember what it is for and why it is built the way it is.
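
As a small illustration of what such notes can look like, here is a hypothetical Python helper whose docstring says what it does and whose comment says why. The function name and the command syntax it emits are assumptions chosen for the example, not any particular vendor's required format.

```python
from typing import List


def build_vlan_commands(vlan_id: int, vlan_name: str) -> List[str]:
    """Return the configuration lines that define one VLAN.

    What: builds the lines as plain text; it does not contact any device.
    Why: keeping text generation separate from delivery lets the output
    be reviewed and tested before anything is pushed.
    """
    # 802.1Q reserves IDs 0 and 4095, so usable VLAN IDs run from 1 to 4094.
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID {vlan_id} is outside the range 1-4094")

    # Illustrative syntax only; adjust to the target platform.
    return [f"vlan {vlan_id}", f" name {vlan_name}"]
```

Someone meeting this function years later can tell from the notes alone what it returns, what it deliberately does not do, and where the numeric limits come from.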

Coding standards

Like annotation, coding standards make code more readable, which in turn makes code more generally useful and maintainable. If variables are named meaningfully and in some consistent format across scripts, and if functions or subroutines are named in a similarly consistent way, then a group of colleagues will be able to use and maintain the codebase more easily and reliably. The name of the game is to avoid creating “write once” code and building a repository full of dead and dying code that requires rewriting from scratch if conditions or needs change. Instead, build a sustainable, living codebase easily understood by anyone who might need to use it.
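
A naming standard is easier to keep when it can be checked mechanically. The sketch below assumes, purely for illustration, that the team's scripts are written in Python and that its standard calls for lowercase snake_case function names. It uses the standard library's ast module to list the functions in a script that break that rule; the helper name is hypothetical.

```python
import ast
import re
from typing import List

# Assumed team standard: lowercase letters, digits, and underscores only.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def find_nonconforming_names(source: str) -> List[str]:
    """Return, sorted, the function names in source that are not snake_case."""
    tree = ast.parse(source)
    bad_names = []
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and not SNAKE_CASE.match(node.name):
            bad_names.append(node.name)
    return sorted(bad_names)
```

Run as part of check-in, a test like this flags a name such as pushConfig before it enters the shared repository, while push_config passes.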

Properly managing the code that manages the network is a crucial discipline in organizations pushing to automate their network environment. Good practices reduce the chance of mistakes and of mistakes turning into major outages.


Copyright © 2022 IDG Communications, Inc.


