Towards Inclusive Language in Code
In this post, to explain the policy and the language shifts it requires, we may use language that some readers find harmful or upsetting. Do what is safe for your well-being, and know that we are available to answer questions in the comments section below. Thank you for doing this work with us.
When we speak, it’s important to choose our words carefully so that people hear only the meaning we want to convey, and not an unintended, perhaps hurtful, subtext.
When we code, we have the same responsibility, both to the people who read our code and to society at large. And there’s a lot to lose when our code offends or discourages another programmer from working with our codebase or from engaging with the industry at all.
At Cisco, we are modernizing our coding tools and our codebases. The work starts with an inclusive language policy that eradicates outmoded terms such as master/slave and blacklist/whitelist and provides more descriptive and precise replacements.
Cisco previously addressed gender pronouns in documentation. The focus now is on racially tinged wording, driven in large part by the rise and global visibility of the Black Lives Matter (BLM) movement.
Our longer-term goal is to provide a repeatable framework for refining our use of language that can expand beyond North American, English-based biases and move toward global awareness and multiple languages, freeing even more code from biased language.
Changing how we use language is an ongoing process, so I want to explore, from an engineering viewpoint, the inventory tools, plans, triage methods, and execution these changes require. We also want to ensure these words do not sneak back into our code, our products, our configurations, or our everyday language.
First, we need to take an inventory of what we have. We also need to place our findings into categories so that we can prioritize the work that comes next. I’ll walk through examples using some developer and code assets.
Categorizing Engineering Assets
At Cisco, we found that sorting language issues into four categories helped us prioritize the work and decide what to change and when. For example, you want to change the Command-Line Interface (CLI) or user interface before you change the documentation. You may want to approach your own code and product assets in a similar way.
Category #1 Simple usages: For example, a variable name that is internal to code and not exposed via Application Programming Interfaces (APIs) or other external methods.
Category #2 CLI (config, show)/API/schema usages: We need to deprecate the old usage and create a new one with text substitutions. This fix is complex because both terms may need to work simultaneously to avoid breakage: while we drive users to the new CLI language, the old CLI must keep working.
Category #3 Logging/telemetry/SNMP/monitoring: Support both old and new terms (again, we don’t want existing scripts or tools to break). We will deprecate the old usages but must decide when to “rip off the Band-Aid” and remove support for the old terminology.
This deprecation can take years and requires carefully planned outreach because we need to communicate about potential script or tool changes.
Category #4 Documentation changes: Simple cases are easy to do. Complex cases (like documentation of a CLI) must follow CLI changes, meaning Category #2 changes must happen first.
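Categories #2 and #3 both depend on old and new terms working side by side during a deprecation window. The Python sketch below shows one way to do that for configuration keys. It assumes a hypothetical mapping from deprecated keys to replacements (`master_node` to `primary_node`, `slave_nodes` to `secondary_nodes`); these names are illustrative, not Cisco's. Old keys are still accepted, each use raises a `DeprecationWarning`, and the value is stored under the new key.

```python
import warnings

# Hypothetical mapping from deprecated config keys to their replacements.
# These names are illustrative only, not taken from any Cisco product.
DEPRECATED_KEYS = {
    "master_node": "primary_node",
    "slave_nodes": "secondary_nodes",
}


def normalize_config(config: dict) -> dict:
    """Return a copy of config that uses only the new key names.

    Old keys keep working during the deprecation window, but each use
    emits a DeprecationWarning so users can migrate before support ends.
    """
    normalized = {}
    for key, value in config.items():
        new_key = DEPRECATED_KEYS.get(key)
        if new_key is None:
            # Not a deprecated key: copy it through unchanged.
            normalized[key] = value
            continue
        warnings.warn(
            f"'{key}' is deprecated; use '{new_key}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if new_key in config:
            # Both spellings were supplied; the new name wins.
            continue
        normalized[new_key] = value
    return normalized


if __name__ == "__main__":
    print(normalize_config({"master_node": "node-a", "timeout": 30}))
    # prints {'primary_node': 'node-a', 'timeout': 30} (plus a warning)
```

When the deprecation window ends, deleting an entry from `DEPRECATED_KEYS` stops the translation, so the old key is no longer mapped to the new one.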
As a worked example, the Firewall Management Center has a REST API that can GET, PUT, POST, or DELETE an object called “ftddevicecluster.” During the inventory, the team discovered that the payload for those API calls used field names based on the device hierarchy, masterDevice and slaveDevices. In total, the field names for these API calls contained six instances of master and slave.
This asset falls into Category #2 (API), but in this case the team decided that a text substitution in a new release would work. The team also had Category #4 documentation changes to make in the REST API documentation, and of course the API has to change before the documentation can.
Technically, renaming a field in an API payload is a breaking change, because it can break code already written against the API. Scripts written for the GET call in version 7.0 of the API receive the “old” field names; version 7.1 returns the modern field names.
If you write code for this API, you need to match the version value to the expected field names.
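As a hedged illustration, a client script could choose field names based on the API version it talks to. In this Python sketch, `masterDevice` and `slaveDevices` are the 7.0 field names described above; the 7.1 names `controlDevice` and `dataDevices` are assumptions for illustration only, so check the REST API documentation for your release before relying on them. The payload is a plain dict rather than a live API response.

```python
# Field names by API version. The 7.0 names come from the payload described
# in the text; the 7.1 names are hypothetical placeholders, not confirmed fields.
FIELD_NAMES = {
    (7, 0): {"primary": "masterDevice", "secondary": "slaveDevices"},
    (7, 1): {"primary": "controlDevice", "secondary": "dataDevices"},
}


def parse_version(version: str) -> tuple:
    """Turn a version string such as '7.1.0' into a (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def cluster_members(payload: dict, version: str) -> tuple:
    """Return (primary_device, secondary_devices) from an ftddevicecluster payload."""
    requested = parse_version(version)
    # Use the newest known field names that are not newer than the requested version.
    known = [v for v in FIELD_NAMES if v <= requested]
    if not known:
        raise ValueError(f"unsupported API version: {version}")
    names = FIELD_NAMES[max(known)]
    return payload[names["primary"]], payload.get(names["secondary"], [])


if __name__ == "__main__":
    old = {"masterDevice": {"name": "ftd-1"}, "slaveDevices": [{"name": "ftd-2"}]}
    print(cluster_members(old, "7.0"))
```

Keeping the version-to-field mapping in one table means a later rename only touches that table, not every call site.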
Because this is a firewall product, there are also blacklist and whitelist instances to count, so the team repeats the inventory and analysis process for those terms.
Try the Inclusive Language Tool Collection
To help you find lapses in inclusive language in your code and docs, we have a collection of inclusive language tools on GitHub. You can start with an inventory of how many times each word appears in your codebase: point an inventory tool at the files you want to examine as you begin your analysis.
The inventory helper tool works either with Bash and a text-based search or with Python and the GitHub API. It creates a CSV (comma-separated values) file that helps you sort through your files. To run the tool, you need:
- An org-level personal access token for GitHub with repo-read permissions.
- Python environment installed locally.
- A keyword you know you want to look for in your codebase.
- Excel or a similar spreadsheet tool to import the CSV file.
Once you install the Python prerequisites and set your GitHub token in the environment, enter a keyword to search for. In return, you’ll get a CSV file listing the file type, the repository, the file where the keyword is found, and the exact path to that file.
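The following is not the Cisco inventory tool, only a minimal, local-only Python sketch of the same idea. It assumes your repositories are already cloned as subdirectories of one root directory, searches each file for a keyword (case-insensitive substring match, so identifiers like `masterDevice` are counted too, at the cost of some false positives), and writes CSV rows with the file type, repository, file, and path, plus a hit count.

```python
import csv
import os
import re
import sys


def inventory(root: str, keyword: str, out_path: str) -> int:
    """Write a CSV of files under root that mention keyword; return the row count.

    Assumes each top-level directory under root is one cloned repository.
    """
    # Substring match, ignoring case: flags camelCase identifiers as well.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    rows = []
    for repo in sorted(os.listdir(root)):
        repo_dir = os.path.join(root, repo)
        if not os.path.isdir(repo_dir):
            continue
        for dirpath, dirnames, filenames in os.walk(repo_dir):
            dirnames[:] = [d for d in dirnames if d != ".git"]  # skip Git metadata
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as fh:
                        count = len(pattern.findall(fh.read()))
                except OSError:
                    continue  # unreadable file: skip it
                if count:
                    file_type = os.path.splitext(name)[1].lstrip(".") or "none"
                    rows.append([file_type, repo, name, os.path.relpath(path, root), count])
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file_type", "repository", "file", "path", "count"])
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Usage: python inventory.py <root-dir> <keyword> <output.csv>
    print(inventory(sys.argv[1], sys.argv[2], sys.argv[3]), "files matched")
```

The resulting CSV opens directly in Excel or another spreadsheet tool for sorting by repository or file type.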
Now that you know which files have an offending word, you can start to organize and track your work to improve inclusiveness.
Depending on a team’s preferences for tracking work, you can modify the script to use the GitHub API to open an Issue in each repo that contains the term.
Teams can also put each request to “please change this keyword, here are your alternatives” into a work tracker of choice, such as JIRA.
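A sketch of the Issue-per-repo idea follows, using only the Python standard library and GitHub's REST endpoint for creating Issues (`POST /repos/{owner}/{repo}/issues`). The owner and repo names and the `GITHUB_TOKEN` environment variable are placeholders. Note that opening Issues requires a token with write access to Issues, not just the read permission the inventory needs. The function only builds the request, so you can inspect it before anything is sent.

```python
import json
import os
import urllib.request

GITHUB_API = "https://api.github.com"


def build_issue_request(owner, repo, keyword, alternatives, token):
    """Build (but do not send) a request that opens one tracking Issue."""
    body = {
        "title": f"Replace non-inclusive term '{keyword}'",
        "body": (
            f"Please change '{keyword}' in this repository.\n\n"
            "Suggested alternatives: " + ", ".join(alternatives)
        ),
    }
    return urllib.request.Request(
        f"{GITHUB_API}/repos/{owner}/{repo}/issues",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder owner/repo; the token is read from the environment.
    req = build_issue_request(
        "example-org", "example-repo", "whitelist", ["allowlist"],
        os.environ["GITHUB_TOKEN"],
    )
    with urllib.request.urlopen(req) as resp:  # this line actually sends it
        print(json.load(resp)["html_url"])
```

Feeding each repository from the inventory CSV into `build_issue_request` gives one Issue per repo; the same title and body text could just as well go into a JIRA ticket.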
Getting Changes into the Codebase
Let’s say you have an inventory, with each issue categorized, and that you have a policy on replacements. You’ve triaged until you have a list of Issues or tickets. Now comes the arduous work.
For example, the Cisco Subscriber Services group, which houses 5G and Cable features, identified more than 3,000 occurrences of non-inclusive terms across all four categories.
They mapped out the remediation work from November 2020 through March 2022 and did it in two phases. In the first phase, teams made the changes that the second phase depended on. They used JIRA and Rally for tracking. I congratulate the teams for sticking with the tracking and the changes and getting the hard work done.
Word Lists and Tiers
At Cisco, we have specifically chosen four words for immediate replacement: “master,” “slave,” “blacklist,” and “whitelist.” These are our Tier 1 words. The Inclusive Naming Initiative also includes “abort” and “abortion” in its Tier 1 list. Different companies and organizations govern their word lists differently. You can learn more about the words in every tier, and see words deemed acceptable to keep, in the Inclusive Naming Initiative’s language recommendations lists.
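Tiers translate naturally into a triage step over the inventory counts. In this minimal Python sketch, the Tier 1 entries are the four words named above; the `triage` helper and the example counts are hypothetical, and any word your policy has not yet assigned to a tier is grouped as `unlisted` for a policy decision.

```python
# Tier assignments from the policy. The Tier 1 words are the four named
# in the text; extend the mapping with your organization's lower tiers.
TIERS = {"master": 1, "slave": 1, "blacklist": 1, "whitelist": 1}


def triage(counts: dict, tiers: dict = TIERS) -> dict:
    """Group inventory counts by tier, most frequent words first.

    Words missing from the tier map go under 'unlisted' for a policy decision.
    """
    groups = {}
    for word, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        tier = tiers.get(word.lower())
        label = f"tier{tier}" if tier is not None else "unlisted"
        groups.setdefault(label, []).append((word, count))
    return groups


if __name__ == "__main__":
    # Hypothetical counts, for example summed from an inventory CSV.
    print(triage({"master": 12, "whitelist": 3, "legacy": 2}))
    # prints {'tier1': [('master', 12), ('whitelist', 3)], 'unlisted': [('legacy', 2)]}
```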
Automation with Linters
Next, you want to be sure you automatically lint your code so that these terms do not make their way back into your code or products.
You can use a tool like the woke linter.
Linters analyze your source code for patterns based on rules that you feed into the tool, and can then suggest fixes. This style of code improvement suits inclusive language work well, because you learn more about language while improving your code.
At Cisco, we keep a shared copy of the rules, based on our policy, so that teams consistently search for the same words and use similar or identical replacements.
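To make the pattern concrete, here is a tiny linter in Python. It is not woke, which has its own rules file format, just a sketch of the same idea: a rule table maps each term to suggested alternatives, the scanner reports every match with its line and column, and a non-zero exit status lets a CI job reject the change. The alternatives shown are commonly used replacements assumed for illustration, not Cisco's published policy.

```python
import re
import sys

# Hypothetical rule set: term -> suggested alternatives. Replace these with
# the replacements your own inclusive language policy specifies.
RULES = {
    "whitelist": ["allowlist"],
    "blacklist": ["denylist", "blocklist"],
    "master": ["primary", "main"],
    "slave": ["secondary", "replica"],
}


def lint(text: str, rules: dict = RULES) -> list:
    """Return (line, column, matched_text, alternatives) for each match in text.

    Matching is case-insensitive and substring-based, so camelCase
    identifiers such as masterDevice are flagged as well.
    """
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for term, alternatives in rules.items():
            for match in re.finditer(re.escape(term), line, re.IGNORECASE):
                findings.append((lineno, match.start() + 1, match.group(0), alternatives))
    findings.sort(key=lambda f: (f[0], f[1]))  # report in file order
    return findings


if __name__ == "__main__":
    problems = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="ignore") as fh:
            for line, col, term, alts in lint(fh.read()):
                problems.append(f"{path}:{line}:{col}: '{term}' -> consider {', '.join(alts)}")
    for problem in problems:
        print(problem)
    # A non-zero exit status lets a CI job block the change.
    sys.exit(1 if problems else 0)
```

Running it as, say, `python lint_terms.py src/*.py` (the file name is arbitrary) prints one `path:line:column` finding per match and fails the job if anything is found.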
What’s Next?
Keep an eye on this work through Cisco’s Social Justice Beliefs and Actions and through the Inclusive Naming Initiative. We aim to expand beyond wording and create frameworks that enable internationalization work across global languages.
The work has just begun, and we are here to organize it with automation tooling, as engineers do.
We’d love to hear what you think. Ask a question or leave a comment below.
And stay connected with Cisco DevNet on social!
LinkedIn | Twitter @CiscoDevNet | Facebook | Developer Video Channel