Ensuring Diversity and Addressing Bias in Data and Software Development
Organizations are increasingly focused on diversity, equity, and inclusion (DE&I) in their hiring practices and workplace culture, not only because it is the right thing to do, but also because failing to do so can be detrimental to the business.
With software at the core of every business, and organizations deriving ever more value and insight from the data that software collects, non-diverse data sets and software can produce products and services that cater only to one group of people while under-serving, or worse, harming others. The reality is that developers and data scientists encode their beliefs, convictions, and biases, most often unconsciously, into their data and the software they design.
We have already seen real-world negative impacts when data science and software development go unchecked without considering DE&I. For example, in an early attempt by Amazon to build a computer program to guide its hiring decisions, the company used resumes submitted over the previous decade as training data. Because most of those resumes came from men, the program taught itself that male candidates were preferable to female ones. Although Amazon recognized this tendency early on and never used the program to evaluate candidates, the example highlights how relying on biased data can reinforce inequality.
Ultimately, these issues arise not from malicious intent but from being “blind” to, or ignorant of, the viewpoints and outcomes that different groups of people experience. The best way to mitigate and avoid the problem is to have a team with diverse representation spanning professional backgrounds, genders, races, ethnicities, and so on. A diverse team can examine each stage of building and managing data pipelines (collecting, cleansing, and so on) and the software delivery process with the full range of possible outcomes in mind.
While we are seeing progress in increasing diversity in data science and software roles, more needs to be done. A 2020 study of the AI field suggests that while data science is a relatively new field and will take time to respond to diversity initiatives, some of the efforts to increase diversity in other tech fields may be succeeding. Over the past several years, numerous diversity-focused conferences and coding events have emerged, and participation is growing rapidly.
One of the first places to start is committing to hiring diverse candidates and fostering an inclusive workplace culture that retains diverse teams and supports their ongoing development. Likewise, managers must create an open culture that gives a voice to underrepresented talent.
From there, the work of ensuring the integrity of your organization’s data and software delivery can start to take shape.
How to ensure the integrity of your data and its outcomes
As we know, the ramifications of biased data can affect society as a whole, so having the right data set and applying it correctly is important. In practice, software teams follow a lifecycle: collecting data, cleaning and classifying it, writing code that uses it, and testing it to deliver outcomes that meet business and customer needs. Having a diverse set of people working throughout every step of that lifecycle helps organizations avoid the pitfalls mentioned earlier.
Spending time defining what a “good” data set looks like, one that will deliver equitable outcomes, is key to ensuring the integrity of your data. Specifically, when looking at a data set, teams should consider whether an outcome could be detrimental and whether there is anything to learn from it. They should ask questions like: what does good look like, where could there be biases, and which populations could be harmed? If the data does not represent the population, you can expect bad outcomes or output from that data set. Throughout the data collection process, make sure you are capturing all viewpoints, not throwing away critical information, and building the data set with a clear notion of what a “good” outcome looks like. A simple representation and outcome check, as in the sketch below, is one place to start.
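To make that concrete, here is a minimal sketch of a representation and outcome audit in Python. It assumes a pandas DataFrame with hypothetical “gender” and “hired” columns, a made-up reference distribution, and an arbitrary tolerance; these names and numbers are illustrative, not a prescription.

```python
# A minimal sketch of a data-set audit, assuming a pandas DataFrame with
# hypothetical "gender" and "hired" columns. Column names, the reference
# distribution, and the tolerance are illustrative placeholders.
import pandas as pd


def audit_representation(df, group_col, reference, tolerance=0.10):
    """Flag groups whose share of the data set differs from the reference
    population share by more than the tolerance."""
    observed = df[group_col].value_counts(normalize=True)
    gaps = {}
    for group, expected_share in reference.items():
        gap = observed.get(group, 0.0) - expected_share
        if abs(gap) > tolerance:
            gaps[group] = round(gap, 2)
    return gaps


def outcome_rate_by_group(df, group_col, outcome_col):
    """Positive-outcome rate per group, a first look at disparate outcomes."""
    return df.groupby(group_col)[outcome_col].mean()


# Made-up example: 80 resumes from men, 20 from women.
df = pd.DataFrame({
    "gender": ["male"] * 80 + ["female"] * 20,
    "hired":  [1] * 40 + [0] * 40 + [1] * 5 + [0] * 15,
})

print(audit_representation(df, "gender", reference={"male": 0.5, "female": 0.5}))
# {'male': 0.3, 'female': -0.3} -> one group is heavily over-represented

print(outcome_rate_by_group(df, "gender", "hired"))
# hired rate of 0.50 for men vs. 0.25 for women -> worth investigating
```

Checks like these will not catch every form of bias, but they surface basic representation gaps and outcome disparities early, before the data feeds a model or a product decision.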
The iterative nature of software development also gives teams the opportunity to course correct continuously: as they spot issues in the data, including places where it may be “contaminated” with personal biases, they can adjust. A guardrail like the sketch below can make that check part of every iteration.
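One way to act on that, sketched here under the same assumptions as the audit above, is to turn the disparity check into a test that runs on every iteration of the pipeline, so a regression fails the build instead of reaching production. The maximum allowed gap is again an arbitrary placeholder.

```python
# A minimal sketch of a guardrail that could run in CI on each iteration.
# The maximum allowed gap is an arbitrary placeholder, not a recommendation.
def check_outcome_disparity(df, group_col, outcome_col, max_gap=0.20):
    """Raise if the gap between the best- and worst-served group exceeds max_gap."""
    rates = df.groupby(group_col)[outcome_col].mean()
    gap = rates.max() - rates.min()
    if gap > max_gap:
        raise AssertionError(
            f"Outcome disparity of {gap:.2f} across '{group_col}' exceeds {max_gap}; "
            "review the data before shipping this iteration."
        )


# Run against the made-up data from the audit above, this raises on the
# 0.50 vs. 0.25 hiring-rate gap, prompting a course correction before release.
```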
Addressing unconscious bias at every stage of the product life cycle, from strategy through product definition, requirements, user experience, engineering, and product marketing, ensures organizations deliver software that meets more needs. Likewise, diverse teams working on equitable, more inclusive data sets and software can drive innovation that creates competitive advantage, enhances the customer experience, and improves service quality, all of which can lead to better business outcomes.