8 tips for unleashing the power of unstructured data
The Workhuman cloud contains millions of recognition messages from employees around the world, each sharing positive feedback about a colleague.
“They do this in their own words, so each recognition moment is completely unique,” Harriott says. “We use this data to power AI models that help companies better define how employees are collaborating in their organization, what topics come up most frequently in messages, and whether there is equity in recognition awards across the organization.”
The company also uses large language models (LLMs) to summarize recognition trends over time and to suggest language for an effective recognition message.
“One initiative I’m particularly proud of is our tool Inclusion Advisor, an in-the-moment AI-based coaching tool that identifies and suggests corrections for unconscious bias in award language before it is sent to the recipient,” Harriott says.
One of the biggest challenges in getting value out of unstructured data is limited access to reliable, valid training data for the business use cases the organization is focused on.
“You can have large amounts of unstructured data, but without effective training data to create and validate a model, progress and quality will suffer,” Harriott says. “Leveraging LLMs can certainly help in this regard, but many business use cases are not effectively captured by existing LLMs.”
In addition, “in an LLM there can still be the issue of bias in the training data,” Harriott says. Workhuman has a linguistics team that is responsible for data annotation, augmentation, and validation to deal with some of these issues. “We also partner with our large, multinational customers to make sure models yield meaningful and useful results,” Harriott says.
Tips for transforming unstructured data into value
Harriott, Konoval, and other data experts offer advice on how to ensure success when working with unstructured data.
1. Tie initiatives to business outcomes. IT leaders should make sure initiatives to leverage unstructured data are tightly aligned to business needs and have executive sponsorship, Harriott says.
“Too often, a team may have a creative use case for unstructured data, but the connection to a key business outcome is not obvious to others and may lose support,” Harriott says. “It’s the leader’s responsibility to educate the organization on why the use case is important and how it can directly or indirectly drive business benefit.”
2. Recognize the journey. Data leaders should set initiative milestones and celebrate them as they are met, especially given how difficult it can be to create value from unstructured data.
“Making unstructured data actionable may require more time and effort than the business expects,” Harriott says. “By recognizing milestones, leaders give other stakeholders visibility into the progress being made, and also ensure that their team members feel appreciated for the level of effort they are putting in to make unstructured data actionable.”
3. Quality is job one. Another key to success is to prioritize data quality.
“The adage ‘garbage in, garbage out’ couldn’t be more appropriate,” Konoval says. “Going into analysis without ensuring data quality can be counterproductive. We have always taken this approach: Clean the data, remove what is unnecessary, and ensure that it meets quality standards.”
In the gaming industry, “misinformed decisions can result in expensive feature developments that players might not resonate with, or even worse, bugs that could tarnish our reputation,” Konoval says. “Our rigorous data governance framework ensures the foundation of our analyses is rock-solid.”
4. Separate the actionable from the informative. Prioritizing data that business users can act on is also vital. “What’s important is the volume of data and being able to parse what is actionable versus what is informative,” says Joe Minarik, COO at colocation and data services provider DataBank.
To underscore the importance of this, Minarik gives the example of using unstructured data for systems monitoring. “Actionable aspects have to be prioritized and addressed quickly,” he says. “Because so many aspects of systems are monitored, a single issue can generate alarms and information from downstream devices, causing an overabundance of alerts, alarms, and information that needs to be sifted through to identify what single aspect really needs to be addressed.”
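The alert flood Minarik describes can be illustrated with a minimal sketch. It assumes a known dependency map between devices (the `UPSTREAM` table and all device names below are hypothetical) and groups each alert under the most upstream device that is also alerting, so that one entry per likely root issue remains. This is a simple rule-based illustration, not the approach DataBank uses.

```python
from collections import defaultdict

# Hypothetical topology: each device maps to the device it depends on
# (None marks a top-level device). Real systems would load this from inventory.
UPSTREAM = {
    "core-switch": None,
    "rack-switch-1": "core-switch",
    "server-a": "rack-switch-1",
    "server-b": "rack-switch-1",
    "storage-1": "core-switch",
}


def root_cause(device, alerting):
    """Walk upstream while the parent is also alerting; return the topmost alerting device."""
    current = device
    while UPSTREAM.get(current) in alerting:
        current = UPSTREAM[current]
    return current


def correlate(alerts):
    """Group alert messages under the most upstream device that is itself alerting."""
    alerting = {alert["device"] for alert in alerts}
    groups = defaultdict(list)
    for alert in alerts:
        groups[root_cause(alert["device"], alerting)].append(alert["message"])
    return dict(groups)


# Illustrative alerts: a failed rack switch takes two servers with it.
alerts = [
    {"device": "server-a", "message": "ping timeout"},
    {"device": "rack-switch-1", "message": "link down"},
    {"device": "server-b", "message": "ping timeout"},
    {"device": "storage-1", "message": "disk latency high"},
]

actionable = correlate(alerts)
for device, messages in actionable.items():
    print(f"{device}: {len(messages)} related alert(s)")
```

In this toy case four alerts collapse into two items to act on: the rack switch, with the two server timeouts folded under it, and the unrelated storage alert.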
5. Make ample use of AI. Continuing with his example, Minarik points out the valuable role AI and machine learning play in analyzing unstructured data streams over time. “It helps you build system correlation,” he says. “That allows you to drop the noise and get to the root issue immediately.”
For instance, organizations can deploy named entity recognition (NER), a component of natural language processing (NLP) that focuses on identifying and categorizing named entities within unstructured text, with tags such as “person,” “organization,” or “location.”
“In practical terms, entity recognition plays a crucial role in a multitude of applications,” Minarik says. These include information retrieval systems that index and organize content, question-answering systems that locate answers within text, and content recommendation engines that personalize content based on recognized entities.
“By identifying and categorizing named entities, NER empowers data analysts and system engineers to unlock valuable insights from the vast data collected,” Minarik says.
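To make the idea concrete, here is a deliberately simple sketch of entity tagging. It swaps the trained statistical model a production NER system would use for a lookup table (a gazetteer), and every name in `GAZETTEER` and in the sample sentence is hypothetical. It shows only the input and output shape of the task: free text in, tagged entities out.

```python
import re

# Hypothetical gazetteer: surface form -> entity tag.
# A trained NER model would generalize beyond a fixed list like this.
GAZETTEER = {
    "Jane Doe": "person",
    "Acme Corp": "organization",
    "Boston": "location",
}

# Try longer names first so a long entry wins over a shorter overlapping one.
_names = sorted(GAZETTEER, key=len, reverse=True)
PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(name) for name in _names) + r")\b")


def tag_entities(text):
    """Return (surface form, tag, start offset) for each gazetteer match in the text."""
    return [
        (match.group(), GAZETTEER[match.group()], match.start())
        for match in PATTERN.finditer(text)
    ]


text = "Jane Doe of Acme Corp opened a ticket about the Boston data center."
entities = tag_entities(text)
for surface, tag, start in entities:
    print(f"{start:>3}  {tag:<12}  {surface}")
```

Once entities are extracted this way, they can be indexed or counted like any structured field, which is what makes the downstream retrieval and recommendation uses possible.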
6. Ensure value with visualizations. The process of making unstructured data usable doesn’t end with analysis, Minarik says. It culminates in the reporting and communication of findings.
“Reports typically involve a structured presentation of key findings, methodologies, and the implications of the analysis,” Minarik says. “Visualizations, such as charts, graphs, and dashboards, are instrumental in conveying complex data in an understandable format. Visual representations not only facilitate comprehension but also make it easier for stakeholders to identify trends, outliers, and critical insights, ensuring that timely data-driven decisions are made.”
7. Monitor as you go. Another key practice that is sometimes overlooked is the need for continuous monitoring and maintenance, Minarik says. “Real-life data is dynamic and ever-evolving,” he says. “Continuous monitoring and maintenance are critical to ensuring that the data remains usable over time.”
The key is to clean the data regularly and run quality checks to maintain its accuracy and reliability, Minarik says. Data anomalies, inconsistencies, and duplicates must be identified and rectified promptly to prevent skewed or erroneous analyses.
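A recurring quality check of this kind might look like the following sketch. It assumes the records are short text messages, treats copies that differ only in case or whitespace as duplicates, and flags very short entries as anomalies. The `min_words` threshold and the sample messages are illustrative assumptions, not recommended values.

```python
import re


def normalize(message):
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", message).strip().lower()


def clean(messages, min_words=3):
    """Split messages into kept records and flagged ones (too short or duplicate).

    min_words is an illustrative threshold chosen for this example only.
    """
    seen = set()
    kept, flagged = [], []
    for message in messages:
        key = normalize(message)
        if len(key.split()) < min_words:
            flagged.append((message, "too short"))
        elif key in seen:
            flagged.append((message, "duplicate"))
        else:
            seen.add(key)
            kept.append(message)
    return kept, flagged


# Hypothetical sample records.
raw = [
    "Thanks for leading the release review.",
    "thanks for  leading the release review. ",
    "ok",
    "",
    "Great job mentoring the new analysts this quarter.",
]

kept, flagged = clean(raw)
print(f"kept {len(kept)} of {len(raw)} messages")
for message, reason in flagged:
    print(f"flagged ({reason}): {message!r}")
```

Flagged records are returned rather than silently dropped, so someone can review them before they are removed from the data set.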
8. Keep your team’s skills sharp. Finally, it’s a good practice to invest in developing the right skills. Given the constant evolution of the underlying tools, that effort must be ongoing.
“The world of data analytics, particularly around unstructured data, is dynamic,” Konoval says. “The smallest advantage, such as a team skilled in the latest image recognition technology and analyzing concept art, can be the difference between a game being a hit or a failure. We’ve already seen how the results of advanced technology have impacted the storytelling and design of our games, resulting in positive feedback and increased player engagement.”