For IT leaders, operationalized gen AI is still a moving target
But it all begins with data, and it’s an area where many companies lag behind. Without a single and holistic strategy, every department will set up its own individual solutions.
“If you do that, you’ll end up making a lot more mistakes and re-learning the same things over and over again,” says Monteiro. “What you have to do as a CIO is take an architectural approach and invest in a common platform.”
Then there’s the hard work of collecting and prepping data. Quality checks and validation are critical to create a solid base, he says, so you don’t introduce bias, which undermines customer trust and harms the business.
So if a particular data set excludes the highest-value transactions because those are all handled manually, then the resulting model could potentially have a bias toward smaller, less profitable business lines. Garbage in, garbage out applies to the new era of gen AI as much as it did in previous technological periods.
For companies that have already invested in their data infrastructure, those investments will continue to pay off into the future, says Monteiro. “Companies that invested in data foundations have a tremendous head start in what they’re doing with generative AI,” he says.
Still, these traditional data foundations, originally designed for advanced analytics and machine learning use cases, only go so far.
“If you want to go beyond the basics, you’ll need to understand some of the deeper subtleties of generative AI,” says Omdia’s Shimmin. “What’s the difference between different embedding models, what is chunking, what is overlap? What are all the different methodologies you can use to tokenize data in the most efficient way? Do you want high or low dimensionality to save space in your vector database? The MLOps tools we have weren’t built to do that. It’s all very complicated and you can waste a lot of time and money if you don’t know what you’re doing.”
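Chunking and overlap, two of the subtleties Shimmin mentions, can be illustrated with a minimal sketch. This splits text into fixed-size chunks that overlap, so content straddling a boundary still appears intact in at least one chunk; it is character-based for simplicity, whereas production pipelines typically chunk by tokens using the model's own tokenizer.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks. The overlap means a
    sentence cut by one chunk boundary survives whole in the next chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Larger overlap improves retrieval recall at the cost of storing (and embedding) more redundant text, which is exactly the kind of cost/quality trade-off Shimmin is describing.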
But MLOps platform vendors are stepping up, he says. “Companies like Dataiku, DataRobot, and Databricks have all retooled to support LLMOps or GenAIOps. All the little pieces are starting to come into place.”
Analyzing the abstraction layer
Last November, OpenAI, the go-to platform for enterprise gen AI, unexpectedly fired its CEO, Sam Altman, which set off a circus-like scramble: a search for a new CEO, staff threatening to walk out, and Microsoft offering to take everyone in. During those tumultuous days, many companies using OpenAI’s models suddenly realized they had put all their eggs into one unstable basket.
“We saw a lot of OpenAI integrations,” says Dion Hinchcliffe, VP and principal analyst at Constellation Research. “But the whole management issue that happened with OpenAI has made people question their over-commitment.”
Even if a company doesn’t go out of business, it might quickly become obsolete. Early last summer, ChatGPT was pretty much the only game in town. Then Meta released Llama 2, free for most enterprise customers, followed by Anthropic’s Claude 2.1, which came out with a context window of 200,000 tokens — enough for users to cut-and-paste the equivalent of a 600-page book right into a prompt — leaving GPT-4’s 32,000 tokens in the dust. Not to be outdone, Google announced in February that its new Gemini 1.5 model can handle up to 10 million tokens. With that, plus greater speed, efficiency, and accuracy across video, audio, and written copy, the practical limits seemed to be falling away.
Free, open-source models continue to proliferate, as do industry-specific models pre-trained on, say, finance, medicine, or materials science.
“You’ve got new announcements every week, it seems,” says Publicis Sapient’s Monteiro.
That’s where a “model garden” comes in, he says. Companies that are disciplined about how they select and manage their models, and architect their systems so models can be easily swapped in and out, will be able to handle the volatility in this space.
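The architectural idea behind a model garden can be sketched as a registry that maps model names to callables sharing one signature, so application code never imports a vendor SDK directly and a model can be swapped by changing one string. This is a minimal illustration; the `ModelGarden` name, the lambda backends, and the single-string interface are all assumptions for the sketch, and real adapters would wrap vendor APIs behind the common signature.

```python
from typing import Callable

ModelFn = Callable[[str], str]  # one shared signature: prompt in, text out

class ModelGarden:
    """Registry of interchangeable model backends behind one interface."""

    def __init__(self) -> None:
        self._models: dict[str, ModelFn] = {}

    def register(self, name: str, fn: ModelFn) -> None:
        self._models[name] = fn

    def complete(self, model: str, prompt: str) -> str:
        if model not in self._models:
            raise KeyError(f"unknown model: {model}")
        return self._models[model](prompt)

garden = ModelGarden()
# Stand-in backends for the sketch; real ones would call vendor or local APIs.
garden.register("echo-small", lambda p: p.upper())
garden.register("echo-large", lambda p: p[::-1])
```

Because callers depend only on `complete(model, prompt)`, retiring a deprecated model or adopting a newly released one is a registry change, not an application rewrite — which is the point of the discipline Monteiro describes.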
But this abstraction layer needs to do more than just allow a company to upgrade models or pick the best one for each particular use case.
It can also be used for observability, metering, and role-based access controls, says Subha Tatavarti, CTO at technology and consulting firm Wipro Technologies.
Wipro, with 245,000 employees, has no choice but to adopt gen AI, she says, because its customers expect it to.
“We’re foundationally a technology company,” she says. “We have to do this.”
Broadening perspectives
Observability allows a company to see where data is going, what models and prompts are being used, and how long it takes for responses to come back. It can also include a mechanism to edit or obfuscate sensitive data.
Once a company knows what’s happening with its models, it can implement metering controls — limits on how much a particular model can be used, for example — to avoid unexpected spikes in costs.
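The observability and metering pair described above can be sketched as a thin wrapper around model calls: it logs usage and latency per request, and refuses calls once a token budget is exhausted. The whitespace-split token count here is a deliberate simplification for the sketch; a real gateway would use the provider's tokenizer and the usage figures returned by the API.

```python
import time

class MeteredClient:
    """Wrap a model backend to log usage (observability) and enforce a
    token budget (metering) so costs can't spike unnoticed."""

    def __init__(self, backend, token_budget: int):
        self.backend = backend
        self.token_budget = token_budget
        self.tokens_used = 0
        self.log: list[dict] = []

    def complete(self, prompt: str) -> str:
        cost = len(prompt.split())  # crude stand-in for a real tokenizer
        if self.tokens_used + cost > self.token_budget:
            raise RuntimeError("token budget exceeded")
        start = time.monotonic()
        reply = self.backend(prompt)
        self.tokens_used += cost + len(reply.split())
        self.log.append({"prompt_tokens": cost,
                         "latency_s": time.monotonic() - start})
        return reply
```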
“Right now, the way the metering works is the token consumption model,” Tatavarti says. “And it could get very expensive.”
In addition, for FAQs, companies can cache responses to save time and money. And for some use cases, an expensive, high-end commercial LLM might not be required; a locally hosted open-source model might suffice.
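The FAQ caching idea can be sketched as a response cache keyed by a normalized hash of the prompt, so a repeated question is answered without another paid model call. The exact-match normalization here (trim and lowercase) is an assumption for the sketch; real systems often use semantic caching, matching on embedding similarity rather than exact text.

```python
import hashlib

class CachedModel:
    """Cache model responses keyed by a hash of the normalized prompt."""

    def __init__(self, backend):
        self.backend = backend
        self.cache: dict[str, str] = {}
        self.backend_calls = 0  # track how many paid calls were actually made

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
        if key not in self.cache:
            self.backend_calls += 1
            self.cache[key] = self.backend(prompt)
        return self.cache[key]
```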
“All of that is fascinating to us and my team is definitely working on this,” she adds. “This is imperative for us to do.”
And when it comes to access controls, the fundamental principle should be to never expose native APIs to the organization but instead have a middle layer that checks permissions and handles other security and management tasks.
If, for example, an HR platform uses gen AI to answer questions based on a vector database of policies and other information, an employee should be able to ask questions about their own salary, says Rajat Gupta, chief digital officer at Xebia, an IT consultancy. But they shouldn’t be able to ask questions about those of other employees — unless they’re a manager or work in HR themselves.
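Gupta's HR example amounts to a permission check in the middle layer that runs before any model or database call. A minimal sketch, with the function name, role names, and salary table all invented for illustration:

```python
def answer_salary_question(requester: str, subject: str, role: str,
                           salaries: dict[str, int]) -> str:
    """Middle-layer check: employees may ask about their own salary;
    only managers and HR may ask about other employees'. The check
    happens before any retrieval or model call is made."""
    if requester != subject and role not in {"manager", "hr"}:
        raise PermissionError("not authorized for other employees' data")
    return f"{subject}'s salary is {salaries[subject]}"
```

The key design point is that the gen AI layer never sees a request the caller wasn't entitled to make, rather than relying on the model to refuse.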
Given how fast gen AI is being adopted in enterprises across all different business units and functions, it would be a nightmare to build these controls from scratch for every use case.
“The work would be enormous,” he says. “There’d be chaos.”
Gupta agrees enterprises that need to build this kind of functionality should do so once and then reuse it. “Take everything they need in common — security, monitoring, access controls — and build it as part of an enterprise-level platform,” he says.
He calls it an AI gateway, with the open source MLflow AI Gateway being one example. Released last May, it’s already been deprecated in favor of the MLflow Deployments Server. Another tool his company is using is Arthur AI’s Arthur Shield, a firewall for LLMs. It filters prompt injection attacks, profanity, and other malicious or dangerous prompts.
And then there’s Ragas, which helps check a gen AI response against the actual information in a vector database in order to improve accuracy and reduce hallucinations.
“There are many such projects both in the open source and the commercial space,” he says.
Third-party AI platforms, startups, and consultants are also rushing in to fill the gaps.
“The way the AI ecosystem is evolving is surprising,” says Gupta. “We thought the pace would slow down but it’s not. It’s rapidly increasing.”
So to get to market faster, Xebia is weaving these different projects together, he says, but it doesn’t help that AI companies keep shipping new capabilities, such as autonomous AI-powered agents.
“If you’re using autonomous agents, how do you actually measure the efficacy of your overall agents project?” he asks. “It’s a challenge to actually monitor and control.”
Today, Xebia hobbles agents, curtailing their autonomy and allowing them to carry out only very limited and precise tasks. “That’s the only way to do it right now,” he adds. “Limit the skills they have access to, and have a central controller so they’re not talking to each other. We control it until we have more evolved understanding and feedback loops. This is a pretty new area, so it’s interesting to see how this evolves.”
Building guardrails
According to the cnvrg.io survey, compliance and privacy were top concerns for companies looking to implement gen AI, ahead of reliability, cost, and lack of technical skills.
Similarly, in the IBM survey, for companies not implementing gen AI, data privacy was cited as the barrier by 57% of respondents, and transparency by 43%. In addition, 85% of all respondents said consumers would be more likely to pick companies with transparent and ethical AI practices, but fewer than half are working toward reducing bias, tracking data provenance, working on making AI explainable, or developing ethical AI policies.
It’s easy for technologists to focus on technical solutions. Ethical AI goes beyond the technology to include legal and compliance perspectives, and issues of corporate values and identity. So this is an area where CIOs or chief AI officers can step up and help guide the larger organization.
And it goes even further than that. Setting up gen AI-friendly data infrastructure, security and management controls, and ethical guardrails can be the first step on the journey to fully operationalize LLMs.
Gen AI will require CIOs to rethink technology, says Matt Barrington, EY Americas emerging technologies leader. Prior to gen AI, software was deterministic, he says.
“You’d design, build, test, and iterate until the software behaved as expected,” he says. “If it didn’t, it was a bug, and you’d go back and fix it. And if it did, you’d deploy it into production.” All the large compute stacks, regardless of software pattern, were deterministic. Now, other than quantum computing, gen AI is the first broadly known non-deterministic software pattern, he says. “The bug is actually the feature. The fact it can generate things on its own is the main selling point.”
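The non-determinism Barrington describes comes down to how decoding works: at temperature zero the next token is a deterministic argmax over the model's scores, while at higher temperatures it is sampled from a softmax distribution, so reruns of the same prompt can differ. A toy sketch with made-up logits:

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float,
                 rng: random.Random) -> str:
    """Temperature 0 -> deterministic argmax; temperature > 0 -> sample
    from the temperature-scaled softmax, so repeated runs can differ."""
    if temperature == 0:
        return max(logits, key=logits.get)
    weights = [math.exp(v / temperature) for v in logits.values()]
    return rng.choices(list(logits), weights=weights)[0]
```

Testing such a system means making statistical assertions about distributions of outputs rather than asserting a single expected value, which is the shift in engineering practice Barrington is pointing at.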
That doesn’t mean the old stuff should all be thrown out. MLOps and PyTorch are still important, he says, as is knowing when to use a RAG embedding model, a DAG, or a multi-modal approach, as well as getting data ready for gen AI.
“All those things will remain and be important,” he says. “But you’ll have the emergence of a new non-deterministic platform stack that’ll sit alongside the traditional stack with a whole new area of infrastructure engineering and ops that will emerge to support those capabilities.”
This will change how businesses operate at a core level, and moving in this direction to become a truly AI-powered enterprise will be a fast-paced shift, he says. “Watching this emerge will be very cool,” he says.