Should you build or buy generative AI?

Rather than being devoted to replicating generative AI capabilities that are already available, that time and effort will go to automating existing manual processes and exploring new possibilities. “We’re not imagining utilizing AI to do the same things just because that’s the way we’ve always done it,” he says. “With this new superpower, how should we develop or refine refactoring these business processes?”

Buying rather than building will make it easier to take advantage of new capabilities as they arrive, he suggests. “I think one of the success of organizations in being able to utilize the tools that are becoming more readily available will lie in the ability to adapt and review.”

In a larger organization, using commercially available LLMs that come with development tools and integrations will allow multiple departments to experiment with different approaches, discover where generative AI can be useful, and get experience with how to use it effectively. Even organizations with significant technology expertise like Airbnb and Deutsche Telekom are choosing to fine-tune LLMs like ChatGPT rather than build their own.

“You take the large language model, and then you can bring it within your four walls and build that domain piece you need for your particular company and industry,” National Grid group CIDO Adriana Karaboutis says. “You really have to take what’s already there. You’re going to be five years out here doing a moonshot while your competitors layer on top of everything that’s already available.”

Panasonic’s B2B Connect unit used the Azure OpenAI Service to build its ConnectAI assistant for internal use by its legal and accounting teams, as well as HR and IT, and the reasoning was similar, says Hiroki Mukaino, senior manager for IT & digital strategy. “We thought it would be technically difficult and costly for ordinary companies like us that haven’t made a huge investment in generative AI to build such services on our own,” he says.

Increasing employee productivity is a high priority and rather than spend time creating the LLM, Mukaino wanted to start building it into tools designed for their business workflow. “By using Azure OpenAI Service, we were able to create an AI assistant much faster than build an AI in-house, so we were able to spend our time on improving usability.”

He also views the ability to further shape the generative AI options with plugins as a good way to customize it to Panasonic’s needs, calling plugins important functions to compensate for the shortcomings of the current ChatGPT.

Fine-tuning cloud LLMs by using vector embeddings from your data is already in private preview in Azure Cognitive Search for the Azure OpenAI Service.

“While you can power your own copilot using any internal data, which immediately improves the accuracy and decreases the hallucination, when you add vector support, it’s more efficient retrieving accurate information quickly,” Microsoft AI platform corporate VP John Montgomery says. That creates a vector index for the data source—whether that’s documents in an on-premises file share or a SQL cloud database—and an API endpoint to consume in your application.
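To make the idea of a vector index concrete, here is a minimal sketch in Python. It is not the Azure Cognitive Search API: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the `VectorIndex` class, file names, and sample text are hypothetical. It only illustrates the general pattern of embedding documents once and then ranking them by similarity to a query.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorIndex:
    """Hypothetical in-memory index: stores one vector per document."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, doc_id: str, text: str) -> None:
        self._items.append((doc_id, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the ids of the k documents most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]


index = VectorIndex()
index.add("benefits.txt", "the health plan covers two pairs of eyeglasses per year")
index.add("travel.txt", "book flights through the travel portal")
top = index.search("how many eyeglasses does the health plan cover")
```

In a production service the embedding would come from a trained model and the index would use an approximate nearest-neighbor structure, but the retrieve-by-similarity step is conceptually the same.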

Panasonic is using this with both structured and unstructured data to power the ConnectAI assistant. Similarly, professional services provider EY is chaining multiple data sources together to build chat agents, which Montgomery calls a constellation of models, some of which might be open source models. “Information about how many pairs of eyeglasses the company health plan covers would be in an unstructured document, and checking the pairs claimed for and how much money is left in that benefit would be a structured query,” he says.
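The eyeglasses example can be sketched as two lookups combined into one answer. This is only an illustration under stated assumptions: the policy sentence, the `claims` table, and the member id are invented, and a regular expression stands in for the language model that would normally read the unstructured document.

```python
import re
import sqlite3

# Hypothetical unstructured source: a sentence from a benefits document.
POLICY_DOC = "The health plan covers 2 pairs of eyeglasses per member per year."


def covered_pairs(document: str) -> int:
    """Unstructured side: extract the coverage limit from free text."""
    match = re.search(r"covers (\d+) pairs of eyeglasses", document)
    if match is None:
        raise ValueError("policy text does not state eyeglasses coverage")
    return int(match.group(1))


def claimed_pairs(conn: sqlite3.Connection, member_id: str) -> int:
    """Structured side: a SQL query over a (hypothetical) claims table."""
    row = conn.execute(
        "SELECT COALESCE(SUM(pairs), 0) FROM claims WHERE member_id = ?",
        (member_id,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (member_id TEXT, pairs INTEGER)")
conn.execute("INSERT INTO claims VALUES ('m-1', 1)")

# Combine both sources to answer "how many pairs do I have left?"
remaining = covered_pairs(POLICY_DOC) - claimed_pairs(conn, "m-1")
```

The design point is that each kind of question goes to the source that can answer it reliably, rather than asking one model to guess at both.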

Use and protect data

Companies taking the shaper approach, Lamarre says, want the data environment to be completely contained within their four walls, and the model to be brought to their data, not the reverse. While whatever you type into the consumer versions of generative AI tools is used to train the models that drive them (the usual trade-off for free services), Google, Microsoft and OpenAI all say commercial customer data isn’t used to train the models.

For example, you can run Azure OpenAI over your own data without fine-tuning, and even if you choose to fine-tune on your organization’s data, that customization, like the data, stays inside your Microsoft tenant and isn’t applied back to the core foundation model. “The data usage policy and content filtering capabilities were major factors in our decision to proceed,” Mukaino says.

Although the copyright and intellectual property aspects of generative AI remain largely untested by the courts, users of commercial models own the inputs and outputs of their models. Customers with particularly sensitive information, like government users, may even be able to turn off logging to avoid the slightest risk of data leakage through a log that captures something about a query.

Whether they buy or build the LLM, organizations will need to think more about document privacy, authorization and governance, as well as data protection. Legal and compliance teams already need to be involved in uses of ML, but generative AI is pushing the legal and compliance areas of a company even further, says Lamarre.

Unlike supervised learning on batches of data, an LLM will be used daily on new documents and data, so you need to be sure data is available only to users who are supposed to have access. If different regulations and compliance models apply to different areas of your business, you won’t want those areas to get the same results.
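One common way to enforce this is to filter documents by the user's permissions before anything reaches the model. The sketch below assumes a simple group-based scheme; the `Document` class, group names, and sample documents are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document


def visible_documents(docs, user_groups):
    """Keep only documents that share at least one group with the user."""
    groups = set(user_groups)
    return [d for d in docs if d.allowed_groups & groups]


docs = [
    Document("hr-salaries", "salary bands by grade", frozenset({"hr"})),
    Document("it-handbook", "how to request a laptop", frozenset({"hr", "it", "legal"})),
]

# Only the filtered list would be passed to retrieval and the LLM.
visible = [d.doc_id for d in visible_documents(docs, {"it"})]
```

Filtering before retrieval, rather than after generation, means restricted text never enters the prompt in the first place.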

Source and verify

Adding internal data to a generative AI tool that Lamarre describes as ‘a copilot for consultants,’ which can be calibrated to use public or McKinsey data, produced good answers, but the company was still concerned they might be fabricated. “We can’t be in the business of being wrong,” he says. To avoid that, the tool cites the internal reference an answer is based on, and the consultant using it is responsible for checking its accuracy.

But employees already have that responsibility when doing research online, Karaboutis points out. “You need intellectual curiosity and a healthy level of skepticism as these language models continue to learn and build up,” she says. As a learning exercise for the senior leadership group, her team created a deepfake video of her with a generated voice reading AI-generated text.

Apparently credible internal data can be wrong or just out of date, too, she cautions. “How often do you have policy documents that haven’t been removed from the intranet or the version control isn’t there, and then an LLM finds them and starts saying ‘our maternity policy is this in the UK, and it’s this in the US.’ We need to look at the attribution but also make sure we clean up our data,” she says.
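Both points, attribution and cleaning up stale data, can be illustrated with a small sketch. Everything here is an assumption for illustration: the document paths, dates, and placeholder text are invented, and a real system would take the effective date from document metadata or a records system.

```python
from datetime import date

# Hypothetical intranet index with a superseded and a current policy version.
POLICY_DOCS = [
    {"source": "intranet/hr/maternity-uk-2019.pdf", "topic": "maternity",
     "region": "UK", "effective": date(2019, 4, 1), "text": "superseded policy text"},
    {"source": "intranet/hr/maternity-uk-2023.pdf", "topic": "maternity",
     "region": "UK", "effective": date(2023, 4, 1), "text": "current policy text"},
    {"source": "intranet/hr/maternity-us-2022.pdf", "topic": "maternity",
     "region": "US", "effective": date(2022, 1, 1), "text": "current US policy text"},
]


def answer_with_source(docs, topic, region):
    """Pick the newest matching document and return its text with a citation."""
    matches = [d for d in docs if d["topic"] == topic and d["region"] == region]
    if not matches:
        return None  # better to return nothing than to guess
    latest = max(matches, key=lambda d: d["effective"])
    return {"answer": latest["text"], "source": latest["source"]}


result = answer_with_source(POLICY_DOCS, "maternity", "UK")
```

Returning the source alongside the answer lets the person reading it check the underlying document, which is the human verification step described above.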

Responsibly adopting generative AI mirrors lessons learned with low code, like knowing what data and applications are connecting into these services: it’s about enhancing workflow, accelerating things people already do, and unlocking new capabilities, with the scale of automation, but still having human experts in the loop.

Shapers can differentiate

“We believe generative AI is beneficial because it has a much wider range of use and flexibility in response than conventional tools and service, so it’s more about how you utilize the tool to create competitive advantage rather than just the fact of using it,” Mukaino says.

Reinventing customer support, retail, manufacturing, logistics, or industry-specific workloads like wealth management with generative AI will take a lot of work, as will setting usage policies and monitoring the impact of the technology on workflows and outcomes. Budgeting for those resources and timescales is essential, too. It comes down to whether you can build and rebuild faster than competitors that are buying in models and tools that let them create applications straight away and let more people in their organization experiment with what generative AI can do.

General LLMs from OpenAI, and the more specialized LLMs built on top of their work like GitHub Copilot, improve as large numbers of people use them: the code generated by GitHub Copilot has become significantly more accurate since it was introduced last year. You could spend half a million dollars and get a model that only matches the previous generation of commercial models, and while benchmarking isn’t always a reliable guide, commercial models continue to show better results on benchmarks than open source models.

Be prepared to revisit decisions about building or buying as the technology evolves, Lamarre warns. “The question comes down to, ‘How much can I competitively differentiate if I build versus if I buy,’ and I think that boundary is going to change over time,” he says.

If you’ve invested a lot of time and resources in building your own generative models, it’s important to benchmark not just how they contribute to your organization but also how they compare with the commercially available models your competition could adopt today, paying 10 to 15 cents for around a page of generated text, rather than with what was available when you started your project.
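As a rough back-of-the-envelope illustration, the figures quoted in this article (a half-million-dollar build and 10 to 15 cents per generated page) imply a break-even volume. This sketch deliberately ignores hosting, staffing, and retraining costs, so it is an assumption-laden lower bound rather than a real cost model.

```python
# Figures taken from the article; amounts are in cents to keep arithmetic exact.
BUILD_COST_CENTS = 500_000 * 100       # "half a million dollars"
PRICE_PER_PAGE_CENTS = (10, 15)        # "10 to 15 cents for around a page"


def break_even_pages(build_cost_cents: int, price_per_page_cents: int) -> int:
    """Whole pages you could buy commercially for the cost of building."""
    return build_cost_cents // price_per_page_cents


pages_at_low_price = break_even_pages(BUILD_COST_CENTS, PRICE_PER_PAGE_CENTS[0])
pages_at_high_price = break_even_pages(BUILD_COST_CENTS, PRICE_PER_PAGE_CENTS[1])

print(f"Break-even at 10 cents/page: {pages_at_low_price:,} pages")
print(f"Break-even at 15 cents/page: {pages_at_high_price:,} pages")
```

On these assumptions, the build budget alone would pay for somewhere between roughly 3.3 million and 5 million pages of commercially generated text.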

Major investments

“The build conversation is going to be reserved for people who probably already have a lot of expertise in building and designing large language models,” Montgomery says, noting that Meta builds its LLMs on Azure, while Anthropic, Cohere, and Midjourney use Google Cloud infrastructure to train their LLMs.

Some organizations do have the resources and competencies for this, and those that need a more specialized LLM for a domain may make the significant investments required to exceed the already reasonable performance of general models like GPT-4.

Training your own version of an open source LLM will need extremely large data sets: while you can acquire these from somewhere like Hugging Face, you’re still relying on someone else having curated them. You’ll also need data pipelines to clean, deduplicate, preprocess, and tokenize the data, significant infrastructure for training, supervised fine-tuning, evaluation, and deployment, and the deep expertise to make the right choices at every step.
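The clean, deduplicate, and tokenize stages can be sketched in a few lines of Python to show what such a pipeline does, though not at what scale. This is a simplified illustration: it uses exact-match deduplication and a regex word tokenizer, whereas real training pipelines typically use near-duplicate detection and subword tokenizers. The function names and sample strings are hypothetical.

```python
import hashlib
import re


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(texts):
    """Drop exact duplicates (case-insensitive), keeping first occurrences."""
    seen = set()
    unique = []
    for text in texts:
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def tokenize(text: str):
    """Split lowercased text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


raw = ["  Hello,   world! ", "hello, world!", "Training data  needs cleaning."]
cleaned = [clean(t) for t in raw]
corpus = [tokenize(t) for t in deduplicate(cleaned)]
```

Even in this toy form, the order matters: cleaning first lets the deduplication step recognize the first two strings as the same document.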

There are multiple collections with hundreds of pre-trained LLMs and other foundation models you can start with. Some are general, others more targeted. Generative AI startup Docugami, for instance, began training its own LLM five years ago, specifically to generate the XML semantic model for business documents, marking up elements like tables, lists and paragraphs rather than the phrases and sentences most LLMs work with. Based on that experience, Docugami CEO Jean Paoli suggests that specialized LLMs are going to outperform bigger or more expensive LLMs created for another purpose.

“In the last two months, people have started to understand that LLMs, open source or not, could have different characteristics, that you can even have smaller ones that work better for specific scenarios,” he says. But he adds most organizations won’t create their own LLM and maybe not even their own version of an LLM.

Only a few companies will own large language models calibrated on the scale of the knowledge and purpose of the internet, adds Lamarre. “I think the ones that you calibrate within your four walls will be much smaller in size,” he says.

If they do decide to go down that route, CIOs will need to think about what kind of LLM best suits their scenarios, and with so many to choose from, a tool like Aviary can help. Consider the provenance of the model and the data it was trained on. These are similar to the questions that organizations have learned to ask about open source projects and components, Montgomery points out. “All the learnings that came from the open source revolution are happening in AI, and they’re happening much quicker.”

IDC’s AI Infrastructure View benchmark shows that getting the AI stack right is one of the most important decisions organizations should take, with inadequate systems the most common reason AI projects fail. It took more than 4,000 NVIDIA A100 GPUs to train Microsoft’s Megatron-Turing NLG 530B model. While there are tools to make training more efficient, they still require significant expertise—and the costs of even fine-tuning are high enough that you need strong AI engineering skills to keep costs down.

Docugami’s Paoli expects most organizations will buy a generative AI model rather than build, whether that means adopting an open source model or paying for a commercial service. “The building is going to be more about putting together things that already exist.” That includes using these emerging stacks to significantly simplify assembling a solution from a mix of open source and commercial options.

So whether you buy or build the underlying AI, the tools adopted or created with generative AI should be treated as products, with all the usual user training and acceptance testing to make sure they can be used effectively. And be realistic about what they can deliver, Paoli warns.

“CIOs need to understand they’re not going to buy one LLM that’s going to change everything or do a digital transformation for them,” he says.


