VMware, Nvidia team on enterprise-grade AI platform


Companies trying to deploy generative AI today face a major problem. If they use a commercial platform like OpenAI’s, they have to send data to the cloud, which may run afoul of compliance requirements and can get expensive. If they download and run a model like Llama 2 locally, they need to know a lot about how to fine-tune it, how to set up vector databases to feed it live data, and how to operationalize it.

VMware’s new partnership with Nvidia aims to solve these issues by offering a fully integrated, ready-to-go generative AI platform that companies can run on premises, in colocation facilities, or in private clouds. The platform will include Llama 2 or a choice of other large language models, as well as a vector database to feed up-to-date company information to the LLM.

The product, VMware Private AI Foundation with Nvidia, will feature generative AI software and accelerated computing from Nvidia, and it will be built on VMware Cloud Foundation and optimized for AI.

The need for a platform like this is acute. According to Lucidworks’ global generative AI benchmark study released this month, 96% of executives and managers involved in AI decision processes are actively prioritizing generative AI investments, and 93% of companies plan to increase their AI spend in the coming year.

But risk management is a serious concern. The uncertain and evolving regulatory landscape significantly impacts generative AI investment decisions, said 77% of CEOs polled for a recent KPMG survey. Prioritizing effective risk management has increased across the board over the past few months, KPMG reported, with protecting personal data and privacy concerns leading the priority list at 63%, followed by cybersecurity at 62%.

Running large language models on premises, or within other enterprise-controlled environments, can significantly alleviate many of these concerns.

“Having the option to run a model locally can open many doors for companies that were simply prohibited from using publicly hosted models, even if they were hosted in a virtual public cloud,” says Bradley Shimmin, chief analyst for AI platforms, analytics, and data management at research firm Omdia.

This is particularly important for heavily regulated sectors like finance, he says, or for government use cases. Local LLMs can also address data residency concerns.

“Having the ability to have state-of-the-art models that you can run completely in air-gapped systems is pretty compelling,” Shimmin says. “It’s all about bringing the model to the data. Data gravity is driving the entire industry.”

If the locally run models are also free and open source, then companies stand to save quite a bit of money by not having to pay for OpenAI API calls. “Latency is lower, cost is lower, and you have more control over it,” says Manish Goyal, global AI and analytics leader at IBM Consulting.

VMware’s new offering is positioned to catch the wave.

And this week at the VMware Explore 2023 conference, Nvidia and VMware are demonstrating how enterprises can use their tools to download free, open-source LLMs, customize them, and deploy production-grade generative AI in VMware environments.

The catch? VMware Private AI Foundation won’t be available until early next year.

How VMware Private AI Foundation works

“We believe companies will bring more of their gen AI workloads to their data, rather than moving their data to the public cloud services,” says Paul Turner, vice president of product management for vSphere and cloud platform at VMware.

Enterprises can take models like Meta’s Llama 2, place the models in their data centers next to their data, optimize and fine-tune them, and create new business offerings, he says. “It helps build business differentiators for companies.”

When companies try to do this on their own, however, it can be difficult to integrate all the hardware and software components with all the necessary applications and toolkits. “We want to make it simple for our customers,” Turner says.

VMware Private AI Foundation is the complete stack, he says. It starts with a foundational model: Meta’s Llama 2, Falcon, or models from Nvidia’s own NeMo framework. Building on top of existing models is more efficient than building new foundational models from scratch, he says.

After the models are fine-tuned, they need a way to get up-to-date information without retraining. This typically comes in the form of vector databases. The VMware Private AI Foundation has a vector database built in: PostgreSQL with the PGVector extension.

“The vector database is very useful if they have fast-moving information,” says Turner. “It’s part of building a complete solution.”
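That pattern, retrieving fresh context at query time instead of retraining, is straightforward to sketch. The following is a minimal illustration, assuming a PostgreSQL database with the PGVector extension and an open-source embedding model; the connection string, table name, and vector size are hypothetical, not details of VMware’s product.

    # Minimal retrieval sketch against PostgreSQL + PGVector.
    # All names here (kb, docs, the embedding model) are illustrative.
    import psycopg
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

    with psycopg.connect("dbname=kb user=app") as conn, conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
        cur.execute("""CREATE TABLE IF NOT EXISTS docs (
                           id bigserial PRIMARY KEY,
                           body text,
                           embedding vector(384))""")
        # Embed the question, then rank stored documents by cosine
        # distance (PGVector's <=> operator) and keep the closest three.
        q = encoder.encode("What changed in this quarter's pricing?")
        vec = "[" + ",".join(str(x) for x in q) + "]"
        cur.execute(
            "SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 3",
            (vec,),
        )
        context = [row[0] for row in cur.fetchall()]
    # The retrieved passages are then prepended to the LLM prompt.

Because the lookup happens at query time, the database can be refreshed continuously without touching the model itself.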

In addition, VMware has done the heavy lifting on performance optimization.

“Models don’t just fit in a single GPU,” Turner says. “They need two GPUs, possibly four. Sometimes you want to spread to eight to get the performance you need, and we can scale it up to 16 GPUs.”
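In practice, spreading a model that way means sharding its layers across the visible GPUs. Here is a hedged sketch of the general technique using the Hugging Face transformers and accelerate libraries, not VMware’s own tooling; note that the 70B Llama 2 checkpoint is gated and requires accepting Meta’s license first.

    # Sketch: shard a large model across every visible GPU.
    # Requires the transformers and accelerate packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-70b-hf"  # gated checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # halves per-GPU memory vs. float32
        device_map="auto",          # distributes layers across the GPUs
    )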

Storage is also optimized, Turner adds. There’s a direct path from the GPU to the storage, bypassing the CPU. Dell, HPE, and Lenovo are already signed up as partners to deliver the rest of the stack.

“It will be a single SKU product from VMware,” says Turner, “but will also ship from these vendors as pre-integrated, ready-to-go systems. We give customers that choice.”

VMware Private AI Foundation will also be available through VMware’s OEM channels and distributors, as well as through more than 2,000 managed service provider (MSP) partners.

Nvidia’s AI products will also be available through a broad system of partners, says Justin Boitano, vice president of enterprise computing at Nvidia. “We have over 20 global OEMs and ODMs.”

The pricing will be based on GPUs, says VMware’s Turner. “We want to tie it to the value for the customers.” However, he declined to give more details. “We are not ready to share the pricing on this.”

If customers don’t want to wait until next year, reference architectures are already available. “Customers can roll their own,” Turner says. “But the fully integrated single suite product will be early 2024.”

Fine-tuning LLMs

According to Nvidia’s Boitano, generative AI is the most transformational technology of our lifetimes.

“These models are amazing,” he says. “They provide a natural language interface to a company’s business systems. The power is phenomenal. We see AI being infused into every business in the next decade.”

The problem is that off-the-shelf models only know the data they were trained on. If they know anything about a specific company, it’s only the public information available on the Web when they were trained.

Plus, foundation models like the ones behind ChatGPT are trained on everything. They can write poetry and code and help plan meals, but they are often not very good at the specific tasks a company might want them to do. “You have to customize models against your private business information,” Boitano says. “That’s where the true business value is unlocked.”

This could be a company’s call center records, or IT tickets. “But you don’t want to give this data to a model that takes it and encodes it into a public thing,” he says.

That’s where open-source models like Llama 2 come in, he says. “You can pull in those models and easily combine them with your proprietary information, so that the model has a nuanced understanding of what you need.”

VMware Private AI Foundation comes with pre-packaged models, Boitano says, training frameworks, and an AI workbench. “This makes it easy to start on your laptop or PC but provides an easy path to move into the data center, where the bulk of computing and inference work will happen,” he says.

Fine-tuning a 40-billion-parameter model can take as little as eight hours on eight GPUs. Then the vector database is plugged in, so that the AI has access to current information from across the enterprise. “We think all this unlocks previously impossible-to-solve problems,” Boitano says.
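One reason fine-tuning can be that fast is that it rarely updates every weight. A widely used shortcut is low-rank adaptation (LoRA); the sketch below uses the Hugging Face peft library, and the base model, rank, and target modules are illustrative assumptions rather than Nvidia’s actual recipe.

    # LoRA sketch with peft: train a small set of adapter weights
    # instead of the full model. Hyperparameters are illustrative.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    config = LoraConfig(
        r=16,                                 # rank of the low-rank update
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],  # attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # typically well under 1% of weights
    # From here, a standard Trainer loop runs over the company's own text.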

The platform will support the A100 AI chip, first introduced in 2020; the H100, released in 2022; and the new L40S when it ships next year, says Boitano.

The L40S will offer 1.2 times the generative AI inference performance and 1.7 times the training performance of the A100, he says.

“A lot of partners are excited about L40S because it’s not just for generative AI but can do virtual desktops and rendering as well,” he says.

What is Llama 2 from Meta?

The VMware Private AI Foundation will be able to run a variety of generative AI models, but the one mentioned most frequently for enterprise deployments these days is Llama 2.

Meta released Llama 2 in July. It’s free for commercial use and open source, kind of: companies with more than 700 million monthly active users must apply to Meta for a separate license.

Today, nearly all of the large language models at the top of the Hugging Face Open LLM Leaderboard are variants of Llama 2. Previously, open-source foundational models were limited in usability; many were based on Llama 2’s precursor, Llama, which was licensed for non-commercial use only.

“Now we have a commercially licensable open source-ish model that you don’t have to pay for,” says Juan Orlandini, CTO, North America at Insight, a Chandler, Ariz.-based solution integrator. “The genie is out of the bottle.”

Companies can download these models, fine-tune them by doing additional training on their own data, and give them access to real-time data via embeddings, he says.

Llama 2 comes in three sizes, with 7 billion, 13 billion, and 70 billion parameters, allowing companies to balance performance against hardware requirements. “You can actually take that and turn it into something that can run on relatively low-powered devices,” he says.
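Quantization is one common route to those low-powered devices: shrinking the smallest variant’s weights to 4-bit precision so the model fits in a few gigabytes of memory. Here is a sketch using transformers with bitsandbytes, offered as an illustration rather than Insight’s actual approach.

    # Sketch: load the 7B Llama 2 variant with 4-bit weights.
    # Assumes a CUDA GPU and the bitsandbytes package.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant = BitsAndBytesConfig(load_in_4bit=True)
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",
        quantization_config=quant,
        device_map="auto",
    )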

Private LLMs are increasingly the direction organizations are taking, says John Carey, managing director of the Technology Solutions group at global consulting firm AArete.

The biggest advantage is that they allow enterprises to bring the AI to their data, rather than the other way around.

“They need to secure their data, they need to make sure that their data has access controls and all the standard data governance, but they want ChatGPT-like functionality,” says Carey. “But there are real concerns about ChatGPT or Bard or whatever, especially for proprietary data – or healthcare data, or contract data.”

VMware isn’t the only vendor offering a platform that supports Llama 2.

“AWS has their Titan family of models, but they’ve also recently partnered with Meta to host the Llama models next to that,” says Omdia’s Shimmin.

Microsoft has also announced support for Llama 2 on Azure, and it is already available in the Azure Machine Learning model catalog.

“I would imagine, given the way Google has architected their tools, that they would also be able to host and work with third-party models, both closed and open source,” says Shimmin.

IBM plans to make Llama 2 available within its Watsonx AI and Data Platform.

Copyright © 2023 IDG Communications, Inc.