AI/ML Brain Food – Part 1: Where to Start?
Are you ready to start leveraging Artificial Intelligence and Machine Learning (AI/ML) in your business? Maybe you want to get ready but aren’t sure where to start?
Whether you are just starting to investigate the potential benefits of AI/ML for your business, or you are already advanced and running deep learning algorithms on neural networks, I think most of us can agree that AI/ML is the future of technology. We are going through an industry transformation, and that future is coming quickly, whether we’re ready or not.
I’m keeping this article at a high level and generic to AI/ML, so it isn’t going to tell you how to train a reinforcement learning model. The idea is to give some food for thought on the challenges and potential pitfalls of running resource-hungry AI/ML workloads in a business.
Food for thought… that’s actually a nice way of describing this article. We often talk about eating “brain food” to help us perform better, but when the brain is actually a neural net performing machine learning tasks, what “brain food” do we provide it to run at its best?
We’re going to dive into that topic today, but first, let’s think about what type of AI/ML you might want to develop.
There are 4 main types of machine learning in the context of AI:
- Reinforcement Learning
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
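To make the difference a little more concrete, here is a minimal sketch of two of these types, supervised and unsupervised learning, using scikit-learn on made-up toy data (the library and the numbers are purely illustrative choices on my part):

```python
# Purely illustrative toy data and library choice (scikit-learn); not tied to any VMware tooling.
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X = [[0.1, 1.2], [0.4, 0.9], [3.1, 2.8], [2.9, 3.3]]  # features
y = [0, 0, 1, 1]                                       # labels we already know

# Supervised learning: learn from labelled examples, then predict labels for new data.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[3.0, 3.0]]))

# Unsupervised learning: no labels at all; the algorithm finds structure (clusters) on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```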
I’d like to expand on this in another article, so watch this space, but no matter which of these ML types you choose to work with, the main things to think about in the planning stage of an AI project are the same:
The Reward
What would you like to ask the AI/ML for? For example, “How can I improve my sales forecast?” It’s important to pick an achievable outcome; this is the entire reason for the AI/ML to exist. The output in this example could then be a continually updated, accurate forecast based on all of your existing data and pipeline.
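To make that concrete, a first iteration of this “reward” could be something as modest as a baseline trend model over historical monthly sales, which you then re-run and improve as new data arrives. Here is a hypothetical sketch with invented numbers and column names, purely to illustrate starting with an achievable outcome:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented historical sales data; in reality this would come from your CRM and pipeline.
sales = pd.DataFrame({
    "month_index": range(1, 13),
    "revenue": [90, 95, 100, 98, 110, 115, 120, 118, 125, 130, 128, 140],
})

# A deliberately simple baseline: a straight-line trend over time.
model = LinearRegression().fit(sales[["month_index"]], sales["revenue"])

# The "reward": a forecast for the next three months that can be refreshed as new data lands.
future = pd.DataFrame({"month_index": [13, 14, 15]})
print(model.predict(future))
```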
The Data
Do you have enough data to be useful? You will need a lot of it. What data is available to add context for the AI/ML? Broadly speaking, the bigger the maze you give the algorithm to explore, the more it can learn and the more accurate its predictions or output are likely to be.
The Quality
Structured, semi-structured or unstructured: the type of data, its accuracy and its quality will make or break your model. Get the house in order as much as possible first.
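Getting the house in order can start with something as unglamorous as profiling what you already have. Here is a minimal sketch of the kind of quality checks worth running before any training, using pandas and an invented file name purely as an example:

```python
import pandas as pd

# Hypothetical export of structured data; the file name and columns are examples only.
df = pd.read_csv("sales_history.csv")

print(df.shape)    # how much data do we actually have?
print(df.dtypes)   # are numbers really numbers, and dates really dates?
print(df.isna().mean().sort_values(ascending=False))  # share of missing values per column
print(df.duplicated().sum())                          # exact duplicate rows
```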
Staying with the brain food analogy, next we need to understand what type of food the AI/ML needs to perform at its best. Is your existing compute platform lacking something? Does it need a vitamin kick?
In this analogy, the nutrition required by the AI/ML is almost always… POWER!
If there’s one thing we all know about artificial intelligence, it’s that it requires a lot of computation. The vast complexity intrinsic to things like neural nets was previously only practical in theory, but we are now in the right time and place: decades of growth in compute power, roughly in line with Moore’s law, have put AI/ML within our grasp. That increase in compute power has truly opened the doors for AI/ML, but does that mean we all need to build supercomputers in our data centers? Not necessarily.
At a restaurant? From a supermarket? Food grown in an allotment? Just as there are many ways to find food, with pros and cons to each, there are many options available for leveraging AI/ML as a business. Should we keep it on-prem? Leverage the cloud? Have it hosted? Use an appliance? Push it to the edge? The list goes on.
Depending on the model you wish to design, it pays to think hard about which components of the AI/ML workload you would like to run where. Remember, there is no need to limit it to one location.
Here’s an example, using self-driving cars:
- Yes! – To program a self-driving car with the knowledge of how to avoid pedestrians and other cars, the model should probably be trained in a cloud environment at a massive scale.
- No! – The car should not need to be constantly connected to the cloud to make decisions about where to drive.
With the vRealize AI project at VMware, the team chose a similar route, where the model is trained in the cloud, then connected to the on-premises datacentre to tweak the storage config through vRealize Operations software endpoints.
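As a rough, generic sketch of that “train centrally, run locally” pattern (this is not how vRealize AI or any self-driving system is actually implemented, just the shape of the idea), the expensive training step produces an artefact that can later be loaded and used for predictions with no connection back to the cloud:

```python
from sklearn.ensemble import RandomForestClassifier
import joblib

# --- In the cloud: train at scale on labelled data (toy data here, for illustration only) ---
X_train = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.3], [0.95, 0.7]]
y_train = [0, 1, 0, 1]
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
joblib.dump(model, "model.joblib")   # the trained artefact to ship out

# --- At the edge or on premises: load the artefact and predict with no cloud connection ---
edge_model = joblib.load("model.joblib")
print(edge_model.predict([[0.85, 0.75]]))
```

The heavy lifting happens once, centrally; the artefact that comes out of it is comparatively small and cheap to run anywhere.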
Eating Out – (Outsourcing)
There are now many supercomputer vendors that let you consume their platform for a period of time. That can be a great choice for short-term projects but potentially costly over the longer term, so it’s about getting the right balance. There are also a number of ML platforms available to leverage on the Mega Clouds, so if you have models you would like to run at scale and already have the network connectivity ready to go, this could be a great way to hit the ground running and have something working straight away. I actively encourage everyone to explore the AI/ML content the Mega Clouds have already made available in the public domain. This could be the most appropriate solution, assuming you are leveraging public cloud already and there are no compliance or security risks.
Cooking at Home – (Keeping it in house)
Clearly, most businesses don’t have supercomputers or racks of GPUs to offload the kind of compute power some ML workloads require. People such as Na Zhang, in the Office of the CTO at VMware, are spending time evaluating different ways to run deep learning models in house on vSphere with the best reliability and performance. Years ago at VMware, we showed that containers could run faster on vSphere than on bare metal or anywhere else, and the industry is now starting to see the benefits of that work with the Tanzu portfolio.
Na Zhang’s article “Accelerating Machine Learning Inference on CPU with VMware vSphere and Neural Magic” explores different combinations of software and hardware acceleration for deep learning workloads on vSphere, and some significant performance improvements can be seen. It’s a really interesting area, with much more to come!
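I won’t reproduce Na’s setup here, and the sketch below is not the Neural Magic or vSphere configuration from that article; it is just a generic illustration of what CPU-only inference looks like in code, using ONNX Runtime and a hypothetical exported model file:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical model file exported from a training framework; the name is an example only.
session = ort.InferenceSession("image_classifier.onnx", providers=["CPUExecutionProvider"])

# Run a single inference on CPU with a dummy input (assumed here to be a 224x224 RGB image).
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```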
So, running AI/ML workloads on premises is completely achievable, and it could be the way to go for some or all of the workload. Either way, it’s time to prepare your house first, as your existing platform might not cut it…
Thanks for reading today, and I look forward to you joining me for the second part of this article – AI/ML Brain Food – Part 2: What’s on the Menu, coming soon…