Google Cloud Run now supports AI inference on Nvidia GPUs

The combination of GPU support and the service's serverless nature should, according to experts, benefit enterprises running AI workloads: with Cloud Run, they neither need to buy and host compute hardware on-premises nor pay the comparatively higher cost of keeping a typical cloud instance running.

“When your app is not in use, the service automatically scales down to zero so that you are not charged for it,” Google wrote in a blog post.
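Teams that want to confirm this behavior programmatically can inspect a service's scaling settings with the Cloud Run admin API. The sketch below is illustrative only; it assumes the google-cloud-run Python client library is installed, and the project, region, and service name are hypothetical placeholders.

```python
# Minimal sketch using the google-cloud-run admin client; the project,
# region, and service name below are hypothetical placeholders.
from google.cloud import run_v2  # pip install google-cloud-run

client = run_v2.ServicesClient()
name = client.service_path("my-project", "us-central1", "gemma-chat")
service = client.get_service(name=name)

# min_instance_count == 0 means the service scales to zero between
# requests, so idle GPU time is not billed.
print("min instances:", service.template.scaling.min_instance_count)
print("max instances:", service.template.scaling.max_instance_count)
```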

The company claims that the new feature opens up new use cases for developers, including performing real-time inference with lightweight open models, such as Google's Gemma (2B/7B) models or Meta's Llama 3 (8B), to build custom chatbots or summarize documents on the fly, all while scaling to handle spiky user traffic.
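As a concrete illustration, such a service might expose one of these models behind an HTTP endpoint. The sketch below assumes a Cloud Run service running an Ollama container that serves Gemma; the service URL and model tag are hypothetical placeholders, not details from Google's announcement.

```python
# Calls a hypothetical Cloud Run service that fronts Gemma via Ollama's
# standard /api/generate endpoint.
import requests

SERVICE_URL = "https://gemma-chat-xyz-uc.a.run.app"  # hypothetical URL

resp = requests.post(
    f"{SERVICE_URL}/api/generate",
    json={
        "model": "gemma:7b",   # one of the open Gemma models mentioned above
        "prompt": "Summarize this document in three sentences: ...",
        "stream": False,       # ask for a single JSON response, not a stream
    },
    timeout=300,  # generous timeout to absorb a cold start
)
resp.raise_for_status()
print(resp.json()["response"])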

Another use case is serving custom fine-tuned gen AI models, such as image generation tailored to your company’s brand, and scaling down to optimize costs when nobody’s using them.

Additionally, Google said that the service can be used to speed up compute-intensive Cloud Run services, such as on-demand image recognition, video transcoding and streaming, and 3D rendering.

But are there caveats?

To begin with, enterprises may worry about cold start, a phenomenon common to serverless services. Cold start refers to the delay a service incurs while it loads and initializes before it can handle its first request.
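One rough way to gauge the impact is to time the first request after a service has been idle long enough to scale to zero, then time a follow-up request against the now-warm instance. The sketch below does exactly that; the service URL is a hypothetical placeholder.

```python
# Times a "cold" request (first hit after scale-to-zero) against a "warm"
# one; the difference approximates cold-start latency.
import time
import requests

SERVICE_URL = "https://gemma-chat-xyz-uc.a.run.app"  # hypothetical URL

for label in ("cold", "warm"):
    start = time.perf_counter()
    requests.get(SERVICE_URL, timeout=300)
    print(f"{label} request took {time.perf_counter() - start:.1f}s")
```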
