Startup gives IT control of GPU pools to maximize their use


Among the greatest component shortages bedeviling everyone is that of GPUs, from both Nvidia and AMD. GPUs are used in Bitcoin mining, and with massive mining farms around the world gobbling up every GPU card, getting one is nigh impossible or prohibitively expensive.

So customers need to squeeze every last cycle out of the GPUs they have in service. An Israeli company called Run:AI claims it has a fix with a pair of technologies that pool GPU resources and maximize their use.

The technologies are called Thin GPU Provisioning and Job Swapping. Not the most creative of names, but they describe what the two do in tandem: automate the allocation and utilization of GPUs.

Data scientists and other AI researchers often receive an allocation of GPUs, with the GPUs reserved for individuals to run their processes and no one else’s. That’s how high-performance computing (HPC) systems and supercomputers operate, and getting processor allocation just right is something of a black art for administrators.

With Thin GPU Provisioning and Job Swapping, whenever a running workload is not utilizing its allocated GPUs, those resources are pooled and can be automatically provisioned for use by a different workload. It’s similar to the thin provisioning first introduced by VMware for storage-area networks, where available storage disk space is allocated but not provisioned until necessary, according to a statement by Run:AI.

Thin GPU Provisioning creates over-provisioned GPUs, while Job Swapping uses preset priorities to reassign unused GPU capacity. Together, Run:AI says, the two technologies maximize overall GPU utilization.
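To make the mechanics concrete, here is a minimal Python sketch of the idea as described above. It is not Run:AI's implementation, whose internals the company's statement does not detail. The `Job` and `GpuPool` names, the rule that a swapped-out job gives up all of its GPUs, and the convention that a higher number means higher priority are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int    # assumed convention: higher number = more important
    reserved: int    # GPUs promised to the job
    in_use: int = 0  # GPUs the job actually holds right now


class GpuPool:
    """Toy pool in which reservations may exceed the physical GPU count."""

    def __init__(self, physical_gpus: int):
        self.physical_gpus = physical_gpus
        self.jobs = {}

    def admit(self, job: Job) -> None:
        # Thin provisioning: accept the reservation without setting aside
        # hardware, so total reservations can exceed physical_gpus.
        self.jobs[job.name] = job

    def free_gpus(self) -> int:
        return self.physical_gpus - sum(j.in_use for j in self.jobs.values())

    def request(self, name: str, count: int) -> list:
        """Give job `name` up to `count` more GPUs.

        Returns the names of lower-priority jobs that were swapped out.
        """
        job = self.jobs[name]
        count = min(count, job.reserved - job.in_use)
        swapped = []
        # Job swapping: reclaim GPUs from the lowest-priority jobs first.
        victims = sorted(
            (j for j in self.jobs.values()
             if j.priority < job.priority and j.in_use > 0),
            key=lambda j: j.priority,
        )
        for victim in victims:
            if self.free_gpus() >= count:
                break
            victim.in_use = 0  # hypothetical: suspend the whole job
            swapped.append(victim.name)
        job.in_use += min(count, self.free_gpus())
        return swapped


# Four physical GPUs, eight reserved: the pool is over-provisioned.
pool = GpuPool(physical_gpus=4)
pool.admit(Job("training", priority=2, reserved=4))
pool.admit(Job("notebook", priority=1, reserved=4))

pool.request("notebook", 3)            # idle capacity goes to the notebook
swapped = pool.request("training", 4)  # higher priority reclaims the GPUs
```

In this toy run, the low-priority notebook borrows three idle GPUs and is then swapped out when the higher-priority training job asks for its full reservation. A real scheduler would also have to checkpoint and later resume the suspended job, which this sketch leaves out.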

Data scientists, whose specialties aren’t always technical, don’t have to deal with scheduling and provisioning. At the same time, IT departments have control over GPU utilization across their networks, the company says.

“Researchers are no longer able to ‘hug’ GPUs—making them unavailable for use by others,” said Dr. Ronen Dar, CTO and co-founder of Run:AI in a statement. “They simply run their jobs and Run:AI’s quota management, Thin GPU Provisioning and Job Swapping features seamlessly allocate resources efficiently without any user intervention.”

Thin GPU Provisioning and Job Swapping are currently in testing in Run:AI customer labs. They are expected to be generally available in Q4 2021.

Run:AI was founded in 2018 and has $43 million in funding.


Copyright © 2021 IDG Communications, Inc.


