Startup gives IT control of GPU pools to maximize their use

Among the greatest component shortages bedeviling everyone is that of GPUs, both from Nvidia and AMD. GPUs are used in cryptocurrency mining, and with massive mining farms around the world gobbling up every GPU card, getting one is nigh impossible or prohibitively expensive.

So customers need to squeeze every last cycle out of the GPUs they have in service. An Israeli company called Run:AI claims it has a fix with a pair of technologies that pool GPU resources and maximize their use.

The technologies are called Thin GPU Provisioning and Job Swapping. Not the most creative of names, but they describe what the two do in tandem to automate the allocation and utilization of GPUs.

Data scientists and other AI researchers often receive an allocation of GPUs, with the GPUs reserved for individuals to run their processes and no one else’s. That’s how high performance computing (HPC) and supercomputers operate, and getting processor allocation just right is something of a black art for administrators.

With Thin GPU Provisioning and Job Swapping, whenever a running workload is not utilizing its allocated GPUs, those resources are pooled and can be automatically provisioned for use by a different workload. According to a statement by Run:AI, it's similar to the thin provisioning first introduced by VMware for storage-area networks, where disk space is allocated but not provisioned until necessary.

Thin GPU Provisioning creates over-provisioned GPUs, while Job Swapping uses preset priorities to reassign unused GPU capacity. Together, Run:AI says, the two technologies maximize overall GPU utilization.
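To make the interplay concrete, here is a toy model in Python. It is only a sketch of the general idea and not Run:AI's implementation, which the company has not described in detail. The `Job` and `GpuPool` classes, their method names, and the "higher number wins" priority convention are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int    # assumed convention: higher number = more important
    allocated: int   # GPUs promised to the job (its quota)
    active: int = 0  # GPUs the job is actually using right now


class GpuPool:
    """Toy shared GPU pool; every name here is hypothetical."""

    def __init__(self, physical_gpus):
        self.physical_gpus = physical_gpus
        self.jobs = {}  # job name -> Job

    def admit(self, job):
        # Thin provisioning: an allocation is only a promise, so the sum
        # of allocations may exceed the number of physical GPUs.
        self.jobs[job.name] = job

    def free_gpus(self):
        return self.physical_gpus - sum(j.active for j in self.jobs.values())

    def request(self, name, wanted):
        """Try to give a job `wanted` more GPUs; return how many it got."""
        job = self.jobs[name]
        wanted = min(wanted, job.allocated - job.active)
        # Job swapping: if the pool is short, take GPUs back from
        # lower-priority jobs, starting with the lowest priority.
        victims = sorted(
            (j for j in self.jobs.values()
             if j.priority < job.priority and j.active > 0),
            key=lambda j: j.priority,
        )
        for victim in victims:
            shortfall = wanted - self.free_gpus()
            if shortfall <= 0:
                break
            victim.active -= min(shortfall, victim.active)
        granted = max(0, min(wanted, self.free_gpus()))
        job.active += granted
        return granted

    def release(self, name, count):
        # GPUs a job stops using go straight back to the shared pool.
        job = self.jobs[name]
        job.active -= min(count, job.active)


pool = GpuPool(physical_gpus=4)
pool.admit(Job("training", priority=2, allocated=4))
pool.admit(Job("notebook", priority=1, allocated=4))  # 8 promised, 4 real
pool.request("notebook", 4)  # the idle pool is lent to the low-priority job
pool.request("training", 3)  # the high-priority job swaps the notebook out
```

In this sketch the notebook job borrows the whole pool while training is idle, then gives GPUs back as soon as the higher-priority job asks for them. A real scheduler would also have to checkpoint or pause the preempted work, which the toy model ignores.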

Data scientists, whose specialties aren’t always technical, don’t have to deal with scheduling and provisioning. At the same time, IT departments have control over GPU utilization across their networks, the company says.

“Researchers are no longer able to ‘hug’ GPUs—making them unavailable for use by others,” said Dr. Ronen Dar, CTO and co-founder of Run:AI in a statement. “They simply run their jobs and Run:AI’s quota management, Thin GPU Provisioning and Job Swapping features seamlessly allocate resources efficiently without any user intervention.”

Thin GPU Provisioning and Job Swapping are currently in testing in Run:AI customer labs. They are expected to be generally available in Q4 2021.

Run:AI was founded in 2018 and has $43 million in funding.
