Attackers Could Eavesdrop on AI Conversations on GPUs
Researchers at cybersecurity research and consulting firm Trail of Bits have discovered a vulnerability that could allow attackers to read GPU local memory on affected Apple, Qualcomm, AMD and Imagination GPUs. In particular, the vulnerability, which the researchers named LeftoverLocals, could expose conversations held with large language models and other machine learning models running on affected GPUs.
Which GPUs are affected by the LeftoverLocals vulnerability, and what has been patched?
Apple, Qualcomm, AMD and Imagination GPUs are affected. All four vendors have released some remediations, as follows:
- Apple has released fixes for the A17 and M3 series processors and for some specific devices, such as the Apple iPad Air 3rd generation (A12); Apple did not provide a complete list of which devices have been secured. As of Jan. 16, the Apple MacBook Air (M2) was vulnerable, according to Trail of Bits. Recent Apple iPhone 15s do not appear to be vulnerable. When asked for more detail by TechRepublic, Apple provided a prewritten statement thanking the researchers for their work.
- AMD plans to release a new mode to fix the problem in March 2024. AMD released a list of affected products.
- Imagination updated drivers and firmware to prevent the vulnerability, which affected DDK Releases up to and including 23.2.
- Qualcomm released a patch for some devices, but it did not provide a complete list of which devices are and are not affected.
How does the LeftoverLocals vulnerability work?
Put simply, a GPU memory region called local memory can be used to pass data between two GPU kernels, even if the two kernels aren’t in the same application or run by the same user. The attacker can use a GPU compute API such as OpenCL, Vulkan or Metal to write a GPU kernel that dumps uninitialized local memory on the target device.
CPUs typically isolate memory between processes in a way that rules out an exploit like this; GPUs sometimes do not.
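To make the mechanism concrete, here is a minimal sketch in Python. It is a toy model, not real GPU code: an actual listener would be a GPU kernel written against an API such as OpenCL, Vulkan or Metal. The names ComputeUnit, victim_kernel and listener_kernel are hypothetical, chosen only for illustration.

```python
# Toy model only: this simulates the idea behind LeftoverLocals in plain Python.
# ComputeUnit, victim_kernel and listener_kernel are hypothetical names.

class ComputeUnit:
    """Simulates one GPU compute unit whose local memory is not cleared between kernels."""

    def __init__(self, local_words=8):
        self.local_memory = [0] * local_words

    def launch(self, kernel, *args):
        # A vulnerable GPU hands the same physical local memory to the next
        # kernel without zeroing it, whoever launched that kernel.
        return kernel(self.local_memory, *args)


def victim_kernel(local_memory, secret_values):
    """Stages intermediate data in local memory, as a matrix multiply might."""
    for i, value in enumerate(secret_values):
        local_memory[i] = value
    return sum(local_memory)  # the victim's legitimate result


def listener_kernel(local_memory):
    """Never writes local memory; it only dumps whatever was left behind."""
    return list(local_memory)


cu = ComputeUnit()
cu.launch(victim_kernel, [11, 22, 33, 44])
leaked = cu.launch(listener_kernel)
print(leaked)  # [11, 22, 33, 44, 0, 0, 0, 0]
```

In this sketch the listener never receives the victim's data directly; it recovers it only because the simulated local memory is reused without being cleared.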
In the case of open-source large language models, a LeftoverLocals listener can “listen” for the linear algebra operations performed by the LLM and identify the model by its weights or memory layout patterns. As the attack continues, the attacker can see the interactive LLM conversation.
The listener can sometimes return incorrect tokens or other errors, such as words semantically similar to the ones the LLM actually produced. In one example, Trail of Bits found their listener extracted the word “Facebook” where the LLM had actually produced a similar named-entity token such as “Google” or “Amazon.”
LeftoverLocals is tracked by NIST as CVE-2023-4969.
How can businesses and developers defend against LeftoverLocals?
Beyond applying the updates from the GPU vendors listed above, options are limited: researchers Tyler Sorensen and Heidy Khlaaf of Trail of Bits warn that mitigating and verifying this vulnerability on individual devices may be difficult.
GPU binaries are typically not stored explicitly, and few analysis tools exist for them. Programmers would need to modify the source code of every GPU kernel that uses local memory. Before a kernel ends, its GPU threads should clear (for example, store zeros to) every local memory location the kernel used, and programmers should check that the compiler doesn’t optimize these memory-clearing instructions away.
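The source-level mitigation can be sketched with a toy Python model. This is an illustration under stated assumptions, not real GPU code: the shared list stands in for local memory on an affected GPU, and every function name is hypothetical. Python has no optimizing compiler that would strip the clearing loop, so that part of the check applies only to real kernels.

```python
# Toy model only; names are hypothetical and this is not real GPU code.

def run_on_shared_local_memory(local_memory, kernel, *args):
    """Stands in for a vulnerable GPU that reuses local memory without clearing it."""
    return kernel(local_memory, *args)


def unpatched_kernel(local_memory, values):
    """Uses local memory as scratch space and leaves the data behind."""
    for i, value in enumerate(values):
        local_memory[i] = value
    return sum(local_memory[:len(values)])


def patched_kernel(local_memory, values):
    """Same work, but clears local memory before the kernel ends."""
    for i, value in enumerate(values):
        local_memory[i] = value
    result = sum(local_memory[:len(values)])
    # Mitigation: store zeros to local memory before returning.
    for i in range(len(local_memory)):
        local_memory[i] = 0
    return result


def listener_kernel(local_memory):
    """Dumps whatever the previous kernel left in local memory."""
    return list(local_memory)


local_memory = [0] * 8  # shared across "kernels", as on an affected GPU

run_on_shared_local_memory(local_memory, unpatched_kernel, [7, 8, 9])
leak_before = run_on_shared_local_memory(local_memory, listener_kernel)

patched_result = run_on_shared_local_memory(local_memory, patched_kernel, [7, 8, 9])
leak_after = run_on_shared_local_memory(local_memory, listener_kernel)

print(leak_before)  # [7, 8, 9, 0, 0, 0, 0, 0]
print(leak_after)   # [0, 0, 0, 0, 0, 0, 0, 0]
```

The patched kernel returns the same result as the unpatched one; the only difference is that nothing is left for a later kernel to read.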
Developers working in machine learning or application owners using ML apps should take special care. “Many parts of the ML development stack have unknown security risks and have not been rigorously reviewed by security experts,” wrote Sorensen and Khlaaf.
Trail of Bits sees this vulnerability as an opportunity for the GPU systems community to harden the GPU system stack and corresponding specifications.