Deepseek's AI model proves easy to jailbreak – and worse

goc/Getty Images

Amidst equal parts elation and controversy over what its performance means for AI, Chinese startup DeepSeek continues to raise security concerns.

On Thursday, Unit 42, a cybersecurity research team at Palo Alto Networks, published results on three jailbreaking methods it employed against several distilled versions of DeepSeek’s V3 and R1 models. According to the report, these efforts “achieved significant bypass rates, with little to no specialized knowledge or expertise being necessary.”

Also: Public DeepSeek AI database exposes API keys and other user data

“Our research findings show that these jailbreak methods can elicit explicit guidance for malicious activities,” the report states. “These activities include keylogger creation, data exfiltration, and even instructions for incendiary devices, demonstrating the tangible security risks posed by this emerging class of attack.”

Researchers were able to prompt DeepSeek for guidance on how to steal and transfer sensitive data, bypass security, write “highly convincing” spear-phishing emails, conduct “sophisticated” social engineering attacks, and make a Molotov cocktail. They were also able to manipulate the models into creating malware.

“While information on creating Molotov cocktails and keyloggers is readily available online, LLMs with insufficient safety restrictions could lower the barrier to entry for malicious actors by compiling and presenting easily usable and actionable output,” the paper adds.

Also: OpenAI launches new o3-mini model – here’s how free ChatGPT users can try it

On Friday, Cisco also released a jailbreaking report for DeepSeek R1. After targeting R1 with 50 HarmBench prompts, researchers found DeepSeek had “a 100% attack success rate, meaning it failed to block a single harmful prompt.” You can see how DeepSeek compares to other top models’ resistance rates below.

“We must understand if DeepSeek and its new paradigm of reasoning has any significant tradeoffs when it comes to safety and security,” the report notes.

Also on Friday, security provider Wallarm released its own jailbreaking report, stating it had gone a step beyond attempting to get DeepSeek to generate harmful content. After testing V3 and R1, the report claims to have revealed DeepSeek’s system prompt, or the underlying instructions that define how a model behaves, as well as its limitations.

Also: Copilot’s powerful new ‘Think Deeper’ feature is free for all users – how it works

The findings reveal “potential vulnerabilities in the model’s security framework,” Wallarm says.

OpenAI has accused DeepSeek of using its models, which are proprietary, to train V3 and R1, thus violating its terms of service. In its report, Wallarm claims to have prompted DeepSeek to reference OpenAI “in its disclosed training lineage,” which — the firm says — indicates “OpenAI’s technology may have played a role in shaping DeepSeek’s knowledge base.”

deepseek-img-2 — Wallarm’s chats with DeepSeek, which mention OpenAI.

Wallarm

“In the case of DeepSeek, one of the most intriguing post-jailbreak discoveries is the ability to extract details about the models used for training and distillation. Normally, such internal information is shielded, preventing users from understanding the proprietary or external datasets leveraged to optimize performance,” the report explains.

“By circumventing standard restrictions, jailbreaks expose how much oversight AI providers maintain over their own systems, revealing not only security vulnerabilities but also potential evidence of cross-model influence in AI training pipelines,” it continues.

Also: Apple researchers reveal the secret sauce behind DeepSeek AI

The prompt Wallarm used to get that response is redacted in the report, “in order not to potentially compromise other vulnerable models,” researchers told ZDNET via email. The company emphasized that this jailbrokem response is not a confirmation of OpenAI’s suspicion that DeepSeek distilled its models.

As 404 Media and others have pointed out, OpenAI’s concern is somewhat ironic, given the discourse around its own public data theft.

Wallarm says it informed DeepSeek of the vulnerability, and that the company has already patched the issue. But just days after a DeepSeek database was found unguarded and available on the internet (and was then swiftly taken down, upon notice), the findings signal potentially significant safety holes in the models that DeepSeek did not red-team out before release. That said, researchers have frequently been able to jailbreak popular US-created models from more established AI giants, including ChatGPT.

Source link

Deepseek's AI model proves easy to jailbreak – and worse

VMWARE

Helping Public Sector Organisations Define Cloud Strategy

How to change the VLAN ID of the Service Console in ESX from the command line/console

Cisco UCS and Vmware Interfaces (Vnics) HA Design Considerations

Troubleshooting network and TCP/UDP port connectivity issues on ESX/ESXi(2020669)

vSphere Client Parameters

Configuration Templates

CUE Licenses

Trouble shooting Unity Express with Call Manager Integeration & Operational Issues

CME Configuration Example: SIP Trunks to Viatalk and VoIP.ms

SIP Phone registration – CME Configuration

CUE Voicemail + VPIM networking (CUE to unity)

Related Post

VMWARE

Configuration Templates