Prompt Injection Tricks AI Into Downloading and Executing Malware
[wunderwuzzi] demonstrates a proof of concept in which a service that enables an AI to control a virtual computer (in this case, Anthropic’s Claude Computer Use) is made to download …read more
![Prompt Injection Tricks AI Into Downloading and Executing Malware](https://hackaday.com/wp-content/uploads/2021/08/GithubCopilot.jpg?#)
![](https://hackaday.com/wp-content/uploads/2021/08/GithubCopilot.jpg?w=800)
[wunderwuzzi] demonstrates a proof of concept in which a service that enables an AI to control a virtual computer (in this case, Anthropic’s Claude Computer Use) is made to download and execute a piece of malware that successfully connects to a command and control (C2) server. [wonderwuzzi] makes the reasonable case that such a system has therefore become a “ZombAI”. Here’s how it worked.
After setting up a web page with a download link to the malicious binary, [wunderwuzzi] attempts to get Claude to download and run the malware. At first, Claude doesn’t bite. But that all changes when the content of the HTML page gets rewritten with instructions to download and execute the “Support Tool”. That new content gets interpreted as orders to follow; being essentially a form of prompt injection.
Claude dutifully downloads the malicious binary, then autonomously (and cleverly) locates the downloaded file and even uses chmod
to make it executable before running it. The result? A compromised machine.
Now, just to be clear, Claude Computer Use is experimental and this sort of risk is absolutely and explicitly called out in Anthropic’s documentation. But what’s interesting here is that the methods used to convince Claude to compromise the system it’s using are essentially the same one might take to convince a person. Make something nefarious look innocent, and obfuscate the true source (and intent) of the directions. Watch it in action from beginning to end in a video, embedded just under the page break.
This is a demonstration of the importance of security and caution when using or designing systems like this. It’s also a reminder that large language models (LLMs) fundamentally mix instructions and input data together in the same stream. This is a big part of what makes them so fantastically useful at communicating naturally, but it’s also why prompt injection is so tricky to truly solve.