Bad actors continue to leverage large language models (LLMs) to build functional tooling more quickly, and a recent exploitation of the React2Shell vulnerability by an AI-generated malware sample shows how easily low-skilled operators can rapidly assemble exploitation frameworks.
Security researchers from Darktrace, which runs a global honeypot network called CloudyPots, observed the intrusion against the company’s Docker honeypot, which is designed to expose the Docker daemon to the internet without requiring authentication. An attacker who discovers the daemon can then use the Docker API to create a container.
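For defenders wondering whether they share that exposure, the check is simple: an unauthenticated daemon answers documented Docker Engine API calls over plain TCP. The following sketch is illustrative only; the host list is a placeholder, 2375 is Docker’s conventional unencrypted API port, and it should be pointed only at systems you are authorized to audit.

```python
import json
import urllib.request

# Placeholder hosts; audit only systems you own or are authorized to test.
HOSTS = ["10.0.0.5", "10.0.0.6"]
PORT = 2375  # Docker's conventional unencrypted API port

for host in HOSTS:
    url = f"http://{host}:{PORT}/version"  # documented Docker Engine API endpoint
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            info = json.load(resp)
            # An unauthenticated answer means anyone who can reach this port can
            # create containers, exactly the exposure the honeypot reproduces.
            print(f"{host}: EXPOSED - Docker {info.get('Version', 'unknown')}")
    except Exception:
        print(f"{host}: no unauthenticated Docker API response")
```

A host that prints EXPOSED should be moved behind TLS with client certificates, or a local Unix socket, before an opportunistic scanner finds it first.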
While “there’s nothing novel about the attack, vulnerability, or exploit,” what is interesting is “the dramatic reduction in the effort required to assemble an end-to-end intrusion chain,” says Black Duck Senior R&D Manager Christopher Jess.
“Obviously, time-to-value is key for threat actors, and the ability for lower-skilled actors to build and deploy these capabilities is made possible by vibe-coding,” says Trey Ford, chief strategy and trust officer at Bugcrowd. “What I’m most interested in will be how our threat intel operators find these lower-skilled groups integrating commercial crimeware into their vibe-coded payload delivery and management infrastructure.”
Ford expects “to see an increase in smaller-scale threat actor communities, and an uptick in commercial crimeware adoption by these groups,” noting that he’s “curious how the more established operators will deal with these smaller players.” Hyperscalers and browser providers will definitely face additional workload, he says.
“The attacker was observed spawning a container named ‘python-metrics-collector’, configured with a start-up command that first installed prerequisite tools including curl, wget, and Python 3,” according to a blog post penned by Darktrace Malware Research Engineer Nathaniel Bill and VP of Threat Research and Field CISO AI Security Nathaniel Jones.
The container then downloaded a list of required Python packages before fetching and running a Python script.
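That start-up pattern, a container whose first act is to install download tooling and pull remote code, is itself a usable detection signal. The following is a minimal, hypothetical sketch rather than Darktrace’s detection logic; it assumes the Docker SDK for Python is installed on the monitored host and flags containers whose configured command contains such indicators.

```python
import docker  # Docker SDK for Python (pip install docker)

# Tokens are illustrative; tune them to the environment's normal container baseline.
SUSPICIOUS = ("curl", "wget", "pip install", "http://", "https://")

client = docker.from_env()
for container in client.containers.list(all=True):
    config = container.attrs.get("Config", {})
    cmd_parts = (config.get("Entrypoint") or []) + (config.get("Cmd") or [])
    cmd = " ".join(str(part) for part in cmd_parts)
    hits = [token for token in SUSPICIOUS if token in cmd]
    if hits:
        print(f"[!] {container.name}: start-up command matches {hits}")
        print(f"    cmd: {cmd[:200]}")
```

The token list is deliberately crude; the point is that the behavior described above stands out from legitimate container start-up commands in most environments.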
“The downloaded Python payload was the central execution component for the intrusion,” the researchers wrote, adding that “obfuscation by design within the sample was reinforced between the exploitation script and any spreading mechanism.”
While “Docker malware samples typically include their own spreader logic,” the omission in this case “suggests that the attacker maintained and executed a dedicated spreading tool remotely,” they said.
The exploitation is bad news for CISOs and SOC leaders, whom the researchers urged to treat it as a preview of the not-so-distant future. “Threat actors can now generate custom malware on demand, modify exploits instantly, and automate every stage of compromise,” they said.
Because the downloaded script doesn’t (appear to) include a Docker spreader, the malware “will not replicate to other victims from an infected host,” the researchers wrote, noting how uncommon that is for Docker malware. “This indicates that there is a separate script responsible for spreading, likely deployed by the attacker from a central spreader server,” a theory that “is supported by the fact that the IP that initiated the connection, 49[.]36.33.11, is registered to a residential ISP in India,” they said.
“While it is possible the attacker is using a residential proxy server to cover their tracks, it is also plausible that they are running the spreading script from their home computer,” they maintained.
“This is something of a Pandora’s Box issue with LLMs, because it’s looking like prompt injection is going to be an intractable problem,” says Michael King, senior solutions engineer at Black Duck.
“Even if providers lock their frontier models down,” King says, “any open weight model that’s up to the task can be trivially jailbroken,” and “this ability is here to stay.”
But there is a silver lining. “LLMs are still limited by their training data” so they are not “producing fundamentally new threats, just increasing the speed of development for both attackers and defenders,” says King.
Shoring up Defenses
“For CISOs and SOC leaders, this activity should be viewed as an early indicator of what is rapidly becoming the norm,” says Chrissa Constantine, senior cybersecurity solution architect at Black Duck.
“Traditional indicators such as malware uniqueness or code quality are becoming less reliable signals of threat maturity because automation has steadily eroded the link between technical sophistication and operator skill,” says Constantine.
Saumitra Das, vice president of engineering at Qualys, says, “Enterprises should expect not only more automated attacks but also stealthier agent-based reconnaissance and a need for faster risk-based remediation due to all the zero days LLMs will discover.”
“Qualitatively, things are still the same: Once a vulnerability is known, it’s important to patch as quickly as possible,” says King.
Constantine urges security teams to “respond by prioritizing hardening of exposed services, particularly cloud and container management interfaces that are frequently misconfigured,” and “adopting continuous monitoring of runtime behavior, rather than static signatures…, as AI-generated malware can be easily altered to evade known detections.”
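As one hedged illustration of runtime monitoring over static signatures, the sketch below streams container-creation events from the local Docker daemon so each creation can be compared against expected deployments. It assumes the Docker SDK for Python and local daemon access, and is a starting point rather than a product-grade control.

```python
import docker  # Docker SDK for Python (pip install docker)

client = docker.from_env()

# decode=True yields parsed event dicts; the filter limits output to container creation.
for event in client.events(decode=True, filters={"type": "container", "event": "create"}):
    attrs = event.get("Actor", {}).get("Attributes", {})
    print(f"container created: name={attrs.get('name')} image={attrs.get('image')}")
    # In practice this would feed a SIEM and alert when a creation did not come from
    # an approved pipeline or the image/name is not on an allow list.
```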
Organizations “should also ensure that honeypots, deception technologies, and anomaly-based detection are integrated into their security operations to identify novel or previously unseen attack patterns,” she says.
And Constantine advises security leaders to “incorporate the assumption of AI-enabled adversaries into threat modeling and incident response planning, treating this research not as an isolated case but as a preview of the near-term future of cyber threats.”
Ram Varadarajan, CEO at Acalvio, says operators will have no choice but to treat “breach as baseline,” always assuming “that the bad guys are inside your firewall.”
He believes the best defense “will be AI-tuned tripwires, in everything from honeypots to game theory” and that “organizations will need deception techniques that leverage the algorithmic behavior that offensive AI models bring, to impel those intruders to blunder into an ambush.”
As Varadarajan bluntly notes: “That’s our future.”
