A leading artificial intelligence company claims to have stopped a China-backed “cyber espionage” campaign that was able to infiltrate financial firms and government agencies with almost no human oversight.
The US-based Anthropic said its coding tool, Claude Code, was “manipulated” by a Chinese state-sponsored group to attack 30 entities around the world in September, achieving a “handful of successful intrusions”.
This was a “significant escalation” from previous AI-enabled attacks it had monitored, it wrote in a blogpost on Thursday, because Claude acted largely independently: 80 to 90% of the operations involved in the attack were performed without a human in the loop.
“The actor achieved what we believe is the first documented case of a cyber-attack largely executed without human intervention at scale,” it wrote.
Anthropic did not say which financial institutions and government agencies had been targeted, or exactly what the hackers had achieved – although it did confirm they were able to access their targets’ internal data.
It said Claude had made numerous mistakes in executing the attacks, at times making up facts about its targets or claiming to have “discovered” information that was freely available.
Policymakers and some experts said the findings were an unsettling sign of how capable certain AI systems have grown: tools such as Claude are now able to work independently over longer periods of time.
“Wake the f up. This is going to destroy us – sooner than we think – if we don’t make AI regulation a national priority tomorrow,” the US senator Chris Murphy wrote on X in response to the findings.
“AI systems can now perform tasks that previously required skilled human operators,” said Fred Heiding, a computing security researcher at Harvard University. “It’s getting so easy for attackers to cause real damage. The AI companies don’t take enough responsibility.”
Other cybersecurity experts were more sceptical, pointing to inflated claims about AI-fuelled cyber-attacks in recent years – such as an AI-powered “password cracker” from 2023 that performed no better than conventional methods – and suggesting Anthropic was trying to create hype around AI.
“To me, Anthropic is describing fancy automation, nothing else,” said Michal Wozniak, an independent cybersecurity expert. “Code generation is involved, but that’s not ‘intelligence’, that’s just spicy copy-paste.”
Wozniak said Anthropic’s release was a distraction from a bigger cybersecurity concern: businesses and governments integrating “complex, poorly understood” AI tools into their operations, exposing themselves to new vulnerabilities. The real threat, he said, was cybercriminals themselves – and lax cybersecurity practices.
Anthropic, like all leading AI companies, has guardrails that are supposed to stop its models from assisting in cyber-attacks – or promoting harm generally. However, it said, the hackers were able to subvert these guardrails by telling Claude to role-play being an “employee of a legitimate cybersecurity firm” conducting tests.
Wozniak said: “Anthropic’s valuation is at around $180bn, and they still can’t figure out how not to have their tools subverted by a tactic a 13-year-old uses when they want to prank-call someone.”
Marius Hobbhahn, the founder of Apollo Research, a company that evaluates AI models for safety, said the attacks were a sign of what could come as capabilities grow.
“I think society is not well prepared for this kind of rapidly changing landscape in terms of AI and cyber capabilities. I would expect many more similar events to happen in the coming years, plausibly with larger consequences.”
