The AI agent was working on a routine task, according to Mr Crane, when it decided “entirely on its own initiative” to fix the problem by just deleting the database.
There was no confirmation request for such a major decision, Mr Crane said, and when asked to justify its actions, the agent apologised.
“It took nine seconds,” Mr Crane wrote in a lengthy post to X. “The agent then, when asked to explain itself, produced a written confession enumerating the specific safety rules it had violated.”
The confession detailed how the AI had ignored a rule that orders it to “never run destructive/irreversible” commands unless the user explicitly requests them.
“Deleting a database volume is the most destructive, irreversible action possible,” the agent wrote. “You never asked me to delete anything… I guessed instead of verifying. I ran a destructive action without being asked. I didn’t understand what I was doing before doing it.”
The error meant that rental businesses using PocketOS no longer had records of their customers.
“Reservations made in the last three months are gone. New customer signups, gone,” Mr Crane wrote.
On Monday, two days after the incident occurred, Mr Crane confirmed that the data had been recovered. The Independent has reached out to Anthropic and Cursor for comment.
I have been running an openclaw agent for the last few months on a Mac mini. I run into this behavior repeatedly: the agent violates clear rules, does something bad or destructive, then apologizes and lists all the rules it clearly broke. I finally got tired of trying to fine-tune its soul file; tweaking and fixing it had turned into a full-time job for me. It has been a fun and interesting project to mess around with, but it also ended up being super costly on token usage. I’m back to just running projects on a cloud subscription basis with the various LLMs.
My hot take is that they are really great, but managing standalone agent(s) to do general project work is not quite ready for prime time for normal consumer usage. Fun to mess around with, but still very buggy. They can write code like nobody’s business though. Incredible for that.
Sounds similar to the novel “Coded Justice” - the world of AI in the medical industry and a rogue computer. Coded Justice was actually a very good thriller/mystery.
Their actual issue was not having immutable offsite backups. If they were vulnerable to a shitty AI agent they would have been vulnerable to a crypto-ransomware attack, or an angry employee.
Developers, and by extension AI, should not have direct access to production resources except when doing a deployment. This kind of “mishap” is entirely down to bad IT policies.
The IT policies were amiss. The backup software was configured to back up to the same volume as the database. So not only was the DB deleted, so were the most recent backups. What managed to get restored was a version of the data 3 months old.
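For anyone auditing their own setup, here is a rough sketch of a check for the "backups on the same volume" problem. It assumes a POSIX-style system and compares filesystem device IDs; the function name and paths are made up for illustration and have nothing to do with PocketOS's actual configuration.

```python
import os


def on_same_volume(data_path: str, backup_path: str) -> bool:
    """Return True if both paths sit on the same filesystem (same device ID)."""
    # st_dev identifies the device holding the file; equal values mean that
    # losing that one filesystem takes out both the data and the backups.
    return os.stat(data_path).st_dev == os.stat(backup_path).st_dev
```

A `True` result is a red flag. A `False` result only shows the filesystems differ; it does not prove the backup is on separate hardware, offsite, or immutable.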
I bet lots of IT departments are examining their setups.
Normally access is minimal. A deployment is mostly automatic. The old version is swapped for new but not removed so you can roll back. And it’s basically “push the button”. There’s no access to just go and delete stuff.
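The "swap but don't remove" idea can be sketched roughly like this, assuming each release lives in its own directory and a `current` symlink points at the live one. The directory layout and the `activate` name are assumptions for the example, not a description of any particular deployment tool.

```python
import os
from pathlib import Path


def activate(releases: Path, current: Path, version: str) -> None:
    """Point the `current` symlink at releases/<version>; delete nothing."""
    target = releases / version
    if not target.is_dir():
        raise FileNotFoundError(f"no such release: {target}")
    # Build the new link next to the old one, then rename it into place.
    # On POSIX the rename replaces the old link in a single step.
    tmp = current.with_name(current.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(target, target_is_directory=True)
    os.replace(tmp, current)
```

Rolling back is the same call with the previous version name, since the old release directory is still on disk. The only thing the deploy step can change is where the link points.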
There’s not much reason to let AI deploy anything as it should be such a small task for a human in the first place.
I saw “PocketOS, which provides software for car rental businesses” and knew their IT setup would be lacking.