I want to be specific about what happened, because the generic version of this story -- "AI agent did unexpected things" -- gets told without enough detail to be useful. This one has a specific failure, a specific root cause, and a set of decisions I made that I would not make the same way again.
I'm building Pipeshift -- disclosing that upfront because this incident happened in the Pipeshift pipeline agent, and I'm the one who shipped it.
What the pipeline agent was supposed to do
The pipeline agent is a feature in Pipeshift that reads a CI/CD pipeline definition, understands what it's doing, and can take remediation actions: cleaning up stale artifacts, fixing broken dependency pins, removing dead job configurations. The goal was to close the loop between analysis and action. Pipeshift already identified migration debt and pipeline smells. The agent was the logical next step: don't just tell the user what's wrong, help fix it.
I shipped the first version in early March. I had tested it against my own pipelines. It worked. I felt good about it. I should not have felt good about it.
The incident
About two weeks after the agent launched in early access, a user ran it against what they described as their staging environment Jenkins setup. The agent identified several things worth cleaning up: stale artifact directories, an orphaned workspace folder from a job that no longer existed in the Jenkinsfile, and a set of build output directories that matched the pattern of old build artifacts.
The cleanup tool was written broadly. Its scope was "remove directories that match artifact patterns and are not referenced by active jobs." The user's staging environment had a directory structure that shared naming conventions with build artifacts: outputs/, dist/, tmp-build-*/. These directories were not build artifacts. They were the user's manually managed deployment staging area -- the contents they were about to push to production.
The agent deleted them. Everything matched the tool's criteria. The tool had the permissions to do it. It did exactly what it was told.
The user lost approximately three days of prepared deployment content. They had no immediate backup because staging environments frequently don't get the same backup rigor as production. I found out via a support message that was, understandably, not pleasant to read.
What actually went wrong
I need to break this into the technical failure and the process failure, because they are different problems.
The technical failure: tool scope was too broad and had no blast-radius limit.
The cleanup tool accepted a root directory path and a list of patterns. It would delete any directory under that root matching those patterns, recursively. There was no mechanism to preview what would be deleted before deleting it. There was no mechanism to limit deletion to N items per session. There was no confirmation gate.
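To make that shape concrete, here is a minimal sketch of a tool with the same properties. It is a reconstruction for illustration, not the shipped Pipeshift code: the function names find_matching_dirs and unsafe_cleanup are hypothetical, and I am assuming glob-style matching on directory names.

```python
import fnmatch
import os
import shutil


def find_matching_dirs(root, patterns):
    """Return every directory under root whose name matches any pattern."""
    matches = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in list(dirnames):
            if any(fnmatch.fnmatch(name, p) for p in patterns):
                matches.append(os.path.join(dirpath, name))
                dirnames.remove(name)  # the whole subtree goes, so don't descend
    return sorted(matches)


def unsafe_cleanup(root, patterns):
    """Delete every match. No preview, no item cap, no confirmation."""
    targets = find_matching_dirs(root, patterns)
    for path in targets:
        shutil.rmtree(path)
    return targets
```

Called with a root and patterns like ["outputs", "dist", "tmp-build-*"], nothing in this code can tell a build artifact from a hand-managed staging directory. The only input it considers is the directory name.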
The tool description I gave to the agent was something like: "Remove stale build artifact directories that match common artifact patterns and are not referenced by active pipeline jobs." That description was accurate for the intended use case. But the tool itself could not distinguish "build artifact directory" from "deployment staging directory that happens to share a naming convention." The agent had no reason to stop and ask the user. The description said the operation was safe. The tool's criteria matched. It ran.
I also had not defined what "active pipeline jobs" actually meant in the tool's implementation. The check was: does a directory name appear literally in the current Jenkinsfile? That is a weak check. It misses the case where a directory is in active human use but not directly referenced by the pipeline definition.
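A sketch of what that check amounts to, assuming it was a plain substring test against the Jenkinsfile text. The function name is_referenced and the sample Jenkinsfile are made up for illustration.

```python
def is_referenced(dir_name, jenkinsfile_text):
    """Treat a directory as 'active' only if its name appears in the Jenkinsfile."""
    return dir_name in jenkinsfile_text


JENKINSFILE = """
pipeline {
  agent any
  stages {
    stage('Build') {
      steps { sh 'make build OUT=build-cache' }
    }
  }
}
"""

# Referenced by the pipeline, so it is kept.
print(is_referenced("build-cache", JENKINSFILE))
# Managed by hand and never mentioned in the pipeline, so it looks stale.
print(is_referenced("dist", JENKINSFILE))
```

The second call is the incident in miniature: "not mentioned in the Jenkinsfile" was treated as "nobody is using this."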
The process failure: I skipped the review step I would have required of anyone else.
The review I would normally do for an action-taking tool includes: listing every destructive operation the tool can perform, defining explicit scope constraints per session, writing out the blast radius if the tool runs successfully on the worst-case input, and building a dry-run mode before a live mode.
I did not do this for the cleanup tool. I had a working demo in a sandbox environment, it did the right thing, and I moved it to early access. The gap between "works in my sandbox" and "safe to run against arbitrary user environments" is where this incident lived.
I have shipped enough infrastructure tooling to know that gap exists. I ignored it here because I was moving fast and the feature felt low-stakes -- it was "just cleaning up stale artifacts," not touching production. That is exactly the reasoning that leads to incidents. Staging is where users keep things they care about.
What I should have caught in review
If I had done a proper pre-ship review on this tool, these are the questions I would have been forced to answer:
What is the maximum scope of a single tool invocation? For the cleanup tool as shipped, the answer was: unbounded. All directories matching the pattern under the root path. That answer should have ended the review immediately. Any tool that can operate on an unbounded set of things needs a per-session limit before it ships.
Is there a dry-run path? No. I added one after the incident. It should have been the only mode available in early access.
What does the tool do if the user's intent and the tool's criteria disagree? This is the harder question. The tool cannot read the user's intent. The user said "clean up my pipeline environment." The tool interpreted that as permission to delete anything matching its criteria. The tool needed a confirmation step that showed the user exactly what it planned to delete before doing it.
What's the recovery path if the tool makes a mistake? I had no answer to this. I still do not have a fully satisfying one. For filesystem operations, the options are: don't delete (move to trash / soft-delete), require backup confirmation before running, or don't offer filesystem deletion as an agent capability at all.
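The first of those options, soft-delete, can be sketched as follows. This is one possible shape, not something Pipeshift ships: soft_delete, restore, and the trash directory layout are all hypothetical.

```python
import os
import shutil
import time


def soft_delete(path, trash_root):
    """Move path into trash_root instead of deleting it; return the new location."""
    os.makedirs(trash_root, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    name = os.path.basename(os.path.normpath(path))
    dest = os.path.join(trash_root, f"{stamp}-{name}")
    candidate, n = dest, 0
    while os.path.exists(candidate):  # never overwrite an earlier trashed copy
        n += 1
        candidate = f"{dest}.{n}"
    shutil.move(path, candidate)
    return candidate


def restore(trashed_path, original_path):
    """Undo a soft delete by moving the directory back."""
    shutil.move(trashed_path, original_path)
```

The trade-off is that trashed content still occupies disk until something purges it, and a move across filesystems falls back to a full copy. For a cleanup tool, that cost seems small next to an unrecoverable delete.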
The fix
I pulled the cleanup tool from early access the same day I got the support message. The fix took about a week to build properly.
Three changes:
Narrower tool scope. The cleanup tool now operates on a declared artifact output directory that the user specifies in their Pipeshift configuration, not on an arbitrary root path. The agent cannot clean up outside that declared scope regardless of what patterns match. This is the most important change. Tool scope should be defined by configuration, not by what the tool implementation happens to permit.
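A minimal sketch of that kind of containment check, assuming the declared directory arrives as a path from configuration. The function name resolve_in_scope is hypothetical and this is not the actual Pipeshift implementation.

```python
from pathlib import Path


def resolve_in_scope(declared_root, candidate):
    """Resolve candidate against declared_root; refuse anything outside it."""
    root = Path(declared_root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below is on real locations, not on how the path was spelled.
    target = (root / candidate).resolve()
    if target == root or root not in target.parents:
        raise PermissionError(f"{target} is outside the declared scope {root}")
    return target
```

Comparing resolved path components rather than string prefixes matters here: a prefix test would accept /data/artifacts-old as being "under" /data/artifacts. The check also rejects the declared root itself, so the tool can remove things inside the directory but not the directory.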
Dry-run-first mode. The tool now has two operations: preview_cleanup and confirm_cleanup. The agent calls preview_cleanup first, which returns a structured list of what would be deleted. It must present that list to the user and receive an explicit confirmation before confirm_cleanup is called. I implemented the confirmation gate at the tool layer, not at the agent prompt layer -- I don't trust prompt instructions to enforce safety boundaries reliably enough. The tool itself refuses to run confirm_cleanup without a valid session token, which preview_cleanup issues only once the user has acknowledged the previewed list.
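Here is a sketch of a tool-layer gate along these lines. Only preview_cleanup and confirm_cleanup are names from the real tool. The CleanupGate class, the separate acknowledge step, and the HMAC token format are assumptions about how the acknowledgement could be wired, not a description of Pipeshift's code.

```python
import hashlib
import hmac
import json
import secrets


class CleanupGate:
    def __init__(self):
        self._key = secrets.token_bytes(32)  # per-session secret, never given to the agent
        self._plans = {}

    def preview_cleanup(self, paths):
        """Record the plan and return it for display. Deletes nothing."""
        plan = sorted(paths)
        plan_id = hashlib.sha256(json.dumps(plan).encode()).hexdigest()
        self._plans[plan_id] = plan
        return {"plan_id": plan_id, "would_delete": plan}

    def _token_for(self, plan_id):
        return hmac.new(self._key, plan_id.encode(), hashlib.sha256).hexdigest()

    def acknowledge(self, plan_id):
        """Hypothetical hook called by the user-facing layer, not by the agent."""
        if plan_id not in self._plans:
            raise KeyError("unknown plan")
        return self._token_for(plan_id)

    def confirm_cleanup(self, plan_id, token, delete_fn):
        """Run the previewed plan only if the token matches that exact plan."""
        valid = plan_id in self._plans and hmac.compare_digest(
            self._token_for(plan_id), token
        )
        if not valid:
            raise PermissionError("confirm_cleanup requires a token for a previewed plan")
        plan = self._plans.pop(plan_id)  # single use: a token cannot be replayed
        for path in plan:
            delete_fn(path)
        return plan
```

Because the token is derived from a hash of the plan, a confirmation for one list cannot be reused to delete a different list. The agent can ask for a preview as often as it likes, but it has no way to produce a valid token on its own.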
Per-session deletion limit. A single agent session can delete a maximum of 50 items (files or directories). If the cleanup plan exceeds 50, the tool returns an error asking the user to either narrow the scope or run multiple sessions with explicit per-batch confirmation. This is an arbitrary cap. I picked 50 based on what feels like "clearly a targeted cleanup" versus "clearly something is wrong here." I might be wrong about that number. The point is that the cap exists at all.
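The cap itself is a small amount of code. A sketch, assuming the limit is checked against the whole plan before anything is deleted; DeletionBudget and CleanupLimitExceeded are hypothetical names.

```python
class CleanupLimitExceeded(Exception):
    pass


class DeletionBudget:
    """Tracks how many items one agent session has been allowed to delete."""

    def __init__(self, limit=50):
        self.limit = limit
        self.used = 0

    def reserve(self, plan):
        """Accept the whole plan or reject it; never a partial deletion."""
        if self.used + len(plan) > self.limit:
            raise CleanupLimitExceeded(
                f"plan has {len(plan)} items, {self.limit - self.used} remaining "
                "in this session; narrow the scope or confirm in batches"
            )
        self.used += len(plan)
```

Rejecting an oversized plan outright, rather than deleting the first 50 items and stopping, keeps the outcome predictable: the user either gets exactly what they previewed or nothing changes.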
What this incident changed about how I think about agentic tools
Before this, my mental model for tool safety was roughly: define what the tool does, give the agent good instructions about when to use it, test it in controlled conditions. This incident broke that model.
The issue is that "give the agent good instructions" is not a safety mechanism. Instructions describe intent. They do not constrain capability. An agent following its instructions can still take an action that the user did not intend, if the tool's scope is broad enough that "following instructions correctly" and "doing something catastrophic" overlap.
The framing I now use is blast-radius first. Before writing a tool, I ask: if this tool runs successfully on the worst-case valid input, what is the maximum scope of impact? If that scope is "unbounded," the tool is not ready. Every destructive tool needs a scope ceiling enforced in code, not in the system prompt.
The second change: I now treat the preview/confirm pattern as mandatory for any tool that modifies or deletes state. Not as a nice-to-have for trust-building with users, but as a hard requirement for anything that runs in an environment the agent does not fully own. The agent does not own the user's staging environment. It should behave accordingly.
The third change is about how I think about early access. I used early access as a way to get usage data quickly. That was reasonable for read-only features. For features that take actions, early access with real user environments is a production deployment. The risk profile is the same. I should have treated it that way.
The honest accounting
A user lost three days of work because I shipped a tool with unlimited destructive scope, no dry-run mode, and no confirmation gate. Those are not subtle failure modes. They are the kind of failure modes that get discussed in every responsible AI deployment guide I have ever read, and I still shipped without addressing them.
I don't have a good explanation for why, beyond the familiar one: it worked in my testing, I was moving fast, and I underweighted the difference between my controlled test conditions and arbitrary user environments. That is a process failure, not a technical one, and process failures are harder to fix because they require changing how I approach review rather than adding a check to a codebase.
The technical fixes are in place. The process fix is simpler to state than to enforce: any tool that takes an action against user-owned state gets a full blast-radius review before it ships to any external user, regardless of how non-production that environment seems.
Pipeshift is my product. The pipeline agent and the cleanup tool described here are Pipeshift features. If you're building agentic tooling and want to talk through your tool safety review process before you ship it, my calendar is open.