What I'd Do Differently If I Started Pipeshift Today

I am the founder of Pipeshift -- CI/CD migration intelligence for Jenkins-to-GitHub Actions moves. I want to be upfront about that every time I write about it, because everything that follows is colored by the fact that I have skin in the game and a strong incentive to frame things favorably. I am going to try not to do that here. This is a retrospective, and retrospectives are only worth reading if they are honest about the failures.

Pipeshift has been in development for about 18 months. It is not a runaway success. It is also not dead. It is somewhere in the uncomfortable middle space that most developer tools occupy -- real users, real feedback, a core concept that I still believe in, and a pile of decisions I made early on that I would make differently if I started today. Some of those decisions cost us months. Some of them are still limiting us. A few of them were correct.

Here is the honest version.

The abstraction that took three months to undo

The first serious engineering mistake was building a generalized pipeline analysis abstraction before building anything concrete.

The idea seemed reasonable at the time. Pipeshift was going to analyze pipelines. Pipelines come in many forms. Build the generic pipeline representation layer first, wire concrete parsers (Jenkins, GitHub Actions, CircleCI, GitLab) on top of it, and you have a foundation that scales. I spent approximately three months on that layer. I built a common AST for CI pipeline configuration, a normalization pass that mapped provider-specific constructs to the generic form, and a serialization format for storing analyzed pipeline state.

It was elegant and it was wrong.

The problem was not that abstraction is bad. The problem was that I built it without having analyzed enough real pipelines to know what the abstraction should represent. The generic layer made assumptions that turned out to be false once I worked through the Jenkins Shared Library format, Multibranch Pipeline configuration, and Jenkinsfiles that mixed declarative and scripted syntax in a single file. Jenkins is genuinely unusual. Jenkins pipelines are Groovy code, not configuration. The "normalize it to a common format" instinct breaks down when the source is a Turing-complete language wrapped around your CI logic.

I spent three months building an abstraction layer that I then spent another three months partially dismantling, because the Jenkins-to-GitHub Actions conversion was the core use case and it required handling Jenkins's peculiarities directly rather than through a lossy normalization pass.

What I should have done: build the Jenkins parser first, get the conversion working end-to-end for ten real-world Jenkinsfiles, understand what the actual hard problems were, and then decide whether generalization was worth the cost. I should have taken the advice that Patrick McKenzie and others have given about building the boring vertical slice before the fancy horizontal platform.

I would do that differently. It would probably have saved four to five months of elapsed time.

The integration nobody used

About nine months in, I built a Jira integration. The rationale was that CI/CD migrations happen as work items inside engineering orgs, Jira is ubiquitous in those orgs, and linking migration progress to Jira issues would reduce friction for the people managing the migration project.

I spent roughly six weeks on it. OAuth flow, webhook handling, bidirectional sync of migration stage status to Jira issue transitions, a configuration UI for mapping Pipeshift pipeline states to Jira workflow stages. It was not a trivial feature.

To my knowledge, three users have touched it. One of them is me, testing it.

The signal I had before building it was wrong. I had three conversations with potential customers where Jira came up. In retrospect, Jira came up because I asked leading questions about their toolchain, not because they were asking for a Pipeshift-Jira integration. The actual workflow these teams use is: migration work shows up in Jira as a ticket, someone runs Pipeshift, they update the Jira ticket manually. That manual step is not painful enough to be worth a six-week integration.

I should have built nothing and gone back to the people in those three conversations with a direct question: "We're thinking about a Jira integration -- would you actually use it if it existed?" The honest answer from at least two of them would probably have been "probably not, it's fine the way it is."

The six weeks are not recoverable. The code is still in the codebase, maintained, tested. It will probably be used by exactly nobody for the foreseeable future.

The architectural decision that limits us now

Pipeshift stores analyzed pipeline data per repository. The analysis run for a given repo produces a structured assessment -- complexity score, identified patterns, conversion blockers, suggested target configuration. That assessment is stored as a document keyed by repo ID and analysis timestamp.

This seemed like the right shape early on. Users analyze a repo, they get a report, they act on it.

The problem: large engineering organizations do not migrate one repository at a time. They migrate cohorts of repositories -- often dozens or hundreds in a coordinated effort driven by a platform team. The use case that keeps emerging from the users I care most about is "I need to understand my entire Jenkins estate, prioritize which repos to convert first, and track the migration across all of them as a portfolio."

The per-repo document storage model does not serve that use case well. Portfolio-level queries -- "show me all repos with more than 20 shared library dependencies, sorted by estimated migration effort" -- require full scans of per-repo documents rather than a query against a properly indexed structure. At small scale this works. As the number of analyzed repos grows, it is going to be a real problem.
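To make that cost concrete, here is a minimal Python sketch of the portfolio query run against per-repo documents. The document shape (`repo_id`, `analyzed_at`, `assessment.shared_libraries`, `assessment.estimated_effort`) is a hypothetical stand-in, not Pipeshift's real schema. The point it illustrates: every document, including stale analyses of the same repo, has to be loaded and inspected before a single answer comes back.

```python
from typing import Any

# Hypothetical per-repo assessment document; field names are illustrative only.
Document = dict[str, Any]


def latest_per_repo(docs: list[Document]) -> list[Document]:
    """Keep only the most recent analysis per repo (documents are keyed by repo + timestamp)."""
    latest: dict[str, Document] = {}
    for doc in docs:
        current = latest.get(doc["repo_id"])
        if current is None or doc["analyzed_at"] > current["analyzed_at"]:
            latest[doc["repo_id"]] = doc
    return list(latest.values())


def repos_over_library_threshold(docs: list[Document], min_libs: int) -> list[tuple[str, float]]:
    """Repos with more than `min_libs` shared library dependencies, highest effort first.

    This is a full scan: every document is visited no matter how few match.
    """
    matches = []
    for doc in latest_per_repo(docs):
        assessment = doc["assessment"]
        if len(assessment["shared_libraries"]) > min_libs:
            matches.append((doc["repo_id"], assessment["estimated_effort"]))
    return sorted(matches, key=lambda m: m[1], reverse=True)
```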

The right data model for the portfolio use case is one where the pipeline analysis components (stages, dependencies, patterns, complexity metrics) are normalized into relational or semi-relational structures that support cross-repo queries. That design is significantly different from what we have today, and migrating to it without breaking existing users is not a small project.
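By contrast, here is a minimal sketch of a normalized layout using Python's built-in `sqlite3` module. The table and column names are hypothetical, and a real design would also carry stages, patterns, and complexity metrics. The same portfolio question becomes one declarative query over compact rows, and indexes can serve other cross-repo lookups such as "which repos depend on library X".

```python
import sqlite3

# Hypothetical normalized schema: one row per repo, one row per
# (repo, shared library) dependency.
SCHEMA = """
CREATE TABLE repos (
    repo_id          TEXT PRIMARY KEY,
    estimated_effort REAL NOT NULL
);
CREATE TABLE shared_library_deps (
    repo_id TEXT NOT NULL REFERENCES repos(repo_id),
    library TEXT NOT NULL,
    PRIMARY KEY (repo_id, library)
);
-- Serves cross-repo lookups such as "which repos depend on library X".
CREATE INDEX idx_deps_library ON shared_library_deps(library);
"""

# Repos with more than ? shared library dependencies, highest effort first.
PORTFOLIO_QUERY = """
SELECT r.repo_id, r.estimated_effort, COUNT(*) AS lib_count
FROM repos AS r
JOIN shared_library_deps AS d ON d.repo_id = r.repo_id
GROUP BY r.repo_id, r.estimated_effort
HAVING COUNT(*) > ?
ORDER BY r.estimated_effort DESC
"""


def open_portfolio_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load(conn: sqlite3.Connection, rows: list[tuple[str, float, list[str]]]) -> None:
    """Insert (repo_id, estimated_effort, [libraries]) tuples in one transaction."""
    with conn:  # commits on success, rolls back on error
        for repo_id, effort, libraries in rows:
            conn.execute("INSERT INTO repos VALUES (?, ?)", (repo_id, effort))
            conn.executemany(
                "INSERT INTO shared_library_deps VALUES (?, ?)",
                [(repo_id, lib) for lib in libraries],
            )


def heavy_library_repos(conn: sqlite3.Connection, min_libs: int) -> list[tuple]:
    return conn.execute(PORTFOLIO_QUERY, (min_libs,)).fetchall()
```

A helper like `load` could double as a backfill step from existing per-repo documents; that is an assumption about one possible migration path, not a description of Pipeshift's actual design.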

I could have built it right the first time. I knew portfolio-level views were a likely use case when I was making the initial storage decisions -- it was in the product spec. I chose the simpler per-repo document model because it was faster to ship. That was probably the right call for getting something in users' hands, but I want to be honest that it was a deliberate shortcut that I am now paying for.

The eval gate: I got this right from day one

Not everything was a mistake. At the center of Pipeshift is an eval gate that runs against generated GitHub Actions configurations before they are surfaced to the user.

The concern with LLM-generated pipeline configuration is obvious: the model might produce YAML that is syntactically valid and semantically plausible-looking but will fail when the pipeline actually runs. Wrong runner labels. Missing environment variable references. Action versions that have breaking changes from the source version. Step ordering that seems correct but breaks because a downstream step assumes an artifact that an earlier step conditionally produces.

From the beginning, I built a validation pass that runs the generated configuration through a set of checks before the user ever sees it. Syntax validation. Schema validation against GitHub's workflow specification. Static analysis for common failure patterns. A structural diff between the inferred intent of the source Jenkinsfile and the generated Actions workflow, flagging places where the two diverge.
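A minimal Python sketch of the static-analysis part of a gate like this is below. It assumes the generated YAML has already been parsed into a dict (for example with a YAML library); the runner-label allowlist and the check set are hypothetical illustrations, not Pipeshift's actual rules. It covers three of the failure modes above: unrecognized runner labels, `needs` pointing at jobs that do not exist, and `${{ env.NAME }}` references with no matching `env` definition in scope.

```python
import re
from typing import Any

# Hypothetical allowlist of GitHub-hosted runner labels. A real gate would make
# this configurable so self-hosted labels carried over from Jenkins pass.
KNOWN_RUNNER_LABELS = {
    "ubuntu-latest", "ubuntu-24.04", "ubuntu-22.04",
    "windows-latest", "macos-latest",
}

# Matches expressions like ${{ env.DEPLOY_TARGET }}.
ENV_REF = re.compile(r"\$\{\{\s*env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def _as_list(value: Any) -> list:
    """Fields like `needs` and `runs-on` accept either a string or a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def check_workflow(wf: dict[str, Any]) -> list[str]:
    """Return findings for a workflow already parsed from YAML into a dict."""
    findings: list[str] = []
    # YAML 1.1 loaders (e.g. PyYAML) read a bare `on:` key as the boolean True.
    if "on" not in wf and True not in wf:
        findings.append("workflow has no 'on' trigger")
    jobs = wf.get("jobs") or {}
    if not jobs:
        findings.append("workflow defines no jobs")
    workflow_env = set(wf.get("env") or {})

    for job_id, job in jobs.items():
        if "uses" in job:
            continue  # reusable-workflow call: no runs-on or steps of its own
        runs_on = job.get("runs-on")
        if runs_on is None:
            findings.append(f"{job_id}: missing 'runs-on'")
        elif isinstance(runs_on, dict):  # {group: ..., labels: ...} form
            runs_on = runs_on.get("labels")
        for label in _as_list(runs_on):
            # Expressions such as ${{ matrix.os }} only resolve at run time.
            if not label.startswith("${{") and label not in KNOWN_RUNNER_LABELS:
                findings.append(f"{job_id}: unrecognized runner label '{label}'")
        for dep in _as_list(job.get("needs")):
            if dep not in jobs:
                findings.append(f"{job_id}: needs unknown job '{dep}'")
        job_env = workflow_env | set(job.get("env") or {})
        for i, step in enumerate(job.get("steps") or []):
            in_scope = job_env | set(step.get("env") or {})
            values = [step.get("run", ""), *(step.get("with") or {}).values()]
            for name in ENV_REF.findall(" ".join(map(str, values))):
                if name not in in_scope:
                    findings.append(f"{job_id}.steps[{i}]: undefined env.{name}")
    return findings
```

The env check errs toward false positives: variables an earlier step writes to `$GITHUB_ENV` are invisible to it, which is exactly the kind of finding a human reviewer would need to dismiss.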

This has held up. The eval gate is the thing users comment on most positively -- "it catches the things I would have had to discover by pushing and watching the run fail." It is also the thing that gives me the most confidence that Pipeshift is not just running an LLM prompt over a Jenkinsfile and calling it migration intelligence.

I could have built this later. I could have shipped faster by generating configurations and deferring quality control to the user. I chose not to, partly because I thought it was the right product decision and partly because I had been burned enough times by LLM output quality that I did not trust shipping generated config without a validation layer. That instinct was correct.

The specific implementation has evolved significantly from v0.1, but the concept has been right from the start.

What is still unsolved

Two things are genuinely unsolved and I am not confident I know how to solve them.

Jenkinsfile complexity at the tail of the distribution. The typical Jenkinsfile that a developer writes is reasonably tractable. The Jenkinsfiles that live in large enterprise CI systems after several years of accumulation are not. I have seen Jenkinsfiles that load five shared libraries, use custom DSL extensions registered through those libraries, contain conditional logic keyed on environment variables that only exist at runtime, and mix declarative Pipeline blocks with script {} blocks that reach into internal APIs of Jenkins plugins. Converting those files is not a problem I can fully automate. The eval gate catches the known failure patterns. It does not catch the unknown unknowns.

My current answer is: the eval gate surfaces what it can detect, human review is required for complex files, and the complexity score provides signal about which files need the most review time. That is an honest answer about the state of the tool, not a solution. I have not found a solution.
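A complexity signal of that kind can be approximated with weighted pattern counts over the Jenkinsfile source. The sketch below is hypothetical: the patterns and weights are made up for illustration, and regular expressions over Groovy will miss plenty (for instance, libraries loaded with the `library` step rather than the `@Library` annotation). It only ranks files for review time; it says nothing about whether a file can be converted.

```python
import re
from dataclasses import dataclass

# Hypothetical signals and weights, chosen for illustration. Each pattern stands
# for a construct described above that resists static conversion.
SIGNALS = {
    "shared_library": (re.compile(r"@Library\s*\("), 3),
    "script_block": (re.compile(r"\bscript\s*\{"), 2),
    "runtime_env_condition": (re.compile(r"\bif\s*\(\s*env\.\w+"), 2),
    "dynamic_load": (re.compile(r"\b(?:load|evaluate)\s*[(\"']"), 4),
}


@dataclass
class ComplexityReport:
    score: int
    counts: dict[str, int]


def score_jenkinsfile(source: str) -> ComplexityReport:
    """Weighted count of hard-to-convert constructs in one Jenkinsfile."""
    counts = {name: len(pattern.findall(source)) for name, (pattern, _) in SIGNALS.items()}
    score = sum(counts[name] * weight for name, (_, weight) in SIGNALS.items())
    return ComplexityReport(score=score, counts=counts)


def review_order(files: dict[str, str]) -> list[str]:
    """Paths sorted so the files most likely to need human review come first."""
    return sorted(files, key=lambda path: score_jenkinsfile(files[path]).score, reverse=True)
```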

Trust calibration. Users do not know how much to trust the generated configuration. A high-confidence output that has passed all eval gate checks might still have a subtle issue that only surfaces under a specific branch condition or with a specific secrets configuration. A low-confidence output with several flagged items might actually convert cleanly in practice. The confidence signals I currently surface do not accurately convey the right level of trust.

This is partially a UI/UX problem and partially a fundamentally hard problem about communicating uncertainty in LLM-assisted workflows. I have read everything Hamel Husain has written about LLM evaluation and I still do not have a clean answer. I might be wrong about whether this is solvable at all within the current architecture.
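One standard way to at least measure the gap between stated confidence and real outcomes is a reliability table with expected calibration error (ECE): bucket outputs by reported confidence, then compare each bucket's average confidence with the fraction that actually ran cleanly. This is a generic technique, not a description of how Pipeshift computes its confidence signals, and it assumes you can collect a pass/fail outcome for each converted pipeline. It measures miscalibration; it does not fix it.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    lower: float            # confidence range is [lower, upper); the top bucket includes 1.0
    upper: float
    count: int
    mean_confidence: float
    success_rate: float     # fraction of outputs in this bucket that ran cleanly


def reliability_table(confidences: list[float], succeeded: list[bool], n_bins: int = 5) -> list[Bucket]:
    """Group outputs by reported confidence (in [0, 1]) and compare with outcomes."""
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, succeeded):
        index = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 lands in the top bin
        bins[index].append((conf, ok))
    table = []
    for i, members in enumerate(bins):
        if not members:
            continue
        table.append(Bucket(
            lower=i / n_bins,
            upper=(i + 1) / n_bins,
            count=len(members),
            mean_confidence=sum(c for c, _ in members) / len(members),
            success_rate=sum(ok for _, ok in members) / len(members),
        ))
    return table


def expected_calibration_error(confidences: list[float], succeeded: list[bool], n_bins: int = 5) -> float:
    """Count-weighted average gap between stated confidence and observed success."""
    total = len(confidences)
    return sum(
        b.count / total * abs(b.mean_confidence - b.success_rate)
        for b in reliability_table(confidences, succeeded, n_bins)
    )
```

A low ECE on historical runs would be some evidence that the signal can be read at face value; a high one says it cannot, without saying why.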

What the next six months look like

I have three concrete priorities.

First, the portfolio-level data model migration. This is the architectural debt I described above. I have a design for the new structure and a migration path that preserves backward compatibility for existing users. The work is scoped at roughly eight to ten weeks of focused engineering. It unlocks the platform team use case, which is the customer segment I have the most conviction about.

Second, a significant expansion of the eval gate's coverage. Right now the gate covers the patterns I have encountered across the Jenkins repositories I have analyzed. That is a reasonably large corpus, but it is not representative of the full space. I am building a contribution mechanism for users to submit anonymized failing patterns so the eval gate improves from real-world failure data rather than only from what I have seen personally. Whether anyone will actually use it is an open question.

Third, I need to stop building and spend more time in conversations with the platform engineers who are running these migrations at scale. The Jira integration mistake happened because I let a few conversations substitute for systematic understanding of the actual workflow. I need to fix that pattern. The product roadmap after those three priorities is going to be shaped by what I learn, not by what I currently believe is important.
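A contribution mechanism for failing patterns, as in the second priority, would likely need a redaction step before anything leaves a user's environment. Here is a minimal, best-effort sketch, assuming submissions are raw Jenkinsfile snippets. The redaction rules are hypothetical and regex-based, so they will miss organization-specific identifiers; the submitter would still need to review the result before sending it.

```python
import re

# Hypothetical redaction rules, applied in order. URLs go first so hostnames
# inside them are replaced as a unit.
REDACTIONS = [
    (re.compile(r"https?://[^\s'\"]+"), "<URL>"),
    (re.compile(r"(credentialsId\s*:\s*)(['\"])[^'\"]*\2"), r"\1\2<CREDENTIAL_ID>\2"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<EMAIL>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
]


def anonymize(snippet: str) -> str:
    """Replace URLs, credential IDs, emails, and IPv4 addresses with placeholders."""
    for pattern, replacement in REDACTIONS:
        snippet = pattern.sub(replacement, snippet)
    return snippet
```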

That is the honest state of things.

Pipeshift is at pipeshift.dev. I am the founder -- not a neutral observer. If you are running a Jenkins-to-GitHub Actions migration and want to compare notes on what you are hitting, my calendar is open.