Writing

Cloud & AI Engineering Blog

Practical patterns, trade-offs, and lessons learned from real systems across OCI, AWS, and Azure — plus DevOps and agentic AI engineering.

building-ai-developer-tools

What I'd Do Differently If I Started Pipeshift Today

I am the founder of Pipeshift -- CI/CD migration intelligence for Jenkins-to-GitHub Actions moves.

Jun 30, 2026

fine-tuning-with-synthetic-data

I Fine-Tuned a Model on Synthetic Data. Here Is What the Numbers Actually Said.

The popular claim is that synthetic data closes the gap between a capable base model and a task-specific one -- no labeling budget, no data collection headaches, just generate-and-fine-tune. That claim is half right.

Jun 29, 2026

qdrant-out-of-memory

Qdrant Ran Out of Memory During a Client Demo. Here Is the Four-Hour Postmortem.

The system had been running fine for three weeks. Queries returning in under 200ms, recall looking reasonable in our offline evals, nothing alarming in the logs.

Jun 28, 2026

teaching-engineers-to-use-llms

The Four-Week Pattern I Use to Get Skeptical Engineers Actually Shipping AI Features

Most engineering teams that bring me in for AI consulting have the same latent problem: one or two people understand how LLMs actually work, the rest have absorbed enough LinkedIn content to be confused, and management has already promised.

Jun 27, 2026

llm-judge-retrieval-quality

The Three-Layer Metric Pyramid for RAG Retrieval Evaluation

Most RAG evaluation frameworks I encounter measure the wrong thing with high precision.

Jun 26, 2026

autonomous-ai-agents-failure-rate

I Build AI Agents for Clients. Here Is Why I Advise Against Fully Autonomous Ones in Most Production Systems.

Let me be clear about where I am coming from before I say anything critical: I build agents. I am the founder of [Pipeshift](https://mohakdeepsingh.dev/products), which uses a multi-agent LangGraph pipeline under the hood.

Jun 25, 2026

llm-api-cost-reduction

I Cut a Client's LLM Bill by $8k/Month With a Routing Classifier. Here Is What the Benchmark Actually Looked Like.

The pitch for model routing is simple: not every task needs GPT-4 or Claude Sonnet.

Jun 24, 2026

tenant-isolation-vector-database

Multi-Tenant LLM Architecture: Decisions I Got Right and One I'd Fully Reverse

The B2B SaaS context changes almost every architectural decision in an LLM system. You are not building a product with one corpus and one user population.

Jun 23, 2026

langgraph-vs-crewai-vs-autogen

The Agentic AI Tools I Trust in Production (and the Ones I've Stopped Recommending)

I want to be upfront about what this post is and isn't. It's not a comprehensive market survey.

Jun 22, 2026

prompt-management-llm-production

How I Actually Version Prompts in Production (Not the Ideal System)

There is a version of this post where I describe a clean prompt management architecture: a dedicated prompt store, semantic versioning, A/B deployment, automatic rollback on degradation, beautiful dashboards. That system exists.

Jun 21, 2026

ml-pipeline-dag-checkpointing

Building the CI/CD Engine Inside Pipeshift: What the Architecture Looks Like and What I Got Wrong

I am the founder of Pipeshift, so everything in this post is written with that bias on the table. This is not an objective analysis of ML CI/CD tooling in general.

Jun 20, 2026

claude-code-workflow

Claude for Code: What Works and What Costs Me Time

I use Claude daily. I have built parts of Pipeshift's pipeline tooling on top of the Claude API -- I am the founder of Pipeshift, so that context matters when you weigh anything I say here.

Jun 19, 2026

llm-api-cost-optimization

The Real Cost of Running Frontier Models at Production Scale

The API pricing page is not your cost. That number -- $15 per million output tokens, $3 per million input tokens, whatever it is this week -- is the floor.

Jun 18, 2026

llm-agent-blast-radius

I Shipped an Agentic Feature Before It Was Ready. A User's Staging Environment Paid for That.

I want to be specific about what happened, because the generic version of this story -- "AI agent did unexpected things" -- gets told without enough detail to be useful.

Jun 17, 2026

when-to-fine-tune-vs-rag

My Decision Tree: RAG or Fine-Tune?

Every engagement starts the same way. The client has a use case -- a domain-specific assistant, a document QA system, a support bot that needs to sound like the company -- and someone on their team has already formed an opinion.

Jun 16, 2026

self-hosted-llm-cost-comparison

The Calculation I Run Before Sending a Request to OpenAI

I did not set out to reduce my OpenAI usage. I set out to understand what I was actually paying for, and the answer surprised me enough that three workload categories are now running on something else.

Jun 15, 2026

gpu-node-pool-kubernetes

How I Actually Set Up Kubernetes for ML Inference in Production

Every tutorial for deploying ML models on Kubernetes follows the same path: create a Deployment, set up a Service, maybe wire in an HPA on CPU utilization, call it done. That path is fine for getting something running in an afternoon.

Jun 14, 2026

llm-evaluation-pipeline

I Built an LLM Eval Harness From Scratch. Here Is What That Cost Me and What I'd Do Differently.

The honest version of "why I built my own eval harness instead of using an off-the-shelf tool" is not ideological. I did not build it because I think NIH is virtuous or because I distrust existing tools.

Jun 13, 2026

long-context-vs-rag

Bigger Context Windows Are Actively Harmful for Some Production Workloads

Every few months a model provider announces a larger context window as if it were a straightforward quality improvement. 200k tokens. 1M tokens. And yes, for some workloads those numbers matter.

Jun 12, 2026

agent-orchestration-layer-design

The Architecture Patterns I Use for Every Agentic System I Build

At 2am on a Tuesday, one of my early agent systems was stuck in a loop -- tool call, failed parse, retry, failed parse, retry -- and the retries were not bounded. It had been running for forty minutes.

Jun 11, 2026

mlops-platforms-worth-it

MLOps Is a Sales Category. Good ML DevOps Is Just Engineering Discipline.

The term "MLOps" started as a reasonable shorthand for "operational practices applied to machine learning systems." Somewhere between 2021 and now it became a vendor category, a conference track, a job title prefix, and a bucket of platform marketing.

Jun 10, 2026

rag-retrieval-quality-production

How a RAG System I Built Was Hallucinating on Every Third Query -- and What I Missed

I got a Slack message at 3:14am on a Tuesday. The client -- a B2B SaaS company, anonymized here -- had deployed a RAG-powered internal knowledge assistant I had built for them about six weeks earlier.

Jun 09, 2026

ragas-metrics-production

The RAG Evaluation Setup I Actually Use in 2026

The standard advice on RAG evaluation is to "use RAGAS and check your metrics." That advice is not wrong.

Jun 08, 2026

retrieval-vs-llm

Most "AI Features" in B2B SaaS Are Just Search With a Better Mouth

The demo looks like this: user types a question in natural language, a friendly response appears that cites specific records, the founder calls it "AI-powered." Investors nod. Engineers ship it.

Jun 07, 2026

qdrant-vs-weaviate-vs-pgvector

Vector Database Benchmarks I Actually Ran: Qdrant, Weaviate, and pgvector at 1M and 10M Vectors

The vector database benchmark posts I keep finding online share one characteristic: they were run by the vendors, on hardware the vendors control, against query distributions that favor their product. I don't find them useful.

Jun 06, 2026

building-a-developer-tool-startup

What the First 90 Days of Building Pipeshift Actually Looked Like

*Full disclosure: I am the founder of Pipeshift. Everything I write about Pipeshift is written from that position.*

Jun 05, 2026

llm-agent-reliability

The Five Agentic AI Failure Modes I Keep Seeing in Production

Every team I review thinks they built a different system. The product names differ, the domains differ, the models differ. The failure modes are nearly identical.

Jun 04, 2026

jenkins-ml-pipelines

The ML Pipeline CI/CD Setup I Actually Use -- and the Failure That Forced Me to Build It Right

*Full disclosure: I'm building [Pipeshift](https://mohakdeepsingh.dev/products), a tool for managing ML pipeline deployments. The architecture I describe here is the direct predecessor to what Pipeshift automates.*

Jun 03, 2026

rag-chunk-size-production

Section-Level vs Paragraph Chunking: I Benchmarked Both on 15k Technical Docs

The answer to "what chunk size should I use?" is always "it depends," and I've always found that answer useless. It depends on *what*, specifically, is the question. I ran a benchmark to find out.

Jun 02, 2026

langchain-alternatives

Why I Stopped Using LangChain

There was a specific moment. I was three hours into debugging a production retrieval failure, staring at a traceback that ran through six layers of LangChain internals before surfacing anything I could act on.

Jun 01, 2026

kubernetes, go, devops, oci, aws, infrastructure

Why I Wrote a Go CLI Instead of Bash for Kubernetes Provisioning (And What Idempotency Actually Required)

The Kubernetes provisioning runbook was a bash script that had grown to several hundred lines with conditional logic for three environments across two clouds. That is the point where you rewrite it in Go. Here is what that decision cost and what it gave back.

May 01, 2026

ai-governance, enterprise, security, rbac, mcp, sso

Enterprise AI Tooling Governance: RBAC, MCP Configs, and SSO for GitHub Copilot, Claude Code, and Cursor

Most enterprise AI tooling rollouts are underprepared for the governance questions. Here is the framework I use: RBAC tiers, MCP server configurations, Keycloak SSO integration, and LLM acceptance criteria for engineering workflows.

Apr 30, 2026

terraform, oci, infrastructure, devops, iac

A Week of terraform import: Recovering a Drifted OCI State File

The engagement did not start with building the OCI landing zone. It started with figuring out what actually existed in the tenancy. A week of terraform import, in dependency order, and what that methodology looked like.

Apr 29, 2026

rag, ai, llm, vector-search, oracle, python

Fixed-Size Chunking Breaks Structured Documents. Here Is What I Use Instead.

Fixed-size chunking is the default for RAG tutorials and the wrong choice for hierarchically structured documents. The section-level strategy I built for a production Oracle HLD generator -- and where it still falls short.

Apr 28, 2026

agentic-ai, architecture, security, ai, cloud, llm

The Well-Architected Framework for Agentic AI: A Practical Guide

Operational excellence, security, reliability, performance, cost, sustainability — applied to AI agents that actually survive production. The rubric most teams skip.

Apr 27, 2026

ci-cd, jenkins, github-actions, devops, pipeshift

How We Migrated from Jenkins to GitHub Actions (And What We Learned Building Pipeshift)

A practical account of migrating production CI/CD from Jenkins to GitHub Actions — the decisions, the tradeoffs, and the patterns that didn't survive the translation.

Apr 26, 2026

kubernetes, oci, aws, devops, infrastructure

OCI vs AWS for Kubernetes: An Honest Comparison After Running Both

After running production Kubernetes on both OKE and EKS, here's where OCI wins, where AWS wins, and where the answer is genuinely 'it depends' for real reasons.

Apr 26, 2026

terraform, devops, multi-cloud, oci, aws, azure

Terraform State Strategy for Multi-Cloud Teams

Remote state, environment isolation, and the guardrails that prevent your multi-cloud IaC from becoming a liability. What I've learned running OCI, AWS, and Azure in parallel.

Apr 21, 2026

langgraph, ai, oci, architecture, python, llm

LangGraph in Production on OCI: Patterns That Hold Up

What actually breaks when you take a LangGraph agent pipeline off your laptop and run it on OCI Functions — and the patterns that survive contact with production.

Apr 20, 2026

kubernetes, cost, devops, reliability, oci, aws

Kubernetes Cost Optimization Playbook

The actual levers for reducing Kubernetes spend without regressing reliability — with numbers from real clusters on OKE, EKS, and AKS.

Apr 19, 2026