Go-Based Kubernetes Cluster Automation CLI

Built a Go CLI that automates Kubernetes cluster provisioning and configuration across OCI and AWS, reducing standard cluster setup time from hours to minutes.

AWS, OCI, Kubernetes, Go

The Problem

Kubernetes cluster provisioning was repetitive, error-prone, and required significant manual intervention — slowing down team onboarding and environment setup.

The Solution

Developed a Go CLI that wraps the Kubernetes API, Terraform, and OCI/AWS SDKs into a single command-line workflow for cluster creation, configuration, and lifecycle management.
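One plausible way to wrap the Terraform step from Go is to shell out to the terraform binary non-interactively and read the detailed exit code of `terraform plan` to distinguish "no changes" from "changes pending". This is a sketch, not the project's code. The directory layout, the envs/<env>.tfvars naming, and the helper names (planArgs, interpretPlanExit, runPlan) are hypothetical. The flags themselves (-chdir, -input=false, -detailed-exitcode, -out, -var-file) are standard Terraform CLI options.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// PlanResult summarizes what `terraform plan -detailed-exitcode` reported.
type PlanResult int

const (
	NoChanges      PlanResult = iota // exit code 0: infrastructure already matches config
	ChangesPending                   // exit code 2: plan contains a non-empty diff
)

// planArgs builds a non-interactive plan invocation for one environment.
// The envs/<env>.tfvars layout is a hypothetical convention; both paths are
// resolved relative to workDir because of the global -chdir option.
func planArgs(workDir, env string) []string {
	return []string{
		"-chdir=" + workDir, // global option, must come before the subcommand
		"plan",
		"-input=false",
		"-detailed-exitcode",
		"-out=" + env + ".tfplan",
		"-var-file=envs/" + env + ".tfvars",
	}
}

// interpretPlanExit maps plan's detailed exit codes:
// 0 = no changes, 2 = changes present, anything else = error.
func interpretPlanExit(code int) (PlanResult, error) {
	switch code {
	case 0:
		return NoChanges, nil
	case 2:
		return ChangesPending, nil
	default:
		return NoChanges, fmt.Errorf("terraform plan failed with exit code %d", code)
	}
}

// runPlan executes terraform and classifies the outcome.
func runPlan(workDir, env string) (PlanResult, error) {
	cmd := exec.Command("terraform", planArgs(workDir, env)...)
	err := cmd.Run()
	if err == nil {
		return NoChanges, nil
	}
	// A non-zero exit surfaces as *exec.ExitError; anything else (for example
	// terraform missing from PATH) is a hard failure.
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return interpretPlanExit(exitErr.ExitCode())
	}
	return NoChanges, fmt.Errorf("running terraform: %w", err)
}
```

Exit code 0 means the live infrastructure already matches the configuration, so a wrapper like this can skip `terraform apply` on a re-run. An unexpected non-empty diff on a cluster nobody meant to change is an early signal of configuration drift.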

Results

Hours to minutes: cluster setup time reduction
100% reproducible: IaC-backed provisioning
Multi-cloud: OCI and AWS support

Architecture

[Figure: architecture diagram of the Go-based Kubernetes cluster automation CLI]

Deep Dive

The platform team on this engagement was managing Kubernetes clusters across three environments (dev, staging, production) on a mix of OCI Kubernetes Engine (OKE) and AWS EKS. Provisioning a new cluster was a multi-hour process involving manual Terraform runs, post-provisioning kubectl configuration steps, add-on installation in the correct sequence, and environment-specific variable injection. The process was documented in a runbook that was perpetually out of date, and new team members took days to become productive with it.

The specific failure mode that prompted this project: a staging cluster configuration had drifted from the runbook over several weeks of incremental changes. When a production incident required spinning up a temporary cluster to reproduce the issue, the team discovered the runbook no longer matched reality. The reproduction took several hours longer than it should have because each step required manual verification.

I built the automation as a Go CLI rather than a collection of shell scripts for one reason: the provisioning workflow had conditional logic (different add-on sets per cloud, environment-specific resource sizes, optional components that some clusters needed and others didn't) that becomes unmaintainable in bash once it grows past a few hundred lines. Go's type system and error handling made the conditional paths explicit and testable. The CLI wraps Terraform for infrastructure provisioning, the cloud-specific SDKs (OCI SDK, AWS SDK) for cluster configuration steps that Terraform doesn't cover cleanly, and the Kubernetes API for post-provisioning add-on installation.

The design principle that mattered most: every operation in the CLI is idempotent. Running the provision command on an existing cluster either completes without changes (if the cluster is already in the target state) or applies only the delta. This property is what makes the automation trustworthy — you can re-run it when something goes wrong without worrying about double-provisioning resources or corrupting state. Getting this right in Go required explicit state checks before each operation rather than assuming the operation would be safe to rerun.
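The check-before-act pattern described above can be sketched for a single add-on. The function first observes the current state, then installs, updates, or does nothing, so a second run with the same input is a no-op. AddOnClient, ensureAddOn, and the in-memory fakeClient are hypothetical names for illustration. A real implementation would back the interface with the Kubernetes API or the cloud SDKs.

```go
package main

import "fmt"

// AddOnClient is a hypothetical abstraction over whatever talks to the
// cluster; it exposes only what the state check needs.
type AddOnClient interface {
	InstalledVersion(name string) (version string, installed bool, err error)
	Install(name, version string) error
	Update(name, version string) error
}

// Action records what ensureAddOn decided, which makes re-runs auditable.
type Action string

const (
	NoOp      Action = "unchanged"
	Installed Action = "installed"
	Updated   Action = "updated"
)

// ensureAddOn inspects current state before acting, so repeated runs apply
// at most the delta between observed and desired state.
func ensureAddOn(c AddOnClient, name, want string) (Action, error) {
	have, ok, err := c.InstalledVersion(name)
	if err != nil {
		return NoOp, fmt.Errorf("checking %s: %w", name, err)
	}
	switch {
	case !ok: // absent: install it
		if err := c.Install(name, want); err != nil {
			return NoOp, fmt.Errorf("installing %s: %w", name, err)
		}
		return Installed, nil
	case have != want: // present but different: apply only the change
		if err := c.Update(name, want); err != nil {
			return NoOp, fmt.Errorf("updating %s %s -> %s: %w", name, have, want, err)
		}
		return Updated, nil
	default: // already in the target state
		return NoOp, nil
	}
}

// fakeClient is an in-memory stand-in used to demonstrate the behavior.
type fakeClient struct{ versions map[string]string }

func (f *fakeClient) InstalledVersion(name string) (string, bool, error) {
	v, ok := f.versions[name]
	return v, ok, nil
}

func (f *fakeClient) Install(name, version string) error {
	f.versions[name] = version
	return nil
}

func (f *fakeClient) Update(name, version string) error {
	f.versions[name] = version
	return nil
}
```

Against an empty fakeClient, the first call for a given name and version returns Installed. An identical second call returns NoOp, and a call with a new version returns Updated. Returning the decision instead of just an error lets the CLI report exactly what a re-run changed.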

Setup time for a standard environment dropped from 3-4 hours to 12-15 minutes. The edge cases (clusters requiring non-standard add-on configurations, first-time OCI tenancy setup) still require manual steps, but the common path is fully automated. The improvement that mattered more operationally: incident response time for "we need a new cluster now" scenarios dropped from hours to minutes, which directly affects how quickly the team can respond to production issues that require environment isolation.

The open question I haven't resolved: the CLI handles provisioning well but lifecycle management — specifically, handling Kubernetes version upgrades across a fleet of clusters — is still manual. An upgrade involves more state-dependent logic than provisioning (you're changing a running system, not creating from scratch) and the failure modes are more consequential. I've intentionally kept that outside the CLI scope until I have better test coverage of the upgrade paths across both clouds.

Tech Stack

Go, Kubernetes API, Terraform, OCI SDK, AWS SDK