A Week of terraform import: Recovering a Drifted OCI State File

The plan was to build an enterprise OCI landing zone: multi-tier compartments, Dynamic Routing Gateways, FastConnect, hybrid connectivity to the client's Azure and AWS environments. The engagement kicked off. Week one was not what I expected.

The client's internal team had started a Terraform implementation several months earlier. It got abandoned about halfway through -- not deleted, just deprioritized when the team moved to other work. The OCI tenancy had real infrastructure: VCNs, subnets, security lists, a DRG already configured for FastConnect. Some of it was tracked in the Terraform state file. Some was not. Some was in the state file but had been manually modified after Terraform created it.

Running terraform plan on that state produced a list of changes that would have destroyed real infrastructure and recreated it from scratch. I spent that first week not building anything. I spent it doing terraform import.

Why drift happens and why the fix is harder than it looks

Terraform state drift is common in enterprise environments. The typical sequence: initial IaC implementation, some manual changes made "just for now," runbook updates that lag behind reality, team changeover. By the time the IaC picks up again, the state file and the real infrastructure have diverged in both directions.

The naive fix is terraform state rm plus reimport. The problem: resources have dependencies. You cannot remove a VCN from state and reimport it without also managing the subnets that reference it, the route tables attached to those subnets, and the security lists associated with them. Remove and reimport in the wrong order and terraform plan starts throwing dependency errors instead of making progress.

The correct fix is systematic: enumerate everything that exists in the cloud account, enumerate what the state file thinks exists, identify the gaps in both directions, and import in dependency order. The last part is what takes the time.

Step 1: Inventory the actual tenancy

Before touching Terraform, I inventoried what OCI actually had using the OCI CLI and Console. The goal was a complete list of resources by compartment, by resource type, with their OCIDs -- Oracle Cloud Identifiers, OCI's stable resource IDs that Terraform uses as the primary import key.

# list all VCNs in a compartment
oci network vcn list \
  --compartment-id "$COMPARTMENT_OCID" \
  --all \
  --query 'data[*].{id:id, name:"display-name"}' \
  --output table

# list subnets with their parent VCN
oci network subnet list \
  --compartment-id "$COMPARTMENT_OCID" \
  --all \
  --query 'data[*].{id:id, name:"display-name", vcn:"vcn-id"}' \
  --output table

# list DRGs
oci network drg list \
  --compartment-id "$COMPARTMENT_OCID" \
  --all \
  --query 'data[*].{id:id, name:"display-name"}' \
  --output table

I ran this for every resource type the Terraform configuration referenced: VCNs, subnets, route tables, security lists, NSGs, DRGs, DRG attachments, gateways, IAM compartments, IAM policies, dynamic groups, compute instances, and block volumes. The output was a spreadsheet mapping resource type and display name to OCID.
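The per-type listing can be looped instead of typed out by hand. The following is a minimal sketch, not what was run on the engagement. It assumes each network type in the loop accepts the same `oci network <type> list --compartment-id ... --all` pattern as the commands above; some CLI versions may also want `--vcn-id` for certain types. The function name `inventory_network`, the type list, and the output directory are illustrative. Compute and IAM resources live under other command groups and would need their own loops.

```shell
# Hypothetical helper: write one JSON file of {id, name} pairs per network resource type.
# Assumption: every type below can be listed with --compartment-id alone.
#   $1 = compartment OCID
#   $2 = directory to write <type>.json files into
inventory_network() {
  local compartment_ocid="$1" out_dir="$2" rtype
  mkdir -p "$out_dir" || return 1
  for rtype in vcn subnet route-table security-list nsg drg; do
    # JSON output is easier to post-process into a spreadsheet than a table
    oci network "$rtype" list \
      --compartment-id "$compartment_ocid" \
      --all \
      --query 'data[*].{id:id, name:"display-name"}' \
      --output json > "$out_dir/$rtype.json" || {
        echo "listing $rtype failed" >&2
        return 1
      }
  done
}
```

Called as `inventory_network "$COMPARTMENT_OCID" ./inventory`, this leaves one file per resource type that can be merged into the spreadsheet.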

Step 2: Compare against the state file

terraform state list

This lists every resource Terraform currently tracks. Cross-referencing against the inventory spreadsheet produces three categories:

1. Resources in OCI and in state -- no action needed
2. Resources in OCI but not in state -- need to be imported
3. Resources in state but not in OCI -- stale state entries pointing at deleted resources
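The cross-reference itself is a set comparison, so it can be sketched with `sort` and `comm` rather than done by eye. The sketch assumes you already have two plain-text files with one OCID per line: one taken from the inventory, one extracted from state (for example from `terraform show -json` with a JSON tool, which is not shown here). The function name `classify_drift` and the output labels are illustrative.

```shell
# Hypothetical helper: sort OCIDs into the three categories.
#   $1 = file of OCIDs found in the tenancy (from the inventory)
#   $2 = file of OCIDs recorded in Terraform state
classify_drift() {
  local inventory_file="$1" state_file="$2" work_dir
  work_dir=$(mktemp -d) || return 1
  # comm needs both inputs sorted with the same collation
  LC_ALL=C sort -u "$inventory_file" > "$work_dir/inventory"
  LC_ALL=C sort -u "$state_file" > "$work_dir/state"
  # -12: lines present in both files
  LC_ALL=C comm -12 "$work_dir/inventory" "$work_dir/state" | sed 's/^/tracked /'
  # -23: lines only in the first file (in OCI, missing from state)
  LC_ALL=C comm -23 "$work_dir/inventory" "$work_dir/state" | sed 's/^/import /'
  # -13: lines only in the second file (in state, missing from OCI)
  LC_ALL=C comm -13 "$work_dir/inventory" "$work_dir/state" | sed 's/^/stale /'
  rm -rf "$work_dir"
}
```

Each output line is prefixed with `tracked`, `import`, or `stale`, so the result can be filtered with `grep` into a to-do list per category.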

Category 3 is the most immediately dangerous. Running terraform apply with stale state entries causes Terraform to reference deleted resource IDs in dependency calculations and fail with errors that do not point at the actual problem. Remove these first:

terraform state rm oci_core_subnet.some_deleted_subnet
terraform state rm oci_identity_policy.old_policy

Verify terraform state list no longer shows these before moving on.
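When there are more than a handful of stale entries, it helps to take a state backup and preview the removals before running them. The sketch below is one way to do that and is an addition, not the procedure from the engagement. It relies on `terraform state pull` and the `-dry-run` flag of `terraform state rm`; the function name `remove_stale_entries`, the input file format, and the `apply` switch are illustrative.

```shell
# Hypothetical helper: back up state, then preview or remove stale addresses.
#   $1 = file with one resource address per line (blank lines and # comments skipped)
#   $2 = path to write the state backup to
#   $3 = "apply" to really remove; anything else only previews
remove_stale_entries() {
  local stale_file="$1" backup_file="$2" mode="${3:-dry-run}" address
  # Keep a copy of the current state so a wrong removal can be undone
  terraform state pull > "$backup_file" || return 1
  while IFS= read -r address; do
    case "$address" in ''|'#'*) continue ;; esac
    if [ "$mode" = "apply" ]; then
      terraform state rm "$address" < /dev/null || return 1
    else
      # -dry-run prints what would be removed without touching state
      terraform state rm -dry-run "$address" < /dev/null || return 1
    fi
  done < "$stale_file"
}
```

Run it once without the third argument to review the output, then again with `apply` once the list looks right.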

Step 3: Import in dependency order

This is the step that cannot be scripted generically because the dependency order depends on your specific resource graph. For OCI networking, the order I used:

1. Compartments -- no dependencies, import these first
2. VCNs -- depend on compartments
3. Internet gateways, NAT gateways, service gateways -- depend on VCN
4. DRGs -- depend on compartments
5. Route tables -- depend on VCN, reference gateways by OCID
6. Security lists -- depend on VCN
7. NSGs -- depend on VCN
8. Subnets -- depend on VCN, reference route tables and security lists
9. DRG attachments -- depend on DRG and VCN

The import command for each resource type uses the OCI Terraform provider's import ID format, which for most networking resources is just the OCID:

# import the compartment
terraform import oci_identity_compartment.network_compartment ocid1.compartment.oc1..aaaa...

# verify before continuing
terraform plan -target=oci_identity_compartment.network_compartment
# expected: "No changes. Your infrastructure matches the configuration."

# import the VCN
terraform import oci_core_vcn.main_vcn ocid1.vcn.oc1.ap-mumbai-1.aaaa...

# verify
terraform plan -target=oci_core_vcn.main_vcn

The verify-after-each-import step is not optional. An import writes the resource's real attributes into state; it does not check them against the Terraform config. If the two disagree, the mismatch surfaces later, as errors during the next import in the dependency chain rather than at the import that caused the problem. Running plan -target after each import and confirming "No changes" before moving forward catches configuration divergence immediately.
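The import-then-verify loop can be wrapped so that it halts at the first resource whose plan is not clean. This is a sketch under assumptions, not the script used on the engagement: it reads a hand-written manifest of `<address> <ocid>` lines that you have already put in dependency order, and it uses the `-detailed-exitcode` flag of `terraform plan` (0 means no changes, 1 an error, 2 pending changes) instead of reading the output by eye. The function names and manifest format are illustrative.

```shell
# Hypothetical helper: import one resource, then require an empty targeted plan.
import_and_verify() {
  local address="$1" ocid="$2" rc=0
  terraform import -input=false "$address" "$ocid" < /dev/null || return 1
  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes pending
  terraform plan -input=false -detailed-exitcode -target="$address" < /dev/null || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "stop: plan for $address exited $rc; reconcile before continuing" >&2
    return "$rc"
  fi
}

# Hypothetical driver: manifest lines are "<address> <ocid>", already in dependency order.
import_in_order() {
  local manifest="$1" address ocid
  while read -r address ocid; do
    case "$address" in ''|'#'*) continue ;; esac
    # Halt at the first resource that does not come back clean
    import_and_verify "$address" "$ocid" || return 1
  done < "$manifest"
}
```

The ordering still has to come from you. The script only enforces that nothing further is imported while an earlier resource shows a diff.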

Step 4: Reconcile configuration divergence

Several resources in OCI had been manually modified after Terraform created them. These showed up in the post-import plan as changes Terraform wanted to apply: a security list rule added manually, a route table entry pointing to a gateway the original config did not include.

For each one, a decision: update the Terraform configuration to match reality and make the manual change authoritative, or let Terraform revert the manual change and restore IaC as source of truth. In most cases the manual changes were intentional. I updated the Terraform to match and added inline comments explaining what the non-standard element was and why it existed.

The exception: three NSG rules that had been added manually and were redundant with security list rules already in the config. I removed those from OCI manually and let the state reflect the Terraform config without them.

By end of week one, terraform plan showed zero changes. Every resource in the tenancy was represented accurately in the state file, and the configuration matched what was deployed.

The module structure I built on top of the clean state

With a clean state, the landing zone build proceeded. The module hierarchy I used:

landing-zone/
  main.tf               # root module, compartment structure
  backends.tf           # remote state config per module
  modules/
    networking/         # VCNs, subnets, DRG, gateways, route tables
    security/           # NSGs, WAF policies, network firewall rules
    compute/            # instance shapes, images, instance config
    iam/                # policies, groups, dynamic groups
    monitoring/         # OCI Monitoring, log groups, alarms

Each module has its own state file in OCI Object Storage. The reasoning: a change to an IAM policy should not require locking the same state file as a networking change. Module-level state separation allows concurrent Terraform runs by different team members without state lock contention.

The root module creates the compartment structure first. Child modules reference compartment OCIDs via remote state data sources rather than passing them as input variables. This keeps the dependency chain explicit in code without creating a single monolithic state file that everything competes for.

What I would do differently

The import week was avoidable, at least in the form it took. Not entirely -- I did not create the drift -- but the investigation would have been shorter with two things I did not use.

Terraformer (the open-source reverse-Terraform tool from GoogleCloudPlatform) generates Terraform resource blocks by reading a provider's API. OCI support covers the common networking and compute resource types. Running it at the start would have given me a draft configuration to diff against the existing one rather than building the inventory spreadsheet manually. On subsequent engagements I have used it; it cuts roughly 30-40% of the investigation time for the category of "we have infrastructure, we need to reconstruct the IaC."

Terraform 1.5 import blocks allow declarative imports inside configuration files. Instead of one-off CLI commands, imports become version-controlled code that can be reviewed in a PR and re-run deterministically:

import {
  to = oci_core_vcn.main_vcn
  id = "ocid1.vcn.oc1.ap-mumbai-1.aaaa..."
}

import {
  to = oci_identity_compartment.network_compartment
  id = "ocid1.compartment.oc1..aaaa..."
}

The toolchain on this engagement predated that feature; import blocks are part of Terraform itself and need version 1.5 or later. On any new engagement today I would use import blocks from the start. The CLI import approach has no audit trail beyond shell history; the declarative approach has a PR.
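If the inventory already exists as a list of addresses and OCIDs, the import blocks can be generated rather than written by hand. This is a minimal sketch: it assumes a manifest file of `<address> <ocid>` lines, and the function name `emit_import_blocks` is illustrative.

```shell
# Hypothetical helper: turn "<address> <ocid>" manifest lines into import blocks.
#   $1 = manifest file (blank lines and # comments skipped)
emit_import_blocks() {
  local manifest="$1" address ocid
  while read -r address ocid; do
    case "$address" in ''|'#'*) continue ;; esac
    printf 'import {\n  to = %s\n  id = "%s"\n}\n\n' "$address" "$ocid"
  done < "$manifest"
}
```

Redirecting the output to a file such as `imports.tf` and committing it gives the reviewable record that one-off CLI imports lack. A subsequent `terraform plan` then shows what each import would do before anything is written to state.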

The thing I would not change: the verify-after-each-import discipline. It adds time to what is already a slow process, but it catches configuration divergence before it compounds into errors that are hard to trace back to the actual cause.

The OCI landing zone work described here is covered in the OCI Landing Zone case study. If you are recovering a drifted Terraform state and want a second opinion on the approach before running plan, my calendar is open.