OCI

Agentic AI Billing & CDR Reconciliation

Designed and delivered a 5-agent LangGraph pipeline for automated billing and CDR reconciliation, hosted on OCI Functions as an event-triggered serverless architecture — eliminating idle compute costs entirely.

OCI, AI/ML, LangGraph, Serverless

The Problem

Manual CDR reconciliation was error-prone, time-intensive, and required always-on VM compute — creating significant operational and cost overhead.

The Solution

Built a LangGraph 5-agent pipeline with specialised agents for ingestion, parsing, validation, reconciliation, and reporting. Hosted on OCI Functions with event-triggered invocation — zero idle compute.

Results

35% reduction in AI hosting costs vs always-on VMs
5 specialised agents in the pipeline
100% serverless, zero idle compute

Architecture

[Figure: Agentic AI Billing & CDR Reconciliation architecture diagram]

Deep Dive

Billing state for usage-based AI products is non-trivial. A single user session can span multiple LLM calls across different models, each with distinct pricing tiers, potentially retried on transient failures, with token counts that need to be metered at the call level rather than the session level. The existing billing process was a manual reconciliation run at end-of-month: an analyst would export CDR (Call Detail Record) data, cross-reference it against the usage logs, and produce invoices. The error rate on that process was creating billing disputes on roughly 8% of invoices.
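Call-level metering can be illustrated with a minimal sketch. The record fields, model names, and prices below are hypothetical, and the rule that failed attempts are not billed is an assumption for illustration, not the production billing policy.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical prices per 1,000 tokens; real tiers would come from a pricing catalogue.
PRICE_PER_1K = {"model-small": Decimal("0.50"), "model-large": Decimal("3.00")}


@dataclass(frozen=True)
class LLMCall:
    session_id: str
    call_id: str     # stable across retries of the same logical call
    attempt: int     # 1 for the first attempt, 2+ for retries
    model: str
    tokens: int
    succeeded: bool


def bill_session_calls(calls):
    """Return {session_id: total}, billing each logical call once at its own model's price."""
    billable = {}
    for call in calls:
        if not call.succeeded:
            continue  # assumption: failed attempts are not billed
        key = (call.session_id, call.call_id)
        # Keep a single successful attempt per logical call (the earliest one).
        if key not in billable or call.attempt < billable[key].attempt:
            billable[key] = call

    totals = defaultdict(Decimal)
    for call in billable.values():
        totals[call.session_id] += PRICE_PER_1K[call.model] * call.tokens / 1000
    return dict(totals)
```

Metering per call rather than per session is what lets a retried call be counted once and a mixed-model session be priced correctly.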

The core architectural question was whether to build this as a procedural billing script or model it as a state machine. I chose LangGraph for the same reason I prefer explicit state machines over long procedural scripts in any domain where steps have dependencies and failures need to be recoverable at the step level rather than at the whole-process level. If the anomaly detection step fails, I want to retry it in isolation — not rerun the entire ingestion and parsing pipeline.
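The step-level recovery argument can be sketched in plain Python. This is not LangGraph's API: the runner, the checkpoint dictionary, and the two toy nodes are hypothetical stand-ins that only show why a failed step can be retried without repeating the steps before it.

```python
def run_pipeline(nodes, state, checkpoint):
    """Run nodes in order, skipping any node already recorded in the checkpoint."""
    done = checkpoint.setdefault("completed", [])
    state = checkpoint.get("state", state)
    for name, fn in nodes:
        if name in done:
            continue                  # finished in an earlier attempt
        state = fn(state)             # may raise; earlier work stays checkpointed
        done.append(name)
        checkpoint["state"] = state
    return state


def run_with_retry(nodes, state, checkpoint, max_attempts=3):
    """Retry the pipeline; each attempt resumes at the node that failed."""
    for _ in range(max_attempts):
        try:
            return run_pipeline(nodes, state, checkpoint)
        except RuntimeError:
            continue
    raise RuntimeError("pipeline did not complete")


# Toy nodes: the second one fails once to simulate a transient error.
calls = {"ingest": 0, "detect": 0}


def ingest(state):
    calls["ingest"] += 1
    return {**state, "records": [1, 2, 3]}


def detect(state):
    calls["detect"] += 1
    if calls["detect"] == 1:
        raise RuntimeError("transient failure")
    return {**state, "flagged": [r for r in state["records"] if r > 2]}


NODES = [("ingest", ingest), ("detect", detect)]
```

Running `run_with_retry(NODES, {}, {})` calls `ingest` once and `detect` twice, which is the behaviour a long procedural script cannot give without extra bookkeeping.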

The pipeline runs five specialised nodes in sequence with conditional branching. The usage_collector node ingests CDR data from object storage and normalises field names across the different source formats. The anomaly_detector node compares each record's token consumption against historical usage patterns for that customer using Oracle DB 23ai vector search — a customer's usage profile is stored as an embedding, and records that fall outside a similarity threshold are flagged. The billing_calculator node applies tier pricing and generates per-call billing records. The reconciliation_node matches billing records against the usage log and surfaces discrepancies. The invoice_emitter node generates structured invoice output and triggers downstream notifications.
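The flagging rule in the anomaly_detector node can be sketched as a similarity threshold. In the pipeline the comparison runs as a vector search in Oracle DB 23ai; the in-memory cosine similarity, the feature vectors, the 0.9 threshold, and the choice to flag customers with no stored profile are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_anomalies(records, profiles, threshold=0.9):
    """Return IDs of records whose usage vector is too dissimilar to the customer's profile."""
    flagged = []
    for rec in records:
        profile = profiles.get(rec["customer_id"])
        # Assumption: a customer with no stored profile is flagged rather than trusted.
        if profile is None or cosine_similarity(rec["features"], profile) < threshold:
            flagged.append(rec["record_id"])
    return flagged
```

The threshold is the tuning knob: raising it flags more records for review, lowering it lets more unusual usage through to billing.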

Hosting on OCI Functions was the right fit for this workload. Reconciliation runs are event-triggered: a new CDR file arrives in object storage, the OCI Events rule fires, and the function executes. The workload is bursty rather than sustained. The always-on VM alternative cost a fixed monthly amount regardless of how many invoices were processed. At current invocation volume, the serverless cost is approximately 35% lower. Because the VM cost was fixed while the serverless cost scales with invocations, that margin depends on volume: it is largest when the function would otherwise sit idle for most of the month and narrows as invocation volume grows.
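The trigger path can be sketched as a small parser for the event payload. The field names follow the Object Storage "create object" event shape as I understand it from the OCI Events documentation; the function name and the returned dictionary are hypothetical, and the real handler may extract more than this.

```python
import json

# Event type emitted by OCI Object Storage when an object is created.
CREATE_OBJECT = "com.oraclecloud.objectstorage.createobject"


def cdr_object_from_event(raw: bytes):
    """Return the location of the new object, or None for unrelated events."""
    event = json.loads(raw)
    if event.get("eventType") != CREATE_OBJECT:
        return None
    data = event["data"]
    details = data["additionalDetails"]
    return {
        "namespace": details["namespace"],
        "bucket": details["bucketName"],
        "object": data["resourceName"],
    }

# Inside an OCI Function (Python FDK) this would typically be called from the
# entry point as cdr_object_from_event(data.getvalue()), where `data` is the
# io.BytesIO request body passed to handler(ctx, data).
```

Returning `None` for other event types keeps the function safe if the Events rule is later broadened to cover updates or deletes.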

The billing dispute rate dropped from 8% to under 1% after deployment. The disputes that remain are in the category of genuinely ambiguous cases — records where the source system logged an error and the billing policy for failed calls is unclear. Those now surface as flagged records for human review rather than passing through and generating disputed invoices.
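The routing of ambiguous cases can be sketched as a three-way split during reconciliation. The record fields (`call_id`, `tokens`, `status`) and the discrepancy labels are hypothetical; the point is only that a source-side error sends a record to human review instead of onto an invoice.

```python
def route_records(billing_records, usage_log):
    """Split billing records into matched, discrepancies, and needs_review."""
    usage_by_id = {u["call_id"]: u for u in usage_log}
    matched, discrepancies, needs_review = [], [], []
    for rec in billing_records:
        usage = usage_by_id.get(rec["call_id"])
        if usage is None:
            discrepancies.append((rec["call_id"], "missing from usage log"))
        elif usage.get("status") == "error":
            # Billing policy for failed calls is unclear, so a human decides.
            needs_review.append(rec["call_id"])
        elif usage["tokens"] != rec["tokens"]:
            discrepancies.append((rec["call_id"], "token count mismatch"))
        else:
            matched.append(rec["call_id"])
    return matched, discrepancies, needs_review
```

Only the `matched` list would flow on to invoicing in this sketch; the other two lists are held back.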

What I would instrument differently: the current system logs at the node level but does not capture the full execution trace per thread. When a disputed invoice needs investigation, tracing a specific CDR record through the pipeline requires correlating logs from five separate function invocations. A structured trace with a consistent thread ID propagated through all nodes from the start would make that investigation much faster. I've added this to the backlog for the next iteration.
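A structured trace of this kind could look like the sketch below. It is a backlog idea rather than existing code: the entry fields, the in-memory `sink` list standing in for a log stream, and the helper names are all hypothetical.

```python
import json
import time
import uuid


def new_thread_id():
    """Generate one ID per pipeline run, to be passed through every node."""
    return uuid.uuid4().hex


def trace_event(thread_id, node, record_id, status, sink):
    """Append one structured log line; every node emits the same fields."""
    entry = {
        "ts": time.time(),
        "thread_id": thread_id,
        "node": node,
        "record_id": record_id,
        "status": status,
    }
    sink.append(json.dumps(entry))
    return entry


def trace_for_record(sink, record_id):
    """Reconstruct the path of one CDR record through the pipeline."""
    entries = [json.loads(line) for line in sink]
    return [
        (e["thread_id"], e["node"], e["status"])
        for e in entries
        if e["record_id"] == record_id
    ]
```

With a shared `thread_id` in every line, investigating a disputed invoice becomes a single filter over the logs instead of a manual correlation across five invocations.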

Tech Stack

Python, LangGraph, OCI Functions, Oracle DB 23ai