OCI

RAG-Based HLD Document Generation System

Architected a production RAG system on OCI Generative AI that automates High-Level Design (HLD) document generation, cutting generation time by 50-70% while controlling hallucinations through grounded prompting and an automated validation pass.

OCI · AI/ML · LangChain · Oracle DB 23ai

The Problem

Enterprise delivery teams were spending weeks manually authoring HLD documents for each client engagement, leading to inconsistency and delays in project kick-off.

The Solution

Built a RAG pipeline using OCI Gen AI (LLM inference), Oracle DB 23ai vector search for knowledge base retrieval, LangChain for orchestration, and a Next.js + FastAPI production stack for the interface.

Results

40% lower KB retrieval latency
50-70% reduction in HLD generation time
~0 hallucinations passing unflagged (grounded prompting plus validation)

Architecture

[Figure: RAG-Based HLD Document Generation System architecture diagram]

Deep Dive

Oracle architects were spending 20-40 hours per engagement producing High-Level Design documents from scratch. The process was repetitive: gather requirements, search Oracle's internal library for applicable architecture patterns, draft the structure, and validate it against Oracle-specific standards. The documents followed predictable structures across engagements, but the search-and-draft cycle was still manual every time.

What made this harder than a generic document automation problem: Oracle's HLD templates reference internal architecture patterns and Oracle Cloud service configurations that are not publicly documented. A RAG system grounded on public documentation would produce architecturally valid but Oracle-context-incorrect output. The retrieval corpus had to be indexed from Oracle's internal architecture library, which meant the chunking and indexing strategy had to preserve the semantic relationships between architecture patterns, not just surface keyword matches.

I built the retrieval pipeline on Oracle DB 23ai's native vector search capability rather than an external vector store. The reasoning: the project context data (client requirements, prior engagement summaries) was already in Oracle DB, and co-locating the vector index with the relational data simplified the retrieval join. For a query like 'retrieve relevant OCI network patterns for a financial services client with PCI-DSS requirements', the query hits the vector index for architecture patterns and a standard SQL join for client context — one round-trip to one database rather than two calls to two systems.
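A minimal, runnable sketch of the co-located retrieval idea follows. It uses Python's built-in sqlite3 as a stand-in for Oracle DB 23ai, with a Python UDF for cosine distance and JSON-encoded vectors; the table names, columns, and toy three-dimensional embeddings are all hypothetical. What it illustrates is that vector ranking and the relational join to client context happen in a single SQL statement. In Oracle DB 23ai the ORDER BY would instead use the built-in `VECTOR_DISTANCE(p.embedding, :qv, COSINE)` over a VECTOR column, with `FETCH FIRST :k ROWS ONLY` in place of `LIMIT`.

```python
import json
import math
import sqlite3

# Stand-in for Oracle DB 23ai: sqlite3 plus a Python cosine-distance UDF.
# In 23ai the equivalent would be VECTOR_DISTANCE(p.embedding, :qv, COSINE) on a VECTOR column.


def cosine_distance(a_json: str, b_json: str) -> float:
    """Return 1 - cosine similarity of two JSON-encoded vectors (assumes non-zero norms)."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with hypothetical client and pattern-chunk tables."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("cosine_distance", 2, cosine_distance)
    conn.executescript(
        """
        CREATE TABLE clients (client_id INTEGER PRIMARY KEY, industry TEXT, compliance TEXT);
        CREATE TABLE pattern_chunks (
            chunk_id TEXT PRIMARY KEY, compliance TEXT, body TEXT, embedding TEXT
        );
        """
    )
    conn.execute("INSERT INTO clients VALUES (1, 'financial services', 'PCI-DSS')")
    conn.executemany(
        "INSERT INTO pattern_chunks VALUES (?, ?, ?, ?)",
        [
            ("net-pci-01", "PCI-DSS", "Segmented network for cardholder data", json.dumps([0.9, 0.1, 0.0])),
            ("net-gen-01", "NONE", "General-purpose network layout", json.dumps([0.8, 0.2, 0.1])),
            ("db-pci-01", "PCI-DSS", "Database encryption and key management", json.dumps([0.1, 0.9, 0.2])),
        ],
    )
    return conn


# One statement: join each chunk to the client's context, then rank by vector distance.
RETRIEVE_SQL = """
    SELECT p.chunk_id, p.body, c.industry
    FROM pattern_chunks AS p
    JOIN clients AS c ON p.compliance = c.compliance
    WHERE c.client_id = ?
    ORDER BY cosine_distance(p.embedding, ?)
    LIMIT ?
"""


def retrieve_with_context(conn: sqlite3.Connection, client_id: int, query_vec: list[float], k: int = 2):
    """Return the top-k pattern chunks relevant to this client in one database round-trip."""
    return conn.execute(RETRIEVE_SQL, (client_id, json.dumps(query_vec), k)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for row in retrieve_with_context(demo, client_id=1, query_vec=[1.0, 0.0, 0.0]):
        print(row)
```

In this toy data the general-purpose `net-gen-01` chunk is the second-closest vector to the query, but the join on the client's compliance tag excludes it from the ranked results; with a separate vector store, that constraint would need a second lookup.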

Chunking was the least trivial decision. Oracle HLD templates are hierarchically structured: a section on network architecture depends on context from the security section, which in turn references the compute topology. Naive fixed-size chunking breaks these cross-section references. I chunked at the section level and stored the parent document ID and section sequence number alongside each embedding, so retrieval could pull in related sections by their proximity in the original document structure when needed.
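The section-level chunking and proximity lookup described above could look roughly like the sketch below. It assumes sections are delimited by markdown-style headings and that each chunk's parent document ID and sequence number are kept as metadata next to its embedding; `SectionChunk`, `chunk_by_section`, and `expand_with_neighbours` are hypothetical names, and the vector search that produces the initial hits is left out.

```python
import re
from dataclasses import dataclass

# Assumption: HLD sections start with markdown-style headings (#, ##, ###).
HEADING = re.compile(r"^#{1,3} (.+)$", re.MULTILINE)


@dataclass(frozen=True)
class SectionChunk:
    doc_id: str  # parent HLD document ID, stored next to the embedding
    seq: int     # section position in the original document
    title: str
    text: str


def chunk_by_section(doc_id: str, document: str) -> list[SectionChunk]:
    """Split a document into one chunk per section, keeping its position."""
    matches = list(HEADING.finditer(document))
    chunks = []
    for seq, m in enumerate(matches):
        end = matches[seq + 1].start() if seq + 1 < len(matches) else len(document)
        chunks.append(SectionChunk(doc_id, seq, m.group(1).strip(), document[m.end():end].strip()))
    return chunks


def expand_with_neighbours(hits: list[SectionChunk], all_chunks: list[SectionChunk], window: int = 1) -> list[SectionChunk]:
    """Add sections within `window` positions of each hit in the same parent document."""
    index = {(c.doc_id, c.seq): c for c in all_chunks}
    selected = set()
    for hit in hits:
        for offset in range(-window, window + 1):
            key = (hit.doc_id, hit.seq + offset)
            if key in index:
                selected.add(key)
    # Return in original document order so dependent sections read coherently.
    return [index[key] for key in sorted(selected)]
```

For example, if vector search returns only a network-architecture section, expanding with `window=1` also brings back the preceding security section it depends on.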

Hallucination control came from two mechanisms: strict source grounding (every generated HLD section had to cite a retrieved chunk as its basis) and a validation pass, using a separate LLM call, that checked the generated output against a rubric encoding Oracle's HLD standards. Sections that failed validation were flagged for human review rather than silently included. This is where the near-zero hallucination figure comes from: the system does not suppress hallucinations; it surfaces them for review instead of letting them pass.
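One way to wire the two mechanisms together is sketched below, under stated assumptions: the `[src:<id>]` citation marker format, the prompt wording, and the `ReviewQueue` structure are hypothetical, and the rubric validator is passed in as a callable so that a separate LLM call can be plugged in (a simple stub stands in for it when testing). A section is accepted only if it cites at least one chunk, cites nothing outside the retrieved set, and passes the rubric; otherwise it is routed to human review with the reasons attached.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical citation marker the prompt asks the model to emit, e.g. [src:net-pci-01].
CITATION = re.compile(r"\[src:([\w-]+)\]")

# Validator signature: draft -> (passed, note). In the described system this would be a
# separate LLM call scoring the draft against an HLD-standards rubric; here it is injected.
Validator = Callable[[str], tuple[bool, str]]


def build_grounded_prompt(section_title: str, chunks: dict[str, str]) -> str:
    """Assemble a prompt that restricts the model to retrieved chunks and demands citations."""
    sources = "\n".join(f"[src:{cid}] {text}" for cid, text in chunks.items())
    return (
        f"Write the '{section_title}' section of a High-Level Design document.\n"
        "Use ONLY the sources below and cite each claim with its [src:<id>] marker.\n"
        "If the sources do not cover something, say so instead of guessing.\n\n"
        f"Sources:\n{sources}"
    )


@dataclass
class ReviewQueue:
    accepted: dict[str, str] = field(default_factory=dict)
    needs_review: dict[str, list[str]] = field(default_factory=dict)


def gate_section(title: str, draft: str, retrieved_ids: set[str], validator: Validator, queue: ReviewQueue) -> None:
    """Accept a drafted section only if it is grounded and passes the rubric; otherwise flag it."""
    reasons = []
    cited = set(CITATION.findall(draft))
    if not cited:
        reasons.append("no citation")
    unknown = cited - retrieved_ids
    if unknown:
        reasons.append(f"cites unretrieved sources: {sorted(unknown)}")
    passed, note = validator(draft)
    if not passed:
        reasons.append(f"rubric: {note}")
    if reasons:
        queue.needs_review[title] = reasons  # surfaced to a human, never silently dropped
    else:
        queue.accepted[title] = draft
```

Checking that citations point only at retrieved chunk IDs catches the case where the model invents a plausible-looking source, which a "must cite something" rule alone would miss.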

The outcome: a typical HLD engagement dropped from 20-40 hours of manual drafting to 6-12 hours including review and customisation. The variability exists because complex engagements require more human judgment, particularly around novel architecture combinations not well-represented in the indexed corpus. What I would build differently: an explicit feedback loop where architect corrections to generated sections are captured and fed back into the knowledge base. The current system is static — the corpus improves only when someone manually updates it. A lightweight correction-capture mechanism would make it self-improving over time.
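A lightweight correction-capture mechanism of the kind described could be as simple as the sketch below. It is a hypothetical design, not part of the current system: it diffs each generated section against the architect's corrected version with `difflib`, ignores cosmetic edits below a change threshold (5% here, an arbitrary choice), and queues substantive corrections, keyed by the same document ID and section sequence number used for chunking, for re-embedding into the knowledge base.

```python
import difflib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Correction:
    doc_id: str
    section_seq: int
    generated: str
    corrected: str
    similarity: float  # difflib ratio: 1.0 means unchanged
    captured_at: str


class CorrectionLog:
    """Hypothetical store for architect edits awaiting re-indexing."""

    def __init__(self, min_change: float = 0.05):
        self.min_change = min_change  # edits changing less than this fraction are treated as cosmetic
        self.pending: list[Correction] = []

    def capture(self, doc_id: str, section_seq: int, generated: str, corrected: str) -> bool:
        """Record a correction if it is substantive; return whether it was queued."""
        similarity = difflib.SequenceMatcher(None, generated, corrected).ratio()
        if 1.0 - similarity < self.min_change:
            return False  # typo-level edit: not worth re-embedding
        self.pending.append(
            Correction(doc_id, section_seq, generated, corrected, similarity,
                       datetime.now(timezone.utc).isoformat())
        )
        return True

    def drain_for_reindex(self) -> list[Correction]:
        """Hand the queued corrections to the indexing job and clear the queue."""
        batch, self.pending = self.pending, []
        return batch
```

Keying corrections by document ID and section sequence means a re-indexing job could replace or supplement exactly the chunk that was corrected rather than re-embedding whole documents.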

Tech Stack

Python · LangChain · LangGraph · OCI Gen AI · Oracle DB 23ai · Next.js · FastAPI