Knowledge graph RAG
Learn how combining a knowledge graph with vector search lets LLMs answer multi-hop relational questions that pure embedding retrieval cannot handle.
TL;DR
- Knowledge graph RAG augments vector search with a structured graph of entities and relationships, enabling multi-hop relational reasoning.
- Standard vector search retrieves "semantically similar" chunks but cannot traverse relationships: it cannot answer "who reports to the CTO?" or "which customers use products that have open critical bugs?"
- Microsoft's GraphRAG (2024) showed a 40% improvement over naive RAG on "sensemaking" queries, the class of questions that require understanding the full corpus rather than one or two passages.
- Graph construction is expensive: every chunk needs an LLM extraction call on top of its embedding call, so expect 5-15x more API spend at ingestion time than standard chunk-and-embed ingestion.
- The key limitation is graph quality: entity extraction achieves roughly 85-90% F1 with GPT-4, meaning errors baked in at ingestion compound at query time and are difficult to correct without re-ingesting.
The Problem It Solves
Your product team asks: "What are all the open critical bugs for products that CustomerX has an active subscription to?" You have thousands of documents covering product releases, bug reports, and subscription records. You spin up a standard RAG pipeline, embed everything, and ask the question. The model returns a confident-sounding answer about bugs in general, maybe mentioning one product by name. It is wrong because it pulled fragments that mentioned "CustomerX" and fragments that mentioned "critical bugs" and tried to reason across them. It had no way to know those two topics are connected through a structured relationship.
That is not a prompt engineering problem. It is a retrieval architecture problem. Vector search finds chunks that are semantically similar to the query, not chunks that participate in a specific relationship chain. Similarity is not traversal.
The same problem appears in every domain with relational data: org charts (who reports to the VP of Engineering?), software codebases (what services call the authentication module?), research literature (what drugs interact with treatment X for patients who also have condition Y?), and supply chains (which suppliers are two hops from the disrupted port?). In every case, the question is really a graph query dressed up in natural language.
What Is It?
Knowledge graph RAG augments standard vector retrieval with a structured graph of named entities and their typed relationships. At query time, it combines semantic similarity search with graph traversal to produce a richer, relationally grounded context window for the LLM.
Think of a library with two systems. One system is a search index: you describe what you're looking for and it finds the documents that best match your description. The other system is a card catalogue cross-referenced by author, subject, and co-citation: you can start from one card and follow explicit links to every connected card. Knowledge graph RAG uses both. The search index answers "is this relevant?" The card catalogue answers "how does this connect to that?"
How It Works
Step 1: Building the Knowledge Graph at Ingestion
The first challenge is construction. You cannot hand-craft a graph for a large document corpus. Instead, you use a pipeline that feeds each chunk to an LLM extractor and asks it to return a structured list of entities and relationships found in that text.
# Prompt template. Fill {chunk} with str.replace rather than str.format:
# the literal braces in the schema lines would make format() raise a KeyError.
EXTRACT_PROMPT = """
Extract entities and relationships from this text.
Return valid JSON with:
  entities: [{name, type, description}]
  relationships: [{source, relation_type, target, description}]
Entity types to extract: Person, Organization, Product, Event, Concept.
Relation types: REPORTS_TO, OWNS, USES, HAS_BUG, AUTHORED, ACQUIRED, DEPENDS_ON.
Text:
{chunk}
"""
Each entity extraction call produces nodes and edges. Multiple chunks mentioning the same entity must be deduplicated and merged. This is the step where entity resolution matters: "Alice Smith", "A. Smith", and "Smith (Engineering VP)" may all refer to the same person. Without a deduplication pass, the graph fragments.
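To make the merge step concrete, here is a minimal sketch of turning per-chunk extractor output into one deduplicated graph. It assumes the extractor returns JSON in the shape the prompt asks for. The `normalize_name` helper, the `aliases` table, and the sample data are hypothetical; a real pipeline would likely build the alias mapping with fuzzy matching or an LLM pass rather than by hand.

```python
import json
import re


def normalize_name(name: str) -> str:
    """Crude canonical key: lowercase, strip punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def merge_extractions(raw_outputs, aliases):
    """Merge per-chunk extraction JSON into one node dict and one edge set.

    aliases maps a normalized surface form to its canonical key
    (hypothetical; stands in for a real entity-resolution step).
    """
    nodes = {}
    edges = set()

    def canonical(name):
        key = normalize_name(name)
        return aliases.get(key, key)

    for raw in raw_outputs:
        data = json.loads(raw)
        for ent in data.get("entities", []):
            key = canonical(ent["name"])
            node = nodes.setdefault(
                key, {"type": ent["type"], "names": set(), "descriptions": []}
            )
            node["names"].add(ent["name"])
            node["descriptions"].append(ent.get("description", ""))
        for rel in data.get("relationships", []):
            # A set drops duplicate edges extracted from different chunks.
            edges.add(
                (canonical(rel["source"]), rel["relation_type"], canonical(rel["target"]))
            )
    return nodes, edges


# Two hypothetical extractor outputs that mention the same person differently.
chunk_outputs = [
    json.dumps({
        "entities": [
            {"name": "Alice Smith", "type": "Person", "description": "Engineering VP"},
            {"name": "Bob Lee", "type": "Person", "description": "CTO"},
        ],
        "relationships": [
            {"source": "Alice Smith", "relation_type": "REPORTS_TO", "target": "Bob Lee"}
        ],
    }),
    json.dumps({
        "entities": [
            {"name": "A. Smith", "type": "Person", "description": "Owner of auth"},
            {"name": "Auth Service", "type": "Product", "description": "Login backend"},
        ],
        "relationships": [
            {"source": "A. Smith", "relation_type": "OWNS", "target": "Auth Service"}
        ],
    }),
]
aliases = {"a smith": "alice smith"}

nodes, edges = merge_extractions(chunk_outputs, aliases)
```

Without the alias entry, "A. Smith" would become its own node and the OWNS edge would hang off a different person than the REPORTS_TO edge, which is the fragmentation described above.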
After extraction, entity descriptions are themselves embedded so that later queries can find the right entity node via semantic similarity, not just exact name matching.
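As an illustration of why embedding the descriptions helps, the sketch below looks up the nearest entity node by cosine similarity. The three-dimensional vectors are made-up stand-ins for real embeddings, and `nearest_entities` is a hypothetical helper, not part of any library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_entities(query_vec, entity_vecs, k=2):
    """Return the k entity keys whose description vectors best match the query."""
    ranked = sorted(
        entity_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy vectors standing in for embedded entity descriptions (assumed values).
entity_vecs = {
    "alice smith": [0.9, 0.1, 0.0],
    "auth service": [0.1, 0.9, 0.2],
    "customerx": [0.0, 0.2, 0.9],
}
# Pretend this is the embedding of a query about "the login system".
query_vec = [0.2, 0.8, 0.1]

top = nearest_entities(query_vec, entity_vecs, k=1)
```

The query never uses the string "Auth Service", yet the lookup still lands on that node because the match happens in embedding space rather than on names.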
Step 2: The Ingestion Pipeline in Full
The cost is real. For a 500-page document split into 1,000 chunks, you make 1,000 LLM extraction calls, each reading one chunk. Compare that to standard RAG where you only make calls to embed the chunks (much cheaper). Budget 5-15x more API spend at ingestion time.
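The arithmetic behind that budget can be sketched as follows. The per-call prices are arbitrary units, not real provider prices; they are chosen only so the ratio lands inside the 5-15x range quoted above. Actual ratios depend on the models used and on chunk size.

```python
from dataclasses import dataclass


@dataclass
class IngestionEstimate:
    llm_calls: int
    embedding_calls: int
    cost: float


def estimate_ingestion(num_chunks, cost_per_embedding_call, cost_per_llm_call, use_graph):
    """Count calls and total cost for one ingestion run.

    Assumes one embedding call per chunk in both pipelines and one
    LLM extraction call per chunk when building the graph.
    """
    llm_calls = num_chunks if use_graph else 0
    embedding_calls = num_chunks
    cost = llm_calls * cost_per_llm_call + embedding_calls * cost_per_embedding_call
    return IngestionEstimate(llm_calls, embedding_calls, cost)


# 1,000 chunks, with placeholder unit prices (assumptions, not quotes).
standard = estimate_ingestion(1000, 1.0, 9.0, use_graph=False)
graph = estimate_ingestion(1000, 1.0, 9.0, use_graph=True)
ratio = graph.cost / standard.cost
```

Plugging in your own per-call costs gives a quick first estimate before committing to a full ingestion run.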
Step 3: Query Time Retrieval
When a user query arrives, two parallel retrieval operations run:
- Vector search: embed the query and find the top-K most similar chunks. This handles questions that are about content ("what does the product documentation say about X?").
- Entity lookup and graph traversal: extract entities from the query (or use an LLM to translate the query to a graph query), look up their nodes in the graph, and traverse outbound/inbound edges to collect connected entities. This handles questions that are about relationships ("what products does CustomerX use?").
The two result sets are merged into a single context window. The LLM sees both the relevant text passages and the structured entity graph context.
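The traversal half of retrieval can be sketched with a plain in-memory edge list. This is a minimal illustration, assuming the graph is stored as (source, relation, target) triples and that bug attributes live in a side table; the entity names and the `follow` helper are hypothetical, and a production system would more likely run this as a query against a graph database.

```python
from collections import defaultdict

# Hypothetical triples using relation types from the extraction prompt.
edges = [
    ("CustomerX", "USES", "ProductA"),
    ("CustomerX", "USES", "ProductB"),
    ("CustomerY", "USES", "ProductC"),
    ("ProductA", "HAS_BUG", "BUG-1"),
    ("ProductB", "HAS_BUG", "BUG-2"),
    ("ProductC", "HAS_BUG", "BUG-3"),
]
bug_attrs = {
    "BUG-1": {"severity": "critical", "status": "open"},
    "BUG-2": {"severity": "minor", "status": "open"},
    "BUG-3": {"severity": "critical", "status": "open"},
}


def build_index(triples):
    """Index outbound edges by (source, relation) for constant-time hops."""
    index = defaultdict(list)
    for source, relation, target in triples:
        index[(source, relation)].append(target)
    return index


def follow(index, start_nodes, relation):
    """One hop: collect every target reachable from start_nodes via relation."""
    reached = []
    for node in start_nodes:
        reached.extend(index.get((node, relation), []))
    return reached


index = build_index(edges)
products = follow(index, ["CustomerX"], "USES")      # hop 1
bugs = follow(index, products, "HAS_BUG")            # hop 2
open_critical = [
    bug for bug in bugs
    if bug_attrs[bug]["severity"] == "critical" and bug_attrs[bug]["status"] == "open"
]
```

BUG-3 is critical and open but never appears in the result, because no USES edge connects CustomerX to ProductC. A similarity search over chunk text has no equivalent way to exclude it.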
The merge strategy matters. A naive approach concatenates everything. A better approach places graph facts as structured context at the top ("EntityX has the following relationships: ...") and positions the retrieved chunks below them. The LLM reads the graph context first, then uses the chunks to fill in prose detail.
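One way to lay out the merged context, graph facts first and passages second, is sketched below. The section labels and the `build_context` helper are illustrative assumptions; the exact wording of the headers is a prompt design choice rather than a fixed convention.

```python
def build_context(graph_facts, chunks):
    """Assemble a context string with graph facts above retrieved passages.

    graph_facts: list of (source, relation, target) triples.
    chunks: list of retrieved text passages, already ranked.
    """
    lines = ["Known relationships:"]
    lines += [f"- {source} {relation} {target}" for source, relation, target in graph_facts]
    lines.append("")
    lines.append("Retrieved passages:")
    for position, chunk in enumerate(chunks, start=1):
        lines.append(f"[{position}] {chunk}")
    return "\n".join(lines)


# Hypothetical retrieval results for a single query.
graph_facts = [
    ("CustomerX", "USES", "ProductA"),
    ("ProductA", "HAS_BUG", "BUG-1"),
]
chunks = [
    "BUG-1: login requests time out under load. Severity: critical.",
    "CustomerX renewed its ProductA subscription in March.",
]

context = build_context(graph_facts, chunks)
```

Numbering the passages also gives the model a handle for citing which chunk supports each statement in its answer.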
Step 4: Microsoft GraphRAG's Two-Level Architecture
Microsoft's 2024 GraphRAG paper introduced an important architectural refinement: community detection. Rather than just extracting individual entities and edges, GraphRAG runs a community detection algorithm (the Leiden algorithm) on the graph to identify clusters of tightly connected entities. It then pre-computes a summary for each community.
This enables two query modes:
- Local search: entity + neighbor context. Standard graph traversal. Good for specific questions about known entities.
- Global search: ranks community summaries by relevance to the query and uses them as context. This is the mode that handles "sensemaking" queries like "what are the main themes across all our customer feedback documents?" Pure vector search would need to retrieve nearly every document to answer this, which exceeds any context window. Community summaries give a compressed, pre-computed view.
The reported 40% improvement on global queries follows from this design. Without community summaries, those queries either retrieve irrelevant chunks or exceed the context window. With community summaries, the LLM sees a pre-computed, structured overview of the relevant domain.
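The selection step of global search can be sketched as ranking pre-computed summaries against the query. This sketch assumes community detection and summary generation have already run. It uses plain keyword overlap as a stand-in for whatever relevance scoring a real system applies, and the community names and summaries are invented for illustration.

```python
import re


def tokens(text):
    """Lowercased word set; a deliberately simple stand-in for real scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_summaries(query, summaries, k=2):
    """Return up to k community ids, ordered by keyword overlap with the query."""
    query_tokens = tokens(query)
    scored = [
        (len(query_tokens & tokens(text)), community_id)
        for community_id, text in summaries.items()
    ]
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [community_id for score, community_id in scored[:k] if score > 0]


# Hypothetical pre-computed community summaries.
summaries = {
    "billing": "Customers report confusing invoices and billing errors after plan upgrades.",
    "auth": "Feedback about login failures, password resets, and authentication timeouts.",
    "onboarding": "New customers find setup documentation incomplete.",
}
query = "What login or authentication problems do customers mention?"

selected = rank_summaries(query, summaries, k=2)
```

Only the selected summaries go into the context window, so the prompt size is set by `k` and summary length rather than by the size of the corpus.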
Step 5: Concrete End-to-End Example
Corpus: a company's internal Confluence wiki covering 3,000 pages of org charts, product documentation, bug trackers, and customer contracts.
Query: "Find all customers who are impacted by the authentication service outage that occurred in the Q3 incident review."
Without knowledge graph RAG:
- Vector search finds chunks mentioning "authentication service" and "outage."
- It cannot traverse to find which customers are impacted because that relationship lives across multiple documents.
- The model guesses or hallucinates a list.