Knowledge-Graph RAG for Explainable Lead and Account Recommendation

June 5, 2026 · 11 min read

Senior Software Engineer

If your first instinct on hearing "knowledge graph" is to reach for Neo4j, you may be over-engineering a lead recommendation system. The past year's research on combining knowledge graphs with retrieval-augmented generation (KG-RAG) for recommendations converges on a pragmatic insight: the most effective KG is often the one you already have. In this design, that is a normalized relational schema of companies, contacts, opportunities, and emails living in a Cloudflare D1 database. Traversing those foreign keys at query time, bounding the fan-out, and feeding the resulting subgraph to an LLM can produce recommendations that are grounded by construction. Any recommendation whose explanation path does not trace back to real rows in the operational store is discarded before anyone sees it.

This is a research-grounded design, not a system with measured production results. A 2026 survey covering 20 papers (IEEE KST, DOI:10.1109/kst67832.2026.11432099) concludes that combining KGs with LLMs improves recommendation quality, context, and explainability. The architecture described here is a single retrieve_kg node added to an existing agentic_rag graph. It requires no new storage services, no custom-trained embeddings, and no new secrets. Here is how the evidence stacks up, and why you might consider the same approach before reaching for a graph database.

Why Explainability Matters for Lead Scoring

Lead and account recommendations are high-stakes. A sales team that follows a black-box suggestion loses three things: trust, the ability to audit, and legal defensibility. In B2B, where a single missed opportunity can be costly, an unexplained recommendation is hard to act on confidently.

The encouraging part is that explainability does not have to come at the expense of grounding. The IEEE KST 2026 survey's central thesis is that KG + LLM improves quality, context, and explainability. That thesis is the core justification for the recommend mode. Surfacing explicit KG paths is what makes a recommendation auditable: every path is a concrete set of edges that can be inspected and, if necessary, removed.

The recommendation-fairness literature adds a caution: explanations can themselves encode bias. An explanation such as "because this lead is similar to your top customer" can reinforce historical skew. KG-RAG does not fix that automatically. Because the reasoning is visible, however, a skewed path can be spotted and challenged instead of staying hidden inside a score.

Architecture: A `retrieve_kg` Node That Treats Relations as Edges

The design adds a single node to an existing LangGraph-based agentic_rag graph. When mode="recommend", the graph routes through START -> retrieve_kg -> retrieve -> generate_answer -> END. The existing Qdrant hybrid search node (agentic_rag_companies collection, dense bge-small 384-dim plus BM25) is reused unchanged. The new node traverses the relational D1 schema using four edge types, each with a bounded fan-out:

Edge	From → To	Purpose	Default cap
`employs`	`companies.id` → `contacts.company_id`	Find people at the seed company	25
`has_opportunity`	`companies.id` → `opportunities.company_id`	Surface active deals	25
`contacted`	`contacts.id`/`companies.id` → `emails.contact_id`/`company_id`	Show recent outreach	25
`looks_like`	`companies.id` → peer `companies.id`	Discover similar accounts by vertical and score window	25
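The mode-based routing described above can be sketched without LangGraph as a plain async pipeline. This is a hypothetical stand-in, not LangGraph's API. `ROUTES`, `run_graph`, and the `"answer"` route are illustrative names; the `"answer"` route is assumed to be the pre-existing retrieve -> generate_answer chain. Each node returns a partial state update that is merged into the running state, which is roughly how LangGraph nodes behave.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], Awaitable[State]]

# Hypothetical node order per mode. "recommend" only prepends retrieve_kg;
# "answer" is assumed to be the existing retrieve -> generate_answer chain.
ROUTES: Dict[str, List[str]] = {
    "recommend": ["retrieve_kg", "retrieve", "generate_answer"],
    "answer": ["retrieve", "generate_answer"],
}


async def run_graph(mode: str, state: State, nodes: Dict[str, Node]) -> State:
    """Walk START -> ROUTES[mode] -> END, merging each node's partial update into state."""
    visited: List[str] = []
    for name in ROUTES.get(mode, ROUTES["answer"]):  # unknown modes fall back to "answer"
        update = await nodes[name](state)
        state = {**state, **update}
        visited.append(name)
    return {**state, "visited": visited}


if __name__ == "__main__":
    async def stub(state: State) -> State:
        return {}

    stubs = {name: stub for name in ROUTES["recommend"]}
    print(asyncio.run(run_graph("recommend", {"query": "fintech leads"}, stubs))["visited"])
```

Because the recommend route only adds a node in front of the existing chain, removing `retrieve_kg` from the route restores the original behavior unchanged.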

The seed entity can be a company_id, contact_id, or opportunity_id. Each one resolves to a company, and from there the traversal is always company-rooted. All SQL reads use parameterized queries. A chunked helper keeps every query under D1's limit of 100 bound parameters, and MAX_PARAMS=90 leaves headroom. The result is a subgraph of at most about 100 nodes: 1 company + up to 25 contacts + up to 25 opportunities + up to 25 emails + up to 25 look-alikes = 101. That is a focused, low-noise context. PathRAG (2026, AAAI, DOI:10.1609/aaai.v40i36.40268) found that graph-based RAG retrieval suffers from redundancy, not insufficiency. It proposed pruning the graph and organizing retrieval around relational paths to reduce noise. The bounded fan-out aims at the same effect without a trained pruning model.
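A chunked IN-list helper of the kind described above might look like the following. The name `chunked_in_queries` and its signature are hypothetical; only the MAX_PARAMS=90 budget comes from the design. Any other bound parameters in the same statement, such as a `LIMIT ?`, also count toward D1's limit. The chunk size therefore has to leave room for them.

```python
from typing import Iterator, List, Sequence, Tuple

MAX_PARAMS = 90  # headroom under D1's limit of 100 bound parameters per query


def chunked_in_queries(
    sql_prefix: str,
    ids: Sequence[int],
    extra_params: Sequence[object] = (),
    sql_suffix: str = "",
    max_params: int = MAX_PARAMS,
) -> Iterator[Tuple[str, List[object]]]:
    """Yield (sql, params) pairs whose total bound parameters never exceed max_params.

    Each statement is `{sql_prefix} IN (?, ..., ?){sql_suffix}`; extra_params are bound
    after the ids (for example a LIMIT ? in the suffix) and are charged to every chunk.
    """
    chunk_size = max_params - len(extra_params)
    if chunk_size < 1:
        raise ValueError("extra_params leave no room for ids")
    for start in range(0, len(ids), chunk_size):
        chunk = list(ids[start:start + chunk_size])
        placeholders = ", ".join("?" * len(chunk))  # "?, ?, ..., ?"
        yield f"{sql_prefix} IN ({placeholders}){sql_suffix}", chunk + list(extra_params)
```

Each chunk carries its own `LIMIT`, so the caller still has to merge the chunk results and re-apply the overall cap (for example, 25 emails) in Python.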

Sketch: `build_subgraph` in Python

```python
from typing import Literal, Optional

# d1_one, d1_all, and D1Database come from the surrounding codebase and are not shown:
#   d1_one(db, sql, params) -> first column of the first row, or None
#   d1_all(db, sql, params) -> list of row dicts

DEFAULT_CAPS = {"contact": 25, "opportunity": 25, "email": 25, "lookalike": 25}


async def build_subgraph(
    seed_id: int,
    seed_type: Literal["company", "contact", "opportunity"],
    db: "D1Database",
    caps: Optional[dict] = None,
) -> dict:
    caps = {**DEFAULT_CAPS, **(caps or {})}
    # Resolve the seed to a company_id; every traversal below is company-rooted.
    if seed_type == "contact":
        company_id = await d1_one(db, "SELECT company_id FROM contacts WHERE id = ?", [seed_id])
    elif seed_type == "opportunity":
        company_id = await d1_one(db, "SELECT company_id FROM opportunities WHERE id = ?", [seed_id])
    else:
        company_id = seed_id
    if company_id is None:
        # Unknown seed: return an empty subgraph so the caller can fail open.
        return {"nodes": [], "edges": []}
    name = await d1_one(db, "SELECT name FROM companies WHERE id = ?", [company_id])
    nodes = [{"type": "company", "id": company_id, "label": name}]
    edges = []
    # employs: company -> contacts, best fit first, capped at caps["contact"]
    contacts = await d1_all(
        db,
        "SELECT id, full_name, title FROM contacts WHERE company_id = ? "
        "ORDER BY vertical_fit_score DESC NULLS LAST LIMIT ?",
        [company_id, caps["contact"]],
    )
    for c in contacts:
        nodes.append({"type": "contact", "id": c["id"], "label": c["full_name"], "title": c["title"]})
        edges.append({"src": f"company:{company_id}", "rel": "employs", "dst": f"contact:{c['id']}"})
    # has_opportunity, contacted, and looks_like follow the same pattern (omitted for brevity)
    return {"nodes": nodes, "edges": edges}
```

This sketch minimizes PII rather than eliminating it. It reads only IDs and display labels (company names, contact names, titles). It never retrieves email bodies or other sensitive fields.

Path Reasoning: The LLM Verbalizes Only What It Sees

The core contribution of LLMEKERec (2026, ICASSP, DOI:10.1109/icassp55912.2026.11463420) is having the LLM produce multi-hop, path-based explanations from a KG. On MovieLens, Amazon-Book, and Yelp, it improved NDCG by up to 6.3% and substantially improved explanation quality over earlier methods. The authors worked in four steps:

- They used LLMs to extract fine-grained semantic features from user and item reviews.
- They fused those features with KG embeddings through heterogeneous encoding.
- They performed multi-hop path reasoning to capture high-order user–item relations.
- They had the LLM turn those paths into concise natural-language explanations.

This design borrows the path-to-explanation idea with one crucial change: grounding is enforced in code rather than trusted. It also deliberately drops the trained KG-embedding and heterogeneous-encoding half, which needs offline ML and is deferred.
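LLMEKERec learns its path reasoning. This design only needs the candidate paths that already exist in the retrieved subgraph, and a plain depth-first enumeration over the edge list is enough to list them for the prompt. This is a sketch, not LLMEKERec's method. `enumerate_paths` and the 2-hop default are assumptions, and edges are treated as directed away from the seed.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set

Edge = Dict[str, str]  # {"src": "company:42", "rel": "employs", "dst": "contact:5"}


def enumerate_paths(edges: List[Edge], seed: str, max_hops: int = 2) -> List[List[Edge]]:
    """Return every simple directed path that starts at `seed` and has 1..max_hops edges."""
    out_edges: DefaultDict[str, List[Edge]] = defaultdict(list)
    for e in edges:
        out_edges[e["src"]].append(e)
    paths: List[List[Edge]] = []

    def walk(node: str, path: List[Edge], seen: Set[str]) -> None:
        if len(path) == max_hops:
            return
        for e in out_edges[node]:
            if e["dst"] in seen:
                continue  # simple paths only: never revisit a node
            extended = path + [e]
            paths.append(extended)
            walk(e["dst"], extended, seen | {e["dst"]})

    walk(seed, [], {seed})
    return paths
```

Every returned path is made only of retrieved edges. Any path the model copies from this list therefore corresponds to rows that were actually read.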

The structured Recommendation model includes an explanation_path field: a list of EdgeRef triples ({src, rel, dst}). Every edge in that list must exist verbatim in the subgraph returned by build_subgraph. After generation, a pure validator drops any recommendation in either of two cases:

- its entity_id is not a retrieved node id, or
- its explanation_path cites an edge that is not in the subgraph.

Hallucinated recommendations are removed silently, and a fully hallucinated response reduces to []. None of them is ever surfaced.
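A minimal sketch of the validator follows. It assumes node ids are rendered as `type:id` strings so they match the edge endpoints produced by build_subgraph. It uses stdlib dataclasses in place of whatever schema library the project actually uses, and `validate_recommendations` is a hypothetical name. One deliberate assumption goes beyond the text: a recommendation with an empty explanation_path is treated as ungrounded and dropped.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class EdgeRef:
    src: str
    rel: str
    dst: str


@dataclass
class Recommendation:
    entity_id: str  # e.g. "contact:123"
    explanation_path: List[EdgeRef] = field(default_factory=list)


def validate_recommendations(recs: List[Recommendation], subgraph: Dict[str, Any]) -> List[Recommendation]:
    """Pure filter: keep a rec only if its entity and every cited edge were retrieved."""
    node_ids = {f"{n['type']}:{n['id']}" for n in subgraph["nodes"]}
    edge_set = {EdgeRef(e["src"], e["rel"], e["dst"]) for e in subgraph["edges"]}
    return [
        r
        for r in recs
        if r.entity_id in node_ids
        and r.explanation_path  # assumption: an empty path counts as ungrounded
        and all(ref in edge_set for ref in r.explanation_path)
    ]
```

The function performs no IO, so it can be unit-tested against a hand-built stub subgraph.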

This is the load-bearing Grounding-First mechanism. K-RagRec (2025, ACL Long, DOI:10.18653/v1/2025.acl-long.1317) shows that vanilla RAG introduces noise and neglects structural relationships in knowledge. Its framework retrieves high-quality, up-to-date structure information from a knowledge graph to augment recommendation generation, with experiments demonstrating its effectiveness. By requiring each explanation path to correspond to a real edge that was traversed, this design turns the LLM into a verbalizer of retrieved structure, not a free-form reasoner. The result is a recommendation that says "Recommend contact 123 because company 42 employs them and company 42 looks like company 7" — and every hop is checkable against the D1 tables.
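To make "every hop is checkable" concrete, each relation can map to one audit query against the operational tables. The sketch runs against SQLite, the engine D1 is built on, via Python's sqlite3 module. `AUDIT_SQL`, `edge_exists`, the `companies.vertical` and `companies.score` columns, and the default score window of 10 are all assumptions. The other column names come from the edge table earlier in this post. looks_like is derived rather than a foreign key, so its audit recomputes the heuristic.

```python
import sqlite3
from typing import Dict, Tuple

# Hypothetical audit query per (src_type, rel, dst_type); a hit means the hop is real.
AUDIT_SQL: Dict[Tuple[str, str, str], str] = {
    ("company", "employs", "contact"): "SELECT 1 FROM contacts WHERE id = :dst AND company_id = :src",
    ("company", "has_opportunity", "opportunity"): "SELECT 1 FROM opportunities WHERE id = :dst AND company_id = :src",
    ("contact", "contacted", "email"): "SELECT 1 FROM emails WHERE id = :dst AND contact_id = :src",
    ("company", "contacted", "email"): "SELECT 1 FROM emails WHERE id = :dst AND company_id = :src",
    ("company", "looks_like", "company"): (
        "SELECT 1 FROM companies AS a JOIN companies AS b ON a.vertical = b.vertical "
        "WHERE a.id = :src AND b.id = :dst AND a.id != b.id AND ABS(a.score - b.score) <= :window"
    ),
}


def edge_exists(conn: sqlite3.Connection, edge: Dict[str, str], window: float = 10.0) -> bool:
    """Re-check one explanation hop against the source tables; unknown relations fail closed."""
    src_type, _, src_id = edge["src"].partition(":")
    dst_type, _, dst_id = edge["dst"].partition(":")
    sql = AUDIT_SQL.get((src_type, edge["rel"], dst_type))
    if sql is None:
        return False
    # sqlite3 ignores dict keys that the statement does not reference (e.g. "window").
    params = {"src": int(src_id), "dst": int(dst_id), "window": window}
    return conn.execute(sql, params).fetchone() is not None
```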

Preference Hints for Unseen Items

One of the trickiest problems in lead recommendation is suggesting accounts you have never touched. The preference-hint paper (2026, arXiv, DOI:10.48550/arxiv.2601.18096) tackles the analogous unseen-item problem. Its collaborative extraction schema borrows hints from similar users' explicit interactions. An instance-wise dual-attention mechanism then scores attribute credibility to select item-specific hints. The paper reports an average relative improvement of more than 3.02% over baselines on pair-wise and list-wise tasks.

The looks_like edge is a simple analogue. It finds peer companies in the same vertical with a similar overall score, then recommends their contacts or opportunities that have not been seen yet. This is a heuristic, zero-ML approximation of the unseen-item pattern. It is a reasonable proxy because the live D1 store reflects real engagement.
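The looks_like heuristic and the "not seen yet" filter fit in two queries. This sketch again uses sqlite3 as a stand-in for D1. It makes four assumptions: hypothetical `companies.vertical` and `companies.score` columns, a score window of 10, and "unseen" meaning a contact with no row in emails. Peers are ordered by score proximity, so the 25-row cap keeps the closest matches.

```python
import sqlite3
from typing import List, Sequence

LOOKS_LIKE_SQL = """
SELECT peer.id
FROM companies AS seed
JOIN companies AS peer ON peer.vertical = seed.vertical AND peer.id != seed.id
WHERE seed.id = ? AND ABS(peer.score - seed.score) <= ?
ORDER BY ABS(peer.score - seed.score), peer.id
LIMIT ?
"""


def looks_like(conn: sqlite3.Connection, company_id: int, window: float = 10.0, cap: int = 25) -> List[int]:
    """Peer companies in the seed's vertical whose score is within `window`, closest first."""
    return [row[0] for row in conn.execute(LOOKS_LIKE_SQL, (company_id, window, cap))]


def unseen_contacts(conn: sqlite3.Connection, company_ids: Sequence[int], cap: int = 25) -> List[int]:
    """Contacts at the given companies with no email on record (the assumed 'unseen' proxy).

    Keep len(company_ids) within the bound-parameter budget; with 25 peers it is.
    """
    if not company_ids:
        return []
    placeholders = ", ".join("?" * len(company_ids))
    sql = (
        f"SELECT c.id FROM contacts AS c WHERE c.company_id IN ({placeholders}) "
        "AND NOT EXISTS (SELECT 1 FROM emails AS e WHERE e.contact_id = c.id) "
        "ORDER BY c.id LIMIT ?"
    )
    return [row[0] for row in conn.execute(sql, [*company_ids, cap])]
```

For a seed with fintech peers at score distances 2, 5, and 20, a window of 10 keeps the first two, closest first. Only their never-emailed contacts become candidates.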

The bounded fan-out (no more than 25 look-alikes) keeps the LLM from drowning in noise. The preference-hint paper's finding — that careful selection of KG attributes beats feeding everything — is the same lesson PathRAG reports about redundancy, and it motivates the fan-out caps and the ids-plus-safe-labels-only payload.
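The ids-plus-safe-labels payload can be enforced with an allow-list at serialization time, so a field added to a node later never leaks into the prompt by accident. The allow-list below (`type`, `id`, `label`, `title`) and the compact JSON shape are assumptions, and `to_prompt_payload` is a hypothetical name.

```python
import json
from typing import Any, Dict

SAFE_NODE_FIELDS = ("type", "id", "label", "title")  # assumed allow-list


def to_prompt_payload(subgraph: Dict[str, Any]) -> str:
    """Serialize a subgraph for the prompt, keeping only allow-listed node fields."""
    nodes = [{k: n[k] for k in SAFE_NODE_FIELDS if k in n} for n in subgraph["nodes"]]
    # Edges go out as compact [src, rel, dst] triples the model can cite back verbatim.
    edges = [[e["src"], e["rel"], e["dst"]] for e in subgraph["edges"]]
    return json.dumps({"nodes": nodes, "edges": edges}, separators=(",", ":"))
```

An allow-list fails safe: a new column has to be added here explicitly before it can reach the model.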

Bounded vs. Unbounded Retrieval: The Design Argument

The case for bounded subgraph retrieval rests on the ablation studies reported in PathRAG and the preference-hint paper. PathRAG compared full graph retrieval (all neighbors) against pruned relational-path retrieval and found that full retrieval introduced excess noise; pruning around relational paths reduced it. The preference-hint paper similarly contrasted feeding all attributes against selected hints, reporting the >3.02% lift noted above.

This design has not run a formal production ablation, so it makes no in-house performance claims. The structural argument is what carries weight: without caps, the LLM would receive hundreds of nodes and thousands of edges, making it far more likely to reference edges that were never retrieved — which the grounding validator would then drop. The bounding heuristics (ordering by recency, fit score, score proximity) keep the subgraph focused so that valid, grounded paths are the easy path for the model to produce. This is the same lesson PathRAG states as "redundancy, not insufficiency": more is not better.

The grounding validator is the other half. Without it, an LLM can freely invent paths referencing entities outside the subgraph. With the post-generation filter, any recommendation citing an invalid path is removed before it is surfaced. The validator is not a nice-to-have — it is the structural guarantee that makes the approach trustworthy. It is a pure function with no IO and is unit-tested against a stub subgraph.

Fail-Open Design: The Graph Never Breaks

The retrieve_kg node is additive. If Qdrant is unseeded or throws, the vector hits default to []. If the D1 traversal returns an empty subgraph (schema drift or missing data), the node returns {recommendations: []} and the graph degrades gracefully to the existing answer path. The LLM_KILL_SWITCH is respected: when enabled, the node returns the retrieved subgraph with no LLM call, allowing inspection of retrieval quality without incurring generation cost.
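The fail-open behavior comes down to one wrapper that turns any exception into a default value, plus an early return for the kill switch. In this sketch, `fail_open`, `kill_switch_enabled`, and `kg_recommend` are hypothetical names. The search, traversal, and generation steps are injected as coroutines because where each call lives in the real graph is not specified here. How LLM_KILL_SWITCH values are parsed is also an assumption.

```python
import logging
import os
from typing import Any, Awaitable, Callable, Dict, Mapping, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("retrieve_kg")


async def fail_open(call: Callable[[], Awaitable[T]], default: T, what: str) -> T:
    """Await call(); on any exception, log it and return default instead of raising."""
    try:
        return await call()
    except Exception:
        log.warning("%s failed; falling back", what, exc_info=True)
        return default


def kill_switch_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    # Assumption: "1", "true", or "yes" (any case) turns LLM_KILL_SWITCH on.
    env = os.environ if env is None else env
    return env.get("LLM_KILL_SWITCH", "").strip().lower() in {"1", "true", "yes"}


async def kg_recommend(seed, search, traverse, generate, kill_switch: bool = False) -> Dict[str, Any]:
    """Hypothetical fail-open recommend step; every external call is wrapped."""
    hits = await fail_open(lambda: search(seed), [], "vector search")
    subgraph = await fail_open(lambda: traverse(seed), {"nodes": [], "edges": []}, "D1 traversal")
    if not subgraph["nodes"]:
        return {"recommendations": []}  # caller degrades to the existing answer path
    if kill_switch:
        return {"subgraph": subgraph, "recommendations": []}  # inspect retrieval, skip the LLM
    # The grounding validator would filter these recs before they are surfaced.
    recs = await fail_open(lambda: generate(subgraph, hits), [], "generation")
    return {"subgraph": subgraph, "recommendations": recs}
```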

Most academic benchmarks assume perfect infrastructure; this fail-open design assumes the opposite. In production every component can fail, but the recommendation path should never throw. Every database and vector-search call is wrapped in try/except. The validator performs no IO, so it has no external failure mode of its own.

Costs are contained. The feature reuses the existing agentic_rag LLM call. fastembed runs in-process and is disabled on Render unless the FASTEMBED_ON_RENDER=1 env var is set. There are no new secrets and no new cloud services. A KG-RAG recommendation should cost about the same as a single RAG answer, plus the prompt tokens for a subgraph of at most about 100 nodes.
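The fastembed gate is a one-line rule, and the sketch below shows one way to express it. The function name is hypothetical. How the service detects that it is running on Render is an assumption: here it reads the `RENDER` variable that Render sets on its services.

```python
import os
from typing import Mapping, Optional


def fastembed_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """In-process fastembed runs everywhere except Render, unless explicitly opted in."""
    env = os.environ if env is None else env
    # Assumption: Render is detected via the RENDER=true variable it sets on its services.
    on_render = env.get("RENDER", "").lower() == "true"
    return not on_render or env.get("FASTEMBED_ON_RENDER") == "1"
```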

Evaluation: Structural Guarantees Today, Offline Metrics Tomorrow

Offline ranking metrics (precision, recall, NDCG, F1) are deferred to a roadmap follow-up, as they require a held-out test set of ground-truth recommendations. For the shipped design, two gates carry the load:

Structural tests — every recommendation's entity_id must be a retrieved node, and every EdgeRef must be in subgraph.edges. This is unit-tested against a stub subgraph.
LangSmith trace — each retrieve_kg call logs a tool_call_span with seed type/id and node/edge counts (no labels or bodies). An Eval-First monitor scores recommendation relevance on a 0–1 scale, with a threshold of >=0.80.
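The trace payload and the eval gate are both small pure functions. This sketch builds counts-only metadata for the retrieve_kg span and applies the 0.80 threshold. It does not call the LangSmith SDK, and the function and key names are hypothetical. How the metadata attaches to the tool_call_span depends on the project's tracing wrapper.

```python
from typing import Any, Dict

RELEVANCE_THRESHOLD = 0.80


def kg_span_metadata(seed_type: str, seed_id: int, subgraph: Dict[str, Any]) -> Dict[str, Any]:
    """Counts-only span metadata: seed identity plus sizes, never labels or bodies."""
    return {
        "seed_type": seed_type,
        "seed_id": seed_id,
        "node_count": len(subgraph["nodes"]),
        "edge_count": len(subgraph["edges"]),
    }


def passes_relevance_gate(score: float, threshold: float = RELEVANCE_THRESHOLD) -> bool:
    """Eval-First gate: a relevance score on the 0-1 scale must reach the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"relevance score must be in [0, 1], got {score}")
    return score >= threshold
```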

These are proxies for true explanation quality, which the literature acknowledges is still hard to measure directly. Until the community converges on a standard for explanation faithfulness, the structural guarantee is the strongest practical signal: if the path is not real, the explanation does not exist.

Practical Takeaways: A Decision Framework

When should you build a relational-schema KG-RAG versus a graph database or pure vector RAG?

Situation	Recommended approach	Why
Data lives in normalized relational tables (CRM, ERP)	Query-time traversal → LLM	Lowest risk, no new store, explanation paths map to real foreign keys
You need real-time freshness (lead scores, email interactions)	Live-store traversal	The KG is never stale; it reads the live operational store
Graph depth >3 hops, billions of nodes	Materialized graph + trained embeddings	Query-time traversal will not scale; invest in a graph store + encoder (K-RagRec pattern)
Simple similarity-based recs (collaborative filtering)	Pure vector RAG	You don't need the structural overhead
Regulated industry requiring a full audit trail	Grounded KG-RAG with post-generation validation	Every rec decomposes into database rows; no black box

This design fits the first two rows. Traversing the existing D1 schema adds no new operational surface. It also delivers freshness and grounding that a materialized KG could match only with careful synchronization against the operational store.

The Broader Implication

The hype around "knowledge graphs for AI" often translates into complex pipelines: Neo4j clusters, RDF triples, SPARQL endpoints, graph-embedding training. The 2025–2026 research surveyed here (the IEEE KST survey, LLMEKERec, K-RagRec, Preference Hint, PathRAG) points another way. The key insight is retrieval structure, not storage structure. You don't need a graph database to do graph-based reasoning. You need three things: a query-time traversal of your existing relations, bounded intelligently; an LLM that turns those paths into natural-language justifications; and a validator that guarantees every path is real.

The next recommendation system you build might not need a single new table. It just needs to look at the data you already have and explain why.
