Research Brief + System Architecture · April 2026
ClinicalGraph
Knowledge Enrichment Layer · clinicgraph-enrichment
01 — Problem Statement

Ambient AI transcribes well.
It does not understand what it writes.

A new class of ambient AI — Suki, Abridge, Nuance DAX — converts doctor-patient conversations into LOINC-structured notes with ~98.5% transcription accuracy. The gap is not transcription. It is documentation completeness and ontology-grounded coding accuracy — whether the diagnoses discussed are correctly captured as billable, traceable ICD-10/HCC codes with structured evidence.

⚠️
Important clarification: The 1.47% hallucination / 3.45% omission rate (npj Digital Medicine 2025) measures transcription faithfulness. ClinicalGraph operates on a larger, separate problem: of the diagnoses that should be coded from what was correctly transcribed, how many are missing from the structured output? That gap averages $3,000 per member per year in HCC undercoding losses — entirely independent of transcription quality. These are different metrics measuring different problems.

The market is here. The semantic gap is real.

Ambient AI adoption grew 62% YOY. 68% of health systems are deployed. But ambient AI solves transcription — not semantic completeness. CDI, which addresses completeness, shows 71% 2× ROI yet only 43% adoption. That gap is ClinicalGraph's market.

68%
health systems using ambient AI in 2026
Eliciting Insights, Feb 2026 (N=120)
+37%
more diagnoses/encounter with ambient AI alone (3.0 → 4.1)
Texas Oncology · JCO Oncol. Pract. 2024
+14%
HCC diagnoses per encounter from ambient scribes at Riverside Health, VA
npj Digital Medicine, Dec 2025
71%
AI-CDI implementers report 2× ROI — highest ROI of all AI categories
Eliciting Insights, Feb 2026 (N=120)

After perfect transcription, the coding gap begins

1

No ontology traversal. Suki outputs LOINC-coded sections with draft ICD-10 codes but performs zero knowledge graph traversal — it cannot infer that T2DM + leg edema implies CKD (N18.x), a separate HCC-billable condition with clinical management implications.

2

No structured coding validation. ICD-10 suggestions are generative — not grounded in ontology paths. Drug recognition failures and missing diagnoses are user-reported in App Store reviews and independent analysis (DeepCura review, March 2026).

3

No explainable audit trail. Coding suggestions lack traceable reasoning — a regulatory liability for CMS RADV audits. Over 2,000 missed HCCs per health plan per year are defensible-but-uncoded (HIT Consultant, March 2026).

4

$3,000/member/year structural gap. HCC undercoding is not a transcription problem — it is a semantic completeness problem that transcription accuracy alone cannot fix. This is documented across health plans and reaffirmed by CMS RADV audit patterns.

"A health plan reviewing CMS submissions for 2024–2025 uncovered more than 2,000 undercoded or missed HCCs, each tied to defensible documentation that existed but was never structured into coding." — HIT Consultant, March 2026 · hitconsultant.net
02 — Research Evidence

Six findings. All peer-reviewed or primary sourced. All with numbers.

The right metric for ClinicalGraph is not hallucination rate (transcription problem). It is documentation completeness and coding capture rate (semantic problem).

Finding 01 · Transcription Accuracy (the baseline — not our problem)
Ambient AI transcription is accurate. The semantic gap is downstream.
3.45%
Omission rate across 12,999 clinician-annotated sentences in LLM clinical note generation studies — 2.3× higher than the hallucination rate (1.47%). This is the transcription floor. ClinicalGraph operates on a larger gap: semantic completeness of structured coding, not note text accuracy.
Asgari E. et al. "A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation." npj Digital Medicine, May 2025. doi:10.1038/s41746-025-01670-7
Finding 02 · Error Propagation (why prompting alone fails)
LLMs don't just miss — they propagate errors confidently at 83%.
83%
300 doctor-designed clinical vignettes with a single planted error (fake lab value, sign, or disease). Leading LLMs repeated or elaborated on the error in up to 83% of cases. A mitigation prompt halved the rate but did not eliminate it. This is why KG schema validation — not prompting — is required for clinical grounding.
Omar M. et al. "Multi-model assurance: LLMs highly vulnerable to adversarial hallucination attacks during clinical decision support." Communications Medicine (Nature), Aug 2025. doi:10.1038/s43856-025-01021-3
Finding 03 · HCC Gap (the actual problem ClinicalGraph solves)
Ambient AI raised diagnosis capture — but left the larger gap open.
3.0 → 4.1
Documented diagnoses per encounter rose from 3.0 to 4.1 with ambient AI (Texas Oncology 2024). Riverside Health saw +14% HCC diagnoses per encounter. But the expected range for a complex chronic patient is 5–7 HCC-relevant diagnoses. The gap between 4.1 and 7 — diagnoses discussed but not coded — is ClinicalGraph's addressable opportunity.
Doshi GK et al. JCO Oncol. Pract. 2024; Riverside Health data in: "Policy brief: ambient AI scribes and the coding arms race." npj Digital Medicine, Dec 2025. doi:10.1038/s41746-025-02272-z
Finding 04 · Financial Impact of Undercoding
Undercoding is a $4B+ structural problem. Not an edge case.
$3,000
Average HCC undercoding gap per member per year. Audit-related recoupments have historically topped $4 billion across health plans. High-prevalence conditions — T2DM, hypertension, CKD — frequently lack annual documentation meeting MEAT standards, even when actively managed and discussed in the encounter.
RAAPID Inc. "Risk Adjustment Coding in 2026: Complete HCC Guide." Feb 2026. raapidinc.com · HIT Consultant. "AI Didn't Fix HCC Coding." March 2026. hitconsultant.net
Finding 05 · KG Grounding Accuracy (the fix)
Ontology-grounded KG lifts clinical accuracy from 37% to 98%.
37% → 98%
GraphRAG framework using an ontology-grounded RDF/OWL knowledge graph evaluated on 60 clinical questions. ChatGPT-4 alone: 37% accuracy, 63% hallucination rate. DeepSeek-R1: 52%. Ontology-grounded framework: 98% accuracy (59/60), 1.7% hallucination. This is the strongest published evidence for KG grounding in clinical contexts as of 2026.
Mavridis A. et al. "Ontology-grounded knowledge graphs for mitigating hallucinations in LLMs for clinical question answering." Journal of Biomedical Informatics, Jan 2026. pii/S1532046426000171
Finding 06 · Curated KG-RAG vs Generic RAG
Curated ontology-aligned graphs reach 0% hallucination in scoped domains.
0%
GPT-4 with curated Cancer Information Service (CIS) database via RAG: 0% hallucination rate vs. 6% on general search results. Unstructured RAG still hallucinates 5–15% with poorly chunked documents. This is why ClinicalGraph uses a hand-curated 150-node subset rather than raw PrimeKG — precision over scale for the demo domain.
JMIR Cancer study (2024), cited in brics-econ.org, Jan 2026. · K2view RAG Hallucination Report, 2024. · ClinicalGraph uses curated subset of PrimeKG (Chandak et al., Scientific Data, 2023).
03 — Impact Table

Before and after ClinicalGraph — all numbers sourced

MetricWithout ClinicalGraphWith ClinicalGraph KG LayerSource
Diagnoses documented per encounter (complex chronic patient) 4.1 avg (ambient AI best case) Target: 5–7 (KG gap-flagging) JCO Oncol. Pract. 2024; npj Digital Med. 2025
Clinical QA accuracy (ontology-grounded vs generative) 37% (ChatGPT-4 alone) 98% (KG-grounded GraphRAG) ScienceDirect pii/S1532046426000171, Jan 2026
Hallucination rate in clinical reasoning 63% (ChatGPT-4); 8–20% (CDI tools avg) 1.7% (ontology-grounded) ScienceDirect 2026; BHM Healthcare 2024
HCC undercoding cost per member / per year $3,000 average gap Measurable per-encounter HCC recovery RAAPID Inc., Feb 2026
Missed HCC codes per health plan per year 2,000+ defensible-but-uncoded Flagged with KG path audit trail HIT Consultant, March 2026
Audit trail for coding decisions None — black-box generative output Full KG traversal path per suggestion ClinicalGraph design (this project)
Error propagation from single incorrect clinical input Up to 83% repeat/elaboration rate Blocked by KG schema validation Communications Medicine (Nature), Aug 2025
AI-CDI ROI among implementers 43% adoption despite proven ROI 71% of implementers see 2× ROI Eliciting Insights, Feb 2026 (N=120)
04 — Patient Value

This is not only a revenue story.
Patients benefit directly from complete documentation.

An undocumented CKD diagnosis doesn't just cost a health system revenue — it means a patient on metformin may not receive a contraindication review. A missed HCC flag isn't only a billing gap — it is a chronic condition excluded from the patient's care plan and resource allocation.

💊

Medication Safety

Metformin is contraindicated in CKD stage 3b+ (eGFR <45). If CKD is not documented and coded, this contraindication flag may not trigger in the EHR. ClinicalGraph flags the CKD comorbidity relationship from the KG, giving the care team an opportunity to review the medication plan — reducing avoidable adverse drug events for patients.

Ambient AI reduces documentation time 41% but does not flag drug-disease interactions arising from undocumented comorbidities. — npj Digital Medicine, 2025 · JAMA Netw. Open, 2025
📋

Equitable Care Allocation

For Medicare Advantage patients, an accurate RAF score determines resources allocated to their care management. An undercoded patient appears clinically simpler than they are — receiving less care coordination funding. Correct HCC coding ensures the care team has resources proportional to the patient's true complexity and chronic burden.

"Precise HCC coding provides a more accurate representation of a patient's overall health, allowing organizations to understand which patients are at increased risk." — Innovaccer HCC Guide, 2025
🔍

Longitudinal Continuity

A diagnosis that goes undocumented in one encounter does not enter the patient's longitudinal record. It may be invisible to the next provider, the next specialist, or the next care setting. ClinicalGraph's gap-flagging targets the space between 4.1 (ambient AI average) and 5–7 (expected for complex patients) — each recovered diagnosis is a condition that persists through the record.

Texas Oncology (+37% diagnoses) + Riverside Health (+14% HCC) — JCO Oncol. Pract. 2024; npj Digital Med. 2025
Clinician benefit
Ambient AI reduces documentation time by 41% and after-hours charting by 29%. ClinicalGraph adds coding completeness without adding clinician burden — enrichment happens post-note, invisibly, before EHR write-back.
npj Digital Med. 2025 · JAMA Netw. Open 2025
Health system benefit
71% of AI-CDI implementers report 2× ROI. $3,000 average HCC gap per member per year recoverable. Defensible KG audit trails reduce CMS RADV recoupment risk across health plans.
Eliciting Insights 2026 · RAAPID 2026 · HIT Consultant 2026
Patient benefit
Complete chronic disease documentation. Appropriate Medicare Advantage care resources. Avoidable adverse drug event prevention. Conditions entering the longitudinal record — not disappearing between visits.
Innovaccer HCC Guide 2025 · Clinical reasoning from KG comorbidity paths
05 — Solution

A platform-neutral KG enrichment layer —
not clinical decision support.

ClinicalGraph is a post-processing microservice for ambient AI output. It does not tell clinicians what to do medically. It tells the documentation system what structured coding is missing from what was already decided and transcribed. That distinction keeps it out of CDSS regulatory territory — squarely in CDI/RCM, where 71% of implementers report 2× ROI and sales cycles are driven by CFOs, not CMIOs.

Pillar 01

Platform Neutral

One webhook endpoint works with Suki, Abridge, Nuance DAX, or any ambient tool returning LOINC-structured JSON. 52% of health systems run non-Epic EHRs — systematically underserved by vendor-specific solutions.

Pillar 02

Ontology-Grounded

Built on curated subsets of PrimeKG, SNOMED CT, and ICD-10-CM. Every suggestion links to a KG traversal path — not a generative output. 37%→98% accuracy improvement grounded in peer-reviewed evidence (ScienceDirect 2026).

Pillar 03

CDI, Not CDSS

ClinicalGraph does not recommend treatments or diagnoses. It validates and enriches what was already documented and discussed. No FDA pathway concerns. Proven category: 71% 2× ROI, faster sales cycles, CFO-driven procurement.

06 — System Architecture

How data flows through ClinicalGraph

Top to bottom · Encounter → Ambient AI → KG Enrichment → EHR

Input
Layer
Source
Patient Encounter
Live audio · Natural doctor-patient conversation
Source
EHR Context
Problem list · Prior notes · Current medications
Optional
Patient History
Demographics · Chronic conditions · Prior HCC codes
Ambient
AI Layer
Suki · Abridge · Nuance DAX · Any LOINC-output partner
Ambient AI Engine
Transcription · Speaker diarization · SOAP generation · LOINC-coded sections · Draft ICD-10 codes · 98.5% transcription accuracy · avg 4.1 diagnoses/encounter
Output Format
Structured Note JSON
LOINC sections + text + draft diagnoses + medications. Coding gap begins here.
Webhook: note_complete POST /v1/enrich
← YOU BUILD
THIS →
ClinicalGraph · clinicgraph-enrichment · FastAPI + NetworkX
KG Enrichment API
1. Extract CUIs from note text (dictionary NER — no heavy ML, <5ms)
2. Query in-memory KG (NetworkX, loads at startup, <10ms traversal)
3. 2-hop traversal: diagnosis → comorbidities → ICD-10 evidence
4. Score and rank missing codes + documentation gaps
5. Return structured audit trail with full KG path reference
Demo Scope
No Auth · No HIPAA
Deferred per scope. FastAPI /docs IS the demo UI. <300ms response target. 32 TDD tests.
KG Data
Layer
PrimeKG Subset · Free · GitHub
Disease Graph
Diabetes · CV · Obesity clusters. ~150 nodes / ~400 edges curated for <50ms load.
SNOMED CT Subset · Free Browse
Clinical Ontology
Condition hierarchies · Comorbidity relationships · ICD-10 mappings
ICD-10-CM · Free CDC FTP
Code Mapping
CUI → ICD-10 · HCC relevance flags · MEAT standard tags
Phase 2 — Enterprise Add-On
Health System KG
Population patterns · Org-specific coding rules · RAF score signals
Enriched payload returned
Output
Layer
Validated Output
Confirmed Codes
ICD-10 · HCC · CPT with KG-sourced confidence scores
New Output
Gap Flags
Missing diagnoses · Undocumented comorbidities · HCC recovery opportunities
Compliance Output
KG Audit Trail
Traversal path · Source node · Evidence reference · RADV-defensible
Downstream
EHR Write-Back
→ Suki → Epic / Cerner / MEDITECH. Platform neutral by design.
Data ingest
Webhook trigger
Enriched response
Open data (free)
07 — What You Build for the Hackathon

5 components. 32 TDD tests. One working demo.

Mock Suki Output
3 synthetic LOINC-structured notes: gap (T2DM + leg edema, missing CKD code), clean (all codes present, should pass through), partial (obesity uncoded). No real Suki API needed.Python JSON fixtures · Suki REST API webhook format
NER Extractor
~100-term medical dictionary → CUI mapping. Covers T2DM, CKD, HTN, obesity, CVD domains. No scispaCy (800MB — kills demo). Pure Python dict lookup.app/ner.py · <5ms per note · dictionary-based
KG Query Engine
~150 nodes / ~400 edges in NetworkX (in-memory, loads at startup). 2-hop traversal per CUI. <10ms per query. Returns confidence score + KG path string.app/graph_loader.py + app/enricher.py · NetworkX · no Neo4j
Enrichment API
POST /v1/enrich accepts Suki-format JSON, returns validated codes + gap flags + audit trail. GET /v1/health, GET /v1/graph/stats.FastAPI · app/main.py · <300ms target
Demo Endpoints
POST /v1/demo/{gap|clean|partial} — preloaded fixtures for bulletproof live demo. FastAPI /docs at localhost:8000/docs IS the frontend.FastAPI Swagger UI · no React/HTML needed
MCP Server (Bonus)
Expose KG enrichment as MCP server at /mcp. Suki already has MCP infrastructure (developer.suki.ai/mcp). One URL, zero custom integration — strongest demo moment.Mintlify MCP pattern · https://your-service/mcp
🏆 Hackathon (Now)

5 components + 32 TDD tests. Working demo: gap → clean → partial. Story: "I built the KG layer Suki needs in 18 months — here it is working today."

🚀 MVP (3–6 months)

Real Suki Early Access API. HIPAA BAA. Neo4j replaces NetworkX. Suki for Partners program submission. Target non-Epic health systems (52% underserved).

🏢 Enterprise (12–18 months)

Health-system-specific population graph. CDI compliance module. $3,000/member/year HCC recovery quantified per org. 71% of CDI implementers see 2× ROI.

08 — References

All 12 citations used in this document

All numbers are from original sources. No figures are interpolated or estimated without attribution.

R-01
npj Digital Medicine — Hallucination & Omission Rates in LLM Clinical Notes
Asgari E. et al. npj Digital Medicine, May 2025. doi:10.1038/s41746-025-01670-7 · Numbers: 1.47% hallucination, 3.45% omission across 12,999 annotated sentences.
R-02
Communications Medicine (Nature) — Error Propagation in Clinical LLMs
Omar M. et al. Commun. Med., Aug 2025. doi:10.1038/s43856-025-01021-3 · Number: 83% error propagation rate on planted clinical vignette errors (N=300).
R-03
ScienceDirect — Ontology-Grounded GraphRAG Clinical QA
Mavridis A. et al. Journal of Biomedical Informatics, Jan 2026. pii/S1532046426000171 · Numbers: 37%→98% accuracy, 63%→1.7% hallucination (N=60 clinical questions).
R-04
npj Digital Medicine — Ambient AI Coding Arms Race Policy Brief
"Policy brief: ambient AI scribes and the coding arms race." npj Digital Med., Dec 2025. doi:10.1038/s41746-025-02272-z · Numbers: 3.0→4.1 diagnoses/encounter (Texas Oncology); +14% HCC diagnoses (Riverside Health, VA).
R-05
Eliciting Insights — Healthcare AI Adoption Survey 2026
Eliciting Insights. Feb 2026. N=120 health systems. Via agentman.ai/blog/healthcare-ai-adoption-2026-survey-data · Numbers: 68% ambient adoption, 71% CDI 2× ROI, 43% CDI adoption, 62% YOY ambient growth.
R-06
RAAPID Inc. — HCC Undercoding Cost Data 2026
RAAPID Inc. "Risk Adjustment Coding in 2026." Feb 2026. raapidinc.com · Numbers: $3,000/member/year undercoding gap; $4B+ historical audit recoupments across US health plans.
R-07
HIT Consultant — Missed HCC Codes at Health Plan Scale
HIT Consultant. "AI Didn't Fix HCC Coding." March 2026. hitconsultant.net · Number: 2,000+ missed HCCs per health plan reviewing 2024–2025 CMS submissions.
R-08
DeepCura — Suki AI Review 2026
Cowan F. deepcura.com, Feb 2026. · Key facts: drug recognition failures user-reported; no explainable audit trail for coding; limited clinical reasoning depth documented.
R-09
Suki Developer Documentation — Partner API (April 2026)
developer.suki.ai, last updated April 2026. · Key facts: Ambient API in Early Access; Notification Webhook available; MCP server at /mcp; iOS only (Android coming soon).
R-10
Frontiers in AI — Suki Note Quality Evaluation (PDQI-9)
Frontiers in Artificial Intelligence, Sept 2025. doi:10.3389/frai.2025.1691499 · Finding: Suki notes more thorough than physician notes but more prone to hallucination and less succinct. Uses PDQI-9 validated framework.
R-11
JMIR AI — DR.KNOWS (UMLS KG + LLM, Diagnosis Prediction)
Gao Y. et al. JMIR AI, Feb 2025. doi:10.2196/58670 · Note: DR.KNOWS is a CDSS (treatment/diagnosis recommendations). Cited for KG traversal methodology only — NOT for ClinicalGraph's CDI positioning. ClinicalGraph does not recommend treatments.
R-12
Scientific Data (Nature) — PrimeKG Dataset
Chandak P. et al. Scientific Data, Feb 2023. doi:10.1038/s41597-023-01960-3 · Key facts: 17,080 diseases, 4,050,249 relationships, 20 source databases. ClinicalGraph uses curated ~150-node subset for demo latency control.