ClinicalGraph — Knowledge Layer for Ambient AI

A new class of ambient AI — Suki, Abridge, Nuance DAX — converts doctor-patient conversations into LOINC-structured notes with ~98.5% transcription accuracy. The gap is not transcription. It is documentation completeness and ontology-grounded coding accuracy — whether the diagnoses discussed are correctly captured as billable, traceable ICD-10/HCC codes with structured evidence.

Market context · 2026

The market is here. The semantic gap is real.

Ambient AI adoption grew 62% YOY. 68% of health systems are deployed. But ambient AI solves transcription — not semantic completeness. CDI, which addresses completeness, shows 71% 2× ROI yet only 43% adoption. That gap is ClinicalGraph's market.

68%

health systems using ambient AI in 2026

Eliciting Insights, Feb 2026 (N=120)

+37%

more diagnoses/encounter with ambient AI alone (3.0 → 4.1)

Texas Oncology · JCO Oncol. Pract. 2024

+14%

HCC diagnoses per encounter from ambient scribes at Riverside Health, VA

npj Digital Medicine, Dec 2025

71%

AI-CDI implementers report 2× ROI — highest ROI of all AI categories

Eliciting Insights, Feb 2026 (N=120)

Suki's documented gaps (post-transcription)

After perfect transcription, the coding gap begins

No ontology traversal. Suki outputs LOINC-coded sections with draft ICD-10 codes but performs zero knowledge graph traversal — it cannot infer that T2DM + leg edema implies CKD (N18.x), a separate HCC-billable condition with clinical management implications.

No structured coding validation. ICD-10 suggestions are generative — not grounded in ontology paths. Drug recognition failures and missing diagnoses are user-reported in App Store reviews and independent analysis (DeepCura review, March 2026).

No explainable audit trail. Coding suggestions lack traceable reasoning — a regulatory liability for CMS RADV audits. Over 2,000 missed HCCs per health plan per year are defensible-but-uncoded (HIT Consultant, March 2026).

$3,000/member/year structural gap. HCC undercoding is not a transcription problem — it is a semantic completeness problem that transcription accuracy alone cannot fix. This is documented across health plans and reaffirmed by CMS RADV audit patterns.

"A health plan reviewing CMS submissions for 2024–2025 uncovered more than 2,000 undercoded or missed HCCs, each tied to defensible documentation that existed but was never structured into coding." — HIT Consultant, March 2026 · hitconsultant.net

02 — Research Evidence

Six findings. All peer-reviewed or primary sourced. All with numbers.

The right metric for ClinicalGraph is not hallucination rate (transcription problem). It is documentation completeness and coding capture rate (semantic problem).

Finding 01 · Transcription Accuracy (the baseline — not our problem)

Ambient AI transcription is accurate. The semantic gap is downstream.

3.45%

Omission rate across 12,999 clinician-annotated sentences in LLM clinical note generation studies — 2.3× higher than the hallucination rate (1.47%). This is the transcription floor. ClinicalGraph operates on a larger gap: semantic completeness of structured coding, not note text accuracy.

Asgari E. et al. "A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation." npj Digital Medicine, May 2025. doi:10.1038/s41746-025-01670-7

Finding 02 · Error Propagation (why prompting alone fails)

LLMs don't just miss — they propagate errors confidently at 83%.

83%

300 doctor-designed clinical vignettes with a single planted error (fake lab value, sign, or disease). Leading LLMs repeated or elaborated on the error in up to 83% of cases. A mitigation prompt halved the rate but did not eliminate it. This is why KG schema validation — not prompting — is required for clinical grounding.

Omar M. et al. "Multi-model assurance: LLMs highly vulnerable to adversarial hallucination attacks during clinical decision support." Communications Medicine (Nature), Aug 2025. doi:10.1038/s43856-025-01021-3

Finding 03 · HCC Gap (the actual problem ClinicalGraph solves)

Ambient AI raised diagnosis capture — but left the larger gap open.

3.0 → 4.1

Documented diagnoses per encounter rose from 3.0 to 4.1 with ambient AI (Texas Oncology 2024). Riverside Health saw +14% HCC diagnoses per encounter. But the expected range for a complex chronic patient is 5–7 HCC-relevant diagnoses. The gap between 4.1 and 7 — diagnoses discussed but not coded — is ClinicalGraph's addressable opportunity.

Doshi GK et al. JCO Oncol. Pract. 2024; Riverside Health data in: "Policy brief: ambient AI scribes and the coding arms race." npj Digital Medicine, Dec 2025. doi:10.1038/s41746-025-02272-z

Finding 04 · Financial Impact of Undercoding

Undercoding is a $4B+ structural problem. Not an edge case.

$3,000

Average HCC undercoding gap per member per year. Audit-related recoupments have historically topped $4 billion across health plans. High-prevalence conditions — T2DM, hypertension, CKD — frequently lack annual documentation meeting MEAT standards, even when actively managed and discussed in the encounter.

RAAPID Inc. "Risk Adjustment Coding in 2026: Complete HCC Guide." Feb 2026. raapidinc.com · HIT Consultant. "AI Didn't Fix HCC Coding." March 2026. hitconsultant.net

Finding 05 · KG Grounding Accuracy (the fix)

Ontology-grounded KG lifts clinical accuracy from 37% to 98%.

37% → 98%

GraphRAG framework using an ontology-grounded RDF/OWL knowledge graph evaluated on 60 clinical questions. ChatGPT-4 alone: 37% accuracy, 63% hallucination rate. DeepSeek-R1: 52%. Ontology-grounded framework: 98% accuracy (59/60), 1.7% hallucination. This is the strongest published evidence for KG grounding in clinical contexts as of 2026.

Mavridis A. et al. "Ontology-grounded knowledge graphs for mitigating hallucinations in LLMs for clinical question answering." Journal of Biomedical Informatics, Jan 2026. pii/S1532046426000171

Finding 06 · Curated KG-RAG vs Generic RAG

Curated ontology-aligned graphs reach 0% hallucination in scoped domains.

GPT-4 with curated Cancer Information Service (CIS) database via RAG: 0% hallucination rate vs. 6% on general search results. Unstructured RAG still hallucinates 5–15% with poorly chunked documents. This is why ClinicalGraph uses a hand-curated 150-node subset rather than raw PrimeKG — precision over scale for the demo domain.

JMIR Cancer study (2024), cited in brics-econ.org, Jan 2026. · K2view RAG Hallucination Report, 2024. · ClinicalGraph uses curated subset of PrimeKG (Chandak et al., Scientific Data, 2023).

03 — Impact Table

Before and after ClinicalGraph — all numbers sourced

Metric	Without ClinicalGraph	With ClinicalGraph KG Layer	Source
Diagnoses documented per encounter (complex chronic patient)	4.1 avg (ambient AI best case)	Target: 5–7 (KG gap-flagging)	JCO Oncol. Pract. 2024; npj Digital Med. 2025
Clinical QA accuracy (ontology-grounded vs generative)	37% (ChatGPT-4 alone)	98% (KG-grounded GraphRAG)	ScienceDirect pii/S1532046426000171, Jan 2026
Hallucination rate in clinical reasoning	63% (ChatGPT-4); 8–20% (CDI tools avg)	1.7% (ontology-grounded)	ScienceDirect 2026; BHM Healthcare 2024
HCC undercoding cost per member / per year	$3,000 average gap	Measurable per-encounter HCC recovery	RAAPID Inc., Feb 2026
Missed HCC codes per health plan per year	2,000+ defensible-but-uncoded	Flagged with KG path audit trail	HIT Consultant, March 2026
Audit trail for coding decisions	None — black-box generative output	Full KG traversal path per suggestion	ClinicalGraph design (this project)
Error propagation from single incorrect clinical input	Up to 83% repeat/elaboration rate	Blocked by KG schema validation	Communications Medicine (Nature), Aug 2025
AI-CDI ROI among implementers	43% adoption despite proven ROI	71% of implementers see 2× ROI	Eliciting Insights, Feb 2026 (N=120)

04 — Patient Value

This is not only a revenue story.
Patients benefit directly from complete documentation.

An undocumented CKD diagnosis doesn't just cost a health system revenue — it means a patient on metformin may not receive a contraindication review. A missed HCC flag isn't only a billing gap — it is a chronic condition excluded from the patient's care plan and resource allocation.

💊

Medication Safety

Metformin is contraindicated in CKD stage 3b+ (eGFR <45). If CKD is not documented and coded, this contraindication flag may not trigger in the EHR. ClinicalGraph flags the CKD comorbidity relationship from the KG, giving the care team an opportunity to review the medication plan — reducing avoidable adverse drug events for patients.

Ambient AI reduces documentation time 41% but does not flag drug-disease interactions arising from undocumented comorbidities. — npj Digital Medicine, 2025 · JAMA Netw. Open, 2025

📋

Equitable Care Allocation

For Medicare Advantage patients, an accurate RAF score determines resources allocated to their care management. An undercoded patient appears clinically simpler than they are — receiving less care coordination funding. Correct HCC coding ensures the care team has resources proportional to the patient's true complexity and chronic burden.

"Precise HCC coding provides a more accurate representation of a patient's overall health, allowing organizations to understand which patients are at increased risk." — Innovaccer HCC Guide, 2025

🔍

Longitudinal Continuity

A diagnosis that goes undocumented in one encounter does not enter the patient's longitudinal record. It may be invisible to the next provider, the next specialist, or the next care setting. ClinicalGraph's gap-flagging targets the space between 4.1 (ambient AI average) and 5–7 (expected for complex patients) — each recovered diagnosis is a condition that persists through the record.

Texas Oncology (+37% diagnoses) + Riverside Health (+14% HCC) — JCO Oncol. Pract. 2024; npj Digital Med. 2025

Clinician benefit

Ambient AI reduces documentation time by 41% and after-hours charting by 29%. ClinicalGraph adds coding completeness without adding clinician burden — enrichment happens post-note, invisibly, before EHR write-back.

npj Digital Med. 2025 · JAMA Netw. Open 2025

Health system benefit

71% of AI-CDI implementers report 2× ROI. $3,000 average HCC gap per member per year recoverable. Defensible KG audit trails reduce CMS RADV recoupment risk across health plans.

Eliciting Insights 2026 · RAAPID 2026 · HIT Consultant 2026

Patient benefit

Complete chronic disease documentation. Appropriate Medicare Advantage care resources. Avoidable adverse drug event prevention. Conditions entering the longitudinal record — not disappearing between visits.

Innovaccer HCC Guide 2025 · Clinical reasoning from KG comorbidity paths

05 — Solution

A platform-neutral KG enrichment layer —
not clinical decision support.

ClinicalGraph is a post-processing microservice for ambient AI output. It does not tell clinicians what to do medically. It tells the documentation system what structured coding is missing from what was already decided and transcribed. That distinction keeps it out of CDSS regulatory territory — squarely in CDI/RCM, where 71% of implementers report 2× ROI and sales cycles are driven by CFOs, not CMIOs.

Pillar 01

Platform Neutral

One webhook endpoint works with Suki, Abridge, Nuance DAX, or any ambient tool returning LOINC-structured JSON. 52% of health systems run non-Epic EHRs — systematically underserved by vendor-specific solutions.

Pillar 02

Ontology-Grounded

Built on curated subsets of PrimeKG, SNOMED CT, and ICD-10-CM. Every suggestion links to a KG traversal path — not a generative output. 37%→98% accuracy improvement grounded in peer-reviewed evidence (ScienceDirect 2026).

Pillar 03

CDI, Not CDSS

ClinicalGraph does not recommend treatments or diagnoses. It validates and enriches what was already documented and discussed. No FDA pathway concerns. Proven category: 71% 2× ROI, faster sales cycles, CFO-driven procurement.

06 — System Architecture

How data flows through ClinicalGraph

Top to bottom · Encounter → Ambient AI → KG Enrichment → EHR

Input
Layer

Source

Patient Encounter

Live audio · Natural doctor-patient conversation

Source

EHR Context

Problem list · Prior notes · Current medications

Optional

Patient History

Demographics · Chronic conditions · Prior HCC codes

Ambient
AI Layer

Suki · Abridge · Nuance DAX · Any LOINC-output partner

Ambient AI Engine

Transcription · Speaker diarization · SOAP generation · LOINC-coded sections · Draft ICD-10 codes · 98.5% transcription accuracy · avg 4.1 diagnoses/encounter

Output Format

Structured Note JSON

LOINC sections + text + draft diagnoses + medications. Coding gap begins here.

Webhook: note_complete POST /v1/enrich

← YOU BUILD
THIS →

ClinicalGraph · clinicgraph-enrichment · FastAPI + NetworkX

KG Enrichment API

1. Extract CUIs from note text (dictionary NER — no heavy ML, <5ms)
2. Query in-memory KG (NetworkX, loads at startup, <10ms traversal)
3. 2-hop traversal: diagnosis → comorbidities → ICD-10 evidence
4. Score and rank missing codes + documentation gaps
5. Return structured audit trail with full KG path reference

Demo Scope

No Auth · No HIPAA

Deferred per scope. FastAPI /docs IS the demo UI. <300ms response target. 32 TDD tests.

KG Data
Layer

PrimeKG Subset · Free · GitHub

Disease Graph

Diabetes · CV · Obesity clusters. ~150 nodes / ~400 edges curated for <50ms load.

SNOMED CT Subset · Free Browse

Clinical Ontology

Condition hierarchies · Comorbidity relationships · ICD-10 mappings

ICD-10-CM · Free CDC FTP

Code Mapping

CUI → ICD-10 · HCC relevance flags · MEAT standard tags

Phase 2 — Enterprise Add-On

Health System KG

Population patterns · Org-specific coding rules · RAF score signals

Enriched payload returned

Output
Layer

Validated Output

Confirmed Codes

ICD-10 · HCC · CPT with KG-sourced confidence scores

New Output

Gap Flags

Missing diagnoses · Undocumented comorbidities · HCC recovery opportunities

Compliance Output

KG Audit Trail

Traversal path · Source node · Evidence reference · RADV-defensible

Downstream

EHR Write-Back

→ Suki → Epic / Cerner / MEDITECH. Platform neutral by design.

Data ingest

Webhook trigger

Enriched response

Open data (free)

5 components. 32 TDD tests. One working demo.

Mock Suki Output

3 synthetic LOINC-structured notes: gap (T2DM + leg edema, missing CKD code), clean (all codes present, should pass through), partial (obesity uncoded). No real Suki API needed.Python JSON fixtures · Suki REST API webhook format

NER Extractor

~100-term medical dictionary → CUI mapping. Covers T2DM, CKD, HTN, obesity, CVD domains. No scispaCy (800MB — kills demo). Pure Python dict lookup.app/ner.py · <5ms per note · dictionary-based

KG Query Engine

~150 nodes / ~400 edges in NetworkX (in-memory, loads at startup). 2-hop traversal per CUI. <10ms per query. Returns confidence score + KG path string.app/graph_loader.py + app/enricher.py · NetworkX · no Neo4j

Enrichment API

POST /v1/enrich accepts Suki-format JSON, returns validated codes + gap flags + audit trail. GET /v1/health, GET /v1/graph/stats.FastAPI · app/main.py · <300ms target

Demo Endpoints

POST /v1/demo/{gap|clean|partial} — preloaded fixtures for bulletproof live demo. FastAPI /docs at localhost:8000/docs IS the frontend.FastAPI Swagger UI · no React/HTML needed

MCP Server (Bonus)

Expose KG enrichment as MCP server at /mcp. Suki already has MCP infrastructure (developer.suki.ai/mcp). One URL, zero custom integration — strongest demo moment.Mintlify MCP pattern · https://your-service/mcp

08 — References

All 12 citations used in this document

All numbers are from original sources. No figures are interpolated or estimated without attribution.

R-01

npj Digital Medicine — Hallucination & Omission Rates in LLM Clinical Notes

Asgari E. et al. npj Digital Medicine, May 2025. doi:10.1038/s41746-025-01670-7 · Numbers: 1.47% hallucination, 3.45% omission across 12,999 annotated sentences.

R-02

Communications Medicine (Nature) — Error Propagation in Clinical LLMs

Omar M. et al. Commun. Med., Aug 2025. doi:10.1038/s43856-025-01021-3 · Number: 83% error propagation rate on planted clinical vignette errors (N=300).

R-03

ScienceDirect — Ontology-Grounded GraphRAG Clinical QA

Mavridis A. et al. Journal of Biomedical Informatics, Jan 2026. pii/S1532046426000171 · Numbers: 37%→98% accuracy, 63%→1.7% hallucination (N=60 clinical questions).

R-04

npj Digital Medicine — Ambient AI Coding Arms Race Policy Brief

"Policy brief: ambient AI scribes and the coding arms race." npj Digital Med., Dec 2025. doi:10.1038/s41746-025-02272-z · Numbers: 3.0→4.1 diagnoses/encounter (Texas Oncology); +14% HCC diagnoses (Riverside Health, VA).

R-05

Eliciting Insights — Healthcare AI Adoption Survey 2026

Eliciting Insights. Feb 2026. N=120 health systems. Via agentman.ai/blog/healthcare-ai-adoption-2026-survey-data · Numbers: 68% ambient adoption, 71% CDI 2× ROI, 43% CDI adoption, 62% YOY ambient growth.

R-06

RAAPID Inc. — HCC Undercoding Cost Data 2026

RAAPID Inc. "Risk Adjustment Coding in 2026." Feb 2026. raapidinc.com · Numbers: $3,000/member/year undercoding gap; $4B+ historical audit recoupments across US health plans.

R-07

HIT Consultant — Missed HCC Codes at Health Plan Scale

HIT Consultant. "AI Didn't Fix HCC Coding." March 2026. hitconsultant.net · Number: 2,000+ missed HCCs per health plan reviewing 2024–2025 CMS submissions.

R-08

DeepCura — Suki AI Review 2026

Cowan F. deepcura.com, Feb 2026. · Key facts: drug recognition failures user-reported; no explainable audit trail for coding; limited clinical reasoning depth documented.

R-09

Suki Developer Documentation — Partner API (April 2026)

developer.suki.ai, last updated April 2026. · Key facts: Ambient API in Early Access; Notification Webhook available; MCP server at /mcp; iOS only (Android coming soon).

R-10

Frontiers in AI — Suki Note Quality Evaluation (PDQI-9)

Frontiers in Artificial Intelligence, Sept 2025. doi:10.3389/frai.2025.1691499 · Finding: Suki notes more thorough than physician notes but more prone to hallucination and less succinct. Uses PDQI-9 validated framework.

R-11

JMIR AI — DR.KNOWS (UMLS KG + LLM, Diagnosis Prediction)

Gao Y. et al. JMIR AI, Feb 2025. doi:10.2196/58670 · Note: DR.KNOWS is a CDSS (treatment/diagnosis recommendations). Cited for KG traversal methodology only — NOT for ClinicalGraph's CDI positioning. ClinicalGraph does not recommend treatments.

R-12

Scientific Data (Nature) — PrimeKG Dataset

Chandak P. et al. Scientific Data, Feb 2023. doi:10.1038/s41597-023-01960-3 · Key facts: 17,080 diseases, 4,050,249 relationships, 20 source databases. ClinicalGraph uses curated ~150-node subset for demo latency control.

Ambient AI transcribes well.It does not understand what it writes.

The market is here. The semantic gap is real.

After perfect transcription, the coding gap begins

Six findings. All peer-reviewed or primary sourced. All with numbers.

Before and after ClinicalGraph — all numbers sourced

This is not only a revenue story.Patients benefit directly from complete documentation.

Medication Safety

Equitable Care Allocation

Longitudinal Continuity

A platform-neutral KG enrichment layer —not clinical decision support.

Platform Neutral

Ontology-Grounded

CDI, Not CDSS

How data flows through ClinicalGraph

5 components. 32 TDD tests. One working demo.

All 12 citations used in this document

Ambient AI transcribes well.
It does not understand what it writes.

This is not only a revenue story.
Patients benefit directly from complete documentation.

A platform-neutral KG enrichment layer —
not clinical decision support.