Construction of Real‑World Medical Knowledge Graphs and Clinical Event Graphs

The article describes how YiduCloud builds real-world medical knowledge graphs and clinical event graphs from heterogeneous hospital systems (EMR, HIS, LIS, RIS). The pipeline covers data aggregation, de-identification, quality control, NLP-driven entity extraction, standardisation, graph construction, cleaning and embedding, and it supports AI-powered applications such as decision support, intelligent diagnosis, automated medical-record generation and patient recruitment.

DataFunSummit

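Today's topic is the construction of real-world medical knowledge graphs and clinical event graphs. The data come mainly from four hospital systems: electronic medical records (EMR), the hospital information system (HIS), the laboratory information system (LIS) and the radiology information system (RIS). Further knowledge is drawn from medical literature, clinical guidelines, textbooks and drug package inserts.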

YiduCloud, founded in 2014, offers YiduCore, a data-intelligence infrastructure for in-depth processing and analysis of large-scale, multi-source, heterogeneous medical data. On this foundation it builds disease-domain models that support medical research, management, public decision-making, drug discovery and patient disease management.

Because hospital information systems are built by different vendors, patient information is scattered across multiple systems. To use this data, it must first be aggregated into a patient‑centric and visit‑centric panoramic view.
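To make the aggregation step concrete, here is a minimal Python sketch that folds rows from several systems into a patient → visit → source-system view. It assumes every system already shares the same `patient_id` and `visit_id` keys; in practice this usually requires a master patient index, which is not shown. The function name, field names and sample rows are hypothetical.

```python
from collections import defaultdict

def aggregate(sources):
    """Fold rows from several source systems into view[patient_id][visit_id][system] -> rows.

    Assumes (hypothetically) that all systems share the same patient_id/visit_id keys.
    """
    view = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for system, rows in sources.items():
        for row in rows:
            view[row["patient_id"]][row["visit_id"]][system].append(row)
    return view

# Hypothetical sample rows from three systems
sources = {
    "HIS": [{"patient_id": "P1", "visit_id": "V1", "dept": "Cardiology"}],
    "LIS": [{"patient_id": "P1", "visit_id": "V1", "test": "troponin", "value": 0.8}],
    "EMR": [{"patient_id": "P1", "visit_id": "V2", "note": "follow-up visit"}],
}
view = aggregate(sources)
print(sorted(view["P1"]))        # ['V1', 'V2']
print(sorted(view["P1"]["V1"]))  # ['HIS', 'LIS']
```

Nesting by patient first and visit second mirrors the "patient-centric and visit-centric" view described above; each visit then holds whatever each system contributed.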

Before any data mining or management, the data must be de-identified and pass quality control, which includes fixing low-quality records and unreasonable relationships between tables. Unstructured text is then structured and normalised. Together, these steps fall under data governance.
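A minimal sketch of de-identification and quality control for structured fields, assuming a keyed hash (HMAC-SHA256) is acceptable for pseudonymising patient IDs and that names, phone numbers and ID-card numbers are the only direct identifiers. The key, the identifier list, the function names and the 0-120 age check are all hypothetical; free-text notes would need separate scrubbing.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key-management service
SECRET_KEY = b"replace-with-a-managed-secret"
# Hypothetical list of direct identifiers to drop from structured rows
DIRECT_IDENTIFIERS = {"name", "phone", "id_card"}

def pseudonymise(patient_id: str) -> str:
    """Keyed hash: the same ID always maps to the same token, so joins across tables
    still work, but a token cannot be recomputed from a guessed ID without the key."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]

def deidentify(row: dict) -> dict:
    clean = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    clean["patient_id"] = pseudonymise(row["patient_id"])
    return clean

def quality_issues(row: dict, required=("patient_id", "visit_id")) -> list:
    """Flag missing keys and an implausible age (hypothetical 0-120 range)."""
    issues = [f"missing {col}" for col in required if not row.get(col)]
    age = row.get("age")
    if age is not None and not 0 <= age <= 120:
        issues.append(f"implausible age {age}")
    return issues

row = {"patient_id": "P1", "visit_id": "V1", "name": "Zhang San",
       "phone": "13800000000", "age": 130}
clean = deidentify(row)
print(sorted(clean))          # ['age', 'patient_id', 'visit_id']
print(quality_issues(clean))  # ['implausible age 130']
```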

Based on the panoramic patient data and existing medical knowledge, disease knowledge graphs are built and applied to clinical decision support systems (CDSS), case search, intelligent consultation and the fusion of knowledge with deep learning. Further processing of the knowledge graph extracts clinical diagnosis and treatment events, which form event graphs for specialty views, automatic medical-record generation, event search and causal analysis.
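The article does not say how edge weights are obtained. One simple possibility, assumed here purely for illustration, is to weight each disease→finding edge by P(finding | disease) estimated from co-occurrence counts in the aggregated visits, and then emit the edges as weighted triples. The function name, the relation name `has_symptom` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def build_disease_graph(visits):
    """visits: iterable of (diagnosis, findings). Returns {disease: {finding: P(finding | disease)}},
    estimated as (#visits with disease and finding) / (#visits with disease)."""
    disease_counts = Counter()
    pair_counts = defaultdict(Counter)
    for diagnosis, findings in visits:
        disease_counts[diagnosis] += 1
        for finding in set(findings):
            pair_counts[diagnosis][finding] += 1
    return {
        disease: {f: c / disease_counts[disease] for f, c in pair_counts[disease].items()}
        for disease in disease_counts
    }

# Hypothetical visits: (diagnosis, findings recorded in that visit)
visits = [
    ("pneumonia", {"fever", "cough"}),
    ("pneumonia", {"fever"}),
    ("influenza", {"fever", "myalgia"}),
]
graph = build_disease_graph(visits)
# Weighted triples (subject, predicate, object, weight); "has_symptom" is a hypothetical relation
triples = sorted(
    (disease, "has_symptom", finding, p)
    for disease, edges in graph.items()
    for finding, p in edges.items()
)
for t in triples:
    print(t)
```

Counting each finding once per visit (via `set`) keeps every estimate between 0 and 1.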

The pipeline consists of data aggregation, entity extraction (dictionary lookup combined with LSTM-CRF named-entity recognition), entity standardisation, relationship construction, graph cleaning, entity ranking and graph embedding. Graph embedding learns vector representations for subjects, predicates and objects; training optimises both the embedding loss and consistency with the conditional probabilities recorded in the graph.
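The dictionary-lookup half of entity extraction can be sketched as forward maximum matching against a lexicon that also maps each surface form to a standard concept, so the same pass covers entity standardisation. This is a toy under stated assumptions: the lexicon and function name are hypothetical, the text is English and whitespace-tokenised (Chinese records would be matched character by character), and the LSTM-CRF model that the pipeline pairs with the dictionary is not shown.

```python
# Hypothetical toy lexicon: surface form -> (entity type, standard concept)
LEXICON = {
    "chest pain": ("symptom", "Chest pain"),
    "chest tightness": ("symptom", "Chest tightness"),
    "mi": ("disease", "Myocardial infarction"),
    "myocardial infarction": ("disease", "Myocardial infarction"),
    "heart attack": ("disease", "Myocardial infarction"),
    "aspirin": ("drug", "Aspirin"),
}
MAX_LEN = max(len(k.split()) for k in LEXICON)  # longest entry, in tokens

def extract_entities(text):
    """Forward maximum matching: at each position, take the longest lexicon hit."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest window first so multi-word terms beat shorter ones
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n])
            if surface in LEXICON:
                etype, concept = LEXICON[surface]
                entities.append((surface, etype, concept))
                i += n
                break
        else:
            i += 1  # no match starting here; move on one token
    return entities

for entity in extract_entities("Chest pain after a heart attack, given aspirin."):
    print(entity)
```

Because "heart attack" and "mi" both map to "Myocardial infarction", downstream graph construction sees one standardised node rather than several synonyms.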

Applications demonstrated include:

Intelligent diagnosis: A Bayesian model combined with the knowledge graph predicts the most probable diseases and recommends examinations.

Information retrieval ranking: A PSR metric ranks the drugs related to a diagnosis, substantially reducing the time physicians spend looking them up.

Intelligent consultation: A multi-turn dialogue simulates a doctor's questioning to narrow down the candidate diseases.

Neural‑network integration: Adding a graph‑embedding layer to a Bi‑LSTM model improves convergence and accuracy for next‑step drug prediction.
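The intelligent-diagnosis and intelligent-consultation items above can be illustrated together with a naive Bayes model over knowledge-graph edges: the posterior ranks candidate diseases, and the next question is the unasked finding with the largest expected information gain. This is a sketch, not YiduCloud's model; the priors, likelihoods, the conditional-independence assumption, the function names and the `EPS` floor for missing edges are all hypothetical.

```python
import math

# Hypothetical toy knowledge graph: disease priors and P(finding | disease) edges
PRIOR = {"pneumonia": 0.3, "influenza": 0.5, "pulmonary embolism": 0.2}
LIKELIHOOD = {
    "pneumonia": {"fever": 0.8, "cough": 0.9, "chest pain": 0.3},
    "influenza": {"fever": 0.9, "cough": 0.6, "chest pain": 0.1},
    "pulmonary embolism": {"fever": 0.2, "cough": 0.3, "chest pain": 0.8},
}
EPS = 0.01  # hypothetical floor for findings missing from a disease's edges

def posterior(observed):
    """observed: {finding: True/False}. Naive Bayes: findings independent given the disease."""
    scores = {}
    for disease, prior in PRIOR.items():
        p = prior
        for finding, present in observed.items():
            q = LIKELIHOOD[disease].get(finding, EPS)
            p *= q if present else 1 - q
        scores[disease] = p
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def next_question(observed):
    """Pick the unasked finding that minimises expected posterior entropy
    (equivalently, maximises expected information gain)."""
    current = posterior(observed)
    candidates = {f for edges in LIKELIHOOD.values() for f in edges} - set(observed)

    def expected_entropy(finding):
        p_yes = sum(current[d] * LIKELIHOOD[d].get(finding, EPS) for d in current)
        h_yes = entropy(posterior({**observed, finding: True}))
        h_no = entropy(posterior({**observed, finding: False}))
        return p_yes * h_yes + (1 - p_yes) * h_no

    return min(sorted(candidates), key=expected_entropy) if candidates else None

observed = {"chest pain": True}
post = posterior(observed)
for disease, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{disease}: {p:.3f}")
print("ask next:", next_question(observed))
```

Choosing the question that most reduces uncertainty is one common heuristic for multi-turn questioning; the article does not state which criterion its dialogue system uses.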

Event-graph construction captures generic clinical events (onset, visit, diagnosis, test, medication, surgery, death) and specialty-specific events (e.g., chemotherapy and radiotherapy in oncology). These events enable specialty timelines, automated medical-record generation via NLG, and precise patient recruitment for clinical trials.
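A minimal sketch of how an event graph can support specialty timelines and patient recruitment: events are stored per patient with a type and a date, a timeline is the patient's events in date order, and a trial criterion is checked against that order. The `Event` fields, the criterion shape, the function names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    patient_id: str
    kind: str    # generic (onset, visit, diagnosis, test, medication, surgery, death) or specialty (e.g. chemotherapy)
    detail: str
    when: date

def timeline(events, patient_id):
    """Specialty-style timeline: one patient's events in chronological order."""
    return sorted((e for e in events if e.patient_id == patient_id), key=lambda e: e.when)

def matches_criteria(events, patient_id, diagnosis, followed_by, exclude):
    """Hypothetical trial criterion: has `diagnosis`, a `followed_by` event on or after
    the first such diagnosis, and no event whose kind is in `exclude`."""
    tl = timeline(events, patient_id)
    if any(e.kind in exclude for e in tl):
        return False
    dx_dates = [e.when for e in tl if e.kind == "diagnosis" and e.detail == diagnosis]
    if not dx_dates:
        return False
    first_dx = min(dx_dates)
    return any(e.kind == followed_by and e.when >= first_dx for e in tl)

def recruit(events, **criteria):
    patients = {e.patient_id for e in events}
    return sorted(p for p in patients if matches_criteria(events, p, **criteria))

# Hypothetical events for three patients
events = [
    Event("P1", "diagnosis", "lung cancer", date(2023, 1, 5)),
    Event("P1", "chemotherapy", "cisplatin", date(2023, 2, 1)),
    Event("P2", "diagnosis", "lung cancer", date(2023, 3, 2)),
    Event("P2", "surgery", "lobectomy", date(2023, 3, 20)),
    Event("P3", "chemotherapy", "cisplatin", date(2022, 12, 1)),
    Event("P3", "diagnosis", "lung cancer", date(2023, 1, 10)),
]
print(recruit(events, diagnosis="lung cancer", followed_by="chemotherapy", exclude={"surgery"}))  # ['P1']
```

P2 is excluded by the surgery event and P3 fails because its chemotherapy precedes the diagnosis; checking event order is what a flat list of diagnoses cannot express.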

The Q&A covered the use of Bayesian models for their interpretability, how probability data from the literature is handled, where the medical knowledge comes from, the lack of industry standards for event-graph schemas, and the challenges of Chinese medical NLP: terminology standards, scarce annotated data and model explainability.

Overall, the work showcases a comprehensive AI‑driven pipeline that transforms raw, heterogeneous hospital data into structured, probabilistic medical knowledge and event graphs, supporting a wide range of clinical and research applications.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
