Thematic Role Semantic Recognition and Fine-Grained Biomedical Knowledge Mining for Drug Repurposing
This presentation explores how deep semantic role labeling of biomedical literature, combined with multimodal data fusion, can build fine‑grained knowledge graphs that support drug repurposing. It covers the AGAC corpus, modeling approaches, evaluation results, and future research directions.
Speaker: Xia Jingbo, Associate Professor, Huazhong Agricultural University (edited by Wang Jinhua, China Electronics 32 Institute; produced by DataFunTalk).
Introduction: Since the COVID‑19 pandemic, drug repurposing—finding new therapeutic uses for existing drugs—has become a primary strategy because traditional drug development is lengthy and costly. The talk focuses on extracting potential drug–disease relationships from biomedical literature using thematic role semantic recognition.
1. Overview of Drug Repurposing – Drug repurposing has traditionally drawn on clinical texts and images; this work instead emphasizes clues mined from the literature. Examples include dopamine, originally used for cardiovascular disease and now linked to multiple cancers, and rapamycin, originally used for immune disorders and now effective against pancreatic cancer.
2. Semantic Role Modeling in Biomedical Texts – Deep semantic extraction yields a knowledge graph that differs from generic industry knowledge graphs because it requires finer‑grained, deeper semantics. The goal is to capture events such as "under certain conditions, gene X mutates, leading to a gain or loss of function related to disease Y."
3. Corpus Construction (AGAC V1.0) – The Active Gene Annotation Corpus (built 2017‑2019) defines entity types such as gene mutation, molecular activity, cellular activity, and pathway. Semantic role labeling (who did what to whom, where, and when) is used to annotate the events that link these entities.
4. Research Paradigms for Deep Semantic Mining
Linguistic methods to model loss‑of‑function/gain‑of‑function semantics.
Text‑mining‑driven NLP for large‑scale semantic prediction.
Multi‑source data integration based on biomedical background.
Mathematical models for reasoning and fusion across heterogeneous sources.
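As a crude illustration of the linguistic route to loss‑of‑function (LOF) vs. gain‑of‑function (GOF) semantics, the sketch below labels a sentence by counting cue words. The cue lists and the function `classify_function_change` are hypothetical; they ignore negation, hedging, and which entity each cue attaches to, which is exactly what semantic role labeling and trained NLP models are meant to handle.

```python
import re
from typing import Iterable, Optional

# Hypothetical cue stems; r"\b" + stem matches words that start with the stem,
# so "activat" does not fire inside "inactivate".
LOF_CUES = ("loss of function", "loss-of-function", "abolish", "impair",
            "inactivat", "reduc", "decreas", "suppress", "truncat")
GOF_CUES = ("gain of function", "gain-of-function", "activat", "enhanc",
            "increas", "up-regulat", "upregulat", "promot", "overexpress")


def count_cues(text: str, cues: Iterable[str]) -> int:
    """Count cue occurrences that begin at a word boundary."""
    return sum(len(re.findall(r"\b" + re.escape(cue), text)) for cue in cues)


def classify_function_change(sentence: str) -> Optional[str]:
    """Return "LOF", "GOF", or None when cues are absent or tied."""
    text = sentence.lower()
    lof = count_cues(text, LOF_CUES)
    gof = count_cues(text, GOF_CUES)
    if lof > gof:
        return "LOF"
    if gof > lof:
        return "GOF"
    return None


print(classify_function_change("Truncating mutations inactivate the protein."))
print(classify_function_change("This variant enhances receptor signaling."))
```

A cue count like this is only a baseline; note that "promot" would also fire on "promoter", one of many false positives a learned model has to resolve.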
5. Knowledge Graph Construction and Application – The annotated graph links entities (genes, mutations, proteins) through semantic‑role relations (cause and theme). Example: the rs10719 mutation suppresses miR‑27b, promotes luciferase expression, and up‑regulates DROSHA, contributing to bladder cancer.
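The rs10719 example can be read as a small directed graph. The sketch below stores it as an adjacency list and uses breadth-first search to return the shortest chain of relations between two entities, together with the edges that justify each step. The edge list is our own reading of the example sentence; in particular, attaching "contributing to bladder cancer" to DROSHA is an assumption, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(relation, target), ...]


def build_graph(triples: List[Triple]) -> Graph:
    """Turn (head, relation, tail) triples into a directed adjacency list."""
    graph: Graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
        graph.setdefault(tail, [])
    return graph


def shortest_chain(graph: Graph, source: str, target: str) -> Optional[List[Triple]]:
    """Breadth-first search; returns the shortest chain of edges, or None if unreachable."""
    parent: Dict[str, Optional[Tuple[str, str]]] = {source: None}  # node -> (prev, relation)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            chain: List[Triple] = []
            while parent[node] is not None:
                prev, relation = parent[node]
                chain.append((prev, relation, node))
                node = prev
            return list(reversed(chain))
        for relation, nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, relation)
                queue.append(nxt)
    return None


triples = [
    ("rs10719", "suppresses", "miR-27b"),
    ("rs10719", "promotes", "luciferase expression"),
    ("rs10719", "up-regulates", "DROSHA"),
    ("DROSHA", "contributes_to", "bladder cancer"),  # assumed attachment of the final clause
]
graph = build_graph(triples)
print(shortest_chain(graph, "rs10719", "bladder cancer"))
```

Returning the edges rather than just a yes/no answer keeps every inferred link traceable back to annotated evidence.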
Using this graph, one can identify gene‑drug associations, embed them via joint matrix/tensor decomposition, and discover novel therapeutic links.
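The embedding step can be illustrated with collective matrix factorization, a simpler relative of joint matrix/tensor decomposition (this is not the method of Zhou et al., 2022). A gene-drug matrix and a gene-disease matrix share one gene factor, so what is learned from disease links also shapes gene-drug scores. The toy matrices, hyperparameters, and function names are hypothetical; unobserved pairs are treated as 0, and a high reconstructed score at a 0 entry is what would flag a candidate link.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def joint_factorize(gene_drug: Matrix, gene_disease: Matrix, rank: int = 2,
                    lr: float = 0.05, reg: float = 0.01, epochs: int = 500,
                    seed: int = 0) -> Tuple[Matrix, Matrix, Matrix]:
    """Fit gene_drug ~ G D^T and gene_disease ~ G S^T with a shared gene factor G via SGD."""
    rng = random.Random(seed)
    n_genes, n_drugs, n_diseases = len(gene_drug), len(gene_drug[0]), len(gene_disease[0])

    def init(n: int) -> Matrix:
        return [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n)]

    G, D, S = init(n_genes), init(n_drugs), init(n_diseases)
    # Every cell is an observation; unknown pairs are simply treated as 0 here.
    cells = [(D, i, j, gene_drug[i][j]) for i in range(n_genes) for j in range(n_drugs)]
    cells += [(S, i, k, gene_disease[i][k]) for i in range(n_genes) for k in range(n_diseases)]

    for _ in range(epochs):
        rng.shuffle(cells)
        for other, i, j, value in cells:
            err = value - dot(G[i], other[j])
            for r in range(rank):
                g, o = G[i][r], other[j][r]
                G[i][r] += lr * (err * o - reg * g)      # shared gene factor
                other[j][r] += lr * (err * g - reg * o)  # drug or disease factor
    return G, D, S


def squared_error(matrix: Matrix, rows: Matrix, cols: Matrix) -> float:
    return sum((matrix[i][j] - dot(rows[i], cols[j])) ** 2
               for i in range(len(matrix)) for j in range(len(matrix[0])))


# Hypothetical toy data: 4 genes x 3 drugs and 4 genes x 2 diseases.
gene_drug = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
gene_disease = [[1, 0], [1, 0], [0, 1], [0, 1]]

G, D, S = joint_factorize(gene_drug, gene_disease)
# Score gene-drug pairs that were 0 in the input; higher means a stronger candidate.
candidates = sorted(((dot(G[i], D[j]), i, j) for i in range(4) for j in range(3)
                     if gene_drug[i][j] == 0), reverse=True)
print("top unobserved gene-drug pair:", candidates[0])
```

Sharing `G` is the design point: genes 0 and 1 have the same disease profile, so the factorization is pushed toward giving them similar embeddings and, through them, similar drug scores.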
6. Multimodal Data Fusion – Combines mutation‑type knowledge mined from text (for example, loss‑ or gain‑of‑function mutations) with association data (e.g., Manhattan plots of P‑values), so that the two sources complement each other. Fusion is performed via graph models and variational inference, integrating binary association scores (P) with semantic embeddings (f).
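To illustrate how an association signal P and a semantic embedding f can be put on one scale, here is a deliberately simple weighted fusion. It is a stand-in, not the graph-model and variational-inference fusion described in the talk: the -log10 mapping, the cap, the equal weighting, and all names and vectors below are assumptions.

```python
import math
from typing import Sequence


def association_score(p_value: float, cap: float = 10.0) -> float:
    """Map a P-value to [0, 1] via -log10(P), saturating at P <= 10**-cap."""
    p = min(max(p_value, 10 ** -cap), 1.0)
    return -math.log10(p) / cap


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def fused_score(p_value: float, f_gene: Sequence[float], f_disease: Sequence[float],
                weight: float = 0.5) -> float:
    """Convex combination of the association signal and the text-derived semantic signal."""
    semantic = (cosine(f_gene, f_disease) + 1.0) / 2.0  # rescale [-1, 1] to [0, 1]
    return weight * association_score(p_value) + (1.0 - weight) * semantic


f_disease = [1.0, 0.0]
strong = fused_score(1e-8, [0.9, 0.1], f_disease)  # low P, semantically close
weak = fused_score(0.2, [0.0, 1.0], f_disease)     # high P, semantically unrelated
print(round(strong, 3), round(weak, 3))
```

A fixed weight cannot express how much to trust each source for a given gene; a probabilistic model such as the one in the talk can learn that from data, which is the main reason to prefer it over this sketch.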
The resulting fused knowledge graph improves disease‑gene‑drug discovery, as demonstrated in an Alzheimer's disease case study where most of the predicted results are supported by the knowledge base.
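The Alzheimer's disease check can be summarized as a support rate: the fraction of top-ranked predictions that already appear in a curated knowledge base. The sketch below computes it; the gene names, scores, knowledge base, and k are placeholders, not results from the talk.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Rank candidates by descending score, breaking ties by name for determinism."""
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]


def support_rate(predictions: List[str], knowledge_base: Set[str]) -> float:
    """Fraction of predictions that the knowledge base already records."""
    if not predictions:
        return 0.0
    return sum(1 for name in predictions if name in knowledge_base) / len(predictions)


scores = {"GENE_A": 0.91, "GENE_B": 0.85, "GENE_C": 0.40, "GENE_D": 0.77}
knowledge_base = {"GENE_A", "GENE_D"}
top = top_k(scores, k=3)
print(top, support_rate(top, knowledge_base))
```

Predictions outside the knowledge base (here `GENE_B`) are the candidates that would need new evidence; a high support rate mainly shows that the method recovers known biology.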
7. Outlook – Future work aims to build larger, more precise graphs where relationship chains are short, evidence is traceable, and multimodal data continuously enriches drug repurposing pipelines.
References
Kaiyin Zhou et al., "High‑quality Gene/Disease Embedding in a Multi‑relational Heterogeneous Graph After Joint Matrix/Tensor Decomposition", Journal of Biomedical Informatics, 2022.
Sizhuo Ouyang et al., "LitCovid‑AGAC: Cellular and Molecular Level Annotation Data Set Based on Covid‑19", Genomics and Informatics, 2021.
Kaiyin Zhou et al., "Bridging Heterogeneous Mutation Data to Enhance Disease‑Gene Discovery", Briefings in Bioinformatics, 2021.
Thank you for attending.