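Abstract: Current research on predicting associations between long non-coding RNAs (lncRNAs) and diseases suffers from incomplete heterogeneous network construction and insufficient mining of network node information. To address these problems, a method based on the Relational Graph Convolutional Network (R-GCN), named RGCNLDA, is proposed. First, an lncRNA-miRNA-disease heterogeneous graph is constructed; next, an R-GCN is trained on this graph to obtain node embedding vectors; finally, a multi-layer perceptron predicts lncRNA-disease associations. Under 5-fold cross-validation, RGCNLDA achieves an area under the receiver operating characteristic curve (AUROC) of 0.934, indicating good predictive performance.
Keywords: lncRNA; relational graph convolutional network; heterogeneous graph; association prediction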
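To make the reported metric concrete, the following is a minimal sketch of how an AUROC value can be computed and how samples can be split into five folds. It is not the authors' evaluation code: the helper names `auroc` and `k_fold_indices`, the toy labels and scores, and the random seed are all assumptions introduced for illustration.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative (Mann-Whitney U statistic); tied pairs count as 0.5."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

# Toy data (hypothetical): 1 = known lncRNA-disease association, 0 = unlabeled pair.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = auroc(toy_labels, toy_scores)

# Each fold would serve once as the held-out test set.
folds = k_fold_indices(n_samples=10, k=5)
print(f"toy AUROC = {toy_auc:.3f}, fold sizes = {[len(f) for f in folds]}")
```

In a 5-fold setting, one common choice is to score the held-out pairs of each fold with a model trained on the other four folds and then average the five AUROC values; whether RGCNLDA averages per fold or pools all predictions is not stated in this section.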
DOI:10.19850/j.cnki.2096-4706.2023.07.022
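Funding: Basic Scientific Research Funds of Heilongjiang Provincial Universities, Natural Science Young Innovative Talents Project (145209206); 2021 Qiqihar University Graduate Innovative Research Project (YJSCX2021076)
CLC number: TP311 Document code: A Article ID: 2096-4706(2023)07-0086-04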
LncRNA-Disease Association Prediction Based on R-GCN
DU Xiaoxin, LUO Jinqi, JIN Mei, WANG Zhenfei, ZHOU Wei
(Qiqihar University, Qiqihar 161006, China)
Abstract: To address incomplete heterogeneous network construction and insufficient mining of node information in current research on long non-coding RNA (lncRNA)-disease association prediction, a method based on the Relational Graph Convolutional Network (R-GCN), named RGCNLDA, is proposed. First, an lncRNA-miRNA-disease heterogeneous graph is constructed; then an R-GCN is trained on this graph to obtain node embedding vectors; finally, a multi-layer perceptron predicts lncRNA-disease associations. Five-fold cross-validation shows that RGCNLDA achieves an area under the receiver operating characteristic curve (AUROC) of 0.934, indicating good predictive performance.
Keywords: lncRNA; R-GCN; heterogeneous graph; association prediction
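The pipeline summarized in the abstract (heterogeneous graph, R-GCN embeddings, MLP scoring) can be illustrated with a small NumPy sketch. This is not the RGCNLDA implementation: it uses the standard R-GCN propagation rule, h_i' = ReLU(W_0 h_i + sum over relations r and neighbors j of W_r h_j / |N_i^r|), on a hypothetical five-node toy graph with untrained random weights. The relation names, dimensions, one-hot input features, and the helper `mlp_score` are assumptions, and reverse edges share a weight matrix with forward edges for brevity.

```python
import numpy as np

def rgcn_layer(h, edges_by_relation, w_rel, w_self):
    """One R-GCN layer with a self-loop term and per-relation mean aggregation."""
    num_nodes = h.shape[0]
    out = h @ w_self  # self-connection: W_0 h_i
    for rel, edges in edges_by_relation.items():
        adj = np.zeros((num_nodes, num_nodes))
        for src, dst in edges:
            adj[dst, src] = 1.0  # message flows from src to dst
        deg = adj.sum(axis=1, keepdims=True)  # |N_i^r| for every node i
        norm_adj = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
        out += norm_adj @ h @ w_rel[rel]  # mean of W_r h_j over neighbors
    return np.maximum(out, 0.0)  # ReLU

def mlp_score(pair_feat, w1, w2):
    """Two-layer perceptron mapping a node-pair feature to a score in (0, 1)."""
    hidden = np.maximum(pair_feat @ w1, 0.0)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)))

# Toy graph (hypothetical): nodes 0-1 are lncRNAs, 2 is a miRNA, 3-4 are diseases.
edges_by_relation = {
    "lncRNA-miRNA": [(0, 2), (2, 0), (1, 2), (2, 1)],
    "miRNA-disease": [(2, 3), (3, 2), (2, 4), (4, 2)],
    "lncRNA-disease": [(0, 3), (3, 0)],
}
rng = np.random.default_rng(0)
num_nodes, in_dim, out_dim = 5, 5, 4
h0 = np.eye(num_nodes)  # one-hot input features (assumption)
w_self = rng.normal(scale=0.5, size=(in_dim, out_dim))
w_rel = {r: rng.normal(scale=0.5, size=(in_dim, out_dim)) for r in edges_by_relation}
emb = rgcn_layer(h0, edges_by_relation, w_rel, w_self)

# Score the candidate pair (lncRNA 1, disease 4) from concatenated embeddings.
w1 = rng.normal(scale=0.5, size=(2 * out_dim, 8))
w2 = rng.normal(scale=0.5, size=8)
score = mlp_score(np.concatenate([emb[1], emb[4]]), w1, w2)
print(f"untrained score for (lncRNA 1, disease 4): {score:.3f}")
```

Because the weights are random, the printed score carries no predictive meaning; in practice the R-GCN and MLP parameters would be learned from known lncRNA-disease associations.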
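About the authors: DU Xiaoxin (b. 1983), female, Han, from Xuzhou, Jiangsu; professor, master's degree; research interests: biomedical big data analysis and processing. LUO Jinqi (b. 1997), female, Han, from Mianyang, Sichuan; master's student; research interests: clinical medical big data mining. JIN Mei (b. 1977), female, Han, from Anshan, Liaoning; lecturer, master's degree; research interests: machine learning. WANG Zhenfei (b. 1999), male, Han, from Weifang, Shandong; master's student; research interests: machine learning and swarm intelligence optimization algorithms. ZHOU Wei (b. 1999), female, Han, from Dingzhou, Hebei; master's student; research interests: swarm intelligence optimization algorithms.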