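Abstract: Pancreatic cancer (PAAD) is a malignant tumor of the pancreas. Its onset is insidious, early diagnosis is difficult, progression is rapid, and survival time is short. It is one of the malignant tumors with the worst prognosis and is known as the "king of cancers". The causes of pancreatic cancer remain unclear, but the discovery of biomarkers points to a direction for its prognosis and diagnosis. This paper applies data mining methods to RNA gene expression data from multiple pancreatic cancer cases in order to identify biomarkers that may be useful for diagnosing the disease. Finally, verification by survival analysis suggests that four biomarkers, NDC80, CDC20, CCNB1, and KIF11, may play a role in relieving pain and reducing the degree of disease deterioration in the treatment of pancreatic cancer.
Keywords: pancreatic cancer; biomarker; gene expression; limma; Kaplan-Meier; data mining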
DOI:10.19850/j.cnki.2096-4706.2023.05.029
CLC number: TP391  Document code: A  Article ID: 2096-4706(2023)05-0120-04
Application of Data Mining in Pancreatic Adenocarcinoma
XIA Wentao, WANG Yun, YAN Xinping
(School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China)
Abstract: Pancreatic adenocarcinoma (PAAD) is a malignant tumor of the pancreas characterized by insidious onset, difficult early diagnosis, rapid progression, and short survival time. It is one of the malignant tumors with the worst prognosis and is known as the "king of cancers". Its pathogenic factors remain unclear, but the discovery of biomarkers points to a direction for its prognosis and diagnosis. In this paper, data mining methods are used to analyze RNA gene expression data from multiple patients with pancreatic adenocarcinoma and to mine biomarkers that may be used for its diagnosis. Finally, verification by survival analysis suggests that the four biomarkers NDC80, CDC20, CCNB1, and KIF11 may play a role in reducing pain and the degree of disease deterioration in the treatment of pancreatic adenocarcinoma.
Keywords: Pancreatic Adenocarcinoma; biomarker; gene expression; limma; Kaplan-Meier; data mining
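About the authors: XIA Wentao (b. 1996), male, Han ethnicity, from Yixing, Jiangsu, master's student; main research interests: data mining and biological big data processing. Corresponding author: WANG Yun (b. 1992), female, Han ethnicity, from Jingdezhen, Jiangxi, teaching assistant; main research interests: statistical theory and applied research.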