
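Computer Technology, 2020, Issue 4

Bioinformatics Analysis Design Based on R Software and Databases
ZHANG Jie, LI Mengting
(School of Life Sciences, Xuzhou Medical University, Xuzhou, Jiangsu 221004)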

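Abstract: The gene chip dataset with accession number GSE41439 in the NCBI Gene Expression Omnibus (GEO) database was selected for analysis. First, differentially expressed genes were screened with R software and plotted as a clustering heat map. The differentially expressed genes were then uploaded to the DAVID database for GO functional and KEGG pathway enrichment analysis. Next, a protein-protein interaction network was constructed with the STRING database and visualized with Cytoscape so that the relationships between proteins could be observed directly. Four key genes, PIK3R1, GNAS, GNAL and GNG4, were screened out from the protein interaction network and merit more in-depth discussion. The method is applicable to studies of many kinds of gene chips and generalizes well; applied to disease-related gene chips, it may provide some help for medical diagnosis and precision treatment.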


Keywords: bioinformatics; R software; DAVID database; STRING database; Cytoscape



CLC number: R319         Document code: A         Article ID: 2096-4706(2020)04-0076-04


Bioinformatics Analysis and Design Based on R-studio and Databases

ZHANG Jie, LI Mengting

(School of Life Sciences, Xuzhou Medical University, Xuzhou 221004, China)

Abstract: The gene chip dataset with accession number GSE41439 in the NCBI Gene Expression Omnibus (GEO) database was selected for analysis. First, differentially expressed genes were screened with R software and plotted as a clustering heat map. The differentially expressed genes were then uploaded to the DAVID database for GO functional and KEGG pathway enrichment analysis. Next, a protein-protein interaction network was constructed with the STRING database and visualized with Cytoscape so that the relationships between proteins could be observed directly. Four key genes, PIK3R1, GNAS, GNAL and GNG4, were screened out from the protein interaction network and merit further discussion. The method is applicable to studies of many kinds of gene chips and generalizes well; applied to disease-related gene chips, it may provide some help for medical diagnosis and precision treatment.

Keywords: bioinformatics; R software; DAVID database; STRING database; Cytoscape
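
The abstract does not say which criterion was used to pick the four key genes from the protein interaction network. One common choice is to rank nodes by degree, that is, by the number of distinct interaction partners. The following is a minimal sketch under that assumption only. The function name `rank_by_degree` and the toy edge list are hypothetical, and the gene names are placeholders rather than data from the study.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Rank nodes of an undirected network by number of distinct partners.

    `edges` is an iterable of (node_a, node_b) pairs. Returns a list of
    (node, degree) tuples sorted by descending degree, then by name.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)  # undirected: record both directions
    return sorted(
        ((node, len(partners)) for node, partners in neighbors.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical toy network; placeholder names, not results from the paper.
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_B", "GENE_A"),  # duplicate of the first edge, counted once
    ("GENE_D", "GENE_E"),
]

ranking = rank_by_degree(toy_edges)
top_hubs = [name for name, _ in ranking[:2]]  # keep the two best-connected nodes
print(top_hubs)
```

In practice the edge list would come from an interaction table exported from STRING, and the cutoff for how many top-ranked nodes count as key genes is a judgment call that the abstract does not specify.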


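Funding: Jiangsu Province College Students' Innovation and Entrepreneurship Training Program, General Project (201910313070Y); project supported by the National Demonstration Center for Experimental Basic Medical Science Education (Xuzhou Medical University)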




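About the author: ZHANG Jie (born October 1998), female, Han ethnicity, from Huai'an, Jiangsu; undergraduate student; research interest: bioinformatics.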