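Abstract: To improve the robustness and accuracy of named entity recognition in Chinese electronic medical records, this paper proposes a recognition model that integrates adversarial training into a BERT-based architecture. The method uses a pre-trained BERT model to dynamically generate character vectors, produces perturbations through adversarial training, and adds the perturbations to the character vectors to form adversarial examples. An iterated dilated convolutional neural network (IDCNN) then captures the dependencies between words in a sentence, and a conditional random field (CRF) produces the final predictions. Experiments on the CCKS 2019 dataset show that the model reaches an F1 score of 83.19%, demonstrating its effectiveness.
Keywords: named entity recognition; Chinese electronic medical record; BERT; adversarial training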
DOI:10.19850/j.cnki.2096-4706.2023.02.022
Funding: 2021 Anhui Provincial Key Research and Development Program (202104d07020010)
CLC number: TP391.1  Document code: A  Article ID: 2096-4706(2023)02-0090-04
Named Entity Recognition of Chinese Electronic Medical Records Integrated with Adversarial Training
LI Manyu, YU Li
(School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China)
Abstract: To improve the robustness and accuracy of named entity recognition in Chinese electronic medical records, a recognition model that integrates adversarial training into a BERT-based architecture is proposed. The method uses the pre-trained BERT model to dynamically generate character vectors, generates perturbations through adversarial training, adds the perturbations to the character vectors to form adversarial examples, and then captures the dependencies between words in the sentence through an Iterated Dilated Convolutional Neural Network (IDCNN). Finally, the prediction result is obtained by a Conditional Random Field (CRF). Experiments on the CCKS 2019 dataset show that the model reaches an F1 score of 83.19%, which demonstrates its effectiveness.
Keywords: named entity recognition; Chinese electronic medical record; BERT; adversarial training
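The perturbation step described in the abstract (add a perturbation to the character vectors to obtain an adversarial example) can be illustrated with a small sketch. The abstract does not state which adversarial training variant is used, so this sketch assumes a fast-gradient-method style update, r = epsilon * g / ||g||_2, where g is the gradient of the loss with respect to the character vectors. The function names, the `epsilon` value, the toy numbers, and the choice to normalize over the whole sentence are all illustrative assumptions, not details taken from the paper.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Scale a gradient matrix (one row per character) to L2 norm epsilon.

    Assumption: the norm is taken over the whole sentence, not per character.
    """
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    if norm == 0.0:
        # No gradient signal: return a zero perturbation instead of dividing by zero.
        return [[0.0] * len(row) for row in grad]
    scale = epsilon / norm
    return [[scale * g for g in row] for row in grad]


def make_adversarial(embeddings, grad, epsilon=1.0):
    """Add the perturbation to the character vectors element-wise."""
    r = fgm_perturbation(grad, epsilon)
    return [
        [e + d for e, d in zip(e_row, r_row)]
        for e_row, r_row in zip(embeddings, r)
    ]


# Toy example: two characters with 2-dimensional vectors (hypothetical values).
embeddings = [[0.5, -0.2], [0.1, 0.4]]
grad = [[3.0, 0.0], [0.0, 4.0]]  # stand-in for d(loss)/d(embeddings)
adversarial = make_adversarial(embeddings, grad, epsilon=0.5)
```

In a full training loop, the gradient would come from back-propagating the sequence-labeling loss through the network, and the loss on the adversarial example would be added to the loss on the clean input before the parameter update.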
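About the authors: LI Manyu (born 1997), female, Han ethnicity, from Bengbu, Anhui; master's student; research interest: natural language processing. Corresponding author: YU Li (born 1973), female, Han ethnicity, from Suzhou, Anhui; professor, Ph.D.; research interests: blockchain and image processing.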