(School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001)

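Abstract: Adversarial samples pose a security threat to many applications in natural language processing, and research on adversarial attack methods helps evaluate and even improve the robustness of deep neural network models. When generating adversarial samples, existing word-level textual adversarial attacks rely on scoring and ranking word importance. This process is inefficient because it requires frequent queries to the target model to obtain the importance scores. To address this problem, this paper computes word importance scores with a trained substitute model and combines them with the differences in the target model's decision probabilities, obtained after stratified sampling by semantic similarity, to rank the words of the original input. Experimental results on text classification tasks demonstrate the effectiveness of the method.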



CLC number: TP391                                          Document code: A                                 Article ID: 2096-4706(2022)17-0078-04

Research on Adversarial Text Generation Based on Word Replacement

WANG Xiaojuan

(School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China)

Abstract: The existence of adversarial samples poses a security threat to many applications in the field of natural language processing. Research on adversarial attack methods can help evaluate and even improve the robustness of deep neural network models. Existing word-level text adversarial attacks rely on word importance scoring and ranking when generating adversarial samples, but they are inefficient because they require frequent queries to the target model to obtain importance scores. To address this problem, this paper proposes ranking the words in the original input by combining word importance scores computed with a trained substitute model and the differences in the target model's decision probabilities obtained after stratified sampling by semantic similarity. Experimental results on text classification tasks demonstrate the effectiveness of the method.

Keywords: text adversarial attack; black-box attack; deep neural network; natural language processing
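The abstract's efficiency argument can be made concrete with the usual query-based baseline that it criticizes. The sketch below shows leave-one-out (deletion) word importance with a counter on target-model queries. It is a hedged illustration, not the method proposed in this paper, because the paper's substitute-model scoring and semantic-similarity stratified sampling are not specified here. The names `QueryCounter`, `deletion_importance` and `toy_sentiment` are hypothetical, and the toy classifier only stands in for a real black-box target model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical black-box interface (an assumption): tokens in, class probabilities out.
Classifier = Callable[[Sequence[str]], List[float]]


class QueryCounter:
    """Wraps a classifier and counts how many times it is queried."""

    def __init__(self, model: Classifier) -> None:
        self.model = model
        self.queries = 0

    def __call__(self, tokens: Sequence[str]) -> List[float]:
        self.queries += 1
        return self.model(tokens)


def deletion_importance(
    model: Classifier, tokens: Sequence[str], label: int
) -> List[Tuple[int, float]]:
    """Leave-one-out importance: the drop in P(label) when word i is deleted.

    Returns (index, score) pairs sorted from most to least important.
    Costs one query for the full input plus one query per word.
    """
    base = model(tokens)[label]
    scores = []
    for i in range(len(tokens)):
        reduced = list(tokens[:i]) + list(tokens[i + 1:])
        scores.append((i, base - model(reduced)[label]))
    # Stable sort: ties keep their original word order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores


def toy_sentiment(tokens: Sequence[str]) -> List[float]:
    """Toy stand-in for a target model: P(positive) grows with positive words."""
    positive = {"great", "good", "excellent"}
    hits = sum(t in positive for t in tokens)
    p_pos = min(0.5 + 0.2 * hits, 0.99)
    return [1.0 - p_pos, p_pos]


if __name__ == "__main__":
    target = QueryCounter(toy_sentiment)
    sentence = "the movie was great and the acting excellent".split()
    ranking = deletion_importance(target, sentence, label=1)
    print([sentence[i] for i, _ in ranking[:2]])  # the two most important words
    print(target.queries)  # len(sentence) + 1 queries
```

For an n-word input this baseline spends n + 1 target-model queries on each sample before any word is replaced, which is the overhead the abstract attributes to existing attacks. Scoring words with a locally trained substitute model instead moves most of that work off the target model.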


