DOI:10.19850/j.cnki.2096-4706.2023.07.006
Funding: Tianjin Research Innovation Project for Postgraduate Students (2021YJSS279)
CLC number: TP274; TP391.1  Document code: A  Article ID: 2096-4706(2023)07-0024-04
Research on the Design of a Chinese Text Visualization System Based on R-Shiny
GAN Yating, AN Jianye, MIAO Luxin
(Tianjin University of Commerce, Tianjin 300134, China)
Abstract: Using the Shiny framework for the R language, this paper builds an interactive online visualization system for Chinese text that performs preliminary processing and visual analysis of Chinese text data. First, it introduces the theory of Chinese text visualization and how interactive applications are built with Shiny. Second, based on an analysis of user requirements, it sets out the system's design objectives and principles, specifies the system functions and the Shiny-based interactive interface, and designs the implementation of the related modules. Finally, it concludes that the system can carry out preliminary text processing, including word segmentation, cleaning, and vectorization. It can also draw bar charts of word frequency and TF-IDF, word clouds, and dynamic charts based on time series and geographic location, among other graphics.
Keywords: text visualization; system design; R language; Shiny
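The abstract names cleaning, word-frequency counting, and TF-IDF weighting as the steps behind the system's bar charts and word clouds. The paper implements these in R and does not say which packages or which TF-IDF variant it uses. The Python sketch below only illustrates the computation. It makes three assumptions: the text has already been segmented into tokens, a small hypothetical stop-word list is used, and weighting follows the common tf × ln(N/df) form.

```python
import math
from collections import Counter

# Hypothetical stop-word list for illustration; the paper does not publish its own.
STOPWORDS = {"的", "了", "和", "是"}


def clean(tokens):
    """Drop stop words and tokens with no letters or digits (e.g. punctuation)."""
    return [t for t in tokens if t not in STOPWORDS and any(ch.isalnum() for ch in t)]


def word_freq(docs):
    """Corpus-wide word counts, the input to a word-frequency bar chart or word cloud."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def tf_idf(docs):
    """Per-document TF-IDF, assuming tf = count / doc length and idf = ln(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({w: (c / total) * math.log(n / df[w]) for w, c in counts.items()})
    return scores


# Toy documents assumed to be segmented already
# (the segmentation step itself is outside this sketch).
raw = [
    ["文本", "可视化", "的", "系统", "，"],
    ["文本", "挖掘", "和", "统计", "建模"],
    ["中文", "文本", "可视化"],
]
docs = [clean(d) for d in raw]
scores = tf_idf(docs)

print(word_freq(docs).most_common(2))  # [('文本', 3), ('可视化', 2)]
for i, s in enumerate(scores):
    print(i, max(s, key=s.get))  # highest-weighted word in each document
```

Here 文本 tops the frequency count, but it occurs in every document. Its weight is therefore ln(1) = 0, and it drops to the bottom of each TF-IDF ranking. This shows why a frequency chart and a TF-IDF chart can rank the same words differently.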
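About the authors: GAN Yating (born 1997), female, Han, from Jiujiang, Jiangxi; graduate student; research interest: short text classification. AN Jianye (born 1969), male, Han, from Ulanqab, Inner Mongolia; professor, master's degree; research interests: text mining, statistical modeling, and statistical pattern recognition. MIAO Luxin (born 2000), female, Han, from Changzhi, Shanxi; graduate student; research interest: Chinese text visualization.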