IESRL: An information extraction system for research level
LENG Fuhai; BAI Rujiang; ZHU Qingsong
2013-12-25
Journal: Chinese Journal of Library and Information Science
ISSN: 1674-3393
Volume: 6; Issue: 4; Pages: 16-27
Abstract
Purpose: To annotate the semantic information of research papers and extract information about their research level, we sought a method for building an information extraction system.

Design/methodology/approach: A semantic dictionary and a conditional random field model (CRFM) were used to annotate the semantic information of research papers. Based on the annotation results, research-level information was extracted with regular expressions. All functions were implemented on the Sybase platform.
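One common way to combine a semantic dictionary with a CRF is to pass each token's dictionary class to the CRF as an extra feature alongside ordinary lexical features. The sketch below illustrates that idea under stated assumptions. The dictionary entries, the class labels (MATERIAL, PROPERTY, TREND) and the feature names are all hypothetical, because the abstract does not describe IESRL's actual feature set.

```python
# Hypothetical semantic dictionary: lower-cased word -> semantic class.
# The real IESRL dictionary and its class labels are not given in the abstract.
SEMANTIC_DICT = {
    "nanotube": "MATERIAL",
    "nanotubes": "MATERIAL",
    "conductivity": "PROPERTY",
    "strength": "PROPERTY",
    "reached": "TREND",
    "increased": "TREND",
}


def word2features(tokens: list[str], i: int) -> dict[str, object]:
    """Build a feature dict for token i (one dict per token, CRFsuite style).

    Lexical features plus a semantic-class feature looked up in the
    hypothetical dictionary; "O" marks words the dictionary does not list.
    """
    word = tokens[i]
    features: dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),
        "word.istitle": word.istitle(),
        "suffix3": word[-3:].lower(),
        "sem": SEMANTIC_DICT.get(word.lower(), "O"),
    }
    # Context window of one token on each side, including neighbours' classes.
    if i > 0:
        prev = tokens[i - 1]
        features["-1:word.lower"] = prev.lower()
        features["-1:sem"] = SEMANTIC_DICT.get(prev.lower(), "O")
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        nxt = tokens[i + 1]
        features["+1:word.lower"] = nxt.lower()
        features["+1:sem"] = SEMANTIC_DICT.get(nxt.lower(), "O")
    else:
        features["EOS"] = True  # end of sentence
    return features


def sent2features(tokens: list[str]) -> list[dict[str, object]]:
    """Feature dicts for every token in a tokenized sentence."""
    return [word2features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sent = "The conductivity of carbon nanotubes reached 3000 W/mK".split()
    for tok, feats in zip(sent, sent2features(sent)):
        print(tok, feats["sem"])
```

Lists of such dicts have the shape that CRFsuite wrappers such as sklearn-crfsuite accept (`CRF().fit(X, y)`, where `X` is a list of `sent2features` outputs and `y` holds the matching label sequences). The abstract does not say whether IESRL used such a toolkit.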

Findings: In our experiment on carbon nanotube research papers, precision and recall reached 65.13% and 57.75%, respectively, after the semantic properties of word classes were labeled. Adding semantic features raised the F-measure sharply, from below 50% to 60.18%. The experiment also showed that the information extraction system for research level (IESRL) can extract performance indicators from research papers quickly and effectively.
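The F-measure is conventionally the harmonic mean of precision and recall, F1 = 2PR/(P + R), or more generally F_beta = (1 + beta^2)PR/(beta^2 P + R). The minimal sketch below assumes that balanced definition. The abstract does not state which variant or averaging scheme it used. Applied to the reported 65.13% and 57.75%, the formula gives about 61.22%, slightly above the reported 60.18%. The paper's figure may therefore have been computed differently (for example, averaged per category), which only the full text can confirm.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    Inputs and output share the same scale (fractions or percentages).
    beta=1 gives the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract, in percent.
    print(round(f_measure(65.13, 57.75), 2))  # 61.22
```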

Research limitations: Some information, such as formatting and charts, may have been lost when the papers were converted from PDF to TXT files. Semantic labeling of sentences may be insufficient because entries in the semantic dictionary can carry several meanings.

Research implications: The system can help researchers quickly compare the research level of different papers and identify their implicit innovation value. It could also serve as an auxiliary tool for analyzing the research level of research institutions.

Originality/value: We built an information extraction system for research papers using a revised semantic annotation method based on CRFM and the semantic dictionary. The system handles information extraction at two levels: the sentence level and the noun (phrase) level. Compared with extraction methods based purely on knowledge engineering or purely on machine learning, it combines the advantages of both.
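The two-level analysis and the regular-expression extraction described above can be sketched as follows. This is a minimal illustration, not IESRL's implementation. It assumes a hypothetical bracketed annotation format (`[CLASS phrase]`) and hypothetical class names (PROPERTY, VALUE, MATERIAL, METHOD), and the example sentences are invented. At the sentence level, the code keeps only sentences labeled with both a PROPERTY and a VALUE. At the noun-phrase level, it pairs each property phrase with the nearest following value and splits that value into a number and a unit.

```python
import re

# Hypothetical annotation format assumed for this sketch: the labeling step
# wraps each labeled phrase as [CLASS phrase], e.g. [PROPERTY tensile strength].
SPAN = re.compile(r"\[(?P<cls>[A-Z]+) (?P<text>[^\]]+)\]")

# Noun-phrase level: a PROPERTY phrase followed (non-greedily) by the nearest VALUE.
PROP_VALUE = re.compile(r"\[PROPERTY (?P<prop>[^\]]+)\].*?\[VALUE (?P<value>[^\]]+)\]")

# Split a VALUE phrase such as "63 GPa" into its number and unit.
NUM_UNIT = re.compile(r"(?P<num>\d+(?:\.\d+)?)\s*(?P<unit>.*)")


def is_indicator_sentence(sentence: str) -> bool:
    """Sentence level: keep sentences labeled with both PROPERTY and VALUE."""
    classes = {m.group("cls") for m in SPAN.finditer(sentence)}
    return {"PROPERTY", "VALUE"} <= classes


def extract_indicators(sentences: list[str]) -> list[tuple[str, float, str]]:
    """Return (property, number, unit) triples from indicator sentences."""
    results = []
    for sentence in filter(is_indicator_sentence, sentences):
        for m in PROP_VALUE.finditer(sentence):
            nu = NUM_UNIT.match(m.group("value"))
            if nu:
                results.append(
                    (m.group("prop"), float(nu.group("num")), nu.group("unit").strip())
                )
    return results


if __name__ == "__main__":
    # Invented, already-annotated example sentences.
    annotated = [
        "The [PROPERTY tensile strength] of the [MATERIAL nanotubes] reached [VALUE 63 GPa].",
        "[MATERIAL Carbon nanotubes] were grown by [METHOD chemical vapor deposition].",
    ]
    print(extract_indicators(annotated))
    # [('tensile strength', 63.0, 'GPa')]
```

The number pattern is deliberately simple. It does not handle thousands separators, ranges or scientific notation, which real papers would require.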
Keywords: Research papers; Information extraction; Semantic labeling; Regular expression; Conditional random fields; Research level
Subject area: Editing and publishing
Funding: This work is supported by the National Social Science Foundation of China (Grant No. 12CTQ032).
Document type: Journal article
Item identifier: http://ir.las.ac.cn/handle/12502/6703
Collection: Journal of Data and Information Science_Chinese Journal of Library and Information Science-2013
Corresponding author: Bai Rujiang (E-mail: bairj@mail.las.ac.cn)
Recommended citation
GB/T 7714
LENG Fuhai, BAI Rujiang, ZHU Qingsong. IESRL: An information extraction system for research level[J]. Chinese Journal of Library and Information Science, 2013, 6(4): 16-27.
APA Leng, F., Bai, R., & Zhu, Q. (2013). IESRL: An information extraction system for research level. Chinese Journal of Library and Information Science, 6(4), 16-27.
MLA Leng, Fuhai, et al. "IESRL: An information extraction system for research level." Chinese Journal of Library and Information Science 6.4 (2013): 16-27.
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.