National Science Library, Chinese Academy of Sciences Institutional Repository (NSL OpenIR)
Title: IESRL: An information extraction system for research level
Author: LENG Fuhai (冷伏海); BAI Rujiang; ZHU Qingsong
Source: Chinese Journal of Library and Information Science
Issued Date: 2013-12-25
Volume: 6, Issue: 4, Pages: 16-27
Keywords: Research papers; Information extraction; Semantic labeling; Regular expression; Conditional random fields; Research level
Subject: Editing and Publishing
Corresponding Author: Bai Rujiang (E-mail: bairj@mail.las.ac.cn)
Abstract:
Purpose: To annotate the semantic information in research papers and extract information about their research level, we sought a method for building an information extraction system.

Design/methodology/approach: A semantic dictionary and a conditional random field model (CRFM) were used to annotate the semantic information of research papers. Based on the annotation results, research level information was extracted with regular expressions. All functions were implemented on the Sybase platform.
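The abstract does not list the CRF features, but a common way to let a CRF use a semantic dictionary is to expose each token's dictionary class as a feature. The sketch below assumes that design; `SEMANTIC_DICT`, its entries and class names, and the functions `sem_class`, `token_features`, and `sentence_features` are hypothetical illustrations, not the authors' dictionary or code. It emits one feature dict per token, the input format that CRF toolkits such as sklearn-crfsuite accept.

```python
from typing import Dict, List, Union

FeatureValue = Union[str, bool, float]

# Hypothetical semantic dictionary (illustrative only): lowercased word -> semantic class.
SEMANTIC_DICT: Dict[str, str] = {
    "nanotube": "MATERIAL",
    "strength": "PROPERTY",
    "conductivity": "PROPERTY",
    "gpa": "UNIT",
}


def sem_class(word: str) -> str:
    """Look up a word's semantic class; 'O' means the word is not in the dictionary."""
    return SEMANTIC_DICT.get(word.lower(), "O")


def token_features(tokens: List[str], i: int) -> Dict[str, FeatureValue]:
    """Features for token i: surface cues plus dictionary classes of it and its neighbours."""
    word = tokens[i]
    feats: Dict[str, FeatureValue] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        # True for plain integers and decimals such as "1.2".
        "word.isnumber": word.replace(".", "", 1).isdigit(),
        # Knowledge-engineering signal taken from the semantic dictionary.
        "sem": sem_class(word),
    }
    if i > 0:
        feats["-1:sem"] = sem_class(tokens[i - 1])
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["+1:sem"] = sem_class(tokens[i + 1])
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, FeatureValue]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sample = ["The", "tensile", "strength", "reached", "1.2", "GPa"]
    for tok, f in zip(sample, sentence_features(sample)):
        print(tok, f["sem"])
```

For training, one such list per sentence, paired with its gold label sequence, would be handed to the CRF toolkit; the dictionary contributes hand-built knowledge while the CRF learns from data how much weight to give it in context.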

Findings: In our experiment on carbon nanotube research, precision and recall reached 65.13% and 57.75%, respectively, after the semantic properties of word classes were labeled, and the F-measure rose sharply from below 50% to 60.18% when semantic features were added. The experiment also showed that the information extraction system for research level (IESRL) can extract performance indicators from research papers rapidly and effectively.
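In its usual balanced form (F1), the F-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch of the general F_beta formula follows; the function name `f_measure` is an illustration. Plugging in the reported precision and recall (65.13% and 57.75%) gives about 61.2%, slightly above the reported 60.18%. The abstract does not say how its F-measure was computed (it may, for example, be averaged over labels or documents), so this checks the formula rather than reproducing the authors' evaluation.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F_beta: weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Precision and recall reported in the Findings paragraph.
    p, r = 0.6513, 0.5775
    print(f"F1 = {f_measure(p, r):.4f}")  # prints F1 = 0.6122
```

Setting `beta` above 1 weights recall more heavily, and below 1 weights precision more heavily.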

Research limitations: Some information, such as formatting and charts, may have been lost when the PDF files were converted to plain text (TXT). Semantic labeling of sentences may also be insufficient because many lexicon entries in the semantic dictionary carry multiple meanings.

Research implications: The system can help researchers quickly compare the research level of different papers and uncover their implicit innovation value. It could also serve as an auxiliary tool for analyzing the research level of different research institutions.

Originality/value: We built an information extraction system for research papers using a revised semantic annotation method based on the CRFM and the semantic dictionary. The system addresses information extraction at two levels of a research paper: the sentence level and the noun (phrase) level. Compared with extraction methods based purely on knowledge engineering or purely on machine learning, it combines the advantages of both.
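The Design/methodology paragraph states that research-level information is extracted from the annotated text with regular expressions, and this paragraph describes working at the sentence level and the noun (phrase) level. A minimal sketch of that two-step idea follows, assuming a hypothetical inline annotation format `[LABEL text]` with the labels `PROP` (property) and `VAL` (number plus unit): sentences are first filtered by the labels they contain, then each `PROP` span is paired with the next `VAL` span in the same sentence. The format, labels, function names, and the example sentences (with invented values) are assumptions for illustration only, not the authors' patterns.

```python
import re
from typing import List, Tuple

# Hypothetical annotation format (not from the paper): each labeled span is
# written inline as [LABEL text], e.g. [PROP tensile strength] or [VAL 1.2 GPa].
SPAN = re.compile(r"\[(?P<label>[A-Z]+) (?P<text>[^\]]*)\]")
# A value span: a number (optional decimal part and exponent) followed by a unit.
VALUE = re.compile(r"(?P<num>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*(?P<unit>.*)")
# Sentence boundary: ., ! or ? followed by whitespace (so "1.2" is not split).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

Indicator = Tuple[str, float, str]


def candidate_sentences(text: str) -> List[str]:
    """Sentence level: keep sentences that carry both a PROP and a VAL label."""
    keep = []
    for sentence in SENTENCE_END.split(text):
        labels = {m.group("label") for m in SPAN.finditer(sentence)}
        if {"PROP", "VAL"} <= labels:
            keep.append(sentence)
    return keep


def extract_indicators(text: str) -> List[Indicator]:
    """Phrase level: pair each PROP span with the next VAL span in its sentence."""
    results: List[Indicator] = []
    for sentence in candidate_sentences(text):
        prop = None
        for m in SPAN.finditer(sentence):
            label, span_text = m.group("label"), m.group("text").strip()
            if label == "PROP":
                prop = span_text
            elif label == "VAL" and prop is not None:
                v = VALUE.fullmatch(span_text)
                if v:
                    results.append((prop, float(v.group("num")), v.group("unit").strip()))
                    prop = None
    return results


if __name__ == "__main__":
    doc = (
        "The [PROP tensile strength] of the [MAT CNT yarn] reached [VAL 1.2 GPa]. "
        "Samples were annealed at high temperature. "
        "Its [PROP electrical conductivity] was [VAL 4.5e4 S/m]."
    )
    print(extract_indicators(doc))
    # prints [('tensile strength', 1.2, 'GPa'), ('electrical conductivity', 45000.0, 'S/m')]
```

Pairing each property with the nearest following value is deliberately simple; papers that report several values per property, or state the value before the property, would need additional patterns.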
Content Type: Journal article
URI: http://ir.las.ac.cn/handle/12502/6703
Appears in Collections: Chinese Journal of Library and Information Science-2013, Journal articles

Files in This Item:
Fuhai LENG.pdf (4090 KB), Adobe PDF, Open access

Recommended Citation:
LENG Fuhai, BAI Rujiang, ZHU Qingsong. IESRL: An information extraction system for research level[J]. Chinese Journal of Library and Information Science, 2013, 6(4): 16-27.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

Copyright © 2007-2017 National Science Library, Chinese Academy of Sciences