A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications
Qiuzi Zhang; Qikai Cheng; Yong Huang; Wei Lu (E-mail: weilu@whu.edu.cn)
2016-03-17
Journal: Journal of Data and Information Science
Volume: 9, Issue: 1, Pages: 69-85
Abstract
Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts.

Design/methodology/approach: The method for data-usage statements extraction starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list based on their calculated score. Three seed-selection strategies are also proposed in this paper.
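
The iterative loop described above can be illustrated with a minimal sketch. It assumes each sentence has already been reduced to a (context, entity) pair, and it uses a hypothetical score (the share of a context's matches that are already-known entities); the names `bootstrap`, `min_score`, and `max_iterations` are illustrative and not taken from the paper, whose actual sentence representation and scoring function are not given in this abstract.

```python
from collections import defaultdict


def bootstrap(sentences, seeds, max_iterations=5, min_score=0.5):
    """Toy bootstrapping loop over (context, entity) pairs.

    Assumption: each sentence is pre-reduced to a context string and one
    candidate entity; the paper's own representation and scoring may differ.
    """
    entities = set(seeds)
    patterns = set()
    for _ in range(max_iterations):
        total = defaultdict(int)  # occurrences of each context
        hits = defaultdict(int)   # occurrences paired with a known entity
        for context, entity in sentences:
            total[context] += 1
            if entity in entities:
                hits[context] += 1
        # Hypothetical score: fraction of a context's matches that are known.
        accepted = {c for c in total if hits[c] / total[c] >= min_score}
        # Apply every accepted pattern to pull out candidate entities.
        extracted = {e for c, e in sentences if c in patterns | accepted}
        if accepted <= patterns and extracted <= entities:
            break  # nothing new was learned in this iteration
        patterns |= accepted
        entities |= extracted
    return patterns, entities


# Made-up example data, for illustration only.
sentences = [
    ("we evaluate on", "MNIST"),
    ("we evaluate on", "CIFAR-10"),
    ("experiments were run on", "CIFAR-10"),
    ("experiments were run on", "ImageNet"),
    ("we thank", "the reviewers"),
]
patterns, entities = bootstrap(sentences, seeds={"MNIST"})
```

With this toy input, the seed first promotes the context "we evaluate on", which yields a second entity; that entity in turn promotes "experiments were run on", while "we thank" never reaches the threshold.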

Findings: The performance of the method is verified by means of experiments on real data collected from computer science journals. The results show that the method can achieve satisfactory performance regarding precision of extraction and extensibility of obtained patterns.

Research limitations: While the triple representation of sentences is effective and efficient for extracting data-usage statements, it is unable to handle complex sentences. Additional features that can address complex sentences should thus be explored in the future.

Practical implications: Data-usage statements extraction is beneficial for data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.

Originality/value: To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
Article type: Research Papers
Keywords: Data-usage Statements Extraction; Information Extraction; Bootstrapping; Unsupervised Learning; Academic Text-mining
Subject area: Journalism and Communication; Library, Information and Documentation Science
DOI: 10.20309/jdis.201606
Indexed by: Other
Project number: Grant No.: 71473183
Language: English
Funding: This work was supported by the National Natural Science Foundation of China
Document type: Journal article
Item identifier: http://ir.las.ac.cn/handle/12502/8479
Collection: Journal of Data and Information Science_Journal of Data and Information Science-2016
Corresponding author: Wei Lu (E-mail: weilu@whu.edu.cn)
Affiliation: School of Information Management, Wuhan University, Wuhan 430072, China
Recommended citation
GB/T 7714: Qiuzi Zhang, Qikai Cheng, Yong Huang, et al. A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications[J]. Journal of Data and Information Science, 2016, 9(1): 69-85.
APA: Qiuzi Zhang, Qikai Cheng, Yong Huang, & Wei Lu. (2016). A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications. Journal of Data and Information Science, 9(1), 69-85.
MLA: Qiuzi Zhang, et al. "A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications." Journal of Data and Information Science 9.1 (2016): 69-85.
Files in this item
File: 20160106.pdf (1379KB); Document type: Journal article; Version: Published version; Access: Open access; License: CC BY-NC-ND