A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications
Qiuzi Zhang; Qikai Cheng; Yong Huang; Wei Lu (E-mail: weilu@whu.edu.cn)
2016-03-17
Source Publication: Journal of Data and Information Science
Volume: 9; Issue: 1; Pages: 69-85
Abstract
Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts.

Design/methodology/approach: The method for data-usage statements extraction starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list based on their calculated score. Three seed-selection strategies are also proposed in this paper.
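
The iterative loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the score here is assumed to be the fraction of a pattern's matched objects that are already known entities. The toy triples, the `bootstrap` function, and the `threshold` and `max_iters` parameters are all hypothetical.

```python
from collections import defaultdict

# Toy corpus, invented for illustration: each sentence is assumed to be
# already reduced to a (subject, verb, object) triple.
TRIPLES = [
    ("we", "evaluate on", "MNIST"),
    ("we", "evaluate on", "CIFAR-10"),
    ("we", "train on", "CIFAR-10"),
    ("we", "train on", "ImageNet"),
    ("we", "thank", "reviewers"),
    ("we", "propose", "method"),
]


def bootstrap(triples, seeds, threshold=0.5, max_iters=5):
    """Grow patterns and entities from seed entities (assumed scoring rule)."""
    entities = set(seeds)
    patterns = set()

    # A candidate pattern is the (subject, verb) part of a triple;
    # collect every object each candidate pattern matches.
    matches = defaultdict(set)
    for subj, verb, obj in triples:
        matches[(subj, verb)].add(obj)

    for _ in range(max_iters):
        # Score each unaccepted pattern against the entities known so far.
        new_patterns = set()
        for pattern, objs in matches.items():
            if pattern in patterns:
                continue
            score = len(objs & entities) / len(objs)  # assumed score
            if score >= threshold:
                new_patterns.add(pattern)

        if not new_patterns:
            break  # nothing new was learned, so stop iterating

        # Accept the new patterns and harvest the entities they match.
        patterns |= new_patterns
        for pattern in new_patterns:
            entities |= matches[pattern]

    # Data-usage statements are the triples matched by an accepted pattern.
    statements = [t for t in triples if (t[0], t[1]) in patterns]
    return patterns, entities, statements


patterns, entities, statements = bootstrap(TRIPLES, seeds={"MNIST"})
```

Starting from the single seed `MNIST`, the first pass accepts the "evaluate on" pattern, which adds `CIFAR-10` to the entity set; that in turn lets the "train on" pattern pass the threshold in the second pass. The unrelated "thank" and "propose" triples never score above zero and are left out.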

Findings: The performance of the method is verified by means of experiments on real data collected from computer science journals. The results show that the method can achieve satisfactory performance regarding precision of extraction and extensibility of obtained patterns.

Research limitations: While the triple representation of sentences is effective and efficient for extracting data-usage statements, it is unable to handle complex sentences. Additional features that can address complex sentences should thus be explored in the future.

Practical implications: Data-usage statements extraction is beneficial for data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.

Originality/value: To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
Subtype: Research Papers
Keywords: Data-usage Statements Extraction; Information Extraction; Bootstrapping; Unsupervised Learning; Academic Text-mining
Subject Area: Journalism and Communication; Library, Information and Documentation Science
DOI: 10.20309/jdis.201606
Indexed By: Other
Project Number: Grant No.: 71473183
Language: English
Funding Project: This work was supported by the National Natural Science Foundation of China
Document Type: Journal article
Identifier: http://ir.las.ac.cn/handle/12502/8479
Collection: Journal of Data and Information Science_Journal of Data and Information Science-2016
Corresponding Author: Wei Lu (E-mail: weilu@whu.edu.cn)
Affiliation: School of Information Management, Wuhan University, Wuhan 430072, China
Recommended Citation
GB/T 7714: Qiuzi Zhang, Qikai Cheng, Yong Huang, et al. A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications[J]. Journal of Data and Information Science, 2016, 9(1): 69-85.
APA: Qiuzi Zhang, Qikai Cheng, Yong Huang, & Wei Lu. (2016). A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications. Journal of Data and Information Science, 9(1), 69-85.
MLA: Qiuzi Zhang, et al. "A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications." Journal of Data and Information Science 9.1 (2016): 69-85.
File: 20160106.pdf (1379 KB); document type: journal article; version: published version; access: open access; license: CC BY-NC-ND
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.