Institutional Repository of the National Science Library, Chinese Academy of Sciences
Title: A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications
Author: Qiuzi Zhang; Qikai Cheng; Yong Huang; Wei Lu
Source: Journal of Data and Information Science
Issued Date: 2016-03-17
Volume: 9, Issue:1, Pages:69-85
Keyword: Data-usage statements extraction ; Information extraction ; Bootstrapping ; Unsupervised learning ; Academic text-mining
Subject: Journalism and Communication; Library, Information and Documentation Science
Indexed Type: Other
DOI: 10.20309/jdis.201606
Corresponding Author: Wei Lu (E-mail: weilu@whu.edu.cn).
DOC Type: Research Papers
Abstract:
Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts.

Design/methodology/approach: The method for data-usage statements extraction starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list based on their calculated score. Three seed-selection strategies are also proposed in this paper.
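
The iterative loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the two-word left-context pattern, the precision-style score, the threshold `min_score`, and the sample sentences are all assumptions chosen to keep the example small, and the paper's triple representation of sentences is not modeled here.

```python
def tokenize(sentence):
    """Split on whitespace and strip trailing punctuation (a deliberately naive tokenizer)."""
    return [t.strip(".,;:") for t in sentence.split()]


def extract(pattern, sentences):
    """Return (entity, sentence) pairs for each token that follows the two-word pattern."""
    hits = []
    for s in sentences:
        toks = tokenize(s)
        for i in range(2, len(toks)):
            if (toks[i - 2].lower(), toks[i - 1].lower()) == pattern:
                hits.append((toks[i], s))
    return hits


def bootstrap(sentences, seeds, iterations=2, min_score=0.5):
    """Toy bootstrapping: seeds -> patterns -> new entities and statements -> repeat."""
    entities, patterns, statements = set(seeds), set(), set()
    for _ in range(iterations):
        known = set(entities)  # snapshot used for scoring in this iteration
        # Build candidate patterns from the two words preceding a known entity.
        candidates = set()
        for s in sentences:
            toks = tokenize(s)
            for i in range(2, len(toks)):
                if toks[i] in known:
                    candidates.add((toks[i - 2].lower(), toks[i - 1].lower()))
        # Score each new candidate (assumed score: share of its extractions
        # that are already known) and keep those reaching the threshold.
        for pat in candidates - patterns:
            hits = extract(pat, sentences)
            found = {e for e, _ in hits}
            score = len(found & known) / len(found)
            if score >= min_score:
                patterns.add(pat)
                entities |= found
                statements |= {s for _, s in hits}
    return entities, patterns, statements


# Hypothetical unlabeled text and a single seed entity.
sentences = [
    "We evaluate on the MNIST dataset.",
    "We evaluate on the CIFAR-10 dataset.",
    "Experiments use the CIFAR-10 corpus.",
    "Experiments use the Reuters corpus.",
    "We thank the reviewers.",
]
entities, patterns, statements = bootstrap(sentences, seeds={"MNIST"})
```

In this toy run the seed yields the pattern ("on", "the"), which brings in a second entity; that entity in turn yields the pattern ("use", "the") in the next iteration. The sentence about reviewers is never matched because no accepted pattern precedes its final token.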

Findings: The performance of the method is verified by means of experiments on real data collected from computer science journals. The results show that the method can achieve satisfactory performance regarding precision of extraction and extensibility of obtained patterns.

Research limitations: While the triple representation of sentences is effective and efficient for extracting data-usage statements, it is unable to handle complex sentences. Additional features that can address complex sentences should thus be explored in the future.

Practical implications: Data-usage statements extraction is beneficial for data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.

Originality/value: To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
Project Number: 71473183
Project: This work was supported by the National Natural Science Foundation of China
Language: English
Content Type: Journal article
URI: http://ir.las.ac.cn/handle/12502/8479
Appears in Collections: Journal of Data and Information Science_Journal of Data and Information Science-2016_Journal articles

Files in This Item:
File Name: 20160106.pdf
File Size: 1379KB
Content Type: Journal article
Version: Published version
Access: Open access

Institution: School of Information Management, Wuhan University, Wuhan 430072, China

Recommended Citation:
Qiuzi Zhang, Qikai Cheng, Yong Huang, et al. A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications[J]. Journal of Data and Information Science, 2016, 9(1): 69-85.