Institutional Repository of the National Science Library, Chinese Academy of Sciences
Title: Identifying Scientific Project-generated Data Citation from Full-text Articles: An Investigation of TCGA Data Citation
Author: Jiao Li; Si Zheng; Hongyu Kang; Zhen Hou; Qing Qian
Source: Journal of Data and Information Science
Issued Date: 2016-06-17
Volume: 1, Issue: 2, Pages: 32-44
Keyword: Scientific data ; Full-text literature ; Open access ; PubMed Central ; Data citation
Subject: Journalism and Communication ; Library, Information and Documentation Science
Indexed Type: Other
DOI: 10.20309/jdis.201600
Corresponding Author: Qing Qian (E-mail: qian.qing@imicams.ac.cn).
DOC Type: Research Paper
Abstract:

Purpose: In the open science era, it is typical to share project-generated scientific data by depositing it in an open and accessible database. Moreover, scientific publications are preserved in a digital library archive. It is challenging to identify the data usage that is mentioned in literature and associate it with its source. Here, we investigated the data usage of a government-funded cancer genomics project, The Cancer Genome Atlas (TCGA), via a full-text literature analysis.
Design/methodology/approach: We focused on identifying articles that use the TCGA dataset and on constructing linkages between those articles and the specific TCGA datasets they use. First, we collected 5,372 TCGA-related articles from PubMed Central (PMC). Second, we constructed a benchmark set of 25 full-text articles that truly used TCGA data in their studies, and we summarized the key features of the benchmark set. Third, we applied the key features to the remaining full-text articles collected from PMC.
Findings: The number of publications that use TCGA data has increased significantly since 2011, although the TCGA project was launched in 2005. Additionally, we found that the studies using TCGA data focused mainly on glioblastoma multiforme, lung cancer, and breast cancer, and that data from the RNA-sequencing (RNA-seq) platform were the most preferred.
Research limitations: The current workflow to identify articles that truly used TCGA data is labor-intensive. An automatic method is expected to improve the performance.
Practical implications: This study will help cancer genomics researchers determine the latest advancements in cancer molecular therapy, and it will promote data sharing and data-intensive scientific discovery.
Originality/value: Few studies have investigated the usage of data from government-funded projects/programs since their launch. In this preliminary study, we extracted articles that use TCGA data from PMC, and we created a link between the full-text articles and the source data.

Project Number: 13R0101
Project: the Fundamental Research Funds for the Central Universities ; the National Population and Health Scientific Data Sharing Program of China, the Knowledge Centre for Engineering Sciences and Technology (Medical Centre)
Language: English
Content Type: Journal article
URI: http://ir.las.ac.cn/handle/12502/8596
Appears in Collections: Journal of Data and Information Science_Journal of Data and Information Science-2016_Journal articles

Files in This Item:
File Name: 20160204.pdf
File Size: 1807KB
Content Type: Journal article
Version: Published version
Access: Open access

Institution: Institute of Medical Information and Library, Chinese Academy of Medical Sciences, Beijing 100020, China

Recommended Citation:
Jiao Li, Si Zheng, Hongyu Kang, et al. Identifying Scientific Project-generated Data Citation from Full-text Articles: An Investigation of TCGA Data Citation[J]. Journal of Data and Information Science, 2016, 1(2): 32-44.