From Persistent Identifiers to Digital Objects to Make Data Science More Efficient
Peter Wittenburg
2019-03-25
Source Publication: Data Intelligence
Volume: 1, Issue: 1, Pages: 6-20
Abstract

Data-intensive science is a reality in large scientific organizations such as the Max Planck Society, but because our data practices are inefficient when it comes to integrating data from different sources, many projects cannot be carried out and many researchers are excluded. Since, according to surveys, about 80% of the time in data-intensive projects is wasted, we must conclude that we are not fit for the challenges that will come with billions of smart devices producing continuous streams of data: our methods do not scale. Experts worldwide are therefore looking for strategies and methods that have potential for the future. The first steps have been taken, since there is now wide agreement, from the Research Data Alliance to the FAIR principles, that data should be associated with persistent identifiers (PIDs) and metadata (MD). In fact, after 20 years of experience we can claim that trustworthy PID systems are already in broad use. It is argued, however, that assigning PIDs is just the first step. If we agree to assign PIDs and also use the PID to store important relationships, such as pointers to the locations where the bit sequences or different metadata can be accessed, we are close to defining Digital Objects (DOs), which could offer a solution to some of the basic problems in data management and processing. In addition to standardizing the way we assign PIDs, metadata and other state information, we could also define a Digital Object Access Protocol as a universal exchange protocol for DOs stored in repositories that use different data models and data organizations. We could also associate each DO with a type and with a set of operations allowed on its content, which would pave the way for automatic processing, identified as the major step towards scalability in data science and the data industry. A globally connected group of experts is now working on establishing testbeds for a DO-based data infrastructure.
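The Digital Object idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class `PidRecord`, the in-memory `REGISTRY` that stands in for a PID resolution service, the `OPERATIONS` table, the example identifier `example/abc123`, and the example URLs are all hypothetical. The sketch only shows the shape of the idea: a PID resolves to a record that points to the bit-sequence and metadata locations and carries a type, and the type determines which operations may be applied to the content.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PidRecord:
    """Hypothetical PID record: the identifier plus the relationships stored with it."""
    pid: str
    do_type: str
    data_locations: List[str] = field(default_factory=list)      # where the bit sequence lives
    metadata_locations: List[str] = field(default_factory=list)  # where metadata can be accessed


# Hypothetical in-memory stand-in for a PID resolution service.
REGISTRY: Dict[str, PidRecord] = {
    "example/abc123": PidRecord(
        pid="example/abc123",
        do_type="text/plain",
        data_locations=["https://repo.example.org/objects/abc123"],
        metadata_locations=["https://repo.example.org/metadata/abc123"],
    )
}

# Hypothetical table of operations allowed per DO type.
OPERATIONS: Dict[str, Dict[str, Callable[[bytes], object]]] = {
    "text/plain": {
        "byte_count": lambda content: len(content),
        "line_count": lambda content: len(content.splitlines()),
    }
}


def resolve(pid: str) -> PidRecord:
    """Return the record stored for a PID; raises KeyError if the PID is unknown."""
    return REGISTRY[pid]


def allowed_operations(record: PidRecord) -> List[str]:
    """List the names of the operations registered for the record's type."""
    return sorted(OPERATIONS.get(record.do_type, {}))


def apply_operation(record: PidRecord, name: str, content: bytes) -> object:
    """Apply a named operation to content, but only if the DO type allows it."""
    operations = OPERATIONS.get(record.do_type, {})
    if name not in operations:
        raise ValueError(f"operation {name!r} is not allowed for type {record.do_type!r}")
    return operations[name](content)


if __name__ == "__main__":
    record = resolve("example/abc123")
    print(record.data_locations, record.metadata_locations)
    print(allowed_operations(record))
    print(apply_operation(record, "line_count", b"first line\nsecond line\n"))
```

Because the caller asks the type which operations exist instead of inspecting the content, a generic client could in principle process objects from different repositories without repository-specific code, which is the scalability argument the abstract makes.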

Keywords: Big Data; Data Management; Persistent Identifiers; Digital Objects; Data Infrastructure; Data Intensive Science
MOST Discipline Catalogue: Management Science :: Library, Information and Archival Management
DOI: 10.1162/dint_a_00004
Indexed By: Other
Language: English
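Document Type: Journal article
Identifier: http://ir.las.ac.cn/handle/12502/11456
Collection: National Science Library, Chinese Academy of Sciences (Beijing) / Editorial and Publishing Center / Data Intelligence (English)
Affiliation: Max Planck Computing and Data Facility
First Author Affiliation: National Science Library, Chinese Academy of Sciences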
Recommended Citation
GB/T 7714: Peter Wittenburg. From Persistent Identifiers to Digital Objects to Make Data Science More Efficient[J]. Data Intelligence, 2019, 1(1): 6-20.
APA: Peter Wittenburg. (2019). From Persistent Identifiers to Digital Objects to Make Data Science More Efficient. Data Intelligence, 1(1), 6-20.
MLA: Peter Wittenburg. "From Persistent Identifiers to Digital Objects to Make Data Science More Efficient." Data Intelligence 1.1 (2019): 6-20.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.