There are substantial amounts of data collected by marine scientific investigators that are not deposited in existing data centres or archive sites. The longer data sets remain isolated within investigators' laboratories, the more likely it is that they will be lost to the community for long-term use. This is a widely recognized problem, but one that has not prompted concerted community action to find solutions, even though many funding agencies now mandate that data from publicly funded research be made publicly available. The rapid evolution of high-speed internet and the availability of large digital storage capacities have made it possible to transfer and store comprehensive data sets, and tools for integrating and managing disparate data sets are rapidly becoming available. Why, then, are most of the data collected by researchers still inaccessible? Impediments to data submission stem, in part, from the lack of mechanisms that make it easy for individuals to submit datasets and metadata to a data centre or repository, and from a lack of awareness that appropriate data centres exist. Moreover, large, multi-PI projects often have little or no data management support to help organize and aggregate their data sets and then transfer the data from individual laboratories into accessible databases. These relatively tractable issues, however, are arguably less of a hurdle to data accessibility than the fundamental lack of incentives for researchers to make their data available for general use by the research community.
SCOR, the MBLWHOI Library and IODE have established a joint project to study the potential of data publication and data citation and to implement a number of use cases. One use case has been implemented by MBLWHOI (USA), and one jointly by BODC (UK) and the IODE Project Office (with assistance from Hasselt University, Belgium):
- Use case 1: data related to traditional journal articles are assigned persistent identifiers, which are referred to in the articles, and are stored in institutional document repositories;
- Use case 2: data held by data centres are packaged and served in formats that can be cited: the Published Data Library (PDL) and the Published Ocean Data repository (POD).
This "Cookbook" has been written for data managers and librarians who want to assign a persistent identifier to a dataset so that the dataset can be published online and cited in the scientific literature. A formal publishing process adds value to the dataset both for the data originators and for future users of the data. Value may be added by providing an indication of the scientific quality and importance of the dataset (as assessed through peer review), and by ensuring that the dataset is complete, frozen and accompanied by enough metadata and other supporting information to allow others to use it. Publishing a dataset also implies a commitment to the persistence of the data, and it allows data producers to obtain academic credit for their work in creating the dataset.
One form of persistent identifier is the Digital Object Identifier (DOI). A DOI is a character string (a "digital identifier") used to uniquely identify an object such as an electronic document. Metadata about the object are stored in association with the DOI name, and these metadata may include a location where the object can be found. The DOI of a document is permanent, whereas its location and other metadata may change. Referring to an online document by its DOI therefore provides more stable linking than referring to it by its URL: if the URL changes, the publisher need only update the DOI metadata to point to the new URL. DOIs may be obtained for a variety of objects, including documents, data files and images, and their assignment to peer-reviewed journal articles has become commonplace.
This cookbook provides a step-by-step guide to the data publication process and showcases some best practices for data publication. It is an outcome of the 5th session of the SCOR/IODE/MBLWHOI Library Workshop on Data Publication.
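The indirection that makes a DOI more stable than a URL can be illustrated with a small sketch. The Python below is not the API of any registration agency. `ToyRegistry`, `DoiRecord` and the DOI name `10.5555/example-dataset` are hypothetical. Real DOIs are registered through a registration agency and resolved through the Handle System. The sketch assumes only what is described above: metadata, including the object's current location, are stored against a DOI name that itself never changes.

```python
from dataclasses import dataclass, field


@dataclass
class DoiRecord:
    """Metadata stored in association with a DOI name (hypothetical, simplified)."""
    title: str
    url: str  # current location of the object; may change over time


@dataclass
class ToyRegistry:
    """A toy, in-memory stand-in for a DOI registration service (hypothetical)."""
    records: dict[str, DoiRecord] = field(default_factory=dict)

    def register(self, doi: str, record: DoiRecord) -> None:
        # A DOI name is a prefix beginning with "10." and a suffix, separated by "/".
        prefix, sep, suffix = doi.partition("/")
        if not (prefix.startswith("10.") and sep and suffix):
            raise ValueError(f"not a well-formed DOI name: {doi!r}")
        if doi in self.records:
            raise ValueError(f"DOI already registered: {doi!r}")
        self.records[doi] = record

    def update_url(self, doi: str, new_url: str) -> None:
        # Only the metadata change; the DOI name that people cite stays the same.
        self.records[doi].url = new_url

    def resolve(self, doi: str) -> str:
        # Resolution returns whatever location is currently recorded for the DOI.
        return self.records[doi].url


registry = ToyRegistry()
doi = "10.5555/example-dataset"  # hypothetical DOI name for illustration only
registry.register(doi, DoiRecord("Example dataset", "https://old.example.org/data/1"))
registry.update_url(doi, "https://new.example.org/archive/1")  # the dataset moves
print(registry.resolve(doi))  # prints https://new.example.org/archive/1
```

A citation that uses the DOI keeps working after the move because only the stored location was updated. A citation that used the old URL would now be broken.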
The Project and meetings
The Project has been implemented through a number of meetings of the partners. So far, meetings have been held in 2008, 2009, 2010, 2011 and 2012. In addition, Group members have given presentations on selected occasions. The data publication activity is also collaborating with the Research Coordination Network (RCN): OceanObs.
SCOR/IODE/MBLWHOI Library Workshop on Data Publication, 5th Session
The fifth SCOR/IODE/MBLWHOI Library Workshop on Data Publication was convened by the Scientific Committee on Oceanic Research (SCOR), the International Oceanographic Data and Information Exchange (IODE) of the Intergovernmental Oceanographic Commission (IOC) and the Marine Biological Laboratory/Woods Hole Oceanographic Institution Library (MBLWHOI Library) on 9-10 October 2012. Its aims were to evaluate the progress of the activity's two pilot projects; to discuss related topics, such as the implementation of data repositories in different data centres and cooperation with related national and international efforts; and to hear how data publication is being handled in other disciplines and how interactions with publishers of scientific journals are developing.
The report is available as IOC Workshop Report No. 252.
SCOR/IODE/MBLWHOI Library Workshop on Data Publication, 4th Session
The fourth SCOR/IODE/MBLWHOI Library Workshop on Data Publication was convened by the Scientific Committee on Oceanic Research (SCOR), the International Oceanographic Data and Information Exchange (IODE) of the Intergovernmental Oceanographic Commission (IOC) and the Marine Biological Laboratory/Woods Hole Oceanographic Institution Library (MBLWHOI Library) on 3-4 November 2011. Its aims were to evaluate the progress of the activity's two pilot projects and to discuss related topics, such as the implementation of data repositories in different data centres, cooperation with related national and international efforts, how data publication is being handled in other disciplines, interactions with publishers of scientific journals, and the economic implications of data publication.
The report is available as IOC Workshop Report No. 244.
Presentation at the "22nd International CODATA Conference"
The 22nd International CODATA Conference, "Scientific Information for Society: Scientific Data and Sustainable Development", was held on 24-27 October 2010 in Cape Town, South Africa. Roy Lowry gave a presentation entitled "Data Centre-Library Co-operation in Data Publication in Ocean Science".
Third SCOR/IODE/MBL WHOI Workshop on Data Citation
The meeting was held at UNESCO/IOC Headquarters on 2 April 2010. More information is available HERE. The meeting was informed about the MBLWHOI Library data repository for data supporting published articles. It also looked at data publication from the point of view of BODC as a "typical" IODE data centre. Finally, it considered the way forward and discussed technical issues.
The report is available as IOC Workshop Report No. 230.
Second SCOR/IODE Workshop on Data Publishing
The meeting was held at the IOC Project Office for IODE, Oostende, Belgium, on 9-11 March 2009. Find out more about the meeting HERE.
First SCOR/IODE Workshop on Data Publishing
SCOR and IODE organized the "SCOR/IODE Workshop on Data Publishing" (17-18 June 2008, IOC Project Office for IODE) to address this issue. The workshop decided to start a pilot activity to promote the ability to "publish" data sets as unique objects and to have them cited by other researchers. This was intended to provide the missing incentive needed to improve the flow of data to National Oceanographic Data Centres (NODCs).
The report of the 2008 Workshop is available as IOC Workshop Report No. 207.
Background documentation is available HERE.
The following presentations were made during the workshop:
- Citable Data Publication Objectives, Status and Issues (Roy Lowry, BODC, UK) (PPT)
- How to motivate scientists to publish data online (Mark Costello, University of Auckland, New Zealand) (PPT converted to PDF)
- Connecting Researchers with Data (Craig Emerson, ProQuest, USA) (PPT)
- An outsider's view of the GenBank experience (Peter Wiebe, Woods Hole Oceanographic Institution, USA) (PPT)
Additional reading on data publishing:
A. Swan & S. Brown (2008). The skills and career structure of data scientists and curators: an assessment of current practice and future needs.
L. Lyon (2008). Dealing with data: roles, rights, responsibilities and relationships. http://www.jisc.ac.uk/media/documents/programmes/digitalrepositories/dealing_with_data_report-final.pdf