Friday 8 November 2013

Research Data Management in a Coconut shell

Introduction to Research Data Management (RDM)

This is an attempt at ‘RDM in a (Coco)nutshell’, written primarily for researchers rather than information management practitioners.

Key Concepts:

1. RDM is good research practice.

2. RDM is concerned with looking after your data throughout the research project.

3. RDM involves long-term preservation of some of your data after the project.

4. Metadata is data documentation and is essential for RDM.

5. RDM makes it possible to share your data – if you want to or have to.

6. A Data Management Plan (DMP) sets out how you will put all of the above into practice.

The full post is on the University of Sheffield RDM Blog:
http://researchdatamanagement.blogspot.co.uk/2013/10/introduction-to-research-data.html

Thursday 17 January 2013

Quick guide to publishing data


Sharing Data
Research data are here taken to include raw data from laboratory instruments, assays, electron micrographs, survey data, transcripts and AV recordings; workflows and experimental protocols; and software code and models. Metadata (data documentation) is also treated as data.
Much of the data resulting from a current project is likely to be shared within the research group, and perhaps across the institution and beyond.
  1. This is usually facilitated by allowing access to folders on a departmental or institutional server.
  2. It is quite likely that data are shared via cloud storage (Dropbox, Google Drive), as email attachments (less secure), or physically on media such as USB drives or DVDs (also fairly insecure).
  3. Data may be shared via collaboration software (Microsoft SharePoint, SAKAI, DataStage), some of which is designed specifically to provide a Virtual Research Environment.
This could be considered ‘live’, ‘dynamic’ or ‘growing’ data, and it is unlikely to be shared outside the project group (i.e. published) until the project has finished and papers have been published.

Publishing Data
On completion of the project, you may wish to make data available for reuse by members of the project group and others – effectively publishing the data. Some good resources offering advice on sharing and publishing data are:
Published data need handling differently from the ‘live’ data of a current project. The usual recommendation is to deposit datasets in a repository, which could be:
1. An institutional (data) repository
2. A Research Council data centre
3. A disciplinary data service
These are advantageous in that they provide data curation services (for preservation) and DOIs or persistent URLs (for discovery and sharing).
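
As a quick illustration of why DOIs aid discovery: any DOI can be resolved to its current landing page via the public doi.org resolver. The sketch below (Python, with a placeholder DOI – not a real dataset identifier) simply follows that redirect:

import urllib.request

def resolve_doi(doi):
    """Follow the doi.org redirect and return the landing-page URL."""
    request = urllib.request.Request("https://doi.org/" + doi, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.geturl()

# Placeholder DOI, for illustration only:
# print(resolve_doi("10.1234/example-dataset"))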


Where to publish data
A comprehensive list of disciplinary data repositories and research-funder-run data centres is provided by DataCite at http://datacite.org/repolist
As yet the local repository, White Rose Research Online (WRRO), does not hold research data, although there is no functional reason why it cannot. It is possible that Sheffield will implement a separate research data repository in the near future, to provide long-term storage for data that has no other home; the RDM infrastructure for Sheffield has yet to be designed.
It is worth thinking of a repository as consisting of two components: a catalogue service on top of a storage system. The catalogue is a metadata store and an index of the files in the storage layer. Indeed, many RDM infrastructure options separate these components, providing a ‘metadata store’ or catalogue describing research data that is held elsewhere, perhaps on institutional or departmental servers. Therefore, if you wish to publish data now and the discipline-based or Research Funder Data Centre options are not available, it may still be possible to do so by keeping the data in institutional storage but putting the metadata into WRRO (although I don’t think anyone has done this yet).
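
To make the catalogue-plus-storage split concrete, a catalogue entry might look something like the following sketch (the field names, identifier and storage path are purely illustrative, not any particular repository’s schema):

# A catalogue record holds the metadata plus a pointer to where the
# files actually live (e.g. a departmental server). Illustrative only.
catalogue_record = {
    "title": "Example dataset",
    "creator": "A. Researcher",
    "publisher": "University of Sheffield",
    "date": "2013-01-17",
    "identifier": "doi:10.1234/example",          # minted on deposit
    "storage_location": "//dept-server/projects/example/data/",
    "files": ["survey_responses.csv", "analysis_protocol.txt"],
}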
A good example of a (partly local) project that developed its own resource for sharing code and models in neurophysiology is CARMEN: http://www.carmen.org.uk/

Metadata
Metadata may be defined as ‘data about data’ or ‘data documentation’. It is necessary for registering provenance, for discovery and citation, for determining access rights, and for informing preservation decisions. For reuse and validation, it is essential to create adequate documentation describing how the data were captured (e.g. instrument settings) and processed (analysis methods, processing software code).
Metadata is often considered in three tiers:
  1. Core metadata – creator, dataset title, dataset location, publisher, date, rights and access.
  2. Contextual / administrative metadata – subject, description, project, funder / grant.
  3. Discipline-based detailed metadata – discipline-specific metadata, classifications and ontologies, file metadata (size, format, etc.), instrument settings, experimental protocols, keywords and annotations.

The first two tiers are generally considered mandatory and the third optional. A description of this arrangement, from the DAMARO project at Oxford, is provided here: http://damaro.oucs.ox.ac.uk/docs/Just%20enough%20metadata%20v3-1.pdf
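
As a rough illustration of the three tiers, a single record might be organised as follows (the field names and values are indicative only, not a formal schema):

# Indicative three-tier metadata record; not a formal schema.
metadata = {
    "core": {                     # tier 1: mandatory
        "creator": "A. Researcher",
        "title": "Example dataset",
        "location": "doi:10.1234/example",
        "publisher": "University of Sheffield",
        "date": "2013-01-17",
        "rights": "CC-BY",
    },
    "contextual": {               # tier 2: mandatory
        "subject": "Neurophysiology",
        "description": "Spike-train recordings from cultured networks",
        "project": "Example project",
        "funder_grant": "Research Council grant (placeholder)",
    },
    "discipline": {               # tier 3: optional
        "file_format": "CSV",
        "instrument_settings": {"sampling_rate_hz": 20000},
        "keywords": ["electrophysiology", "spike trains"],
    },
}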
It is best to capture metadata at the time the research data are created; creating metadata ‘after the event’ is difficult and costly. In this respect, automatic metadata capture tools are very useful – such as electronic lab books, or instruments that record their settings as XML files. It is also quite acceptable to document the data (capture the metadata) by recording the details in a text, Word or XML file and keeping this in a folder with the data, as in the sketch below.
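
For example, a few lines of Python (a sketch only; the folder, file and element names are illustrative) could write such a record as metadata.xml alongside the data:

import xml.etree.ElementTree as ET
from pathlib import Path

def write_metadata(data_folder, fields):
    """Write a flat metadata record as metadata.xml in the data folder."""
    folder = Path(data_folder)
    folder.mkdir(parents=True, exist_ok=True)
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    ET.ElementTree(root).write(str(folder / "metadata.xml"),
                               encoding="utf-8", xml_declaration=True)

# Illustrative usage: document an experiment's data folder.
write_metadata("experiment_2013-01-17", {
    "creator": "A. Researcher",
    "title": "Example dataset",
    "date": "2013-01-17",
    "instrument_settings": "gain=10; sampling_rate=20kHz",
})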
A good introduction to data documentation may be found at

Citing data
Generally, the ‘core metadata’ elements are the minimum required for data citation. These allow for correct attribution and location of the unique resource. Information on data citation can be found at
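
As a rough sketch of how those core elements combine, a DataCite-style citation can be assembled directly from them (all dataset details below are placeholders):

# Assembling a DataCite-style citation from core metadata;
# the dataset details are placeholders.
core = {
    "creator": "A. Researcher",
    "year": "2013",
    "title": "Example dataset",
    "publisher": "University of Sheffield",
    "identifier": "doi:10.1234/example",
}

citation = "{creator} ({year}): {title}. {publisher}. {identifier}".format(**core)
print(citation)
# -> A. Researcher (2013): Example dataset. University of Sheffield. doi:10.1234/example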