Friday 8 November 2013

Research Data Management in a Coconut shell

Introduction to Research Data Management (RDM)

This is an attempt at ‘RDM in a (Coco)nutshell’, written primarily for researchers rather than information management practitioners.

Key Concepts:

1. RDM is good research practice.

2. RDM is concerned with looking after your data throughout the research project.

3. RDM involves long-term preservation of some of your data after the project.

4. Metadata is data documentation and is essential for RDM.

5. RDM makes it possible to share your data – if you want to or have to.

6. A Data Management Plan (DMP) sets out how you will put all of the above into practice.

The full post is on the University of Sheffield RDM Blog:
http://researchdatamanagement.blogspot.co.uk/2013/10/introduction-to-research-data.html

Thursday 17 January 2013

Quick guide to publishing data


Sharing Data
Research data are here taken to include raw data from laboratory instruments, assays, electron micrographs, survey data, transcripts and AV recordings; workflows and experimental protocols; and software code and models. Metadata (data documentation) is also treated as data.
Much of the data resulting from a current project is likely to be shared within the research group, and perhaps across the institution and beyond.
  1. This is usually facilitated by allowing access to folders on a departmental or institutional server.
  2. It is quite likely that data are shared via cloud storage (Dropbox, Google Drive), as email attachments (less secure), or physically on media such as USB drives or DVDs (also fairly insecure).
  3. Data may be shared via collaboration software (Microsoft SharePoint, SAKAI, DataStage), some of which is designed specifically to provide a Virtual Research Environment.
This could be considered ‘live’, ‘dynamic’ or ‘growing’ data, and it is unlikely to be shared outside the project group (i.e. published) until the project has finished and papers have been published.

Publishing Data
On completion of the project, you may wish to make data available for reuse by members of the project group and others – effectively publishing the data. Some good resources offering advice on sharing and publishing data are:
Published data need handling differently from the ‘live’ data of a current project. The usual recommendation is to deposit datasets in a repository, which could be:
1. An institutional (data) repository
2. A Research Council data centre
3. A disciplinary data service
These are advantageous in that they provide data curation services (for preservation) and DOIs or persistent URLs (for discovery and sharing).
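
As a quick illustration of why DOIs aid discovery: any DOI can be resolved to its current landing page via the public doi.org resolver. The sketch below (Python, with a placeholder DOI – not a real dataset identifier) simply follows that redirect:

import urllib.request

def resolve_doi(doi):
    """Follow the doi.org redirect and return the landing-page URL."""
    request = urllib.request.Request("https://doi.org/" + doi, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.geturl()

# Placeholder DOI, for illustration only:
# print(resolve_doi("10.1234/example-dataset"))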


Where to publish data
A comprehensive list of disciplinary data repositories and research-funder-run data centres is provided by DataCite at http://datacite.org/repolist
As yet the local repository, White Rose Research Online (WRRO), does not hold research data, although there is no functional reason why it cannot. It is possible that Sheffield will implement a separate research data repository in the near future, to provide long-term storage for data that has no other home; the RDM infrastructure for Sheffield has yet to be designed.
It is worth thinking of a repository as consisting of two components: a catalogue service on top of a storage system. The catalogue is a metadata store and an index of the files in the storage layer. Indeed, many RDM infrastructure options separate these components, providing a ‘metadata store’ or catalogue describing research data that is held elsewhere, perhaps on institutional or departmental servers. Therefore, if you wish to publish data now and the discipline-based or Research Funder Data Centre options are not available, it may still be possible to do so by keeping the data in institutional storage but putting the metadata into WRRO (although I don’t think anyone has done this yet).
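
To make the catalogue-plus-storage split concrete, a catalogue entry might look something like the following sketch (the field names, identifier and storage path are purely illustrative, not any particular repository’s schema):

# A catalogue record holds the metadata plus a pointer to where the
# files actually live (e.g. a departmental server). Illustrative only.
catalogue_record = {
    "title": "Example dataset",
    "creator": "A. Researcher",
    "publisher": "University of Sheffield",
    "date": "2013-01-17",
    "identifier": "doi:10.1234/example",          # minted on deposit
    "storage_location": "//dept-server/projects/example/data/",
    "files": ["survey_responses.csv", "analysis_protocol.txt"],
}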
A good example of a (partly local) project that developed its own resource for sharing code and models in neurophysiology is CARMEN: http://www.carmen.org.uk/

Metadata
Metadata may be defined as ‘data about data’ or ‘data documentation’. It is necessary for registering provenance, for discovery and citation, for determining access rights, and for informing preservation decisions. For reuse and validation, it is essential to create adequate documentation describing how the data were captured (e.g. instrument settings) and processed (analysis methods, processing software code).
Metadata is often considered in three tiers:
  1. Core metadata – creator, dataset title, dataset location, publisher, date, rights and access.
  2. Contextual / administrative metadata – subject, description, project, funder / grant.
  3. Discipline-based detailed metadata – discipline-specific metadata, classifications and ontologies, file metadata (size, format, etc.), instrument settings, experimental protocols, keywords and annotations.

The first two tiers are generally considered mandatory and the third optional. A description of this arrangement, from the DAMARO project at Oxford, is provided here: http://damaro.oucs.ox.ac.uk/docs/Just%20enough%20metadata%20v3-1.pdf
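
As a rough illustration of the three tiers, a single record might be organised as follows (the field names and values are indicative only, not a formal schema):

# Indicative three-tier metadata record; not a formal schema.
metadata = {
    "core": {                     # tier 1: mandatory
        "creator": "A. Researcher",
        "title": "Example dataset",
        "location": "doi:10.1234/example",
        "publisher": "University of Sheffield",
        "date": "2013-01-17",
        "rights": "CC-BY",
    },
    "contextual": {               # tier 2: mandatory
        "subject": "Neurophysiology",
        "description": "Spike-train recordings from cultured networks",
        "project": "Example project",
        "funder_grant": "Research Council grant (placeholder)",
    },
    "discipline": {               # tier 3: optional
        "file_format": "CSV",
        "instrument_settings": {"sampling_rate_hz": 20000},
        "keywords": ["electrophysiology", "spike trains"],
    },
}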
It is best to capture metadata at the time the research data are created; creating metadata ‘after the event’ is difficult and costly. In this respect, automatic metadata capture tools are very useful – such as electronic lab books, or instruments that record their settings as XML files. It is also quite acceptable to document the data (capture the metadata) by recording the details in a text, Word or XML file and keeping this in a folder with the data, as in the sketch below.
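
For example, a few lines of Python (a sketch only; the folder, file and element names are illustrative) could write such a record as metadata.xml alongside the data:

import xml.etree.ElementTree as ET
from pathlib import Path

def write_metadata(data_folder, fields):
    """Write a flat metadata record as metadata.xml in the data folder."""
    folder = Path(data_folder)
    folder.mkdir(parents=True, exist_ok=True)
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    ET.ElementTree(root).write(str(folder / "metadata.xml"),
                               encoding="utf-8", xml_declaration=True)

# Illustrative usage: document an experiment's data folder.
write_metadata("experiment_2013-01-17", {
    "creator": "A. Researcher",
    "title": "Example dataset",
    "date": "2013-01-17",
    "instrument_settings": "gain=10; sampling_rate=20kHz",
})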
A good introduction to data documentation may be found at

Citing data
Generally, the ‘core metadata’ elements are the minimum required for data citation. These allow for correct attribution and location of the unique resource. Information on data citation can be found at
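
As a rough sketch of how those core elements combine, a DataCite-style citation can be assembled directly from them (all dataset details below are placeholders):

# Assembling a DataCite-style citation from core metadata;
# the dataset details are placeholders.
core = {
    "creator": "A. Researcher",
    "year": "2013",
    "title": "Example dataset",
    "publisher": "University of Sheffield",
    "identifier": "doi:10.1234/example",
}

citation = "{creator} ({year}): {title}. {publisher}. {identifier}".format(**core)
print(citation)
# -> A. Researcher (2013): Example dataset. University of Sheffield. doi:10.1234/example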