7. RDMI Project outputs
7.1. Outputs from the JISC RDMI 2011-2013 projects
This
JISC RDM Technical Infrastructure strand project involved creation of a blueprint
for RDM infrastructure at the University of Nottingham. Initially a two layer RDMI service was envisaged – an active data
management layer featuring collaborative tools and an archive layer featuring
preservation and publishing. A revised model was then proposed, the ADMIRe
Research Data Archiving System (RDAS) (Sero Consulting, 2012), which focused
on the opportunities afforded by current university infrastructure and
systems. The RDAS was piloted using the Equella repository platform as a
metadata store / data catalogue (Berry and Parsons, 2012a).
The
aim of the C4D project was developing a framework for incorporating metadata
into CERIF such that research organisations and researchers can better discover
and make use of existing and future research datasets, wherever they may be
held. C4D built upon the work of the IRIOS project, which developed a platform for
managing research information exchange using CERIF. That platform supports the import of grant data
(inputs) from Research Councils and publication data (outputs) from HEIs. The
system allows inputs to be linked to outputs, which can then be exported in
CERIF and used in other CERIF-compliant information systems (e.g. Pure, ePrints
and, in the near future, RMAS).
This project, based at the University
of Oxford, pulled together previous developments and elements to create a
federated institutional infrastructure based upon a locally developed platform,
‘Dataflow’. WP6 was concerned with
the development of software to facilitate the capture of metadata from existing
research database systems ‘DaaS’, ‘DataStage’ and ‘Colwiz’. The project also
investigated automatic metadata capture from institutional and cloud-hosted
storage, from ‘LabTrove’ and from research using ‘Sharepoint’. WP7 was
concerned with data storage: developing an ingest service for SWORD-compliant
datasets, integrating local data stores with ‘DataBank’, the Bodleian
Libraries’ archival-standard research data repository. WP8 was concerned with
data discovery and access, through the creation of ‘Datafinder’, a semantically
aware data catalogue.
The
data.bris project developed a service providing a front end for the
institutional research data storage facility (RDSF) and an institutional data
repository. The RDSF provides space for sharing and curating data. Because it was originally
designed around security and restricted access, the RDSF needed
modification to allow sharing and publication of data. Two types of access to research
data are provided by data.bris: Research data publication, defined as read-only
access to published data; Research-active data sharing, defined as read-only
access to unpublished data (sharing), or read/write access (collaborative
sharing). The CKAN data portal platform was adopted for the public point of
access to published datasets (published dataset catalogue). Metadata is
harvested from data.bris for the CKAN portal. The metadata store is a SPARQL
1.1 service. The new RIS (PURE) will serve as the Institutional repository and
be fully integrated into the RDM infrastructure (Steer, 2012).
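The access types described above can be illustrated with a minimal sketch. The names `AccessMode`, `allowed_operations` and `is_public` are hypothetical labels chosen for this example and are not taken from the data.bris implementation; the sketch only encodes the definitions given in the text.

```python
from enum import Enum
from typing import Set


class AccessMode(Enum):
    """Hypothetical labels for the access types defined in the text."""
    PUBLICATION = "read-only access to published data"
    SHARING = "read-only access to unpublished data"
    COLLABORATIVE_SHARING = "read/write access"


def allowed_operations(mode: AccessMode) -> Set[str]:
    """Return the operations permitted under the given access mode."""
    if mode is AccessMode.COLLABORATIVE_SHARING:
        return {"read", "write"}
    # Publication and plain sharing are both read-only.
    return {"read"}


def is_public(mode: AccessMode) -> bool:
    """Only published data is exposed through the public catalogue."""
    return mode is AccessMode.PUBLICATION
```

Under these assumptions, only `COLLABORATIVE_SHARING` grants write access, and only `PUBLICATION` would be listed in a public catalogue such as the CKAN portal.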
The
Datapool project was carried out at the University of Southampton. A primary
requirement was to find a mechanism for data upload and description, with a
drop-box-like sharing service, together with parallel deposit with external
data services using the SWORD protocol. Datapool developed services based on
existing platforms at Southampton: ePrints and Sharepoint. ePrints provides a
native interface for data capture and description, but Sharepoint does not,
so one had to be developed. ePrints has been modified for research data
using the ReCollect plug-in. The Sharepoint platform was considered appropriate
as it is a versatile platform, providing multiple tools, and the university IT
service has a long-term commitment to supporting it. Sharepoint provides a
deposit interface facilitating metadata capture but does not provide storage
for data – instead, it captures pointers to the data storage location (Hitchcock and White, 2013).
This
project piloted the use of ePrints for a comprehensive research data repository. Resulting outputs were a metadata
profile for describing research data and ‘ReCollect’, an ePrints plug-in to
implement the profile. The metadata profile was developed from the three-layer
model produced by the IDMB project and mapped to the Datashare, DDI 2.1 and
INSPIRE schemas. The pilot repository was tested with sample datasets from
Essex departments and also ingested from the ESRC Data Store. The project
highlighted gaps in the system: the lack of a facility for tagging multiple files
with metadata, and limits on the size of file that can be uploaded, so that ePrints
could hold metadata-only records for large files that had to be stored
elsewhere. Solutions to these problems included improvements to SWORD 2
provision for data transfer, or the use of BitTorrent-like peer-to-peer sharing (Ensom, 2013).
The
Iridium project produced a policy and a pilot infrastructure for RDM at
Newcastle University, to enable research data curation throughout the data
lifecycle. For infrastructure development, Iridium undertook evaluation of many
tools and infrastructure components including CKAN, Dataflow and SWORD [5.5].
The pilot infrastructure includes a public Research Data catalogue based on the
CKAN platform, integrated with a bespoke internal-facing research data
registry, which is a component of the bespoke CRIS (MyImpact & MyProjects,
all internal access), and the external-facing ePrints Institutional
Repository [6.1.11].
The
Kaptur project, led by the VADS[i],
aimed to develop a model of best practice in the management of research data in
the visual arts. The project investigated the nature of arts research data, and
the diverse methods of managing the curation of data outputs. Amongst the
project outputs was a technical analysis report (Garrett et al. 2012) [5.6]. The project has also led to
the piloting of data repositories at Goldsmiths [6.1.7] and UAL [6.1.15], and
the extension of the institutional repositories at GSoA [6.1.6] and UCA
[6.1.16] to hold research data.
The
MRD project, which ran at the University of the West of England (UWE), developed a
pilot data management system for two research centres in the Faculty of Health
and Life Sciences. The project aimed to develop processes and infrastructure
which integrated with existing research infrastructure and which were in
keeping with the current culture and practices at the university. The project
outputs included a seven stage roadmap, development of online RDM guidance and
training materials and the development of an RDM service and institutional
policy. The project developed a metadata profile for research data and
implemented an instance of ePrints, modified for data, as a pilot data
repository.
The
MiSS (MaDAM into Sustainable Service) project, at the University of Manchester,
aimed to deliver an RDM infrastructure by building on the experience of the
MaDAM (Manchester Data Management) project. The project outputs included an
institutional RDM policy, which underpins the RDM service. The service provides a point
of contact for RDM enquiries, a website for RDM information and resources, and
support for data management planning (for which a local DMP tool was
developed). A system allowing researchers to annotate, store, publish and
preserve their data is under development.
This
project, at the University of Exeter, investigated how researchers create,
manage and use research data through case studies. This was followed by
the development of an advocacy and governance framework to embed RDM policy
across the university. A third work strand, concerned with the development of
technical infrastructure, resulted in a fully functioning research data
repository.
The
Orbital project was initiated to develop and implement a research data
management infrastructure at the University of Lincoln. The technical
infrastructure was piloted with the School of Engineering, taking into account
the challenging requirements of engineering researchers. This has been extended
to provide data-driven services across the whole university. The project
developed institutional policy, which was approved, and a programme of support
and training for research staff and students. The project has resulted in the
implementation of the ‘Researcher Dashboard’, a single interface for the RDM
infrastructure that links to projects, publications and data.
PIMMS
(Portable Infrastructure for the Metafor Metadata System) developed tools to
capture information about the workflow of running simulations, from the design
of experiments to the implementation of experiments via simulations running
models. PIMMS provides a local portal for research groups
to view and search their own content, and publish their metadata content to
institutional, national and international services. PIMMS also includes data
node software so that data documented with PIMMS can also be published to the
web.
This
project aimed to develop RDM policy, training resources and pilot
infrastructure at the University of Bath. A survey of researcher RDM practices
established the need for data management policy and coordination, and showed
that problems with data storage and sharing needed
addressing. A data management policy was developed, guidance was provided through a
data management website, and training resources and workshops designed. The
project investigated development of existing infrastructure: integrating the Sakai
VRE with the SWORD2 deposit protocol and piloting ePrints as a research data
repository. The project concluded that considerable resource will be needed to
sustain the pilot RDM service and that further areas of RDM need to be
investigated.
This
project investigated RDM requirements using three case studies, drawn from different
subject disciplines and at different stages of the award process
(pre-award, live-award and post-award), and through a survey of researchers at
the University of Leeds. The various workpackages covered all aspects of RDM.
The outputs included the University of Leeds RDM Policy, development of the RDM
website, DMPonline development, and the development of training resources.
Technical infrastructure was investigated through a pilot virtualised storage
system and the development of a set of functional requirements for a research
data repository (Proudfoot et al. 2013).
The
project delivered a toolkit for researchers at the University of Hertfordshire,
which provides advice on all aspects of research data management. The project
also developed technology demonstrators for RDM infrastructure, including a
cloud services (active data management) pilot, a document management pilot and
a pilot data repository system. An instance of DSpace was installed, tested
with data deposit via SWORD2 protocols, and tested for integration with PURE
and DataStage. In due course, the DSpace institutional repository will be
expanded to include data deposit.
The
Archaeology Data Service (ADS) developed an automated deposit facility by
building upon the SWORD2 protocol and integrating other sources of research and
project metadata. The resulting process helps and encourages researchers to
deposit their data with the ADS.
7.2. Outputs from the JISC RDMI 2009-2011 projects
The ADMIRAL
project, at the University of Oxford, created a two-tier federated data
management infrastructure for life science researchers. The first tier provides
a locally managed data storage and staging facility (an ADMIRAL server, which
became DataStage) to meet researchers’ local data management needs for the collection,
digital organization, metadata annotation and controlled sharing of biological
datasets. The second tier provides a data preservation
and publication platform created and managed by the Bodleian Library Service as
part of the Oxford Research Archive initiative (the Databank server),
providing an easy and secure route for archiving annotated datasets to an
institutional repository, The Oxford University Data Store, for long-term
preservation and access, complete with assigned DOIs and Creative Commons open
access licences. Two principles were involved: (1) sheer curation, making data management and deposit a
seamless part of normal day-to-day research activity, irrespective of the
instrument, software or OS used to capture data; and (2) curation by addition, whereby
researchers are not required to have everything in place all at once, or in any
particular sequence. Data is accepted in any state and then allowed to be
improved until ready for publication. Research is often an incremental process,
and may change through time, so an incremental, unconstrained approach to data
gathering was taken by this project.
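The idea of curation by addition can be illustrated with a minimal sketch: a dataset is accepted with whatever description exists, metadata is added in any order, and a readiness check is applied only at publication time. The `Dataset` class and the `REQUIRED_FOR_PUBLICATION` fields are assumptions made for this example; the source does not state which fields ADMIRAL required.

```python
from dataclasses import dataclass, field

# Hypothetical minimum metadata for publication; not taken from ADMIRAL.
REQUIRED_FOR_PUBLICATION = ("title", "creator", "licence")


@dataclass
class Dataset:
    files: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def add(self, **fields):
        """Accept extra metadata at any time and in any order."""
        self.metadata.update(fields)

    def missing(self):
        """List the assumed required fields that are still empty."""
        return [k for k in REQUIRED_FOR_PUBLICATION if not self.metadata.get(k)]

    def ready_for_publication(self):
        """A dataset is publishable once nothing required is missing."""
        return not self.missing()
```

A dataset created with files alone is still accepted; under these assumptions it simply reports what is missing until the researcher has added enough description to publish.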
This
project involved investigating the data management practices of freshwater
biology researchers and their concerns about data protection, access control
and copyright. This information was used to develop a web-based data management
and sharing tool, which was piloted and then fully implemented. The tool is
built into the FreshwaterLife website at http://new.freshwaterlife.org/
Infrastructure
for Integration in Structural Sciences (I2S2) aimed to identify requirements
for a data-driven research infrastructure in Structural Science, with a focus
on Chemistry. Two pilots were established to investigate the business processes
of research and the benefits of an integrative approach, particularly to issues
of scale (laboratory to national) and issues of boundaries between
institutions.
At
the University of Southampton, the Institutional Data Management Blueprint (IDMB)
project aimed to create a framework for RDM that is suitable for the whole
institution and facilitates e-research practice. The project analysed requirements
from a range of disciplines using a diverse range of data. The project
developed an institutional RDM policy and investigated the use of ePrints and
Sharepoint for RDM infrastructure. The project was followed by Datapool.
Through a series of semi-structured interviews, the
project team identified the current practices of researchers (PIs, post-docs
and PhD students) at the universities of Glasgow and Cambridge. From these
findings, services and resources were developed to help researchers manage
their research data. Support services, including RDM advice websites, were
developed for each institution, and the institutional repository, DSpace at
Cambridge [6.1.2], was enhanced to accept research data.
The Manchester Data Management project aimed to
develop a pilot RDM infrastructure by focussing on the practices and
requirements of biomedical researchers. The project had an iterative,
user-driven, bottom-up developmental approach and aimed to produce a technical
and governance solution flexible enough to meet the needs of all disciplines
across the institution. The pilot infrastructure was successfully developed to
manage data throughout the data curation lifecycle, from data capture to
storage to publication and reuse. The project also developed a DMP service for
researchers at the university. MaDAM was followed by MiSS [7.1.10] to
investigate the pilot infrastructure sustainability and rollout.
This project explored the data management needs of
the Palaeoclimate research community by means of a DAF survey of members of the
BRIDGE[ii] research group, drawing up a number of use case scenarios. The project
developed tools to facilitate data manipulation, publishing, discovery and
reuse, climate data management policies and guidelines, and metadata schemas
for palaeoclimate data.
The Supporting Data Management for the Humanities
project aimed to develop services for the management of data addressing the
requirements of researchers from the Humanities at the University of Oxford.
The project determined user requirements by means of a DAF survey and developed
pilot services for testing. The project outputs included training materials and
workshops, which take into account the ‘life’s work’ nature of Humanities
research, and a pilot ‘DaaS’ (Database as a service) system, providing online
databases designed for typical humanities datatypes. The pilot DaaS was further developed into a
full service through the VIDaaS project.
The VIDaaS (Virtual Infrastructure with Database as
a Service) project developed the pilot DaaS service as part of the Online
Research Database Service (ORDS) at the University of Oxford. ORDS allows
researchers to create, edit, search and share relational databases, online
through the university’s private cloud environment. Although funded by another
JISC programme[iii], this project developed a service that is an integral part of the current RDM
infrastructure at Oxford [6.1.12].
7.3. Outputs from other relevant projects
The iRODS evaluation and demonstrator project
provided an evaluation and demonstration of the iRODS system. The project,
based at the University of York, implemented the iRODS system, deployed via the
White Rose Grid (WRG). The capabilities of the demonstrator system were
assessed against use-case requirements from the CARMEN project. The WRG hosts
one of the nodes of the CARMEN e-science infrastructure, providing
collaborative workspace and data archiving facilities. The project demonstrated
the benefits of using the iRODS system in place of the CARMEN SRB system.
CARMEN is an e-Science pilot project funded by the
EPSRC, which started in 2006. The CARMEN consortium initially involved 19
researchers from 11 universities (including Sheffield, York, Manchester and
Newcastle). The project developed a virtual laboratory for neurophysiology
enabling storage, sharing and processing of data (neural activity time and
image series), analysis code and expertise in a distributed environment. The
current system architecture [described above in section 6.2.5] utilises the UK
e-Science Grid infrastructure, within which CARMEN Active Information
Repository Nodes (CAIRNs) have been built, which are the functional units of
the system.
[ii] BRIDGE (Bristol Research Initiative for the Dynamic Global Environment) http://www.bristol.ac.uk/geography/research/bridge/
[iii] JISC UMF Shared Services and the Cloud programme http://www.jisc.ac.uk/whatwedo/programmes/umf.aspx