8. Conclusions and Recommendations
The EPSRC expectations of organisations receiving EPSRC
funding require that the research data created as a result of that funding be
effectively curated and securely preserved, and that metadata describing this
research data be created and published, by 1st May 2015. The
University of Sheffield Data Management Policy was developed in response to the
EPSRC expectations. This policy states that the university will provide support
for research data management, including infrastructure and services to be
developed in consultation with researchers.
In addition to surveying researchers to determine their RDM
practices and attitudes, it is appropriate to develop these services to fit
seamlessly into the researcher workflow without burdening the researcher with
significant change to their work practices. Rather than a researcher filling
out forms, metadata may be exchanged between systems (CRIS, VRE and repository,
for example), thus reducing re-keying. Products, processes and practices that
have been developed by one research community should be adopted, adapted and
developed for the needs of other researchers, rather than developing entirely
new solutions. As indicated by Jones et al. (2013: 14):
“…close engagement with researchers is critical when designing RDM systems
to ensure their applicability and uptake.”
8.1. Infrastructure components and implementation strategy
In choosing the components of the technical infrastructure and the strategy
for instituting it, the following options must be considered:
· A ‘Big Bang’ implementation or an incremental roll-out, allowing the
service to be adopted slowly.
· A system-wide generic infrastructure or a ‘Bottom-up’ project-based
approach, with pilot infrastructure components being tested, possibly
throughout the whole lifecycle of a research project.
· Utilising existing infrastructure upon which new services are developed, or
implementing new infrastructure. Integrating existing components may require
investment in terms of development work, but implementing new infrastructure
will also be costly.
· Choosing components where there is local experience, or choosing components
where there is, as yet, little community of practice.
· Choosing open source components, which require local expertise and
development, or proprietary components, which are supported but expensive.
Perhaps the greatest factor in considering these options is that some RDM
services, namely the data catalogue and data archiving, need to be implemented
by May 2015. Therefore, in providing a ‘bare compliance’ option, it may be
expedient to utilise the facilities that are already in place, if they provide
the necessary functions and need little modification or development to achieve
integration.
8.2. Published research data catalogue or repository
To some extent, the choice of the infrastructure
component providing the public research data catalogue is contingent on the
development of the WRRO as a catalogue of all published research outputs,
research data as well as research papers. A White Rose Research Data Catalogue,
perhaps implemented as a separate instance of EPrints as with WREO, would need to
contain only catalogue records, not the datasets themselves, which would be
held in a research data archive. The institutional ownership of research data
will likely require local control, if not local hosting, of the preservation and
storage functions in the research data archive.
EPrints is already integrated with
Symplectic at Sheffield and Leeds, and with PURE at York, so dataset metadata
may be automatically imported into an EPrints research data catalogue. EPrints
would need to be modified with the ReCollect plugin to handle a research
dataset metadata profile. Some of this dataset metadata will be recorded
automatically by the Symplectic system, some will be harvested from external
repositories (it now harvests from Figshare), and researchers will manually
input all other necessary metadata fields. At Open
Research Exeter, research data is deposited via Symplectic into the DSpace
repository.
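The three metadata sources described above (automatic recording by the CRIS, harvesting from external repositories, and manual input by the researcher) could be combined along the following lines. This is only a minimal sketch under stated assumptions: the field names, the function `assemble_record` and the example values are hypothetical illustrations, and do not represent the Symplectic, EPrints or ReCollect metadata profiles or APIs.

```python
# Hypothetical required fields for a dataset catalogue record; these are
# illustrative names, not the ReCollect or Symplectic metadata profile.
REQUIRED_FIELDS = (
    "title",
    "creators",
    "funder",
    "doi",
    "description",
    "access_conditions",
)


def assemble_record(cris_fields, harvested_fields, manual_fields):
    """Combine metadata from three sources into one catalogue record.

    Sources are consulted in order (CRIS, harvested, manual). A later source
    only fills a field that earlier sources left empty, so the researcher is
    asked to key in only what no system already holds.
    """
    record = {}
    for source in (cris_fields, harvested_fields, manual_fields):
        for name, value in source.items():
            if value and not record.get(name):
                record[name] = value
    # Fields still empty after all sources have been consulted.
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    return record, missing


# Example with placeholder values (the DOI is not a real identifier).
record, missing = assemble_record(
    {"title": "Example dataset", "creators": ["A. Researcher"], "funder": "EPSRC"},
    {"doi": "10.1234/example", "description": "Harvested description"},
    {},
)
print(missing)  # ['access_conditions']
```

In this sketch the list of missing fields is what would be presented to the researcher for manual input, which is one way of keeping re-keying to a minimum.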
If the decision is made not to go ahead with a shared WR research data
catalogue, then a locally based system may be implemented that combines the
catalogue and archive functions in an institutional research data repository.
A local instance of EPrints could be considered, as there is much local
experience in the use of EPrints, Symplectic and the connector, and a
willingness to share knowledge within the RDM practitioner community.
Alternatively, other repository and cataloguing systems should be considered.
The open source systems DSpace, Fedora Commons, Datafinder, Hydra and CKAN
have a growing community of users willing to share expertise. A number of
proprietary systems, such as ContentDM, for which there is local expertise,
must also be considered.
8.3. Research data archive
A facility for the preservation of research data that has not been submitted
to an external repository needs to be provided by the institution to comply
with EPSRC expectations. Such a data archive may also offer a preservation
service for unpublished research data. Repository systems that have been
designed or modified for research data provide both the archival storage
function and the catalogue function. Alternatively, specialist long-term
archiving and preservation systems exist. Possible candidates here include
Rosetta (as the library has experience with ExLibris systems), Figshare for
Institutions (as it is supported by the providers of, and integrated with,
Symplectic), Arkivum (as it is involved in the JANET data archiving framework
agreement) and Dataverse (one of the open source data preservation platforms
available).
8.4. Active data management
Currently at Sheffield, collaborative functionality is provided by Google
Drive and the HPC facilities, but there is no institutional Virtual Research
Environment (VRE) as such. A number of JISC infrastructure projects
investigated the use of collaboration tools such as DataStage, SharePoint and
Sakai to provide a VRE that is integrated with the CRIS, file servers, data
archive and/or a data repository. Surveys of researchers have shown a
requirement for ‘Academic Dropbox’ facilities, which allow the sharing of
data, and for ‘Social network’ style annotation (Garrett et al. 2012).
For RDM, one function required of the VRE is that of data registry, defined
here as an inward-facing data catalogue. This is built into DataStage (at
Oxford) and SharePoint (at Southampton), though a number of institutions
incorporate a separate data registry component in their active data management
facility, for example the use of CKAN at Lincoln and PURE at Bristol.
Attention should perhaps be paid to YouShare, developed at the University of
York. The capabilities of the WRG infrastructure for collaborative active data
management should also be investigated.
8.5. Data and metadata capture
There is currently a lack of information about the data and metadata capture
tools being used and developed at the University of Sheffield, although systems
such as laboratory information management systems may be used by some research
groups at the institution. Experience in the use of such tools needs
investigation to inform the choice and development of an active data
management infrastructure capable of integrating these tools.
8.6. Final remarks
The great benefits of a shared approach, in terms of saving
money and time, should mean that engaging in collaborative efforts to establish
shared services is a priority concern. Opportunities to collaborate in the
development of a WR Research Data Catalogue and the proposed N8 shared data
archiving service must be exploited. The development of RDM services delivered
through the White Rose Grid and N8 HPC grid infrastructure needs to be explored.
Attention should be paid to the national research data service being piloted by
the DCC and JISC.
Given the time constraint for compliance with the EPSRC expectations, it may
be appropriate to begin by piloting different components of the RDM technical
infrastructure with a number of EPSRC research projects. If a pilot component
proves sustainable, then it will contribute to the incremental roll-out of a
fully integrated RDM service, whilst fulfilling the requirements of the EPSRC
expectations.