Thursday, 3 July 2014

RDM Technical Infrastructure Review - RDMI Project Outputs

7.     RDMI Project outputs

7.1.  Outputs from the JISC RDMI 2011-2013 projects

      This JISC RDM Technical Infrastructure strand project involved creation of a blueprint for RDM infrastructure at the University of Nottingham. Initially a two layer RDMI service was envisaged – an active data management layer featuring collaborative tools and an archive layer featuring preservation and publishing. A revised model was proposed, the ADMIRe RDAS Research Data Archiving System (Sero Consulting, 2012), the focus of the development model being on opportunities afforded by current university infrastructure and systems. The RDAS was piloted using the Equella repository platform for a metadata store / data catalogue (Berry and Parsons, 2012a).

      The aim of the C4D project was to develop a framework for incorporating metadata into CERIF so that research organisations and researchers can better discover and make use of existing and future research datasets, wherever they may be held. C4D built upon the work of the IRIOS project, which developed a platform for managing research information exchange using CERIF. The platform supports the import of grant data (inputs) from Research Councils and publication data (outputs) from HEIs. The system allows inputs to be linked to outputs, which can then be exported in CERIF and used in other CERIF-compliant information systems (e.g. Pure, ePrints and, in the near future, RMAS).

7.1.3.   DaMaRO http://damaro.oucs.ox.ac.uk/
      This project, based at the University of Oxford, pulled together previous developments and elements to create a federated institutional infrastructure based upon a locally developed platform, ‘Dataflow’. WP6 was concerned with the development of software to facilitate the capture of metadata from existing research database systems ‘DaaS’, ‘DataStage’ and ‘Colwiz’. The project also investigated automatic metadata capture from institutional and cloud-hosted storage, from ‘LabTrove’ and from research using ‘Sharepoint’. WP7 was concerned with data storage: developing an ingest service for SWORD-compliant datasets and integrating local data stores with ‘DataBank’, the Bodleian Libraries’ archival-standard research data repository. WP8 was concerned with data discovery and access, with the creation of ‘Datafinder’, a semantically aware data catalogue.

7.1.4.   Data.bris  http://data.bris.ac.uk/jisc-project/
      The data.bris project developed a service providing a front end for the institutional research data storage facility (RDSF) and an institutional data repository. RDSF provides space for sharing and curating data. Originally designed around security and restricted access, RDSF needed modification to allow sharing and publication of data. Two types of access to research data are provided by data.bris: research data publication, defined as read-only access to published data; and research-active data sharing, defined as read-only access to unpublished data (sharing) or read/write access (collaborative sharing). The CKAN data portal platform was adopted for the public point of access to published datasets (published dataset catalogue). Metadata is harvested from data.bris for the CKAN portal. The metadata store is a SPARQL 1.1 service. The new RIS (PURE) will serve as the institutional repository and be fully integrated into the RDM infrastructure (Steer, 2012).
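      The two access types described for data.bris can be read as a small decision rule. The following is a minimal Python sketch of that rule, not the data.bris implementation: the names `AccessMode` and `access_for`, and the three boolean parameters, are hypothetical and introduced only for illustration.

```python
from enum import Enum


class AccessMode(Enum):
    """Access levels named in the data.bris description (labels are illustrative)."""
    NONE = "none"
    READ_ONLY = "read-only"
    READ_WRITE = "read/write"


def access_for(published: bool, shared_with_user: bool, is_collaborator: bool) -> AccessMode:
    """Return the access a user would get under the two access types.

    - Research data publication: read-only access to published data.
    - Research-active data sharing: unpublished data is read-only for users
      it is shared with, or read/write for collaborators.
    All parameter names are hypothetical assumptions.
    """
    if published:
        return AccessMode.READ_ONLY
    if is_collaborator:
        return AccessMode.READ_WRITE
    if shared_with_user:
        return AccessMode.READ_ONLY
    return AccessMode.NONE
```

      In this sketch, published data is always read-only, which mirrors the definition of research data publication given above; how the real service resolves overlapping roles is not stated in the project description.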

7.1.5.   Datapool  http://datapool.soton.ac.uk/
      The Datapool project was carried out at the University of Southampton. A primary requirement was to find a mechanism for data upload and description, with a drop-box like sharing service, together with parallel deposit with external data services using the SWORD protocol. Datapool developed services based on existing platforms at Southampton: ePrints and Sharepoint. ePrints provides a native interface for data capture and description but Sharepoint does not, so these functions needed to be developed for Sharepoint. ePrints has been modified for research data using the ReCollect plug-in. Sharepoint was considered appropriate because it is a versatile platform providing multiple tools, and the university IT service has a long-term commitment to supporting it. Sharepoint provides a deposit interface facilitating metadata capture but does not provide storage for data – instead, it captures pointers to the data storage location (Hitchcock and White, 2013).
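      The idea of a deposit record that holds descriptive metadata and a pointer to the data, rather than the data itself, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Datapool or Sharepoint implementation: the `DepositRecord` class, its field names and the example URI are all hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class DepositRecord:
    """Hypothetical deposit record: descriptive metadata plus a pointer to the data."""
    title: str
    creator: str
    data_location: str  # URI of where the data is actually stored (not the data itself)
    keywords: list = field(default_factory=list)

    def __post_init__(self):
        # Reject pointers that have no URI scheme, e.g. a bare relative path.
        if not urlparse(self.data_location).scheme:
            raise ValueError(
                f"data_location is not an absolute URI: {self.data_location!r}"
            )


# Example record; the storage host and path are made up for illustration.
record = DepositRecord(
    title="Example dataset",
    creator="A. Researcher",
    data_location="file://storage.example.org/projects/example/run1.csv",
)
```

      Keeping only a pointer means the record stays small regardless of dataset size, at the cost of depending on the storage location remaining valid.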

      This project piloted the use of ePrints for a comprehensive research data repository. The resulting outputs were a metadata profile for describing research data and ‘ReCollect’, an ePrints plug-in to implement the profile. The metadata profile was developed from the three-layer model produced by the IDMB project and mapped to the Datashare, DDI 2.1 and INSPIRE schemas. The pilot repository was tested with sample datasets from Essex departments and also with datasets ingested from the ESRC Data Store. The project highlighted gaps in the system: the lack of a facility for tagging multiple files with metadata; and limits to the size of a file that can be uploaded, so that ePrints could hold only metadata-only records for large files that had to be stored elsewhere. Proposed solutions to these problems included improvements to SWORD 2 provision for data transfer or using ‘BitTorrent’-like peer-to-peer sharing (Ensom, 2013).

      The Iridium project produced a policy and a pilot infrastructure for RDM at Newcastle University, to enable research data curation throughout the data lifecycle. For infrastructure development, Iridium undertook evaluation of many tools and infrastructure components including CKAN, Dataflow and SWORD [5.5]. The pilot infrastructure includes a public Research Data catalogue based on the CKAN platform, integrated with a bespoke internal facing research data registry, which is a component of the bespoke CRIS (MyImpact & MyProjects - all internal access) and the external facing ePrints Institutional Repository[6.1.11].

7.1.8.   KAPTUR  http://www.vads.ac.uk/kaptur/
      The Kaptur project, led by the VADS[i], aimed to develop a model of best practice in the management of research data in the visual arts. The project investigated the nature of arts research data and the diverse methods of managing the curation of data outputs. Amongst the project outputs was a technical analysis report (Garrett et al. 2012) [5.6]. The project has also led to the piloting of data repositories at Goldsmiths [6.1.7] and UAL [6.1.15], and the extension of the institutional repositories at GSoA [6.1.6] and UCA [6.1.16] to hold research data.

      The MRD project, which ran at the University of the West of England (UWE), developed a pilot data management system for two research centres in the Faculty of Health and Life Sciences. The project aimed to develop processes and infrastructure which integrated with existing research infrastructure and which were in keeping with the current culture and practices at the university. The project outputs included a seven-stage roadmap, online RDM guidance and training materials, an RDM service and an institutional policy. The project developed a metadata profile for research data and implemented an instance of ePrints, modified for data, as a pilot data repository.

7.1.10.    MiSS  http://www.miss.manchester.ac.uk/
      The MiSS (MaDAM into Sustainable Service) project at the University of Manchester aimed to deliver an RDM infrastructure by building on the experience of the MaDAM (Manchester Data Management) project. The project outputs included an institutional RDM policy, which underpins the RDM service. The service provides a point of contact for RDM enquiries, a website for RDM information and resources, and support for data management planning (for which a local DMP tool was developed). A system allowing researchers to annotate, store, publish and preserve their data is under development.

      This project, at the University of Exeter, used case studies to investigate how researchers create, manage and use research data. This was followed by the development of an advocacy and governance framework to embed RDM policy across the university. A third work strand, concerned with the development of technical infrastructure, resulted in a fully functioning research data repository.

      The Orbital project was initiated to develop and implement a research data management infrastructure at the University of Lincoln. The technical infrastructure was piloted with the School of Engineering, taking into account the challenging requirements of engineering researchers. This has been extended to provide data-driven services across the whole university. The project developed institutional policy, which was approved, and a programme of support and training for research staff and students. The project has resulted in the implementation of the ‘Researcher Dashboard’, a single interface for the RDM infrastructure that links to projects, publications and data. 

7.1.13.    PIMMS  http://proj.badc.rl.ac.uk/pimms
      PIMMS (Portable Infrastructure for the Metafor Metadata System) developed tools to capture information about the workflow of running simulations, from the design of experiments to the implementation of experiments via simulations running models. PIMMS provides a local portal for research groups to view and search their own content, and publish their metadata content to institutional, national and international services. PIMMS also includes data node software so that data documented with PIMMS can also be published to the web.

      This project aimed to develop RDM policy, training resources and pilot infrastructure at the University of Bath. A survey of researcher RDM practices was carried out, which established the need for data management policy and coordination and showed that problems with data storage and sharing needed addressing. A data management policy was developed, guidance was provided through a data management website, and training resources and workshops were designed. The project investigated development of existing infrastructure: integrating the Sakai VRE with the SWORD2 deposit protocol and piloting ePrints for a research data repository. The project concluded that considerable resource will be needed to sustain the pilot RDM service and that further areas of RDM need to be investigated.
                                              
      This project investigated RDM requirements using three case studies, from different subject disciplines and at different stages of the award process (pre-award, live-award and post-award), and through a survey of researchers at the University of Leeds. The various workpackages covered all aspects of RDM. The outputs included the University of Leeds RDM Policy, development of the RDM website, DMPonline development, and the development of training resources. Technical infrastructure was investigated through a pilot virtualised storage system and the development of a set of functional requirements for a research data repository (Proudfoot et al. 2013).

      The project delivered a toolkit for researchers at the University of Hertfordshire, which provides advice on all aspects of research data management. The project also developed technology demonstrators for RDM infrastructure, including a cloud services (active data management) pilot, a document management pilot and a pilot data repository system. An instance of DSpace was installed, tested with data deposit via SWORD2 protocols, and tested for integration with PURE and DataStage. In due course, the DSpace institutional repository will be expanded to include data deposit.

      The Archaeology Data Service (ADS) developed an automated deposit facility by building upon the SWORD2 protocol and integrating other sources of research and project metadata. The resulting process helps and encourages researchers to deposit their data with the ADS.

7.2.  Outputs from the JISC RDMI 2009-2011 projects 

      The ADMIRAL project at the University of Oxford created a two-tier federated data management infrastructure for life science researchers. The first tier provides a locally managed data storage and staging facility (an ADMIRAL server, which became DataStage) to meet researchers’ local data management needs for the collection, digital organization, metadata annotation and controlled sharing of biological datasets. The second tier provides a data preservation and publication platform created and managed by the Bodleian Library Service as part of the Oxford Research Archive initiative (the Databank server), providing an easy and secure route for archiving annotated datasets to an institutional repository, the Oxford University Data Store, for long-term preservation and access, complete with assigned DOIs and Creative Commons open access licences. Two principles were involved: (1) Sheer curation - making data management and deposit a seamless part of normal day-to-day research activity, irrespective of the instrument, software or OS used to capture data; (2) Curation by addition - researchers are not required to have everything in place all at once, or in any particular sequence. Data is accepted in any state and then allowed to be improved until ready for publication. Research is often an incremental process, and may change through time, so an incremental, unconstrained approach to data gathering was taken by this project.
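      The "curation by addition" principle can be illustrated with a short sketch in which a dataset is accepted with no metadata and is then improved step by step until it is ready for publication. This is a minimal Python illustration, not the ADMIRAL or DataStage implementation: the list of required fields and the function names `missing_fields` and `add_metadata` are hypothetical assumptions.

```python
# Hypothetical set of fields needed before publication; the real ADMIRAL
# requirements are not given in the project description.
REQUIRED_FOR_PUBLICATION = ("title", "creator", "description", "licence")


def missing_fields(metadata: dict) -> list:
    """List the required fields that are still absent or empty."""
    return [name for name in REQUIRED_FOR_PUBLICATION if not metadata.get(name)]


def add_metadata(metadata: dict, **updates) -> dict:
    """Return a new record with extra metadata added, in any order and at any time."""
    merged = dict(metadata)
    merged.update(updates)
    return merged


dataset = {}  # accepted in any state, here with no metadata at all
dataset = add_metadata(dataset, title="Example assay results")
dataset = add_metadata(dataset, creator="A. Researcher", description="Raw plate reads")
dataset = add_metadata(dataset, licence="CC BY")

# The dataset only becomes publishable once nothing required is missing.
ready = not missing_fields(dataset)
```

      Nothing in the sketch rejects an incomplete record; completeness is only checked at the point of publication, which is the essence of the incremental approach described above.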

7.2.2.   FISHnet  http://www.fishnetonline.org/home
      This project involved investigating the data management practices of freshwater biology researchers and their concerns about data protection, access control and copyright. This information was used to develop a web-based data management and sharing tool, which was piloted and then fully implemented. The tool is built into the FreshwaterLife website at: http://new.freshwaterlife.org/

      Infrastructure for Integration in Structural Sciences (I2S2) aimed to identify requirements for a data-driven research infrastructure in Structural Science, with a focus on Chemistry. Two pilots were established to investigate the business processes of research and the benefits of an integrative approach, particularly to issues of scale (laboratory to national) and issues of boundaries between institutions.

      At the University of Southampton, the Institutional Data Management Blueprint Project aimed to create a framework for RDM that is suitable for the whole institution and that facilitates e-research practice. The project analysed requirements from a range of disciplines using a diverse range of data. The project developed an institutional RDM policy and investigated the use of ePrints and Sharepoint for RDM infrastructure. The project was followed by Datapool.

      Through a series of semi-structured interviews, the project team identified the current practices of researchers (PIs, post-docs and PhD students) at the universities of Glasgow and Cambridge. From these findings, services and resources were developed to help researchers manage their research data. Support services including RDM advice websites were developed for each institution, and the institutional repository, DSpace at Cambridge [6.1.2], was enhanced to accept research data.

      The Manchester Data Management project aimed to develop a pilot RDM infrastructure by focussing on the practices and requirements of biomedical researchers. The project had an iterative, user-driven, bottom-up developmental approach and aimed to produce a technical and governance solution flexible enough to meet the needs of all disciplines across the institution. The pilot infrastructure was successfully developed to manage data throughout the data curation lifecycle, from data capture to storage to publication and reuse. The project also developed a DMP service for researchers at the university. MaDAM was followed by MiSS [7.1.10] to investigate the sustainability and rollout of the pilot infrastructure.

      This project explored the data management needs of the Palaeoclimate research community by means of a DAF survey of members of the BRIDGE[ii] research group, drawing up a number of use case scenarios. The project developed tools to facilitate data manipulation, publishing, discovery and reuse, climate data management policies and guidelines, and metadata schemas for palaeoclimate data.

7.2.8.   SUDAMIH  http://sudamih.oucs.ox.ac.uk/
      The Supporting Data Management for the Humanities project aimed to develop services for the management of data, addressing the requirements of researchers from the Humanities at the University of Oxford. The project determined user requirements by means of a DAF survey and developed pilot services to test. The project outputs included training materials and workshops, which take into account the ‘life’s work’ nature of Humanities research, and a pilot ‘DaaS’ (Database as a service) system, providing online databases designed for typical humanities datatypes. The pilot DaaS was further developed into a full service through the VIDaaS project.

7.2.9.   VIDaaS  http://vidaas.oucs.ox.ac.uk/
      The VIDaaS (Virtual Infrastructure with Database as a Service) project developed the pilot DaaS service as part of the Online Research Database Service (ORDS) at the University of Oxford. ORDS allows researchers to create, edit, search and share relational databases online through the university’s private cloud environment. Although funded by another JISC programme[iii], this project developed a service that is an integral part of the current RDM infrastructure at Oxford [6.1.12].

7.3.  Outputs from other relevant projects

      The iRODS evaluation and demonstrator project provided an evaluation and demonstration of the iRODS system. The project, based at the University of York, implemented the iRODS system, deployed via the White Rose Grid (WRG). The capabilities of the demonstrator system were assessed against use-case requirements from the CARMEN project. The WRG hosts one of the nodes of the CARMEN e-science infrastructure, providing collaborative workspace and data archiving facilities. The project demonstrated the benefits of using the iRODS system in place of the CARMEN SRB system. 

7.3.2.   CARMEN  http://www.carmen.org.uk/about
      CARMEN is an e-Science pilot project funded by the EPSRC, which started in 2006. The CARMEN consortium initially involved 19 researchers from 11 universities (including Sheffield, York, Manchester and Newcastle). The project developed a virtual laboratory for neurophysiology enabling storage, sharing and processing of data (neural activity time and image series), analysis code and expertise in a distributed environment. The current system architecture [described above in section 6.2.5] utilises the UK e-Science Grid infrastructure, within which CARMEN Active Information Repository Nodes (CAIRNs) have been built, which are the functional units of the system.




[i] Visual Arts Data Service (VADS), University for the Creative Arts http://www.vads.ac.uk/
[ii] BRIDGE (Bristol Research Initiative for the Dynamic Global Environment) http://www.bristol.ac.uk/geography/research/bridge/
[iii] JISC UMF Shared Services and the Cloud programme http://www.jisc.ac.uk/whatwedo/programmes/umf.aspx
