metadatatron: RDM Technical Infrastructure Review

4. Infrastructure Components

Components of the RDM infrastructures established by higher education institutions are briefly considered below. For each component, the function, the underlying software or platform and the interoperability are described, and any evaluations identified and institutions employing the component, particularly in an RDMI context, are noted. The list is not exhaustive: only the most relevant and popular components are reviewed. Components are loosely categorised by function, so there may be considerable overlap.

4.1. Integrated systems and integrating components

4.1.1. Dataflow http://www.dataflow.ox.ac.uk/

A two-stage RDMI consisting of an active data management system, DataStage [4.4.1], and a data repository, DataBank [4.2.4], built on a Fedora Commons platform. The system uses the ‘BagIt’ specification [4.8.2] to transfer files to a SWORD2 [4.8.1] compliant archive. This system was developed by the Admiral project [7.2.1] and is being piloted at the University of Oxford [6.14], having been implemented during the Damaro project [7.1.3]. Dataflow is being evaluated by the Universities of Leeds and the Yorkshire & Humberside Metropolitan Area Network. [See section 5.5 for Newcastle University’s Iridium evaluation of DataStage and DataBank].

4.1.2. Orbital Bridge https://github.com/lncd/Orbital-Bridge

The pilot RDMI built at the University of Lincoln centres on the ‘Orbital Bridge’ application (Jackson, 2012), which integrates an institutional facing data registry built on CKAN, a public data catalogue built on EPrints and the Research Management system ‘Nucleus’ (Stainthorp, 2012). The Orbital Bridge provides an interface, the ‘Researcher Dashboard’ (Winn, 2013b) allowing researchers to access and add information about projects, funding, outputs and datasets [6.1.10].

4.1.3. iRODS https://www.irods.org/index.php/Introduction_to_iRODS

Integrated Rule-Oriented Data-management System (iRODS) is a software system that allows the management of a distributed workflow through the chaining of micro-services. See section [2.1.2.] for more information. The iREAD project [5.1.4] evaluated iRODS for use in the CARMEN portal.
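The idea of chaining micro-services can be sketched in a few lines of Python. This is only an illustration of the pattern, not the iRODS rule language or API: the function names (compute_checksum, mark_archived, run_rule) and the dictionary used to represent a data object are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List

# A micro-service here is any function that takes a data-object record
# and returns an updated copy (hypothetical model, not the iRODS API).
MicroService = Callable[[Dict], Dict]


def compute_checksum(obj: Dict) -> Dict:
    """Attach a SHA-256 checksum of the object's content."""
    return {**obj, "checksum": hashlib.sha256(obj["content"]).hexdigest()}


def mark_archived(obj: Dict) -> Dict:
    """Flag the object as archived once earlier steps have run."""
    return {**obj, "status": "archived"}


def run_rule(obj: Dict, chain: List[MicroService]) -> Dict:
    """Apply each micro-service in order, feeding each output into the next step."""
    for step in chain:
        obj = step(obj)
    return obj


record = run_rule(
    {"name": "spikes.dat", "content": b"\x00\x01"},
    [compute_checksum, mark_archived],
)
print(record["status"], record["checksum"][:12])
```

Because each step only depends on the record it receives, steps can be reordered or replaced without changing the others, which is the property that makes a rule-driven workflow configurable.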

4.2. Repository platforms

4.2.1. EPrints http://www.eprints.org/

The most widely used platform for institutional repositories (used for WRRO), EPrints is open source and free to use. Bespoke design, hosting and maintenance services are available. EPrints is built from Apache web server, MySQL [4.10.4] and Perl components and recommended to run on a UNIX-like operating system.

A number of plugins have been developed by the EPrints user community, some of which modify EPrints to handle datasets: The ReCollect plugin ^{^[i]} has been developed by UK Data Archive and the University of Essex to implement a dataset metadata profile; Datacite DOI registration plugin ^{^[ii]}; SWORD 2.0 broker plugin ^{^[iii]}; Arkivum A-Stor storage backend plugin ^{^[iv]}. In addition to these, a number of projects have developed integration of EPrints with other components: KAPTUR [7.1.8] developed integration with Datastage and Figshare; The Orbital Project [7.1.12] developed the Orbital Bridge, which integrates EPrints with CKAN and other components [see above 4.1.2].

A number of H.E. institutions use EPrints for their institutional data repository: the Universities of Essex [6.1.4], Southampton [6.1.14] and West of England (UWE) [6.1.17]. The eCrystals repository [6.2.3], also at Southampton, runs on an EPrints platform. The University of Leeds has chosen EPrints for its proposed data repository, as reported from the EPrints User Group workshop (Proudfoot, 2013a) at Leeds, October 2013.

4.2.2. CKAN http://ckan.org/

CKAN is an open source data management system developed by the OKF ^{^[v]} to provide access to open data. Technologies used include the PostgreSQL database engine [4.10.8], SOLR search [4.10.3], a Python backend and a JavaScript frontend. It has a modular architecture with optional extensions and APIs surrounding a core system. CKAN is part of the RDMI implemented by Bristol [6.1.1] and Lincoln [6.1.10] and is being trialled by the Kaptur project [evaluation at 5.6], and at Newcastle [Iridium CKAN use case 5.5]. The DART project ^{^[vi]} at Leeds uses CKAN for its data portal [6.2.2].
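As a rough illustration of the API layer around the CKAN core, the following Python sketch builds a dataset search request for CKAN's Action API (the package_search action). The host name is a placeholder assumption and no request is actually sent.

```python
from urllib.parse import urlencode


def ckan_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a package_search URL for a CKAN site's Action API (version 3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


# ckan.example.org is a hypothetical host used only for illustration.
print(ckan_search_url("https://ckan.example.org", "soil moisture"))
```

The response from a real CKAN instance would be a JSON document, so the same URL could be passed to any HTTP client that the institution already uses.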

4.2.3. Fedora Commons http://www.fedora-commons.org/

(Flexible Extensible Digital Object Repository Architecture) - originally developed by Cornell University for managing digital content (DAMS). Fedora is RDBMS-independent and has been tested with MySQL, Oracle [4.10.7], PostgreSQL, Microsoft SQL and Derby (it is provided with Derby embedded). The Fedora Commons distribution includes Apache Tomcat, Derby SQL and Java components. Many workflow and service components and plug-ins have been developed to integrate Fedora within an RDM infrastructure. UK HEIs using the Fedora Commons platform include University of York Digital Library (YODL) and the Archaeological Data Service (ADS). Data repositories based on Fedora include 3TU Datacentrum [6.3.3], DANS Easy [6.3.4] and RUresearch [6.3.9].

4.2.4. Databank (Fedora) http://www.dataflow.ox.ac.uk/index.php/databank

A data repository based on the Fedora Commons platform, designed by the Admiral project at the University of Oxford. [See section 4.1.1 for a description of DataFlow components and 5.5 for the Iridium evaluation of DataBank].

4.2.5. Hydra (Fedora) http://projecthydra.org/

Hydra is a multi-purpose repository framework based on a micro-services architecture. The main components are a Fedora repository platform, SOLR indexing software [4.10.3], the Blacklight discovery interface [4.5.5] and the Hydra plugin, a Ruby on Rails library which facilitates workflow in digital object management [as described in section 2.1.2]. Hydra is the platform for the University of Hull Digital Repository, Hydra [6.1.9], University of Virginia Libra [6.3.10] and the LSE Digital Library ^{^[vii]}.

4.2.6. VITAL (Fedora) http://www.vtls.com/products/vital

A Fedora based repository system developed through the Arrow project. This is used at Arrow, Monash University’s research repository [6.3.1].

4.2.7. Islandora (Fedora) http://islandora.ca/about

Islandora is an open source Content Management System developed by the University of Prince Edward Island, built on a base of Fedora, Drupal [4.10.9] and Solr. This platform is used for the University of St Andrews Digital Collections Portal ^{^[viii]}.

4.2.8. DSpace http://www.dspace.org/

DSpace is an open source repository system written in Java, which runs on Apache Tomcat with a PostgreSQL or Oracle database. DSpace is the platform used for University of Edinburgh Datashare [6.1.3] and EDINA ShareGeo repository [6.2.1], Open Research Exeter [6.1.54], the University of Hertfordshire Research Archive (UHRA) [6.1.8], the Queen Mary University of London, Centre for Digital Music Research Data Repository (C4DM-RDR) [6.1.13] and DSpace at Cambridge [6.1.2].

4.2.9. Datastar https://sites.google.com/site/datastarsite/

Datastar is an open source repository system developed by Cornell University and Washington University. It is designed to support collaboration and data sharing among researchers during the research process, and to promote publishing or archiving data and high-quality metadata to discipline-specific data centers and/or to the institution's own digital repository.

4.2.10. ContentDM http://www.contentdm.org/

ContentDM is a proprietary repository software from OCLC, in use at University of Sheffield for digital collections at the Library Special Collections and National Fairground Archive [see 3.1].

4.2.11. DigiTool http://www.exlibrisgroup.com/category/DigiToolOverview

This is proprietary repository software from Exlibris, in use at University of Leeds for LUDOS.

4.2.12. Equella http://www.equella.com/

This is proprietary repository software in use for the institutional repository at Royal Holloway Research Online ^{^[ix]} and Oxford Brookes RADAR ^{^[x]}. At Nottingham ^{^[xi]} it is integrated into Moodle and used to house and share digital teaching resources (except audio and video files, which are recommended to be uploaded to Kaltura). Equella is the platform used for the research data repository at Griffith University [6.3.2] and was also being considered by Nottingham ADMIRe for the data repository / metadata store (Berry and Parsons, 2012a) [see 5.1].

4.2.13. Archimede http://www.bibl.ulaval.ca/archimede/index.en.html

An open source repository system designed at Laval University Library based on the DSpace model. The system was designed with internationalisation in mind, so it has an easily modified multilingual interface. The system is not platform dependent, and is based on open source components: Java, Apache Ant, MySQL (recommended) and Lucene.

4.2.14. ARNO https://www.h-net.org/announce/show.cgi?ID=127076

Repository software developed by the ARNO Project (Academic Research in the Netherlands Online): partners were the universities of Amsterdam, Twente and Tilburg. ARNO is based on Apache, Oracle and Perl architecture.

4.3. ‘Archive data’ storage and digital preservation systems and services

4.3.1. Arkivum http://www.arkivum.com/

Arkivum provides a digital archiving service, certified to ISO27001. Three copies of the data are kept, two at geographically separate data centres and one at an escrow service. Data is uploaded from the institutional network to the Arkivum data centre using a local gateway appliance, the A-Stor (a file server), which may be located within the institutional firewall. Arkivum has entered a data archiving framework agreement with JANET^{^[xii]}.

4.3.2. Figshare http://figshare.com/

This is a cloud-based, open access repository for research outputs. Data is persistently stored under a CC licence. Unlimited storage is offered for publicly accessible data, whereas private data is limited to 1 GB of free storage. The service is supported by Digital Science ^{^[xiii]}, the providers of Symplectic Elements [4.6.1] and Projects [4.4.4]. The company now offers ‘Figshare for institutions’, providing a cloud-based data repository service. This service is used by Imperial College London and University of Oxford in the UK.

4.3.3. Dataverse http://thedata.org/

Dataverse Network is an open source web application for publishing, citing, analysing and preserving research data, which may be installed by any institution. The architecture is based on PostgreSQL, Lucene (SOLR) and Java. This application supports the data repositories at Harvard University [6.3.6] and Johns Hopkins University [6.3.7].

4.3.4. Rosetta http://www.exlibrisgroup.com/category/RosettaOverview

Rosetta is a proprietary digital preservation system from Exlibris, and the successor to DigiTool. The system is based on a distributed architecture which is scalable and flexible and provides continual preservation actions for long-term curation. The system is based on the OAIS ^{^[xiv]} model and conforms to the TDR ^{^[xv]} requirements. The system integrates easily with Exlibris Primo for the discovery function.

4.3.5. Amazon Glacier http://aws.amazon.com/glacier/

A proprietary cloud storage and backup service, optimised for data that are infrequently accessed and for which short retrieval time is not critical, thus a low cost option for long-term data storage. Geographical location of data storage may be chosen to meet regulatory requirements.

4.3.6. DuraCloud http://duracloud.org/

DuraSpace offers this commercial hosted service providing cloud infrastructure for data preservation and access. This is used for the ICPSR repository [6.3.11].

4.3.7. ARCserve http://www.arcserve.com/gb/default.aspx

A proprietary data backup service that offers a range of options including backup to cloud, disc or tape and high data availability with continuous data protection.

4.4. ‘Active data’ management and collaboration platforms

4.4.1. Datastage http://www.dataflow.ox.ac.uk/index.php/datastage

Datastage, the active data management component of Dataflow, appears as a mapped drive on the researcher’s computer (an ‘Academic Dropbox’) and provides metadata annotation and repository submission functions. Datastage is being tested by the Universities of Essex, Hertfordshire and QML Centre for Digital Music. [See section 4.1.1 for a description of DataFlow components and 5.5 for the Iridium evaluation of DataStage].

4.4.2. Sakai CLE https://sakaiproject.org/

Sakai CLE provides a suite of resources for collaboration and project management. Resources include the means to store, organise and share files, facilities for blogs, chat and managing forums, and a glossary providing contextual definitions. Sakai CLE provides the VLE part of the Hydra infrastructure at Hull [6.1.9], and the VRE at Newcastle [6.1.11], Bath^{^[xvi]}, Lancaster^{^[xvii]} and Monash [6.3.1]. The software has been evaluated by the Research360 [5.8] and Iridium [5.5] projects.

4.4.3. Microsoft Sharepoint http://en.wikipedia.org/wiki/Microsoft_SharePoint

SharePoint is an established Web application platform introduced by Microsoft in 2001. The platform provides a range of Web tools, including intranet portals, document and file management, collaboration, social networks, extranets, websites, enterprise search, and business intelligence.

Sharepoint is part of the infrastructure at the University of Southampton [evaluation at 5.4].

4.4.4. Digital Science Projects http://www.digital-science.com/products/projects

Projects is a research project management desktop application for Mac. It allows researchers to manage research activity, track changes to files, manage backup and restore previous versions of files, and to annotate and organise files and folders easily. This application integrates seamlessly with Figshare, though there is no institutional form to date.

4.4.5. D4Science http://www.d4science.eu/

D4Science is a European e-infrastructure project which provides a mechanism for data e-infrastructure interoperability. The mechanism is based on the gCube software framework, which allows distributed virtual organisations to collaborate and share resources by managing the cloud / grid middleware, thus configuring their own VREs.

4.4.6. Dropbox https://www.dropbox.com/

Dropbox is a commonly used collaboration and cloud storage service, free to individuals (for volumes up to 2 GB), with added components for organisational subscriptions, such as file recovery, version tracking and phone support. Data is protected by 256-bit AES and SSL encryption, two-step verification and mobile passcodes. Dropbox may store clients’ data on servers in another country.

4.4.7. Google Drive http://www.google.co.uk/enterprise/apps/education/products.html

Google Drive, also known as Google Documents, is part of the collaborative and cloud storage services that Google provides for educational institutions, offering 30 GB storage per user and integration with its email service and text, voice and video chat service. Security features include two-step authentication and encrypted connection to servers. A vault service is offered for secure archiving of content. Google provides these services for the Universities of Sheffield and York.

4.4.8. Luminis http://www.ellucian.co.uk/Solutions/Ellucian-Luminis-Platform/

Luminis is a collaboration portal platform, originally provided by SunGard, now Ellucian. It is used for the VLE at the University of Leeds.

4.4.9. Microsoft Dynamics http://www.microsoft.com/en-gb/dynamics/

Dynamics is a collaboration platform for customer relationship management and enterprise resource planning. This will be used for the VRE at University of Leeds, replacing Luminis.

4.4.10. YouShare http://www.youshare.ac.uk/

A web-based portal for sharing data and software, developed at the University of York and funded by HEFCE. The portal allows the sharing of data and services in a secure online environment, the execution of analysis code and analysis of data, and the curation of data, analysis code and experimental protocols.

4.4.11. Alfresco http://www.alfresco.com/

Alfresco is an open source content management system, which interfaces with Google mail and drive (forthcoming) and the institutional filestore. This is being developed at the University of York for use as a ‘Research Lab management system’ and at St. Andrews for data archiving involving a Fedora Commons repository (Allinson, 2013).

4.4.12. Amazon Web Services (AWS) http://aws.amazon.com/

AWS provides a wide range of cloud-based services for organisations, including: cloud computing and applications, storage (S3 and Glacier), databases, networking & virtual private cloud (VPC), analytics and deployment, identity and access management.

4.4.13. Huddle http://www.huddle.com/

Huddle is a collaboration platform that is designed for content sharing, document management, project and workflow management, secure intranet and extranet service. This is marketed as a ‘Sharepoint alternative’.

4.4.14. Kaltura http://corp.kaltura.com/Video-Solutions/Education

Kaltura provides an open source video management platform with a focus on universities deploying videos within their organisation. This platform includes collaborative video editing and publishing components.

4.4.15. THREDDS Data Server (TDS) http://www.unidata.ucar.edu/software/thredds/current/tds/

THREDDS (Thematic Real-time Environmental Distributed Data Services) Data Server provides catalogue, metadata and data access for scientific datasets. TDS is open source Java middleware, and is used for part of the 3TU Datacentrum infrastructure [6.3.3].

4.4.16. HUBzero http://hubzero.org/

HUBzero is an open source content management system designed for collaborative working and data sharing for scientific research and education. It is the platform for Purdue University Research Repository [6.3.8].

4.5. Catalogue software

4.5.1. DataFinder https://github.com/bhavanaananda/datafinder

Open source software, developed at the University of Oxford to provide a catalogue of research data. The metadata schema has been developed for full description of data, people responsible, how they were generated, access arrangements, links to publications etc. Datafinder integrates with Databank software and is designed to be used, with minimal modification, by other HEIs as part of their RDM infrastructure.

4.5.2. ReDBox (Fedora) http://www.redboxresearchdata.com.au/

ReDBox has been designed as a metadata store / catalogue for research data. It provides workflows and interfaces for metadata creation. ReDBox is a research data registry, so the research data is assumed to be stored elsewhere, but data and related documentation / files may be uploaded to the system. ReDBox has been developed with, and is therefore closely integrated with, Mint [4.9.8], a name authority and vocabulary system. Development was supported by the ANDS.

4.5.3. XMC Cat http://d2i.indiana.edu/xmccat

This is a metadata catalogue storing rich metadata describing data objects stored in files, repositories or on the web. Metadata schemas are composed of concepts that describe data. In XMC Cat, the XML metadata schemas are partitioned into concepts, which act as the unit of metadata storage. This allows for a dynamically adaptable query interface.

4.5.4. OpenLink Virtuoso http://virtuoso.openlinksw.com/

A hybrid, multi-model data server architecture allows Virtuoso to offer Relational, XML and RDF data management, full text indexing, linked data, web application and document web server function and web service deployment (SOAP or REST).

4.5.5. Blacklight discovery interface http://projectblacklight.org/

Blacklight is an open source discovery interface for any SOLR index. Blacklight is a Ruby on Rails gem which accommodates heterogeneous data. This is part of the infrastructure for Hydra, the IR at the University of Hull [6.1.9].

4.5.6. ExLibris PRIMO http://www.exlibrisgroup.com/category/PrimoOverview

Primo is the discovery interface that offers a single search box for the whole range of a library’s collections, be they locally managed or remote electronic content. This provides the discovery interface for the libraries of the Universities of Sheffield and York.

4.5.7. III Sierra http://sierra.iii.com/

The Sierra platform provides a suite of library services, including a resource Discovery interface. With similar functionality to Ex Libris Primo, this is the Resource Discovery interface used by the University of Leeds library.

4.5.8. Greenstone http://www.greenstone.org/

Open source multilingual digital library software, able to handle a wide variety of media formats.

4.6. Current Research Information Systems (CRIS) and DMP tools

4.6.1. Symplectic Elements http://www.symplectic.co.uk/product-tour/

Elements allows the Research Office to manage their researchers’ published outputs by importing records from external sources such as WoS, Scopus, CrossRef and Figshare, and by allowing researchers to import details from Google Scholar, Mendeley, Endnote, Refman and Bibtex. Research information including HR, finance and grants administration data is managed and may be imported from legacy databases. Faculty information and academic profiles are managed and reported. Elements integrates with EPrints, Fedora and DSpace repositories through community-developed plugins. Elements is the RIS in use at the Universities of Sheffield and Leeds.

4.6.2. PURE http://info.scival.com/pure

Pure provides comprehensive research information management. Pure aggregates data from awards management, HR, finance, student administration and other institutional sources. Publication data is retrieved from external sources such as Scopus, WoS, PubMed, Worldcat and Mendeley to populate Pure with information about researcher outputs. Integration with DSpace, EPrints, Fedora and Equella supports automatic population of the institutional repository. Pure is the RIS in use at the Universities of York, Lancaster and Edinburgh.

4.6.3. Converis http://www.converis5.com/

Developed by Avedas, a Thomson-Reuters company, Converis is in use at Hull and integrated into the Hydra infrastructure. Converis appears to have the same functionality as Elements and Pure. This CRIS adheres to Research information standards – CERIF, CASRAI, VIVO and ORCID (see below).

4.6.4. DMPonline https://dmponline.dcc.ac.uk/

The DMPonline tool has been developed by the DCC to help researchers create data management plans. The tool contains templates that represent the requirements of various funders and institutions. Guidance is provided during the process and the DMP may be exported in a variety of formats. The tool is used by the University of Lancaster.

4.7. Data capture and workflow management systems

4.7.1. LIMS

Laboratory information management systems manage all aspects of laboratory processes, from data capture, sample management and instrument control to workflow, document and personnel management. LIMSfinder http://www.limsfinder.com/ provides information about the numerous LIMS available.

4.7.2. Digital Lab books

ELN (Electronic laboratory notebooks) are computer applications designed to document experiments as an alternative to paper laboratory notebooks. Examples include:

Quartzy http://www.quartzy.com/; LabAssistant http://labassistant.en.softonic.com/mac; My Lab http://www.mylab.fi/en/; Sparklix https://www.sparklix.com/; eCAT http://www.researchspace.com/electronic-lab-notebook/index.html and Wingu http://signup.wingu.com/index.html. LabArchives https://mynotebook.labarchives.com/ in partnership with BioMed Central, acts as the default storage system for supplementary data published with articles in BMC journals.

4.7.3. Bioconductor http://www.bioconductor.org/

Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. More data management software resources for Biomolecular research are provided by BBMRI http://www.bbmri-wp4.eu/node/45 and Biocompare http://www.biocompare.com/Software/.

4.7.4. COLWIZ http://www.colwiz.com/

COLWIZ provides web, desktop and mobile apps to facilitate individual and collaborative research, saving precious time for researchers at every stage: from an initial idea, through collaboration, to publication of results.

4.7.5. Kepler https://kepler-project.org/

An open source, Java-based application that helps researchers create, execute and share scientific workflows, together with the associated analyses and data.

4.7.6. Labguru http://www.labguru.com/

This is a project management system for Life Sciences. It facilitates project management and collaboration, linking research data, protocols, results and published papers, and integrating external data.

4.7.7. Labtrove http://www.labtrove.org/

Labtrove is a data-centric digital infrastructure for supporting research. The software was developed at the University of Southampton as a result of experience gained through eScience research projects such as CombeChem^{^[xviii]}, eBank^{^[xix]}, eCrystals [6.2.3], R4L^{^[xx]}, Smart Tea^{^[xxi]} and oreChem^{^[xxii]}. Research infrastructure components – repository, LIMS, pervasive computing and RDF, are integrated into a blogging / social network paradigm. In Labtrove, the data is associated with the project metadata, at the point of, or prior to, creation. Therefore researchers can recreate and adapt experiments, using automated procedures and instrument settings. The system provides the necessary framework for good data management and curation.

4.7.8. Labview http://www.ni.com/labview/products/

Labview is a graphical development environment providing a range of tools for data acquisition, instrument control, data management and reporting.

4.7.9. MyTea VLab http://mytea.org.uk/vlab

VLab enables the formation of a Smart Research Framework, helping the creation and preservation of the record. VLab extends the model of a digital infrastructure for supporting research (repositories, LIMS, pervasive computing & basic RDF underpinning) to incorporate the online blog paradigm, where a data-centric system with control over visibility and sharing is essential.

4.7.10. MIEN http://neurosys.msu.montana.edu/applications/mien/

Model Interaction Environment for Neuroscience - a package of interface and library code intended to make a number of scientific modeling, data markup, and data storage tasks easier. Many of the extension functions of MIEN are devoted to neuroscience tasks, but the core MIEN package is a general purpose scientific modeling and data visualization tool with a flexible extension system.

4.7.11. MyExperiment http://www.myexperiment.org/

This is a social networking site and Virtual Research Environment (VRE) designed for people to share, discover and reuse workflows and build communities. MyExperiment was developed using the MyGrid software suite [4.10.2] by a team from the universities of Southampton, Manchester and Oxford.

4.7.12. Omero http://www.openmicroscopy.org/site/products/omero

OMERO manages images from the microscope to publication using a central repository. Data can be viewed, organized, analyzed and shared from anywhere via the internet, from a desktop app (Windows, Mac or Linux), from the web or from 3rd party software.

4.7.13. OBiBa http://www.obiba.org/

OBiBa provides open source software components for biobanks and biomolecular research.

4.7.14. SCAPE http://www.scape-project.eu/

SCAPE is an open source infrastructure platform that executes institutional digital preservation strategies for very large, complex and heterogeneous collections of digital objects, by extending repository functionality with semi-automated workflows. The system integrates Fedora Commons, Taverna and Hadoop.

4.7.15. Taverna http://www.taverna.org.uk/

Taverna is an open source and domain-independent Workflow Management System – a suite of tools used to design and execute scientific workflows and aid in silico experimentation. The Taverna suite was written in Java by the MyGrid team [4.10.2] and includes the Taverna Engine (enacting workflows), Taverna Workbench (desktop client application) and Taverna Server (allows remote execution of workflows). Taverna has been widely deployed, particularly in Bioinformatics and Chemistry, is hosted by the University of Manchester and supported by JISC, EPSRC, BBSRC^{^[xxiii]}, ESRC^{^[xxiv]} and FP7^{^[xxv]}.

4.7.16. Yogo http://neurosys.msu.montana.edu/

The Yogo Data Management System is a set of software tools created to enhance the process of data annotation, analysis and web publication. The system provides a set of easy to use software tools for data sharing by the scientific community. It enables researchers to build their own custom designed data management systems. Another branch of the system provides tools for viewing anatomical and physiological data.

4.8. Data transfer protocols

4.8.1. SWORD http://swordapp.org/about/

Simple Web-service Offering Repository Deposit (SWORD) is a lightweight protocol for depositing content from one location to another. It is a profile of the Atom Publishing Protocol (APP) and designed to ‘lower the barriers to deposit’ any content into repositories.
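A deposit of this kind amounts to an HTTP POST to a collection URI. The Python sketch below constructs, but does not send, such a request using headers defined by the SWORD 2.0 profile (Content-Disposition, Packaging, In-Progress). The collection URL, file name and payload bytes are placeholder assumptions, and a real deposit would normally also need authentication.

```python
from urllib.request import Request

# Packaging identifier for a plain zip file in the SWORD 2.0 profile.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def build_deposit_request(collection_url: str, filename: str,
                          payload: bytes, in_progress: bool = False) -> Request:
    """Construct (without sending) a binary deposit request for a collection."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": SIMPLE_ZIP,
        # "true" tells the server that more content will follow later.
        "In-Progress": "true" if in_progress else "false",
    }
    return Request(collection_url, data=payload, headers=headers, method="POST")


# Hypothetical collection URL and payload, for illustration only.
req = build_deposit_request(
    "https://repo.example.org/sword2/collection/data", "dataset.zip", b"PK..."
)
print(req.get_method(), req.full_url)
```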

4.8.2. BagIt http://tools.ietf.org/html/draft-kunze-bagit-10

BagIt is a hierarchical file packaging format for storage and transfer of digital content. A ‘bag’ is a structure to enclose a ‘payload’ and descriptive ‘tags’, and does not require knowledge of the payload’s internal semantics. Also at: https://github.com/LibraryOfCongress/bagit-java
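The structure of a bag can be illustrated with a minimal Python sketch that writes a payload under data/, a bagit.txt declaration and a checksum manifest, and then verifies the payload against the manifest. The version string 0.97 and the choice of SHA-256 are assumptions made for the example, as are the function names; production use would more likely rely on an existing BagIt library.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def make_bag(bag_dir: Path, payload: Dict[str, bytes]) -> None:
    """Write payload files under data/ plus the bag declaration and manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    # Bag declaration tag file (version string assumed for this sketch).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        lines.append(f"{digest}  data/{name}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> bool:
    """Return True if every payload file matches its manifest checksum."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "example_bag"
    make_bag(bag, {"readings.csv": b"time,value\n0,1.5\n"})
    print(verify_bag(bag))
```

Note that the manifest treats the payload as opaque bytes, which reflects the point that a bag needs no knowledge of the payload's internal semantics.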

4.8.3. OAI-PMH http://www.openarchives.org/pmh/

The Open Archives Initiative Protocol for Metadata Harvesting is a low-barrier mechanism for repository interoperability. OAI-PMH is a set of six verbs or services that allow data providers to expose structured metadata and make them available for harvesting by service providers’ requests.
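The six verbs are Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords and GetRecord, each issued as an HTTP request with the verb as a query parameter. The following Python sketch builds such request URLs without sending them; the repository base URL is a placeholder assumption.

```python
from urllib.parse import urlencode

# The six request types defined by OAI-PMH.
OAI_VERBS = frozenset({
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
})


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build a harvesting request URL for a given verb and its arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"


# repo.example.org is a hypothetical data provider.
print(oai_request_url("https://repo.example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc"))
```

A harvester would typically call Identify first, then ListRecords with a metadataPrefix such as oai_dc to retrieve Dublin Core records.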

4.9. Identifier services and identity components

4.9.1. DataCite http://www.datacite.org/

DataCite is an international organisation which supports research data archiving, access and citation by assigning persistent identifiers to datasets. An institution may join DataCite in order to have DOIs minted for its datasets.

4.9.2. DOI http://www.doi.org/

The Digital Object Identifier is a character string used to uniquely identify any object. The DOI provides an actionable (clickable), interoperable, persistent link to metadata about the object, including the URL where the object is located. The DOI for an object is permanent, whereas the location and other metadata may change; therefore the DOI may be used for persistent citation.
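The ‘actionable’ form of a DOI is obtained by appending it to the address of the DOI resolver. The Python sketch below does this after a simple shape check; the regular expression is a commonly used heuristic rather than the formal DOI syntax, and the function name is hypothetical.

```python
import re
from urllib.parse import quote

# Heuristic pattern: "10." prefix, registrant code, slash, non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a resolvable https://doi.org/ link."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("10.1000/182"))
```

Citing the resolver link rather than the landing-page URL means that the citation keeps working when the object moves, because only the metadata registered against the DOI needs to be updated.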

4.9.3. CERIF http://www.eurocris.org/Index.php?page=featuresCERIF&t=1

The Common European Research Information Format (CERIF) is a standard for managing and exchanging research information. It provides a data model that describes the research domain, defining research entities; researchers, projects, organisations, outputs and funding, and the relationships between these entities. CERIF has been developed by EuroCRIS^{^[xxvi]}.

4.9.4. Shibboleth https://shibboleth.net/

Shibboleth is a very widely deployed federated identity authentication system. It is an open source, free software system that provides single sign-on capabilities for individual access to protected online resources within and between organisations. Shibboleth is employed for user authentication at Sheffield, Leeds, York and many other HEIs.

4.9.5. ORCID http://orcid.org/

ORCID provides a persistent identifier for individual researchers, so that their identity is unambiguous. Automatic links to research outputs, publishing activities and grant applications are supported. ORCID is now integrated with Symplectic Elements and Figshare (Hamnel, 2012).
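An ORCID identifier is a 16-character string whose final character is a check digit calculated with the ISO 7064 MOD 11-2 algorithm, which helps systems that store ORCIDs to catch mistyped identifiers. A minimal Python sketch of that check is given below; the function names are hypothetical.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits and check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
```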

4.9.6. CASRAI http://casrai.org/

The Consortia Advancing Standards in Research Administration Information are developing a common data dictionary and advancing best practice for research information exchange and reuse.

4.9.7. VIVO http://vivoweb.org/about

VIVO is an open source semantic web platform and ontology for representing researchers and their associated training, background, activities, organizations, and outputs including publications and research resources. VIVO has been developed and implemented at Cornell University in association with other projects including CASRAI, ORCID and EuroCRIS.

4.9.8. Mint http://www.redboxresearchdata.com.au/

Mint is an open source name authority and vocabulary system that provides services to web applications. Mint was developed with ReDBox on the Fascinator platform with support by the ANDS.

4.9.9. BRII Registry http://brii.medsci.ox.ac.uk/

The Building the Research Information Infrastructure project (BRII) at the University of Oxford aimed to develop infrastructure, built on semantic web technologies, enabling efficient sharing of research information. The registry implemented at Oxford is integrated into the Fedora infrastructure and forms a part of the Oxford DAMS; as such, it benefits from data preservation. This system is not yet available for other institutions.

4.10. Other software systems and platforms of interest

4.10.1. Globus Toolkit https://www.globus.org/toolkit

The Globus Toolkit is an open source set of software components enabling the sharing of services and resources across the ‘grid’. The toolkit includes software for security, information infrastructure, resource and data management, monitoring and discovery. Services and resources may be shared across institutional and geographical boundaries whilst retaining local autonomy.

4.10.2. MyGrid http://www.mygrid.org.uk/

The MyGrid team have developed a suite of tools, including Taverna [4.7.15] and MyExperiment [4.7.11], that support the creation of e-laboratories. These tools have been adopted by a large number of projects across a diverse range of domains.

4.10.3. Apache SOLR https://lucene.apache.org/solr/

SOLR is an open source search platform developed by the Apache Lucene project. Features include full-text search, hit highlighting, faceted search, near real-time indexing, database integration, rich document handling and geospatial search. SOLR is written in Java, has REST-like HTTP/XML and JSON APIs and runs as a standalone full-text search server.
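The REST-like HTTP interface means that a search is simply a URL. The Python sketch below assembles a query against the standard select handler using the common query, faceting and highlighting parameters (q, wt, rows, facet, facet.field, hl). The host, the core name "datasets" and the field names "title" and "subject" are assumptions made for illustration, and the code only builds the URL without contacting a server.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, core, text, facet_field=None, rows=10):
    """Build a select URL asking for JSON results with highlighting."""
    params = [("q", text), ("wt", "json"), ("rows", rows)]
    if facet_field:
        # Faceted search: also return counts per value of one field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    params.append(("hl", "true"))  # hit highlighting
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical local installation, core and fields.
url = build_solr_query(
    "http://localhost:8983/solr", "datasets", "title:climate",
    facet_field="subject",
)
```

A catalogue front end, such as a Drupal or repository interface, would typically issue requests of this kind and render the returned JSON.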

4.10.4. MySQL http://www.mysql.com/

MySQL is a free, open source relational database management system, described by its developers as the world’s most popular open source database. MySQL is the database component of EPrints and an optional database for Fedora Commons.

4.10.5. SAP https://www.sap.com/uk/solution/industry/higher-education-research.html

SAP provides a suite of software tools for university management processes.

4.10.6. Agresso / pFACT http://www.unit4software.co.uk/products/agresso

Agresso is a range of Enterprise Resource Planning (ERP) software tools.

4.10.7. Oracle http://www.oracle.com/index.html

Oracle provides a range of database systems and Enterprise management resources.

4.10.8. PostgreSQL http://www.postgresql.org/

PostgreSQL is a free open source object-relational database system. PostgreSQL is one of the databases that may be incorporated in CKAN, Fedora Commons and DSpace.

4.10.9. Drupal https://drupal.org/

Drupal is a free open source content management system, which may be used to provide a web-based user interface for many applications (such as catalogue databases).

4.10.10. Moodle https://moodle.org/

Moodle is a free open source learning management system (LMS) / virtual learning environment (VLE).

4.10.11. Hadoop http://hadoop.apache.org/

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
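The best known of these programming models is MapReduce. The Python below is an in-process illustration of the model only, assuming a word count as the task; it does not use any Hadoop API, and on a real cluster the map and reduce steps would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over the records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["data set data", "Data archive"])
```

Because each map call sees one record and each reduce call sees one key, the framework is free to distribute the work across as many machines as are available.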

5. Reviews, Evaluations and Comparisons of Infrastructure Elements

Several of the JISC research data management infrastructure projects published the results of reviews, evaluations and comparisons they carried out to select the infrastructure components and RDM tools they were to trial. A number of projects published the results of user requirements surveys, conducted to choose from the components and tools available. These are indicated below, where the findings may be briefly described.

5.1. CKAN for Research Data Management in an Academic Setting

A workshop was held on 18th February 2013, facilitated by the JISC MRD programme, to investigate the use of CKAN for RDM. The workshop featured presentations from Bristol and Lincoln and discussions fed into a user requirements gathering exercise. CKAN capabilities in fulfilling these requirements are expressed in the output of this work, which is available at: http://lncn.eu/mxz2 (Winn et al. 2013).

5.2. Admire

The project issued a document outlining considerations in developing a Research Data Management repository strategy, which included a review of repository software: http://admire.jiscinvolve.org/wp/files/2013/05/ADMIRe-RDM-Repository-Strategy-Requirements.pdf (Berry and Parsons, 2012b).

The EQUELLA digital repository system, in use as a DAMS at Nottingham, was piloted for use as a data repository. The evaluation of the data repository pilot was reported at: http://admire.jiscinvolve.org/wp/files/2013/05/ADMIRe-EQUELLA-Research-Data-Repository-Pilot.pdf (Berry and Parsons, 2012a). Key issues revealed include the need for manual validation of the metadata entered through a wizard, the workflow requirements of obtaining a DOI and storing non-open datasets.

5.3. Data.bris

The CKAN data portal platform was investigated by the data.bris project to provide a public read-only catalogue of research data publications (data discovery) and also to manage controlled-access active data (collaborative sharing). The team gave a presentation on their evaluation of CKAN at the JISC ‘CKAN for research data management in an academic setting’ workshop - reported on the blog at: http://data.bris.ac.uk/2012/12/18/ckan-and-data-bris/ (Price, 2013a), and the slides are available at: http://data.bris.ac.uk/files/2013/02/databris-ckan.pdf (Price, 2013b). CKAN has now been adopted as part of the implemented infrastructure at Bristol [6.1.1].

5.4. Datapool

The JISC Datapool project at the University of Southampton was concerned with modifying Microsoft SharePoint to create data deposit interfaces, consisting of project and dataset forms that can collect metadata to feed through to the EPrints repository. The metadata profile of the EPrints institutional repository was extended using the ReCollect plugin, adapting the repository for research data. A report describing the integration was published, available at: http://EPrints.soton.ac.uk/352813/3/EPrints-sharepoint-report-final10.pdf (Hitchcock and White 2013).

5.5. Iridium

The Iridium project involved the assessment and testing of a number of systems:

· A categorisation and brief review of 67 RDM tools and infrastructure components was carried out and reported at: http://research.ncl.ac.uk/media/sites/researchwebsites/iridium/iridium_external_tools_assessment_17_5_2013_v1_PGR_LW.pdf (Iridium Support Team, 2012).

· An evaluation of DataStage and DataBank, reported at: http://iridiummrd.wordpress.com/2013/02/14/iridium-evaluation-of-datastage-and-databank-research-data-management-tools-from-dataflow-project/ (Wood, 2013).

· CKAN use case report at: http://research.ncl.ac.uk/media/sites/researchwebsites/iridium/iridium_CKAN_case_study_12_6_2013_v1_BA.pdf (Allen, 2012)

· Sakai integration into RDMI http://iridiummrd.wordpress.com/2011/11/22/research-data-management-at-euro-sakai-2011/ (Martin, 2011).

CKAN was adopted as the platform for the research data portal at the University of Newcastle. Information on the CKAN data portal is available at: http://research.ncl.ac.uk/rdm/tools/ckan/

5.6. Kaptur

The Kaptur project carried out an evaluation of technical systems in May 2012, to judge their suitability for the management of visual arts research data. A set of user requirements was created with which to evaluate the technical system capabilities, based on: software type and cost, storage requirements, interface requirements, system requirements and institutional requirements. Seventeen software systems were chosen for evaluation and the five with the highest scores were short-listed: Dataflow, DSpace, EPrints, Fedora and Figshare. These were then measured against a more detailed set of requirements, and EPrints was deemed the most viable option, particularly since it was already in use at the partner institutions. However, Figshare and Dataflow were strong contenders and fulfilled some of the requirements that EPrints did not. Therefore two pilots were implemented: an integration of EPrints with Figshare and an integration of EPrints with DataStage. The findings were reported in the ‘Kaptur Technical analysis report’ at: http://www.research.ucreative.ac.uk/1239//1/Kaptur_technical_analysis.pdf (Garrett et al. 2012).

In November 2012, it was agreed that neither of the two pilots was viable and that an integration of EPrints and CKAN, which had not been available for the earlier technical analysis, would be piloted. It was determined that the EPrints-CKAN instance, although full integration was not possible at the time, was a stronger, more sustainable model and worth continuing to develop in the future (Gramstadt, 2013).

5.7. Orbital

The concept of a ‘minimum viable product for RDM’ was developed for the Orbital project, and its feature set considered to be authentication, storage, hosting/publishing, licensing, persistent URI and analytics. CKAN was chosen as the platform for the data repository, as it was found to meet these requirements ‘out of the box’, and for many other reasons as reported in the blog post at: https://orbital.blogs.lincoln.ac.uk/2012/09/06/choosing-ckan-for-research-data-management/ (Winn, 2012). An evaluation of the use of CKAN for RDM, presented at two conferences, is available at: http://eprints.lincoln.ac.uk/9778/ (Winn, 2013a).

5.8. Research360

The Research360 project at the University of Bath carried out a survey of user requirements for a research data repository, published at: http://opus.bath.ac.uk/34082/ (Cope, 2013). Development of the technical infrastructure involved integration of EPrints and the HCP file storage system, described at: http://opus.bath.ac.uk/35532/3/Research360_EPrints_HCP_Report_FINAL.docx.pdf (Research360, 2013). Modification of Sakai to enable deposit of material into a SWORD2 compliant repository is described at: http://opus.bath.ac.uk/35540/3/Research360_Sakai_Development_Report_FINAL.pdf (Research360 Project, 2012).

5.9. Roadmap

Research data repository functional requirements were compiled by the RoaDMaP repository working group. The criteria were based on the Kaptur project review and enlarged to account for the local context. The draft repository functional requirements are available at: http://library.leeds.ac.uk/downloads/file/389/data_repository_platform_functional_requirements (RoaDMaP repository working group, 2013), and discussed in a blogpost at: http://blog.library.leeds.ac.uk/blog/roadmap/post/163 (Proudfoot, 2013c).

Initially it was considered expedient to build upon the existing EPrints infrastructure, but DataFlow was also considered because it offered a better fit with project needs. DataFlow, however, revealed technical issues (in the link between DataStage and DataBank), so other platforms were considered. Using three case studies, the functional requirements were tested against the three main candidates for repository platform: EPrints, CKAN and DataFlow. EPrints was eventually chosen for a pilot service, given the short timescale for EPSRC compliance (Proudfoot et al. 2013).

5.10. SMDMRD

User requirements were gathered by questionnaire and interview, using DAF methodology. The main user requirements were:

· Seamless access, or command line access with batch import/export support

· Easy to use web interface for searching published datasets

· Advanced metadata-based search function

· Customisable metadata and RDF support

· Dataset version control

· Multi-level access control

· Linking data to published papers (DOI or handle.net)

In order to choose a platform for a prototype data management system, the project compared installations of Fedora Commons (using an Islandora Drupal module), DSpace, Dataverse and DataFlow in fulfilling the following criteria:

· Meeting user requirements out of the box

· Ease of installation, getting it running and maintenance

· Ease of customisation

· How many standards are supported

· How well developed, supported and widely used it is

The project team favoured DataFlow because of DataStage functions, but it was still under development. DSpace was found to be easiest to install and run, much online help being available. Queen Mary University also has a DSpace institutional repository already. The team found Fedora difficult to install and run, and Dataverse limited in its functionality, particularly metadata customisation. Eventually, DSpace was chosen for the pilot data management system, with the intention to combine it with DataStage to integrate researcher workflows, using the SWORD protocol to transfer datasets. The report on platform choice is available at: http://rdm.c4dm.eecs.qmul.ac.uk/platform_choice (Fabiani, 2012).

5.11. UWE Managing Research Data

The project team chose EPrints for the research data repository at UWE because it is already in use for the Institutional repository, therefore no further funding was necessary and they had the skills necessary to repurpose the system. They were in communication and sharing knowledge with other institutions that had already used EPrints for data publication. These factors are discussed in the document available at: http://www2.uwe.ac.uk/services/library/using_the_library/Services%20for%20researchers/eprints-data-repository-uwe.pdf (Holliday, 2012).

5.12. Loughborough University UK HE Research Data Management Survey

A survey of UK HEIs was conducted to determine their plans for future RDM services and received responses from 38 institutions. Regarding technical infrastructure components and tools, the results revealed that 6 (16%) institutions had an operational Research Data Service, with 25 (66%) developing one. Most institutions were storing or aimed to store both data and metadata, 2 planned to hold only metadata and 2 planned to hold just the data. Regarding the software system the service used or was intending to use:

· EPrints – 11 institutions

· DSpace – 4

· PURE – 4

· Symplectic – 2

· Converis – 1

· Figshare – 1

· iRODS – 1

· Other systems - 13 (included DataFlow, Fedora/Hydra, Equella and in-house developed)
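The quoted percentages can be checked against the 38 responses with a few lines of Python. The figures are taken from the text above; the variable and function names are hypothetical.

```python
RESPONDENTS = 38  # institutions that responded to the survey

service_status = {"operational": 6, "in development": 25}

platform_counts = {
    "EPrints": 11,
    "DSpace": 4,
    "PURE": 4,
    "Symplectic": 2,
    "Converis": 1,
    "Figshare": 1,
    "iRODS": 1,
    "Other systems": 13,
}


def percentage(count, total=RESPONDENTS):
    """Share of respondents, rounded to the nearest whole percent."""
    return round(100 * count / total)


status_shares = {name: percentage(n) for name, n in service_status.items()}
eprints_share = percentage(platform_counts["EPrints"])
platform_total = sum(platform_counts.values())
```

This reproduces the 16% and 66% given above and puts EPrints at about 29% of respondents. The platform counts as listed sum to 37, one fewer than the number of respondents; the source does not explain the difference.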

A report on the survey results, and links to the survey results, are available at: http://blog.martinh.net/2013/10/metadata-is-love-note-to-future-uk.html (Hamilton, 2013).

5.13. St Andrews

CKAN was investigated as the platform for a pilot RDM system at the University of St Andrews, as part of the JISC funded C4D project. A list of user requirements was composed against which to measure the suitability of CKAN. This list contributed to the work done at the ‘CKAN for Research Data Management Workshop’ [5.1], as published in the blogpost at: https://research-computing.wp.st-andrews.ac.uk/2013/03/15/ckan-for-research-data-management/ (Plietzsch, 2013a). CKAN was chosen for the pilot; the evaluation process is described in the blogpost at: http://research-computing.wp.st-andrews.ac.uk/2013/11/27/using-ckan-for-research-data-management/ (Plietzsch, 2013b).

5.14. iREAD

The iRODS evaluation and demonstrator project provided an evaluation and demonstration of the iRODS system, assessing the capabilities of a demonstrator system against use-case requirements from the CARMEN project. The evaluation is available at: http://www.wrg.york.ac.uk/iread

5.15. DANS Easy

The process of deciding between Fedora, EPrints and DSpace for the DANS Easy data repository service is described in the paper at: http://www.ais.up.ac.za/digi/docs/bogaards_paper.pdf (Bogaards, 2009).

5.16. DCC

The DCC provide a catalogue of RDM tools and services at: http://www.dcc.ac.uk/resources/external/tools-services

5.17. ANDS

ANDS provides information on technical resources at: http://www.ands.org.au/resource/techdocs.html and on metadata store solutions at: http://ands.org.au/guides/metadata-stores-resources.html

5.18. JISC Digital Media

Advice on various aspects of managing digital media collections may be found at: http://www.jiscdigitalmedia.ac.uk/managing

6. Active Institutional Infrastructure examples

Several institutions have now established institutional research data repositories, or host discipline-based or multi-institutional project-based research data repositories. The UK based institutional data repositories open to external view, are listed below. Some discipline-based data repositories based at UK HEIs are also listed, together with a number from institutions outside the UK.

6.1. UK institutional data repositories

6.1.1. Bristol http://data.bris.ac.uk/data/

The Open data repository is still under development. CKAN has been selected for the data repository and functions as a catalogue of research data. This integrates with the PURE RIS, which functions as a catalogue of research outputs (Price, 2013a).

6.1.2. DSpace at Cambridge https://www.repository.cam.ac.uk/

The institutional repository is now able to preserve and publish research data.

6.1.3. Edinburgh Datashare http://datashare.is.ed.ac.uk/

This repository is based on DSpace. The technical infrastructure at Edinburgh involves integration with PURE, active data infrastructure and the DMPonline tool.

6.1.4. Essex Research Data http://researchdata.essex.ac.uk/

This data repository is built on the EPrints platform, modified using the ReCollect plugin to accept datasets. The service includes allocation of DataCite DOIs.

6.1.5. Open Research Exeter https://ore.exeter.ac.uk/repository/

Based on the DSpace platform, material may be deposited via Symplectic. ORE’s content includes journal articles, conference papers, working papers, reports, book chapters, videos, audio, images, multimedia research project outputs, raw data and analysed data. Exeter's three former repositories (The Exeter Research and Institutional Content Archive (ERIC), Digital Collections Online (DCO) and the Exeter Data Archive (EDA)) were merged into ORE and all previous content is still available via the same permanent link. The merger took place in March 2013.

6.1.6. GSoA RADAR http://radar.gsa.ac.uk/

This EPrints-based Glasgow School of Art institutional repository accepts a wide range of objects including research data. This repository was the subject of a case study for the KAPTUR project.

6.1.7. Goldsmiths Research Data Catalogue http://eprints-data.gold.ac.uk/

Goldsmiths research data catalogue is built on the EPrints platform and results from the work done for the KAPTUR project.

6.1.8. Hertfordshire UH Research Archive http://rdm.herts.ac.uk/rdm/uh-research-archive.html

This is a DSpace institutional repository that is being expanded to include a data catalogue and a research data archive.

6.1.9. Hull Hydra https://hydra.hull.ac.uk/

The digital repository at Hull is built on the Hydra micro-services architecture [see 2.1.2 and 4.2.5]. The repository is designed to hold a wide range of digital resources including research datasets.

6.1.10. University of Lincoln Researcher Dashboard https://orbital.lincoln.ac.uk/

The Researcher Dashboard is the interface for the Data deposit workflow, facilitated by the ‘Orbital Bridge’ application (Stainthorp, 2013). This links the various components of the RDMI: an EPrints IR for published research papers, network storage, Lincoln’s Awards Management System and a CKAN based data registry (Stainthorp, 2012).

6.1.11. University of Newcastle https://research.ncl.ac.uk/rdm/tools/

A Research Data infrastructure has been implemented at Newcastle which includes a CKAN data portal (for archiving and publishing data) together with a number of in-house built systems: MyProject (a project and awards management system), MyImpact (a researcher profile and publication information system), a Research Data Catalogue (linking data, projects and publications), a VRE and e-Science Central (research collaboration tools).

6.1.12. Oxford DataBank https://databank.ora.ox.ac.uk/

Databank is a Fedora based data repository for the University of Oxford. Data may be stored and preserved in the long-term, retrieved and published from anywhere on the web. This is a component of the DataFlow infrastructure at Oxford, alongside DataStage which provides local management of active research data, including metadata annotation and a collaborative workflow. The RDM infrastructure also includes the Online Research Database Service (ORDS)^{^[xxvii]} and the institutional repository, Oxford University Research Archive (ORA) ^{^[xxviii]}.

6.1.13. C4DM-RDR http://c4dm.eecs.qmul.ac.uk/rdr/

The Research Data Repository at the Centre for Digital Music, Queen Mary University of London, is a DSpace-based repository [the selection process is discussed at 5.10]. This repository was specifically configured for long-term preservation and sharing of multimedia file formats.

6.1.14. ePrints Soton http://eprints.soton.ac.uk

The EPrints institutional repository at the University of Southampton has extended the existing list of data types accepted to include datasets and experiments, using the ReCollect plugin. EPrints Soton now holds research data underlying published research outputs (papers). Another strand of work, using SharePoint to catalogue and share active data, has yet to be implemented.

The University of Southampton has a federated approach to repository management, so a number of instances of EPrints are used by departments to curate their research outputs.

6.1.15. University of the Arts London Data Repository http://www.researchdata.arts.ac.uk/

This repository for research data is built on the EPrints platform.

6.1.16. UCA Research Online http://www.research.ucreative.ac.uk/

UCARO is the institutional repository and accepts a wide range of research outputs including research data. It is built on the EPrints platform.

6.1.17. UWE Research Data Repository http://researchdata.uwe.ac.uk/

An instance of EPrints was modified for use as the data repository at UWE. The project developed its own metadata profile for research data, having decided against subscribing to the DataCite scheme and before the ReCollect plugin became available.

6.2. Discipline-based research data repositories hosted by UK HEIs

6.2.1. Edina ShareGeo http://edina.ac.uk/projects/sharegeo/

Although based at the University of Edinburgh, ShareGeo is not an institutional repository. Here DSpace has been customised to offer a repository that eases both the deposit and discovery of geospatial data.

6.2.2. Leeds DART Data Portal http://dartportal.leeds.ac.uk/

The Detection of Archaeological Residues using Remote-sensing Techniques (DART) research project maintains a CKAN data portal for the open data outputs from the project.

6.2.3. eCrystals at the University of Southampton http://ecrystals.chem.soton.ac.uk/

The University of Southampton Department of Chemistry holds data from X-ray diffraction experiments in an EPrints repository. Each EPrints record consists of bibliographic data, data collection parameters and files; the files include raw data (.hkl), visualisations (.jpg), experimental conditions (.htm), structure determination outputs, final structural result (.cif and .cml) and a validation report.

6.2.4. UKDA http://www.data-archive.ac.uk/

The UKDA is not an institutional repository, but a national social and economic research data repository based at the University of Essex. The UKDA provides the UK Data Service ^{^[xxix]}, which curates key quantitative and qualitative data; UK Data Service ReShare ^{^[xxx]}, which curates data from ESRC funded research; and the HDS ^{^[xxxi]} (successor to the AHDS). These are housed on a modified EPrints repository platform.

6.2.5. CARMEN Portal http://www.carmen.org.uk/portal

The CARMEN Portal is a VRE to support e-Neuroscience, providing storage and processing services over a Grid infrastructure. The CARMEN system is a three-tier web architecture consisting of a web portal, an application layer and a storage layer, developed by a collaboration of researchers from 11 UK universities. The Java portal allows the user to access data and to create and run analysis tools on remote servers. The storage layer is shared between MySQL databases and an SRB (Storage Resource Broker) system. The application layer consists of Java servlets, providing a middleware layer that bridges storage and portal.

6.3. Institutional and discipline-based research data repositories outside the UK

6.3.1. Monash University http://arrow.monash.edu.au/vital/access/manager/Index

Arrow, the research repository at Monash provides a place for researchers to store and manage research data and related publications. The university provides LaRDS (Large Research Data Store) for research datasets storage, which is used for collaboration using the Confluence wiki and Sakai VRE, and publishing data via the research repository Arrow. Monash also hosts a number of project based research data repositories. Research datasets are catalogued through the various current RDM platforms. This metadata may be harvested by the Research Data Australia (RDA) service, which provides a national research data catalogue. Monash does not have an institutional research metadata repository (catalogue) since this service is provided at the national level (Jones, 2013). The software system employed is ‘VITAL’.

6.3.2. Griffith University http://equella.rcs.griffith.edu.au/research/logon.do

The Research Data Repository is based on Equella, and participates in the RDA catalogue. Some research data collections may be discovered using the Research Hub service at: http://research-hub.griffith.edu.au/collections.

6.3.3. 3TU Datacentrum http://datacentrum.3tu.nl/en/home/

3TU.Datacentrum, a collaboration of the TU Delft, TU Eindhoven and University of Twente libraries, provides data processing services and a data repository storing datasets from technical and scientific research in the Netherlands. Datacentrum is built on Fedora Commons and THREDDS dataserver architecture.

6.3.4. DANS EASY https://easy.dans.knaw.nl/ui/home

Easy is the online archiving system provided by the Data Archiving and Networked Services (DANS), an institute of the Royal Netherlands Academy of Arts and Sciences (KNAW) and the Netherlands Organisation for Scientific Research (NWO). The repository is built on Fedora Commons architecture.

6.3.5. California Digital Library Merritt Repository https://merritt.cdlib.org/

Merritt is built on a micro-services architecture providing digital curation through a series of devolved, independent but interoperable services. By devolving functions to a set of small self-contained services, they are easier to deploy, maintain and develop, leading to a flexible system able to respond to diverse needs and an ever changing technical environment. One of the central services is the Curation Storage micro-service, which supports a set of behaviors for manipulating and retrieving entities and their properties. Interaction with the Storage service is provided via a Java procedural API, a command line API, and a RESTful web API. The micro-services available are listed at: https://confluence.ucop.edu/display/Curation/Microservices.

6.3.6. Harvard Dataverse Network http://thedata.harvard.edu/dvn/

The Harvard Dataverse Network is a repository for sharing, citing and preserving research data; open to all scientific data from all disciplines worldwide. This is built on the Dataverse repository application and is part of the Dataverse Network.

6.3.7. Johns Hopkins Data Archive https://archive.data.jhu.edu/dvn/

The JHU Data Archive runs on the Dataverse repository software platform and is part of the Dataverse network.

6.3.8. Purdue University Research Repository https://purr.purdue.edu/

PURR provides an online, collaborative working space and data-sharing facility, based on the HUBzero platform.

6.3.9. Rutgers University Research Data Portal https://rucore.libraries.rutgers.edu/research/

RUresearch makes research data available to the scholarly community and provides a collaborative workspace for data processing and reuse. The system also provides access to supplementary resources, codebooks, lab books and publications to give context to the data. RUresearch is built on Fedora Commons architecture.

6.3.10. University of Virginia Libra http://libra.virginia.edu/

The University of Virginia institutional repository, Libra, is built on the Hydra micro-services platform and now accepts research datasets.

6.3.11. ICPSR http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp

The Inter-university Consortium for Political and Social Research provides a discipline-based data repository, located at the University of Michigan, Ann Arbor. This repository is built on the DuraCloud platform. ICPSR provides a range of other data curation tools and services.

A list of research data repositories, including discipline-based, national and institutional research data repositories can be found at Databib^{^[xxxii]} http://databib.org/ and at re3data.org^{^[xxxiii]} http://www.re3data.org.