S-4 - NFDI4Ing

The NFDI4Ing base services.

Measure S-4: Repositories & storage

Whether accessing, generating, analysing, or publishing data, engineers deal with a huge variety of experimental settings and research circumstances. At the same time, the number of research data management solutions is increasing. Different kinds of repositories, storage systems, and recommendations for archiving exist, but they are not tailored to discipline-specific requirements. For repositories, we distinguish between institutional repositories (e.g., RADAR, Zenodo, KITopen) on the one hand and community-specific repositories (e.g., NOMAD for materials science) on the other. Such repositories are very useful for publishing research data and can be part of a long-term archiving plan, but they lack the flexibility required for heterogeneous data that is still being modified and analysed. Therefore, we provide tools and develop interfaces for the integration and interoperability of repositories.

Key challenges and objectives

The heterogeneity of data generation workflows and storage formats poses a challenge to researchers in engineering. Within this measure, it is therefore our goal to define best practices and tools for storage, exchange, and long-term preservation for data of varying quality and volume to foster reusability of research data in this highly decentralised environment.

The objectives are aligned with the tasks:

1. Establishment and maintenance of best practices and recommendations for community-specific repositories & storage solutions

2. Development of software for federated storage services

3. Development of a cost and distribution model for storage

Furthermore, the development of storage and repository federations requires close involvement in the cross-cutting sections of NFDI, e.g., Authentication and Authorization Infrastructure (AAI).

Tasks

Task S-4-1: Establishment and maintenance of best practices and recommendations for community-specific repositories & storage solutions
We will compile a catalogue of data formats and storage technologies for engineers that takes sustainability, interoperability, and accessibility into account and includes a review of suitable technologies. Based on a continuous analysis of the state of the art and a collection of existing community-specific repositories (to be registered in re3data), we will deliver recommendations for building new repositories. In cooperation with measure S-6 “Community-based training on enabling data-driven science and FAIR data”, we will also provide tutorials and other training material.

Task S-4-2: Development of software for federated storage services
We will develop a software stack based on existing solutions that implements repositories, harmonised protocols/interfaces, and best practices for operators of federated storage services, incorporating good scientific practice and the FAIR principles. Beyond data storage itself, the software will take into account data management workflows as well as authentication, user management, and access for users from external organisations (in cooperation with measure S-5 “Overall NFDI software architecture – data security and sovereignty”). Once these definitions are in place, we will reach out to existing infrastructure providers to achieve a higher level of standardisation of storage infrastructures for engineering researchers.
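
To make the idea of harmonised interfaces more concrete, the following sketch shows how a shared interface could let client code address storage services at different sites in the same way. It is an illustration only: `StorageBackend`, `InMemoryBackend`, `Federation`, the `<site>/<key>` addressing scheme, and the sample data are all hypothetical and are not taken from the NFDI4Ing software stack.

```python
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical minimal interface each federated service would implement."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Stand-in for a real storage service, used only for illustration."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class Federation:
    """Routes addresses of the form '<site>/<key>' to the matching backend."""

    def __init__(self, backends: dict[str, StorageBackend]) -> None:
        self._backends = backends

    def put(self, address: str, data: bytes) -> None:
        site, key = address.split("/", 1)
        self._backends[site].put(key, data)

    def get(self, address: str) -> bytes:
        site, key = address.split("/", 1)
        return self._backends[site].get(key)


# Hypothetical usage: two sites behind one federation object.
federation = Federation({"site-a": InMemoryBackend(), "site-b": InMemoryBackend()})
federation.put("site-a/run-001.csv", b"t,force\n0,1.5\n")
print(federation.get("site-a/run-001.csv"))
```

A real federation would additionally have to handle authentication, access rights, and metadata, which this sketch deliberately leaves out.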

Task S-4-3: Development of a cost and distribution model for storage
We will develop a cost and distribution model that allows participating institutions to share storage infrastructures and to be compensated for the costs of acquiring and operating them when research data is accessed by researchers of the community.
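
As a rough illustration of what such a model could compute, the sketch below splits one provider's cost among institutions in proportion to the data volume their researchers accessed. The proportional rule, the function name `split_costs`, and all figures are assumptions made for this example; the measure does not prescribe a particular formula.

```python
def split_costs(total_cost_eur, accessed_tb):
    """Split a storage provider's cost in proportion to accessed volume.

    total_cost_eur: cost to be compensated, in EUR (hypothetical figure).
    accessed_tb: mapping of institution name -> terabytes accessed (hypothetical).
    Returns a mapping of institution name -> share of the cost in EUR.
    """
    total_tb = sum(accessed_tb.values())
    if total_tb <= 0:
        raise ValueError("no recorded access, nothing to distribute")
    return {
        name: total_cost_eur * tb / total_tb
        for name, tb in accessed_tb.items()
    }


# Hypothetical example: three institutions share one provider's yearly cost.
shares = split_costs(12000, {"Institute A": 30, "Institute B": 10, "Institute C": 20})
print(shares)
```

Other allocation keys, such as stored volume or a fixed base fee plus a usage-dependent part, could be substituted without changing the overall structure.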

Results

The partners and contributors hold a virtual meeting every month to exchange results and information. Available data and metadata storage systems have been compiled and analysed. The results are publicly available via the S-4 GitLab repository.