introducing the base service measure S-4:
repositories & storage
When generating or accessing data, and when analysing or publishing it, engineers deal with a huge variety of experimental settings and research circumstances. In addition, the number of research data management solutions keeps growing. Various repositories, storage systems, and archiving recommendations exist, but they are not tailored to discipline-specific requirements. Among repositories, we distinguish between institutional or generic repositories (e.g., RADAR, Zenodo, KITopen) on the one hand and community-specific repositories (e.g., NOMAD for materials science) on the other. Such repositories are very useful for publishing research data and can be part of a long-term archiving plan, but they lack the flexibility required for heterogeneous data that is still being modified and analysed. We therefore provide tools and develop interfaces for the integration and interoperability of repositories. Building up community-specific repositories that serve as best-practice examples and document successful research in the engineering sciences will contribute to the overall goal of realising the FAIR principles. Harmonising and standardising protocols and interfaces is one of the tasks planned in this measure; it implements the “A” (accessible) in FAIR and enables engineers to share their data easily across local storage and repositories.
Especially for experimental test benches, the means of generating data are often part of the initial research question. As a result, engineers commonly operate their own storage infrastructures to support their scientific workflows. The storage infrastructures available to researchers are therefore distributed among local, national, and international providers from academia and industry. While this wide range of offerings encourages sovereign scientific work, it also leaves engineers largely without support from infrastructure providers for their individual scenarios and makes it harder to reuse data collected by other research groups.
key challenges & objectives
The heterogeneity of data generation workflows and storage formats poses a challenge to researchers in engineering. The goal of this measure is therefore to define best practices and tools for the storage, exchange, and long-term preservation of data of varying quality and volume, so as to foster the reuse of research data in this highly decentralised environment.
The objectives correspond to the following tasks:
- Establishment and maintenance of best practices and recommendations for community-specific repositories & storage solutions
- Development of software for federated storage services
- Development of a cost and distribution model for storage
Furthermore, developing storage and repository federations requires strong involvement in the cross-cutting sections of NFDI (e.g., Authentication and Authorization Infrastructure, AAI).
tasks
Task S-4-1: Establishment and maintenance of best practices and recommendations for community-specific repositories & storage solutions
We will compile a catalogue of data formats and storage technologies for engineers, including a review of suitable technologies with regard to sustainability, interoperability, and accessibility. Based on a continuous analysis of the state of the art and a collection of existing community-specific repositories (to be registered in re3data), we will deliver recommendations for building new repositories. In cooperation with measure S-6 “Community-based training on enabling data-driven science and FAIR data”, we will also provide tutorials and other training material.
Task S-4-2: Development of software for federated storage services
Building on existing solutions, we will develop a software stack for operators of federated storage services that implements repositories, harmonised protocols and interfaces, and best practices incorporating good scientific practice and the FAIR principles. Beyond data storage itself, the software will take into account data management workflows as well as authentication, user management, and access for users from external organisations (in cooperation with measure S-5 “Overall NFDI software architecture – data security and sovereignty”). Once this software stack has been defined, we will reach out to existing infrastructure providers to achieve a higher level of standardisation of storage infrastructures for engineering researchers.
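To illustrate what a harmonised interface offers engineers, the following Python sketch defines one hypothetical interface (`StorageBackend` with `put`, `get`, and `keys`) and two implementations: a local directory, as an engineer might run next to a test bench, and an in-memory stand-in for a remote repository. All class and function names are assumptions made for this example; they are not part of the S-4 software stack, and a real federation would additionally need authentication, user management, and actual network protocols.

```python
from __future__ import annotations

import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Hypothetical harmonised interface shared by all storage backends."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store `data` under `key`, overwriting any existing object."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the bytes stored under `key`; raise KeyError if absent."""

    @abstractmethod
    def keys(self) -> list[str]:
        """Return all stored keys in sorted order."""


class LocalDirectoryBackend(StorageBackend):
    """Stores each object as a file below a local directory (e.g. at a test bench)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        path = self.root / key
        if not path.is_file():
            raise KeyError(key)
        return path.read_bytes()

    def keys(self) -> list[str]:
        # Keys are file paths relative to the root, using "/" as separator.
        return sorted(
            p.relative_to(self.root).as_posix() for p in self.root.rglob("*") if p.is_file()
        )


class InMemoryRepository(StorageBackend):
    """Stand-in for a remote repository (hypothetical; no network access)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if absent

    def keys(self) -> list[str]:
        return sorted(self._objects)


def copy_dataset(source: StorageBackend, target: StorageBackend, prefix: str = "") -> int:
    """Copy every object whose key starts with `prefix`; return how many were copied."""
    copied = 0
    for key in source.keys():
        if key.startswith(prefix):
            target.put(key, source.get(key))
            copied += 1
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        local = LocalDirectoryBackend(Path(tmp))
        local.put("run-01/pressure.csv", b"t,p\n0,1.0\n")
        local.put("run-01/README.txt", b"test bench A")
        repo = InMemoryRepository()
        print(copy_dataset(local, repo, prefix="run-01/"), repo.keys())
```

Because `copy_dataset` depends only on the shared interface, moving a dataset from local storage into a repository needs no backend-specific code; connecting a further storage system means implementing the three methods once.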
Task S-4-3: Development of a cost and distribution model for storage
We will develop a cost and distribution model that allows participating institutions to share storage infrastructures and to offset the costs of acquiring and operating these infrastructures when research data is accessed by researchers of the community.
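The model itself is not specified here. As one possible starting point, the Python sketch below assumes straight-line amortisation of acquisition costs plus yearly operating costs, and splits each provider's annual cost among the accessing institutions in proportion to the data volume each of them accessed. The function names, the volume-based allocation rule, and the example figures are hypothetical illustrations, not the model developed in this task.

```python
from __future__ import annotations

from collections import defaultdict


def annual_cost(acquisition: float, lifetime_years: float, operation_per_year: float) -> float:
    """Straight-line amortisation of the acquisition cost plus the yearly operating cost."""
    if lifetime_years <= 0:
        raise ValueError("lifetime_years must be positive")
    return acquisition / lifetime_years + operation_per_year


def distribute_costs(
    provider_costs: dict[str, float],
    accesses: list[tuple[str, str, float]],
) -> dict[str, dict[str, float]]:
    """Split each provider's annual cost among the institutions that accessed it,
    proportionally to the data volume (e.g. in GB) each institution accessed.

    accesses: (provider, consumer, volume) records.
    Returns provider -> {consumer: assigned amount}. Providers without recorded
    accesses keep their full cost and get an empty mapping.
    """
    volume: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for provider, consumer, amount in accesses:
        if provider not in provider_costs:
            raise KeyError(f"unknown provider: {provider}")
        volume[provider][consumer] += amount

    shares: dict[str, dict[str, float]] = {}
    for provider, cost in provider_costs.items():
        per_consumer = volume.get(provider, {})  # .get does not create new entries
        total = sum(per_consumer.values())
        shares[provider] = (
            {c: cost * v / total for c, v in per_consumer.items()} if total > 0 else {}
        )
    return shares
```

For example, a site with acquisition costs of 300,000 amortised over five years and operating costs of 40,000 per year has an annual cost of 100,000; if one institution accesses 30 GB and another 10 GB, they are assigned 75,000 and 25,000 respectively. Volume-based splitting is only one option; flat membership fees or reserved quotas could be modelled in a similar way.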
results
The partners and contributors meet virtually every month to exchange results and information. Available data and metadata storage systems have been compiled and analysed. The information was initially gathered on the NFDI4Ing SharePoint (registration required) and is now publicly available via the S-4 GitLab repository.
contact information
The measure S-4 is led by:
Achim Streit
achim.streit@kit.edu
Scientific Computing Center (SCC)
Karlsruher Institut für Technologie (KIT)
Rainer Stotzka
rainer.stotzka@kit.edu
Data Exploitation Methods
Karlsruher Institut für Technologie (KIT)
Marius Politze
Politze@itc.rwth-aachen.de
IT Center
RWTH Aachen
For general information on measure S-4, please contact:
Philipp Ost
philipp-joachim.ost@kit.edu
Data Exploitation Methods
Karlsruher Institut für Technologie (KIT)