introducing the base service measure S-2:
research software development
Many scientific workflows are governed by algorithms implemented in software. Software is therefore an important type of research data in the engineering sciences in its own right, and RDM services need to take the special properties of research software into account.
key challenges & objectives
The key challenges of the base service S-2 are:
- Integrate with existing environments for software-based experiments (Jupyter, schedulers)
- Make workflow runner infrastructure available to researchers
- Provide a catalog of workflows and workflow steps for quality assessment and RDM processes
A particular computer experiment is usually represented by a snapshot in time, both of the state of the source code and scripts employed and of the operational environment it is executed in. Replicability means that a software-driven computer experiment, repeated in the same operational environment, produces the same results. Reproducibility, on the other hand, means that a software-driven computer experiment can be repeated in a different context. As also outlined in the challenges for the archetype BETTY, the situation is particularly complex in engineering, where many different programming languages are employed, often within the same computer experiment (e.g. Python scripts orchestrating simulations written in C++, C and Fortran). The actual execution environment of a piece of software is further affected by factors such as the hardware platform (e.g. various kinds of accelerators), libraries, the operating system, and compilers.
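Replicating an experiment presupposes that the snapshot described above is recorded alongside its results. The following Python sketch, an illustration rather than a tool provided by this measure, captures a minimal provenance record: interpreter version, operating system, machine architecture and the Git commit of the source tree. The function names `git_commit` and `environment_snapshot` are hypothetical. It deliberately omits compilers, libraries and accelerators, which a complete record (or a container image digest) would also have to cover.

```python
import json
import platform
import subprocess
from datetime import datetime, timezone


def git_commit(repo_dir="."):
    """Return the current Git commit hash, or None if Git or the repo is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # OSError covers a missing git executable; CalledProcessError a non-repo directory.
        return None


def environment_snapshot(repo_dir="."):
    """Collect a minimal (hypothetical) description of the operational environment."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "source_commit": git_commit(repo_dir),
    }


if __name__ == "__main__":
    # Store this JSON next to the experiment's output files.
    print(json.dumps(environment_snapshot(), indent=2))
```

Writing the record as JSON keeps it both human-readable and easy to attach as metadata when results are published.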
The goal of this measure is to provide services and best practices that allow researchers to combine RDM and enterprise-grade software development workflows.
S-2-1 Infrastructure for replicable and reproducible software-based experiments
We will set up reusable (e.g. script-based) workflows that link source code, experimental setups, and data processing routines. Best practices in continuous integration, arising for example from the activities of measure B-1, will be standardised, documented and continually assessed, in particular with respect to version control, automated testing, deployment, and linking to publication records.
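Automated testing in such a workflow often takes the form of a regression test that reruns a computation and compares it against reference results. The Python sketch below is illustrative only: `simulate_decay` is a hypothetical stand-in for a real simulation (an explicit Euler step for dT/dt = -kT), and in practice the reference would be loaded from a versioned file produced by an earlier, validated run. The comparison uses a tolerance rather than exact equality because floating-point results can differ slightly across compilers and hardware, which matters once the goal shifts from replicability to reproducibility.

```python
import math


def simulate_decay(t0: float, k: float, dt: float, steps: int) -> list[float]:
    """Hypothetical stand-in for a simulation: explicit Euler for dT/dt = -k*T."""
    values = [t0]
    for _ in range(steps):
        values.append(values[-1] * (1.0 - k * dt))
    return values


def matches_reference(result, reference, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two result series element-wise within a tolerance."""
    if len(result) != len(reference):
        return False
    return all(
        math.isclose(r, ref, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, ref in zip(result, reference)
    )


def test_decay_regression():
    # Reference computed in closed form for the Euler recurrence:
    # T_n = T0 * (1 - k*dt)**n. A real project would load stored results instead.
    t0, k, dt, steps = 100.0, 0.5, 0.1, 20
    reference = [t0 * (1.0 - k * dt) ** n for n in range(steps + 1)]
    assert matches_reference(simulate_decay(t0, k, dt, steps), reference)


if __name__ == "__main__":
    test_decay_regression()
    print("regression test passed")
```

Because `test_decay_regression` follows pytest naming conventions, a CI job could run it with `pytest` on every push.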
We will make these workflows usable not only by highly skilled software developers but also by the large number of researchers without a background in software engineering. We employ and assess container solutions such as Singularity for packaging simulation experiments. In close collaboration with measure B-3, we will emphasise general usability while making these container platforms aware of HPC environments and their requirements regarding performance and the emulation of hardware features. Finally, together with measure B-3, we will deploy a JupyterHub server to further lower the entry barrier for new users.
S-2-2 Services for the assessment of the quality of software created by engineering researchers
We will provide templates and best-practice examples that allow researchers to create quality metrics with existing enterprise-grade solutions, making the sustainability and reusability of source code tangible. Based on existing continuous integration software, we will create a service that engineering researchers can use to generate quality metrics continuously and automatically, both to improve their software during development and to judge the quality of existing code before reusing it. We will connect quality metrics with RDM workflows to extract additional metadata from existing source code or documentation, making RDM tasks such as publishing easier for engineers.
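As a minimal sketch of both ideas, the Python snippet below computes one simple quality metric (docstring coverage) and extracts basic descriptive metadata from a Python module using the standard `ast` module. The metric choice and the function names `docstring_coverage` and `extract_metadata` are assumptions for illustration, not the metrics used by the service; a production setup would rely on established analysis tools run inside the CI pipeline.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Fraction of module, class and function definitions that carry a docstring."""
    tree = ast.parse(source)
    nodes = [tree] + [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    documented = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    return documented / len(nodes)  # nodes always contains the module itself


def extract_metadata(source: str) -> dict:
    """Pull simple, hypothetical descriptive metadata from a module's source."""
    tree = ast.parse(source)
    doc = ast.get_docstring(tree) or ""
    return {
        "description": doc.splitlines()[0] if doc else None,
        "public_functions": [
            node.name
            for node in tree.body
            if isinstance(node, ast.FunctionDef) and not node.name.startswith("_")
        ],
        "docstring_coverage": round(docstring_coverage(source), 2),
    }


example = '''
"""Example module."""

def documented():
    """Has a docstring."""
    return 1

def undocumented():
    return 2
'''

print(f"{docstring_coverage(example):.2f}")  # 0.67
print(extract_metadata(example))
```

The extracted description and function list could, for instance, prefill fields of a software publication record, while the coverage value is tracked over time as a CI metric.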
In Task S-2-1 we are currently building a knowledge base of short articles that describe how to set up the different aspects of a modular workflow, based on experience gained in the context of SFB1194. These articles cover, for example, containerization, automated testing, deployment, and cross-linking of publications, data and code. The knowledge base is currently available at https://tuda-sc.pages.rwth-aachen.de/projects/nfdi-4-ing-kb/
To test and further improve this knowledge base, we initiated a collaboration with the Hydraulic Engineering group of the Department of Civil and Environmental Engineering at TU Darmstadt on OpenFOAM-based modelling of heat transfer in streams.
In Task S-2-2, two introductory workshops were organized in conjunction with FDM.NRW in December 2020 and April 2021. The workshops will continue to be offered on an approximately quarterly to semi-annual basis. Future workshops will be announced on the FDM.NRW webpage (https://www.fdm.nrw). Workshop materials are available on GitLab.com (https://gitlab.com/gitlab-nrw-workshop-2021-04) under the CC-BY-SA 4.0 license.
We built a demonstrator for the assessment and reporting of RDM KPIs, implemented as scheduled pipelines (Schedules) on top of the GitLab infrastructure provided by NFDI4Ing (https://git.rwth-aachen.de and https://git-ce.rwth-aachen.de). Both instances can be accessed by researchers from the engineering community. Current work focuses on making workflow steps and infrastructure more widely available.
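To illustrate what a scheduled KPI job might compute, the Python sketch below checks checked-out repositories for a few RDM-relevant files and reports the share of repositories that fulfil each indicator. The KPIs in `KPI_FILES` and the function names are hypothetical and do not describe the demonstrator's actual indicators; in a GitLab setup, a script like this could run as a job of a scheduled pipeline and publish its JSON output as an artifact.

```python
import json
import sys
from pathlib import Path

# Hypothetical KPI definitions: each KPI is met if any of the listed
# file names exists in the repository root.
KPI_FILES = {
    "has_readme": ("README.md", "README.rst", "README.txt", "README"),
    "has_license": ("LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"),
    "has_citation": ("CITATION.cff",),
}


def repo_kpis(repo: Path) -> dict[str, bool]:
    """Evaluate each KPI for a single checked-out repository."""
    return {
        kpi: any((repo / name).is_file() for name in names)
        for kpi, names in KPI_FILES.items()
    }


def kpi_report(repos: list[Path]) -> dict[str, float]:
    """Share of repositories fulfilling each KPI, from 0.0 to 1.0."""
    results = [repo_kpis(r) for r in repos]
    if not results:
        return {kpi: 0.0 for kpi in KPI_FILES}
    return {kpi: sum(r[kpi] for r in results) / len(results) for kpi in KPI_FILES}


if __name__ == "__main__":
    # Usage: python rdm_kpis.py path/to/repo1 path/to/repo2 ...
    paths = [Path(p) for p in sys.argv[1:]]
    print(json.dumps(kpi_report(paths), indent=2))
```

Reporting shares rather than per-repository booleans keeps the output suitable for tracking trends across many projects over successive scheduled runs.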