Caden 2.0 - NFDI4Ing

The archetype concept.
Introducing the archetype:

Caden

Hello, I’m Caden. 

I’m an engineer whose research deals with complex sequences of processing and analysing steps, applied to samples and/or data sets. My professional background is mostly informed by materials science, building materials science, materials technology, process engineering, and technical chemistry.

For me, the output of one processing step is the input of a subsequent one. The processed objects can be physical samples (specimens) or data samples. For instance, I synthesise an alloy sample, temper it, and etch it. After that, I analyse the sample in various measurement setups, creating data sets. Thus, in the context of my data management, physical and data samples are treated the same, and both are commonly referred to as “entities”. Similarly, all kinds of processing steps, whether they work on physical or data samples, are referred to as “activities”. The resulting graph of entities and activities forms my workflow.
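The entity and activity model described above could be sketched roughly as follows. This is only an illustrative Python sketch, not a schema used by NFDI4Ing: the class names, the `kind` labels, and the final "microscopy" step with its "micrograph" output are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A physical sample or a data set; both are treated alike."""
    name: str
    kind: str  # "physical" or "data" (hypothetical labels)


@dataclass
class Activity:
    """A processing or analysis step that turns input entities into outputs."""
    name: str
    inputs: list[Entity] = field(default_factory=list)
    outputs: list[Entity] = field(default_factory=list)


# The alloy example from the text: synthesise, temper, etch, then measure.
# The measurement step and its output are assumed for illustration.
raw = Entity("alloy_raw", "physical")
tempered = Entity("alloy_tempered", "physical")
etched = Entity("alloy_etched", "physical")
micrograph = Entity("micrograph", "data")

workflow = [
    Activity("synthesise", outputs=[raw]),
    Activity("temper", inputs=[raw], outputs=[tempered]),
    Activity("etch", inputs=[tempered], outputs=[etched]),
    Activity("microscopy", inputs=[etched], outputs=[micrograph]),
]

# In this linear chain, the output of each step is the input of the next one.
chained = all(
    prev.outputs == nxt.inputs for prev, nxt in zip(workflow, workflow[1:])
)
```

Here the physical specimen and the resulting data set share one type, so the hand-over from the last physical step to the first data step needs no special treatment.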

Key challenges and objectives

A central interest of Caden is the so-called provenance tracking of samples and data. The central requirement is to store data entities (i.e., both data and metadata) and to store parameters of activities (e.g., temperatures, pressures, simulation parameters) in a structured and traceable manner. In addition, entity links must be created to describe a graph topology. The graph can be very complex and non-linear (i.e., contain branches and bifurcations) with a large number of process steps.
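To illustrate what provenance tracking over a branching graph might look like, the following Python sketch stores activity parameters together with their input and output entities and walks the graph backwards from a result. All activity names, parameter names, and values are invented for the example; the plain-dictionary layout is an assumption, not a format prescribed by the project.

```python
from collections import deque

# activity name -> (parameters, input entities, output entities)
# All values are made up for illustration.
activities = {
    "synthesise": ({"furnace_temperature_C": 1500}, [], ["alloy"]),
    "temper": ({"temperature_C": 600, "duration_h": 2}, ["alloy"], ["tempered"]),
    # The workflow branches here: "tempered" feeds two different steps.
    "hardness_test": ({"load_N": 10}, ["tempered"], ["hardness_data"]),
    "etch": ({"etchant": "nital"}, ["tempered"], ["etched"]),
    "microscopy": ({"magnification": 500}, ["etched"], ["micrograph"]),
}

# Map each entity to the activity that produced it.
produced_by = {
    out: name for name, (_, _, outputs) in activities.items() for out in outputs
}


def lineage(entity):
    """Return the activities leading to `entity`, nearest first (breadth-first)."""
    seen, order, queue = set(), [], deque([entity])
    while queue:
        current = queue.popleft()
        activity = produced_by.get(current)
        if activity is None or activity in seen:
            continue
        seen.add(activity)
        order.append(activity)
        # Continue with the inputs of the activity that produced this entity.
        queue.extend(activities[activity][1])
    return order
```

Calling `lineage("micrograph")` follows the etching branch back to the synthesis, while `lineage("hardness_data")` skips the etching step. The parameters of each step on the path can then be looked up via `activities[name][0]`.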

Another challenge for Caden is the cooperation between different institutions. It is quite common for institutions to have their own individual repositories and metadata schemes, often with little overlap with those of other institutions. Consolidation of process steps (i.e., the fragments of the workflow graphs) across institutional boundaries is often difficult, and as of now there is no way to automate this step (e.g., via machine-processable links). One way to solve this issue is the use of a unified research data infrastructure, such as Kadi4Mat or eLabFTW.

Approach & measures

A first measure towards fulfilling Caden’s desideratum for a comprehensive compilation of best practices is an in-depth review of the literature. For this, we will screen the publications already used to determine the state of the art in order to extract success stories that fit the requirements of archetype Caden, and subsequently extend this effort to non-German publications. Additionally, this screening allows us to identify scientific engineering work groups in Germany whose profiles match Caden and to set up structured interviews to gather data on their work practices, use cases, and so on. Further interview partners will be found amongst the existing partners of NFDI4Ing.

Close interaction with the materials science and engineering community (research area 43 according to the DFG nomenclature) ensures that our efforts are communicated to the community and that feedback can be collected. The structured interviews will be followed up by workshops with engineers, with the goal of formulating novel best practices for archetype Caden, which will then be recommended to the community for deployment. Additional research, drawing on input from the interviews and workshops, will identify freely available (not necessarily open-source) RDM tools that are useful for archetype Caden. We will compile a report that allows engineers to evaluate and compare the tools.

Results

The questionnaire to collect information on best-practice examples has been prepared and will be sent out to the community soon. Several presentations describing the project were held; they allowed interaction with the communities and yielded new insights into their needs and the solutions that are available. Data formats that meet the needs of provenance tracking are being discussed, and their use will be evaluated.