Information silos are a significant obstacle to the comprehensive reuse of research data by other researchers, and they therefore hinder innovative research. These silos arise when researchers publish their data individually, in different formats and systems, on local storage media, or without standardised terminology for description, and when this happens without overarching coordination or a common infrastructure.
Consequently, this data can only be found individually or within narrow organisational boundaries (e.g. across teams). The data is therefore in a silo, and additional steps are required before it can be made available to external researchers for further use.
Spannung ≠ Spannung
When research data is described using terminologies (or, more precisely, metadata based on terminologies), these information silos manifest themselves as ambiguities or inconsistencies, among other things. Ambiguities arise when the metadata entries used have multiple definitions. In engineering, for instance, the German word ‘Spannung’ can refer to either electrical voltage or mechanical stress. Without additional context information, a metadata field called ‘Spannung’ leads to inconsistent query results. It would remain unclear whether a value is given in volts (for example 230 V) or in N/mm², or whether the voltage is merely described as ‘high’ or ‘low’. In such cases, terminologies can help to describe an experiment unambiguously by assigning specific units of measurement (V or N/mm²).
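The effect of tying a label to a term and a unit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Term` and `Measurement` classes, the `example.org` identifiers and the mechanical value are hypothetical and do not come from any NFDI4ING vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A vocabulary entry: unique identifier, human-readable label, unit."""
    iri: str    # hypothetical identifier, not a real vocabulary entry
    label: str  # the label a researcher might type into a metadata field
    unit: str   # unit of measurement tied to this term


# Two distinct terms that share the same German label.
ELECTRICAL = Term("https://example.org/terms/spannung-electrical", "Spannung", "V")
MECHANICAL = Term("https://example.org/terms/spannung-mechanical", "Spannung", "N/mm²")


@dataclass(frozen=True)
class Measurement:
    term: Term
    value: float


# Illustrative records; the mechanical value is made up for this sketch.
records = [
    Measurement(ELECTRICAL, 230.0),
    Measurement(MECHANICAL, 355.0),
]


def find_by_label(label: str) -> list:
    """Query by free-text label only: ambiguous when labels collide."""
    return [m for m in records if m.term.label == label]


def find_by_term(term: Term) -> list:
    """Query by term: unambiguous, because each term has its own identifier."""
    return [m for m in records if m.term == term]


def describe(m: Measurement) -> str:
    """Render a measurement together with the unit defined by its term."""
    return f"{m.term.label} = {m.value} {m.term.unit}"
```

Searching by the label ‘Spannung’ returns both records, whereas searching by `ELECTRICAL` returns only the electrical one, and `describe` always prints the unit that belongs to the term.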
The requirements become much more complex if the process that led to the creation of the data is also to be made available as metadata. This is necessary, for example, to validate an experiment by repeating it (reproducibility). In such cases, the processes and resources involved in the individual steps must be defined, as well as the units of measurement. These could include sensors for monitoring, specially prepared data sets as input, or versions of the analysis software used. Describing such a process goes beyond specifying individual metadata fields, because the relationships between the data must also be defined.
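One way to picture such relationships is as a chain of process steps, each recording its inputs, the resources used and its outputs. The following Python sketch assumes a hypothetical `ProcessStep` structure and made-up step, sensor and software names; it is not the data model developed at NFDI4ING.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    """One step of an experiment (hypothetical structure for illustration)."""
    name: str
    inputs: list[str] = field(default_factory=list)     # data consumed
    resources: list[str] = field(default_factory=list)  # sensors, software versions
    outputs: list[str] = field(default_factory=list)    # data produced


# Made-up example: a prepared data set is measured, then analysed.
steps = [
    ProcessStep(
        name="acquire",
        inputs=["prepared-dataset"],
        resources=["monitoring-sensor-A"],
        outputs=["raw-data"],
    ),
    ProcessStep(
        name="analyse",
        inputs=["raw-data"],
        resources=["analysis-software 1.2"],
        outputs=["result"],
    ),
]


def lineage(target: str, all_steps: list[ProcessStep]) -> list[ProcessStep]:
    """Return the steps that led to `target`, earliest step first."""
    producers = {out: step for step in all_steps for out in step.outputs}
    ordered: list[ProcessStep] = []
    seen: set[str] = set()

    def visit(item: str) -> None:
        step = producers.get(item)
        if step is None or step.name in seen:
            return  # raw input with no recorded producer, or already handled
        seen.add(step.name)
        for upstream in step.inputs:
            visit(upstream)
        ordered.append(step)

    visit(target)
    return ordered
```

Calling `lineage("result", steps)` walks the relationships backwards from the result and returns the steps in the order in which they would have to be repeated, together with the resources each one needs.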
Meaningful Metadata
The challenge lies in making the metadata much more meaningful. At NFDI4ING, we are therefore developing a Common Information Model (CIM). The CIM is a standardised, structured data model that provides a uniform description of technical objects, processes, and procedures. It facilitates the integration and exchange of information between different systems and disciplines, transcending organisational boundaries. The greatest challenge in developing the CIM is coordinating with the numerous sub-disciplines of engineering and their stakeholders. Agreeing on a common vocabulary is one part of this, since ambiguity makes communication difficult when technical terms are interpreted differently. Standardising objectives across very different specialist areas such as production engineering, materials science and IT is also challenging.
Felix Engel 0000-0002-3060-7052
Dorothea Iglezakis 0000-0002-8524-0569
Angelina Kraft 0000-0002-6554-335X