The NFDI4Ing base services.
Metadata & terminology services
The definition of metadata and concepts, associated attributes and relations that are readable and understandable, not only to target audiences but also to machines, is key to FAIRness. To address this concern, we will provide services to facilitate the creation, sharing and reuse of subject- and application-specific standardised metadata and their integration into engineering workflows, as well as a Terminology Service to enable researchers and infrastructure providers to access, curate, and update terminologies.
In NFDI4Ing, it is essential that metadata is used not only for documenting and indexing research data stored in repositories, but also for facilitating tasks such as (automated) retrieval, analysis, or combination of complex research data during active research. To this end, we are working on tools for the generation, management, and use of vocabularies and ontologies, in particular on the most suitable tools for reliable management of subject-specific vocabularies.
Key challenges and objectives
The following key challenges have been identified with regard to metadata & terminology services:
1. Support the development of metadata with formal semantics that are generic enough to be interoperable but specific enough to represent discipline-specific concepts
2. Help provide (semantic) documentation of all steps of data production and its outcome, including metadata adjustments, raw data, and intermediate data, in order to describe and predict the behaviour of real and virtual experiments
3. Provide metadata and ontologies that describe the validation and quality-control processes of research software, with a possible focus on simulation software
4. Enable the development of suitable vocabularies and schemata to describe the operational function of devices used to collect, analyse, and visualise data generated and/or processed in industrial production workflows
Task S-3-1 Tools for standardising metadata based on application profiles
The variability of methods used in engineering implies a constant need to standardise application-specific metadata. Using existing terminologies as well as those provided by the Terminology Service as a basis, we will offer a smart interface that allows users to find and select suitable terms and assemble them into application profiles. Term suggestions will take into account statistics on term usage within the defined schemata as well as settings for filtering and preference of underlying vocabularies, which can be set based on community recommendations. If no fitting term is found, a custom term may be specified as a provisional building block, automatically triggering a term request to the Terminology Service. To ensure that schemata are shared and reused, a connected repository will archive and index the schemata and make them available for reuse and adaptation. While the metadata schemata will be defined within the task areas that require them, information experts from metadata services will work collaboratively with the scientists to support them. They will also assist in developing tools for integrating metadata standards into scientific workflows (e.g. harvesters for extracting metadata from available sources such as file headers, log files, and software repositories, as well as tools for quality control of metadata).
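The term suggestion step described above can be illustrated with a minimal sketch. It is not the interface's actual implementation: the `Term` record, the `suggest_terms` function, the vocabulary names, and the usage counts are all hypothetical. It only shows one plausible way to combine vocabulary filtering, vocabulary preference, and usage statistics into a ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical term record; field names are assumptions."""
    label: str
    vocabulary: str
    usage_count: int  # how often the term occurs in existing schemata


def suggest_terms(query, terms, preferred_vocabularies=(),
                  excluded_vocabularies=(), limit=5):
    """Return matching terms: preferred vocabularies first, then by usage."""
    needle = query.lower()
    candidates = [
        t for t in terms
        if needle in t.label.lower()
        and t.vocabulary not in excluded_vocabularies
    ]
    # False sorts before True, so terms from preferred vocabularies come
    # first; within each group, higher usage counts rank higher.
    candidates.sort(
        key=lambda t: (t.vocabulary not in preferred_vocabularies,
                       -t.usage_count,
                       t.label)
    )
    return candidates[:limit]


# Made-up catalogue for illustration only.
catalogue = [
    Term("temperature", "vocab-a", 12),
    Term("inlet temperature", "vocab-b", 40),
    Term("temperature sensor", "vocab-c", 7),
    Term("pressure", "vocab-a", 30),
]

suggestions = suggest_terms(
    "temperature",
    catalogue,
    preferred_vocabularies={"vocab-a"},
    excluded_vocabularies={"vocab-c"},
)
```

An empty result from `suggest_terms` corresponds to the case in which no fitting term is found and a custom, provisional term would be specified instead.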
Task S-3-2 Terminology Service
The Terminology Service (TS) will enable the development of subject-specific terminologies, simultaneously fuelling application profiles with terms and using the application profiles as a basis for refining formal ontologies for the archetypes. It will provide the technical infrastructure for access, curation, and subscription to terminologies, offering a single point of entry to them. A RESTful API will provide uniform access to terminologies regardless of their degree of complexity. Stakeholder communities will be able to submit requests, such as custom terms for new terminologies or updates of existing terminologies, through a ticket-based help desk. The service will include tools for transforming textual and tabular documents into semantic formats, a linked data interface, and terminology integrity checks and validation. It will also offer semantic terminology subscription and notification: the service will recognise new terminologies that match a user's specifications, trigger defined processing of the terminology if requested, and inform subscribers by email about new terminologies or recent changes.
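As an illustration of the subscription and notification idea, the following sketch matches a terminology event against user subscriptions by keyword overlap. The data model (`Subscription`, `TerminologyEvent`) and the matching rule are assumptions made for this example; the source does not specify how matching is performed, and actual email delivery is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription: a user's address and topics of interest."""
    email: str
    keywords: frozenset


@dataclass(frozen=True)
class TerminologyEvent:
    """Hypothetical event raised when a terminology is added or changed."""
    terminology: str
    kind: str  # "new" or "updated"
    keywords: frozenset


def recipients_for(event, subscriptions):
    """Return sorted addresses whose keywords overlap the event's keywords."""
    return sorted(
        {s.email for s in subscriptions if s.keywords & event.keywords}
    )


# Made-up subscribers and event for illustration only.
subscriptions = [
    Subscription("a@example.org", frozenset({"fluid dynamics"})),
    Subscription("b@example.org", frozenset({"materials"})),
]
event = TerminologyEvent(
    terminology="Example CFD vocabulary",
    kind="new",
    keywords=frozenset({"fluid dynamics", "simulation"}),
)

to_notify = recipients_for(event, subscriptions)
```

A real service would likely match on richer semantic criteria than plain keywords; the set intersection here simply stands in for "matching terminologies according to the specifications of a user".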
Task S-3-3 Metadata Hub
We will provide a repository for publishing the (full) metadata sets that describe actual research data according to the application profiles and ontologies. This Metadata Hub will enable highly specific queries that can be used to access research data stored in repositories that do not support the full scope of the supplied metadata. The use of DOIs enables linking between all components of the research process, including experiments, raw data, software, and subject-specific metadata sets, as well as the tracking of usage and citations. This task will also enable applications that require analysis of metadata, such as data-level metrics and provenance tracking based on published metadata sets, and the extraction of statistics on term frequencies that feed the smart interface of the standard generator for application profiles. To demonstrate the common core and compatibility of the different Metadata Hub implementations, a common frontend with REST interfaces, called “Turntable”, will be implemented. This will demonstrate that the different implementations not only share concepts but can actually serve common use cases.
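The extraction of term-frequency statistics from published metadata sets could look roughly like the sketch below. The records, property names (`dcterms:title`, `ex:inletTemperature`, `ex:meshResolution`), and the `term_frequencies` helper are invented for illustration; a real Metadata Hub would work on its own storage and query layer.

```python
from collections import Counter


def term_frequencies(metadata_sets):
    """Count in how many metadata sets each term (property name) is used."""
    counts = Counter()
    for record in metadata_sets:
        # set(record) yields the record's keys, so each term is counted
        # once per metadata set and the values are ignored.
        counts.update(set(record))
    return counts


# Made-up metadata sets for illustration only.
records = [
    {"dcterms:title": "Run 1", "ex:inletTemperature": 293.15},
    {"dcterms:title": "Run 2", "ex:meshResolution": "fine"},
    {"dcterms:title": "Run 3", "ex:inletTemperature": 300.0},
]

frequencies = term_frequencies(records)
```

Counts of this kind are the sort of usage statistic that a term suggestion interface could use to rank candidate terms.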
In Task S-3-2, we are currently working on a first version of the Terminology Service based on the open-source Ontology Lookup Service (OLS) from EBI, which will also be the central entry point for the Terminology Service. This first version is expected to be open for public testing in May 2021.
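Since the first version builds on OLS, a client could probably address it in the same way as other OLS instances. The sketch below only constructs a search URL and makes no network request. The host `terminology.example.org` is a placeholder, because the service's address is not given here, and the `/api/search` path with the `q`, `ontology`, and `rows` parameters follows the OLS search endpoint as commonly documented; it should be checked against the deployed version.

```python
from urllib.parse import urlencode

# Placeholder host (assumption): the actual service address is not stated.
BASE_URL = "https://terminology.example.org"


def build_search_url(query, ontology=None, rows=10, base_url=BASE_URL):
    """Build an OLS-style search URL; does not send any request."""
    params = {"q": query, "rows": rows}
    if ontology:
        # Restrict the search to a single ontology identifier.
        params["ontology"] = ontology
    return f"{base_url}/api/search?{urlencode(params)}"


# "example" is a made-up ontology identifier.
url = build_search_url("heat exchanger", ontology="example")
```

Keeping URL construction separate from the HTTP call makes it straightforward to point the same client at a different OLS-based instance by changing `base_url`.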