DataDesc

Detailed description of the service 

Service’s Capabilities:

In addition to capturing general information relevant to the scientific context, DataDesc focuses on the detailed documentation of software interfaces. Its programming-language-agnostic schema makes it possible to describe each function of an interface, together with its input and output parameters, individually. Capturing this information in a structured form enables automated processing and makes interfaces easier to find and compare. Represented as machine-actionable metadata, it allows both humans and computers to discover and understand the capabilities of a piece of software, interact with it, and integrate it with other programs and data without having to consult the source code or further documentation.

Added Value for the User:

Software metadata schemas often concentrate on general information and neglect the description of interfaces, which causes problems for downstream users when they apply and integrate the software. DataDesc is designed to capture precisely this information.

Using several software publication platforms in parallel, and exploiting their respective strengths to increase the impact and transparency of software, often means that metadata has to be collected redundantly and adapted to heterogeneous formats and processes. DataDesc offers a machine-processable, programming-language-independent exchange format and automated publication pipelines, so that metadata needs to be collected only once and the documentation effort is reduced.

Service’s Suitability:

DataDesc is ideal for researchers, academics, and students who create and use research software for their computational analyses and want to improve the interoperability, reusability, and findability of their work.

Typical Use Cases:

Common uses include annotating the data models implemented in research software interfaces directly within the code. Together with general metadata about the software, this information is converted into sustainable and reusable DataDesc documents. Once compiled, the information can be uploaded automatically to various software publication platforms for registration, documentation, and dissemination.
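
The idea of annotating an interface in the code and exporting it as a machine-readable document can be illustrated with a minimal Python sketch. It does not use DataDesc's actual parser or schema: the example function `convert_temperature`, the helper `describe_interface`, and the field names of the resulting dictionary are all hypothetical. The sketch relies only on the Python standard library (`typing.Annotated`, `typing.get_type_hints`, `inspect`, `json`) and shows how per-parameter information could be read from annotated code and serialised.

```python
import inspect
import json
from typing import Annotated, get_type_hints


# Hypothetical research-software function whose interface is annotated in the code.
def convert_temperature(
    value: Annotated[float, "Temperature in degrees Celsius"],
) -> Annotated[float, "Temperature in kelvin"]:
    """Convert a temperature from degrees Celsius to kelvin."""
    return value + 273.15


def describe_interface(func):
    """Collect name, summary, inputs and output of `func` as a plain dictionary.

    The field names are illustrative assumptions, not the DataDesc schema.
    """
    hints = get_type_hints(func, include_extras=True)
    signature = inspect.signature(func)

    def describe(hint):
        # Annotated[T, "text"] exposes T as __origin__ and the extras as __metadata__.
        base = getattr(hint, "__origin__", hint)
        notes = getattr(hint, "__metadata__", ())
        return {
            "type": getattr(base, "__name__", str(base)),
            "description": notes[0] if notes else None,
        }

    return {
        "name": func.__name__,
        "summary": inspect.getdoc(func),
        "inputs": {
            name: describe(hints[name])
            for name in signature.parameters
            if name in hints
        },
        "output": describe(hints["return"]) if "return" in hints else None,
    }


if __name__ == "__main__":
    # Serialise the collected interface description as JSON.
    print(json.dumps(describe_interface(convert_temperature), indent=2))
```

Because the description is a plain dictionary, it can be written to JSON once and then mapped to the formats of different platforms, which is the "collect once" principle described above in miniature.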

Strengths of the Service:

  • The DataDesc framework focuses on research software interfaces and their data models

  • A metadata schema maps input and output content, formats, value ranges and structures

  • Tools enable the collection, exchange and publication of machine-actionable metadata

  • DataDesc reduces annotation efforts and promotes software reuse and integration

Weaknesses of the Service:

  • To date, only a parser for Python-based research software and interfaces to five software platforms are available.

  • DataDesc is offered as a GitHub download but not yet as an online service.

Terms of use & restrictions

DataDesc is an open-source software project available to anyone on GitHub. There are no costs associated with using DataDesc and no registration is required.

Contact 

Patrick Kuckertz, p.kuckertz@fz-juelich.de

References

Publications that reference (or report on using) the service:

Kuckertz, P., Göpfert, J., Karras, O., Neuroth, D., Schönau, J., Pueblas, R., Ferenz, S., Engel, F., Pflugradt, N., Weinand, J. W., Nieße, A., Auer, S. & Stolten, D. (2023). A Metadata-Based Ecosystem to Improve the FAIRness of Research Software. arXiv preprint arXiv:2306.10620.

Tags

NFDI4ING services may be relevant to different users according to varying requirements. To support filtering or sorting, we added a tag system outlining which archetype, phase of the data lifecycle, or degree of maturity a service corresponds to.

The tags correspond to:
The Archetypes: Services relevant to Alex – Bespoke Experiments, Betty – Research Software Engineering, Caden – Provenance Tracking, Doris – High Performance Computing, Ellen – Complex Systems, Fiona – Data Re-Use and Enrichment

The data lifecycle: Services related to Informing & Planning, Organising & Processing, Describing & Documenting, Storing & Computing, Finding & Re-Using, Learning & Teaching

The maturity of the service: Services sorted according to their maturity and the status of their integration into the larger NFDI service landscape. For this we use the Integration Readiness Level (IRL), ranging from IRL0 (no specifications, strictly internal use) up to IRL4 (fully integrated in the German research data landscape and the EOSC).