SciMesh – a novel method to share, reuse, and archive research results

Virtually all scientific output is communicated through text publications, which are suitable only for human readers and preclude searches beyond simple text matching. By contrast, SciMesh disseminates results as machine-actionable, comprehensive knowledge graphs.

Sharing scientific data is a challenge. Currently, the best available way is often to zip the necessary files and send them to a colleague with a long explanation of what they mean. Similar problems arise in managing data: Most often, when a PhD student leaves, they effectively take their data with them – simply because no one else has the skills or knowledge to re-use it.

These issues strongly depend on the scientific discipline. Research areas that are used to working together in large collaborations, e.g. high-energy physics, astronomy, or earth sciences, have established good interoperability standards which enable them to re-use old data and data from other scientists rather seamlessly. In many other fields though, a data publication is more often than not just a dump of data files, usually with little or no documentation, which is a major barrier to reuse.

Therefore, we pursue two goals in NFDI4Ing: (a) the data should be as machine-actionable as possible, and (b) as little as possible of the data should need to be explained in human-readable text.

SciMesh

In order to get closer to this goal, we are working on SciMesh, an early version of which we presented last year. SciMesh is a schema for knowledge graphs that can map workflows in the empirical sciences. At its core, it segments scientific work into processes which are sequenced in a cause–effect chain.
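
To make the idea of a cause–effect chain more concrete, the following Python sketch models three processes and walks the chain backwards from a measurement to everything that led up to it. It is only an illustration: the class `Process`, the helper functions, and the predicate names `label` and `causedBy` are hypothetical and are not taken from the SciMesh schema itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Process:
    """One step of scientific work, e.g. a deposition or a measurement."""
    identifier: str
    label: str
    # Direct predecessors in the cause-effect chain
    causes: List["Process"] = field(default_factory=list)


def to_triples(process: Process,
               seen: Optional[Set[str]] = None) -> List[Tuple[str, str, str]]:
    """Flatten the chain into (subject, predicate, object) triples.

    The predicate names are illustrative only, not SciMesh terms.
    """
    seen = set() if seen is None else seen
    if process.identifier in seen:
        return []
    seen.add(process.identifier)
    triples = [(process.identifier, "label", process.label)]
    for cause in process.causes:
        triples.append((process.identifier, "causedBy", cause.identifier))
        triples.extend(to_triples(cause, seen))
    return triples


def history(process: Process) -> List[str]:
    """Return the labels of all processes leading up to `process`, oldest first."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(current: Process) -> None:
        if current.identifier in seen:
            return
        seen.add(current.identifier)
        for cause in current.causes:
            visit(cause)  # causes come before their effects
        ordered.append(current.label)

    visit(process)
    return ordered


cleaning = Process("p1", "substrate cleaning")
deposition = Process("p2", "thin-film deposition", causes=[cleaning])
measurement = Process("p3", "conductivity measurement", causes=[deposition])

for triple in to_triples(measurement):
    print(triple)
print(history(measurement))
```

Because every process only points to its direct causes, the full history of a result can be reconstructed by a machine without any accompanying explanation in prose.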

SciMesh has grown rapidly since our last update one year ago. We can now map sub- and superprocesses as well as sample splits. Moreover, there are now recommendations for electronic laboratory notebooks (ELNs) on how to visualise SciMesh graphs, so that workflows are displayed consistently (though not identically) across all ELNs.

Our short-term milestone is to use SciMesh as a lingua franca of ELNs. Once export and import facilities have been added to an ELN software package, it can share its data not only with other instances of itself but also with instances of other software products, as long as SciMesh is implemented there, too.
The SciMesh prototype in the ELN framework JuliaBase is by and large feature-complete, and we have shifted our focus to another ELN package called Kadi4Mat.

SM4ROC

Another project we are working on in the NFDI4Ing task area Caden is SM4ROC (pronounced “smarok”). SM4ROC is a completely new development. It combines RO-Crates with SciMesh graphs. An RO-Crate contains the raw (bulk) data files of a certain piece of research (e.g. a sample) and a simple knowledge graph to tag that data. In zipped form, it can be shared with others. We created a recommendation for how to use SciMesh graphs in RO-Crates in order to have much richer metadata while adhering to the RO-Crate standard.
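
As a rough illustration of what such a package looks like, the Python sketch below zips one raw data file together with a graph file and a minimal `ro-crate-metadata.json`. The metadata descriptor and the root dataset follow the general RO-Crate 1.1 layout; everything else is an assumption made for this example. In particular, the file name `scimesh.ttl`, the sample name, and the way the graph is listed as a plain file are hypothetical and do not reproduce the SM4ROC recommendation. The sketch also omits further root properties such as description, publication date, and licence that a complete crate would carry.

```python
import io
import json
import zipfile


def crate_metadata(name, data_files, graph_file):
    """Build a minimal RO-Crate 1.1 metadata document as a Python dict."""
    parts = [{"@id": path} for path in [*data_files, graph_file]]
    graph = [
        {   # Metadata descriptor that identifies the archive as an RO-Crate
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root dataset listing everything the crate contains
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "hasPart": parts,
        },
    ]
    for path in data_files:
        graph.append({"@id": path, "@type": "File"})
    graph.append({
        "@id": graph_file,
        "@type": "File",
        "encodingFormat": "text/turtle",
        # Hypothetical placement; the actual recommendation may differ
        "description": "Detailed process graph for the data in this crate",
    })
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }


def write_crate(name, files, graph_file, graph_text):
    """Zip the data files, the graph, and the metadata; return the bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            archive.writestr(path, content)
        archive.writestr(graph_file, graph_text)
        metadata = crate_metadata(name, list(files), graph_file)
        archive.writestr("ro-crate-metadata.json",
                         json.dumps(metadata, indent=2))
    return buffer.getvalue()


crate = write_crate(
    "Example sample",
    {"raw/iv_curve.csv": "voltage,current\n0.0,0.0\n"},
    "scimesh.ttl",
    "# Turtle serialisation of the process graph would go here\n",
)
with zipfile.ZipFile(io.BytesIO(crate)) as archive:
    print(archive.namelist())
```

The resulting archive is self-describing: a recipient can read `ro-crate-metadata.json` first and from there find both the bulk data and the richer graph.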

This is orthogonal to the above approach of real-time enrichment of ELN data. RO-Crates do not allow real-time enrichment, but in return they contain the bulk data, too. This opens up new possibilities for data sharing. Additionally, they may come in handy as an option if you need to migrate to another ELN software. Currently, even if your ELN offers an exit strategy, using it is rarely feasible in practice. SM4ROC may be an answer to that.

The prototype implementation of SM4ROC in JuliaBase is almost finished. We are adding an overview XHTML page to the RO-Crate and are investigating whether the crate can even be made a valid EPUB file for easy inspection with any ebook reader.

Call for participation

Please get in touch with us if you are interested in developing tools and specs for sharing scientific output. Rest assured that we will be grateful for any input. Nothing is carved in stone yet! Just open an issue on GitHub or send an e-mail to any of us.

Corresponding: Torsten Bronger, FZ Jülich ZB

Michael Flemming, FZ Jülich ZB
Hartmut Schlenz, FZ Jülich IEK-1
Michael Selzer, Karlsruhe Institute of Technology
Manideep Jayavarapu, Karlsruhe Institute of Technology
