Enabling Access to HPC Research Data via a Compute Cloud

Do you want to share High-Performance Computing (HPC) data directly from the Leibniz Supercomputing Centre (LRZ) with a colleague who doesn’t have access to German HPC facilities? Do you want to analyze and visualize other researchers’ (HPC) data without downloading or transferring it? Then please try out the new NFDI4Ing service MARGE, the Multi-Access Research Gateway for HPC Experts.

In research data management for High-Performance Computing (HPC), users face a series of complex challenges. Typically, the size of the generated data exceeds the storage capacity of standard PCs. HPC data are often used for a single application or publication and then archived in personal accounts at the corresponding computing center. Opportunities to make existing data available to other researchers in compliance with the FAIR principles (Findable, Accessible, Interoperable, Reusable) are lacking.

HPC data are therefore rarely mobile, and even if they are published in data repositories, reuse is complicated. Not only does the repository have to handle a lengthy download reliably, the recipient also needs appropriate HPC hardware to store and then evaluate the large amounts of data. In our view, simply making such data available in classic repositories is thus of little use. To overcome these hurdles and make HPC research data available in an interoperable and reusable manner, the DORIS archetype strives for a paradigm shift: potential users should be able to access and view HPC data directly on site at the computing center in order to analyze and reuse them. This is exactly where the MARGE service shows its advantages:

  • External users can access both “hot” and “cold” research data without having personal access to HPC resources.
  • External users can interactively browse and visualize the data and then select specific parts, for example to download only the relevant excerpts from the otherwise immobile data set.
  • External users can process the data independently using methods they have developed, without having to download the data or disclose their postprocessing routines.

An exclusive cloud server system has been set up at the LRZ, providing a platform that puts into practice the accessibility and reusability demanded by the FAIR principles. This infrastructure enables direct access to vast and immobile data sets through the cloud, facilitating their reuse. The cloud system provides several usage options, including:

  • Exclusive Usage Rights: Granting exclusive usage rights within a professionally managed compute cloud environment.
  • Virtual Machine Deployment: Allowing the deployment of virtual machines on the cloud for on-site data evaluation.
  • Fast Connection to LRZ Data Storage Systems: Facilitating swift connectivity to existing LRZ data storage systems.

The research data to be shared are located on the LRZ’s Data Science Storage (DSS) and can be visualized by a ParaView-based service on the cloud instance. This enables members of a research group with LRZ access to share their results with external researchers and third-party users. The latter can display the results interactively, where possible directly in the browser, without having to download the entire data set. In particular, this simplifies an initial, cursory evaluation of the data and thus promotes scientific collaboration.

Call for contributions

We cordially invite all engineers who produce large data sets on LRZ systems and want to share them with external researchers to test our cloud system. We also welcome third-party users who are interested in large data sets created on LRZ systems. Even if you use another (tier 0 / tier 1) HPC center to create your data, we are happy to help you check transferability or determine feasibility on alternative systems. Please get in touch with us via the links or email contacts provided below.

(Preliminary) URL: http://138.246.238.140/
Documentation (LRZ): https://doku.lrz.de/attended-cloud-housing-10745950.html

Contact
DORIS newsletter:
https://lists.tu-darmstadt.de/mailman/listinfo/nfdi4ing_taskarea_doris

General inquiries: info-doris@nfdi4ing.de

Website: https://nfdi4ing.de/archetypes/doris/

Spokesperson: Prof. Dr.-Ing. Christian Stemmer | christian.stemmer@tum.de

DORIS participants:
TU Munich (TUM)
High-Performance Computing Center Stuttgart (HLRS)
Leibniz Supercomputing Centre (LRZ)
RWTH Aachen University