Workflows in AiiDA: Engineering a high-throughput, event-based engine for robust and modular computational workflows

by Martin Uhrin et al.

Over the last two decades, the field of computational science has seen a dramatic shift towards incorporating high-throughput computation and big-data analysis as fundamental pillars of the scientific discovery process. This has necessitated the development of tools and techniques to deal with the generation, storage and processing of large amounts of data. In this work we present an in-depth look at the workflow engine powering AiiDA, a widely adopted, highly flexible and database-backed informatics infrastructure with an emphasis on data reproducibility. We detail many of the design choices, which were informed by several key goals: the ability to scale from individual laptops up to high-performance supercomputers, to manage jobs with runtimes spanning from fractions of a second to weeks, and to run thousands of jobs concurrently, all while maximising robustness. In short, AiiDA aims to be a Swiss army knife for high-throughput computational science. Alongside the architecture, we outline important API design choices made to give workflow writers a great deal of liberty whilst guiding them towards writing robust and modular workflows, ultimately enabling them to encode their scientific knowledge to the benefit of the wider scientific community.
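The abstract's central idea, an event-based engine that interleaves many concurrently running jobs on a single loop instead of polling each one, can be sketched with Python's standard-library `asyncio`. This is a conceptual illustration only, not AiiDA's actual API; the names `run_job` and `engine` are hypothetical:

```python
import asyncio

async def run_job(name: str, runtime: float, results: dict) -> None:
    """Simulate one job; completion is signalled back to the event loop."""
    # Stand-in for awaiting an external process (seconds to weeks in AiiDA).
    await asyncio.sleep(runtime)
    results[name] = "finished"

async def engine(jobs: dict) -> dict:
    """Schedule all jobs at once; the loop wakes only on job events."""
    results: dict = {}
    await asyncio.gather(*(run_job(n, t, results) for n, t in jobs.items()))
    return results

if __name__ == "__main__":
    # Jobs of very different durations progress concurrently.
    print(asyncio.run(engine({"short": 0.01, "long": 0.02})))
```

Because the loop only reacts to events, the per-job overhead stays small, which is what lets such a design scale from a laptop to thousands of concurrent jobs on a supercomputer.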




