The Vera C. Rubin Observatory Data Butler and Pipeline Execution System

06/29/2022
by Tim Jenness, et al.

The Rubin Observatory's Data Butler abstracts data file locations and file formats away from the scientists writing the science pipeline algorithms. The Butler works in conjunction with the workflow graph builder to construct pipelines from the algorithmic tasks. These pipelines can be executed at scale using object stores and multi-node clusters, or on a laptop using a local file system. The Butler and pipeline system are now in daily use during Rubin construction and early operations.
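The core idea is that pipeline code asks for a dataset by its type and data ID, and the Butler resolves where and in what format it is stored. The following is a minimal toy sketch of that pattern; it does not use the real `lsst.daf.butler` API, and all class and parameter names here are hypothetical illustrations of the abstraction only.

```python
# Hypothetical sketch (NOT the real lsst.daf.butler API): pipeline code
# requests datasets by dataset type and data ID; storage location and
# file format live behind a pluggable backend.
import json
from pathlib import Path
from tempfile import TemporaryDirectory


class FileSystemStore:
    """Toy backend that keeps datasets as JSON files on local disk.

    An object-store backend could implement the same write/read
    interface without any change to pipeline code.
    """

    def __init__(self, root):
        self.root = Path(root)

    def write(self, key, value):
        (self.root / f"{key}.json").write_text(json.dumps(value))

    def read(self, key):
        return json.loads((self.root / f"{key}.json").read_text())


class ToyButler:
    """Maps (dataset_type, data_id) to an entry in a storage backend."""

    def __init__(self, store):
        self.store = store

    @staticmethod
    def _key(dataset_type, data_id):
        # Build a deterministic key from the dataset type and the
        # key/value pairs of the data ID dictionary.
        parts = "_".join(f"{k}-{v}" for k, v in sorted(data_id.items()))
        return f"{dataset_type}_{parts}"

    def put(self, value, dataset_type, data_id):
        self.store.write(self._key(dataset_type, data_id), value)

    def get(self, dataset_type, data_id):
        return self.store.read(self._key(dataset_type, data_id))


with TemporaryDirectory() as tmp:
    butler = ToyButler(FileSystemStore(tmp))
    butler.put({"mean_background": 12.5}, "calexp_summary",
               {"visit": 903334, "detector": 10})
    result = butler.get("calexp_summary",
                        {"visit": 903334, "detector": 10})
    print(result["mean_background"])  # -> 12.5
```

Because the algorithmic code only ever sees `get` and `put` keyed by dataset type and data ID, the same pipeline can run unchanged against a laptop file system or a multi-node cluster backed by an object store, as the abstract describes.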


