Distributed statistical inference with pyhf enabled through funcX

03/03/2021
by Matthew Feickert et al.

In High Energy Physics, facilities that provide High Performance Computing environments offer an opportunity to efficiently perform the statistical inference required for analysis of data from the Large Hadron Collider, but they can pose problems with orchestration and efficient scheduling. The compute architectures at these facilities do not easily support the Python compute model, and the configuration and scheduling of batch jobs for physics often requires expertise in multiple job scheduling services. The combination of the pure-Python libraries pyhf and funcX turns statistical inference with binned models, a common task in HEP analyses that would traditionally take multiple hours and bespoke scheduling, into an on-demand (fitting) "function as a service" that scales across workers and completes in just a few minutes, reducing the time to insight and inference. We demonstrate a scalable workflow that uses funcX to simultaneously fit 125 signal hypotheses from a published ATLAS search for new physics using pyhf, with a wall time of under 3 minutes. We additionally show performance comparisons for other physics analyses with openly published probability models and argue for a blueprint of fitting-as-a-service systems at HPC centers.
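The workflow described above works because each signal hypothesis is an independent fit that can be dispatched to its own worker. The following is a minimal, hypothetical sketch of that fan-out pattern only. It does not use pyhf or funcX: Python's standard `concurrent.futures.ThreadPoolExecutor` stands in for remote workers, and the binned pyhf likelihood is replaced by a single-bin counting experiment with a known background whose CLs value is computed exactly from Poisson probabilities. The function names `fit_hypothesis` and `scan_hypotheses` and the toy yields are illustrative assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one "fit": a single-bin counting experiment with a
# known background, evaluated with the exact (non-asymptotic) CLs method.


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(N <= k) for N ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(N = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(N = i) obtained from P(N = i - 1)
        total += term
    return total


def cls_counting(signal: float, background: float, observed: int) -> float:
    """CLs = CL_{s+b} / CL_b, using the observed count as the test statistic."""
    cl_sb = poisson_cdf(observed, signal + background)
    cl_b = poisson_cdf(observed, background)
    return cl_sb / cl_b


def fit_hypothesis(task):
    """One independent unit of work: evaluate a single signal hypothesis."""
    name, signal, background, observed = task
    cls = cls_counting(signal, background, observed)
    return name, cls, cls < 0.05  # excluded at 95% CL if CLs < 0.05


def scan_hypotheses(tasks, max_workers=4):
    """Fan the hypotheses out to a worker pool; results keep the input order."""
    # Local threads stand in for remote workers in this sketch; a real
    # deployment would submit each task to a remote execution service instead.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fit_hypothesis, tasks))


if __name__ == "__main__":
    grid = [(f"signal_{s}", float(s), 1.0, 1) for s in (1, 2, 5, 10)]
    for name, cls, excluded in scan_hypotheses(grid):
        print(f"{name}: CLs = {cls:.4f}, excluded = {excluded}")
```

Because no hypothesis depends on another, the scan time is bounded by the slowest single fit plus dispatch overhead rather than by the sum of all fits. This is the property that a function-as-a-service setup exploits. Only the submission mechanism would change when moving from local threads to remote workers.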
