Desiderata for next generation of ML model serving

10/26/2022
by Sherif Akoush, et al.

Inference is a significant part of ML software infrastructure. Despite the variety of inference frameworks available, the field as a whole can be considered to be in its early days. This position paper puts forth a range of important qualities that the next generation of inference platforms should aim for. We present our rationale for the importance of each quality and discuss ways to achieve it in practice. We propose to focus on data-centricity as the overarching design pattern that enables smarter ML system deployment and operation at scale.

research
12/17/2017

TensorFlow-Serving: Flexible, High-Performance ML Serving

We describe TensorFlow-Serving, a system to serve machine learning model...
research
11/06/2019

MLPerf Inference Benchmark

Machine-learning (ML) hardware and software system demand is burgeoning....
research
06/03/2019

Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference

Machine learning (ML) has become increasingly important and performance-...
research
10/29/2022

MinUn: Accurate ML Inference on Microcontrollers

Running machine learning inference on tiny devices, known as TinyML, is ...
research
06/21/2023

Subgraph Stationary Hardware-Software Inference Co-Design

A growing number of applications depend on Machine Learning (ML) functio...
research
06/06/2021

ModelCI-e: Enabling Continual Learning in Deep Learning Serving Systems

MLOps is about taking experimental ML models to production, i.e., servin...
research
02/17/2020

Simulating Performance of ML Systems with Offline Profiling

We advocate that simulation based on offline profiling is a promising ap...
