Declarative Recursive Computation on an RDBMS, or, Why You Should Use a Database For Distributed Machine Learning

by Dimitrije Jankov, et al.

A number of popular systems, most notably Google's TensorFlow, have been implemented from the ground up to support machine learning tasks. We consider how to make a very small set of changes to a modern relational database management system (RDBMS) to make it suitable for distributed learning computations. The changes include better support for recursion, and for the optimization and execution of very large compute plans. We also show that there are key advantages to using an RDBMS as a machine learning platform. In particular, learning on top of a database management system allows for trivial scaling to large data sets and, especially, to large models, where different computational units operate on different parts of a model that may be too large to fit into RAM.
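
To make the idea of recursive computation inside a database concrete, here is a minimal sketch that is not the system or syntax described in the paper. It assumes SQLite's standard `WITH RECURSIVE` common table expression as a single-node stand-in, and it runs gradient descent for a one-parameter least-squares model `y = w * x`. The table `data`, the helper CTEs `stats` and `gd`, and the function `fit_slope` are all hypothetical names chosen for illustration.

```python
import sqlite3


def fit_slope(points, learning_rate=0.05, steps=50):
    """Fit y = w * x by gradient descent expressed as a recursive SQL query.

    Illustrative only: a plain SQLite recursive CTE, not the paper's
    extended RDBMS. Table and column names are assumptions.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (x REAL, y REAL)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", points)

    query = """
    WITH RECURSIVE
      -- Sufficient statistics, computed once outside the recursion.
      stats(sxx, sxy) AS (
        SELECT AVG(x * x), AVG(x * y) FROM data
      ),
      -- Each recursive step derives the next weight from the previous one.
      -- The gradient of the mean squared error is 2 * (w * sxx - sxy).
      gd(step, w) AS (
        SELECT 0, 0.0
        UNION ALL
        SELECT step + 1, w - :lr * 2.0 * (w * sxx - sxy)
        FROM gd, stats
        WHERE step < :steps
      )
    SELECT w FROM gd ORDER BY step DESC LIMIT 1
    """
    (w,) = conn.execute(query, {"lr": learning_rate, "steps": steps}).fetchone()
    conn.close()
    return w


if __name__ == "__main__":
    # Points lie exactly on y = 2x, so the fitted slope should approach 2.
    print(fit_slope([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]))
```

The iteration is stated declaratively as a recurrence over `step`, and the database decides how to execute it. This toy version keeps the whole model in a single scalar; the abstract's setting of models too large for RAM would instead store the parameters as relations spread across machines.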


