Fast Counting in Machine Learning Applications

by Subhadeep Karan, et al.
University at Buffalo

We propose scalable methods to execute counting queries in machine learning applications. To achieve memory and computational efficiency, we abstract counting queries and their context such that the counts can be aggregated as a stream. We demonstrate the performance and scalability of the resulting approach on random queries and through extensive experiments with Bayesian network learning and association rule mining. Our methods significantly outperform the commonly used ADtrees and hash tables, and are practical alternatives for processing large-scale data.
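To make the term "counting query" concrete, the following minimal sketch counts how often each joint configuration of a chosen set of variables occurs in a dataset, using a single pass over the rows. It illustrates the hash table baseline the abstract mentions, not the authors' proposed method, which the abstract does not describe in detail. The function name `stream_counts`, the toy data, and the column indices are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Sequence


def stream_counts(rows: Iterable[Sequence[int]], variables: Sequence[int]) -> Counter:
    """Count joint configurations of the selected columns in one pass.

    Hypothetical helper: a plain hash-table baseline, not the paper's method.
    """
    counts: Counter = Counter()
    for row in rows:
        # Project the row onto the queried variables and tally that configuration.
        key = tuple(row[v] for v in variables)
        counts[key] += 1
    return counts


# Toy dataset (assumed): three binary variables, four observations.
data = [
    (0, 1, 1),
    (0, 1, 0),
    (1, 1, 1),
    (0, 1, 1),
]

# Query: how often does each (column 0, column 2) configuration occur?
counts = stream_counts(data, variables=(0, 2))
print(counts[(0, 1)])  # rows with column 0 == 0 and column 2 == 1
```

Counts of this kind are the sufficient statistics that score-based Bayesian network learning and association rule mining request repeatedly, which is why the cost of each query matters at scale.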


Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

This paper introduces new algorithms and data structures for quick count...

A short note on the counting complexity of conjunctive queries

This note closes a minor gap in the literature on the counting complexit...

F-IVM: Learning over Fast-Evolving Relational Data

F-IVM is a system for real-time analytics such as machine learning appli...

A Traveling Salesman Learns Bayesian Networks

Structure learning of Bayesian networks is an important problem that ari...

Graphlet Decomposition: Framework, Algorithms, and Applications

From social science to biology, numerous applications often rely on grap...

Learning to Sample: Counting with Complex Queries

In this paper we present a suite of methods to efficiently estimate coun...

Data Engineering for HPC with Python

Data engineering is becoming an increasingly important part of scientifi...
