Distributed non-negative RESCAL with Automatic Model Selection for Exascale Data

by Manish Bhattarai, et al.

With the boom in the development of computer hardware and software, social media, IoT platforms, and communications, the volume of data produced around the world has grown exponentially. Among these data, relational datasets are growing in popularity because they provide unique insights into the evolution of communities and their interactions. Relational datasets are naturally non-negative, sparse, and extra-large. Relational data usually consist of triples, (subject, relation, object), and are represented as graphs or multigraphs, called knowledge graphs, which need to be embedded into a low-dimensional dense vector space. Among the various embedding models, RESCAL learns from relational data to extract the posterior distributions over the latent variables and to predict missing relations. However, RESCAL is computationally demanding and requires a fast, distributed implementation to analyze extra-large real-world datasets. Here we introduce pyDRESCALk, a distributed non-negative RESCAL algorithm for heterogeneous CPU/GPU architectures with automatic selection of the number of latent communities (model selection). We demonstrate the correctness of pyDRESCALk on real-world and large synthetic tensors, and its efficacy through near-linear scaling that agrees with the theoretical complexity. Finally, pyDRESCALk determines the number of latent communities in an 11-terabyte dense and a 9-exabyte sparse synthetic tensor.
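To make the factorization concrete: RESCAL approximates each relation slice X_r of the tensor as A R_r A^T, where A holds the latent community memberships of the entities and R_r holds the interactions between communities for relation r. The sketch below is a minimal single-node NumPy illustration that keeps the factors non-negative using multiplicative updates. The update rule, the function names `nn_rescal` and `rel_error`, and all parameters are assumptions chosen for illustration; this is not the pyDRESCALk API, and it includes neither the distributed execution nor the automatic model selection described in the abstract.

```python
import numpy as np

def nn_rescal(X, k, n_iter=200, eps=1e-9, seed=0):
    """Illustrative non-negative RESCAL via multiplicative updates.

    X : array of shape (m, n, n), one n-by-n slice per relation.
    k : assumed number of latent communities (fixed here, not selected).
    Returns A with shape (n, k) and R with shape (m, k, k).
    """
    rng = np.random.default_rng(seed)
    m, n, _ = X.shape
    A = rng.random((n, k))       # non-negative random initialization
    R = rng.random((m, k, k))
    for _ in range(n_iter):
        # Update A using all relation slices.
        AtA = A.T @ A
        num = np.zeros_like(A)
        den = np.zeros_like(A)
        for r in range(m):
            num += X[r] @ A @ R[r].T + X[r].T @ A @ R[r]
            den += A @ (R[r] @ AtA @ R[r].T + R[r].T @ AtA @ R[r])
        A *= num / (den + eps)   # eps guards against division by zero
        # Update each R_r with the new A.
        AtA = A.T @ A
        for r in range(m):
            R[r] *= (A.T @ X[r] @ A) / (AtA @ R[r] @ AtA + eps)
    return A, R

def rel_error(X, A, R):
    """Relative Frobenius error of the reconstruction A R_r A^T."""
    approx = np.einsum('ik,rkl,jl->rij', A, R, A)
    return np.linalg.norm(X - approx) / np.linalg.norm(X)

# Small synthetic example: 4 relations over 20 entities, 3 communities.
rng = np.random.default_rng(1)
A_true = rng.random((20, 3))
R_true = rng.random((4, 3, 3))
X = np.einsum('ik,rkl,jl->rij', A_true, R_true, A_true)

A, R = nn_rescal(X, k=3)
print("relative error:", rel_error(X, A, R))
```

Because the updates only multiply by non-negative ratios, factors that start non-negative stay non-negative. In this sketch k is supplied by hand, whereas the abstract states that pyDRESCALk selects it automatically.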




