SiMa: Effective and Efficient Data Silo Federation Using Graph Neural Networks

06/25/2022
by Christos Koutras, et al.

Virtually every sizable organization nowadays is building some form of data lake. In theory, every department or team in the organization would enrich its datasets with metadata and store them in a central data lake. Those datasets could then be combined in different ways to produce added value for the organization. In practice, though, the situation is vastly different: each department has its own privacy policies, data release procedures, and goals. As a result, each department maintains its own data lake, leading to data silos. For such data silos to be of any use, they need to be integrated. This paper presents SiMa, a method for federating data silos that consistently finds more correct relationships than state-of-the-art matching methods, while minimizing wrong predictions and requiring 20x to 1000x less time to execute. SiMa leverages Graph Neural Networks (GNNs) to learn from the existing column relationships and automated data profiles found in data silos. The trained GNN is then used to perform link prediction and find new column relationships across data silos. Most importantly, SiMa can be trained incrementally on the column relationships within each silo individually, and does not require consolidating all datasets into one place.
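To make the idea of link prediction over column profiles concrete, here is a minimal sketch. It is not SiMa's architecture: it replaces the trained GNN with a single untrained round of mean neighbor aggregation and scores candidate pairs by cosine similarity. The column names, the three profile features, and the functions `aggregate`, `score`, and `cross_silo_candidates` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical column profiles: [fraction numeric, scaled mean length, distinct ratio].
profiles = {
    "silo_a.customer_id": [1.0, 0.6, 1.0],
    "silo_a.client_no":   [1.0, 0.6, 0.9],
    "silo_a.city":        [0.0, 0.8, 0.1],
    "silo_b.cust_id":     [1.0, 0.6, 1.0],
    "silo_b.town":        [0.0, 0.7, 0.1],
}

# Known column relationships inside a single silo (assumed, illustrative).
edges = [("silo_a.customer_id", "silo_a.client_no")]


def neighbors(node):
    """Columns directly related to `node` through a known relationship."""
    return [b if a == node else a for a, b in edges if node in (a, b)]


def aggregate(node):
    """One round of mean aggregation over the node and its neighbors."""
    group = [node] + neighbors(node)
    dim = len(profiles[node])
    return [sum(profiles[n][i] for n in group) / len(group) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Node embeddings after the single aggregation round.
embeddings = {name: aggregate(name) for name in profiles}


def score(a, b):
    """Link score for a candidate column pair."""
    return cosine(embeddings[a], embeddings[b])


def cross_silo_candidates():
    """All column pairs spanning two silos, highest score first."""
    cols = sorted(profiles)
    pairs = [
        (a, b)
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if a.split(".")[0] != b.split(".")[0]
    ]
    return sorted(pairs, key=lambda p: score(*p), reverse=True)


if __name__ == "__main__":
    for a, b in cross_silo_candidates():
        print(f"{a} <-> {b}: {score(a, b):.3f}")
```

In this toy setup the identifier-like columns and the place-name columns score highest across silos, while mixed pairs score much lower. A learned model would replace the fixed aggregation and similarity with trained parameters.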


research
02/24/2021

Benchmarking Graph Neural Networks on Link Prediction

In this paper, we benchmark several existing graph neural network (GNN) ...
research
05/29/2019

Graph Learning Network: A Structure Learning Algorithm

Recently, graph neural networks (GNNs) has proved to be suitable in task...
research
05/07/2019

Are Graph Neural Networks Miscalibrated?

Graph Neural Networks (GNNs) have proven to be successful in many classi...
research
02/25/2021

Efficient and Interpretable Robot Manipulation with Graph Neural Networks

Many manipulation tasks can be naturally cast as a sequence of spatial r...
research
08/02/2023

VertexSerum: Poisoning Graph Neural Networks for Link Inference

Graph neural networks (GNNs) have brought superb performance to various ...
research
10/17/2020

Automated Metadata Harmonization Using Entity Resolution Contextual Embedding

ML Data Curation process typically consist of heterogeneous federate...
research
01/04/2016

Learning relationships between data obtained independently

The aim of this paper is to provide a new method for learning the relati...
