Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining

by Xuhao Chen, et al.

Graph pattern mining (GPM) is used in diverse application areas including social network analysis, bioinformatics, and chemical engineering. Existing GPM frameworks either provide high-level interfaces for productivity at the cost of expressiveness or provide low-level interfaces that can express a wide variety of GPM algorithms at the cost of increased programming complexity. Moreover, existing systems lack the flexibility to explore combinations of optimizations to achieve performance competitive with hand-optimized applications. We present Sandslash, an in-memory GPM framework that uses a novel programming interface to support productive, expressive, and efficient GPM on large graphs. Sandslash provides a high-level API that needs only a specification of the GPM problem; from that specification it implements fast subgraph enumeration, provides efficient data structures, and applies high-level optimizations automatically. To achieve performance competitive with expert-optimized implementations, Sandslash also provides a low-level API that allows users to express algorithm-specific optimizations. This enables Sandslash to support both high productivity and high efficiency without losing expressiveness. We evaluate Sandslash on shared-memory machines using five GPM applications and a wide range of large real-world graphs. Experimental results demonstrate that applications written using the Sandslash high-level or low-level API outperform the state-of-the-art GPM systems AutoMine, Pangolin, and Peregrine on average by 13.8x, 7.9x, and 5.4x, respectively. We also show that these Sandslash applications outperform expert-optimized GPM implementations by 2.3x on average with less programming effort.
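
To make "subgraph enumeration" concrete, the sketch below counts triangles, one of the simplest GPM problems. It is an illustration only and does not use or reproduce the Sandslash API; the function names build_adjacency and count_triangles are hypothetical. It assumes an undirected graph given as a list of edges with comparable vertex ids, and it uses a common trick from the GPM literature: orienting each edge from the lower id to the higher id so that every triangle is enumerated exactly once.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}; self-loops are skipped."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(edges):
    """Count triangles by intersecting oriented neighbor sets.

    Each vertex keeps only neighbors with a larger id, so a triangle
    (a < b < c) is found once: at u = a, v = b, with c in both sets.
    """
    adj = build_adjacency(edges)
    higher = {u: {v for v in nbrs if v > u} for u, nbrs in adj.items()}
    total = 0
    for u, nbrs in higher.items():
        for v in nbrs:
            # Common higher-id neighbors of u and v each close one triangle.
            total += len(nbrs & higher[v])
    return total


if __name__ == "__main__":
    # Complete graph on 4 vertices.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4))
```

A framework with a high-level API, as the abstract describes, would take only the pattern specification and choose such enumeration strategies itself; the hand-written orientation step here is the kind of optimization a user would otherwise have to code manually.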
