RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework

12/13/2019
by Pankaj Singh, et al.

Early frequent itemset mining (FIM) algorithms were designed on Hadoop MapReduce, a distributed big data processing framework. However, because of heavy disk I/O, MapReduce proved inefficient for such highly iterative algorithms. Spark, a more efficient distributed data processing framework, was therefore developed with in-memory computation and resilient distributed dataset (RDD) features to support iterative algorithms. Apriori- and FP-Growth-based FIM algorithms have been designed on the Spark RDD framework, but an Eclat-based algorithm has not yet been explored. This paper proposes RDD-Eclat, a parallel Eclat algorithm on the Spark RDD framework, with its five variants. The proposed algorithms are evaluated on various benchmark datasets, and the results show that RDD-Eclat outperforms Spark-based Apriori many times over. The experimental results also show that the proposed algorithms scale as the number of cores and the size of the dataset increase.
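
For readers unfamiliar with Eclat, the following is a minimal single-machine sketch of the generic algorithm. It is not the RDD-Eclat method or any of its five variants, which the abstract does not describe. It assumes the usual textbook formulation: each item is mapped to the set of transaction ids that contain it (its tidset), and larger itemsets are grown depth-first by intersecting tidsets. The function name `eclat`, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    transactions: list of iterables of hashable, sortable items (hypothetical input format)
    min_support:  absolute support threshold (number of transactions)
    """
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tidset), where tidset already holds the
        # transactions containing prefix plus that item.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Join with the later siblings by intersecting tidsets;
            # keep only extensions that remain frequent.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    next_candidates.append((other, common))
            if next_candidates:
                extend(itemset, next_candidates)

    # Start from the frequent single items, in a fixed order so that each
    # itemset is generated exactly once.
    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent


if __name__ == "__main__":
    sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for itemset, support in sorted(eclat(sample, 2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

Unlike Apriori, this formulation computes the support of a candidate from a set intersection and does not rescan the transactions.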

research
10/22/2021

RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework (Extended Version)

Frequent itemset mining (FIM) is a highly computational and data intensi...
research
08/04/2019

A Data Structure Perspective to the RDD-based Apriori Algorithm on Spark

During the recent years, a number of efficient and scalable frequent ite...
research
01/28/2023

Interactive Log Parsing via Light-weight User Feedbacks

Template mining is one of the foundational tasks to support log analysis...
research
04/27/2018

Intermediate Data Caching Optimization for Multi-Stage and Parallel Big Data Frameworks

In the era of big data and cloud computing, large amounts of data are ge...
research
02/15/2019

Reactive Liquid: Optimized Liquid Architecture for Elastic and Resilient Distributed Data Processing

Today's most prominent IT companies are built on the extraction of insig...
research
02/21/2019

Performance study of distributed Apriori-like frequent itemsets mining

In this article, we focus on distributed Apriori-based frequent itemsets...
research
01/30/2017

Comparing Dataset Characteristics that Favor the Apriori, Eclat or FP-Growth Frequent Itemset Mining Algorithms

Frequent itemset mining is a popular data mining technique. Apriori, Ecl...
