Performance Optimization of MapReduce-based Apriori Algorithm on Hadoop Cluster

07/16/2018
by   Sudhakar Singh, et al.

Many techniques have been proposed to implement the Apriori algorithm on the MapReduce framework, but only a few have focused on performance improvement. The FPC (Fixed Passes Combined-counting) and DPC (Dynamic Passes Combined-counting) algorithms combine multiple passes of Apriori into a single MapReduce phase to reduce execution time. In this paper, we propose two improved MapReduce-based Apriori algorithms, VFPC (Variable Size based Fixed Passes Combined-counting) and ETDPC (Elapsed Time based Dynamic Passes Combined-counting), which improve on FPC and DPC respectively. Further, we optimize the multi-pass phases of these algorithms by skipping the pruning step in some passes, and propose the Optimized-VFPC and Optimized-ETDPC algorithms. Quantitative analysis reveals that the cost of counting the additional un-pruned candidates produced by skipped pruning is smaller than the computation cost saved by skipping it. Experimental results show that VFPC and ETDPC are more robust and flexible than FPC and DPC, whereas their optimized versions are more efficient in terms of execution time.
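
The idea of combining passes and optionally skipping the pruning step can be illustrated with a small single-process sketch. It is not the authors' MapReduce implementation: the function names (`join`, `prune`, `combined_candidates`, `count_support`) are hypothetical, itemsets are assumed to be sorted tuples, and candidates for later passes are assumed to be generated from the candidates of the preceding pass within the same phase.

```python
from itertools import combinations

def join(itemsets):
    """Join step: merge k-itemsets that share their first k-1 items."""
    ordered = sorted(itemsets)
    out = set()
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            a, b = ordered[i], ordered[j]
            if a[:-1] == b[:-1]:
                out.add(a + (b[-1],))
    return out

def prune(candidates, previous):
    """Prune step: drop candidates that have a k-subset missing from `previous`."""
    return {
        c for c in candidates
        if all(s in previous for s in combinations(c, len(c) - 1))
    }

def combined_candidates(frequent_k, passes, skip_pruning):
    """Generate candidates for several consecutive passes in one phase."""
    levels, current = [], set(frequent_k)
    for _ in range(passes):
        nxt = join(current)
        if not skip_pruning:
            nxt = prune(nxt, current)
        if not nxt:
            break
        levels.append(nxt)
        current = nxt  # next pass builds on candidates, not on counted itemsets
    return levels

def count_support(transactions, levels):
    """Count all candidates of all combined passes in a single scan."""
    counts = {}
    for t in transactions:
        items = set(t)
        for level in levels:
            for c in level:
                if items.issuperset(c):
                    counts[c] = counts.get(c, 0) + 1
    return counts

# Illustrative data (assumed): frequent 2-itemsets at minimum support 2.
transactions = [("a", "b", "c"), ("a", "b", "c"), ("b", "d"), ("b", "d")]
frequent_2 = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")}
with_pruning = combined_candidates(frequent_2, 2, skip_pruning=False)
without_pruning = combined_candidates(frequent_2, 2, skip_pruning=True)
counts = count_support(transactions, without_pruning)
```

In this toy input, skipping pruning leaves one extra candidate to count, but that candidate never reaches the minimum support, so the frequent itemsets found are unchanged. The trade-off the abstract describes is between that extra counting work and the subset checks saved in `prune`.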


