High Performance Cluster Computing for MapReduce
MapReduce is a technique for distributing data processing across many machines, and it can substantially speed up computation on large datasets. Hadoop's MapReduce implementation relies on Java and the JVM, which are expensive in terms of memory. A MapReduce framework built on High Performance Computing (HPC) could be more memory-efficient and faster than standard MapReduce. This paper explores an entirely C++ based approach to MapReduce and evaluates its feasibility in terms of developer friendliness, deployment interface, efficiency, and scalability. It also introduces two techniques, Eager Reduction and Delayed Reduction, that can speed up MapReduce.