A Big Data Lake for Multilevel Streaming Analytics

09/25/2020
by Ruoran Liu, et al.

Large organizations are seeking new architectures and scalable platforms to handle the data management challenges posed by an explosive growth of data rarely seen in the past. These challenges stem largely from the availability of high-velocity streaming data from various sources in multiple formats. This change in data paradigm has led to the emergence of new data analytics and management architectures. This paper focuses on storing high-volume, high-velocity, and high-variety data in raw form in a data storage architecture called a data lake. First, we present our study of the limitations of traditional data warehouses in handling recent changes in data paradigms. We then discuss and compare different open-source and commercial platforms that can be used to develop a data lake. Next, we describe our end-to-end data lake design and implementation approach using the Hadoop Distributed File System (HDFS) on the Hortonworks Data Platform (HDP). Finally, we present a real-world data lake development use case for data stream ingestion, staging, and multilevel streaming analytics that combines structured and unstructured data. This study can serve as a guide for individuals or organizations planning to implement a data lake solution for their own use cases.
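The abstract contrasts traditional data warehouses with a data lake that keeps incoming data in its raw format. As a minimal sketch of that general difference (an illustration, not the paper's implementation), the Python below compares schema-on-write loading, where records must match a fixed schema before they are stored, with schema-on-read storage, where every payload is landed unchanged and interpreted later. The schema, field names, payloads, and the local temporary directory standing in for an HDFS raw zone are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical fixed schema a warehouse table might enforce (illustration only).
WAREHOUSE_SCHEMA = {"sensor_id": str, "reading": float}

# Hypothetical mixed-format stream payloads: a clean record, a record with an
# extra field, and an unstructured text note.
PAYLOADS = [
    ("r1.json", b'{"sensor_id": "s1", "reading": 20.5}'),
    ("r2.json", b'{"sensor_id": "s2", "reading": 21.0, "unit": "C"}'),
    ("r3.txt", b"free-text operator note"),
]


def parse_json(raw: bytes) -> Optional[dict]:
    """Return a dict if the payload is a JSON object, else None."""
    try:
        value = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None
    return value if isinstance(value, dict) else None


def warehouse_load(record: dict) -> Optional[dict]:
    """Schema-on-write: accept only records that match the fixed schema exactly."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        return None
    if not all(isinstance(record[k], t) for k, t in WAREHOUSE_SCHEMA.items()):
        return None
    return record


def lake_ingest(raw: bytes, zone: Path, name: str) -> Path:
    """Schema-on-read: land the payload byte-for-byte in the raw zone."""
    zone.mkdir(parents=True, exist_ok=True)
    path = zone / name
    path.write_bytes(raw)
    return path


def demo(root: Path) -> Tuple[int, int, int]:
    """Return (rows the warehouse accepted, files the lake stored, lake files readable as JSON)."""
    loaded = 0
    raw_zone = root / "raw"  # hypothetical local stand-in for an HDFS raw-zone directory
    for name, raw in PAYLOADS:
        record = parse_json(raw)
        if record is not None and warehouse_load(record) is not None:
            loaded += 1
        lake_ingest(raw, raw_zone, name)  # the lake keeps every payload as-is
    stored = sum(1 for _ in raw_zone.iterdir())
    readable = sum(1 for p in raw_zone.iterdir() if parse_json(p.read_bytes()) is not None)
    return loaded, stored, readable


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(Path(tmp)))  # (1, 3, 2)
```

In this sketch only the clean record survives schema-on-write, while the lake retains all three payloads and defers interpretation to read time. The text note stays available for unstructured analysis even though the JSON reader skips it.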
