Skyline Queries Over Incomplete Data Streams (Technical Report)
Efficient and effective processing of massive stream data has attracted much attention from the database community and is useful in many real applications, such as sensor data monitoring and network intrusion detection. In practice, due to malfunctioning sensing devices or imperfect data collection techniques, real-world stream data often contain missing or incomplete attributes. In this paper, we formalize and tackle a novel and important problem, named skyline query over incomplete data stream (Sky-iDS), which retrieves skyline objects (in the presence of missing attributes) with high confidence from an incomplete data stream. To tackle the Sky-iDS problem, we design efficient approaches that impute the missing attributes of objects from the incomplete data stream via differential dependency (DD) rules. We propose effective pruning strategies to reduce the search space of the Sky-iDS problem, devise cost-model-based index structures that facilitate data imputation and skyline computation at the same time, and integrate these techniques into an efficient Sky-iDS query answering algorithm. Extensive experiments on both real and synthetic data sets confirm the efficiency and effectiveness of our Sky-iDS processing approach.
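To make the query semantics concrete, the following is a minimal, non-authoritative Python sketch, not the paper's algorithm. It assumes: (1) each object has at most one missing attribute `rhs`, imputed by a single hypothetical DD-style rule ("objects whose `lhs` values differ by at most `eps_lhs` are similar on `rhs`"), where every matching complete object in the window donates an equally likely candidate value; (2) smaller values are better in every dimension; (3) objects are independent, so an object's skyline probability is the sum, over its imputed instances, of the instance probability times the probability that no other object dominates it; and (4) a simple count-based sliding window stands in for the stream. The names `impute_dd`, `dominates`, `skyline_prob`, `sky_query_window`, and the confidence threshold `alpha` are illustrative only. The sketch omits the pruning strategies and index structures described above.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

# An imputed object is a list of (attribute vector, probability) instances.
Instance = Tuple[Tuple[float, ...], float]
RawObject = Tuple[Optional[float], ...]  # None marks a missing attribute


def impute_dd(obj: RawObject, complete: Sequence[RawObject],
              lhs: int, rhs: int, eps_lhs: float) -> List[Instance]:
    """Hypothetical DD-style imputation of a single missing attribute `rhs`."""
    if None not in obj:
        return [(tuple(obj), 1.0)]  # complete object: one certain instance
    # Outside this sketch's assumption (only `rhs` missing): no instances.
    if obj[lhs] is None or obj[rhs] is not None or sum(v is None for v in obj) > 1:
        return []
    # Complete objects that are close on `lhs` donate their `rhs` value.
    donors = [c[rhs] for c in complete if abs(c[lhs] - obj[lhs]) <= eps_lhs]
    if not donors:
        return []
    p = 1.0 / len(donors)  # assumed: candidates are equally likely
    instances = []
    for value in donors:
        filled = list(obj)
        filled[rhs] = value
        instances.append((tuple(filled), p))
    return instances


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_prob(idx: int, objects: List[List[Instance]]) -> float:
    """Probability that object `idx` is a skyline object (independent objects)."""
    total = 0.0
    for vec, p in objects[idx]:
        not_dominated = 1.0
        for j, other in enumerate(objects):
            if j == idx:
                continue
            # Probability that object j dominates this particular instance.
            dom = sum(q for w, q in other if dominates(w, vec))
            not_dominated *= 1.0 - dom
        total += p * not_dominated
    return total


def sky_query_window(window: Sequence[RawObject], lhs: int, rhs: int,
                     eps_lhs: float, alpha: float) -> List[int]:
    """Indices of window objects whose skyline probability is at least alpha."""
    complete = [o for o in window if None not in o]
    objects = [impute_dd(o, complete, lhs, rhs, eps_lhs) for o in window]
    return [i for i in range(len(objects)) if skyline_prob(i, objects) >= alpha]


window = deque(maxlen=4)  # count-based sliding window over the stream
for obj in [(9.0, 9.0), (1.0, 5.0), (2.0, 2.0), (4.0, 1.0), (1.5, None)]:
    window.append(obj)  # the oldest object expires once the window is full
print(sky_query_window(list(window), lhs=0, rhs=1, eps_lhs=0.6, alpha=0.6))  # prints [0, 2]
```

In this example, the incomplete object `(1.5, None)` receives two equally likely candidates, `(1.5, 5.0)` and `(1.5, 2.0)`, so both it and `(2.0, 2.0)` end up with skyline probability 0.5 and fall below `alpha = 0.6`. Recomputing every probability from scratch is quadratic in the window size per query, which is exactly the cost that pruning and indexing are meant to cut.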