SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint

05/02/2019
by Phuong Nguyen, et al.

Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. These relationships are typically represented as sequences in which each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operational insights. For example, in business processes, users are interested in clustering process traces into smaller subsets to discover less complex process models. This requires expensive computation of similarity metrics between sequence-based data. Related work on dimension reduction and embedding methods does not take into account the multi-dimensional attributes of the data and does not address the interpretability of data in the embedding space (i.e., it favors vector-based representations). In this work, we introduce Summarized, a framework for efficient analysis of sequence-based multi-dimensional data using intuitive and user-controlled summarizations. We introduce summarization schemes that provide tunable trade-offs between the quality and efficiency of analysis tasks, and we derive an error model for summary-based similarity under an edit-distance constraint. Evaluations using real-world datasets show the effectiveness of our framework.
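
To make the idea of summary-based similarity concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes a trace is a list of events, each a dictionary of attributes, and it uses a hypothetical `summarize` helper that simply projects every event onto a user-chosen subset of attributes. The attribute names (`activity`, `role`) and the toy traces are invented for illustration, and plain unit-cost Levenshtein distance stands in for whatever similarity measure the paper uses.

```python
from typing import Hashable, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Unit-cost Levenshtein distance between two sequences."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def summarize(trace, keep):
    """Hypothetical summarization: keep only the attributes named in `keep`."""
    return [tuple(event[k] for k in keep) for event in trace]


# Toy process traces (assumed data, not taken from the paper).
trace_a = [
    {"activity": "submit", "role": "clerk"},
    {"activity": "review", "role": "manager"},
    {"activity": "approve", "role": "manager"},
]
trace_b = [
    {"activity": "submit", "role": "intern"},
    {"activity": "review", "role": "manager"},
    {"activity": "reject", "role": "manager"},
]

# Distance over all attributes versus distance over a coarser summary.
full_distance = edit_distance(summarize(trace_a, ("activity", "role")),
                              summarize(trace_b, ("activity", "role")))
coarse_distance = edit_distance(summarize(trace_a, ("activity",)),
                                summarize(trace_b, ("activity",)))
print(full_distance, coarse_distance)
```

In this sketch the coarser summary compares shorter tuples, and because projection maps equal events to equal events, the distance on the summary can never exceed the distance on the full traces. The gap between the two values is the kind of summarization error that an error model would need to bound.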


