The Future of Search and Discovery in Big Data Analytics: Ultrametric Information Spaces

02/15/2012
by Fionn Murtagh, et al.

Consider observation data consisting of n observation vectors, each with values on a set of attributes. This gives us n points in attribute space. Structuring the data as a tree, which follows from embedding the observations in an ultrametric topology, offers a great advantage for proximity searching: once the data have been preprocessed through such an embedding, an observation's nearest neighbor is found in constant computational time, i.e. O(1) time. A further powerful approach is discussed in this work: inducing a hierarchy, and hence a tree, in linear computational time, i.e. O(n) time for n observations. With such a basis for proximity search and best match, we can address the burgeoning problems of processing very large, and possibly also very high-dimensional, data sets.
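
The abstract makes two complexity claims: nearest-neighbor lookup in constant time once the data sit in an ultrametric (tree) embedding, and induction of the hierarchy in linear time. The Python sketch below illustrates both, under stated assumptions. It uses one well-known ultrametric, the longest-common-prefix (Baire) distance on the decimal digits of a single attribute rescaled to [0, 1), chosen here for illustration and not necessarily the construction used in the paper. The function names (digit_string, baire_distance, build_prefix_hierarchy, nearest_in_tree) and the precision parameter are hypothetical. "Constant" and "linear" hold for a fixed digit precision.

```python
from collections import defaultdict

# Hypothetical sketch: the Baire (longest-common-prefix) ultrametric on decimal
# digits is an assumed choice, not necessarily the paper's construction.


def digit_string(x, precision):
    """Return the first `precision` decimal digits of x.

    Assumes 0 <= x < 1 (e.g. one attribute rescaled to the unit interval)
    and precision <= 17, the number of decimals formatted below.
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("this sketch assumes values in [0, 1)")
    # "0.12300000000000000"[2:] drops the leading "0." before slicing digits.
    return f"{x:.17f}"[2:2 + precision]


def baire_distance(x, y, precision=3):
    """Ultrametric 10**(-k), where k is the length of the longest common digit prefix.

    Points agreeing on all `precision` digits are at distance 0 at this precision.
    """
    sx, sy = digit_string(x, precision), digit_string(y, precision)
    k = 0
    while k < precision and sx[k] == sy[k]:
        k += 1
    return 0.0 if k == precision else 10.0 ** (-k)


def build_prefix_hierarchy(values, precision=3):
    """Map every digit prefix (length 1..precision) to the indices sharing it.

    Each prefix is a node of the tree, and level-(k+1) nodes nest inside
    level-k nodes. One pass over the data does O(n * precision) work, which
    is linear in n when the precision is fixed.
    """
    buckets = defaultdict(list)
    for i, x in enumerate(values):
        s = digit_string(x, precision)
        for k in range(1, precision + 1):
            buckets[s[:k]].append(i)
    return dict(buckets)


def nearest_in_tree(buckets, query, precision=3, exclude=None):
    """Return (index, k): a nearest neighbor of `query` and its shared prefix length.

    Walks from the query's deepest prefix towards the root, so the cost is at
    most `precision` dictionary lookups regardless of n. Any non-excluded
    member of the deepest bucket containing one is a nearest neighbor under
    baire_distance. Returns (None, 0) when no candidate shares even the first
    digit (every candidate is then at distance 1).
    """
    s = digit_string(query, precision)
    for k in range(precision, 0, -1):
        for i in buckets.get(s[:k], ()):
            if i != exclude:
                return i, k
    return None, 0


if __name__ == "__main__":
    values = [0.123, 0.129, 0.45, 0.451, 0.9]
    tree = build_prefix_hierarchy(values, precision=3)
    print(nearest_in_tree(tree, 0.4512))            # → (3, 3)
    print(nearest_in_tree(tree, 0.123, exclude=0))  # → (1, 2)
```

The trade-off in this sketch is that proximity is measured only through shared leading digits: values on either side of a digit boundary, such as 0.199 and 0.200, fall into different top-level buckets even though they are numerically close.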

research
02/04/2019

2-D Embedding of Large and High-dimensional Data with Minimal Memory and Computational Time Requirements

In the advent of big data era, interactive visualization of large data s...
research
04/06/2017

Massive Data Clustering in Moderate Dimensions from the Dual Spaces of Observation and Attribute Data Clouds

Cluster analysis of very high dimensional data can benefit from the prop...
research
10/17/2018

Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data

Searching for high-dimensional vector data with high accuracy is an inev...
research
07/16/2022

HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints

The in-memory approximate nearest neighbor search (ANNS) algorithms have...
research
12/03/2020

Approximate kNN Classification for Biomedical Data

We are in the era where the Big Data analytics has changed the way of in...
research
01/22/2015

Sketch and Validate for Big Data Clustering

In response to the need for learning tools tuned to big data analytics, ...
research
04/20/2021

Inference of Common Multidimensional Equally-Distributed Attributes

Given two relations containing multiple measurements - possibly with unc...