A new classification framework for high-dimensional data

06/27/2023
by Xiangbo Mo, et al.

Classification is a classic problem, but it faces many challenges when the number of features is large, as is common in modern applications such as identifying tumor sub-types from genomic data or categorizing customer attitudes based on online reviews. We propose a new framework that uses the ranks of pairwise distances among observations and identifies a common pattern in moderate to high dimensions that has previously been overlooked. The proposed method exhibits superior classification power over existing methods under a variety of scenarios. Furthermore, it can be applied to non-Euclidean data objects, such as network data. We illustrate the method through an analysis of Neuropixels data, in which neurons are classified based on their firing activities. Additionally, we explore a related approach that is simpler to understand and investigates key quantities that play essential roles in our novel approach.
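
The abstract does not spell out the classification rule, so the following is only a rough sketch of the basic ingredient it names: the ranks of pairwise distances among observations. The function names `pairwise_distance_ranks` and `mean_rank_classify`, the choice of Euclidean distance, and the "smallest mean rank" decision rule are all assumptions made for illustration. They are not the authors' method.

```python
import numpy as np


def pairwise_distance_ranks(X):
    """Rank matrix of pairwise distances (hypothetical helper).

    R[i, j] is the rank of the distance from observation i to
    observation j among all distances from i to the other
    observations (1 = nearest). The diagonal is set to 0.
    Euclidean distance is an assumption; any dissimilarity
    could be substituted, including one on non-Euclidean objects.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    n = D.shape[0]
    np.fill_diagonal(D, np.inf)          # push self-distance to the end
    order = np.argsort(D, axis=1)        # order[i, k] = k-th nearest to i
    R = np.empty((n, n), dtype=int)
    rows = np.arange(n)[:, None]
    R[rows, order] = np.arange(1, n + 1)[None, :]
    np.fill_diagonal(R, 0)               # self is not ranked
    return R


def mean_rank_classify(X_train, y_train, x_new):
    """Toy rule (an assumption, not the paper's rule): assign x_new
    to the class whose training points have the smallest mean
    distance rank relative to x_new."""
    d = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    ranks = np.empty(len(d), dtype=int)
    ranks[np.argsort(d)] = np.arange(1, len(d) + 1)
    labels = np.unique(y_train)
    mean_ranks = [ranks[y_train == c].mean() for c in labels]
    return labels[int(np.argmin(mean_ranks))]


# Small made-up example with two well-separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
y = np.array([0, 0, 1, 1])
R = pairwise_distance_ranks(X)
pred_a = mean_rank_classify(X, y, np.array([0.2, 0.3]))
pred_b = mean_rank_classify(X, y, np.array([9.0, 10.0]))
```

Because only the ordering of distances enters the computation, the same sketch would run unchanged on any precomputed dissimilarity, which is one reason a rank-based formulation can extend to objects such as networks.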

research
11/29/2019

Minkowski distances and standardisation for clustering and classification of high dimensional data

There are many distance-based methods for classification and clustering,...
research
04/30/2023

A new clustering framework

Detection of clusters is a crucial task across many disciplines such as ...
research
12/03/2015

A New Statistical Framework for Genetic Pleiotropic Analysis of High Dimensional Phenotype Data

The widely used genetic pleiotropic analysis of multiple phenotypes are ...
research
12/20/2019

CDPA: Common and Distinctive Pattern Analysis between High-dimensional Datasets

A representative model in integrative analysis of two high-dimensional d...
research
10/07/2021

A Fast and Effective Large-Scale Two-Sample Test Based on Kernels

Kernel two-sample tests have been widely used and the development of eff...
research
10/14/2018

Sequential Change-point Detection for High-dimensional and non-Euclidean Data

In many modern applications, high-dimensional/non-Euclidean data sequenc...
research
02/12/2019

High dimensionality: The latest challenge to data analysis

The advent of modern technology, permitting the measurement of thousands...
