Design based incomplete U-statistics

08/10/2020
by Xiangshun Kong, et al.

U-statistics are widely used in fields such as economics, machine learning, and statistics. However, despite their desirable statistical properties, they have an obvious drawback: their computation becomes impractical as the sample size n increases. Specifically, a U-statistic of order d has to evaluate O(n^d) combinations. Since Blom (1976), who referred to such an approximation as an incomplete U-statistic, many efforts have been made to approximate the original U-statistic using a small subset of m combinations. To the best of our knowledge, all existing methods require m to grow faster than n, albeit more slowly than n^d, for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be asymptotically efficient even when m grows more slowly than n. In some cases, m is only required to grow faster than √n. Both our theoretical and our empirical results show significant improvements in the statistical efficiency of the new incomplete U-statistic.
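To make the cost gap concrete, here is a minimal sketch, assuming an order-2 kernel h(x1, x2) = (x1 - x2)^2 / 2, whose complete U-statistic is the unbiased sample variance. It contrasts the complete U-statistic, which averages h over all C(n, d) combinations, with the simplest incomplete U-statistic, which averages h over m combinations drawn uniformly at random. This is not the design-based construction proposed in this paper. The kernel and the function names (`kernel`, `complete_u_statistic`, `random_incomplete_u_statistic`) are hypothetical choices for illustration.

```python
import itertools
import math
import random
from statistics import variance


def kernel(x1: float, x2: float) -> float:
    """Hypothetical order-2 kernel: its U-statistic is the unbiased sample variance."""
    return (x1 - x2) ** 2 / 2.0


def complete_u_statistic(data, h, d):
    """Average h over all C(n, d) subsets of size d (O(n^d) kernel evaluations)."""
    total = 0.0
    count = 0
    for combo in itertools.combinations(data, d):
        total += h(*combo)
        count += 1
    return total / count


def random_incomplete_u_statistic(data, h, d, m, seed=0):
    """Average h over m size-d combinations drawn uniformly at random, with replacement."""
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(m):
        # One combination of d distinct indices; the same combination may recur across draws.
        idx = rng.sample(range(n), d)
        total += h(*(data[i] for i in idx))
    return total / m


rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

full = complete_u_statistic(data, kernel, 2)
approx = random_incomplete_u_statistic(data, kernel, 2, m=500)

print(math.comb(len(data), 2))            # 19900 kernel evaluations for the complete statistic
print(abs(full - variance(data)) < 1e-9)  # True: complete U-statistic equals the sample variance
print(full, approx)                       # approx uses only 500 kernel evaluations
```

In this sketch the incomplete statistic uses 500 of the 19,900 combinations. How its mean squared error compares with that of the complete statistic as n grows depends on how m scales with n. The paper addresses this question by choosing the combinations through a design rather than at random.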
