𝑘-Variance: A Clustered Notion of Variance

Solomon, Justin; Greenewald, Kristjan; Nagaraja, Haikady

Publisher Policy

Terms of use

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Abstract

We introduce 𝑘-variance, a generalization of variance built on the machinery of random bipartite matchings. 𝑘-variance measures the expected cost of matching two sets of 𝑘 samples from a distribution to each other, capturing local rather than global information about a measure as 𝑘 increases; it is easily approximated stochastically using sampling and linear programming. In addition to defining 𝑘-variance and proving its basic properties, we provide in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of ℝⁿ. We conclude with experiments and open problems motivated by this new way to summarize distributional shape.
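The sampling procedure described in the abstract can be illustrated with a minimal sketch restricted to the one-dimensional case. It is not the authors' implementation. It assumes a squared-distance matching cost averaged over the 𝑘 pairs and applies no normalization constant, so it estimates the raw expected matching cost rather than the paper's exact definition of 𝑘-variance. In one dimension the optimal matching for a convex cost pairs the sorted samples, so sorting stands in for the linear program here. The function names `matching_cost_1d` and `estimate_matching_cost` and the two samplers are hypothetical, introduced only for illustration.

```python
import random


def matching_cost_1d(xs, ys):
    """Mean squared cost of the optimal matching between two equal-size 1-D samples.

    Assumption: squared-distance cost. In one dimension with a convex cost,
    pairing the i-th smallest x with the i-th smallest y is an optimal matching,
    so no linear-programming solver is needed.
    """
    pairs = zip(sorted(xs), sorted(ys))
    return sum((x - y) ** 2 for x, y in pairs) / len(xs)


def estimate_matching_cost(sampler, k, trials=2000, seed=0):
    """Monte Carlo estimate of the expected matching cost between two k-samples.

    `sampler(rng)` must return one draw from the distribution of interest.
    No normalization constant is applied (an assumption of this sketch).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [sampler(rng) for _ in range(k)]
        ys = [sampler(rng) for _ in range(k)]
        total += matching_cost_1d(xs, ys)
    return total / trials


def gaussian(rng):
    """Hypothetical example distribution: standard normal."""
    return rng.gauss(0.0, 1.0)


def two_clusters(rng):
    """Hypothetical example distribution: two tight clusters centered at -5 and +5."""
    center = rng.choice((-5.0, 5.0))
    return center + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    for k in (1, 4, 16, 64):
        print(k,
              estimate_matching_cost(gaussian, k),
              estimate_matching_cost(two_clusters, k))
```

For 𝑘 = 1 the unnormalized cost is E[(X − Y)²] for two independent draws, which equals twice the ordinary variance, so the estimate for the standard normal should be close to 2. In higher dimensions sorting no longer yields the optimal matching; an assignment solver such as `scipy.optimize.linear_sum_assignment` could take its place.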

Date issued

2022-09

URI

https://hdl.handle.net/1721.1/165682

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; MIT-IBM Watson AI Lab

Journal

SIAM Journal on Mathematics of Data Science

Publisher

Society for Industrial & Applied Mathematics (SIAM)

Citation

Solomon, Justin, Greenewald, Kristjan and Nagaraja, Haikady. 2022. "𝑘-Variance: A Clustered Notion of Variance." SIAM Journal on Mathematics of Data Science, 4 (3).

Version: Final published version

Collections

MIT Open Access Articles