## Abstract

We study the problem of detecting outlier pairs of strongly correlated variables among a collection of n variables with otherwise weak pairwise correlations. After normalization, this task amounts to the geometric task where we are given as input a set of n vectors with unit Euclidean norm and dimension d, and for some constants 0 < τ < ρ < 1, we are asked to find all the outlier pairs of vectors whose inner product is at least ρ in absolute value, subject to the promise that all but at most q pairs of vectors have inner product at most τ in absolute value. Improving on an algorithm of Valiant [FOCS 2012; J. ACM 2015], we present a randomized algorithm that for Boolean inputs ({−1, 1}-valued data normalized to unit Euclidean length) runs in time (Equation presented), where 0 < γ < 1 is a constant tradeoff parameter and M(μ, ν) is the exponent to multiply an [n^{μ]} × [n^{ν} ] matrix with an [n^{ν]} × [n^{μ]} matrix and Δ = 1/(1 − log_{τ} ρ). As corollaries we obtain randomized algorithms that run in time (Equation presented) and in time (Equation presented), where 2 ≤ ω < 2.38 is the exponent for square matrix multiplication and 0.3 < α ≤ 1 is the exponent for rectangular matrix multiplication. The notation Õ (·) hides polylogarithmic factors in n and d whose degree may depend on ρ and τ. We present further corollaries for the light bulb problem and for learning sparse Boolean functions.

Original language | English |
---|---|

Article number | 31 |

Pages (from-to) | 1-26 |

Journal | ACM Transactions on Algorithms |

Volume | 14 |

Issue number | 3 |

DOIs | |

Publication status | Published - 1 Jul 2018 |

MoE publication type | A1 Journal article-refereed |

## Keywords

- Correlation
- Fast matrix multiplication
- Light bulb problem
- Rectangular matrix multiplication
- Similarity search