On the minimax risk of dictionary learning

Alexander Jung, Yonina C. Eldar, Norbert Görtz

Research output: Journal article, scientific, peer-reviewed

16 citations (Scopus)

Abstract

We consider the problem of learning a dictionary matrix from a number of observed signals, which are assumed to be generated via a linear model with a common underlying dictionary. In particular, we derive lower bounds on the minimum achievable worst-case mean squared error (MSE), regardless of the computational complexity of the dictionary learning (DL) scheme. By casting DL as a classical (or frequentist) estimation problem, we derive these lower bounds following an established information-theoretic approach to minimax estimation. The main contribution of this paper is the adaptation of these information-theoretic tools to the DL problem in order to obtain lower bounds on the worst-case MSE of any DL algorithm. We derive three different lower bounds, which apply to different generative models for the observed signals. The first bound only requires the existence of a covariance matrix of the (unknown) underlying coefficient vector. By specializing this bound to sparse coefficient distributions and assuming that the true dictionary satisfies the restricted isometry property, we obtain a second lower bound on the worst-case MSE of DL methods, expressed in terms of the signal-to-noise ratio (SNR). The third bound applies to a more restrictive subclass of coefficient distributions, requiring the non-zero coefficients to be Gaussian. Although this bound has the most limited applicability, it is the tightest of the three in the low-SNR regime. One particular use of our lower bounds is the derivation of necessary conditions on the required number of observations (sample size) such that DL is feasible, i.e., such that accurate DL schemes might exist. By comparing these necessary conditions with sufficient conditions on the sample size under which a particular DL technique succeeds, we are able to characterize the regimes where those algorithms are optimal in terms of required sample size.
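The setting described in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' code: the helper names (`generate_observations`, `empirical_snr`, `squared_error`, `fano_minimax_lower_bound`) are hypothetical, and it assumes observations y_k = D x_k + n_k with exactly s non-zero Gaussian coefficients per x_k (in the spirit of the third, Gaussian-sparse model), white Gaussian noise, an SNR defined as average signal energy over noise energy, and error measured by the squared Frobenius norm. The last function is the generic textbook Fano-method bound that underlies the information-theoretic approach to minimax estimation, not any of the paper's three specific bounds. It does not check the restricted isometry property.

```python
import numpy as np


def generate_observations(D, n_samples, sparsity, noise_std, rng):
    """Draw y_k = D x_k + n_k for k = 1, ..., n_samples (hypothetical helper).

    Assumed model: each x_k has exactly `sparsity` non-zero entries at uniformly
    random positions, the non-zero values are i.i.d. standard Gaussian, and n_k
    is white Gaussian noise with standard deviation `noise_std`.
    """
    m, p = D.shape
    X = np.zeros((p, n_samples))
    for k in range(n_samples):
        support = rng.choice(p, size=sparsity, replace=False)
        X[support, k] = rng.standard_normal(sparsity)
    noise = noise_std * rng.standard_normal((m, n_samples))
    return D @ X + noise, X


def empirical_snr(D, X, noise_std):
    """Average noiseless signal energy divided by expected noise energy m * sigma^2."""
    m = D.shape[0]
    signal_energy = np.mean(np.sum((D @ X) ** 2, axis=0))
    return float(signal_energy / (m * noise_std ** 2))


def squared_error(D_hat, D):
    """Squared Frobenius distance ||D_hat - D||_F^2; the MSE is its expectation."""
    return float(np.sum((D_hat - D) ** 2))


def fano_minimax_lower_bound(delta, mutual_info, n_hypotheses):
    """Textbook Fano-method bound (not the paper's specific result).

    If n_hypotheses dictionaries are pairwise at least 2 * delta apart in
    Frobenius norm, the true index is uniform, and mutual_info (in nats) bounds
    the information the observations carry about that index, then every
    estimator has worst-case MSE >= delta^2 * (1 - (I + log 2) / log L).
    A negative bracket makes the bound vacuous, so it is clipped at zero.
    """
    bracket = 1.0 - (mutual_info + np.log(2.0)) / np.log(n_hypotheses)
    return float(delta ** 2 * max(bracket, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, p = 16, 32
    D = rng.standard_normal((m, p))
    D /= np.linalg.norm(D, axis=0)  # unit-norm columns (an assumption)
    Y, X = generate_observations(D, n_samples=200, sparsity=3, noise_std=0.1, rng=rng)
    print(Y.shape)
    print(empirical_snr(D, X, noise_std=0.1))
    print(fano_minimax_lower_bound(delta=1.0, mutual_info=0.0, n_hypotheses=4))
```

The Fano function shows why sample size enters such bounds: as more observations are collected, the mutual information between the data and the hypothesis index grows, the bracket shrinks, and the bound eventually becomes vacuous. Requiring the bound to stay small is what yields necessary conditions on the sample size.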

Original language: English
Article number: 7378975
Pages: 1501-1515
Number of pages: 15
Journal: IEEE Transactions on Information Theory
Volume: 62
Issue number: 3
Status: Published - 1 March 2016
OKM publication type: A1 Published article, reconciled
