Title: A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise
Abstract: We revisit the problem of least-squares linear regression under heavy-tailed distributions, given a sample contaminated by a fraction $\epsilon$ of arbitrary outliers. In this context, we (i) allow the noise to depend on the features, (ii) do not assume knowledge of the feature vector covariance matrix and (iii) do not assume knowledge of the noise level. We propose a computationally tractable estimator constructed by means of a two-stage multiplicative weight update algorithm and the power method. We show that it attains the near-optimal statistical rate, sample complexity and breakdown point (up to absolute constants) in the L2-norm. Here, optimality is considered with respect to all relevant parameters, including sample size, dimension, contamination fraction $\epsilon$ and confidence level. Our estimator is of “least-squares-type”, meaning it is not constructed by direct gradient estimation; as a result, our bounds are independent of the L2-norm of the ground truth. We discuss the novelties of our results. This is joint work with R.I. Oliveira (IMPA) and Z.R. Rico (Columbia).
Join via Zoom: