Dissertation Defense: Luoluo Liu
Title: Collaborative Regression and Classification via Bootstrapping
Abstract: In modern machine learning problems and applications, the data that we are dealing with have large dimensions as well as amount, making data analysis time-consuming and computationally inefficient. Sparse recovery algorithms are developed to extract the underlining low dimensional structure from the data. Classical signal recovery based on l1 minimization solves the least squares problem with all available measurements via sparsity-promoting regularization. It has shown promising performances in regression and classification. Previous work on Compressed Sensing (CS) theory reveals that when the true solution is sparse and if the number of measurements is large enough, then solutions to l1 converge to the ground truths. In practice, when the number of measurements is low or when the noise level is high or when measurements arrive sequentially in streaming fashion, conventional l1 minimization algorithms tend to struggle in signal recovery.
This research work aims at using multiple local measurements generated from resampling using bootstrap or sub-sampling to efficiently make global predictions to deal with aforementioned challenging scenarios in practice. We develop two main approaches – one extends the conventional bagging scheme in sparse regression from a fixed bootstrapping ratio whereas the other called JOBS applies a support consistency among bootstrapped estimators in a collaborative fashion. We first derive rigorous theoretical guarantees for both proposed approaches and then carefully evaluate them with extensive simulations to quantify their performances. Our algorithms are quite robust compared to the conventional l1 minimization, especially in the scenarios with high measurements noise and low number of measurements. Our theoretical analysis also provides key guidance on how to choose optimal parameters, including bootstrapping ratios and number of collaborative estimates. Finally, we demonstrate that our proposed approaches yield significant performance gains in both sparse regression and classification, which are two crucial problems in the field of signal processing and machine learning.