AMS Weekly Seminar | Jason Klusowski
Location: Krieger 205
When: October 30th at 1:30 p.m.
Title: The Value of Side Information in Unlabeled Data
Abstract: Practitioners often work in settings with limited labeled data and abundant unlabeled data. During training, they may even have access to extra side information (some labeled, some not) that won’t be available once the model is deployed. When can this side information actually improve performance? I’ll present a simple framework in which a rich-view model that sees the extra features generates pseudo-labels for the large unlabeled dataset, and a deployment model that sees only the standard features is trained on both real and pseudo-labels. The two are trained iteratively: each deployment model update calibrates the next round of pseudo-labels, and those refined pseudo-labels in turn guide the deployment model. Our theory shows that side information helps precisely when the rich-view and deployment models make different kinds of errors. We formalize this with a decorrelation score that quantifies how independent those errors are: the more independent, the greater the performance gains.
Zoom link: https://wse.zoom.us/j/93600407710?pwd=JBL8VsObRxX6MkhdjAUxCadqJDoZrZ.1