LEAPS holds regular (virtual) seminar events. This year’s theme is Large-Scale and Decentralized Learning. Topics include, but are not limited to, Distributed Learning, Federated Learning, Reinforcement Learning, and Continual Learning.
Jalaj Upadhyay (Rutgers University), November 18th, Thursday, noon-1pm ET
Title: Private federated learning in the age of pandemic: When schools of computer science, statistics, and medicine came together
Abstract: Private federated learning has been fundamentally transformed over the past couple of years. The dire need to study the effect and spread of the pandemic raised the questions: what data to collect, how to collect it, and what information to infer? This tension between data collection and processing in the name of “good” and the associated privacy concerns brought together not only seemingly divergent research communities but also many large-scale agencies.
In this talk, I will share my collaborative experience with people across various agencies and disciplines in answering one of the pressing problems of 2020: how to do contact tracing privately, and how to use this information to perform private federated learning for future insights. I will briefly outline the publicly available information on this, some progress that was made in privacy-preserving federated learning, and challenges that motivated works that not only improved state-of-the-art theoretical guarantees but can also be practically deployed at large scale. I will briefly overview the impact of one of these works in both industry and academic research. During the talk, I will cover the challenges of large-scale private federated learning, challenges that remain unresolved, some future directions motivated by practical needs, and where we need more synergy between academia and industry.
Some of the technical parts of this talk are based on the materials available on the following web pages:
Disclaimer: The algorithmic and systemic details mentioned in the talk are based on publicly available information and do not cover state-of-the-art details or confidential discussions.
Bio: Jalaj Upadhyay is an assistant professor at Rutgers University. Prior to this, he was a senior researcher at Apple and a postdoctoral fellow at Johns Hopkins University and Penn State University. He received his Ph.D. from the University of Waterloo.
Peter Kairouz (Google), November 16th, Tuesday, noon-1pm ET
Title: Privacy and Communication Efficiency in Federated Learning at Scale
Abstract: I will start this talk with an overview of Federated Learning (FL) and its core data minimization principles. I will then describe how privacy can be strengthened using complementary privacy techniques such as differential privacy, secure multi-party computation, and privacy auditing methods. I will spend much of the talk describing how we can carefully combine technologies like differential privacy and secure aggregation to obtain formal distributed privacy guarantees without fully trusting the server to add noise. I will present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian or Skellam noise before performing secure aggregation. I will conclude by showing experimental results demonstrating that our solution achieves accuracy comparable to central differential privacy (which requires trusting the server to add noise) with just 16 bits of precision per value. If time permits, I will highlight new work on combining linear compression schemes with secure aggregation and differential privacy to reduce the communication overhead.
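The pipeline described in the abstract (discretize client values, add discrete noise locally, then sum under a modulus so the server only sees the aggregate) can be sketched roughly as below. This is an illustrative simulation only, not the actual system from the talk: the function names, the Skellam noise parameterization, and the 16-bit modulus choice are all assumptions for the sketch, and real secure aggregation would use cryptographic masking rather than a plain modular sum.

```python
import numpy as np

def skellam_noise(mu, shape, rng):
    # Skellam(mu, mu) noise = difference of two independent Poisson draws
    return rng.poisson(mu, shape) - rng.poisson(mu, shape)

def client_encode(x, scale, mu, modulus, rng):
    # Discretize a float vector, add discrete noise, reduce mod the modulus.
    q = np.round(x * scale).astype(np.int64)
    return (q + skellam_noise(mu, q.shape, rng)) % modulus

def secure_sum(encoded_stack, modulus):
    # Stand-in for secure aggregation: the server observes only this modular sum.
    return np.sum(encoded_stack, axis=0) % modulus

def decode(total, scale, modulus):
    # Map the modular sum back to a signed integer range, then rescale.
    signed = np.where(total >= modulus // 2, total - modulus, total)
    return signed / scale

# Three simulated clients with one scalar each, 16-bit field, noise off for clarity.
rng = np.random.default_rng(0)
scale, modulus = 1024, 2**16
values = [np.array([0.5]), np.array([-0.25]), np.array([0.1])]
encoded = np.stack([client_encode(v, scale, mu=0, modulus=modulus, rng=rng)
                    for v in values])
estimate = decode(secure_sum(encoded, modulus), scale, modulus)
print(estimate)  # close to 0.35, the true sum
```

With `mu > 0` the modular sum carries the aggregate Skellam noise, which is what gives the distributed differential-privacy guarantee at the cost of a small accuracy loss.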
Bio: Peter Kairouz is a research scientist at Google, where he leads research efforts on federated learning and privacy-preserving technologies. Before joining Google, he was a Postdoctoral Research Fellow at Stanford University. He received his Ph.D. in electrical and computer engineering from the University of Illinois at Urbana-Champaign (UIUC). He is the recipient of the 2012 Roberto Padovani Scholarship from Qualcomm’s Research Center, the 2015 ACM SIGMETRICS Best Paper Award, the 2015 Qualcomm Innovation Fellowship Finalist Award, and the 2016 Harold L. Olesen Award for Excellence in Undergraduate Teaching from UIUC.
Sebastian Stich (EPFL), November 4th, Thursday, noon-1pm ET
Title: Decentralized Deep Learning on Heterogeneous Data
Abstract: We consider the problem of training a machine learning model on a dataset that is stored decentrally on many devices. This is the case, for example, in Federated Learning, where all devices are connected to a central server that orchestrates the training. In a fully decentralized learning environment, the devices may be connected via an arbitrary network, which may change over time. In the first part of the talk, we present a unified convergence analysis covering a variety of decentralized stochastic gradient descent methods. We derive universal convergence rates for smooth (convex and non-convex) problems. The rates interpolate between heterogeneous (non-identically distributed) and homogeneous (iid) data and show that differences between workers’ local data distributions significantly affect the convergence of these methods. In the second part of the talk, we will present some methods that are not affected by data dissimilarity. In particular, we will focus on a novel mechanism for information propagation in decentralized learning. We propose a relay scheme that uses spanning trees to distribute information exactly uniformly across all workers, with finite delays that depend on the distance between nodes. We prove that RelaySGD, based on this mechanism, is independent of data heterogeneity and scales to many workers, enabling highly accurate decentralized Deep Learning on heterogeneous data. This talk is based on joint work with:
– A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi and Sebastian U. Stich, A Unified Theory of Decentralized SGD with Changing Topology and Local Updates, https://arxiv.org/abs/2003.10422, ICML 2020.
– T. Vogels, L. He, A. Koloskova, T. Lin, S.P. Karimireddy, S.U. Stich and Martin Jaggi, RelaySum for Decentralized Deep Learning on Heterogeneous Data, https://arxiv.org/abs/2110.04175, NeurIPS 2021.
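The family of decentralized SGD methods covered by the unified analysis in the first paper shares a simple template: each worker takes a local stochastic gradient step and then averages with its neighbors through a mixing (gossip) matrix. A minimal sketch, with Metropolis-Hastings mixing weights as an illustrative choice of gossip matrix (not the specific scheme from the papers, and distinct from the RelaySGD relay mechanism):

```python
import numpy as np

def gossip_matrix(adjacency):
    # Metropolis-Hastings weights: symmetric and doubly stochastic
    # for any undirected communication graph.
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adjacency[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

def decentralized_sgd_step(X, grads, W, lr):
    # Each row of X is one worker's model; take a local gradient step,
    # then average with neighbors via the mixing matrix W.
    return W @ (X - lr * grads)

# Four workers on a ring, gradients set to zero to show pure gossip averaging.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
W = gossip_matrix(ring)
X = np.array([[0.0], [4.0], [8.0], [12.0]])
for _ in range(200):
    X = decentralized_sgd_step(X, np.zeros_like(X), W, lr=0.1)
print(X)  # every worker's model is near the global mean, 6.0
```

With heterogeneous local gradients, the workers' iterates no longer agree exactly, which is precisely the data-dissimilarity effect the convergence rates quantify and which RelaySGD's spanning-tree relays are designed to remove.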
Bio: Sebastian Stich is a research scientist at EPFL (starting as tenure-track faculty at the CISPA Helmholtz Center for Information Security on December 1, 2021). His research interests span machine learning, optimization, and statistics, with a current focus on efficient parallel algorithms for training ML models over decentralized datasets. Since 2016 he has been hosted in the machine learning and optimization lab of Prof. Martin Jaggi. Between 2014 and 2016 he was a postdoctoral researcher at UCLouvain with Prof. Yurii Nesterov, supported by an SNSF mobility grant. He received his PhD in Computer Science from ETH Zurich in 2014 and, prior to that, his MSc (2010) and BSc (2009) degrees in Mathematics from ETH Zurich. He is a co-founder of the workshop series “Advances in ML: Theory meets practice”, run at the Applied Machine Learning Days 2018–2020, and a co-organizer of the NeurIPS “Optimization for Machine Learning” workshop in 2019, 2020, and 2021.