Title – Advancing scalable, provable optimization methods in semidefinite & polynomial programs
Optimization is a broad area with ramifications in many disciplines, including machine learning, control theory, signal processing, robotics, computer vision, power systems, and quantum information. I will talk about some novel algorithmic and theoretical results in two broad classes of optimization problems. The first class of problems are semidefinite programs (SDP). I will present the first polynomial time guarantees for the Burer-Monteiro method, which is widely used for solving large scale SDPs. I will also discuss some general guarantees on the quality of SDP solutions for parameter estimation problems. The second class of problems I will consider are polynomial systems. I will introduce a novel technique for solving polynomial systems that, by taking advantage of graphical structure, is able to outperform existing techniques by orders of magnitude.
Title – The Landscape of the Proximal Point Method for Nonconvex-Nonconcave Minimax Optimization
Minimax optimization has become a central tool for modern machine learning with applications in generative adversarial networks, robust training, reinforcement learning, etc. These applications are often nonconvex-nonconcave, but the existing theory is unable to identify and deal with the fundamental difficulties this poses. In this talk, we will overcome these limitations, describing the convergence landscape of the classic proximal point method on nonconvex-nonconcave minimax problems. Our key theoretical insight lies in identifying a modified objective, generalizing the Moreau envelope, that smoothes the original objective and convexifies and concavifies it based on the interaction between the minimizing and maximizing variables. When interaction is sufficiently strong, we derive global linear convergence guarantees. When interaction is weak, we derive local linear convergence guarantees under proper initialization. Between these two settings, we show undesirable behaviors like divergence and cycling can occur.
Bio: Benjamin Grimmer is a PhD student in Operations Research at Cornell University. He received his BS and MS degrees in Computer Science from Illinois Institute of Technology. His research focuses on theoretical foundations of optimization.
Please email Meg Tully – [email protected] for more information
Title – Decentralized stochastic gradient descent and beyond
Stochastic gradient descent (SGD) methods have recently found wide applications in large-scale data analysis, especially in machine learning. These methods are very attractive to process online streaming data as they scan through the dataset only once but still generate solutions with acceptable accuracy. However, it is known that classical SGD methods are ineffective in processing streaming data distributed over multi-agent network systems (e.g., sensor and social networks), mainly due to the high communication costs incurred by these methods. In this talk, we present a new class of SGD methods, referred to as stochastic decentralized communication sliding methods, which can significantly reduce the aforementioned communication costs for decentralized stochastic optimization and machine learning. We show that these methods can skip inter-node communications while performing SGD iterations. As a result, they require a substantially smaller number of communication rounds than existing decentralized SGD, while the total number of required stochastic subgradient computations are comparable to those optimal bounds achieved by classical centralized SGD type methods. We also develop new variants of these methods that can achieve graph topology invariant gradient/sampling complexity when the problem is smooth and samples can be stored locally.
BIO: Guanghui (George) Lan is an associate professor in the H. Milton Stewart School of Industrial and Systems Engineering at Georgia Institute of Technology since January 2016. Dr. Lan was on the faculty of the Department of Industrial and Systems Engineering at the University of Florida from 2009 to 2015, after earning his Ph.D. degree from Georgia Institute of Technology in August 2009. His main research interests lie in optimization and machine learning. The academic honors he received include the Mathematical Optimization Society Tucker Prize Finalist (2012), INFORMS Junior Faculty Interest Group Paper Competition First Place (2012) and the National Science Foundation CAREER Award (2013). Dr. Lan serves as an associate editor for Mathematical Programming, SIAM Journal on Optimization and Computational Optimization and Applications. He is also an associate director of the Center for Machine Learning at Georgia Tech.
For Zoom information email Meg Tully – [email protected]
Title – On Complexity of Constrained Nonconvex Optimization
Deriving complexity guarantees for nonconvex optimization problems are driven by long standing theoretical interests and by their relevance to machine learning and data science. This talk discusses complexity of algorithms for two important types of constrained nonconvex optimization problems: bound-constrained and nonlinear equality constrained optimization. Applications include nonnegative matrix factorization (NMF) and dictionary learning.
For nonconvex optimization with bound constraints, we observe from the past work that pursuit of the state-of-art complexity guarantees can compromise the practicality of an algorithm. Therefore, we propose two practical projected Newton types of methods with complexity guarantees matching the best known. The first method is a scaled variant of Bertsekas’ two-metric projection method, with the best complexity guarantee to find an approximate first-order point. The second is a projected Newton-Conjugate Gradient method, equipped with a competitive complexity guarantee to locate an approximate second-order point with high probability. Preliminary numerical experiments on NMF indicate practicality of the latter algorithm.
For nonconvex optimization with nonlinear equality constraints, we analyze complexity of the proximal augmented Lagrangian (AL) framework, in which a Newton-Conjugate-Gradient scheme is used to find approximate solutions of the subproblems. This scheme has three levels of iterations, and we obtain bounds on the number of iterations at each level.
These are joint works with Stephen J. Wright.
For zoom information email Meg Tully – [email protected]
Title – Optimal Transport for Inverse Problems and the Implicit Regularization
Optimal transport has been one interesting topic of mathematical analysis since Monge (1781). The problem’s close connections with differential geometry and kinetic descriptions were discovered within the past century, and the seminal work of Kantorovich (1942) showed its power to solve real-world problems. Recently, we proposed the quadratic Wasserstein distance from optimal transport theory for inverse problems, tackling the classical least-squares method’s longstanding difficulties such as nonconvexity and noise sensitivity. The work was soon adopted in the oil industry. As we advance, we discover that the advantage of changing the data misfit is more general in a broader class of data-fitting problems by examining the preconditioning and “implicit” regularization effects of different mathematical metrics as the objective function in optimization, as the likelihood function in Bayesian inference, and as the measure of residual in numerical solution to PDEs.
Title – Data-driven optimization: test score algorithms and distributionally robust approach
The ever-increasing availability of data has motivated novel optimization models that can incorporate uncertainty in problem parameters. Data-driven optimization frameworks integrate sophisticated statistical estimation parts into optimization frameworks. This often leads to computationally challenging formulations that mandate new technologies to address scalability issues. For combinatorial optimization, algorithms need to be adapted based on parameter estimation schemes, and at the same time, need to provide near-optimal performance guarantees even under uncertain parameters. For mathematical programming models, efficient solution methods are required to deal with added complexity from applying data-driven frameworks. In this talk, we discuss test score algorithms for stochastic utility maximization and talk about distributionally robust chance-constrained programming.
Test score algorithms are based on carefully designed score metrics to evaluate individual items, called test scores, defined as a statistic of observed individual item performance data. Algorithms based on individual item scores are practical when evaluating different combinations of items is difficult. We show that a natural greedy algorithm that selects items solely based on their test scores outputs solutions within a constant factor of the optimum for a broad class of utility functions. Our algorithms and approximation guarantees assume that test scores are noisy estimates of certain expected values with respect to marginal distributions of individual item values, thus making our algorithms practical.For the second part of the talk, we consider distributionally robust optimization (DRO) frameworks, which allow interpolating between traditional robust optimization and stochastic optimization, thereby providing a systematic way of hedging against the ambiguity in underlying probability distributions. In particular, we apply the DRO framework defined with the Wasserstein distance to chance-constrained programming (CCP), an optimization paradigm that involves constraints that have to be satisfied with high probability. We develop formulations by revealing hidden connections between the Wasserstein DRO framework and its nominal counterpart (the sample average approximation), and propose integer programming based solution methods. Our formulations significantly scale up the problem sizes that can be handled by reducing the solution times from hours to seconds,compared to the existing formulations.
This talk is based on joint works with Nam Ho-Nguyen, Fatma Kılınç-Karzan, Simge Küçükyavuz, MilanVojnovic, and Se-young Yun.
For zoom information email Meg Tully – [email protected]
Title – Novel Optimization for Data-Driven Decision Making
We present several exciting works on data-driven decision making. First, during the crisis of COVID-19, we have seen the importance of clinical trial designs. Improving clinical trial designs is important for the wellness of all human beings. We first present a novel optimization framework for adaptive trial design in the context of personalized medicine. Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accruing data in a randomized trial. We focus on designs where the overall population is partitioned into two predefined subpopulations, e.g., based on a biomarker or risk score measured at baseline for personalized medicine. The goal is to learn which populations benefit from an experimental treatment. Two critical components of adaptive enrichment designs are the decision rule for modifying enrollment and multiple testing procedures. We provide a general framework for simultaneously optimizing these components for two-stage, adaptive enrichment designs through Bayesian optimization. We minimize the expected sample size under constraints on power and the familywise Type I error rate. It is computationally infeasible to directly solve this optimization problem due to its nonconvexity and infinite dimensionality. The key to our approach is a novel, discrete representation of this optimization problem as a sparse linear program, which is large-scale but computationally feasible to solve using modern optimization techniques. Applications of our approach produce new, approximately optimal designs. We then present some other optimal data-driven decision making works on high-dimensional linear contextual bandit and reinforcement learning problems.
Title – Polar deconvolution of mixed signals
The signal demixing problem seeks to separate multiple signals from their superposition. I will describe a geometric view of the superposition process, and how the duality of convex cones allows us to develop an efficient algorithm for recovering the components with sublinear iteration complexity and linear storage. Under a random measurement model, this process stably recovers low-complexity and incoherent signals with high probability and with optimal sample complexity. This is joint work with my students and postdocs Zhenan Fan, Halyun Jeong, and Babhru Joshi.
TITLE: Short simplex paths in lattice polytopes
ABSTRACT: We design a simplex algorithm for linear programs on lattice polytopes that traces “short” simplex paths from any given vertex to an optimal one. We consider a lattice polytope P contained in [0, k]^n and defined via ‘m’ linear inequalities. Our first contribution is a simplex algorithm that reaches an optimal vertex by tracing a path along the edges of P of length in O(n^4 k log(n k)). The length of this path is independent of ‘m’ and is only polynomially far from the worst-case diameter, which roughly grows as nk.
Motivated by the fact that most known lattice polytopes are defined via 0,+1,-1 constraint matrices, our second contribution is a more sophisticated simplex algorithm which exploits the largest absolute value of the entries in the constraint matrix, denoted by ‘a’. We show that the length of the simplex path generated by this algorithm is in O(n^2 k log(n k a)). In particular, if the parameter ‘a’ is bounded by a polynomial in n, k, then the length of the simplex path is in O(n^2 k log(n k)). This is a joint work with Alberto Del Pia.
The cloud recording is now available.
Topic: AMS Department Seminar (Spring 2021)
Date: Feb 4, 2021 01:19 PM Eastern Time (US and Canada)
Recording for viewers:
https://wse.zoom.us/rec/share/mr4m196sKhWTcUV4TYKMWcT1MUxgUI7KdFoUhRxxrPBqew_OVKX0X7kG_Lee4jKN.kCmCor9FOsMzTuL5 Passcode: qB4+9D6q
Title: Quantile-based Iterative Methods for Corrupted Systems of Linear Equations
Abstract: One of the most ubiquitous problems arising across the sciences is that of solving large-scale systems of linear equations Ax = b. When it is infeasible to solve the system directly by inversion, scalable and efficient projection-based iterative methods can be used instead, such as, Randomized Kaczmarz (RK) algorithm, or SGD (optimizing ||Ax – b|| in some norm).
The main goal of my talk is to present versions of these two algorithms aimed at linear systems with adversarially corrupted vector b, QuantileRK and QuantileSGD. While the classical approach for noisy (inconsistent) systems is to show that the iterations approach the least squares solution until a certain convergence horizon that depends on the noise size, in order to handle large, sparse, potentially adversarial corruptions, one needs to modify the algorithm to avoid corruptions rather than try to tolerate them — and quantiles of the residual provide a natural way to do so. Our methods work on up to 50% of incoherent corruptions, and up to 20% of adversarial corruptions (that consistently create an “alternative” solution of the system). Theoretically, under some standard assumptions on the measurement model, despite corruptions of any size both methods converge to the true solution with exactly the same rate as RK on an uncorrupted system up to an absolute constant. Our theoretical analysis is based on probabilistic concentration of measure results, and as an auxiliary random matrix theory result, we prove a non-trivial uniform bound for the smallest singular values of all submatrices of a given matrix. Based on the joint work with Jamie Haddock, Deanna Needell, and Will Swartworth.
The cloud recording is now available.
Topic: AMS Department Seminar (Spring 2021)
Date: Feb 11, 2021 12:49 PM Eastern Time (US and Canada)
Share recording with viewers:
https://wse.zoom.us/rec/share/Dn4mMX4n4CMPKWUQfXql6Trt4yurYBxPe37OkFowp0Fcxp3MfNEsnczBxr68roig.90xVqiC1mKI-98wl Passcode: Jx*#6Lko