On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation

by Jeongyeol Kwon, et al.

In this work, we study first-order algorithms for solving Bilevel Optimization (BO) problems in which the objective functions are smooth but possibly nonconvex at both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter σ > 0. In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be O(σ)-close. A by-product of our analysis is an explicit formula for the gradient of the hyper-objective that holds under minimal conditions even when the lower-level problem has multiple solutions, which could be of independent interest. Next, viewing the penalty formulation as an O(σ)-approximation of the original BO, we propose first-order algorithms that find an ϵ-stationary solution by optimizing the penalty formulation with σ = O(ϵ). When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an ϵ-stationary point of the penalty function, using a total of O(ϵ^-3) accesses to first-order gradient oracles when the oracles are deterministic and O(ϵ^-7) accesses to first-order stochastic gradient oracles when they are noisy. Under an additional assumption on the stochastic oracles, we show that the algorithm can be implemented in a fully single-loop manner, i.e., with O(1) samples per iteration, and achieves improved oracle complexities of O(ϵ^-3) and O(ϵ^-5) in the deterministic and noisy settings, respectively.
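To make the O(σ)-closeness claim concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the objectives f and g, the function names, and the particular penalty form min_y [f(x, y) + (g(x, y) - min_z g(x, z)) / σ], which is one common way to write a weighted sum of the two levels. The toy lower level is strongly convex with a unique solution, so it is far simpler than the nonconvex, constrained setting the abstract targets; it only shows how the gap between the hyper-objective and the penalty function shrinks with σ.

```python
# Toy bilevel problem (hypothetical, chosen so everything has a closed form):
#   upper level: f(x, y) = x**2 + (y - 1)**2
#   lower level: g(x, y) = 0.5 * (y - x)**2, so y*(x) = x and min_y g(x, y) = 0


def hyper_objective(x):
    """f evaluated at the exact lower-level solution y*(x) = x."""
    return x**2 + (x - 1.0) ** 2


def penalty_minimizer(x, sigma):
    """argmin_y of sigma * f(x, y) + g(x, y).

    Setting the derivative to zero: 2*sigma*(y - 1) + (y - x) = 0.
    """
    return (x + 2.0 * sigma) / (1.0 + 2.0 * sigma)


def penalty_function(x, sigma):
    """min_y [ f(x, y) + (g(x, y) - min_z g(x, z)) / sigma ], with min_z g = 0."""
    y = penalty_minimizer(x, sigma)
    f_val = x**2 + (y - 1.0) ** 2
    g_val = 0.5 * (y - x) ** 2
    return f_val + g_val / sigma


def gap(x, sigma):
    """Hyper-objective minus penalty function at x."""
    return hyper_objective(x) - penalty_function(x, sigma)


if __name__ == "__main__":
    # Halving sigma should roughly halve the gap at a fixed x.
    for sigma in (0.1, 0.05, 0.025):
        print(f"sigma={sigma:<6} gap={gap(0.0, sigma):.6f}")
```

For this toy problem the gap works out to 2σ(x - 1)^2 / (1 + 2σ), which is at most 2σ(x - 1)^2, so the two functions are O(σ)-close at every fixed x. This is consistent with the abstract's choice of σ = O(ϵ) but says nothing about the general conditions established in the paper.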


On Penalty-based Bilevel Gradient Descent Method

Bilevel optimization enjoys a wide range of applications in hyper-parame...

A Fully First-Order Method for Stochastic Bilevel Optimization

We consider stochastic unconstrained bilevel optimization problems when ...

First-order penalty methods for bilevel optimization

In this paper we study a class of unconstrained and constrained bilevel ...

On Bilevel Optimization without Lower-level Strong Convexity

Theoretical properties of bilevel problems are well studied when the low...

Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

We propose and analyze several stochastic gradient algorithms for findin...

A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization

This paper proposes a new algorithm – the Momentum-assisted Single-times...

On the Complexity of Deterministic Nonsmooth and Nonconvex Optimization

In this paper, we present several new results on minimizing a nonsmooth ...
