Solving Regularized Exp, Cosh and Sinh Regression Problems
In modern machine learning, attention computation is a fundamental task for training large language models such as Transformer, GPT-4 and ChatGPT. In this work, we study an exponential regression problem inspired by the softmax/exp unit in the attention mechanism of large language models. The standard exponential regression is non-convex. We study the regularized version of the exponential regression problem, which is convex. We use an approximate Newton method to solve it in input sparsity time. Formally, in this problem, one is given a matrix A ∈ℝ^n × d, vectors b ∈ℝ^n and w ∈ℝ^n, and any of the functions exp, cosh and sinh, denoted f. The goal is to find the optimal x that minimizes 0.5 ‖f(Ax) - b‖_2^2 + 0.5 ‖diag(w) A x‖_2^2. The straightforward approach is the naive Newton's method. Let nnz(A) denote the number of non-zero entries in matrix A. Let ω denote the exponent of matrix multiplication. Currently, ω≈ 2.373. Let ϵ denote the accuracy error. In this paper, we exploit the input sparsity and propose an algorithm that uses log(‖x_0 - x^*‖_2 / ϵ) iterations and O(nnz(A) + d^ω) time per iteration to solve the problem.
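To make the setup concrete, below is a minimal NumPy sketch of the objective for f = exp together with a plain (naive) Newton iteration. The function names, step count, and stopping rule are illustrative assumptions, not from the paper, and the sketch omits the Hessian sketching that gives the stated O(nnz(A) + d^ω) per-iteration cost; it only illustrates the gradient and Hessian structure of the regularized objective.

```python
# A minimal sketch of regularized exp regression with a naive Newton solver,
# assuming f = exp. All names here (reg_exp_objective, newton_solve) and the
# hyperparameters are illustrative; the paper's fast algorithm additionally
# approximates the Hessian, which this plain version does not.
import numpy as np

def reg_exp_objective(A, b, w, x):
    """Objective: 0.5 * ||exp(Ax) - b||_2^2 + 0.5 * ||diag(w) A x||_2^2."""
    u = A @ x
    return 0.5 * np.sum((np.exp(u) - b) ** 2) + 0.5 * np.sum((w * u) ** 2)

def newton_solve(A, b, w, x0, eps=1e-8, max_iter=50):
    x = x0.copy()
    for _ in range(max_iter):
        u = A @ x
        eu = np.exp(u)
        # Gradient: A^T [ (exp(u) - b) * exp(u) + w^2 * u ].
        grad = A.T @ ((eu - b) * eu + (w ** 2) * u)
        # Hessian: A^T D A with D_ii = (2 exp(u_i) - b_i) exp(u_i) + w_i^2;
        # sufficiently large regularization weights w keep D positive,
        # which is what makes the regularized problem convex.
        D = (2.0 * eu - b) * eu + w ** 2
        H = A.T @ (D[:, None] * A)
        step = np.linalg.solve(H, grad)
        x -= step
        if np.linalg.norm(step) < eps:  # stop once the Newton step is tiny
            break
    return x

# Usage on a small random instance (synthetic data, for illustration only).
rng = np.random.default_rng(0)
n, d = 100, 10
A = rng.standard_normal((n, d)) / np.sqrt(d)
b = rng.uniform(0.5, 1.5, size=n)
w = np.full(n, 2.0)  # regularization weights
x = newton_solve(A, b, w, np.zeros(d))
print(reg_exp_objective(A, b, w, x))
```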