Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models

by Jan van den Brand, et al.

Large language models (LLMs) have fundamentally changed human life. The attention scheme is one of the key components across all LLMs, such as BERT, GPT-1, Transformers, and GPT-2, 3, 3.5, and 4. Inspired by previous theoretical studies of the static version of the attention multiplication problem [Zandieh, Han, Daliri, and Karbasi arXiv 2023; Alman and Song arXiv 2023], in this work we formally define a dynamic version of the attention matrix multiplication problem. We are given matrices Q, K, V ∈ ℝ^{n × d}, which represent the query, key, and value in LLMs. In each iteration we update one entry of K or V. In the query stage, we receive a pair (i, j) ∈ [n] × [d] as input and want to answer (D^{-1} A V)_{i,j}, where A := exp(QK^⊤) ∈ ℝ^{n × n} is a square matrix (exp applied entrywise) and D := diag(A 1_n) ∈ ℝ^{n × n} is a diagonal matrix. Here 1_n denotes the length-n vector whose entries are all ones. We provide two results: an algorithm and a conditional lower bound.

∙ On one hand, inspired by the lazy update idea from [Demetrescu and Italiano FOCS 2000; Sankowski FOCS 2004; Cohen, Lee and Song STOC 2019; Brand SODA 2020], we provide a data structure that uses O(n^{ω(1,1,τ) − τ}) amortized update time and O(n^{1+τ}) worst-case query time.

∙ On the other hand, we show that unless the hinted matrix-vector multiplication conjecture [Brand, Nanongkai and Saranurak FOCS 2019] is false, no algorithm can achieve both O(n^{ω(1,1,τ) − τ − Ω(1)}) amortized update time and O(n^{1+τ − Ω(1)}) worst-case query time.

In conclusion, our algorithmic result is conditionally optimal unless the hinted matrix-vector multiplication conjecture is false.
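To make this update/query interface concrete, below is a minimal reference sketch in Python with NumPy (an assumption; the abstract specifies no implementation). It is not the paper's data structure: it stores Q, K, V explicitly, so each update costs O(1) but each query recomputes row i of A from scratch in O(nd) time, rather than achieving the amortized trade-off above. All class and method names are illustrative.

```python
# Minimal reference sketch of the update/query interface defined above.
# NOT the paper's data structure: updates are O(1), queries are O(nd).
import numpy as np


class NaiveDynamicAttention:
    """Maintains Q, K, V in R^{n x d} under single-entry updates to K or V."""

    def __init__(self, Q, K, V):
        self.Q = np.asarray(Q, dtype=float)
        self.K = np.asarray(K, dtype=float)
        self.V = np.asarray(V, dtype=float)

    def update_K(self, i, j, value):
        """Set K[i, j] := value (one update step)."""
        self.K[i, j] = value

    def update_V(self, i, j, value):
        """Set V[i, j] := value (one update step)."""
        self.V[i, j] = value

    def query(self, i, j):
        """Return (D^{-1} A V)_{i, j}, where A = exp(Q K^T) entrywise and
        D = diag(A 1_n). Only row i of A is needed:
        (D^{-1} A V)_{i, j} = sum_k A[i, k] * V[k, j] / sum_k A[i, k]."""
        a_i = np.exp(self.Q[i] @ self.K.T)  # row i of A, length n, O(nd)
        return float(a_i @ self.V[:, j]) / float(a_i.sum())


# Tiny demo: queries stay consistent with a full recomputation after updates.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 6, 3
    ds = NaiveDynamicAttention(rng.normal(size=(n, d)),
                               rng.normal(size=(n, d)),
                               rng.normal(size=(n, d)))
    ds.update_K(2, 1, 0.5)
    ds.update_V(4, 0, -1.0)
    A = np.exp(ds.Q @ ds.K.T)
    full = np.diag(1.0 / A.sum(axis=1)) @ A @ ds.V
    assert abs(ds.query(3, 0) - full[3, 0]) < 1e-9
```

The paper's actual data structure instead batches updates lazily, in the spirit of [Demetrescu and Italiano FOCS 2000; Brand SODA 2020], presumably paying for periodic rebuilds with fast rectangular matrix multiplication (the exponent ω(1,1,τ) above); the sketch only pins down the semantics any such structure must preserve.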

