Deep Prompt Tuning for Graph Transformers

by Reza Shirkavand, et al.

Graph transformers have gained popularity in various graph-based tasks by addressing challenges faced by traditional Graph Neural Networks. However, the quadratic complexity of self-attention operations and the extensive layering in graph transformer architectures present challenges when applying them to graph-based prediction tasks. Fine-tuning, a common approach, is resource-intensive and requires storing multiple copies of large models. We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning for leveraging large graph transformer models in downstream graph-based prediction tasks. Our method introduces trainable feature nodes to the graph and prepends task-specific tokens to the graph transformer, enhancing the model's expressive power. By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies, making it suitable for small datasets and scalable to large graphs. Through extensive experiments on datasets of various sizes, we demonstrate that deep graph prompt tuning achieves comparable or even superior performance to fine-tuning, despite using significantly fewer task-specific parameters. Our contributions include the introduction of prompt tuning for graph transformers, its application to both graph transformers and message passing graph neural networks, improved efficiency and resource utilization, and compelling experimental results. This work brings attention to a promising approach to leveraging pre-trained models in graph-based prediction tasks and offers new opportunities for exploring and advancing graph representation learning.
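The core idea in the abstract (freeze the pre-trained weights, prepend a small set of trainable tokens) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the class names `FrozenLayer` and `DeepPromptedModel`, the single weight matrix per layer, and the choice of one prompt set per layer (one reading of "deep") are all assumptions made for illustration. The sketch only shows how the token sequence is assembled and how trainable and frozen parameter counts compare; it does not run attention or training.

```python
import random


def make_matrix(rows, cols, rng):
    """Create a rows x cols matrix of small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class FrozenLayer:
    """Hypothetical stand-in for one pre-trained transformer layer.

    Its weights are treated as frozen: nothing in this sketch updates them.
    """

    def __init__(self, dim, rng):
        self.weight = make_matrix(dim, dim, rng)

    def num_params(self):
        return len(self.weight) * len(self.weight[0])


class DeepPromptedModel:
    """Assumed setup: one set of trainable prompt tokens per frozen layer."""

    def __init__(self, num_layers, dim, num_prompt_tokens, seed=0):
        rng = random.Random(seed)
        self.layers = [FrozenLayer(dim, rng) for _ in range(num_layers)]
        # The prompt tokens are the only task-specific (trainable) parameters.
        self.prompts = [
            make_matrix(num_prompt_tokens, dim, rng) for _ in range(num_layers)
        ]

    def layer_input(self, layer_idx, node_tokens):
        """Prepend this layer's prompt tokens to the node-token sequence."""
        return self.prompts[layer_idx] + node_tokens

    def trainable_params(self):
        return sum(len(p) * len(p[0]) for p in self.prompts)

    def frozen_params(self):
        return sum(layer.num_params() for layer in self.layers)


# Usage: 10 node tokens of width 16, 5 prompt tokens per layer, 4 layers.
model = DeepPromptedModel(num_layers=4, dim=16, num_prompt_tokens=5)
nodes = [[0.0] * 16 for _ in range(10)]
seq = model.layer_input(0, nodes)
trainable = model.trainable_params()
frozen = model.frozen_params()
print(len(seq), trainable, frozen)
```

In this toy setting the prompts hold 4 x 5 x 16 values while the frozen layers hold 4 x 16 x 16; with realistic layer sizes the frozen side grows far faster, which is the source of the storage savings the abstract describes, since only the prompt tokens need to be stored per task.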




Prompt Tuning for Graph Neural Networks

In recent years, prompt tuning has set off a research boom in the adapta...

G-Adapter: Towards Structure-Aware Parameter-Efficient Transfer Learning for Graph Transformer Networks

It has become a popular paradigm to transfer the knowledge of large-scal...

Optimizing Deeper Transformers on Small Datasets: An Application on Text-to-SQL Semantic Parsing

Due to the common belief that training deep transformers from scratch re...

Attention over pre-trained Sentence Embeddings for Long Document Classification

Despite being the current de-facto models in most NLP tasks, transformer...

Universal Representation for Code

Learning from source code usually requires a large amount of labeled dat...

Cure the headache of Transformers via Collinear Constrained Attention

As the rapid progression of practical applications based on Large Langua...

Mnemosyne: Learning to Train Transformers with Transformers

Training complex machine learning (ML) architectures requires a compute ...
