Do Transformers Really Perform Bad for Graph Representation?
The Transformer architecture has become a dominant choice in many domains,
such as natural language processing and computer vision. Yet, it has not
achieved competitive performance on popular leaderboards of graph-level
prediction compared to mainstream GNN variants. Therefore, it remains unclear
whether Transformers can actually perform well for graph representation
learning. In this paper, we address this question by presenting Graphormer,
which is built upon the standard Transformer architecture and attains
excellent results on a broad range of graph representation learning tasks,
notably the recent OGB Large-Scale Challenge. Our key insight for applying
Transformers to graphs is the necessity of effectively encoding a graph's
structural information into the model. To this end, we propose several simple
yet effective structural encoding methods that help Graphormer better model
graph-structured data. Furthermore, we mathematically characterize the
expressive power of Graphormer and show that, with our structural encodings,
many popular GNN variants are covered as special cases of Graphormer.
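The abstract does not spell out the structural encodings themselves. As an illustration only, the sketch below shows one common way such information can be injected into a standard Transformer: adding a learned pairwise bias, indexed here by a shortest-path-distance bucket, to the self-attention scores before the softmax. All class names, shapes, and the choice of shortest-path distance are assumptions for the sake of the example, not the paper's exact formulation.

```python
# Hypothetical sketch (not the paper's code): structure-aware self-attention
# that adds a learned bias per shortest-path-distance bucket to the scores.
import torch
import torch.nn as nn


class StructureBiasedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int, max_distance: int = 16):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        # One learnable scalar bias per (distance bucket, attention head).
        self.distance_bias = nn.Embedding(max_distance + 1, num_heads)

    def forward(self, x: torch.Tensor, spd: torch.Tensor) -> torch.Tensor:
        # x:   [batch, num_nodes, dim]            node features
        # spd: [batch, num_nodes, num_nodes]      shortest-path distances (LongTensor,
        #                                         clipped to max_distance)
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        # Inject graph structure: [b, n, n, heads] -> [b, heads, n, n] bias.
        scores = scores + self.distance_bias(spd).permute(0, 3, 1, 2)

        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out(out)
```

In such a sketch, `spd` would be precomputed per graph and clipped to `max_distance`, with disconnected node pairs typically mapped to a dedicated bucket; setting the bias embedding to zero recovers vanilla self-attention.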