Optimizing Deeper Transformers on Small Datasets: An Application on Text-to-SQL Semantic Parsing

12/30/2020
by Peng Xu, et al.

Due to the common belief that training deep transformers from scratch requires large datasets, practitioners usually add only shallow, simple layers on top of pre-trained models when fine-tuning on small datasets. We provide evidence that this does not always need to be the case: with proper initialization and training techniques, the benefits of very deep transformers are shown to carry over to hard structural prediction tasks, even on small datasets. In particular, we successfully train 48 layers of transformers for a semantic parsing task. These comprise 24 fine-tuned transformer layers from pre-trained RoBERTa and 24 relation-aware transformer layers trained from scratch. With fewer training steps and no task-specific pre-training, we achieve state-of-the-art performance on the challenging cross-domain Text-to-SQL semantic parsing benchmark Spider. We do so by deriving a novel Data-dependent Transformer Fixed-update initialization scheme (DT-Fixup), inspired by the prior T-Fixup work. Further error analysis demonstrates that increasing the depth of the transformer model can help improve generalization on cases that require reasoning and structural understanding.
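The key technical ingredient named in the abstract is the DT-Fixup initialization, which scales the weights of the transformer layers trained from scratch by a factor depending on the depth of the new stack and on a data-dependent statistic of the input representations. The PyTorch sketch below illustrates only the general T-Fixup-style idea of Xavier-initializing and then down-scaling the new layers; the scaling formula, the delta statistic, and the use of nn.TransformerEncoderLayer in place of the paper's relation-aware layers are illustrative assumptions, not the paper's exact recipe.

import torch
import torch.nn as nn


def dt_fixup_like_init(new_layers: nn.ModuleList, scale: float) -> None:
    """Xavier-initialize, then down-scale, the weight matrices of layers trained from scratch."""
    for layer in new_layers:
        for param in layer.parameters():
            if param.dim() > 1:  # weight matrices only; leave biases and LayerNorm params at their defaults
                nn.init.xavier_uniform_(param)
                with torch.no_grad():
                    param.mul_(scale)


# Example: 24 new layers stacked on top of a pre-trained encoder (e.g. RoBERTa).
# `delta` stands for a data-dependent statistic of the encoder outputs (such as
# the maximum representation norm over the training set), obtained from one
# forward pass over the data; the value and the formula below are placeholders,
# not the factor derived in the paper.
num_new_layers = 24
delta = 1.0
scale = (num_new_layers * max(delta, 1.0)) ** -0.5  # illustrative depth-dependent factor

new_layers = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=1024, nhead=8, batch_first=True)
    for _ in range(num_new_layers)
])
dt_fixup_like_init(new_layers, scale)

The down-scaling keeps the contribution of each freshly initialized layer small at the start of training, which is what allows a deep stack to be trained stably on a small dataset without warm-up tricks.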

Related research

08/20/2023 - Large Transformers are Better EEG Learners
09/18/2023 - Deep Prompt Tuning for Graph Transformers
06/21/2023 - Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI
05/27/2021 - Diagnosing Transformers in Task-Oriented Semantic Parsing
11/01/2021 - Transformers for prompt-level EMA non-response prediction
06/23/2023 - Incorporating Graph Information in Transformer-based AMR Parsing
07/05/2022 - CASHformer: Cognition Aware SHape Transformer for Longitudinal Analysis
