Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention

09/15/2022
by Jingwei Zhao, et al.

We propose Beat Transformer, a novel Transformer encoder architecture for joint beat and downbeat tracking. Different from previous models that track beats solely based on the spectrogram of an audio mixture, our model deals with demixed spectrograms with multiple instrument channels. This is inspired by the fact that humans perceive metrical structures from richer musical contexts, such as chord progression and instrumentation. To this end, we develop a Transformer model with both time-wise attention and instrument-wise attention to capture deeply buried metrical cues. Moreover, our model adopts a novel dilated self-attention mechanism, which achieves powerful hierarchical modelling with only linear complexity. Experiments demonstrate a significant improvement in demixed beat tracking over the non-demixed version. Beat Transformer also achieves up to a 4-percentage-point improvement in downbeat tracking over TCN architectures. We further discover an interpretable attention pattern that mirrors our understanding of hierarchical metrical structures.
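
Because the model takes demixed audio, attention runs along two axes of the input: over time within each instrument channel, and across instruments within each frame. The following is a rough sketch of that alternation, assuming a (instruments, time, features) tensor layout and an unweighted attention helper; both are illustrative choices of this sketch, not the paper's implementation:

import numpy as np

def self_attention(x):
    # Plain scaled dot-product self-attention over the first axis.
    # Projection weights are omitted for brevity (an assumption of
    # this sketch, not the paper's layer).
    scores = x @ x.T / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ x

# Demixed input: one spectrogram channel per instrument.
# Assumed layout: (instruments, time, features).
I, T, F = 4, 32, 8
x = np.random.default_rng(1).normal(size=(I, T, F))

# Time-wise attention: each instrument channel attends along time.
x = np.stack([self_attention(x[i]) for i in range(I)])

# Instrument-wise attention: each frame attends across instruments.
x = np.stack([self_attention(x[:, t]) for t in range(T)], axis=1)

print(x.shape)  # (4, 32, 8)

Interleaving the two axes lets rhythmic evidence in one channel, say a drum pattern, inform the same frames in the other instruments, which is the kind of richer context the abstract appeals to.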

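The dilated self-attention is what brings the complexity down to linear: each timestep attends only to a small set of neighbors spaced by a dilation factor, and stacking layers whose dilations double (1, 2, 4, ...) grows the receptive field exponentially, much as in a dilated TCN. Below is a minimal single-head sketch under those assumptions; the paper's exact masking and multi-head scheme may differ:

import numpy as np

def dilated_self_attention(x, w_q, w_k, w_v, dilation=1, window=2):
    # Each timestep attends only to neighbors at offsets
    # {-window, ..., +window} * dilation, so the cost is
    # O(T * window) rather than O(T^2).
    T, _ = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = k.shape[1]
    out = np.zeros_like(v)
    for t in range(T):
        idx = [t + o * dilation for o in range(-window, window + 1)
               if 0 <= t + o * dilation < T]
        scores = q[t] @ k[idx].T / np.sqrt(d_k)
        a = np.exp(scores - scores.max())
        a /= a.sum()
        out[t] = a @ v[idx]
    return out

rng = np.random.default_rng(0)
T, D = 16, 8
x = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))

# Stacking layers with dilations 1, 2, 4 grows the receptive
# field exponentially while each layer stays linear in T.
for layer in range(3):
    x = dilated_self_attention(x, w_q, w_k, w_v, dilation=2 ** layer)

print(x.shape)  # (16, 8)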