Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters

by   Aston Zhang, et al.

Recent works have demonstrated reasonable success of representation learning in hypercomplex space. Specifically, "fully-connected layers with Quaternions" (4D hypercomplex numbers), which replace real-valued matrix multiplications in fully-connected layers with Hamilton products of Quaternions, both enjoy parameter savings with only 1/4 learnable parameters and achieve comparable performance in various applications. However, one key caveat is that hypercomplex space only exists at very few predefined dimensions (4D, 8D, and 16D). This restricts the flexibility of models that leverage hypercomplex multiplications. To this end, we propose parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined. As a result, our method not only subsumes the Hamilton product, but also learns to operate on any arbitrary nD hypercomplex space, providing more architectural flexibility using arbitrarily 1/n learnable parameters compared with the fully-connected layer counterpart. Experiments of applications to the LSTM and Transformer models on natural language inference, machine translation, text style transfer, and subject verb agreement demonstrate architectural flexibility and effectiveness of the proposed approach.


page 1

page 2

page 3

page 4


Compression of Fully-Connected Layer in Neural Network by Kronecker Product

In this paper we propose and study a technique to reduce the number of p...

Do We Need Fully Connected Output Layers in Convolutional Networks?

Traditionally, deep convolutional neural networks consist of a series of...

An Equivalence of Fully Connected Layer and Convolutional Layer

This article demonstrates that convolutional operation can be converted ...

Lightweight Convolutional Neural Networks By Hypercomplex Parameterization

Hypercomplex neural networks have proved to reduce the overall number of...

Learning Robust and Lightweight Model through Separable Structured Transformations

With the proliferation of mobile devices and the Internet of Things, dee...

Efficient GPT Model Pre-training using Tensor Train Matrix Representation

Large-scale transformer models have shown remarkable performance in lang...

Exchangeability and Kernel Invariance in Trained MLPs

In the analysis of machine learning models, it is often convenient to as...

Please sign up or login with your details

Forgot password? Click here to reset