Breaking the Softmax Bottleneck for Sequential Recommender Systems with Dropout and Decoupling

by Ying-Chen Lin, et al.

The Softmax bottleneck was first identified in language modeling as a theoretical limit on the expressivity of Softmax-based models. As one of the most widely used ways to produce probability distributions, Softmax-based models, which consist of a Softmax function on top of a final linear layer, have found a wide range of applications, including session-based recommender systems (SBRSs). The bottleneck has been shown to be caused by rank deficiency in the final linear layer, owing to its connection with matrix factorization. In this paper, we show that there are more aspects to the Softmax bottleneck in SBRSs. Contrary to common belief, overfitting, though usually associated with complex networks, does occur in the final linear layer. Furthermore, we identify that the common practice of sharing item embeddings between session sequences and the candidate pool creates a tight coupling that also contributes to the bottleneck. We propose a simple yet effective method, Dropout and Decoupling (D&D), to alleviate these problems. Our experiments show that D&D significantly improves the accuracy of a variety of Softmax-based SBRS algorithms. Compared with more computationally expensive methods such as MLP and MoS (Mixture of Softmaxes), our method performs on par with, and at times better than, those methods, while keeping the same time complexity as standard Softmax-based models.
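The two ingredients of D&D described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the table names (`E_shared`, `E_out`), the dropout rate, and the dimensions are illustrative assumptions. It shows (a) decoupling, i.e. giving the final linear layer its own output embedding table instead of reusing the input item-embedding table, and (b) dropout applied to the input of that final linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

n_items, d = 100, 16  # illustrative sizes, not from the paper

# Tight coupling: a single shared table serving both as the session
# encoder's input item embeddings and as the final linear layer's weights.
E_shared = rng.normal(size=(n_items, d))

# Decoupling (one "D" in D&D): a separate table for the final linear
# layer, so the candidate pool no longer shares parameters with the
# session-sequence item embeddings.
E_out = rng.normal(size=(n_items, d))

def predict(session_repr, train=False, p_drop=0.5):
    h = session_repr
    if train:
        # Dropout (the other "D") on the input to the final linear
        # layer, regularizing the layer that the paper argues can
        # overfit despite its simplicity. Inverted-dropout scaling
        # keeps the expected activation unchanged.
        mask = rng.random(h.shape) >= p_drop
        h = h * mask / (1.0 - p_drop)
    logits = h @ E_out.T          # final linear layer, decoupled weights
    return softmax(logits)        # probability over the candidate pool

probs = predict(rng.normal(size=(d,)))
```

At inference time dropout is disabled, so the only change relative to a standard Softmax-based SBRS is which embedding table the final matrix product uses; the time complexity is unchanged.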




Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

We formulate language modeling as a matrix factorization problem, and sh...

Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities

The softmax function on top of a final linear layer is the de facto meth...

Sigsoftmax: Reanalysis of the Softmax Bottleneck

Softmax is an output activation function for modeling categorical probab...

SimA: Simple Softmax-free Attention for Vision Transformers

Recently, vision transformers have become very popular. However, deployi...

On the Effectiveness of Sampled Softmax Loss for Item Recommendation

Learning objectives of recommender models remain largely unexplored. Mos...

Query Twice: Dual Mixture Attention Meta Learning for Video Summarization

Video summarization aims to select representative frames to retain high-...
