Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation

11/21/2022
by   Zhen Tian, et al.

With the growth of high-dimensional sparse data in web-scale recommender systems, the computational cost of learning high-order feature interactions in the CTR prediction task increases substantially, which limits the use of high-order interaction models in real industrial applications. Some recent knowledge distillation based methods transfer knowledge from complex teacher models to shallow student models to accelerate online model inference. However, they suffer from degraded model accuracy during the knowledge distillation process, and it is challenging to balance the efficiency and effectiveness of the shallow student models. To address this problem, we propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn high-order feature interactions from existing complex interaction models for CTR prediction via knowledge distillation. The proposed lightweight student model, DAGFM, can learn arbitrary explicit feature interactions from teacher networks and achieves approximately lossless performance, which is proved by a dynamic programming algorithm. In addition, an improved general model, KD-DAGFM+, is shown to be effective in distilling both explicit and implicit feature interactions from any complex teacher model. Extensive experiments are conducted on four real-world datasets, including a large-scale industrial dataset from the WeChat platform with billions of feature dimensions. KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method in both online and offline experiments, showing the superiority of DAGFM in handling industrial-scale data in the CTR prediction task. Our implementation code is available at: https://github.com/RUCAIBox/DAGFM.
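To make the teacher-student setup described in the abstract concrete, below is a minimal NumPy sketch of logit-level knowledge distillation for a shallow CTR student: a fixed teacher produces soft logits, and the student is trained on a weighted blend of the hard click labels and the teacher's logits. The toy data, the linear "teacher" and "student", and the `alpha` blending weight are illustrative assumptions for exposition, not the paper's DAGFM architecture or its actual distillation loss.

```python
# Minimal sketch of logit-level knowledge distillation for CTR prediction.
# Everything here (the toy data, the linear "teacher" and "student", and
# the alpha-blended loss) is an illustrative assumption, not the paper's
# DAGFM implementation.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy data: n samples with d dense-encoded features and binary click labels.
n, d = 1024, 32
X = rng.random((n, d))
y = (rng.random(n) < 0.3).astype(float)

# "Teacher": stands in for a pre-trained high-order interaction model;
# here it is just a fixed random linear scorer.
w_teacher = rng.normal(size=d)
teacher_logits = X @ w_teacher

# "Student": a shallow linear model trained on a blend of the hard labels
# (cross-entropy) and the teacher's logits (squared error), weighted by alpha.
w = np.zeros(d)
alpha, lr = 0.5, 0.1
for _ in range(200):
    logits = X @ w
    p = sigmoid(logits)
    # Gradient of: alpha * BCE(p, y) + (1 - alpha) * 0.5 * MSE(logits, teacher_logits)
    grad = (alpha * X.T @ (p - y) + (1 - alpha) * X.T @ (logits - teacher_logits)) / n
    w -= lr * grad

# After training, the student's scores should track the teacher's soft predictions.
print("correlation with teacher logits:", np.corrcoef(X @ w, teacher_logits)[0, 1])
```

In practice the teacher would be a deep high-order interaction model and the student a compact network such as DAGFM, but the same blended-objective idea applies.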


Related research

11/08/2020 | Ensembled CTR Prediction via Knowledge Distillation
Recently, deep learning-based models have been widely studied for click-...

04/21/2023 | EulerNet: Adaptive Feature Interaction Learning via Euler's Formula for CTR Prediction
Learning effective high-order feature interactions is very crucial in th...

05/13/2022 | Knowledge Distillation Meets Open-Set Semi-Supervised Learning
Existing knowledge distillation methods mostly focus on distillation of ...

07/25/2022 | HIRE: Distilling High-order Relational Knowledge From Heterogeneous Graph Neural Networks
Researchers have recently proposed plenty of heterogeneous graph neural ...

11/11/2020 | Distill2Vec: Dynamic Graph Representation Learning with Knowledge Distillation
Dynamic graph representation learning strategies are based on different ...

03/25/2020 | AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction
Learning effective feature interactions is crucial for click-through rat...

12/03/2018 | Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling
Knowledge distillation is an effective technique that transfers knowledg...
