SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance

by   Li Lyna Zhang, et al.

Ad relevance modeling plays a critical role in online advertising systems including Microsoft Bing. To leverage powerful transformers like BERT in this low-latency setting, many existing approaches perform ad-side computations offline. While efficient, these approaches are unable to serve cold start ads, resulting in poor relevance predictions for such ads. This work aims to design a new, low-latency BERT via structured pruning to empower real-time online inference for cold start ads relevance on a CPU platform. Our challenge is that previous methods typically prune all layers of the transformer to a high, uniform sparsity, thereby producing models which cannot achieve satisfactory inference speed with an acceptable accuracy. In this paper, we propose SwiftPruner - an efficient framework that leverages evolution-based search to automatically find the best-performing layer-wise sparse BERT model under the desired latency constraint. Different from existing evolution algorithms that conduct random mutations, we propose a reinforced mutator with a latency-aware multi-objective reward to conduct better mutations for efficiently searching the large space of layer-wise sparse models. Extensive experiments demonstrate that our method consistently achieves higher ROC AUC and lower latency than the uniform sparse baseline and state-of-the-art search methods. Remarkably, under our latency requirement of 1900us on CPU, SwiftPruner achieves a 0.86 sparse baseline for BERT-Mini on a large scale real-world dataset. Online A/B testing shows that our model also achieves a significant 11.7 of defective cold start ads with satisfactory real-time serving latency.


page 1

page 2

page 3

page 4


AutoADR: Automatic Model Design for Ad Relevance

Large-scale pre-trained models have attracted extensive attention in the...

An Automatic and Efficient BERT Pruning for Edge AI Systems

With the yearning for deep learning democratization, there are increasin...

Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search

Search query classification, as an effective way to understand user inte...

TSI: an Ad Text Strength Indicator using Text-to-CTR and Semantic-Ad-Similarity

Coming up with effective ad text is a time consuming process, and partic...

Joint Channel and Weight Pruning for Model Acceleration on Moblie Devices

For practical deep neural network design on mobile devices, it is essent...

TextGNN: Improving Text Encoder via Graph Neural Network in Sponsored Search

Text encoders based on C-DSSM or transformers have demonstrated strong p...

Gating-adapted Wavelet Multiresolution Analysis for Exposure Sequence Modeling in CTR prediction

The exposure sequence is being actively studied for user interest modeli...

Please sign up or login with your details

Forgot password? Click here to reset