AutoADR: Automatic Model Design for Ad Relevance

by Yiren Chen, et al.
Peking University

Large-scale pre-trained models have attracted extensive attention in the research community and shown promising results on various natural language processing tasks. However, these pre-trained models are memory and computation intensive, which hinders their deployment in industrial online systems such as Ad Relevance. Meanwhile, designing a model architecture that is both effective and efficient is another challenging problem in online Ad Relevance. Recently, AutoML has shed new light on architecture design, but how to integrate it with pre-trained language models remains unsettled. In this paper, we propose AutoADR (Automatic model design for AD Relevance), a novel end-to-end framework that addresses this challenge, and share our experience of shipping these cutting-edge techniques into the online Ad Relevance system at Microsoft Bing. Specifically, AutoADR leverages a one-shot neural architecture search algorithm to find a tailored network architecture for Ad Relevance. The search process is simultaneously guided by knowledge distillation from a large pre-trained teacher model (e.g. BERT), while taking the online serving constraints (e.g. memory and latency) into consideration. We add the model designed by AutoADR as a sub-model to the production Ad Relevance model. This additional sub-model improves the Precision-Recall AUC (PR AUC) on top of the original Ad Relevance model by 2.65X of the normalized shipping bar. More importantly, adding this automatically designed sub-model leads to a statistically significant 4.6% Bad-Ad ratio reduction in online A/B testing. This model has been shipped into the Microsoft Bing Ad Relevance production model.
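
To make the idea of a distillation-guided, constraint-aware search concrete, the following is a minimal sketch and not the AutoADR implementation. It assumes a temperature-softened KL divergence as the distillation signal and a simple linear penalty for exceeding a latency budget; the abstract does not specify either form. All names (`distillation_loss`, `constrained_score`, `pick_architecture`), the candidate dictionaries, and the numeric values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened output distributions.

    Assumption: a standard soft-label distillation loss; the abstract
    does not state which loss AutoADR uses.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def constrained_score(kd_loss, latency_ms, budget_ms, penalty_weight=1.0):
    """Distillation loss plus a penalty for exceeding the latency budget.

    Assumption: a linear penalty on the relative overshoot; candidates
    within budget are scored by distillation loss alone.
    """
    overshoot = max(0.0, latency_ms - budget_ms) / budget_ms
    return kd_loss + penalty_weight * overshoot


def pick_architecture(candidates, teacher_logits, budget_ms):
    """Return the candidate with the lowest constrained score."""
    return min(
        candidates,
        key=lambda c: constrained_score(
            distillation_loss(c["logits"], teacher_logits),
            c["latency_ms"],
            budget_ms,
        ),
    )


# Hypothetical teacher output and two hypothetical student candidates.
teacher_logits = [2.0, -1.0]
candidates = [
    {"name": "wide", "logits": [2.0, -1.0], "latency_ms": 20.0},
    {"name": "narrow", "logits": [1.5, -0.5], "latency_ms": 4.0},
]
best = pick_architecture(candidates, teacher_logits, budget_ms=5.0)
```

In this toy setup the "wide" candidate matches the teacher exactly but exceeds the 5 ms budget, so the penalty dominates and the "narrow" candidate is selected. A real one-shot search would score sub-networks sampled from a shared-weight supernet rather than a fixed list.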




