Layout-aware Webpage Quality Assessment

by Anfeng Cheng, et al.

Identifying high-quality webpages is fundamental for real-world search engines, because such pages fulfil users' information needs with less cognitive burden. Early studies of webpage quality assessment usually design hand-crafted features that may only work on particular categories of webpages (e.g., shopping websites, medical websites). Such features can hardly be applied to real-world search engines that serve trillions of webpages of various types and purposes. In this paper, we propose a novel layout-aware webpage quality assessment model that is currently deployed in our search engine. Intuitively, layout is a universal and critical dimension for assessing the quality of webpages across categories. Based on this, we directly use the meta-data that describes a webpage, i.e., the Document Object Model (DOM) tree, as the input of our model. The DOM tree unifies the representation of webpages with different categories and purposes and indicates their layout. To assess webpage quality from complex DOM tree data, we propose a graph neural network (GNN) based method that extracts, in an end-to-end manner, rich layout-aware information that implies webpage quality. Moreover, we improve the GNN method with an attentive readout function, external web categories, and a category-aware sampling method. We conduct rigorous offline and online experiments to show that our proposed solution is effective in real search engines, improving overall usability and user experience.
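To make the pipeline in the abstract concrete, the following is a minimal sketch, not the authors' model. It assumes that a DOM tree can be read as a graph of element nodes joined by parent-child edges, that one round of unweighted neighbour averaging can stand in for a learned GNN layer, and that a softmax-weighted sum can stand in for the attentive readout. The names `DomGraphBuilder`, `dom_to_graph`, `mean_aggregate`, and `attentive_readout`, the one-hot tag features, and the fixed `query` vector are all hypothetical; in a trained model the features and the query would be learned.

```python
import math
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not kept open.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class DomGraphBuilder(HTMLParser):
    """Collects element nodes and parent-child edges while parsing HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []    # tags[i] is the tag name of node i
        self.edges = []   # (parent_index, child_index) pairs
        self._open = []   # indices of currently open elements

    def handle_starttag(self, tag, attrs):
        index = len(self.tags)
        self.tags.append(tag)
        if self._open:
            self.edges.append((self._open[-1], index))
        if tag not in VOID_TAGS:
            self._open.append(index)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for pos in range(len(self._open) - 1, -1, -1):
            if self.tags[self._open[pos]] == tag:
                del self._open[pos:]
                break


def dom_to_graph(html):
    """Return (tags, edges) describing the DOM tree of an HTML string."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.tags, builder.edges


def mean_aggregate(vectors, edges):
    """One message-passing round: average each node with its tree neighbours."""
    neighbours = [[i] for i in range(len(vectors))]  # include the node itself
    for parent, child in edges:
        neighbours[parent].append(child)
        neighbours[child].append(parent)
    dim = len(vectors[0])
    return [
        [sum(vectors[j][d] for j in group) / len(group) for d in range(dim)]
        for group in neighbours
    ]


def attentive_readout(vectors, query):
    """Pool node vectors into one page vector using softmax attention weights."""
    scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(len(query))
    ]
    return pooled, weights


html = "<html><body><h1>Title</h1><p>Text<br></p></body></html>"
tags, edges = dom_to_graph(html)

# Hypothetical node features: a one-hot encoding of each node's tag name.
vocab = sorted(set(tags))
features = [[1.0 if tag == v else 0.0 for v in vocab] for tag in tags]

hidden = mean_aggregate(features, edges)
query = [1.0] * len(vocab)  # placeholder; a real model would learn this
page_vector, weights = attentive_readout(hidden, query)
```

In a full system, `page_vector` would feed a classifier or regressor that outputs the quality score. The sketch stops at the pooled representation because the abstract does not specify the prediction head, the category features, or the sampling scheme.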


