PolyBuilding: Polygon Transformer for End-to-End Building Extraction

by Yuan Hu, et al.
Alibaba Group
Peking University

We present PolyBuilding, a fully end-to-end polygon Transformer for building extraction. PolyBuilding directly predicts the vector representation of buildings from remote sensing images. It builds on an encoder-decoder Transformer architecture and simultaneously outputs building bounding boxes and polygons. Given a set of polygon queries, the model learns the relations among them and encodes context information from the image to predict the final set of building polygons, each with a fixed number of vertices. Corner classification is performed to distinguish building corners from the other sampled points, so that redundant vertices along the building walls can be removed during inference. A 1-D non-maximum suppression (NMS) is then applied to reduce vertex redundancy near the building corners. With these refinement operations, polygons with regular shapes and low complexity can be obtained effectively. Comprehensive experiments are conducted on the CrowdAI dataset. Quantitative and qualitative results show that our approach outperforms prior polygonal building extraction methods by a large margin. It also achieves a new state of the art in terms of pixel-level coverage, instance-level precision and recall, and geometry-level properties (including contour regularity and polygon complexity).
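The two refinement steps described in the abstract (dropping non-corner vertices, then 1-D NMS along the polygon) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `refine_polygon`, the parameters `score_thresh` and `nms_window`, the greedy suppression order, and the treatment of the polygon as a closed ring are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def refine_polygon(
    vertices: Sequence[Point],
    corner_scores: Sequence[float],
    score_thresh: float = 0.5,  # assumed threshold, not taken from the paper
    nms_window: int = 1,        # assumed suppression radius, in vertex indices
) -> List[Point]:
    """Keep likely corners, then suppress near-duplicate corners (1-D NMS)."""
    n = len(vertices)
    if n != len(corner_scores):
        raise ValueError("vertices and corner_scores must have equal length")

    # Step 1: corner classification. Discard sampled points whose corner
    # score falls below the threshold (points along the building walls).
    candidates = [i for i in range(n) if corner_scores[i] >= score_thresh]

    # Step 2: greedy 1-D NMS over vertex indices. Visit candidates from the
    # highest score down and keep one only if no already-kept vertex lies
    # within nms_window steps along the (circular) vertex sequence.
    candidates.sort(key=lambda i: -corner_scores[i])
    kept: List[int] = []
    for i in candidates:
        if all(min((i - j) % n, (j - i) % n) > nms_window for j in kept):
            kept.append(i)

    # Restore the original vertex order so the output is a valid ring.
    kept.sort()
    return [vertices[i] for i in kept]
```

For example, with eight points sampled around a square and high scores at the four true corners plus one spurious high score next to a corner, the sketch keeps only the four corners: the low-scoring wall points are removed by the threshold, and the spurious neighbour is removed by the NMS step.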



