Towards End-to-End Image Compression and Analysis with Transformers

by Yuanchao Bai, et al.

We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification applications. Instead of placing an existing Transformer-based image classification model directly after an image codec, we redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and to facilitate image compression with the long-term information from the Transformer. Specifically, we first replace the patchify stem (i.e., image splitting and embedding) of the ViT model with a lightweight image encoder modelled by a convolutional neural network. The compressed features generated by the image encoder carry a convolutional inductive bias and are fed to the Transformer for image classification, bypassing image reconstruction. Meanwhile, we propose a feature aggregation module that fuses the compressed features with selected intermediate features of the Transformer and feeds the aggregated features to a deconvolutional neural network for image reconstruction. The aggregated features obtain long-term information from the self-attention mechanism of the Transformer, which improves compression performance. Finally, the rate-distortion-accuracy optimization problem is solved by a two-step training strategy. Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the image classification tasks.
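To make the rate-distortion-accuracy trade-off concrete, the following is a minimal sketch of one way such a joint objective could be written as a weighted sum. The abstract does not give the exact loss, so the function name, the weights `alpha` and `beta`, the use of mean squared error for distortion, and the use of cross-entropy for the accuracy term are all assumptions made for illustration; the two-step training strategy is not modelled here.

```python
import math


def rate_distortion_accuracy_loss(bits, num_pixels, original, reconstructed,
                                  class_probs, true_label, alpha=1.0, beta=1.0):
    """Hypothetical joint objective: rate + alpha * distortion + beta * classification loss."""
    # Rate: estimated bits per pixel spent on the compressed features.
    rate = bits / num_pixels
    # Distortion: mean squared error between original and reconstructed pixel values (assumed metric).
    distortion = sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)
    # Accuracy term: cross-entropy of the probability predicted for the true class (assumed metric).
    cross_entropy = -math.log(class_probs[true_label])
    return rate + alpha * distortion + beta * cross_entropy


# Toy values, chosen only to exercise the function.
loss = rate_distortion_accuracy_loss(
    bits=512.0, num_pixels=1024,
    original=[0.0, 0.5, 1.0], reconstructed=[0.0, 0.5, 0.5],
    class_probs=[0.5, 0.25, 0.25], true_label=0,
    alpha=2.0, beta=1.0,
)
```

In this sketch, raising `alpha` favours reconstruction quality and raising `beta` favours classification accuracy, each at the cost of a higher bit rate.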




Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions

With the achievements of Transformer in the field of natural language pr...

Transformer-based Image Compression

A Transformer-based Image Compression (TIC) approach is developed which ...

Learning A Sparse Transformer Network for Effective Image Deraining

Transformers-based methods have achieved significant performance in imag...

SDLFormer: A Sparse and Dense Locality-enhanced Transformer for Accelerated MR Image Reconstruction

Transformers have emerged as viable alternatives to convolutional neural...

Multi-spectral Entropy Constrained Neural Compression of Solar Imagery

Missions studying the dynamic behaviour of the Sun are defined to captur...

Faster and Accurate Classification for JPEG2000 Compressed Images in Networked Applications

JPEG2000 (j2k) is a highly popular format for image and video compressio...

Forensic License Plate Recognition with Compression-Informed Transformers

Forensic license plate recognition (FLPR) remains an open challenge in l...
