End-to-End Learnable Multi-Scale Feature Compression for VCM

by Yeongwoong Kim, et al.

The proliferation of deep learning-based machine vision applications has given rise to a new type of compression, so-called video coding for machines (VCM). VCM differs from traditional video coding in that it is optimized for machine vision performance instead of human visual quality. In the feature compression track of MPEG-VCM, multi-scale features extracted from images are subject to compression. Recent feature compression works have demonstrated that the versatile video coding (VVC) standard-based approach can achieve a BD-rate reduction of up to 96%. However, this approach is still sub-optimal, as VVC was designed for natural images rather than extracted features. Moreover, the high encoding complexity of VVC makes it difficult to design a lightweight encoder without sacrificing performance. To address these challenges, we propose a novel multi-scale feature compression method that enables both end-to-end optimization on the extracted features and the design of lightweight encoders. The proposed model combines a learnable compressor with a multi-scale feature fusion network so that the redundancy in the multi-scale features is effectively removed. Instead of simply cascading the fusion network and the compression network, we integrate the fusion and encoding processes in an interleaved way. Our model first encodes a larger-scale feature to obtain a latent representation and then fuses the latent with a smaller-scale feature. This process is repeated until the smallest-scale feature is fused, and the encoded latent at the final stage is then entropy-coded for transmission. The results show that our model outperforms previous approaches by at least 52% in BD-rate reduction, with 5 to 27 times less encoding time for object detection. It is noteworthy that our model can attain near-lossless task performance with only 0.002-0.003
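The interleaved encode-then-fuse order described in the abstract can be illustrated with a toy sketch. This is not the authors' network: `encode`, `fuse`, and `interleaved_encode` are hypothetical names, the learned encoder is replaced by 2x2 average pooling, the learned fusion by an element-wise mean, and entropy coding is omitted (only a rounding step stands in for quantization). The sketch also assumes each feature map is exactly half the size of the previous one and that one more encoding step follows the last fusion.

```python
# Toy sketch of the interleaved encode/fuse order; every operator below is a
# placeholder assumption, not the learned networks from the paper.

def encode(feature):
    """Placeholder encoder: 2x2 average pooling, halving height and width."""
    h, w = len(feature), len(feature[0])
    return [
        [
            (feature[2 * i][2 * j] + feature[2 * i][2 * j + 1]
             + feature[2 * i + 1][2 * j] + feature[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w // 2)
        ]
        for i in range(h // 2)
    ]


def fuse(latent, feature):
    """Placeholder fusion: element-wise mean of two same-sized 2D maps."""
    if len(latent) != len(feature) or len(latent[0]) != len(feature[0]):
        raise ValueError("latent and feature must have the same shape")
    return [
        [(a + b) / 2.0 for a, b in zip(lat_row, feat_row)]
        for lat_row, feat_row in zip(latent, feature)
    ]


def interleaved_encode(features):
    """Encode the largest scale, fuse with the next smaller scale, and repeat.

    `features` is a list of 2D maps ordered from largest to smallest scale.
    Returns rounded symbols that a real codec would pass to an entropy coder.
    """
    current = features[0]
    for smaller in features[1:]:
        latent = encode(current)         # encode the larger-scale input
        current = fuse(latent, smaller)  # fuse latent with the smaller scale
    final_latent = encode(current)       # assumed final encoding stage
    # Rounding stands in for quantization; entropy coding is not modeled.
    return [[round(v) for v in row] for row in final_latent]


# Usage: a two-level pyramid (4x4 and 2x2) of constant maps.
pyramid = [
    [[4.0] * 4 for _ in range(4)],
    [[2.0] * 2 for _ in range(2)],
]
symbols = interleaved_encode(pyramid)
```

The point of the ordering is that each fusion happens in the already-downsampled latent space, so only a single latent is left to transmit after the smallest scale has been fused.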




Neural Multi-scale Image Compression

This study presents a new lossy image compression method that utilizes t...

Scale Coding Bag of Deep Features for Human Attribute and Action Recognition

Most approaches to human attribute and action recognition in still image...

Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression

The lack of ability to adapt the motion compensation model to video cont...

Enhanced Standard Compatible Image Compression Framework based on Auxiliary Codec Networks

To enhance image compression performance, recent deep neural network-bas...

Multi-scale Grouped Dense Network for VVC Intra Coding

Versatile Video Coding (H.266/VVC) standard achieves better image qualit...

Real-Time Adaptive Image Compression

We present a machine learning-based approach to lossy image compression ...

Pruned Lightweight Encoders for Computer Vision

Latency-critical computer vision systems, such as autonomous driving or ...
