Efficient Decision-based Black-box Patch Attacks on Video Recognition

by Kaixun Jiang, et al.

Although Deep Neural Networks (DNNs) have demonstrated excellent performance, they are vulnerable to adversarial patches that introduce perceptible, localized perturbations to the input. Generating adversarial patches on images has received much attention, while adversarial patches on videos have not been well investigated. Further, decision-based attacks, where attackers only access the predicted hard labels by querying threat models, have not been well explored on video models either, even though they are practical in real-world video recognition scenarios. The absence of such studies leaves a large gap in the robustness assessment of video models. To bridge this gap, this work is the first to explore decision-based patch attacks on video models. We show that the huge parameter space introduced by videos and the minimal information returned by decision-based models both greatly increase the attack difficulty and the query burden. To achieve a query-efficient attack, we propose a spatial-temporal differential evolution (STDE) framework. First, STDE introduces target videos as patch textures and only adds patches on keyframes that are adaptively selected by temporal difference. Second, STDE takes minimizing the patch area as the optimization objective and adopts spatial-temporal mutation and crossover to search for the global optimum without falling into local optima. Experiments demonstrate that STDE achieves state-of-the-art performance in terms of threat, efficiency, and imperceptibility. Hence, STDE has the potential to be a powerful tool for evaluating the robustness of video recognition models.
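To make the search procedure concrete, the core of a differential-evolution attack like the one the abstract describes can be sketched as a classic DE/rand/1/bin loop that minimizes a black-box fitness (e.g., patch area subject to the attack still succeeding). This is a generic illustration, not the authors' STDE implementation: the fitness function, bounds, and hyperparameters below are placeholders, and STDE's spatial-temporal mutation/crossover over adaptively selected keyframes is not reproduced here.

```python
import numpy as np

def de_search(fitness, bounds, pop_size=20, generations=50,
              F=0.5, CR=0.9, seed=0):
    """Minimal DE/rand/1/bin sketch for black-box patch-parameter search.

    `fitness` scores a candidate parameterization (lower is better); in a
    decision-based patch attack it would query the threat model and return,
    e.g., the patch area when the attack succeeds and +inf otherwise.
    """
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    # Initialize the population uniformly within the search bounds.
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    scores = np.array([fitness(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            idx = rng.choice([j for j in range(pop_size) if j != i],
                             size=3, replace=False)
            a, b, c = pop[idx]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover with the current individual.
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True  # at least one gene from mutant
            trial = np.where(mask, mutant, pop[i])
            # Greedy selection: keep whichever candidate scores lower.
            s = fitness(trial)
            if s < scores[i]:
                pop[i], scores[i] = trial, s
    best = int(np.argmin(scores))
    return pop[best], scores[best]

# Toy objective standing in for the real (query-based) fitness: a quadratic
# with its minimum at (3, 4), so convergence is easy to verify.
best_x, best_f = de_search(
    lambda x: (x[0] - 3.0) ** 2 + (x[1] - 4.0) ** 2,
    bounds=[(0.0, 10.0), (0.0, 10.0)])
print(best_x, best_f)
```

Because selection is greedy per individual, the population's best score never worsens between generations, which matters when every fitness evaluation costs a model query.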


