CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning
Contrastive learning (CL) pre-trains general-purpose encoders using an unlabeled pre-training dataset, which consists of images (called single-modal CL) or image-text pairs (called multi-modal CL). CL is vulnerable to data poisoning based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset so the encoder is backdoored. However, existing DPBAs achieve limited effectiveness. In this work, we propose new DPBAs called CorruptEncoder to CL. Our experiments show that CorruptEncoder substantially outperforms existing DPBAs for both single-modal and multi-modal CL. CorruptEncoder is the first DPBA that achieves more than 90 rates on single-modal CL with only a few (3) reference images and a small poisoning ratio (0.5 cropping, to defend single-modal CL against DPBAs. Our results show that our defense can reduce the effectiveness of DPBAs, but it sacrifices the utility of the encoder, highlighting the needs of new defenses.
READ FULL TEXT