Catching Out-of-Context Misinformation with Self-supervised Learning
Despite the recent attention to DeepFakes and other forms of image manipulation, one of the most prevalent ways to mislead audiences is the use of unaltered images in a new but false context. To address this challenge and support fact-checkers, we propose a new method that automatically detects out-of-context image and text pairs. Our core idea is a self-supervised training strategy where we only need images with matching (and non-matching) captions from different sources. At train time, our method learns to selectively align individual objects in an image with textual claims, without explicit supervision. At test time, given an image and a pair of captions, we check whether both captions correspond to the same object(s) in the image but convey semantically different descriptions, which allows us to make fairly accurate out-of-context predictions. Our method achieves 82% out-of-context detection accuracy. To facilitate training our method, we created a large-scale dataset of 200K images which we match with 450K textual captions from a variety of news websites, blogs, and social media posts; i.e., for each image, we obtained several captions.
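The test-time check described above can be illustrated with a minimal sketch. It assumes that some upstream model has already grounded each caption to one bounding box in the image and produced a similarity score in [0, 1] for the two captions; the function names, the use of intersection-over-union as the "same object" test, and the 0.5 thresholds are illustrative assumptions, not details given in the abstract.

```python
from typing import Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_out_of_context(
    box_caption_1: Box,
    box_caption_2: Box,
    caption_similarity: float,
    iou_threshold: float = 0.5,   # assumed value, for illustration only
    sim_threshold: float = 0.5,   # assumed value, for illustration only
) -> bool:
    """Flag a caption pair when both captions point at the same image
    region but their texts are semantically dissimilar."""
    same_object = iou(box_caption_1, box_caption_2) > iou_threshold
    different_meaning = caption_similarity < sim_threshold
    return same_object and different_meaning
```

For example, two captions grounded to the same region with a low similarity score would be flagged, while captions that describe different regions, or that agree with each other, would not.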