Multi-modal Identification of State-Sponsored Propaganda on Social Media
The prevalence of state-sponsored propaganda on the Internet has become a cause for concern in the recent years. While much effort has been made to identify state-sponsored Internet propaganda, the problem remains far from being solved because the ambiguous definition of propaganda leads to unreliable data labelling, and the huge amount of potential predictive features causes the models to be inexplicable. This paper is the first attempt to build a balanced dataset for this task. The dataset is comprised of propaganda by three different organizations across two time periods. A multi-model framework for detecting propaganda messages solely based on the visual and textual content is proposed which achieves a promising performance on detecting propaganda by the three organizations both for the same time period (training and testing on data from the same time period) (F1=0.869) and for different time periods (training on past, testing on future) (F1=0.697). To reduce the influence of false positive predictions, we change the threshold to test the relationship between the false positive and true positive rates and provide explanations for the predictions made by our models with visualization tools to enhance the interpretability of our framework. Our new dataset and general framework provide a strong benchmark for the task of identifying state-sponsored Internet propaganda and point out a potential path for future work on this task.
READ FULL TEXT