TS-CNN: Text Steganalysis from Semantic Space Based on Convolutional Neural Network
Steganalysis is an important research topic in cybersecurity that helps identify covert attacks in public networks. With the rapid development of natural language processing technology in the past two years, coverless steganography has advanced considerably. Previous text steganalysis methods perform unsatisfactorily against this new steganography technique, and detecting it remains an unsolved challenge. Unlike previous text steganalysis methods, in this paper we propose a text steganalysis method (TS-CNN) based on semantic analysis, which uses a convolutional neural network (CNN) to extract high-level semantic features of texts and finds the subtle distribution differences in the semantic space before and after the secret information is embedded. To train and test the proposed model, we collected and released a large text steganalysis dataset (CT-Steg), which contains a total of 216,000 texts with various lengths and embedding rates. Experimental results show that the proposed model achieves nearly 100% precision and recall, outperforming all previous methods. Furthermore, the proposed model can even estimate the capacity of the hidden information. These results strongly support the conclusion that using the subtle changes in the semantic space before and after embedding secret information to conduct text steganalysis is feasible and effective.
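To make the idea of extracting semantic features with a CNN concrete, the following is a minimal pure-Python sketch of one common pattern for text CNNs: filters of different widths slide over consecutive word vectors, and max-over-time pooling keeps the strongest response of each filter. The abstract does not specify the TS-CNN architecture, so the filter widths, the ReLU activation, the pooling choice, and all names and values here (`conv_max_pool`, `extract_features`, the toy embeddings and kernels) are illustrative assumptions, not the authors' implementation.

```python
def conv_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter over consecutive word vectors, apply ReLU,
    then keep the largest activation (max-over-time pooling).

    Assumes the text has at least as many tokens as the filter is wide.
    """
    width = len(kernel)  # number of consecutive words the filter spans
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = bias
        for word_vec, kernel_row in zip(window, kernel):
            total += sum(w * k for w, k in zip(word_vec, kernel_row))
        activations.append(max(0.0, total))  # ReLU
    return max(activations)


def extract_features(embeddings, kernels):
    """Return one pooled feature per filter; a classifier would consume this vector."""
    return [conv_max_pool(embeddings, kernel) for kernel in kernels]


# Hypothetical toy input: 5 tokens, each a 3-dimensional embedding.
sentence = [
    [0.1, 0.2, 0.0],
    [0.0, 1.0, 0.5],
    [0.3, 0.3, 0.3],
    [1.0, 0.0, 0.0],
    [0.2, 0.1, 0.9],
]

# Hypothetical filters: one spanning 2 words, one spanning 3 words.
kernels = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]

features = extract_features(sentence, kernels)
print(features)
```

In a trained model the kernels would be learned rather than hand-written, and the pooled feature vector would feed a classifier that separates cover texts from texts carrying hidden information.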