From Symbols to Embeddings: A Tale of Two Representations in Computational Social Science
Computational Social Science (CSS), aiming at utilizing computational methods to address social science problems, is a recent emerging and fast-developing field. The study of CSS is data-driven and significantly benefits from the availability of online user-generated contents and social networks, which contain rich text and network data for investigation. However, these large-scale and multi-modal data also present researchers with a great challenge: how to represent data effectively to mine the meanings we want in CSS? To explore the answer, we give a thorough review of data representations in CSS for both text and network. Specifically, we summarize existing representations into two schemes, namely symbol-based and embedding-based representations, and introduce a series of typical methods for each scheme. Afterwards, we present the applications of the above representations based on the investigation of more than 400 research articles from 6 top venues involved with CSS. From the statistics of these applications, we unearth the strength of each kind of representations and discover the tendency that embedding-based representations are emerging and obtaining increasing attention over the last decade. Finally, we discuss several key challenges and open issues for future directions. This survey aims to provide a deeper understanding and more advisable applications of data representations for CSS researchers.
READ FULL TEXT