Cookie Synchronization: Everything You Always Wanted to Know But Were Afraid to Ask
User data is the primary input of digital advertising, the fuel of the free Internet as we know it. As a result, web entities invest heavily in elaborate tracking mechanisms to acquire more and more user data that they can sell to data markets and advertisers. The primary identification mechanism of the web is cookies, where each entity assigns a userID on the user's side. However, each tracker knows the same user by a different ID. So how can the collected data be sold and merged with the buyer's own data about the same user? Cookie Synchronization (CSync) addresses this problem. CSync provides an information-sharing channel between third parties that may or may not have direct access to the website the user visits. With CSync, they not only merge the user data they own in the background, but also reconstruct the browsing history of a user, bypassing the same-origin policy. In this paper, we perform what is, to our knowledge, the first in-depth study of CSync in the wild, using a year-long dataset that includes web browsing activity from 850 real mobile users. Through our study, we aim to understand the characteristics of the CSync protocol and its impact on users' privacy. Our results show that 97% of regular web users are exposed to CSync, most of them within the first week of their browsing. In addition, the average user receives 1 synchronization per 68 GET requests, and the median userID gets leaked, on average, to 3.5 different online entities. We also see that CSync increases the number of entities that track the user by a factor of 6.7. Finally, we propose a novel, machine learning-based method for CSync detection, which can be effective when the synced IDs are obscured.
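To make the ID-sharing mechanism concrete, the following is a minimal sketch of a simple ID-matching heuristic: it flags a synchronization whenever an ID value stored in one domain's cookie shows up as a query parameter in a request to a different domain. This is not the detection method proposed in the paper, and it would miss obscured (hashed or encrypted) IDs by construction. The function names, the `MIN_ID_LENGTH` threshold, the two-label domain approximation, and the example domains are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumption: shorter values are too ambiguous to be treated as user IDs.
MIN_ID_LENGTH = 8


def registrable_domain(host: str) -> str:
    """Crude approximation: keep the last two labels of the hostname.

    A real analysis would consult the Public Suffix List instead.
    """
    return ".".join(host.lower().split(".")[-2:])


def find_syncs(cookies, requests):
    """Return a set of (id_owner, receiver, user_id) triples.

    cookies:  dict mapping a domain to the set of ID values it stored.
    requests: iterable of request URLs observed in a browsing trace.
    """
    # Index every sufficiently long cookie value by the domains that set it.
    owners = {}
    for domain, ids in cookies.items():
        for value in ids:
            if len(value) >= MIN_ID_LENGTH:
                owners.setdefault(value, set()).add(registrable_domain(domain))

    syncs = set()
    for url in requests:
        parts = urlsplit(url)
        receiver = registrable_domain(parts.hostname or "")
        # Look for a known ID value among the query-string parameters.
        for _, value in parse_qsl(parts.query):
            for owner in owners.get(value, ()):
                if owner != receiver:  # the ID crossed a domain boundary
                    syncs.add((owner, receiver, value))
    return syncs


if __name__ == "__main__":
    # Hypothetical trace: tracker-a's ID is passed to tracker-b in a URL.
    example_cookies = {"tracker-a.example": {"u1a2b3c4d5"}}
    example_requests = [
        "https://sync.tracker-b.example/match?partner=a&uid=u1a2b3c4d5",
        "https://tracker-a.example/pixel?uid=u1a2b3c4d5",
    ]
    print(find_syncs(example_cookies, example_requests))
```

Only the first request in the usage example counts as a synchronization, since the second sends the ID back to the domain that set it. Exact string matching keeps the sketch short, and it is also why obscured IDs evade this kind of check.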