The Shape of Alerts: Detecting Malware Using Distributed Detectors by Robustly Amplifying Transient Correlations
We introduce a new malware detector, Shape-GD, that aggregates per-machine detectors into a robust global detector. Shape-GD is based on two insights. (1) Structural: actions by nodes, such as visiting a website (as in a waterhole attack), correlate well with malware spread and create dynamic neighborhoods of nodes that were exposed to the same attack vector. However, neighborhood sizes vary unpredictably, so a global alert must aggregate an unpredictable number of local detectors' outputs. (2) Statistical: feature vectors corresponding to true and false positives of local detectors have markedly different conditional distributions, i.e., their shapes differ. The shape of the feature-vector distribution within a neighborhood can therefore identify infected neighborhoods without having to estimate neighborhood sizes. On 5 years of Symantec detectors' logs, Shape-GD reduces false positives from 1M down to 110K and raises alerts 345 days (on average) before commercial anti-virus products. In a waterhole attack simulated using Yahoo web-service logs, Shape-GD detects infected machines when only 100 of 550K are compromised.
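The statistical insight can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper's implementation: alert feature vectors are reduced to a single score in [0, 1], false and true positives are drawn from synthetic Beta distributions, shapes are compared with a one-dimensional earth mover's distance between normalized histograms, and the names `histogram`, `shape_distance`, `is_suspicious`, and the threshold of 1.0 bin width are hypothetical.

```python
import random


def histogram(values, bins=10, lo=0.0, hi=1.0):
    """Normalized histogram, so the result does not depend on sample count."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return [c / len(values) for c in counts]


def shape_distance(p, q):
    """1-D earth mover's distance between two histograms on the same bins,
    measured in bin widths: the sum of absolute CDF differences."""
    dist, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        dist += abs(cdf_p - cdf_q)
    return dist


def is_suspicious(alert_scores, reference_hist, threshold=1.0):
    """Flag a neighborhood whose alert-score shape departs from the
    false-positive reference shape. The threshold is an assumed value."""
    return shape_distance(histogram(alert_scores), reference_hist) > threshold


rng = random.Random(0)


def false_positive():
    return rng.betavariate(2, 5)  # assumed score distribution of false alerts


def true_positive():
    return rng.betavariate(5, 2)  # assumed score distribution of true alerts


# Reference shape learned from alerts known to be false positives.
reference = histogram([false_positive() for _ in range(5000)])

# Two benign neighborhoods of very different sizes, and one infected one.
benign_small = [false_positive() for _ in range(100)]
benign_large = [false_positive() for _ in range(5000)]
infected = ([true_positive() for _ in range(60)]
            + [false_positive() for _ in range(40)])

for name, scores in [("benign_small", benign_small),
                     ("benign_large", benign_large),
                     ("infected", infected)]:
    print(name, len(scores), is_suspicious(scores, reference))
```

Because the histograms are normalized, the decision depends on the shape of the alert scores rather than on how many alerts a neighborhood produced, which is the property the abstract relies on when it avoids estimating neighborhood sizes.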