On the Perils of Cascading Robust Classifiers
Ensembling certifiably robust neural networks has been shown to be a promising approach for improving the certified robust accuracy of neural models. Black-box ensembles that assume only query access to the constituent models (and their robustness certifiers) during prediction are particularly attractive due to their modular structure. Cascading ensembles are a popular instance of black-box ensembles that appear to improve certified robust accuracies in practice. However, we find that the robustness certifier used by a cascading ensemble is unsound. That is, when a cascading ensemble is certified as locally robust at an input x, there can, in fact, be inputs x' in the ϵ-ball centered at x such that the cascade's prediction at x' differs from its prediction at x. We present an alternate black-box ensembling mechanism based on weighted voting, which we prove to be sound for robustness certification. Via a thought experiment, we demonstrate that if the constituent classifiers are suitably diverse, voting ensembles can improve certified performance. Our code is available at <https://github.com/TristaChi/ensembleKW>.
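The contrast between the two mechanisms can be illustrated with a minimal sketch (not the authors' implementation). It assumes each constituent model exposes hypothetical methods `predict(x)` returning a label and `certify(x, eps)` returning True when the model is certified locally robust in the ϵ-ball around x; the voting certificate shown is a conservative criterion in the spirit of the abstract, not necessarily the paper's exact construction.

```python
def cascade_predict(models, x, eps):
    """Cascading ensemble: defer to the first constituent whose certifier
    accepts x, and report its certificate for the whole cascade.
    This is the unsound step: a nearby x' in the eps-ball may be routed to
    a *different* constituent, so the per-model certificate does not bind
    the cascade's prediction."""
    for m in models:
        if m.certify(x, eps):
            return m.predict(x), True
    # No constituent certifies x: fall back to the last model, uncertified.
    return models[-1].predict(x), False


def weighted_vote_predict(models, weights, x, eps):
    """Weighted-voting ensemble: aggregate every constituent's vote.
    A sound (conservative) certificate requires the winning label to keep
    its lead even if all uncertified weight flips to the runner-up."""
    votes, certified_weight = {}, {}
    for m, w in zip(models, weights):
        y = m.predict(x)
        votes[y] = votes.get(y, 0.0) + w
        if m.certify(x, eps):
            # Certified constituents keep their vote everywhere in the ball.
            certified_weight[y] = certified_weight.get(y, 0.0) + w
    winner = max(votes, key=votes.get)
    uncertified = sum(weights) - sum(certified_weight.values())
    # Worst case: all uncertified weight joins the strongest competing label.
    others = {y: w for y, w in certified_weight.items() if y != winner}
    worst_competitor = max(others.values(), default=0.0) + uncertified
    certified = certified_weight.get(winner, 0.0) > worst_competitor
    return winner, certified
```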