Compression Rate Method for Empirical Science and Application to Computer Vision
This philosophical paper proposes a modified version of the scientific method, in which large databases are used instead of experimental observations as the necessary empirical ingredient. This change in the source of the empirical data allows the scientific method to be applied to several aspects of physical reality that previously resisted systematic interrogation. Under the new method, scientific theories are compared by instantiating them as compression programs, and examining the codelengths they achieve on a database of measurements related to a phenomenon of interest. Because of the impossibility of compressing random data, "real world" data can only be compressed by discovering and exploiting the empirical structure it exhibits. The method also provides a new way of thinking about two longstanding issues in the philosophy of science: the problem of induction and the problem of demarcation. The second part of the paper proposes to reformulate computer vision as an empirical science of visual reality, by applying the new method to large databases of natural images. The immediate goal of the proposed reformulation is to repair the chronic difficulties in evaluation experienced by the field of computer vision. The reformulation should bring a wide range of benefits, including a substantially increased degree of methodological rigor, the ability to justify complex theories without overfitting, a scalable evaluation paradigm, and the potential to make systematic progress. A crucial argument is that the change is not especially drastic, because most computer vision tasks can be reformulated as specialized image compression techniques. Finally, a concrete proposal is discussed in which a database is produced by recording from a roadside video camera, and compression is achieved by developing a computational understanding of the appearance of moving cars.
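The comparison step described above, in which theories are instantiated as compression programs and judged by the codelengths they achieve, can be illustrated with a minimal sketch. It is not the paper's procedure: the two "theories" (`theory_raw`, `theory_zlib`), the helper names, and the synthetic byte strings standing in for a database of measurements are all assumptions introduced here for illustration, and the general-purpose DEFLATE compressor from Python's standard `zlib` module is swapped in for a real domain theory.

```python
import random
import zlib


def codelength_bits(encode, data: bytes) -> int:
    """Codelength, in bits, that a candidate 'theory' achieves on the data."""
    return 8 * len(encode(data))


# Two hypothetical stand-in "theories", each instantiated as a compressor.
def theory_raw(data: bytes) -> bytes:
    # Assumes the data has no structure: store the measurements verbatim.
    return data


def theory_zlib(data: bytes) -> bytes:
    # Assumes repeated patterns exist and exploits them with DEFLATE.
    return zlib.compress(data, 9)


def rank_theories(theories: dict, data: bytes) -> list:
    """Return theory names sorted from shortest to longest codelength."""
    return sorted(theories, key=lambda name: codelength_bits(theories[name], data))


# Synthetic stand-ins for a measurement database (assumed, not from the paper).
rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 10_000
random_data = bytes(rng.getrandbits(8) for _ in range(n))    # no structure
structured_data = bytes((i // 50) % 256 for i in range(n))   # slowly varying signal

theories = {"raw": theory_raw, "zlib": theory_zlib}

for label, data in [("random", random_data), ("structured", structured_data)]:
    for name in rank_theories(theories, data):
        print(label, name, codelength_bits(theories[name], data))
```

On the structured signal the pattern-exploiting compressor should reach a far shorter codelength than verbatim storage, while on the random bytes it gains essentially nothing, which mirrors the abstract's point that data can only be compressed by exploiting structure it actually exhibits.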