Two Is Harder To Recognize Than Tom: the Challenge of Visual Numerosity for Deep Learning
In the spirit of Turing test, we design and conduct a set of visual numerosity experiments with deep neural networks. We train DCNNs with a large number of sample images that are varied visual representations of small natural numbers, towards the objective of learning numerosity perception. Numerosity perception, or the number sense, is a cognitive construct so primary and so critical to the survival and well-being of our species that is considered and proven to be innate to human infants, and it responds to visual stimuli prior to the development of any symbolic skills, language or arithmetic. Somewhat surprisingly, in our experiments, even with strong supervision, DCNNs cannot see through superficial variations in visual representations and distill the abstract notion of natural number, a task that children perform with high accuracy and confidence. DCNNs are apparently easy to be confused by geometric variations and fail to grasp the topological essence in numerosity. The failures of DCNNs in the proposed cognition experiments also expose their overreliance on sample statistics at the expense of image semantics. Our findings are, we believe, significant and thought-provoking in the interests of AI research, because visual-based numerosity is a benchmark of minimum sort for human intelligence.
READ FULL TEXT