A comparison between humans and AI at recognizing objects in unusual poses

Netta Ollikka, Amro Abbas, Andrea Perin, Markku Kilpeläinen, Stéphane Deny

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

Deep learning is closing the gap with human vision on several object recognition benchmarks. Here we investigate this gap in the context of challenging images where objects are seen in unusual poses. We find that humans excel at recognizing objects in such poses. In contrast, state-of-the-art deep networks for vision (EfficientNet, SWAG, ViT, SWIN, BEiT, ConvNext) and state-of-the-art large vision-language models (Claude 3.5, Gemini 1.5, GPT-4, SigLIP) are systematically brittle on unusual poses, with the exception of Gemini, which shows excellent robustness in that condition. As we limit image exposure time, human performance degrades to the level of deep networks, suggesting that additional mental processes (requiring additional time) are necessary to identify objects in unusual poses. An analysis of the error patterns of humans vs. networks reveals that even time-limited humans are dissimilar to feed-forward deep networks. In conclusion, our comparison reveals that humans are overall more robust than deep networks and that they rely on different mechanisms for recognizing objects in unusual poses. Understanding the nature of the mental processes taking place during the extra viewing time may be key to reproducing the robustness of human vision in silico. All code and data are available at https://github.com/BRAIN-Aalto/unusual_poses.
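The kind of brittleness probed in the paper can be illustrated with a short sketch: compare a pretrained classifier's top prediction on an image in a canonical pose against a rotated copy of the same image. This is a minimal illustration under stated assumptions, not the authors' pipeline; the model choice (a torchvision ConvNeXt, one of the network families evaluated) and the input filename are assumptions made here for demonstration — the actual evaluation code is in the linked repository.

```python
# Minimal sketch (not the authors' code): probe a pretrained classifier's
# sensitivity to pose by comparing its prediction on an image and on a
# rotated copy. Model choice (ConvNeXt) and the image path "object.jpg"
# are illustrative assumptions; for the paper's pipeline see
# https://github.com/BRAIN-Aalto/unusual_poses
import torch
from torchvision import models
from torchvision.models import ConvNeXt_Base_Weights
from PIL import Image

weights = ConvNeXt_Base_Weights.IMAGENET1K_V1
model = models.convnext_base(weights=weights).eval()
preprocess = weights.transforms()          # the weights' own preprocessing
labels = weights.meta["categories"]        # ImageNet class names

def top1(img: Image.Image) -> tuple[str, float]:
    """Return the model's top-1 ImageNet label and its softmax probability."""
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
        probs = logits.softmax(dim=1)[0]
    idx = int(probs.argmax())
    return labels[idx], float(probs[idx])

img = Image.open("object.jpg")             # hypothetical input image
print("canonical pose:", top1(img))
print("rotated 90 deg:", top1(img.rotate(90, expand=True)))
```

A feed-forward network of this kind will often change its answer, or lose confidence sharply, on the rotated copy, whereas (per the paper's findings) humans given enough viewing time do not.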

Original language: English
Pages (from-to): 1-32
Number of pages: 32
Journal: Transactions on Machine Learning Research
Volume: 2025
Issue number: January
Publication status: Published - Jan 2025
MoE publication type: A1 Journal article-refereed
