Katherine Hermann, Department of Psychology, Stanford University (Ph.D. student with Professor James McClelland)

Title: Exploring the origins and prevalence of texture bias in CNNs
Abstract: ImageNet-trained convolutional neural networks (CNNs) have achieved wide popularity as both computer vision models and models of primate visual cortex. However, recent work has indicated that, unlike humans, these models tend to classify images by texture rather than shape (Geirhos et al. 2019), possibly indicating a divergence from primate visual processing. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, the inductive bias of CNNs often favors shape; in general, models learn shape at least as easily as texture. Moreover, although ImageNet training leads to classifier weights that classify ambiguous images according to texture, shape is decodable from the hidden representations of ImageNet networks. Turning to the question of the origin of texture bias, we identify consistent effects of task, architecture, preprocessing, and hyperparameters. Different self-supervised training objectives and different architectures have significant and largely independent effects on the shape bias of the learned representations. Among modern ImageNet architectures, we find that shape bias is positively correlated with ImageNet accuracy. Random-crop data augmentation encourages reliance on texture: Models trained without crops have lower accuracy but higher shape bias. Finally, hyperparameter combinations that yield similar accuracy are associated with vastly different levels of shape bias. Our results suggest general strategies to reduce texture bias in neural networks, and raise questions for human-machine comparison studies. This is joint work with Simon Kornblith (Google Brain Toronto); full paper at: https://arxiv.org/pdf/1911.09071.pdf.
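The claim that shape remains decodable from hidden representations can be illustrated with a linear probe on toy data. The sketch below (all dimensions, weights, and labels are synthetic assumptions, not the paper's setup) builds feature vectors in which a texture cue dominates a weaker, independent shape cue, then shows a logistic-regression probe can still recover the shape label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy cue-conflict setup (hypothetical dimensions): hidden representations
# mix a weak shape-driven signal with a stronger texture-driven signal,
# with shape and texture labels assigned independently.
rng = np.random.default_rng(0)
n, d = 2000, 64
shape_labels = rng.integers(0, 2, size=n)
texture_labels = rng.integers(0, 2, size=n)  # independent of shape

shape_dir = rng.normal(size=d)
texture_dir = rng.normal(size=d)
X = (0.5 * shape_labels[:, None] * shape_dir      # weak shape cue
     + 1.5 * texture_labels[:, None] * texture_dir  # strong texture cue
     + rng.normal(scale=1.0, size=(n, d)))          # noise

# A linear probe trained on shape labels recovers shape despite the
# texture cue carrying three times the weight in the features.
X_tr, X_te, y_tr, y_te = train_test_split(X, shape_labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"shape probe test accuracy: {probe.score(X_te, y_te):.2f}")
```

This mirrors the abstract's point only in spirit: a readout trained for one cue (texture) does not preclude another cue (shape) from being linearly accessible in the same representation.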