Andrea Tacchetti
Massachusetts Institute of Technology
Publications
Featured research published by Andrea Tacchetti.
Archive | 2017
Jim Mutch; Fabio Anselmi; Andrea Tacchetti; Lorenzo Rosasco; Joel Z. Leibo; Tomaso Poggio
Tuning properties of simple cells in cortical V1 can be described in terms of a “universal shape” characterized quantitatively by parameter values that hold across different species (Jones and Palmer 1987; Ringach 2002; Niell and Stryker 2008). This puzzling set of findings calls for a general explanation grounded in an evolutionarily important computational function of the visual cortex. We show here that these properties are quantitatively predicted by the hypothesis that the goal of the ventral stream is to compute, for each image, a “signature” vector that is invariant to geometric transformations (Anselmi et al. 2013b). The mechanism for continuously learning and maintaining invariance may be the storage, via Hebbian synapses, of a sequence of neural images of a few (arbitrary) objects as they undergo transformations such as translation, scale change, and rotation. For V1 simple cells this hypothesis implies that the tuning of neurons converges to the eigenvectors of the covariance of their input. Starting with a set of dendritic fields spanning a range of sizes, we show with simulations, suggested by a direct analysis, that the solution of the associated “cortical equation” effectively provides a set of Gabor-like shapes with parameter values that quantitatively agree with the physiology data. The same theory yields predictions about the tuning of cells in V4 and in the face patch AL (Leibo et al. 2013a) that are in qualitative agreement with physiology data.
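The central computational claim here, that Hebbian learning drives a neuron's tuning toward the eigenvectors of the covariance of its transformed, aperture-limited input, can be illustrated with a toy simulation. The sketch below is a minimal NumPy rendition under stated assumptions: a 1-D retina, a Gaussian dendritic aperture, and three random templates stand in for the paper's actual setup, and an explicit eigen-decomposition stands in for the Hebbian dynamics whose fixed points it computes.

```python
# Toy illustration (not the paper's model): eigenvectors of the covariance of
# translated, aperture-windowed inputs are Gabor-like on a 1-D "retina".
import numpy as np

rng = np.random.default_rng(0)
n = 32                                                # retina size (assumption)
x = np.arange(n)
aperture = np.exp(-0.5 * ((x - n / 2) / 6.0) ** 2)    # Gaussian dendritic field

# Hebbian storage of a few arbitrary templates seen under all translations.
templates = rng.standard_normal((3, n))
views = np.array([np.roll(t, s) * aperture
                  for t in templates for s in range(n)])

# Tuning converges to eigenvectors of the input covariance.
cov = views.T @ views / len(views)
eigvals, eigvecs = np.linalg.eigh(cov)                # ascending eigenvalues
receptive_fields = eigvecs[:, ::-1][:, :4]            # leading modes: windowed
                                                      # oscillations, Gabor-like
```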
PLOS Computational Biology | 2017
Andrea Tacchetti; Leyla Isik; Tomaso Poggio
Recognizing the actions of others from complex visual scenes is an essential task for humans. Here we investigate the computational mechanisms that support action recognition in the human visual system. We use a novel dataset of well-controlled naturalistic videos of five actions performed by five actors from five viewpoints, and extend a class of biologically inspired hierarchical computational models of object recognition to recognize actions from videos. We explore a number of variations within the class of convolutional neural networks, assessing classification accuracy on a viewpoint-invariant action recognition task as well as correlation with magnetoencephalography (MEG) recordings of the human brain acquired using the same stimuli. We show that feed-forward spatio-temporal convolutional neural networks solve the task of invariant action recognition and account for the majority of the explainable variance in the neural data. Furthermore, we show that model features that improve performance on viewpoint-invariant action recognition lead to a model representation that better matches the representation encoded in the neural data. These results advance the idea that robustness to complex transformations, such as 3D viewpoint invariance, is a specific goal of visual processing in the human brain.

Recognizing the actions of others from visual stimuli is a crucial aspect of human perception that allows individuals to respond to social cues. Humans are able to discriminate between similar actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of human visual intelligence. Advances in understanding action recognition at the neural level have not always translated into precise accounts of the computational principles underlying the representations of action sequences constructed by human visual cortex. Here we test the hypothesis that invariant action discrimination might fill this gap. Recently, the study of artificial systems for static object perception has produced models, convolutional neural networks (CNNs), that achieve human-level performance in complex discriminative tasks. Within this class, architectures that better support invariant object recognition also produce image representations that better match those implied by human and primate neural data. However, whether these models produce representations of action sequences that support recognition across complex transformations and closely follow neural representations of actions remains unknown. Here we show that spatiotemporal CNNs accurately categorize video stimuli into action classes, and that deliberate model modifications that improve performance on an invariant action recognition task lead to data representations that better match human neural recordings. Our results support the hypothesis that performance on invariant discrimination dictates the neural representations of actions computed in the brain. These results broaden the scope of the invariant recognition framework for understanding visual intelligence, from the perception of inanimate objects and faces in static images to the study of human perception of action sequences.
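To make the model class concrete, here is a minimal feed-forward spatio-temporal CNN in PyTorch. The layer counts, kernel sizes, clip length, and resolution are illustrative assumptions rather than the architectures evaluated in the paper; the point is only that 3-D (time × height × width) convolutions map a video clip to action-class scores.

```python
# Minimal spatio-temporal CNN sketch (illustrative, not the paper's architecture).
import torch
import torch.nn as nn

class SpatioTemporalCNN(nn.Module):
    def __init__(self, num_actions: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)),  # (t, h, w)
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),          # pool over time and space
        )
        self.classifier = nn.Linear(32, num_actions)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, channels, frames, height, width)
        return self.classifier(self.features(clips).flatten(1))

model = SpatioTemporalCNN()
logits = model(torch.randn(2, 3, 16, 64, 64))   # two 16-frame RGB clips
```

The intermediate activations of such a network are what would be compared against the MEG recordings, for instance by correlating the dissimilarity structure of model features with that of the neural responses to the same stimuli.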
Journal of Vision | 2015
Andrea Tacchetti; Leyla Isik; Tomaso Poggio
The human brain can rapidly parse a constant stream of visual input. The majority of visual neuroscience studies, however, focus on responses to static, still-frame images. Here we use magnetoencephalography (MEG) decoding and a computational model to study invariant action recognition in videos. We created a well-controlled, naturalistic dataset to study action recognition across different views and actors. We find that, like objects, actions can be read out from MEG data in under 200 ms (after the subject has viewed only 5 frames of video). Actions can also be decoded across actor and viewpoint, showing that this early representation is invariant. Finally, we extended the HMAX model, which is traditionally used to perform size- and position-invariant object recognition in images and is inspired by Hubel and Wiesel's findings on simple and complex cells in primary visual cortex as well as a recent computational theory of feedforward invariant systems, to recognize actions. We show that instantiations of this model class can also perform recognition in natural videos that is robust to non-affine transformations. Specifically, view-invariant action recognition and action-invariant actor identification can be achieved by pooling across views or actions, in the same manner and model layer as for affine transformations (size and position) in traditional HMAX. Together these results provide a temporal map of the first few hundred milliseconds of human action recognition as well as a mechanistic explanation of the computations underlying invariant visual recognition. Meeting abstract presented at VSS 2015.
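The pooling step described above, taking the max over stored viewpoint-specific templates of one action to obtain a view-tolerant response, mirrors how HMAX C-units pool over position and scale. A minimal sketch follows, with illustrative names and dimensions rather than the model's actual features.

```python
# Sketch of invariance pooling in an HMAX-style layer (illustrative assumptions).
import numpy as np

def pooled_response(features: np.ndarray, action_templates: np.ndarray) -> float:
    """Max over dot products with stored views of one action.

    action_templates: (n_views, d) snapshots of the same action from
    several viewpoints; pooling over them yields a view-tolerant unit.
    """
    return float(np.max(action_templates @ features))

rng = np.random.default_rng(1)
d = 128                                           # feature dimension (assumption)
templates = {a: rng.standard_normal((5, d)) for a in ("walk", "run", "jump")}
stimulus = rng.standard_normal(d)                 # features of a new video
signature = np.array([pooled_response(stimulus, t) for t in templates.values()])
```

Symmetrically, pooling across actions for a fixed actor would yield the action-invariant actor identification the abstract mentions.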
Annual Review of Vision Science | 2018
Andrea Tacchetti; Leyla Isik; Tomaso Poggio
Recognizing the people, objects, and actions in the world around us is a crucial aspect of human perception that allows us to plan and act in our environment. Remarkably, our proficiency in recognizing semantic categories from visual input is unhindered by transformations that substantially alter their appearance (e.g., changes in lighting or position). The ability to generalize across these complex transformations is a hallmark of human visual intelligence and has been the focus of wide-ranging investigation in systems and computational neuroscience. However, while the neural machinery of human visual perception has been thoroughly described, the computational principles dictating its functioning remain unknown. Here, we review recent results in brain imaging, neurophysiology, and computational neuroscience in support of the hypothesis that the need to support invariant recognition of semantic entities in the visual world shapes which neural representations of sensory input are computed by human visual cortex.
arXiv: Computer Vision and Pattern Recognition | 2013
Fabio Anselmi; Joel Z. Leibo; Lorenzo Rosasco; Jim Mutch; Andrea Tacchetti; Tomaso Poggio
Journal of Machine Learning Research | 2013
Andrea Tacchetti; Pavan Kumar Mallapragada; Matteo Santoro; Lorenzo Rosasco
Theoretical Computer Science | 2016
Fabio Anselmi; Joel Z. Leibo; Lorenzo Rosasco; James Vincent Mutch; Andrea Tacchetti; Tomaso Poggio
Archive | 2012
Tomaso Poggio; Jim Mutch; Joel Z. Leibo; Lorenzo Rosasco; Andrea Tacchetti
Archive | 2014
Fabio Anselmi; Joel Z. Leibo; Lorenzo Rosasco; Jim Mutch; Andrea Tacchetti; Tomaso Poggio
arXiv: Computer Vision and Pattern Recognition | 2017
Nicholas Watters; Andrea Tacchetti; Theophane Weber; Razvan Pascanu; Peter Battaglia; Daniel Zoran