
Publications


Featured research published by Alessandro Bissacco.


Computer Vision and Pattern Recognition | 2009

Tour the world: Building a web-scale landmark recognition engine

Yan-Tao Zheng; Ming Zhao; Yang Song; Hartwig Adam; Ulrich Buddemeier; Alessandro Bissacco; Fernando Brucher; Tat-Seng Chua; Hartmut Neven

Modeling and recognizing landmarks at world-scale is a useful yet challenging task. There exists no readily available list of worldwide landmarks. Obtaining reliable visual models for each landmark can also pose problems, and efficiency is another challenge for such a large scale system. This paper leverages the vast amount of multimedia data on the Web, the availability of an Internet image search engine, and advances in object recognition and clustering techniques, to address these issues. First, a comprehensive list of landmarks is mined from two sources: (1) ~20 million GPS-tagged photos and (2) online tour guide Web pages. Candidate images for each landmark are then obtained from photo sharing Websites or by querying an image search engine. Second, landmark visual models are built by pruning candidate images using efficient image matching and unsupervised clustering techniques. Finally, the landmarks and their visual models are validated by checking authorship of their member images. The resulting landmark recognition engine incorporates 5312 landmarks from 1259 cities in 144 countries. The experiments demonstrate that the engine can deliver satisfactory recognition performance with high efficiency.
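The pruning step described above can be sketched in miniature. This is a hedged illustration, not the paper's implementation: cosine similarity over toy feature vectors stands in for efficient local-feature image matching, union-find stands in for the unsupervised clustering, and `validate_clusters` mirrors the authorship check (all names and thresholds here are illustrative):

```python
from itertools import combinations

def cluster_images(features, threshold=0.9):
    """Union-find clustering: images whose feature similarity exceeds
    the threshold end up in the same visual cluster."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    for i, j in combinations(range(len(features)), 2):
        if cosine(features[i], features[j]) > threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(features)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def validate_clusters(clusters, authors, min_authors=2):
    """Keep clusters whose images come from at least `min_authors`
    distinct photographers (the authorship validation in the abstract)."""
    return [c for c in clusters
            if len({authors[i] for i in c}) >= min_authors]
```

A cluster supported by a single author is discarded as unreliable, which is the intuition behind the authorship validation.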


International Conference on Computer Vision | 2013

PhotoOCR: Reading Text in Uncontrolled Conditions

Alessandro Bissacco; Mark Joseph Cummins; Yuval Netzer; Hartmut Neven

We describe PhotoOCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern data-center-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency: mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.
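The core idea of combining per-character classifier scores with a language model can be sketched as a beam search. This is a toy stand-in under stated assumptions, not the paper's data-center-scale system: a bigram table replaces the distributed language model, and all scores are illustrative:

```python
import math

def beam_search(char_scores, bigram_lp, beam_width=3, lm_weight=1.0):
    """Combine per-position character log-probabilities with a bigram
    language-model score, keeping the `beam_width` best partial words."""
    beams = [("", 0.0)]  # (partial text, total log-prob)
    for scores in char_scores:  # one {char: log-prob} dict per position
        candidates = []
        for text, lp in beams:
            for ch, char_lp in scores.items():
                prev = text[-1] if text else "^"  # "^" marks word start
                lm_lp = bigram_lp.get((prev, ch), math.log(1e-6))
                candidates.append((text + ch, lp + char_lp + lm_weight * lm_lp))
        beams = sorted(candidates, key=lambda b: -b[1])[:beam_width]
    return beams[0][0]
```

Even when the classifier slightly prefers a wrong character (e.g. a digit inside a word), the language-model term can flip the decision, which is the benefit the abstract describes.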


Computer Vision and Pattern Recognition | 2001

Recognition of human gaits

Alessandro Bissacco; Alessandro Chiuso; Yi Ma; Stefano Soatto

We pose the problem of recognizing different types of human gait in the space of dynamical systems, where each gait is represented by a model. Established techniques are employed to track a kinematic model of a human body in motion, and the trajectories of the parameters are used to learn a representation of a dynamical system, which defines a gait. Various types of distance between models are then computed. These computations are non-trivial because, even for the case of linear systems, the space of canonical realizations is not linear.
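To make the "distance between dynamical models" idea concrete, here is a drastically simplified sketch: fit a scalar AR(2) model to each trajectory by least squares and compare coefficient vectors. This Euclidean comparison is an assumption for illustration only; it sidesteps the non-linearity of the space of canonical realizations that the abstract identifies as the real difficulty:

```python
def fit_ar2(x):
    """Least-squares fit of x[t] = a1*x[t-1] + a2*x[t-2] (scalar AR(2)),
    solving the 2x2 normal equations by hand."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(x)):
        s11 += x[t - 1] * x[t - 1]
        s12 += x[t - 1] * x[t - 2]
        s22 += x[t - 2] * x[t - 2]
        b1 += x[t] * x[t - 1]
        b2 += x[t] * x[t - 2]
    det = s11 * s22 - s12 * s12
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2

def model_distance(x, y):
    """Euclidean distance between AR(2) coefficient vectors -- a crude
    stand-in for principled distances between canonical realizations."""
    ax, ay = fit_ar2(x), fit_ar2(y)
    return ((ax[0] - ay[0]) ** 2 + (ax[1] - ay[1]) ** 2) ** 0.5
```

On noise-free data generated by a known AR(2) system, the fit recovers the true coefficients, and two trajectories from the same system are at distance zero.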


Computer Vision and Pattern Recognition | 2007

Fast Human Pose Estimation using Appearance and Motion via Multi-Dimensional Boosting Regression

Alessandro Bissacco; Ming-Hsuan Yang; Stefano Soatto

We address the problem of estimating human pose in video sequences, where rough location has been determined. We exploit both appearance and motion information by defining suitable features of an image and its temporal neighbors, and learning a regression map to the parameters of a model of the human body using boosting techniques. Our algorithm can be viewed as a fast initialization step for human body trackers, or as a tracker itself. We extend gradient boosting techniques to learn a multi-dimensional map from (rotated and scaled) Haar features to the entire set of joint angles representing the full body pose. We test our approach by learning a map from image patches to body joint angles from synchronized video and motion capture walking data. We show how our technique enables learning an efficient real-time pose estimator, validated on publicly available datasets.
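The multi-dimensional boosting regression can be sketched with one-feature decision stumps whose leaf values are full residual-mean vectors, so that all output dimensions (joint angles) are fit jointly. A minimal toy sketch, not the paper's Haar-feature implementation; rounds and learning rate are illustrative:

```python
def boost_regress(X, Y, rounds=50, lr=0.5):
    """Gradient boosting for vector-valued regression: each round fits a
    one-feature stump to the residuals, with vector leaf values."""
    n, dim = len(X), len(Y[0])
    pred = [[0.0] * dim for _ in range(n)]
    stumps = []
    for _ in range(rounds):
        R = [[y[d] - p[d] for d in range(dim)] for y, p in zip(Y, pred)]
        best = None
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                lo = [i for i in range(n) if X[i][f] <= thr]
                hi = [i for i in range(n) if X[i][f] > thr]
                if not lo or not hi:
                    continue
                vlo = [sum(R[i][d] for i in lo) / len(lo) for d in range(dim)]
                vhi = [sum(R[i][d] for i in hi) / len(hi) for d in range(dim)]
                err = sum((R[i][d] - (vlo if i in lo else vhi)[d]) ** 2
                          for i in range(n) for d in range(dim))
                if best is None or err < best[0]:
                    best = (err, f, thr, vlo, vhi)
        _, f, thr, vlo, vhi = best
        stumps.append((f, thr, vlo, vhi))
        for i in range(n):
            v = vlo if X[i][f] <= thr else vhi
            for d in range(dim):
                pred[i][d] += lr * v[d]

    def predict(x):
        out = [0.0] * dim
        for f, thr, vlo, vhi in stumps:
            v = vlo if x[f] <= thr else vhi
            for d in range(dim):
                out[d] += lr * v[d]
        return out
    return predict
```

The key design choice mirrored here is that one weak learner updates the entire pose vector at once, rather than training an independent booster per joint angle.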


International Conference on Computer Vision | 2009

Large-scale privacy protection in Google Street View

Andrea Frome; German Cheung; Ahmad Abdulkader; Marco Zennaro; Bo Wu; Alessandro Bissacco; Hartwig Adam; Hartmut Neven; Luc Vincent

The last two years have witnessed the introduction and rapid expansion of products based upon large, systematically-gathered, street-level image collections, such as Google Street View, EveryScape, and Mapjack. In the process of gathering images of public spaces, these projects also capture license plates, faces, and other information considered sensitive from a privacy standpoint. In this work, we present a system that addresses the challenge of automatically detecting and blurring faces and license plates for the purpose of privacy protection in Google Street View. Though some in the field would claim face detection is “solved”, we show that state-of-the-art face detectors alone are not sufficient to achieve the recall desired for large-scale privacy protection. In this paper we present a system that combines a standard sliding-window detector tuned for a high-recall, low-precision operating point with a fast post-processing stage that is able to remove additional false positives by incorporating domain-specific information not available to the sliding-window detector. Using a completely automatic system, we are able to sufficiently blur more than 89% of faces and 94–96% of license plates in evaluation sets sampled from Google Street View imagery.
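The two-stage structure, a deliberately permissive detector followed by a domain-specific filter, can be sketched as follows. The size-range rule and all thresholds are hypothetical stand-ins, not the paper's actual post-processing:

```python
def detect_candidates(windows, score_thresh=0.2):
    """Stage 1: keep every sliding-window hit above a deliberately low
    score threshold (high recall, low precision)."""
    return [w for w in windows if w["score"] >= score_thresh]

def prune_candidates(candidates, min_h=10, max_h=120):
    """Stage 2: remove false positives with domain knowledge the detector
    lacks; here, a hypothetical plausible pixel-height range for faces
    in street-level imagery."""
    return [w for w in candidates if min_h <= w["h"] <= max_h]
```

Lowering the stage-1 threshold recovers the marginal true faces the abstract is concerned with; stage 2 then discards implausible hits (e.g. billboard-sized "faces") so precision is restored without losing recall.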


ACM Multimedia | 2009

Tour the world: a technical demonstration of a web-scale landmark recognition engine

Yan-Tao Zheng; Ming Zhao; Yang Song; Hartwig Adam; Ulrich Buddemeier; Alessandro Bissacco; Fernando Brucher; Tat-Seng Chua; Hartmut Neven; Jay Yagnik

We present a technical demonstration of a world-scale touristic landmark recognition engine. To build such an engine, we leverage ~21.4 million images, from photo sharing websites and Google Image Search, and around two thousand web articles to mine the landmark names and learn the visual models. The landmark recognition engine incorporates 5312 landmarks from 1259 cities in 144 countries. This demonstration gives three exhibits: (1) a live landmark recognition engine that can visually recognize landmarks in a given image; (2) an interactive navigation tool showing landmarks on Google Earth; and (3) sample visual clusters (landmark model images) and a list of 1000 randomly selected landmarks from our recognition engine with their iconic images.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2007

Classification and Recognition of Dynamical Models: The Role of Phase, Independent Components, Kernels and Optimal Transport

Alessandro Bissacco; Alessandro Chiuso; Stefano Soatto

We address the problem of performing decision tasks and, in particular, classification and recognition in the space of dynamical models in order to compare time series of data. Motivated by the application of recognition of human motion in image sequences, we consider a class of models that include linear dynamics, both stable and marginally stable (periodic), both minimum and nonminimum phases, driven by non-Gaussian processes. This requires extending existing learning and system identification algorithms to handle periodic modes and nonminimum-phase behavior while taking into account higher order statistics of the data. Once a model is identified, we define a kernel-based cord distance between models, which includes their dynamics, their initial conditions, and input distribution. This is made possible by a novel kernel defined between two arbitrary (non-Gaussian) distributions, which is computed by efficiently solving an optimal transport problem. We validate our choice of models, inference algorithm, and distance on the tasks of human motion synthesis (sample paths of the learned models) and recognition (nearest-neighbor classification in the computed distance). However, our work can be applied more broadly where one needs to compare historical data while taking into account periodic trends, nonminimum-phase behavior, and non-Gaussian input distributions.
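The idea of a kernel between arbitrary distributions computed via optimal transport has a simple special case: in one dimension, the transport distance between two equal-size empirical samples reduces to matching sorted order. The exponential kernel shape below is an illustrative assumption, not the paper's construction:

```python
import math

def wasserstein_1d(xs, ys):
    """1-D optimal transport (earth mover's) distance between two
    equal-size empirical samples: the optimal plan matches sorted order."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

def ot_kernel(xs, ys, gamma=1.0):
    """Similarity between two (possibly non-Gaussian) empirical
    distributions via their transport distance. Note: exponentiating a
    transport distance is not guaranteed to give a positive-definite
    kernel in general; this is a sketch, not the paper's kernel."""
    return math.exp(-gamma * wasserstein_1d(xs, ys))
```

In higher dimensions, or between continuous distributions, computing this distance requires solving a genuine optimal transport problem, which is the efficiency challenge the abstract addresses.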


Computer Vision and Pattern Recognition | 2005

Modeling and learning contact dynamics in human motion

Alessandro Bissacco

We propose a simple model of human motion as a switching linear dynamical system where the switches correspond to contact forces with the ground. This significantly improves the modeling performance when compared to simpler linear systems, with only marginal increase in complexity. We introduce a novel closed-form (non-iterative) algorithm to estimate the switches and learn the model parameters in between switches. We validate our model qualitatively by running simulations, and quantitatively by computing prediction errors that show significant improvements over previous approaches using linear models.
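A scalar toy version of a switching linear dynamical system, with a non-iterative switch estimate in the spirit of the abstract: at each step, pick the regime whose dynamics best explain the observed transition. The specific parameters and the nearest-regime rule are illustrative assumptions, not the paper's estimator:

```python
def simulate(params, regimes, x0=1.0):
    """Roll out a scalar switching linear system: the active regime r
    selects the (a, b) of x[t] = a*x[t-1] + b."""
    xs = [x0]
    for r in regimes:
        a, b = params[r]
        xs.append(a * xs[-1] + b)
    return xs

def detect_regimes(xs, params):
    """Closed-form (non-iterative) switch estimation: at each step pick
    the regime whose dynamics best explain the observed transition."""
    est = []
    for t in range(1, len(xs)):
        errs = [abs(xs[t] - (a * xs[t - 1] + b)) for a, b in params]
        est.append(errs.index(min(errs)))
    return est
```

On noise-free data the estimated switch sequence matches the true one exactly; the regime boundaries play the role of the ground-contact events in the motion model.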


International Journal of Computer Vision | 2009

Hybrid Dynamical Models of Human Motion for the Recognition of Human Gaits

Alessandro Bissacco; Stefano Soatto

We propose a hybrid dynamical model of human motion and develop a classification algorithm for the purpose of analysis and recognition. We assume that some temporal statistics are extracted from the images, and use them to infer a dynamical model that explicitly represents ground contact events. Such events correspond to “switches” between symmetric sets of hidden parameters in an auto-regressive model. We propose novel algorithms to estimate switches and model parameters, and develop a distance between such models that explicitly factors out exogenous inputs that are not unique to an individual or his/her gait. We show that such a distance is more discriminative than the distance between simple linear systems for the task of gait recognition.


European Conference on Computer Vision | 2004

Modeling and Synthesis of Facial Motion Driven by Speech

Payam Saisan; Alessandro Bissacco; Alessandro Chiuso; Stefano Soatto

We introduce a novel approach to modeling the dynamics of human facial motion induced by the action of speech for the purpose of synthesis. We represent the trajectories of a number of salient features on the human face as the output of a dynamical system made up of two subsystems, one driven by the deterministic speech input, and a second driven by an unknown stochastic input. Inference of the model (learning) is performed automatically and involves an extension of independent component analysis to time-dependent data. Using a shape-texture decompositional representation for the face, we generate facial image sequences reconstructed from synthesized feature point positions.
