Ruggero Milanese
University of Geneva
Publication
Featured research published by Ruggero Milanese.
Optical Engineering | 1995
Ruggero Milanese; Sylvia Gil; Thierry Pun
Attention mechanisms extract regions of interest from image data to reduce the amount of information to be analyzed by time-consuming processes such as image transmission, robot navigation, and object recognition. Two such mechanisms are described. The first is an alerting system that extracts moving objects in a sequence through the use of multiresolution representations. The second detects regions in still images that are likely to contain objects of interest. Two types of cues are integrated to compute this measure of interest. First, bottom-up cues result from the decomposition of the input image into a number of feature and conspicuity maps. Second, top-down cues are obtained from a priori knowledge about target objects, represented through invariant models. Results are reported for both the alerting and the attention mechanisms on cluttered and noisy scenes.
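The alerting stage can be pictured with a minimal sketch, assuming that plain frame differencing on a block-averaged image pyramid stands in for the multiresolution representation; the function names, the pyramid level, and the threshold are hypothetical, and this is not the authors' algorithm. Comparing frames at a coarse level suppresses pixel noise and reduces the number of cells to inspect.

```python
import numpy as np


def downsample(img: np.ndarray) -> np.ndarray:
    """Halve the resolution by averaging non-overlapping 2x2 blocks (one pyramid level)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def pyramid(img: np.ndarray, levels: int) -> list:
    """Return [full resolution, 1/2, 1/4, ...] with `levels` entries."""
    out = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out


def alert_mask(prev: np.ndarray, curr: np.ndarray, level: int = 2, threshold: float = 10.0) -> np.ndarray:
    """Hypothetical alerting rule: flag coarse cells whose mean intensity changed by more than `threshold`."""
    p = pyramid(prev, level + 1)[level]
    c = pyramid(curr, level + 1)[level]
    return np.abs(c - p) > threshold
```

Each `True` cell at the coarse level marks a region that a finer, more expensive analysis could then examine.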
Journal of Visual Communication and Image Representation | 1999
Ruggero Milanese; Michel Cherbuliez
We describe a method for computing an image signature suitable for content-based retrieval from image databases. The signature is extracted from the Fourier power spectrum by mapping it from Cartesian to logarithmic-polar coordinates, projecting this mapping onto two 1D signature vectors, and computing their power spectrum coefficients. As with wavelet-based approaches, this representation is holistic and thus provides a compact description of all image aspects, including shape, texture, and color. Furthermore, it has the advantage of being invariant to 2D rigid transformations, such as any combination of rotation, scaling, and translation. Experiments have been conducted on a database of 2082 images extracted from various news video clips. Results confirm invariance to 2D rigid transformations, as well as high resilience to more general affine and projective transformations. Moreover, the signature appears to capture perceptually relevant image features, in that it allows successful database querying using example images that have been subjected to arbitrary camera and subject motion.
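A minimal numpy sketch of the signature pipeline described above, assuming nearest-neighbour log-polar sampling and hypothetical grid sizes `n_rho` and `n_theta` (not taken from the paper). Translation is removed by the 2D power spectrum; rotation and scaling become shifts along the angular and log-radial axes, which the final 1D power spectra discard. Because the log-radial axis is not periodic, scale invariance in this sketch is only approximate.

```python
import numpy as np


def log_polar_signature(img, n_rho=32, n_theta=64):
    """Sketch of a rotation/scale/translation-tolerant signature (parameters are hypothetical)."""
    img = np.asarray(img, dtype=float)
    # 1. 2D power spectrum: unchanged by (circular) translation of the image.
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = power.shape
    cy, cx = h // 2, w // 2
    r_max = min(cy, cx) - 1
    # 2. Log-polar grid; rho starts at 1 so the DC term is skipped.
    rho = np.exp(np.linspace(0.0, np.log(r_max), n_rho))
    # The power spectrum of a real image is point-symmetric, so angles in [0, pi) suffice.
    theta = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    ys = np.rint(cy + rho[:, None] * np.sin(theta)[None, :]).astype(int)
    xs = np.rint(cx + rho[:, None] * np.cos(theta)[None, :]).astype(int)
    lp = power[ys, xs]  # shape (n_rho, n_theta), nearest-neighbour sampling
    # 3. Project onto two 1D vectors.
    radial = lp.sum(axis=1)   # image scaling shifts this profile along log-rho (approximately)
    angular = lp.sum(axis=0)  # image rotation circularly shifts this profile along theta
    # 4. 1D power spectra discard those shifts.
    sig = np.concatenate([np.abs(np.fft.fft(radial)) ** 2, np.abs(np.fft.fft(angular)) ** 2])
    # Normalizing removes any global intensity gain.
    return sig / (np.linalg.norm(sig) + 1e-12)
```

Two images can then be compared by the Euclidean distance between their signatures, which is how a query-by-example lookup over a database could use it.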
Pattern Recognition | 1996
Sylvia Gil; Ruggero Milanese; Thierry Pun
This paper describes a motion-analysis system applied to the problem of vehicle tracking in real-world highway scenes. In the first stage, a motion-detection algorithm performs a figure/ground segmentation, providing binary masks of the moving objects. In the second stage, vehicles are tracked by using Kalman filters for two state vectors, which represent each target's position and velocity. Three types of features have been used: (i) the bounding rectangle, (ii) the centroid of the convex polygon approximating the vehicle's contour, and (iii) the 2-D pattern of the vehicle. For each feature, the performance of the tracking algorithm has been tested in terms of robustness and computing time.
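Tracking position and velocity with a Kalman filter can be sketched as follows, assuming a constant-velocity motion model, a measured 2-D position (for instance the centroid feature), and hypothetical noise levels `q` and `r`. For brevity the sketch combines position and velocity in a single state vector; the paper's actual state vectors and parameters may differ.

```python
import numpy as np


def make_cv_model(dt=1.0, q=1e-2, r=1.0):
    """Constant-velocity model; state x = [px, py, vx, vy] (q, r are hypothetical)."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)  # only the position is measured
    Q = q * np.eye(4)  # process noise
    R = r * np.eye(2)  # measurement noise
    return F, H, Q, R


def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle for measurement z."""
    # Predict the state and its covariance one frame ahead.
    x = F @ x
    P = F @ P @ F.T + Q
    # Correct with the measured position.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

Feeding one measured position per frame to `kalman_step` yields a velocity estimate as a by-product, and the predicted position can restrict where the tracker searches in the next frame.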
European Conference on Computer Vision | 1996
Sylvia Gil; Ruggero Milanese; Thierry Pun
In this paper, the problem of combining estimates provided by multiple models is considered, with application to vehicle tracking. Two tracking systems, based respectively on the bounding box and on the 2-D pattern of the targets, provide individual motion-parameter estimates to the combining method, which in turn produces a global estimate. Two methods are proposed to combine the estimates of these tracking systems: one is based on their covariance matrices, while the other employs a Kalman filter model. Results are provided on three image sequences taken under different viewpoints, weather conditions, and vehicle/road contrasts. Two evaluations are made. First, the performance of the individual and global estimates is compared. Second, the two global estimates are compared, and the second method is found to be superior to the first.
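One common way to combine two estimates through their covariances is inverse-covariance (information) weighting, sketched below under the assumption that the two trackers' errors are independent; whether this matches the paper's covariance-based method is an assumption.

```python
import numpy as np


def fuse(x1, P1, x2, P2):
    """Combine two estimates by inverse-covariance weighting (assumes independent errors)."""
    I1 = np.linalg.inv(P1)  # information (confidence) of estimate 1
    I2 = np.linalg.inv(P2)  # information of estimate 2
    P = np.linalg.inv(I1 + I2)       # fused covariance: smaller than either input
    x = P @ (I1 @ x1 + I2 @ x2)      # more confident estimate gets more weight
    return x, P
```

With equal covariances the result is the plain average, while a tracker with a larger covariance (for example, a 2-D pattern tracker under low vehicle/road contrast) pulls the global estimate less.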
International Conference on Image Processing | 1996
Ruggero Milanese; David McG. Squire; Thierry Pun
This paper describes a two-stage statistical approach supporting content-based search in image databases. The first stage performs correspondence analysis, a factor analysis method transforming image attributes into a reduced-size, uncorrelated factor space. The second stage performs ascendant hierarchical classification, an iterative clustering method which constructs a hierarchical index structure for the images of the database. Experimental results supporting the applicability of both techniques to data sets of heterogeneous images are reported.
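The two stages can be sketched in numpy under stated assumptions: correspondence analysis computed from the SVD of the standardized residuals of a nonnegative image-by-attribute table, followed by a naive bottom-up clustering with Ward's criterion as a stand-in for ascendant hierarchical classification. The attribute table and the function names are hypothetical.

```python
import numpy as np


def correspondence_analysis(N, k=2):
    """Row principal coordinates of a nonnegative table N (rows = images, cols = attributes)."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)  # row masses (all must be > 0)
    c = P.sum(axis=0)  # column masses (all must be > 0)
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Uncorrelated factors; Euclidean distances between rows approximate chi-square distances.
    F = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]
    return F, sv[:k] ** 2  # coordinates and principal inertias


def ward_merges(X):
    """Naive agglomerative clustering with Ward's criterion; returns the merge order."""
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                A, B = X[clusters[a]], X[clusters[b]]
                # Increase in within-cluster variance caused by merging A and B.
                d = len(A) * len(B) / (len(A) + len(B)) * np.sum((A.mean(0) - B.mean(0)) ** 2)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((a, b))
    return merges
```

The merge list defines a binary tree over the images, which could serve as the hierarchical index structure: a query descends toward the subtree whose centroid is closest in factor space.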
Multimedia Storage and Archiving Systems II | 1997
Ruggero Milanese; Frédéric Deguillaume; Alain Jacot-Descombes
We address the problem of automatically extracting visual indexes from videos in order to provide sophisticated access methods to the contents of a video server. We focus on two tasks, namely the decomposition of a video clip into uniform segments and the characterization of each shot by camera motion parameters. For the first task, we use a Bayesian classification approach to detect scene cuts by analyzing motion vectors. For the second task, a least-squares fitting procedure determines the pan/tilt/zoom camera parameters. To guarantee the highest processing speed, all techniques directly process and analyze MPEG-1 motion vectors, without the need for video decompression. Experimental results are reported for a database of news video clips.
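A least-squares camera-motion fit can be sketched as below, assuming a simple hypothetical model in which pan and tilt add a constant horizontal and vertical displacement and zoom adds a displacement proportional to the distance from the image center (u = pan + zoom*x, v = tilt + zoom*y). The paper's exact parameterization is not stated in this abstract.

```python
import numpy as np


def fit_pan_tilt_zoom(xs, ys, us, vs):
    """Least-squares fit of u = pan + zoom*x, v = tilt + zoom*y (hypothetical model).

    (x, y): macroblock centers relative to the image center; (u, v): their motion vectors.
    """
    xs, ys, us, vs = (np.asarray(a, dtype=float) for a in (xs, ys, us, vs))
    n = len(xs)
    A = np.zeros((2 * n, 3))  # unknowns: [pan, tilt, zoom]
    A[:n, 0] = 1.0            # pan shifts the horizontal component
    A[:n, 2] = xs             # zoom: horizontal divergence from the center
    A[n:, 1] = 1.0            # tilt shifts the vertical component
    A[n:, 2] = ys             # zoom: vertical divergence from the center
    b = np.concatenate([us, vs])
    (pan, tilt, zoom), *_ = np.linalg.lstsq(A, b, rcond=None)
    return pan, tilt, zoom
```

Because the motion vectors are read directly from the MPEG-1 stream, the only per-frame cost is one small linear least-squares solve.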
Real-time Imaging | 1999
Ruggero Milanese; Frédéric Deguillaume; Alain Jacot-Descombes
In order to provide sophisticated access methods to the contents of video servers, it is necessary to automatically process and represent each video through a number of visual indexes. We focus on two tasks, namely the hierarchical representation of a video as a sequence of uniform segments (shots), and the characterization of each shot by a vector describing the camera motion parameters. For the first task, we use a Bayesian classification approach to detect scene cuts by analysing motion vectors. Adaptability to different compression qualities is achieved by learning different classification masks. For the second task, the optical flow is processed in order to distinguish between stationary and moving shots. A least-squares fitting procedure determines the pan/tilt/zoom camera parameters within shots that present regular motion. Each shot is then indexed by a vector representing the dominant motion components and the type of motion. To maximize processing speed, all techniques directly process and analyse MPEG-1 motion vectors, without the need for video decompression. An overall processing rate of 59 frames/s is achieved in software. Classification performance, evaluated on various news video clips totalling 61 023 frames, reaches 97.7% for shot segmentation, 88.4% for stationary vs. moving shot classification, and 94.7% for detailed camera motion characterization.
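The Bayesian cut-detection step can be illustrated with a two-class Gaussian classifier on one scalar feature per frame. The feature (for instance, some statistic of the frame's MPEG-1 motion vectors), the Gaussian likelihoods, and the class priors estimated from labelled examples are all assumptions for illustration, not the paper's classification masks.

```python
import math


def fit_gaussian(values):
    """Maximum-likelihood mean and variance of a 1D sample (variance floored for stability)."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, max(var, 1e-9)


def log_gauss(x, m, var):
    """Log of the normal density N(x; m, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)


class CutDetector:
    """Hypothetical two-class (cut / no cut) Bayesian classifier on a scalar frame feature."""

    def __init__(self, cut_feats, noncut_feats):
        self.cut = fit_gaussian(cut_feats)
        self.non = fit_gaussian(noncut_feats)
        n = len(cut_feats) + len(noncut_feats)
        # Class priors estimated from the labelled training frames.
        self.log_prior_cut = math.log(len(cut_feats) / n)
        self.log_prior_non = math.log(len(noncut_feats) / n)

    def is_cut(self, x):
        # MAP decision: compare log posteriors (the shared evidence term cancels).
        return (log_gauss(x, *self.cut) + self.log_prior_cut
                > log_gauss(x, *self.non) + self.log_prior_non)
```

Training one such detector per compression quality would be one simple way to mimic the adaptability mentioned above, though the paper's masks are not reproduced here.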
Photonics for Industrial Applications | 1995
Sylvia Gil; Ruggero Milanese; Thierry Pun
This paper describes a motion-analysis system applied to the problem of vehicle tracking in real-world highway scenes. The system is structured in two stages. In the first, a motion-detection algorithm performs a figure/ground segmentation, providing binary masks of the moving objects. In the second stage, vehicles are tracked for the rest of the sequence by using Kalman filters on two state vectors, which represent each target's position and velocity. A vehicle's motion is represented by an affine model, taking into account translations and scale changes. Three types of features have been used for the state vectors describing the vehicles. Two of them are contour-based: the bounding box and the centroid of the convex polygon approximating the vehicle's contour. The third is region-based and consists of the 2-D pattern of the vehicle in the image. For each of these features, the performance of the tracking algorithm has been tested in terms of the position error, the stability of the estimated motion parameters, the trace of the motion model's covariance matrix, and computing time. A comparison of these results favors the use of the bounding-box features.
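The affine model with translation and scale can be sketched as a least-squares fit between a vehicle's bounding boxes in two frames, assuming a single isotropic scale factor `s` and boxes given as `(x0, y0, x1, y1)`; both assumptions are for illustration only.

```python
import numpy as np


def fit_scale_translation(box, new_box):
    """Least-squares fit of x' = s*x + tx, y' = s*y + ty from two boxes (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    # Unknowns [s, tx, ty]; one equation per box coordinate.
    A = np.array([[x0, 1, 0],
                  [x1, 1, 0],
                  [y0, 0, 1],
                  [y1, 0, 1]], dtype=float)
    b = np.array([new_box[0], new_box[2], new_box[1], new_box[3]], dtype=float)
    (s, tx, ty), *_ = np.linalg.lstsq(A, b, rcond=None)
    return s, tx, ty


def apply_motion(box, s, tx, ty):
    """Predict the next bounding box under the fitted scale + translation model."""
    x0, y0, x1, y1 = box
    return (s * x0 + tx, s * y0 + ty, s * x1 + tx, s * y1 + ty)
```

A vehicle approaching the camera grows (s > 1) while drifting across the image, which is exactly the combination this model captures.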
Archive | 1999
Ruggero Milanese; Michel Cherbuliez; Thierry Pun
We describe a method for computing an image signature suitable for content-based retrieval from image databases. The signature is extracted by computing the Fourier power spectrum, performing a mapping from Cartesian to logarithmic-polar coordinates, projecting this mapping onto two 1D signature vectors, and computing their power spectrum coefficients. As with wavelet-based approaches, this representation is holistic and thus provides a compact description of all image aspects, including shape, texture, and color. Furthermore, it has the advantage of being invariant to 2D rigid transformations, such as any combination of rotation, scaling, and translation. Experiments have been conducted on a database of 2082 images extracted from various news video clips. Results confirm invariance to 2D rigid transformations, as well as high resilience to more general affine and projective transformations. Moreover, the signature appears to capture perceptually relevant image features, in that it allows successful database querying using example images that have been subjected to arbitrary camera and subject motion.
Proceedings of PerAc '94. From Perception to Action | 1994
Ruggero Milanese; Thierry Pun; S. Gil; J.-M. Bost
When computer vision algorithms are applied to autonomous robots, several dynamic aspects of perception must be taken into account. Among the most important is the capability to modify the image acquisition parameters in order to perform ocular saccades or to visually track moving objects. This ability is provided by the alerting and attention mechanisms, which allow one to rapidly detect and locate potential visual targets. Another important dynamic aspect of perception is the difference in latencies of the neural signals arriving at cells in the visual cortex. Neurophysiological findings suggest, for instance, that stronger stimuli elicit earlier responses than weaker ones. In data processing terms, these signals represent a data flow of stimuli. The article describes a computer vision system that exploits these two types of dynamic mechanisms. First, two algorithms are proposed for rapidly detecting interesting parts of the input image: one acts on an image sequence and extracts the regions containing moving objects; the other acts on a static image and selects the regions containing the most salient information. Second, a further algorithm performs object recognition by exploiting the different latencies in the data flow of image primitives. The results shown suggest that the proposed mechanisms can be usefully integrated into robotic systems to provide efficient perception-action behaviors.
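The idea of a latency-ordered data flow can be sketched with a priority queue, assuming a hypothetical latency model in which arrival time is inversely proportional to stimulus strength, and a toy recognizer that counts matching primitives per object model. Because the strongest primitives are consumed first, recognition can terminate before the weak ones arrive.

```python
import heapq


def latency(strength, k=1.0):
    """Hypothetical latency model: stronger stimuli arrive earlier."""
    return k / strength


def recognize(primitives, models, threshold):
    """Process primitives in order of arrival; return (model, time) of the first model
    whose accumulated evidence reaches `threshold`, or (None, None).

    primitives: list of (label, strength); models: {name: set of primitive labels}.
    """
    queue = [(latency(s), label) for label, s in primitives]
    heapq.heapify(queue)  # earliest arrival at the top
    evidence = {name: 0 for name in models}
    while queue:
        t, label = heapq.heappop(queue)
        for name, parts in models.items():
            if label in parts:
                evidence[name] += 1  # one vote per matching primitive
                if evidence[name] >= threshold:
                    return name, t
    return None, None
```

For example, with strong "corner" and "edge" primitives and a weak "blob", a model built from corners and edges is recognized at the arrival time of the edge, without waiting for the blob.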