Patrick Gros
Centre national de la recherche scientifique
Publications
Featured research published by Patrick Gros.
acm international workshop on multimedia databases | 2003
Sid-Ahmed Berrani; Laurent Amsaleg; Patrick Gros
This paper proposes a novel content-based image retrieval scheme for image copy identification. Its goal is to detect matches between a set of suspect images and the ones stored in the database of the legal holders of the photographs. If an image was stolen and used to create a pirated copy, the scheme tries to identify from which original image that copy was created. The image recognition scheme is based on local differential descriptors, so the matching process takes into account a large set of variations that might have been applied to stolen images in order to create pirated copies. The high cost and complexity of this recognition scheme require a very efficient retrieval process, since many individual queries must be executed before the final result can be constructed. This paper therefore proposes a novel search method that trades the precision of each individual search for reduced query execution time. This imprecision has little impact on the overall recognition performance, since the final result is a consolidation of many partial results, but it dramatically accelerates queries. This result is corroborated by a theoretical study. Experiments show the efficiency and the robustness of the proposed scheme.
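As a rough illustration of the consolidation step described above (not the paper's exact algorithm), the sketch below accumulates votes from many imprecise per-descriptor searches and keeps the image that gathers the most votes; the helper approximate_knn, the descriptor_to_image mapping, and the acceptance threshold are all assumptions introduced for the example.

```python
from collections import Counter

def identify_copy(query_descriptors, approximate_knn, descriptor_to_image, k=10):
    """Consolidate many imprecise per-descriptor searches into one decision.

    query_descriptors   -- local descriptors extracted from the suspect image
    approximate_knn     -- hypothetical function: descriptor -> k nearest stored descriptor ids
    descriptor_to_image -- maps a stored descriptor id to the original image it came from
    """
    votes = Counter()
    for d in query_descriptors:
        for neighbour_id in approximate_knn(d, k):        # each search may be imprecise,
            votes[descriptor_to_image[neighbour_id]] += 1  # but votes accumulate per image
    if not votes:
        return None
    best_image, best_score = votes.most_common(1)[0]
    # Reject images whose support is too weak to be a copy (threshold is illustrative).
    return best_image if best_score >= 0.1 * len(query_descriptors) else None
```

Because the decision rests on the accumulated votes rather than on any single search, a few missed neighbours per descriptor barely change the outcome, which is why trading per-query precision for speed pays off.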
Pattern Analysis and Applications | 2001
Laurent Amsaleg; Patrick Gros
Most existing content-based image retrieval systems built on top of very large databases typically compute a single descriptor per image, based for example on colour histograms. Therefore, these systems can only return images that are globally similar to the query image, but cannot return images that contain some of the objects present in the query. Recent image processing techniques, however, have focused on fine-grain image recognition to address the need for detecting similar objects in images. Fine-grain image recognition typically relies on computing many local descriptors per image. These techniques obviously increase the recognition power of retrieval systems, but they also raise new problems in the design of fundamental lower-level functions such as indexes and secondary storage management. This paper addresses these problems: it shows that the three most efficient multi-dimensional indexing techniques known today do not efficiently cope with the deep changes in the retrieval process caused by the use of local descriptors. It also identifies several research directions to investigate before efficient image database systems supporting fine-grain recognition can be built.
Proceedings of the 2nd international workshop on Computer vision meets databases | 2005
Xavier Naturel; Patrick Gros
This article presents a method for detecting duplicate sequences in a continuous television stream. This is of interest to many applications, including commercial monitoring and video indexing. Repetitions can also be used as a way of structuring television streams, by detecting inter-program breaks as sets of duplicate sequences. In this context, we present a shot-based method for detecting repeated sequences efficiently. Experiments show that this fast shot-matching strategy allows us to retrieve duplicated shots between a one-hour query and a 24-hour database in only 10 ms.
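A minimal sketch of shot-based duplicate detection by signature hashing is given below; the tuple signatures and the exact matching criterion are assumptions made for illustration, not the article's actual features.

```python
def find_duplicate_shots(query_shots, reference_shots):
    """Hash-based matching of shot signatures (a sketch, not the paper's exact method).

    Each shot is represented by a small hashable signature, e.g. a coarsely
    quantised colour histogram of its key frame, so that identical rebroadcast
    shots fall into the same bucket and can be matched in constant time.
    Both arguments are iterables of (shot_id, signature) pairs.
    """
    index = {}
    for shot_id, signature in reference_shots:   # build the index once over the 24-hour stream
        index.setdefault(signature, []).append(shot_id)

    matches = []
    for shot_id, signature in query_shots:       # each query shot is a single dictionary lookup
        for ref_id in index.get(signature, []):
            matches.append((shot_id, ref_id))
    return matches
```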
conference on information and knowledge management | 2003
Sid-Ahmed Berrani; Laurent Amsaleg; Patrick Gros
It is known that all multi-dimensional index structures fail to accelerate content-based similarity searches when the feature vectors describing images are high-dimensional. It is possible to circumvent this problem by relying on approximate search schemes that trade off result quality for reduced query execution time. Most approximate schemes, however, provide either no control or only complex control over the precision of the searches, especially when retrieving the k nearest neighbors (NNs) of query points. In contrast, this paper describes an approximate search scheme for high-dimensional databases in which the precision of the search can be probabilistically controlled when retrieving the k NNs of query points. It allows a fine and intuitive control over this precision by setting, at run time, the maximum probability that a vector belonging to the exact answer set is missed in the approximate set of answers eventually returned. This paper also presents a performance study of the implementation using real datasets, showing its reliability and efficiency. It shows, for example, that our method is 6.72 times faster than a sequential scan when handling more than 5×10⁶ 24-dimensional vectors, even when the probability of missing one of the true nearest neighbors is below 0.01.
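The sketch below illustrates the general idea of a cluster-pruned approximate k-NN search; it replaces the paper's probabilistic stopping rule (driven by the allowed miss probability) with a fixed n_probe parameter, so it is only a simplified stand-in under assumed data structures.

```python
import numpy as np

def approximate_knn(query, centroids, clusters, k=10, n_probe=4):
    """Cluster-pruned k-NN search, a simplified stand-in for the paper's scheme.

    The database is partitioned offline into clusters (centroids plus members).
    Only the n_probe clusters closest to the query are scanned; the paper instead
    derives the stopping point from a user-given probability of missing a true
    neighbour, which is the part this sketch does not reproduce.

    clusters -- list of (vectors, ids) pairs, one per centroid.
    """
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    cand_vecs = np.vstack([clusters[c][0] for c in order])
    cand_ids = [i for c in order for i in clusters[c][1]]
    dists = np.linalg.norm(cand_vecs - query, axis=1)
    return [cand_ids[i] for i in np.argsort(dists)[:k]]
```

Scanning only a handful of nearby clusters is what produces the large speed-up over a sequential scan, at the cost of occasionally missing a true neighbour that fell into an unvisited cluster.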
international conference on image processing | 2003
Ewa Kijak; Lionel Oisel; Patrick Gros
This paper focuses on the use of hidden Markov models (HMMs) for the structure analysis of sport videos. The video structure parsing relies on the analysis of the temporal interleaving of video shots, with respect to a priori information about video content and editing rules. The basic temporal unit is the video shot, and visual features are used to characterize its type of view. Our approach is validated in the particular domain of tennis videos. As a result, typical tennis scenes are identified. In addition, each shot is assigned to a level of the hierarchy described in terms of points, games and sets.
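Purely as an illustration of mapping per-shot visual features to a type of view before the HMM stage, here is a toy rule-based classifier; the feature names, thresholds and class labels are invented for the example and are not those of the paper, which learns this mapping rather than hard-coding it.

```python
def classify_shot_view(court_colour_ratio, motion_activity):
    """Toy classifier assigning a tennis shot to a view type (illustrative only).

    court_colour_ratio -- fraction of key-frame pixels matching the court colour
    motion_activity    -- mean magnitude of motion vectors within the shot
    """
    if court_colour_ratio > 0.5 and motion_activity < 0.3:
        return "global"   # full-court view, typically a rally
    if court_colour_ratio > 0.2:
        return "medium"   # partial view of the court or a player
    return "other"        # close-ups, audience, commercials
```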
Storage and Retrieval for Image and Video Databases | 2003
Ewa Kijak; Lionel Oisel; Patrick Gros
This work aims at recovering the temporal structure of a broadcast tennis video from an analysis of the raw footage. Our method relies on a statistical model of the interleaving of shots in order to group shots into predefined classes representing structural elements of a tennis video. This stochastic modeling is performed in the global framework of hidden Markov models (HMMs). The fundamental units are shots and transitions. In a first step, color and motion attributes of segmented shots are used to map shots into two classes: game (view of the full tennis court) and not game (medium and close-up views, and commercials). In a second step, a trained HMM is used to analyze the temporal interleaving of shots. This analysis results in the identification of more complex structures, such as first missed serves, short rallies that could be aces or serves, long rallies, breaks that indicate the end of a game, and replays that highlight interesting points. These higher-level structures can be used either to create summaries or to allow non-linear browsing of the video.
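The second step amounts to decoding the most likely sequence of structural states from the observed shot classes, which standard Viterbi decoding performs; the sketch below is a generic implementation, with the states (e.g. rally, break, replay), integer-coded observations (the game / not-game labels) and probability tables to be supplied by training, as in the paper.

```python
import numpy as np

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state path for an observed shot sequence.

    observations -- integer-coded observation symbols, one per shot
    states       -- list of state names
    start_p      -- (S,) initial state probabilities
    trans_p      -- (S, S) transition probabilities
    emit_p       -- (S, V) emission probabilities over observation symbols
    """
    n, S = len(observations), len(states)
    delta = np.zeros((n, S))
    back = np.zeros((n, S), dtype=int)
    delta[0] = np.log(start_p) + np.log(emit_p[:, observations[0]])
    for t in range(1, n):
        scores = delta[t - 1][:, None] + np.log(trans_p)      # scores[i, j]: come from i, go to j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(S)] + np.log(emit_p[:, observations[t]])
    path = [int(np.argmax(delta[-1]))]
    for t in range(n - 1, 0, -1):                              # backtrack from the last shot
        path.append(int(back[t, path[-1]]))
    return [states[s] for s in reversed(path)]
```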
Multimedia Tools and Applications | 2004
Laurent Amsaleg; Patrick Gros; Sid-Ahmed Berrani
Traditional content-based image retrieval systems typically compute a single descriptor per image, based for example on color histograms. The result of a query is, in general, the set of images from the database whose descriptors are the closest to the descriptor of the query image. Systems built this way are able to return images that are globally similar to the query image, but cannot return images that contain some of the objects present in the query. As opposed to this traditional coarse-grain recognition scheme, recent advances in image processing make fine-grain image recognition possible, notably by computing local descriptors that can detect similar objects in different images. While obviously powerful, fine-grain recognition in images also changes the retrieval process: instead of submitting a single query to retrieve similar images, multiple queries must be submitted, and their partial results must be post-processed before delivering the answer. This paper first presents a family of local descriptors supporting fine-grain image recognition. These descriptors provide robust recognition despite image rotations and translations, illumination variations, and partial occlusions. Many multi-dimensional indexes have been proposed to speed up the retrieval process. These indexes, however, have mostly been designed for and evaluated against databases where each image is described by a single descriptor. While this paper does not present any new indexing scheme, it shows that the three most efficient indexing techniques known today are still too slow to be used in practice with local descriptors because of the changes in the retrieval process.
international conference on multimedia and expo | 2008
Siwar Baghdadi; Guillaume Gravier; Claire-Hélène Demarty; Patrick Gros
Several stochastic models provide an effective framework for identifying the temporal structure of audiovisual data. Most of them require an initial video structure as input, i.e. the connections between features and video events. Provided that this structure is given, the parameters are then estimated from training data. Bayesian networks offer an additional feature, namely structure learning, which allows the automatic construction of the model structure from training data. Structure learning obviously leads to an increased generality of the model-building process. This paper investigates the trade-off between this increase in generality and the quality of the results in video analysis. We model video data using dynamic Bayesian networks (DBNs), where the static part of the network accounts for the correlations between low-level features extracted from the raw data, and between these features and the events considered. It is precisely this part of the network whose structure is automatically constructed from training data. Experimental results on a commercial-detection case study show that, even though the model structure is determined in an unsupervised manner, the resulting model is effective for the detection of commercial segments in video data.
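As a hedged illustration of score-based structure learning, the sketch below greedily selects, by BIC score, which discrete low-level features become parents of the event node (e.g. a binary "commercial" variable); it is a much simplified stand-in for full DBN structure learning, and the data layout and function names are assumptions made for the example.

```python
import numpy as np
from itertools import product

def bic_score(data, child, parents):
    """BIC of a discrete child node given a candidate parent set.

    data -- dict mapping a column name to an integer-coded numpy array (same length).
    Score = log-likelihood of the child's CPT minus 0.5 * log(N) * free parameters.
    """
    n = len(data[child])
    child_vals = np.unique(data[child])
    parent_vals = [np.unique(data[p]) for p in parents]
    loglik, n_params = 0.0, 0
    for combo in product(*parent_vals):
        mask = np.ones(n, dtype=bool)
        for p, v in zip(parents, combo):
            mask &= data[p] == v
        counts = np.array([(data[child][mask] == c).sum() for c in child_vals])
        n_params += len(child_vals) - 1
        if counts.sum() > 0:
            probs = counts / counts.sum()
            loglik += (counts[counts > 0] * np.log(probs[counts > 0])).sum()
    return loglik - 0.5 * np.log(n) * n_params

def greedy_parents(data, event, features):
    """Greedily add the feature that most improves the BIC of the event node."""
    parents, best = [], bic_score(data, event, [])
    improved = True
    while improved:
        improved = False
        for f in set(features) - set(parents):
            s = bic_score(data, event, parents + [f])
            if s > best:
                best, chosen, improved = s, f, True
        if improved:
            parents.append(chosen)
    return parents
```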
Proceedings of the 2nd international workshop on Computer vision meets databases | 2005
Patrick Gros
Multimedia content analysis offers many exciting research opportunities and is a necessary step towards automatic understanding of the content of digital documents. Digital documents are typically composite. Processing in parallel and integrating low-level information computed over each of the media that compose a multimedia document can yield knowledge that stand-alone and isolated analysis could not discover. Joint processing of multiple media is very challenging, even at the lowest analysis levels. Coping with imperfect synchronization of pieces of information, mixing extremely different kinds of information (numerical or symbolic descriptions, values describing intervals or instants, probabilities and distances, HMMs and Gaussians, ...), and reconciling contradictory outputs are some of the obstacles which make the processing of multimedia documents much more difficult than it seems at first glance. This talk will first show what may be gained from jointly analyzing multimedia documents. It will then briefly overview the typical information that can be extracted from the major media (video, sound, images and text) before focusing on the problems that arise when trying to use all this information together. We hope to convince researchers to start trying to solve these problems, since they directly hamper the acquisition of higher-level knowledge from multimedia documents.
Optomechatronic Technologies 2005 | 2005
Anthony Remazeilles; François Chaumette; Patrick Gros
In this paper, a new method is proposed for controlling a vision-based robot in large navigation spaces. In this case, the visual features observed by an on-board camera can change drastically, or even disappear completely, between the initial image, seen at the beginning of a task, and the final image, seen at the desired position of the robot. These features are therefore not sufficient for controlling the entire motion of the robotic system from beginning to end. The problem requires a more complete definition and representation of the navigation space, which can be achieved with a topological representation, where the environment is defined directly in the sensor space by a database of images. In our approach, this database is acquired during an offline learning step. An image retrieval method then indexes and matches a request image, given by the camera, to the closest view within the database. In this way, an image path is extracted from the database to link the initial and desired images, providing enough information to control the robot. The central point of this paper is the closed-loop control law that drives the robot to its desired position using this image path. The proposed method requires neither a global reconstruction nor a temporal planning step. Furthermore, the robot is not obliged to converge directly upon each image waypoint, but automatically chooses a better trajectory. The visual servoing control law uses specific features which ensure that the robot navigates within the visibility path. Experimental simulations show the effectiveness of this method for controlling the motion of a camera in three-dimensional environments (a free-flying camera, or a camera moving on a plane).
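A minimal sketch of the retrieval-then-path idea follows, under the assumption of a precomputed topological graph between learned views and a hypothetical match_score function counting descriptor matches; the paper's closed-loop visual servoing law itself is not represented here.

```python
import heapq

def image_path(request_descriptors, goal_index, database, match_score, adjacency):
    """Find the learned view closest to the current camera image, then a view path to the goal.

    database    -- list of descriptor sets, one per learned view
    match_score -- hypothetical function scoring descriptor matches between two views
    adjacency   -- dict: view index -> neighbouring view indices sharing enough visual content
    Assumes the goal view is reachable from the retrieved start view.
    """
    # 1. Image retrieval: match the request image against every learned view.
    start = max(range(len(database)),
                key=lambda i: match_score(request_descriptors, database[i]))

    # 2. Shortest path in the topological graph (Dijkstra with unit edge costs).
    dist, prev, heap = {start: 0}, {}, [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal_index:
            break
        for v in adjacency.get(u, []):
            if v not in dist or d + 1 < dist[v]:
                dist[v], prev[v] = d + 1, u
                heapq.heappush(heap, (d + 1, v))
    path, node = [goal_index], goal_index
    while node != start:
        node = prev[node]
        path.append(node)
    return list(reversed(path))
```

The returned sequence of views is the image path that the control law then follows, without the robot having to converge exactly onto each intermediate view.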