Philippe Joly
Paul Sabatier University
Publication
Featured research published by Philippe Joly.
Computers & Graphics | 1994
Philippe Aigrain; Philippe Joly
Abstract We describe an adaptive method for the automatic analysis of film editing. Unlike previously proposed methods, which were restricted to the detection of cut transitions between shots, our method detects shot changes even when they occur through smooth transition effects (dissolves, wipes), and identifies classes of transition special effects. The method can be applied in real time to a stream of digital or analog motion picture data of any resolution. We describe experimental implementations of the method. Applications of such a tool include the temporal analysis of motion picture data to be stored in large databases and the automatic choice of entry points in compressed motion pictures. Because the analysis can be done in real time, it can run at digitizing time or on the output of a video production site. Temporal analysis and description of motion pictures is presently a major time and money bottleneck in the establishment of motion picture databases, and we argue that our method can significantly improve the situation in this field.
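The abstract does not detail the adaptive mechanism, so the following is only a minimal sketch of the general idea, not the authors' actual algorithm: compare grey-level histograms of consecutive frames and flag a cut when the difference exceeds a threshold adapted to recent inter-frame activity. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def hist_diff(frame_a, frame_b, bins=16):
    """L1 distance between grey-level histograms of two frames,
    normalised by the number of pixels."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    return np.abs(ha - hb).sum() / frame_a.size

def detect_cuts(frames, window=5, k=3.0):
    """Flag a cut when the inter-frame distance exceeds an adaptive
    threshold derived from the recent activity of the sequence."""
    diffs = [hist_diff(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    cuts = []
    for i, d in enumerate(diffs):
        recent = diffs[max(0, i - window):i]
        if recent:
            thr = np.mean(recent) + k * np.std(recent) + 1e-6
            if d > thr:
                cuts.append(i + 1)  # the cut occurs just before frame i + 1
    return cuts
```

Because the threshold tracks recent activity, a shot with steady motion raises its own baseline, which is the kind of adaptivity the abstract alludes to; detecting dissolves and wipes requires richer measures than this sketch.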
Signal Processing: Image Communication | 1996
Philippe Joly; Hae-Kwang Kim
Abstract The shot has been regarded as a fundamental unit for the digital manipulation of video, and various techniques have been developed to automatically detect shot changes. But a sequence shot can be so long and complex that it has to be further decomposed into smaller units for more flexible and detailed manipulation. A sequence shot can be segmented into shot segments, each of which keeps a homogeneous camera motion. Camera work carries significant information reflecting the intention of video producers, so camera work analysis and the segmentation of a sequence shot into shot segments can help in choosing a representative image for a shot. Following concepts introduced by Tonomura et al. (1993), we propose an efficient method for the automatic detection of camera work changes using spatiotemporal images called X-ray images. We introduce several steps in the spatiotemporal image analysis process which significantly improve its robustness and decrease its computational complexity.
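The X-ray image construction itself is simple to sketch: each frame is collapsed along one spatial axis, and the resulting 1-D profiles are stacked over time. In such an image, a static shot gives straight vertical traces while a pan bends them, so a change of trace slope marks a change of camera work. This is a minimal illustration of the representation, not the paper's analysis steps.

```python
import numpy as np

def xray_image(frames, axis=0):
    """Build a spatiotemporal "X-ray" image: collapse each frame along
    one spatial axis (row-averaging by default) and stack the resulting
    1-D profiles over time.  Shape of the result: (n_frames, width)."""
    return np.stack([np.asarray(f, dtype=float).mean(axis=axis) for f in frames])
```

Camera-work segmentation then amounts to finding where the dominant orientation of the traces changes in this much smaller 2-D image, which is what makes the approach computationally cheap.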
Computer Analysis of Images and Patterns | 2005
Gaël Jaffré; Philippe Joly
This paper presents a novel approach for automatic person labelling in video sequences using costumes. Person recognition is carried out by extracting the costumes of all the persons who appear in the video; their reappearance in subsequent frames is then detected by searching for the reappearance of their costume. Our contribution in this paper is a new approach to costume detection, without face detection, that allows the localization of costumes even when persons are not facing the camera. Face detection is still used, because it provides a very accurate heuristic for costume detection, but when it fails, mean-shift costume localization is carried out in each shot with the most relevant costume. Results are presented on TV broadcasts.
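The face-based heuristic mentioned above can be sketched as follows, under the assumption (hypothetical, not stated in the abstract) that the costume region lies directly below a detected face box and is summarised by a normalised colour/grey-level histogram. The helper names and the threshold are illustrative only.

```python
import numpy as np

def costume_signature(frame, face_box, bins=8):
    """Hypothetical helper: take the region just below a detected face
    box (x, y, w, h) as the costume area and summarise it by a
    grey-level histogram, normalised so signatures are comparable."""
    x, y, w, h = face_box
    region = frame[y + h : y + 3 * h, x : x + w]
    hist, _ = np.histogram(region, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def same_costume(sig_a, sig_b, thr=0.2):
    """Two signatures are matched when their L1 distance is below a
    (tunable) threshold -- a stand-in for the paper's matching step."""
    return bool(np.abs(sig_a - sig_b).sum() < thr)
```

In the paper's pipeline, when no face is available, the stored signature would instead seed a mean-shift search over the frame; the signature comparison itself stays the same.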
International Journal of Intelligent Systems | 2006
Siba Haidar; Philippe Joly; Bilal Chebaro
This article focuses on video document comparison using audiovisual production invariants (API). APIs are characterized by invariant segments obtained on a set of low-level features. We propose an algorithm to detect production invariants throughout a collection of audiovisual documents. The algorithm runs on low-level features, considered as time series, and extracts invariant segments using a one-dimensional morphological envelope comparison. Then, based on the extracted results, we define a style similarity measure between two video documents. A derived pseudo-distance is also proposed.
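A one-dimensional morphological envelope can be sketched as a sliding maximum (dilation) and sliding minimum (erosion) of a feature time series; a candidate segment is then considered invariant relative to a reference when it stays inside the reference's envelope. This is a simplified reading of the technique named in the abstract, with an illustrative radius and tolerance.

```python
import numpy as np

def envelope(series, radius=2):
    """1-D morphological envelope: sliding maximum (dilation) and
    sliding minimum (erosion) over a window of +/- radius samples."""
    s = np.asarray(series, dtype=float)
    n = len(s)
    upper = np.array([s[max(0, i - radius):i + radius + 1].max() for i in range(n)])
    lower = np.array([s[max(0, i - radius):i + radius + 1].min() for i in range(n)])
    return lower, upper

def within_envelope(candidate, reference, radius=2, tol=0.0):
    """True when the candidate series stays inside the morphological
    envelope of the reference -- tolerant to small temporal shifts."""
    lower, upper = envelope(reference, radius)
    c = np.asarray(candidate, dtype=float)
    return bool(np.all((c >= lower - tol) & (c <= upper + tol)))
```

The envelope's radius controls how much temporal misalignment two "same production style" segments are allowed, which is what makes the comparison robust on real broadcasts.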
Computer Analysis of Images and Patterns | 2005
Gaël Jaffré; Philippe Joly
The goal of the work described in this paper is to improve the results produced by an object detector operating independently on each frame of a video document, in order to generate a more robust index. The results of the object detector are “smoothed” along the time dimension using a temporal window. For a given frame, we count the number of occurrences of each object in the previous and next frames, and only the objects whose number of appearances is above a threshold are validated. In this paper, we present a probabilistic approach for computing these thresholds theoretically. This approach is well suited to limiting the number of false alarms produced by the static detector, and its principle of detection generalization also recovers some detections that the detector may have missed.
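The temporal voting step described above can be sketched directly (with a fixed threshold; the paper's contribution is deriving that threshold probabilistically, which this sketch does not reproduce):

```python
from collections import Counter

def smooth_detections(per_frame, window=2, min_count=3):
    """per_frame: list of sets of object labels detected in each frame.
    An object is validated in frame i only if it is detected in at
    least `min_count` frames within [i - window, i + window]."""
    smoothed = []
    for i in range(len(per_frame)):
        votes = Counter()
        for j in range(max(0, i - window), min(len(per_frame), i + window + 1)):
            votes.update(set(per_frame[j]))
        smoothed.append({obj for obj, c in votes.items() if c >= min_count})
    return smoothed
```

Note the two effects the abstract claims: a label seen in only one frame of the window is discarded (false-alarm suppression), while a label missed in a single frame but present in its neighbours is reinstated (detection generalization).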
Electronic Imaging | 2003
Thomas Fourès; Philippe Joly
This paper proposes a model for human motion analysis in a video. Its main characteristic is that it adapts itself automatically to the current resolution, the actual quality of the picture, or the level of precision required by a given application, thanks to its possible decomposition into several hierarchical levels. The model is region-based to address some analysis processing needs. The top level of the model is defined with only 5 ribbons, which can be cut into sub-ribbons according to a given (or expected) level of detail. The matching process between the model and the current picture consists of comparing the extracted subject shape with a graphical rendering of the model built from a set of computed parameters. The comparison is performed using a chamfer matching algorithm. In our developments, we intend to build a platform of interaction between a dancer and tools synthesizing abstract motion pictures and music, under the conditions of a real-time dialogue between a human and a computer. Consequently, we use this model from a perspective of motion description rather than motion recognition: no a priori gestures are supposed to be recognized, since no a priori application is specifically targeted. The resulting description will follow a Description Scheme compliant with the movement notation called Labanotation.
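Chamfer matching, named above as the comparison step, scores a model shape against an image by summing, over the model's contour points, the distance to the nearest image edge. A minimal sketch (brute-force distance transform, fine for small illustrative images; real implementations use a two-pass or Euclidean distance transform):

```python
import numpy as np

def distance_transform(edge_map):
    """Brute-force distance transform: for every pixel, the Euclidean
    distance to the nearest non-zero (edge) pixel."""
    ys, xs = np.nonzero(edge_map)
    edges = np.stack([ys, xs], axis=1)
    h, w = edge_map.shape
    dt = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            dt[y, x] = np.sqrt(((edges - [y, x]) ** 2).sum(axis=1)).min()
    return dt

def chamfer_score(model_points, edge_map):
    """Mean distance from the model's contour points to the nearest
    image edge; low scores indicate a good model/shape fit."""
    dt = distance_transform(edge_map)
    return float(np.mean([dt[y, x] for y, x in model_points]))
```

In the paper's setting, the "model points" would come from the rendered ribbon model and the edge map from the extracted subject silhouette; the pose parameters minimising the chamfer score give the match.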
Proceedings of SPIE, the International Society for Optical Engineering | 2001
Rosa I. Ruiloba; Philippe Joly
This article presents a Description Scheme (DS) to describe audiovisual documents from the video editing point of view. This DS is based on techniques used in the video editing domain. Its main objective is to provide a complete, modular and extensible description of the structure of video documents based on the editing process. This VideoEditing DS is generic in the sense that it may be used in a large number of applications, such as video document indexing and analysis, description of Edit Decision Lists, and elaboration of editing patterns. It is based on accurate and complete definitions of shots and transition effects required by video document analysis applications. The VideoEditing DS allows three levels of description: analytic, synthetic and semantic. In the DS, the higher (resp. the lower) the element of description, the more analytic (resp. synthetic) the information. This DS allows describing the editing work done by editing boards, using more detailed descriptors of the Shot and Transition DSs. These elements are provided to define editing patterns that allow several possible reconstructions of movies depending on, for example, the target audience. A part of the video description made with this DS may be produced automatically by video-to-shots segmentation algorithms (analytic DSs) or by editing software, at the same time the editing work is done. This DS answers the needs related to the exchange of editing work descriptions between editing software tools. At the same time, the same DS provides an analytic description of the editing work which is complementary to existing standards for Edit Decision Lists such as SMPTE or AAF.
Adaptive Multimedia Retrieval | 2003
Thomas Fourès; Philippe Joly
This paper deals with the use of a model dedicated to human motion analysis in a video. A particularity of this model is its ability to adapt itself to the current resolution or the required level of precision through possible decompositions into several hierarchical levels. The first level of the model has been described in previous work: it is region-based, and the matching process between the model and the current picture is performed by comparing the extracted subject shape with a graphical representation of the model consisting of a set of ribbons. To carry out this comparison, a chamfer matching algorithm is applied to those regions. Until now, the correspondence problem was treated independently for each element of the model in a search area, one for each limb. No physical constraints were applied while positioning the different ribbons, and no temporal information was taken into account. We present in this paper how we intend to introduce all those parameters into the definition of the different search areas, according to the positions obtained in the previous frames, the distance to neighboring ribbons, and the quality of the previous matchings.
Content-Based Multimedia Indexing | 2015
Hassan Wehbe; Philippe Joly; Bassem Haidar
In this paper we propose a method to locate in-loop repetitions in a video. An in-loop repetition consists of repeating the same action(s) many times consecutively. The proposed method adapts the auto-correlation method YIN, originally proposed to find the fundamental frequency of audio signals. Based on this technique, we propose a method that generates a matrix (which we call the YIN-Matrix) in which repetitions correspond to triangle-shaped zones of low values. Locating these triangles makes it possible to locate the video segments that enclose a repetition and to extract its parameters. To evaluate our method, we used a standard evaluation procedure reporting error rates against ground-truth information. Under this evaluation, our method shows promising results, making it a solid basis for future work.
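The core of YIN, which the paper adapts from audio to video, is a lag-based difference function with cumulative-mean normalisation: low values at lag tau mean the signal nearly repeats with period tau. A minimal 1-D sketch on a per-frame feature signal (the paper's 2-D YIN-Matrix extension is not reproduced here):

```python
import numpy as np

def yin_difference(signal, max_lag):
    """YIN-style difference function on a 1-D per-frame feature signal:
    d(tau) = sum_t (x[t] - x[t + tau])^2.  Low values indicate that the
    sequence nearly repeats with period tau."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    d = np.zeros(max_lag + 1)
    for tau in range(1, max_lag + 1):
        diff = x[: n - tau] - x[tau:]
        d[tau] = np.dot(diff, diff)
    return d

def estimate_period(signal, max_lag):
    """Cumulative-mean-normalised difference, then the smallest lag
    reaching the minimum -- the estimated repetition period."""
    d = yin_difference(signal, max_lag)
    cmnd = np.ones_like(d)
    running = 0.0
    for tau in range(1, max_lag + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0
    return int(np.argmin(cmnd[1:]) + 1)
```

The normalisation step is what prevents the trivial minimum at small lags; in the video case, the per-frame "sample" would be a frame-difference or feature value rather than an audio amplitude.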
Internet Multimedia Management Systems | 2000
Rosa I. Ruiloba; Philippe Joly
This article presents the results of a study on spatio-temporal images to evaluate their performance for video-to-shots segmentation purposes. Some shot segmentation methods involve spatio-temporal images that are computed by projecting successive video frames onto the X or Y axis. On these projections, transition effects and motion are supposed to have different characteristics. Whereas cuts can be easily recognized, the main problem remains in finding a measure that discriminates motion from gradual transition effects. In this article, the quality of transition detections based on line similarity in spatio-temporal images is studied. The probability functions of several measures are estimated to determine which one produces the lowest detection error rate. These distributions are computed on four classes of events: intra-shot sequences without motion, sequences with cuts, sequences with fades, and sequences with motion. A line matching is performed, based on correlation estimations between projection lines. To separate these classes, we first estimate the probability density functions of the correlation between consecutive lines for each class. For different line segment sizes, the experimental results show that the classes cannot be clearly separated. To take into account the evolution of the correlation, and because we try to detect some particular types of boundaries, we then consider ratios between statistical moments, computed over a subset of correlation values. The results show that the measures used, based on the matching of projection lines, cannot discriminate between motion and fades. Only a subset of motions can be differentiated from gradual transitions. Therefore, the previous measures should be combined with methods producing complementary results, such as a similar measure based on correlation between spatially shifted segments.
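The basic measure studied here, correlation between consecutive projection lines of the spatio-temporal image, can be sketched as follows; this illustrates only the measure, not the statistical-moment ratios the article goes on to evaluate.

```python
import numpy as np

def projection_lines(frames, axis=0):
    """Project each frame onto one axis; each projection is one line
    (one row) of the spatio-temporal image."""
    return np.stack([np.asarray(f, dtype=float).mean(axis=axis) for f in frames])

def line_correlations(frames, axis=0):
    """Normalised correlation between consecutive projection lines.
    A sharp drop suggests a cut, but, as the article shows, motion and
    gradual transitions produce overlapping behaviours."""
    lines = projection_lines(frames, axis)
    corrs = []
    for a, b in zip(lines[:-1], lines[1:]):
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        corrs.append(float((a * b).sum() / denom) if denom > 0 else 1.0)
    return corrs
```

A cut between very different contents drives the correlation down abruptly, while both a fade and a pan lower it gradually, which is exactly the ambiguity the study quantifies.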