
Publication


Featured research published by Claire-Hélène Demarty.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Multimodal information fusion and temporal integration for violence detection in movies

Cédric Penet; Claire-Hélène Demarty; Guillaume Gravier; Patrick Gros

This paper presents a violent shot detection system and studies several methods for introducing temporal and multimodal information into the framework. It also investigates different kinds of Bayesian network structure learning algorithms for modelling these problems. The system is trained and tested on the MediaEval 2011 Affect Task corpus, which comprises 15 Hollywood movies. It is experimentally shown that both multimodality and temporality add useful information to the system. Moreover, the analysis of the links between the variables of the resulting graphs yields important observations about the quality of the structure learning algorithms. Overall, our best system achieved 50% false alarms and 3% missed detections, placing it among the best submissions in the MediaEval campaign.
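To make the fusion scheme concrete, here is a minimal Python sketch, not the authors' code, of score-level multimodal fusion followed by temporal integration over neighbouring shots; the scores, fusion weight and decision threshold are hypothetical placeholders.

import numpy as np

def fuse_modalities(audio_scores, video_scores, w_audio=0.5):
    # Weighted score-level fusion of per-shot violence scores.
    return w_audio * audio_scores + (1.0 - w_audio) * video_scores

def temporal_integration(scores, window=2):
    # Average each shot's score with its neighbours to add temporal context.
    kernel = np.ones(2 * window + 1) / (2 * window + 1)
    return np.convolve(scores, kernel, mode="same")

audio = np.random.rand(100)  # hypothetical per-shot audio classifier outputs
video = np.random.rand(100)  # hypothetical per-shot video classifier outputs
fused = temporal_integration(fuse_modalities(audio, video))
violent_shots = np.flatnonzero(fused > 0.7)  # threshold chosen arbitrarily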


International Conference on Computer Vision | 2012

A benchmarking campaign for the multimodal detection of violent scenes in movies

Claire-Hélène Demarty; Cédric Penet; Guillaume Gravier; Mohammad Soleymani

We present an international benchmark on the detection of violent scenes in movies, implemented as a part of the multimedia benchmarking initiative MediaEval 2011. The task consists in detecting portions of movies where physical violence is present from the automatic analysis of the video, sound and subtitle tracks. A dataset of 15 Hollywood movies was carefully annotated and divided into a development set and a test set containing 3 movies. Annotation strategies and resolution of borderline cases are discussed at length in the paper. Results from 29 runs submitted by the 6 participating sites are analyzed. The first years results are promising, but considering the use case, there is still a large room for improvement. The detailed analysis of the 2011 benchmark brings valuable insight for the implementation of future evaluation on violent scenes detection in movies.
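The shot-level error measures used in these campaigns, false-alarm and missed-detection rates, can be computed as in the generic sketch below; this is not the official MediaEval evaluation tool, and the toy labels are invented for illustration.

import numpy as np

def detection_errors(y_true, y_pred):
    # y_true, y_pred: binary arrays with one entry per shot (1 = violent).
    false_alarm = np.mean(y_pred[y_true == 0])  # rate over non-violent shots
    missed = np.mean(1 - y_pred[y_true == 1])   # rate over violent shots
    return false_alarm, missed

truth = np.array([0, 0, 1, 1, 0, 1])  # toy ground-truth shot labels
pred = np.array([1, 0, 1, 0, 0, 1])   # toy system decisions
print(detection_errors(truth, pred))  # (0.333..., 0.333...)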


Multimedia Tools and Applications | 2015

VSD, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation

Claire-Hélène Demarty; Cédric Penet; Mohammad Soleymani; Guillaume Gravier

Content-based analysis to find where violence appears in multimedia content has several applications, from parental control and child protection to surveillance. This paper presents the design and annotation of the Violent Scene Detection (VSD) dataset, a corpus targeting the detection of physical violence in Hollywood movies. We discuss definitions of physical violence and provide a simple and objective definition which was used to annotate a set of 18 movies, resulting in the largest freely-available dataset for such a task. We discuss borderline cases and compare with annotations based on a subjective definition which requires multiple annotators. We provide a detailed analysis of the corpus, in particular regarding the relationship between violence and a set of key audio and visual concepts which were also annotated. The VSD dataset results from two years of benchmarking in the framework of the MediaEval initiative. We provide results from the 2011 and 2012 benchmarks as a validation of the dataset and as a state-of-the-art baseline. The VSD dataset is freely available at: http://www.technicolor.com/en/innovation/research-innovation/scientific-data-sharing/violent-scenes-dataset.


Content-Based Multimedia Indexing | 2014

Benchmarking Violent Scenes Detection in movies

Claire-Hélène Demarty; Bogdan Ionescu; Yu-Gang Jiang; Vu Lam Quang; Markus Schedl; Cédric Penet

This paper addresses the issue of detecting violent scenes in Hollywood movies. In this context, we describe the MediaEval 2013 Violent Scene Detection task, which provides a consistent evaluation framework to the research community. Nine participating teams submitted systems for evaluation in 2013, which indicates increasing interest in the task. In this paper, the 2013 dataset, the annotation process and the task rules are detailed. The submitted systems are thoroughly analysed and compared through several metrics to draw conclusions on the most promising techniques, among which are multimodal systems and mid-level concept detection. Further late fusions of the systems are investigated and show promising performance.
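Among the metrics used to compare such systems, average precision over a ranked list of shots is a standard retrieval-style choice. A generic Python sketch, not the official MediaEval scoring script:

def average_precision(ranked_labels):
    # Average precision over binary relevance labels ordered by
    # decreasing system confidence.
    hits, ap = 0, 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            ap += hits / rank
    return ap / hits if hits else 0.0

print(average_precision([1, 0, 1, 1, 0]))  # ~0.806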


Content-Based Multimedia Indexing | 2015

VSD2014: A dataset for violent scenes detection in Hollywood movies and web videos

Markus Schedl; Mats Sjöberg; Ionut Mironica; Bogdan Ionescu; Vu Lam Quang; Yu-Gang Jiang; Claire-Hélène Demarty

In this paper, we introduce a violent scenes and violence-related concept detection dataset named VSD2014. It contains annotations as well as auditory and visual features of Hollywood movies and user-generated footage shared on the web. The dataset is the result of a joint annotation endeavor of different research institutions and responds to the real-world use case of parental guidance in selecting appropriate content for children. The dataset has been validated during the Violent Scenes Detection (VSD) task at the MediaEval benchmarking initiative for multimedia evaluation.


Content-Based Multimedia Indexing | 2013

Audio event detection in movies using multiple audio words and contextual Bayesian networks

Cédric Penet; Claire-Hélène Demarty; Guillaume Gravier; Patrick Gros

This article investigates a novel use of the well-known audio words representation to detect specific audio events, namely gunshots and explosions, in order to gain robustness to soundtrack variability in Hollywood movies. An audio stream is processed as a sequence of stationary segments. Each segment is described by one or several audio words obtained by applying product quantisation to standard features. Such a representation using multiple audio words constructed via product quantisation is one of the novelties described in this work. Based on this representation, Bayesian networks are used to exploit contextual information in order to detect audio events. Experiments are performed on a comprehensive set of 15 movies, made publicly available. Results are comparable to the state of the art on the same dataset and show increased robustness to decision thresholds, at the cost of a limited range of possible operating points in some conditions. Late fusion provides a solution to this issue.
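As a rough illustration of the multiple-audio-words idea, the sketch below splits each segment descriptor into sub-vectors and quantises each against its own k-means codebook, which is the essence of product quantisation. It is a simplification rather than the authors' implementation; feature dimensions and codebook sizes are arbitrary.

import numpy as np
from sklearn.cluster import KMeans

def train_pq_codebooks(features, n_subvectors=4, n_words=64):
    # features: (n_segments, dim) array of per-segment audio descriptors.
    # One independent k-means codebook is trained per sub-vector.
    subvectors = np.array_split(features, n_subvectors, axis=1)
    return [KMeans(n_clusters=n_words, n_init=10).fit(s) for s in subvectors]

def encode(features, codebooks):
    # Each segment is described by one audio word per sub-vector.
    subvectors = np.array_split(features, len(codebooks), axis=1)
    return np.stack([cb.predict(s) for cb, s in zip(codebooks, subvectors)],
                    axis=1)

X = np.random.randn(500, 32)  # hypothetical per-segment audio features
codebooks = train_pq_codebooks(X)
words = encode(X, codebooks)  # shape (500, 4): four audio words per segment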


Multimedia Tools and Applications | 2014

Classification-oriented structure learning in Bayesian networks for multimodal event detection in videos

Guillaume Gravier; Claire-Hélène Demarty; Siwar Baghdadi; Patrick Gros

We investigate the use of structure learning in Bayesian networks for the complex multimodal task of action detection in soccer videos. We illustrate that classical score-oriented structure learning algorithms, such as K2, whose usefulness has been demonstrated on simple tasks, fail to provide a good network structure for classification tasks where many correlated observed variables are necessary to make a decision. We then compare several structure learning objective functions, which aim to find the structure that yields the best classification results, extending existing solutions in the literature. Experimental results on a comprehensive dataset of 7 videos show that a discriminative objective function based on conditional likelihood yields the best results, while augmented approaches offer a good compromise between learning speed and classification accuracy.
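The toy sketch below illustrates the discriminative scoring criterion discussed above: two candidate structures over a binary class and two binary observations, naive Bayes versus a structure augmented with an edge between the correlated observations, are compared by the conditional log-likelihood of the class given the observations. The data is synthetic and this is not the paper's algorithm; it only shows how a conditional-likelihood score rewards modelling the correlation between observed variables.

import numpy as np

rng = np.random.default_rng(0)
n = 5000
c = rng.integers(0, 2, n)                      # binary class
x1 = (c ^ (rng.random(n) < 0.2)).astype(int)   # noisy copy of the class
x2 = (x1 ^ (rng.random(n) < 0.3)).astype(int)  # depends on x1, not on c

def conditional_ll(log_joint, c, x1, x2):
    # Sum over samples of log P(c | x1, x2) under a tabulated joint model.
    num = log_joint[c, x1, x2]
    den = np.logaddexp(log_joint[0, x1, x2], log_joint[1, x1, x2])
    return float(np.sum(num - den))

eps = 1e-9
p_c = np.bincount(c, minlength=2) / n
p_x1_c = np.array([[np.mean(x1[c == ci] == v) for v in (0, 1)]
                   for ci in (0, 1)])
p_x2_c = np.array([[np.mean(x2[c == ci] == v) for v in (0, 1)]
                   for ci in (0, 1)])
p_x2_cx1 = np.array([[[np.mean(x2[(c == ci) & (x1 == xi)] == v)
                       for v in (0, 1)] for xi in (0, 1)] for ci in (0, 1)])

# Structure A, naive Bayes: P(C) P(X1|C) P(X2|C)
log_a = np.log(eps + np.einsum("i,ij,ik->ijk", p_c, p_x1_c, p_x2_c))
# Structure B, augmented with the edge X1 -> X2: P(C) P(X1|C) P(X2|C,X1)
log_b = np.log(eps + np.einsum("i,ij,ijk->ijk", p_c, p_x1_c, p_x2_cx1))

print("CLL, naive Bayes:", conditional_ll(log_a, c, x1, x2))
print("CLL, augmented  :", conditional_ll(log_b, c, x1, x2))  # higher is better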


Multimedia Tools and Applications | 2015

Variability modelling for audio events detection in movies

Cédric Penet; Claire-Hélène Demarty; Guillaume Gravier; Patrick Gros

Detecting audio events in Hollywood movies is a complex task due to the variability between the soundtracks of different movies. This inter-movie variability is shown to impair audio event detection results in a realistic framework. In this article, we propose to model the variability using a factor analysis technique, which we then use to compensate the audio features. The factor analysis compensation is validated using the state-of-the-art system based on multiple audio words sequences and contextual Bayesian networks which we previously developed in Penet et al. (2013). Results obtained on the same publicly available dataset for the detection of gunshots and explosions show an improvement in the handling of the variability, while keeping the robustness capabilities of the previous system. Furthermore, the system is applied to the detection of screams and proves its ability to generalise to other types of events. The obtained results also emphasise that, in addition to modelling variability, adding concepts to the system may be beneficial for precision rates.
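A minimal sketch of the general idea, under the simplifying assumption that a single global nuisance subspace captures the inter-movie variability (the paper's compensation scheme is more elaborate): the subspace is estimated with factor analysis and its contribution is projected out of the features.

import numpy as np
from sklearn.decomposition import FactorAnalysis

def compensate(features, n_nuisance=5):
    # features: (n_segments, dim) audio descriptors pooled over movies.
    fa = FactorAnalysis(n_components=n_nuisance).fit(features)
    loadings = fa.components_.T                 # (dim, n_nuisance)
    proj = loadings @ np.linalg.pinv(loadings)  # projector onto nuisance span
    centered = features - fa.mean_
    return centered - centered @ proj           # remove nuisance component

X = np.random.randn(1000, 20)  # hypothetical pooled audio features
X_compensated = compensate(X)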


Fusion in Computer Vision | 2014

Multimodal Violence Detection in Hollywood Movies: State-of-the-Art and Benchmarking

Claire-Hélène Demarty; Cédric Penet; Bogdan Ionescu; Guillaume Gravier; Mohammad Soleymani

This chapter introduces a benchmark evaluation targeting the detection of violent scenes in Hollywood movies. The evaluation was implemented in 2011 and 2012 as an affect task in the framework of the international MediaEval benchmarking initiative. We report on these two years of evaluation, providing a detailed description of the dataset created, describing the state of the art through the results achieved by participants, and analysing two of the best performing multimodal systems in detail. We elaborate on the lessons learned over the two years and provide insights for future work, emphasizing multimodal modeling and fusion.


International Conference on Multimedia and Expo | 2008

Structure learning in a Bayesian network-based video indexing framework

Siwar Baghdadi; Guillaume Gravier; Claire-Hélène Demarty; Patrick Gros

Several stochastic models provide an effective framework for identifying the temporal structure of audiovisual data. Most of them require an initial video structure as input, i.e., connections between features and video events. Given this structure, the parameters are then estimated from training data. Bayesian networks offer an additional feature, namely structure learning, which allows the automatic construction of the model structure from training data. Structure learning obviously leads to increased generality of the model building process. This paper investigates the trade-off between this increase in generality and the quality of the results in video analysis. We model video data using dynamic Bayesian networks (DBNs) where the static part of the network accounts for the correlations between low-level features extracted from the raw data and between these features and the events considered. It is precisely this part of the network whose structure is automatically constructed from training data. Experimental results on a commercial-detection case study show that, even though the model structure is determined in an unsupervised manner, the resulting model is effective for the detection of commercial segments in video data.
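The dynamic part of such a model chains the learned static slice across time, much as a hidden Markov model does. Below is a minimal sketch, with hypothetical probabilities rather than the paper's learned DBN, of Viterbi decoding of commercial versus programme segments from per-frame scores.

import numpy as np

def viterbi(log_emit, log_trans, log_prior):
    # log_emit: (T, S) log-emission scores; log_trans: (S, S); returns the
    # most probable state path.
    T, S = log_emit.shape
    delta = log_prior + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # (from_state, to_state)
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

# States: 0 = programme, 1 = commercial; sticky transitions favour long runs.
log_trans = np.log(np.array([[0.99, 0.01], [0.02, 0.98]]))
log_prior = np.log(np.array([0.9, 0.1]))
p = np.clip(np.random.rand(300), 1e-6, 1 - 1e-6)  # hypothetical P(commercial)
log_emit = np.stack([np.log(1 - p), np.log(p)], axis=1)
print(viterbi(log_emit, log_trans, log_prior)[:20])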

Collaboration


Dive into Claire-Hélène Demarty's collaborations.

Top Co-Authors

Bogdan Ionescu

Politehnica University of Bucharest


Markus Schedl

Johannes Kepler University of Linz


Patrick Gros

French Institute for Research in Computer Science and Automation
