Publication


Featured research published by Joseph Razik.


Forensics in Telecommunications, Information and Multimedia | 2009

Vocal Forgery in Forensic Sciences

Patrick Perrot; Mathieu Morel; Joseph Razik; Gérard Chollet

This article describes techniques of vocal forgery able to affect automatic speaker recognition systems in a forensic context. Vocal forgery covers two main aspects: voice transformation and voice conversion. For voice transformation, the article proposes an automatic analysis of four specific disguised voices in order to detect the forgery; for voice conversion, it presents different ways to automatically imitate a target voice. Vocal forgery is a real and relevant question for forensic expertise. In most cases, criminals who make a terrorist claim or a miscellaneous call disguise their voices to hide their identity or to assume the identity of another person. Disguise is considered in this paper as a deliberate action of the speaker who wants to conceal or falsify his identity. Different techniques exist to transform one's own voice: some are sophisticated, such as software manipulation; others are simpler, such as holding a handkerchief over the mouth. For voice transformation, the presented work is dedicated to the study of disguises used in the most common cases. For voice conversion, different techniques are presented, compared, and applied to an original example of the French President's voice.


International Symposium on Visual Computing | 2010

Experiments on acoustic model supervised adaptation and evaluation by K-Fold Cross Validation technique

Daniel Caon; Asmaa Amehraye; Joseph Razik; Gérard Chollet; Rodrigo V. Andreão; Chafic Mokbel

This paper analyzes adaptation techniques for French acoustic models (hidden Markov models). The LVCSR engine Julius, the Hidden Markov Model Toolkit (HTK) and the K-Fold CV technique are used together to build three different adaptation methods: Maximum Likelihood a priori (ML), Maximum Likelihood Linear Regression (MLLR) and Maximum a posteriori (MAP). Experimental results, in terms of word and phoneme error rates, indicate that the best adaptation method depends on the adaptation data, and that acoustic model performance can be improved by the use of phoneme-level alignments and K-Fold Cross Validation (CV). The well-known K-Fold CV technique indicates which adaptation technique to follow for each type of data.
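The selection procedure the abstract describes can be sketched as follows. This is a minimal stand-in, not the paper's pipeline: `adapt_and_score` is a hypothetical callback representing the real HTK adaptation + Julius decoding step, and all names are illustrative.

```python
from statistics import mean

def kfold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross validation."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        test = idx[i * fold:(i + 1) * fold] if i < k - 1 else idx[i * fold:]
        train = [j for j in idx if j not in test]
        yield train, test

def select_adaptation(utterances, methods, adapt_and_score, k=5):
    """Pick the adaptation method (e.g. 'ML', 'MLLR', 'MAP') with the
    lowest mean word error rate across k folds.
    `adapt_and_score(method, train, test)` is a placeholder for the real
    adapt-then-decode pipeline and must return a WER for that fold."""
    scores = {m: mean(adapt_and_score(m, tr, te)
                      for tr, te in kfold_indices(len(utterances), k))
              for m in methods}
    return min(scores, key=scores.get), scores
```

With a per-method WER oracle plugged in, `select_adaptation` returns the method whose cross-validated error is lowest for the given adaptation data, which is exactly the role the paper assigns to K-Fold CV.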


International Journal of Pattern Recognition and Artificial Intelligence | 2011

Frame-Synchronous and Local Confidence Measures for Automatic Speech Recognition

Joseph Razik; Odile Mella; Dominique Fohr; Jean Paul Haton

In this paper, we introduce two new confidence measures for large-vocabulary speech recognition systems. The major feature of these measures is that they can be computed without waiting for the end of the audio stream. We propose two kinds of confidence measures: frame-synchronous and local. The frame-synchronous ones can be computed as soon as a frame is processed by the recognition engine and are based on a likelihood ratio. The local measures estimate a local posterior probability in the vicinity of the word to analyze. We evaluated our confidence measures within the framework of automatic transcription of French broadcast news, using the EER criterion. Our local measures achieved results very close to the best state-of-the-art measure (EER of 23% compared to 22.0%). We then conducted a preliminary experiment to assess the contribution of our confidence measure to improving the comprehension of an automatic transcription for the hearing impaired. We introduced several modalities to highlight words of low confidence in this transcription and showed that these modalities, used with our local confidence measure, improved the comprehension of the automatic transcription.
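A frame-synchronous likelihood-ratio measure of the general kind described above can be sketched like this. This is a generic illustration, not the paper's exact formula: the "filler" log-likelihoods stand in for whatever normalization model (e.g. an unconstrained phoneme loop) the recognizer provides, and all names are assumptions.

```python
import math

def frame_confidence(loglik_word, loglik_filler):
    """Per-frame log-likelihood ratio between the decoder's best word
    hypothesis and a filler/background model; higher means more confident.
    Computable as soon as each frame is decoded (frame-synchronous)."""
    return [lw - lf for lw, lf in zip(loglik_word, loglik_filler)]

def word_confidence(loglik_word, loglik_filler):
    """Duration-normalized log-likelihood ratio over a word's frames,
    squashed to (0, 1) with a sigmoid so it can be thresholded like a
    posterior-style confidence score."""
    ratios = frame_confidence(loglik_word, loglik_filler)
    avg = sum(ratios) / len(ratios)
    return 1.0 / (1.0 + math.exp(-avg))
```

Because each frame's ratio needs only that frame's scores, the measure never has to wait for the end of the audio stream, which is the property the abstract emphasizes.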


Pattern Recognition | 2017

Centroid-aware local discriminative metric learning in speaker verification

Kekai Sheng; Weiming Dong; Wei Li; Joseph Razik; Feiyue Huang; Bao-Gang Hu

We propose a new mechanism to enable efficient learning against class imbalance and improve the representation of the identity vector (i-vector) in automatic speaker verification (ASV). The insight is to effectively exploit the inherent structure within an ASV corpus: the centroid prior. In particular: (1) to ensure learning efficiency against class imbalance, centroid-aware balanced boosting sampling is proposed to collect balanced mini-batches; (2) to strengthen local discriminative modeling on the mini-batches, neighborhood component analysis (NCA) and magnet loss (MNL) are adopted with ASV-specific modifications. The integration creates adaptive NCA (AdaNCA) and linear MNL (LMNL). Numerical results show that LMNL is a competitive candidate for low-dimensional projection of i-vectors (EER = 3.84% on SRE2008, EER = 1.81% on SRE2010), enjoying a competitive edge over linear discriminant analysis (LDA). AdaNCA (EER = 4.03% on SRE2008, EER = 2.05% on SRE2010) also performs well. Furthermore, to facilitate future study of boosting sampling, connections between boosting sampling, hinge loss and data augmentation are established, which further help to understand its behavior.
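The balanced mini-batch idea in point (1) can be sketched as below. This is a deliberately simplified stand-in for the paper's centroid-aware boosting sampler: it balances classes per batch but omits the centroid weighting, and every name here is illustrative.

```python
import random

def balanced_minibatch(by_speaker, n_speakers, per_speaker, rng=random):
    """Draw a class-balanced mini-batch: sample `n_speakers` speakers,
    then `per_speaker` i-vectors from each, so minority speakers are
    not drowned out by majority ones during metric learning.
    `by_speaker` maps speaker id -> list of i-vectors."""
    speakers = rng.sample(list(by_speaker), n_speakers)
    batch = []
    for s in speakers:
        vecs = by_speaker[s]
        # resample with replacement when a speaker has too few vectors
        picks = (rng.choices(vecs, k=per_speaker) if len(vecs) < per_speaker
                 else rng.sample(vecs, per_speaker))
        batch.extend((s, v) for v in picks)
    return batch
```

Each batch then contains exactly `per_speaker` examples per sampled speaker regardless of how imbalanced the full corpus is, which is the precondition the NCA- and magnet-loss-style objectives rely on.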


COST'09 Proceedings of the Second International Conference on Development of Multimodal Interfaces: Active Listening and Synchrony | 2009

Spoken dialogue in virtual worlds

Gérard Chollet; Asmaa Amehraye; Joseph Razik; Leila Zouari; Houssemeddine Khemiri; Chafic Mokbel

Human-computer conversations have attracted a great deal of interest, especially in virtual worlds. Research has given rise to spoken dialogue systems that take advantage of advances in speech recognition, language understanding and speech synthesis. This work surveys the state of the art of spoken dialogue systems. Current dialogue system technologies and approaches are first introduced, emphasizing the differences between them; then speech recognition, speech synthesis and language understanding are introduced as complementary and necessary modules. As the development of spoken dialogue systems becomes more complex, it is necessary to define processes to evaluate their performance, and Wizard-of-Oz techniques play an important role in this task: they yield a suitable dialogue corpus, necessary to achieve good performance. A description of this technique is given in this work, together with perspectives on multimodal dialogue systems in virtual worlds.


Journal of the Acoustical Society of America | 2013

Sparse coding for scaled bioacoustics: From Humpback whale songs evolution to forest soundscape analyses

Hervé Glotin; Jérôme Sueur; Thierry Artières; Olivier Adam; Joseph Razik

Bioacoustic event indexing has to be scaled in space (oceans and large forests, multiple sensors) and in number of species (thousands). We discuss why time-frequency featuring is inefficient compared to sparse coding (SC) for soundscape analysis. SC is based on the principle that an optimal code should contain enough information to reconstruct the input near regions of high data density, and should not contain enough information to reconstruct inputs in regions of low data density. It has been shown that SC methods can run in real time. We illustrate with an application to humpback whale songs to determine stable components versus evolving ones across seasons and years. By sparse coding at different time scales, the results show that the shortest humpback acoustic codes are the most stable (occurring with similar structure across two consecutive years). Another illustration is given on forest soundscape analysis, where we show that time-frequency atoms allow an easier analysis of forest sound organization, witho...


International Conference on Data Mining | 2015

Sparse Coding for Efficient Bioacoustic Data Mining: Preliminary Application to Analysis of Whale Songs

Joseph Razik; Hervé Glotin; Maia Hoeberechts; Yann Doh; Sébastien Paris

Bioacoustic monitoring, such as surveys of animal populations and migration, needs efficient data mining methods to extract information from large datasets covering multi-year and multi-location recordings. This paper introduces a method for sparse coding of bioacoustic recordings in order to efficiently compress the data and automatically extract patterns from it. We demonstrate the proposed method on the analysis of humpback whale songs. Previous work suggests that the structure of these songs can be characterized by successive vocalizations called sound units. Most of these analyses are currently done with expert intervention, but the volume of recordings drives the need for automated methods for sound unit classification. This paper proposes that sparse coding of the song at different time scales supports the distinction of stable song components versus those which evolve year to year. The approach is summarized as follows: first, an unsupervised method is used to encode the entire bioacoustic dataset into a dictionary; second, sparse coding is used to limit the number of elements in the dictionary; third, salient features are identified using the Lasso algorithm; and finally, an interpretation of the evolving and stable components of the songs is derived, supporting an analysis of year-to-year variation. It is shown that shorter codes are more stable, occurring with similar frequency across two consecutive years, while the occurrence of longer units varies across years, as expected from the prior manual analysis. Segments of 250 ms appear to be an appropriate length for encoding stable features of whale songs, possibly corresponding to subunits. We conclude by exploring further possibilities of applying this method to biopopulation analysis.
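The Lasso stage of the pipeline above can be illustrated with a toy coordinate-descent sparse coder. This is a generic textbook sketch, not the paper's implementation: the dictionary here is assumed to be already learned, and atoms, signals and names are all hypothetical.

```python
import math

def soft_threshold(x, t):
    """Lasso shrinkage operator: sign(x) * max(|x| - t, 0)."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def lasso_code(signal, dictionary, lam, n_iter=50):
    """Sparse-code `signal` over `dictionary` (a list of atoms, each a
    list of floats) by coordinate descent on the Lasso objective
    0.5 * ||x - D a||^2 + lam * ||a||_1. Larger `lam` yields sparser
    codes, i.e. fewer active atoms per segment."""
    n_atoms, n_dim = len(dictionary), len(signal)
    a = [0.0] * n_atoms
    for _ in range(n_iter):
        for j in range(n_atoms):
            dj = dictionary[j]
            norm = sum(d * d for d in dj)
            # residual with atom j's contribution removed
            resid = [signal[i] - sum(a[k] * dictionary[k][i]
                                     for k in range(n_atoms) if k != j)
                     for i in range(n_dim)]
            rho = sum(dj[i] * resid[i] for i in range(n_dim))
            a[j] = soft_threshold(rho, lam) / norm
    return a
```

Run over fixed-length song segments (the abstract suggests 250 ms), the resulting activation vectors are the compressed codes whose year-to-year frequency of occurrence the paper compares.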


Traitement Du Signal | 2013

Décomposition parcimonieuse des chants de cétacés pour leur suivi

Yann Doh; Joseph Razik; Sébastien Paris; Olivier Adam; Hervé Glotin

Male humpback whales emit songs during the breeding season. These songs are composed of successive vocalizations called sound units. The study of these songs is based on the classification of these sound units, especially to extract the song theme of the singers in a specific area during a specific season. Recently, several approaches have been proposed for the automatic classification of these sound units. This paper introduces sparse coding as a robust unsupervised classifier that generates an efficient time-frequency representation of whale calls. The subunit level also proves useful for analyzing the evolution of humpback whale songs over two years. It is statistically shown that the shortest units are the most stable (occurring with a similar time-frequency shape across the two years), while the longest units evolve from one year to the next.


Information Sciences, Signal Processing and Their Applications | 2007

Frame-synchronous and local confidence measures for on-the-fly keyword spotting

Joseph Razik; Odile Mella; Dominique Fohr; Jean Paul Haton

This paper presents several new confidence measures for speech recognition applications. The major advantage of these measures is that they can be evaluated with only a part of the whole sentence. Two of these measures can be computed directly within the first step of the recognition process, synchronously with the decoding engine. Such measures are useful for driving the recognition process by modifying the likelihood score, or for validating recognized words in on-the-fly applications such as keyword spotting and on-line automatic speech transcription for deaf people. Two kinds of results are given. First, an EER evaluation on a French broadcast news corpus shows performance close to the batch version of these measures (23.9% against 23.8% EER). Second, for the keyword spotting application, our best measure decreases the false-acceptance rate by 50% at the cost of only a 5% decrease in correct words.
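The on-the-fly validation step can be sketched as a simple confidence threshold on the decoder's word stream. This is an illustrative skeleton only: the confidence values would come from a measure like those above, and the names and the threshold semantics are assumptions, not the paper's system.

```python
def spot_keywords(hypotheses, keywords, threshold):
    """Validate decoder output on the fly: emit a keyword as soon as it
    is recognized with confidence at or above `threshold`.
    `hypotheses` is an iterable of (word, confidence) pairs in decoding
    order; raising `threshold` lowers the false-acceptance rate at the
    cost of rejecting more correctly recognized keywords."""
    for word, conf in hypotheses:
        if word in keywords and conf >= threshold:
            yield word, conf
```

Because it consumes hypotheses as they leave the decoder, no end-of-sentence context is needed, matching the "only a part of the whole sentence" property claimed in the abstract.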


TRECVID 2011 - TREC Video Retrieval Evaluation Online | 2011

IRIM at TRECVID 2011: Semantic Indexing and Instance Search

Nicolas Ballas; Benjamin Labbé; Aymen Shabou; Philippe Gosselin; Miriam Redi; Marc Bernard; Jonathan Delhumeau; Boris Mansencal; Jenny Benois-Pineau; Abdelkader Haadi; Bahjat Safadi; Franck Thollard; Nadia Derbas; Liming Chen; Alexandre Benoît; Patrick Lambert; Tiberius Strat; Joseph Razik; Dijana Petrovska; Andrei Stoian; Michel Crucianu

Collaboration


Dive into Joseph Razik's collaboration.

Top Co-Authors

Hervé Glotin (Aix-Marseille University)
Jean-Paul Haton (City University of Hong Kong)
Odile Mella (French Institute for Research in Computer Science and Automation)
Yann Doh (Aix-Marseille University)
Xanadu Halkias (Centre national de la recherche scientifique)