Publications


Featured research published by Patrick Cardinal.


spoken language technology workshop | 2014

A complete KALDI recipe for building Arabic speech recognition systems

Ahmed M. Ali; Yifan Zhang; Patrick Cardinal; Najim Dehak; Stephan Vogel; James R. Glass

In this paper we present a recipe and language resources for training and testing Arabic speech recognition systems using the KALDI toolkit. We built a prototype broadcast news system using 200 hours of GALE data that is publicly available through LDC. We describe in detail the decisions made in building the system: using the MADA toolkit for text normalization and vowelization; why we use 36 phonemes; how we generate pronunciations; how we build the language model. We report results using state-of-the-art modeling and decoding techniques. The scripts are released through KALDI and resources are made available on QCRI's language resources web portal. This is the first effort to share reproducible, sizable training and testing results on an MSA system.
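
To make the pronunciation-generation step concrete, here is a minimal Python sketch of the kind of vowelized-text-to-phoneme pass such a recipe implies. The transliteration scheme and symbol table below are illustrative stand-ins, not the paper's actual 36-phoneme inventory or MADA's output format.

```python
# Illustrative sketch: deriving pronunciations from vowelized (diacritized)
# transliterated words, in the spirit of "vowelized text gives near
# one-to-one pronunciations". The symbol table is hypothetical, NOT the
# paper's actual 36-phoneme inventory.
G2P = {
    "k": "k", "t": "t", "b": "b", "A": "aa",
    "a": "a", "i": "i", "u": "u",            # short vowels from diacritics
}

def pronounce(vowelized_word: str) -> list[str]:
    """Map each transliterated character to a phoneme; skip unknowns."""
    return [G2P[c] for c in vowelized_word if c in G2P]

# e.g. "kitAb" ('book', vowelized) -> ['k', 'i', 't', 'aa', 'b']
print(pronounce("kitAb"))
```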


conference of the international speech communication association | 2016

Automatic dialect detection in Arabic broadcast speech

Ahmed M. Ali; Najim Dehak; Patrick Cardinal; Sameer Khurana; Sree Harsha Yella; James R. Glass; Peter Bell; Steve Renals

We investigate different approaches for dialect identification in Arabic broadcast speech, using phonetic and lexical features obtained from a speech recognition system, and acoustic features from the i-vector framework. We studied both generative and discriminative classifiers, and we combined these features using a multi-class Support Vector Machine (SVM). We validated our results on an Arabic/English language identification task, with an accuracy of 100%. We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%. We further report results using the proposed method to discriminate between the five most widely used dialects of Arabic, namely Egyptian, Gulf, Levantine, North African, and MSA, with an accuracy of 52%. We discuss dialect identification errors in the context of dialect code-switching between Dialectal Arabic and MSA, and compare the error pattern between manually labeled data and the output from our classifier. We also release the train and test data as a standard corpus for dialect identification.
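
A minimal sketch of the fusion step described above: concatenate feature streams per utterance and train a multi-class SVM. The arrays are random stand-ins for the real i-vector and phonetic/lexical features, and the dimensions are assumptions, not values from the paper.

```python
# Sketch of feature-level combination followed by a multi-class SVM over
# the five dialect classes. Random arrays stand in for real features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
ivectors = rng.normal(size=(n, 400))    # acoustic i-vectors (stand-in)
phonetic = rng.normal(size=(n, 100))    # phonetic/lexical features (stand-in)
labels = rng.integers(0, 5, size=n)     # EGY, GLF, LAV, NOR, MSA

X = np.hstack([ivectors, phonetic])     # feature-level combination
Xtr, Xte, ytr, yte = train_test_split(X, labels, random_state=0)

clf = SVC(kernel="linear", C=1.0)       # multi-class via one-vs-one
clf.fit(Xtr, ytr)
print("accuracy:", clf.score(Xte, yte)) # ~chance (0.2) on random data
```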


international conference on acoustics, speech, and signal processing | 2010

Content-based audio copy detection using nearest-neighbor mapping

Vishwa Gupta; Gilles Boulianne; Patrick Cardinal

We report results on video copy detection using nearest-neighbor (NN) mapping, which has been used successfully in audio copy detection. For copy detection search, we use a sliding window to move the query video over the test video, and count the number of query frames that match frames in the test segment. The feature we match for each test frame is the number of the query frame closest to that test frame. This leads to good matching scores even when the query video is distorted and contains occlusions. We test the NN mapping algorithm and the video features that map each test frame to the closest query frame on TRECVID 2009 and 2010 content-based copy detection (CBCD) evaluation data. For both tasks, NN mapping for video copy detection gives a minimal normalized detection cost rate (min NDCR) comparable to that achieved with audio copy detection on the same task. For the TRECVID 2011 CBCD evaluation data, we obtained the lowest min NDCR for 26 out of 56 transforms in the actual no-false-alarm case.
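
A minimal sketch of the NN-mapping idea, under the simplifying assumption of one feature vector per frame: each test frame is first mapped to the index of its nearest query frame, then a sliding window scores each candidate alignment by counting frames whose mapped index agrees with that alignment. Feature dimensions and the brute-force distance computation are illustrative only.

```python
import numpy as np

def nn_map(test_feats: np.ndarray, query_feats: np.ndarray) -> np.ndarray:
    # For each test frame, index of the nearest query frame (Euclidean).
    d = np.linalg.norm(test_feats[:, None, :] - query_feats[None, :, :], axis=2)
    return d.argmin(axis=1)

def best_alignment(nn: np.ndarray, query_len: int) -> tuple[int, int]:
    # Slide the query over the test; at offset o, frame o+i should map to i.
    best_offset, best_score = -1, -1
    for offset in range(len(nn) - query_len + 1):
        window = nn[offset : offset + query_len]
        score = int(np.sum(window == np.arange(query_len)))  # matched frames
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

rng = np.random.default_rng(1)
query = rng.normal(size=(50, 16))
test = rng.normal(size=(500, 16))
test[200:250] = query + 0.1 * rng.normal(size=query.shape)  # distorted copy
offset, score = best_alignment(nn_map(test, query), len(query))
print(offset, score)   # expect offset 200 with a high match count
```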


content based multimedia indexing | 2010

CRIM's content-based audio copy detection system for TRECVID 2009

Vishwa Gupta; Gilles Boulianne; Patrick Cardinal

We report results on audio copy detection for the TRECVID 2009 copy detection task. This task involves searching for transformed audio queries in over 385 hours of test audio. The queries were transformed in seven different ways; three of the transformations involved mixing unrelated speech into the original query, making the task much more difficult. We give results with two different audio fingerprints and show that mapping each test frame to the nearest query frame (the nearest-neighbor fingerprint) results in robust audio copy detection. The most difficult task in TRECVID 2009 was to detect audio copies using predetermined thresholds computed from 2008 data. We show that the nearest-neighbor fingerprints were robust even on this task and gave an actual minimal normalized detection cost rate (NDCR) of around 0.06 for all the transformations. These results are close to those obtained by using the optimal threshold for each transform, which shows the robustness of the nearest-neighbor fingerprints. These fingerprints can also be computed efficiently on a graphics processing unit, leading to a very fast search.
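
The fixed- versus optimal-threshold comparison can be sketched as follows, with synthetic scores and a simplified miss-plus-false-alarm cost standing in for TRECVID's exact NDCR formula; the score distributions and threshold values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
copy_scores = rng.normal(3.0, 1.0, 200)      # scores for true copies
noncopy_scores = rng.normal(0.0, 1.0, 2000)  # scores for non-copies

def cost(threshold: float, beta: float = 2.0) -> float:
    # Toy detection cost: miss rate plus weighted false-alarm rate.
    p_miss = np.mean(copy_scores < threshold)
    p_fa = np.mean(noncopy_scores >= threshold)
    return p_miss + beta * p_fa

fixed = 2.0                                   # e.g. tuned on last year's data
candidates = np.linspace(-2, 6, 400)
optimal = min(candidates, key=cost)           # best threshold for this data
print(f"cost@fixed={cost(fixed):.3f}  cost@optimal={cost(optimal):.3f}")
```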


conference of the international speech communication association | 2016

Native language detection using the i-vector framework

Mohammed Senoussaoui; Patrick Cardinal; Najim Dehak; Alessandro L. Koerich

Native-language identification is the task of determining a speaker's native language based only on their speech in a second language. In this paper we propose the use of the well-known i-vector representation of the speech signal to detect the native language of an English speaker. The i-vector representation has shown excellent performance on the closely related task of distinguishing between different languages. We evaluated different ways to extract i-vectors in order to adapt them to the specificities of the native language detection task. The experimental results on the 2016 ComParE Native Language sub-challenge test set show that the proposed system, based on a conventional i-vector extractor, outperforms the baseline system with a 42% relative improvement.
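
For clarity, "42% relative improvement" is computed as (proposed - baseline) / baseline; the raw scores below are hypothetical stand-ins, since the abstract reports only the relative figure.

```python
# Relative improvement as used above; the accuracies are hypothetical.
baseline, proposed = 0.50, 0.71
rel = (proposed - baseline) / baseline
print(f"{rel:.0%} relative improvement")   # -> 42% with these stand-ins
```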


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Large Vocabulary Speech Recognition on Parallel Architectures

Patrick Cardinal; Pierre Dumouchel; Gilles Boulianne

The speed of modern processors has remained constant over the last few years, but integration capacity continues to follow Moore's law; thus, to be scalable, applications must be parallelized. Parallelizing the classical Viterbi beam search has been shown to be very difficult on multi-core processor architectures and on massively threaded architectures such as graphics processing units (GPUs). The problem is that active states are scattered in memory and thus cannot be transferred efficiently to processor memory. This can be circumvented by using the A* search, which uses a heuristic to significantly reduce the number of explored hypotheses. The main advantage of this algorithm is that the processing time is moved from the search in the recognition network to the computation of heuristic costs, which can be designed to take advantage of parallel architectures. Our parallel implementation of the A* decoder on a 4-core processor with a GPU led to a speed-up factor of 6.13 over the Viterbi beam search at its maximum capacity, and an improvement of 4% absolute in accuracy at real time.
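
A toy A* search, illustrating the trade-off the paper describes: the search itself explores few hypotheses because each node carries a precomputed heuristic estimate h(n) of its remaining cost. The graph, edge costs, and heuristic values are illustrative only, not a recognition network.

```python
import heapq

graph = {                       # node -> [(neighbor, edge_cost), ...]
    "S": [("A", 1.0), ("B", 4.0)],
    "A": [("G", 5.0)],
    "B": [("G", 1.0)],
    "G": [],
}
h = {"S": 2.0, "A": 4.0, "B": 1.0, "G": 0.0}   # admissible estimates to G

def astar(start: str, goal: str) -> float:
    frontier = [(h[start], 0.0, start)]         # (f = g + h, g, node)
    best_g = {start: 0.0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g                            # cheapest path cost found
        for nxt, w in graph[node]:
            if g + w < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + w
                heapq.heappush(frontier, (g + w + h[nxt], g + w, nxt))
    return float("inf")

print(astar("S", "G"))   # 5.0 via S -> B -> G
```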


information sciences, signal processing and their applications | 2012

The A* speech recognition system on parallel architectures

Patrick Cardinal; Gilles Boulianne; Pierre Dumouchel

The speed of modern processors has remained constant over the last few years, but integration capacity continues to follow Moore's law; thus, to be scalable, applications must be parallelized. In addition to the main CPU, almost every computer is equipped with a graphics processing unit (GPU), which is in essence a specialized parallel processor. This paper explores how the performance of speech recognition systems can be enhanced by using the A* algorithm, which parallelizes better than the Viterbi algorithm, together with a GPU for the acoustic computations in large vocabulary applications. First experiments with a "unigram approximation" heuristic resulted in approximately 8.7 times fewer states being explored compared to our classical Viterbi decoder. The multi-threaded implementation of the A* decoder combined with a GPU for acoustic computation led to a speed-up factor of 5.2 over its sequential counterpart and an improvement of 5% absolute in accuracy over the sequential Viterbi search at real time.
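
Why the acoustic computation suits a GPU can be seen in a small NumPy sketch: scoring every state against every frame in a batch reduces to dense matrix algebra, exactly the workload GPUs accelerate. A single diagonal-covariance Gaussian per state is assumed for brevity (real systems use mixtures or DNNs), and all shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
frames = rng.normal(size=(100, 39))          # 100 frames of 39-dim features
means = rng.normal(size=(2000, 39))          # one Gaussian per state
ivars = np.abs(rng.normal(size=(2000, 39)))  # inverse variances

# log N(x; m, v) = const - 0.5 * sum((x - m)^2 / v), vectorized over all
# (frame, state) pairs at once by expanding the quadratic form.
const = 0.5 * np.log(ivars).sum(axis=1) - 0.5 * 39 * np.log(2 * np.pi)
quad = ((frames**2) @ ivars.T
        - 2 * frames @ (means * ivars).T
        + ((means**2) * ivars).sum(axis=1))
loglik = const - 0.5 * quad
print(loglik.shape)   # (100, 2000): every state scored for every frame
```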


Behavior Modification | 2017

Using a Visual Structured Criterion for the Analysis of Alternating-Treatment Designs

Marc J. Lanovaz; Patrick Cardinal; Mary Francis

Although visual inspection remains common in the analysis of single-case designs, the lack of agreement between raters is an issue that may seriously compromise its validity. Thus, the purpose of our study was to develop and examine the properties of a simple structured criterion to supplement the visual analysis of alternating-treatment designs. To this end, we generated simulated data sets with varying numbers of points, numbers of conditions, effect sizes, and autocorrelations, and then measured the Type I error rates and power produced by the visual structured criterion (VSC) and by permutation analyses. We also validated the Type I error results using nonsimulated data. Overall, our results indicate that using the VSC to supplement the analysis of systematically alternating-treatment designs with at least five points per condition generally provides adequate control over Type I error rates and sufficient power to detect most behavior changes.
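
The simulation logic can be sketched as follows: generate alternating-treatment data with no true difference between conditions, apply a decision rule, and estimate the Type I error rate as the proportion of false positives. The rule below is a plain permutation test standing in for (not reproducing) the authors' VSC, and autocorrelation is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)

def permutation_test(a: np.ndarray, b: np.ndarray, n_perm: int = 2000) -> float:
    # p-value for the difference in condition means under random relabeling.
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(perm[: len(a)].mean() - perm[len(a):].mean()) >= observed:
            count += 1
    return count / n_perm

n_datasets, points_per_condition = 500, 5
false_positives = 0
for _ in range(n_datasets):
    a = rng.normal(size=points_per_condition)   # condition A, no true effect
    b = rng.normal(size=points_per_condition)   # condition B, no true effect
    if permutation_test(a, b) < 0.05:
        false_positives += 1
print("Type I error rate:", false_positives / n_datasets)  # <= ~0.05 expected
```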


acm multimedia | 2015

ETS System for AV+EC 2015 Challenge

Patrick Cardinal; Najim Dehak; Alessandro L. Koerich; Jahangir Alam; Patrice Boucher

This paper presents the system that we developed for the AV+EC 2015 challenge, which is mainly based on deep neural networks (DNNs). We investigated different options using the audio feature set as a base system, and the improvements achieved on this modality were then applied to the other modalities. One of our main findings is that the frame stacking technique improves the quality of our model's predictions, an improvement observed across all modalities. We also present a new feature set derived from the cardiac rhythm extracted from electrocardiogram readings. This new feature set helped us improve the concordance correlation coefficient for valence from 0.088 to 0.124 on the development set, an improvement of 25%. Finally, the fusion of all modalities was studied both at the feature level, using a DNN, and at the prediction level, by training linear and random forest regressors. Both fusion schemes provided promising results.
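
Two of the ingredients above lend themselves to a short sketch: frame stacking (concatenating each frame with its neighbors to give the DNN temporal context) and the concordance correlation coefficient used as the evaluation metric. The context width and feature dimensions are assumptions, not the paper's settings.

```python
import numpy as np

def stack_frames(feats: np.ndarray, context: int = 3) -> np.ndarray:
    """Concatenate each frame with `context` frames on each side."""
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i : i + len(feats)]
                      for i in range(2 * context + 1)])

def ccc(x: np.ndarray, y: np.ndarray) -> float:
    """Concordance correlation coefficient between prediction and target."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

feats = np.random.default_rng(5).normal(size=(1000, 40))
print(stack_frames(feats).shape)                # (1000, 280): 7 stacked frames
print(round(ccc(feats[:, 0], feats[:, 0]), 3))  # 1.0 for perfect agreement
```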


international conference on acoustics, speech, and signal processing | 2012

Using A* for the parallelization of speech recognition systems

Patrick Cardinal; Gilles Boulianne; Pierre Dumouchel

The speed of modern processors has remained constant over the last few years, but integration capacity continues to follow Moore's law; thus, to be scalable, applications must be parallelized. This paper presents results on using the A* search algorithm in a parallel large vocabulary speech recognition system. This algorithm parallelizes better than the Viterbi algorithm. First experiments with a "unigram approximation" heuristic resulted in approximately 8.7 times fewer states being explored compared to our classical Viterbi decoder. The multi-threaded implementation of the A* decoder led to a speed-up factor of 3 over its sequential counterpart.
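
One standard way to reason about such a speed-up figure is Amdahl's law, speedup = 1 / ((1 - p) + p / n); this is a generic model, not an analysis from the paper, and the parallel fraction and thread count below are hypothetical.

```python
# Amdahl's law: p is the parallelizable fraction, n the number of threads.
# Both numbers below are hypothetical stand-ins, not values from the paper.
def amdahl(parallel_fraction: float, n_threads: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)

# e.g. with ~76% of the decoder parallelized, 8 threads give about 3x:
print(round(amdahl(0.76, 8), 2))   # ~2.99
```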

Collaboration


Dive into Patrick Cardinal's collaborations.

Top Co-Authors

Gilles Boulianne
Institut national de la recherche scientifique

Pierre Dumouchel
École de technologie supérieure

Najim Dehak
Massachusetts Institute of Technology

Vishwa Gupta
Institut national de la recherche scientifique

James R. Glass
Massachusetts Institute of Technology

Ahmed M. Ali
Qatar Computing Research Institute

Yu Zhang
Massachusetts Institute of Technology

Alessandro L. Koerich
Pontifícia Universidade Católica do Paraná

Stephan Vogel
Qatar Computing Research Institute