Publications

Featured research published by Jean-Pierre Martens.


Journal of the Acoustical Society of America | 1992

Pitch and voiced/unvoiced determination with an auditory model

Luc Van Immerseel; Jean-Pierre Martens

In this paper, an accurate pitch and voiced/unvoiced determination algorithm for speech analysis is described. The algorithm is called AMPEX (auditory model-based pitch extractor), and it performs a temporal analysis of the outputs emerging from a new auditory model. However, in spite of its use of an auditory model, AMPEX should not be regarded as a substitute for any psychophysical theory of human auditory pitch perception. What is mainly described is the design of a computationally efficient auditory model, the perceptually motivated determination of the model parameters, the conception of a reliable pitch extractor for speech analysis, and the elaboration of an experimental procedure for evaluating the performance of such a pitch extractor. In the course of the evaluation experiment, several kinds of speech stimuli, including clean speech, bandpass-filtered speech, and noisy speech, were presented to three different pitch extractors. The experimental results clearly indicate that AMPEX outperforms the best algorithms available.
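The temporal periodicity analysis at the heart of such a pitch extractor can be illustrated with a minimal autocorrelation-based sketch. This is a generic estimator applied to the raw waveform, not AMPEX itself (which analyses the outputs of the auditory model); the frame length, lag range, and voicing threshold below are illustrative assumptions.

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) of one frame from the autocorrelation peak.

    A generic temporal analysis, not the AMPEX algorithm: AMPEX runs a
    similar periodicity analysis on auditory-model outputs rather than
    on the waveform itself.
    """
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(fs / fmax)            # shortest admissible period (samples)
    hi = int(fs / fmin)            # longest admissible period (samples)
    lag = lo + int(np.argmax(ac[lo:hi]))
    # Crude voiced/unvoiced decision: periodicity strength threshold.
    voiced = ac[lag] > 0.3 * ac[0]
    return (fs / lag) if voiced else 0.0

fs = 16000
t = np.arange(int(0.04 * fs)) / fs                 # one 40 ms frame
f0 = autocorr_pitch(np.sin(2 * np.pi * 200.0 * t), fs)   # 200 Hz tone
```

On a clean 200 Hz tone the lag search recovers the period of 80 samples, i.e. an estimate of 200 Hz; real pitch extractors must of course survive much harder material than this.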


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2004

A comparison of human and automatic musical genre classification

Stefaan Lippens; Jean-Pierre Martens; T. De Mulder

Recently there has been an increasing amount of work in the area of automatic genre classification of music in audio format. In addition to automatically structuring large music collections, such classification can be used as a way to evaluate features for describing musical content. However, the evaluation and comparison of genre classification systems is hindered by the subjective perception of genre definitions by users. In this work, we describe a set of experiments in automatic musical genre classification. An important contribution of this work is the comparison of the automatic results with human genre classifications on the same dataset. The results show that, although there is room for improvement, genre classification is inherently subjective, and therefore perfect results can be expected from neither automatic nor human classification. The experiments also show that features derived from an auditory model perform similarly to features based on mel-frequency cepstral coefficients (MFCC).
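Comparing automatic results with human classifications on the same dataset amounts to measuring agreement between two label sequences. A minimal sketch with invented labels follows; the genres and values are hypothetical, and the paper does not prescribe these exact metrics (Cohen's kappa is shown here simply as one common way to discount chance agreement).

```python
from collections import Counter

# Hypothetical genre labels for ten excerpts (illustrative only).
human = ["rock", "jazz", "pop", "rock", "classical",
         "jazz", "pop", "rock", "classical", "jazz"]
auto  = ["rock", "jazz", "rock", "rock", "classical",
         "pop",  "pop", "rock", "classical", "jazz"]

n = len(human)
agreement = sum(h == a for h, a in zip(human, auto)) / n

# Chance-corrected agreement (Cohen's kappa): discounts the agreement
# expected if both raters labelled at random with their own marginal
# genre frequencies.
ph, pa = Counter(human), Counter(auto)
p_chance = sum(ph[g] * pa[g] for g in ph) / n ** 2
kappa = (agreement - p_chance) / (1.0 - p_chance)
```

The gap between raw agreement and kappa makes the subjectivity argument concrete: even two careful raters will disagree on genre boundaries, so neither human nor automatic classification can reach perfect scores.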


EURASIP Journal on Advances in Signal Processing | 2009

Automated intelligibility assessment of pathological speech using phonological features

Catherine Middag; Jean-Pierre Martens; Gwen Van Nuffelen; Marc De Bodt

It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort into the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words) and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods which can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that builds on previous work (Middag et al., 2008) is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.
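The reported error figure is a root mean squared error between perceived and computed intelligibility on a 0 to 100 scale. As a minimal sketch with hypothetical scores (the values below are made up for illustration):

```python
import math

# Hypothetical perceptual vs. computed intelligibility scores (0-100 scale).
perceived = [92.0, 75.0, 60.0, 85.0, 40.0]
computed  = [88.0, 80.0, 55.0, 90.0, 47.0]

# Root mean squared error of the discrepancies between the two ratings.
rmse = math.sqrt(sum((p - c) ** 2 for p, c in zip(perceived, computed))
                 / len(perceived))
```

An RMSE of 8 on this scale means the automatic rating typically lands within about 8 intelligibility points of the perceptual one.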


PLOS ONE | 2013

Activating and relaxing music entrains the speed of beat synchronized walking

Marc Leman; Dirk Moelants; Matthias Varewyck; Frederik Styns; Leon van Noorden; Jean-Pierre Martens

Inspired by a theory of embodied music cognition, we investigate whether music can entrain the speed of beat synchronized walking. If human walking is in synchrony with the beat and all musical stimuli have the same duration and the same tempo, then differences in walking speed can only be the result of music-induced differences in stride length, thus reflecting the vigor or physical strength of the movement. Participants walked in an open field in synchrony with the beat of 52 different musical stimuli all having a tempo of 130 beats per minute and a meter of 4 beats. The walking speed was measured as the walked distance during a time interval of 30 seconds. The results reveal that some music is ‘activating’ in the sense that it increases the speed, and some music is ‘relaxing’ in the sense that it decreases the speed, compared to the spontaneous walked speed in response to metronome stimuli. Participants are consistent in their observation of qualitative differences between the relaxing and activating musical stimuli. Using regression analysis, it was possible to set up a predictive model using only four sonic features that explain 60% of the variance. The sonic features capture variation in loudness and pitch patterns at periods of three, four and six beats, suggesting that expressive patterns in music are responsible for the effect. The mechanism may be attributed to an attentional shift, a subliminal audio-motor entrainment mechanism, or an arousal effect, but further study is needed to figure this out. Overall, the study supports the hypothesis that recurrent patterns of fluctuation affecting the binary meter strength of the music may entrain the vigor of the movement. The study opens up new perspectives for understanding the relationship between entrainment and expressiveness, with the possibility to develop applications that can be used in domains such as sports and physical rehabilitation.
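The predictive model described above is a regression from a handful of sonic features to walking speed, with explained variance (R²) as the figure of merit. The sketch below fits an ordinary least squares model on synthetic stand-in data, since the paper's actual features and measurements are not reproduced here; the feature values, weights, and noise level are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the four sonic features and walking speed.
n = 52                                # one row per musical stimulus
X = rng.normal(size=(n, 4))           # four sonic features (illustrative)
true_w = np.array([0.8, -0.5, 0.3, 0.2])
speed = X @ true_w + rng.normal(scale=0.7, size=n)   # walking speed (a.u.)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, speed, rcond=None)
pred = A @ coef

# Explained variance (R^2), the quantity reported as "60% of the variance".
r2 = 1.0 - np.sum((speed - pred) ** 2) / np.sum((speed - speed.mean()) ** 2)
```

With only four predictors, an R² around 0.6 as reported in the paper is a substantial effect for behavioural data.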


Speech Communication | 1999

In search of better pronunciation models for speech recognition

Nick Cremelie; Jean-Pierre Martens

The lexicon of a speech recognizer is supposed to contain pronunciation models describing how words can be realized as sequences of subword units (usually phonemes). In this contribution we present a method for upgrading initially simple pronunciation models to new models that can explain several pronunciation variants of each word. Since the presented strategy is capable of producing pronunciation variants and cross-word dependencies completely automatically, it is an attractive alternative to the manual encoding of multiple pronunciations in the lexicon. The method learns pronunciation rules from orthographically transcribed speech utterances, and subsequently applies these rules to generate common pronunciation variants. All variants of one word are then compiled into a compact pronunciation model. The obtained models are properly integrated into the speech recognizer, where they replace the formerly used simple models. By learning pronunciation rules rather than pronunciation variants from the data, one can combine the advantages of data-driven and rule-based approaches. Important properties of the proposed methodology are that it incorporates dependencies between the rules from the very beginning (during training), that it supports exception rules which do not produce pronunciation variants themselves but affect the production of such variants by other rules (called production rules), and that it has a sound probabilistic basis for attaching likelihoods to the word pronunciation variants. Experiments showed that the introduction of such variants in a segment-based recognizer significantly improves the recognition accuracy: on TIMIT, a relative word error rate reduction of up to 17% was obtained.
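The core idea of applying learned, probability-weighted rules to expand a canonical pronunciation into weighted variants can be sketched as follows. This is a deliberately simplified, context-independent toy: the rules and their probabilities are invented, and the paper's actual method learns context-dependent rules (including cross-word ones) with inter-rule dependencies from transcribed speech.

```python
from itertools import product

# Hypothetical rules: (focus phone, replacement, probability of firing).
# E.g. /t/ deletion and /n/ -> /m/ assimilation; values are illustrative.
rules = [("t", "", 0.3), ("n", "m", 0.2)]

def variants(phones, rules):
    """Expand a canonical phone string into weighted pronunciation variants."""
    options = []
    for p in phones:
        opts = [(p, 1.0)]
        for focus, repl, prob in rules:
            if p == focus:
                opts = [(p, 1.0 - prob), (repl, prob)]
        options.append(opts)
    out = {}
    for combo in product(*options):
        pron = " ".join(p for p, _ in combo if p)   # drop deleted phones
        like = 1.0
        for _, pr in combo:
            like *= pr
        out[pron] = out.get(pron, 0.0) + like
    return out

prons = variants(["w", "a", "n", "t"], rules)
```

All variants of one word, with their likelihoods, can then be compiled into one compact pronunciation model, which is the representation the recognizer's lexicon stores.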


International Conference on Spoken Language Processing (ICSLP) | 1996

A fast and reliable rate of speech detector

Jan P. Verhasselt; Jean-Pierre Martens

In this paper, we present a new rate-of-speech (ROS) detector that operates independently from the recognition process. This detector is evaluated on the TIMIT corpus and positioned with respect to other ROS detectors. The ROS estimate is subsequently used to compensate for the effects of unusual speech rates on continuous speech recognition. We report on results obtained with two ROS compensation techniques on a speaker-independent acoustic-phonetic decoding task.
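The quantity being estimated is simple to state: realized phones per second of speech. The sketch below computes it from a hypothetical phone segmentation; the boundary times are invented, and the paper's contribution is precisely that its detector estimates this rate directly from the acoustics, without needing such a segmentation or the recognizer's output.

```python
# Hypothetical phone boundary times (seconds) for one utterance.
boundaries = [0.00, 0.08, 0.15, 0.27, 0.35, 0.50, 0.58, 0.70]

# Rate of speech as realized phones per second of speech.  The detector
# in the paper estimates this quantity recognition-free; this sketch
# only illustrates what is being estimated.
n_phones = len(boundaries) - 1
ros = n_phones / (boundaries[-1] - boundaries[0])
```

A compensation scheme can then, for example, adapt phone duration models or acoustic score weights when `ros` falls outside the typical range.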


Neural Networks | 1991

A fast and robust learning algorithm for feedforward neural networks

Nico Weymaere; Jean-Pierre Martens

The back propagation algorithm caused a tremendous breakthrough in the application of multilayer perceptrons. However, it has some important drawbacks: long training times and sensitivity to the presence of local minima. Another problem is the network topology: the exact number of units in a particular hidden layer, as well as the number of hidden layers, needs to be known in advance. A lot of time is often spent in finding the optimal topology. In this article, we consider multilayer networks with one hidden layer of Gaussian units and an output layer of conventional units. We show that for this kind of network, it is possible to perform a fast dimensionality analysis by analyzing only a small fraction of the input patterns. Moreover, as a result of this approach, it is possible to initialize the weights of the network before starting the back propagation training. Several classification problems are taken as examples.
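In the spirit of data-driven initialization, one can place the Gaussian hidden units directly on input patterns before any gradient training starts. The sketch below does this on a toy XOR-like problem and, for brevity, solves the output layer in closed form instead of running back propagation as the paper does; the data, unit width, and closed-form shortcut are all illustrative assumptions.

```python
import numpy as np

def gaussian_hidden(X, centers, width):
    """Activations of a hidden layer of Gaussian (RBF) units."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

# Toy two-class problem (XOR-like); illustrative data only.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

# Initialize the Gaussian units on the input patterns themselves,
# then fit the (linear) output layer by least squares rather than
# back propagation, just to show the initialized net can separate
# the classes.
centers = X.copy()
H = np.column_stack([np.ones(len(X)), gaussian_hidden(X, centers, width=0.5)])
w, *_ = np.linalg.lstsq(H, y, rcond=None)
pred = (H @ w > 0.5).astype(float)
```

A sensible initialization like this gives back propagation a starting point that already solves much of the problem, which is exactly why it shortens training and reduces the risk of bad local minima.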


Speech Communication | 1996

Automatic segmentation and labelling of multi-lingual speech data

Annemie Vorstermans; Jean-Pierre Martens; B. Van Coile

A new system for the automatic segmentation and labelling of speech is presented. The system is capable of labelling speech originating from different languages without requiring extensive linguistic knowledge or large (manually segmented and labelled) training databases of that language. The system comprises small neural networks for the segmentation and the broad phonetic classification of the speech. These networks were originally trained on one task (Flemish continuous speech), and are automatically adapted to a new task. Due to the limited size of the neural networks, the segmentation and labelling strategy requires only a limited amount of computation, and the adaptation to a new task can be accomplished very quickly. The system was first evaluated on five isolated-word corpora designed for the development of Dutch, French, American English, Spanish and Korean text-to-speech systems. The results show that the accuracy of the obtained automatic segmentation and labelling is comparable to that of human experts. In order to provide segmentation and labelling results which can be compared to data reported in the literature, additional tests were run on TIMIT and on the English, Danish and Italian portions of the EUROM0 continuous speech utterances. The performance of our system appears to compare favourably to that of other systems.


International Journal of Language & Communication Disorders | 2009

Speech technology-based assessment of phoneme intelligibility in dysarthria

Gwen Van Nuffelen; Catherine Middag; Marc De Bodt; Jean-Pierre Martens

BACKGROUND Currently, clinicians mainly rely on perceptual judgements to assess the intelligibility of dysarthric speech. Although often highly reliable, this procedure is subjective and subject to many intrinsic variables. Therefore, certain benefits can be expected from a speech technology-based intelligibility assessment. Previous attempts to develop an automated intelligibility assessment mainly relied on automatic speech recognition (ASR) systems that were trained to recognize the speech of persons without known impairments. In this paper, automatic speech alignment (ASA) systems are used instead. In addition, previous attempts only made use of phonemic features (PMF). However, since articulation is an important contributing factor to the intelligibility of dysarthric speech, and since phonological features (PLF) are shared by multiple phonemes, phonological features may be more appropriate for characterizing and identifying dysarthric phonemes. AIMS To investigate the reliability of objective phoneme intelligibility scores obtained by three types of intelligibility models: models using only phonemic features (yielded by an automated speech aligner) (PMF models), models using only phonological features (PLF models), and models using a combination of phonemic and phonological features (PMF + PLF models). METHODS & PROCEDURES Correlations were calculated between the objective phoneme intelligibility scores of 60 dysarthric speakers and the corresponding perceptual phoneme intelligibility scores obtained by a standardized perceptual phoneme intelligibility assessment. OUTCOMES & RESULTS The correlations between the objective and perceptual intelligibility scores range from 0.793 for the PMF models, through 0.828 for the PLF models, to 0.943 for the PMF + PLF models. The features selected to obtain such high correlations can be divided into six main subgroups: (1) vowel-related phonemic and phonological features, (2) lateral-related features, (3) silence-related features, (4) fricative-related features, (5) velar-related features and (6) plosive-related features. CONCLUSIONS & IMPLICATIONS The phoneme intelligibility scores of dysarthric speakers obtained by the three investigated intelligibility model types are reliable. The highest correlation between perceptual and objective intelligibility scores was found for models combining phonemic and phonological features. The intelligibility scoring system is now ready to be implemented in a clinical tool.
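The reliability figures above are Pearson correlations between objective and perceptual intelligibility scores across speakers. A minimal sketch of that computation, on hypothetical scores (the values are invented; 60 real speakers were used in the study):

```python
import math

# Hypothetical perceptual vs. model-predicted intelligibility scores.
perceptual = [55.0, 70.0, 62.0, 80.0, 45.0, 90.0]
objective  = [50.0, 72.0, 60.0, 78.0, 52.0, 88.0]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson(perceptual, objective)
```

A correlation of 0.943 for the combined PMF + PLF models means the objective scores track the perceptual ranking of speakers very closely, which is what makes the system usable as a clinical substitute for the perceptual test.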


IEEE Transactions on Systems, Man, and Cybernetics | 2011

A Practical Approach to Model Selection for Support Vector Machines With a Gaussian Kernel

Matthias Varewyck; Jean-Pierre Martens

When learning a support vector machine (SVM) from a set of labeled development patterns, the ultimate goal is to get a classifier attaining a low error rate on new patterns. This so-called generalization ability obviously depends on the choices of the learning parameters that control the learning process. Model selection is the method for identifying appropriate values for these parameters. In this paper, a novel model selection method for SVMs with a Gaussian kernel is proposed. Its aim is to find suitable values for the kernel parameter γ and the cost parameter C with a minimum amount of central processing unit time. The determination of the kernel parameter is based on the argument that, for most patterns, the decision function of the SVM should consist of a sufficiently large number of significant contributions. A unique property of the proposed method is that it retrieves the kernel parameter as a simple analytical function of the dimensionality of the feature space and the dispersion of the classes in that space. An experimental evaluation on a test bed of 17 classification problems has shown that the new method competes favorably with two recently published methods: the classification of new patterns is equally good, but the computational effort to identify the learning parameters is substantially lower.
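The idea of deriving γ analytically from the dimensionality and dispersion of the data, rather than by grid search, can be illustrated with a generic heuristic of the same flavour. The formula below, 1 / (d · variance), mirrors scikit-learn's `gamma="scale"` default and is a stand-in, not the paper's actual expression; the two-class data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative development set: two Gaussian classes in d dimensions.
d = 5
X = np.vstack([rng.normal(0.0, 1.0, size=(100, d)),
               rng.normal(1.5, 1.0, size=(100, d))])

# A data-driven Gaussian-kernel width in the same spirit as the paper:
# gamma as a simple analytical function of the dimensionality and the
# dispersion of the data.  This particular formula (1 / (d * variance),
# cf. scikit-learn's gamma="scale") is only a stand-in for the paper's.
gamma = 1.0 / (d * X.var())
```

Because γ comes out of a closed-form expression, only the cost parameter C is left to search over, which is where the large CPU-time saving relative to a full two-dimensional grid search comes from.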

Collaboration


Dive into Jean-Pierre Martens's collaborations.

Top Co-Authors
Mieke Moerman

Ghent University Hospital
