
Publication


Featured research published by Matthias Janke.


IEEE Transactions on Biomedical Engineering | 2014

Tackling speaking mode varieties in EMG-based speech recognition.

Michael Wand; Matthias Janke; Tanja Schultz

An electromyographic (EMG) silent speech recognizer is a system that recognizes speech by capturing the electric potentials of the human articulatory muscles, thus enabling the user to communicate silently. After having established a baseline EMG-based continuous speech recognizer, in this paper, we investigate speaking mode variations, i.e., discrepancies between audible and silent speech that deteriorate recognition accuracy. We introduce multimode systems that allow seamless switching between audible and silent speech, investigate different measures which quantify speaking mode differences, and present the spectral mapping algorithm, which improves the word error rate (WER) on silent speech by up to 14.3% relative. Our best average silent speech WER is 34.7%, and our best WER on audibly spoken speech is 16.8%.
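
The spectral mapping algorithm itself is not detailed in the abstract; as a rough illustration of mapping silent-speech EMG features toward the audible-speech feature space, the sketch below uses a simple per-dimension mean/variance transform. This is a deliberately simplified stand-in under assumed array shapes, not the paper's method.

```python
import numpy as np

def fit_stats_mapping(silent_feats, audible_feats):
    """Estimate a per-dimension affine transform that moves silent-speech
    EMG features toward the mean and variance of audible-speech features.
    A crude stand-in for a spectral mapping step (illustrative assumption,
    not the paper's algorithm). Inputs are (frames, dims) arrays."""
    mu_s = silent_feats.mean(axis=0)
    std_s = silent_feats.std(axis=0) + 1e-8
    mu_a = audible_feats.mean(axis=0)
    std_a = audible_feats.std(axis=0)
    scale = std_a / std_s
    offset = mu_a - scale * mu_s
    return scale, offset

def apply_stats_mapping(feats, scale, offset):
    return feats * scale + offset

# Hypothetical usage with random stand-in data.
rng = np.random.default_rng(0)
silent = rng.normal(0.0, 1.0, size=(500, 32))    # silent-speech EMG features
audible = rng.normal(0.5, 2.0, size=(500, 32))   # audible-speech EMG features
scale, offset = fit_stats_mapping(silent, audible)
mapped = apply_stats_mapping(silent, scale, offset)
```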


International Symposium on Neural Networks | 2015

Direct conversion from facial myoelectric signals to speech using Deep Neural Networks

Lorenz Diener; Matthias Janke; Tanja Schultz

This paper presents our first results using Deep Neural Networks for surface electromyographic (EMG) speech synthesis. The proposed approach enables a direct mapping from EMG signals captured from the articulatory muscle movements to the acoustic speech signal. Features are processed from multiple EMG channels and are fed into a feed-forward neural network to achieve a mapping to the target acoustic speech output. We show that this approach is able to generate speech output from the input EMG signal and compare the results to a prior mapping technique based on Gaussian mixture models. The comparison is conducted via objective Mel-Cepstral distortion scores and subjective listening test evaluations. It shows that the proposed Deep Neural Network approach gives substantial improvements for both evaluation criteria.
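
A minimal sketch of the frame-wise feed-forward mapping described above, using scikit-learn's MLPRegressor on synthetic stand-in data. The feature dimensions, network size, and variable names are assumptions, not the configuration from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-in data: stacked EMG features per frame -> mel-cepstral target frame.
n_frames, emg_dim, mcep_dim = 2000, 5 * 32, 25     # assumed dimensions
X_emg = rng.normal(size=(n_frames, emg_dim))
Y_mcep = rng.normal(size=(n_frames, mcep_dim))

# Feed-forward network mapping EMG feature frames to acoustic feature frames.
dnn = MLPRegressor(hidden_layer_sizes=(256, 256), activation="relu",
                   max_iter=200, random_state=0)
dnn.fit(X_emg, Y_mcep)

# Frame-wise conversion of unseen EMG features into mel-cepstral features,
# which a vocoder would then turn into a waveform.
Y_pred = dnn.predict(X_emg[:100])
print(Y_pred.shape)  # (100, 25)
```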


International Conference on Acoustics, Speech, and Signal Processing | 2011

Estimation of fundamental frequency from surface electromyographic data: EMG-to-F0

Keigo Nakamura; Matthias Janke; Michael Wand; Tanja Schultz

In this paper, we present our recent studies on F0 estimation from surface electromyographic (EMG) data using a Gaussian mixture model (GMM)-based voice conversion (VC) technique, referred to as EMG-to-F0. In our approach, a support vector machine classifies individual frames as unvoiced or voiced (U/V), and F0 contours for voiced frames are estimated by the trained GMM under a minimum mean-square error criterion. EMG-to-F0 is experimentally evaluated on three data sets from different speakers, each containing almost 500 utterances. Objective experiments demonstrate that we achieve a correlation coefficient of up to 0.49 between estimated and target F0 contours with more than 84% U/V decision accuracy, although the results show large variations.
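
The sketch below mirrors the two-stage structure of the approach: an SVM decides voiced/unvoiced per frame, and a GMM trained on joint (EMG feature, F0) vectors yields an F0 estimate for voiced frames. For simplicity the GMM uses diagonal covariances, so the estimate reduces to a responsibility-weighted average of the component F0 means; this is a simplification of a full MMSE GMM conversion, and all data and dimensions are stand-ins.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 3000, 20                       # frames, EMG feature dimension (assumed)
X = rng.normal(size=(n, d))           # stand-in EMG features per frame
voiced = rng.integers(0, 2, size=n)   # stand-in voiced/unvoiced labels
f0 = np.where(voiced, 120 + 20 * rng.standard_normal(n), 0.0)

# Stage 1: SVM voiced/unvoiced decision per frame.
uv_svm = SVC(kernel="rbf").fit(X, voiced)

# Stage 2: GMM on joint (EMG features, F0) vectors of voiced frames.
joint = np.hstack([X[voiced == 1], f0[voiced == 1][:, None]])
gmm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(joint)

def estimate_f0(x):
    """Responsibility-weighted F0 estimate for an EMG feature frame x.
    With diagonal covariances the per-component conditional mean is simply
    that component's F0 mean (simplifying assumption)."""
    mu_x = gmm.means_[:, :d]
    var_x = gmm.covariances_[:, :d]
    log_p = (-0.5 * (((x - mu_x) ** 2) / var_x
                     + np.log(2 * np.pi * var_x)).sum(axis=1)
             + np.log(gmm.weights_))
    resp = np.exp(log_p - log_p.max())
    resp /= resp.sum()
    return float(resp @ gmm.means_[:, d])

frame = X[0]
if uv_svm.predict(frame[None])[0] == 1:
    print("estimated F0:", estimate_f0(frame))
```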


International Conference of the IEEE Engineering in Medicine and Biology Society | 2013

Artifact removal algorithm for an EMG-based Silent Speech Interface

Michael Wand; Adam Himmelsbach; Till Heistermann; Matthias Janke; Tanja Schultz

An electromyographic (EMG) Silent Speech Interface is a system which recognizes speech by capturing the electric potentials of the human articulatory muscles, thus enabling the user to communicate silently. This study deals with improving the EMG signal quality by removing artifacts: the EMG signals are captured by electrode arrays with multiple measuring points. On the resulting high-dimensional signal, Independent Component Analysis is performed, and artifact components are automatically detected and removed. This method reduces the Word Error Rate of the silent speech recognizer by 9.9% relative on a development corpus, and by 13.9% relative on an evaluation corpus.
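
A minimal sketch of ICA-based artifact removal on multi-channel EMG using scikit-learn's FastICA. The detection criterion used here (excess kurtosis above a threshold) is an illustrative assumption; the paper's automatic detection step is not reproduced.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA

def remove_artifact_components(emg, kurt_threshold=10.0):
    """emg: (samples, channels) multi-channel EMG from an electrode array.
    Decompose into independent components, zero out components whose
    excess kurtosis exceeds a threshold (illustrative criterion, an
    assumption), and reconstruct the cleaned signal."""
    ica = FastICA(n_components=emg.shape[1], random_state=0)
    sources = ica.fit_transform(emg)                  # (samples, components)
    is_artifact = kurtosis(sources, axis=0) > kurt_threshold
    sources[:, is_artifact] = 0.0
    return ica.inverse_transform(sources), is_artifact

# Hypothetical usage with random stand-in data plus an injected spike artifact.
rng = np.random.default_rng(0)
emg = rng.normal(size=(5000, 16))
emg[2500, :] += 50.0                                  # broadband spike artifact
cleaned, flagged = remove_artifact_components(emg)
print("components flagged as artifacts:", int(flagged.sum()))
```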


International Conference on Acoustics, Speech, and Signal Processing | 2012

Further investigations on EMG-to-speech conversion

Matthias Janke; Michael Wand; Keigo Nakamura; Tanja Schultz

Our study deals with a Silent Speech Interface based on mapping surface electromyographic (EMG) signals to speech waveforms. Electromyographic signals recorded from the facial muscles capture the activity of the human articulatory apparatus and therefore make it possible to retrace speech, even when no audible signal is produced. The mapping of EMG signals to speech is done via a Gaussian mixture model (GMM)-based conversion technique. In this paper, we follow the lead of EMG-based speech-to-text systems and apply two major recent technological advances to our system: we consider session-independent systems, which are robust against electrode repositioning, and we show that mapping the EMG signal to whispered speech creates a better speech signal than a mapping to normally spoken speech. We objectively evaluate the performance of our systems using a spectral distortion measure.
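
The spectral distortion measure referred to above is commonly the Mel-Cepstral Distortion (MCD); treating it as the exact measure used in the paper is an assumption. The helper below computes a standard frame-averaged MCD between converted and reference mel-cepstral sequences.

```python
import numpy as np

def mel_cepstral_distortion(mcep_ref, mcep_conv):
    """Frame-averaged Mel-Cepstral Distortion in dB between two aligned
    mel-cepstral sequences of shape (frames, coefficients). The 0th
    (energy) coefficient is excluded, as is common practice."""
    diff = mcep_ref[:, 1:] - mcep_conv[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())

# Hypothetical usage with random stand-in data.
rng = np.random.default_rng(0)
ref = rng.normal(size=(300, 25))
conv = ref + 0.1 * rng.normal(size=(300, 25))
print(f"MCD: {mel_cepstral_distortion(ref, conv):.2f} dB")
```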


International Conference on Acoustics, Speech, and Signal Processing | 2014

Fundamental frequency generation for whisper-to-audible speech conversion

Matthias Janke; Michael Wand; Till Heistermann; Tanja Schultz; K. Prahallad

In this work, we address the issues involved in whisper-to-audible speech conversion. Spectral mapping techniques using Gaussian mixture models or Artificial Neural Networks borrowed from voice conversion have been applied to transform whisper spectral features to normally phonated audible speech. However, the modeling and generation of fundamental frequency (F0) and its contour in the converted speech is a major issue. Whispered speech does not contain explicit voicing characteristics and hence it is hard to derive a suitable F0, making it difficult to generate a natural prosody after conversion. Our work addresses the F0 modeling in whisper-to-speech conversion. We show that F0 contours can be derived from the mapped spectral vectors, which can be used for the synthesis of a speech signal. We also present a hybrid unit selection approach for whisper-to-speech conversion. Unit selection is performed on the spectral vectors, where F0 and its contour can be obtained as a byproduct without any additional modeling.
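
A rough sketch of the unit-selection idea: for each mapped spectral vector, the nearest spectral unit from a normally phonated database is selected, and the F0 stored with that unit comes along as a byproduct. This frame-wise version ignores concatenation costs, which is a simplification; all data and names are stand-ins.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Stand-in database of normally phonated speech: spectral vectors with the
# F0 value observed for each unit (frame) stored alongside.
db_spectra = rng.normal(size=(5000, 25))
db_f0 = 100.0 + 30.0 * rng.random(5000)

index = NearestNeighbors(n_neighbors=1).fit(db_spectra)

def select_units(mapped_spectra):
    """Pick the closest database unit per mapped whisper frame; its stored
    F0 provides the contour without any separate F0 model (target cost
    only, no join cost: a simplification)."""
    _, idx = index.kneighbors(mapped_spectra)
    idx = idx[:, 0]
    return db_spectra[idx], db_f0[idx]

mapped = rng.normal(size=(120, 25))          # spectra mapped from whisper
selected_spectra, f0_contour = select_units(mapped)
print(f0_contour[:5])
```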


Biomedical Engineering Systems and Technologies | 2014

Spatial Artifact Detection for Multi-channel EMG-based Speech Recognition

Till Heistermann; Matthias Janke; Michael Wand; Tanja Schultz

We introduce a spatial artifact detection method for a surface electromyography (EMG) based speech recognition system. The EMG signals are recorded using grid-shaped electrode arrays affixed to the speaker's face. Continuous speech recognition is performed on the basis of these signals. As the EMG data are high-dimensional, Independent Component Analysis (ICA) can be applied to separate artifact components from the content-bearing signal. The proposed artifact detection method classifies the ICA components by their spatial shape, which is analyzed using the spectra of the spatial patterns of the independent components. Components identified as artifacts can then be removed. Our artifact detection method reduces the word error rate (WER) of the recognizer significantly. We observe a slight advantage in terms of WER over the temporal-signal-based artifact detection method of Wand et al. (2013a).
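
The sketch below illustrates one way to classify ICA components by the spatial shape of their mixing patterns: each component's mixing column is reshaped onto the electrode grid and its 2D spatial spectrum is inspected, flagging components whose energy sits mostly at high spatial frequencies. The grid layout, spectral feature, and threshold are assumptions, not the classifier from the paper.

```python
import numpy as np
from sklearn.decomposition import FastICA

GRID = (4, 8)                 # assumed electrode-array layout (rows, cols)

def high_spatial_freq_ratio(pattern, grid=GRID):
    """Share of spatial-spectrum energy outside the lowest spatial
    frequencies for one ICA mixing pattern (one column of the mixing
    matrix) reshaped onto the electrode grid."""
    spec = np.abs(np.fft.fft2(pattern.reshape(grid)))
    total = spec.sum()
    low = spec[:2, :2].sum()          # energy in the lowest frequency bins
    return float((total - low) / total)

rng = np.random.default_rng(0)
emg = rng.normal(size=(5000, GRID[0] * GRID[1]))    # stand-in array EMG
ica = FastICA(n_components=emg.shape[1], random_state=0)
sources = ica.fit_transform(emg)

# Flag components whose spatial pattern looks "noisy" across the grid
# (illustrative threshold, an assumption).
ratios = np.array([high_spatial_freq_ratio(col) for col in ica.mixing_.T])
artifact_components = np.where(ratios > 0.9)[0]
print("flagged components:", artifact_components)
```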


International Conference on Acoustics, Speech, and Signal Processing | 2014

Compensation of Recording Position Shifts for a Myoelectric Silent Speech Recognizer

Michael Wand; Christopher Schulte; Matthias Janke; Tanja Schultz

A myoelectric Silent Speech Recognizer is a system which recognizes speech by capturing the electrical activity of the human articulatory muscles, thus enabling the user to communicate silently. We recently devised a recording setup based on electrode arrays with multiple measuring points. In this study we show that this makes it possible to compensate for shifts of the recording position, which occur when the array is removed and reattached between system training and application. We present a method which determines the amount of recording position shift; compensation is performed by linear interpolation. We evaluate our method by running recognition experiments across recording sessions and obtain a Word Error Rate improvement of 14.3% relative on the development set and 12.9% relative on the evaluation set, compared to using classical session adaptation.
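
A minimal sketch of the compensation step, assuming a one-dimensional electrode array and an already estimated shift expressed in inter-electrode distances: the test-session channels are resampled by linear interpolation so they line up with the training-session electrode positions. The shift-estimation method itself is not shown, and all names are assumptions.

```python
import numpy as np

def compensate_shift(emg, shift):
    """emg: (samples, channels) from a linear electrode array.
    shift: estimated recording-position shift in units of the
    inter-electrode distance (e.g. 0.4 means the array sits 0.4
    electrode spacings away from its training-session position).
    Returns channels linearly interpolated back onto the original
    electrode positions (edge channels are clamped)."""
    n_ch = emg.shape[1]
    positions = np.arange(n_ch, dtype=float)   # training-session positions
    shifted = positions + shift                # where the data was recorded
    out = np.empty_like(emg)
    for t in range(emg.shape[0]):
        out[t] = np.interp(positions, shifted, emg[t])
    return out

# Hypothetical usage with random stand-in data.
rng = np.random.default_rng(0)
test_session = rng.normal(size=(1000, 8))
aligned = compensate_shift(test_session, shift=0.4)
```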


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014

Enhancement of EMG-based Thai number words classification using frame-based time domain features with stacking filter

Niyawadee Srisuwan; Michael Wand; Matthias Janke; Pornchai Phukpattaranont; Tanja Schultz; Chusak Limsakul

To overcome problems inherent in classical automatic speech recognition (e.g., ambient noise and loss of privacy), electromyography (EMG) signals from the speech production muscles are used in place of the acoustic speech signal. We investigate EMG-based speech recognition for the Thai language. In earlier work, we used five EMG channels from the facial and neck muscles to classify 11 Thai number words with a neural network classifier, employing 15 time-domain and frequency-domain features and obtaining average accuracy rates of 89.45% for audible speech and 78.55% for silent speech. This paper proposes to improve the accuracy of EMG-based Thai number word classification. Ten subjects uttered the 11 words in both audible and silent speech while five channels of EMG signals were recorded. Frame-based time-domain features with a stacking filter were used for feature extraction, LDA was then applied to reduce the dimensionality of the feature vector, and a Hidden Markov Model (HMM) was employed for classification. The results show that this combination of feature extraction, dimensionality reduction, and classification improves the average accuracy rate by 3% absolute for audible speech compared to the earlier work. We achieve average classification rates of 92.45% for audible and 75.73% for silent speech.
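
The sketch below shows the frame-stacking and LDA steps described above: frame-based time-domain features are concatenated with their neighboring frames (the stacking filter) and then projected to a lower-dimensional space with LDA; the HMM stage is only indicated in a comment. Frame counts, context width, and dimensions are assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def stack_frames(feats, context=5):
    """Stacking filter: concatenate each frame with its +/- `context`
    neighbors. feats: (frames, dims) -> (frames, dims * (2*context+1)).
    Edges are padded by repeating the first/last frame."""
    padded = np.vstack([feats[:1].repeat(context, axis=0),
                        feats,
                        feats[-1:].repeat(context, axis=0)])
    return np.hstack([padded[i:i + len(feats)]
                      for i in range(2 * context + 1)])

rng = np.random.default_rng(0)
n_frames, dim = 4000, 25                     # assumed: 5 channels x 5 TD features
feats = rng.normal(size=(n_frames, dim))
word_labels = rng.integers(0, 11, size=n_frames)   # stand-in labels, 11 words

stacked = stack_frames(feats, context=5)     # (4000, 275)
lda = LinearDiscriminantAnalysis(n_components=10).fit(stacked, word_labels)
reduced = lda.transform(stacked)             # (4000, 10)

# The reduced feature sequences would then be modeled per word with HMMs
# (e.g. one Gaussian HMM per word, scoring test utterances by likelihood).
print(reduced.shape)
```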


Biomedical Engineering Systems and Technologies | 2013

Application of Electrode Arrays for Artifact Removal in an Electromyographic Silent Speech Interface

Michael Wand; Matthias Janke; Till Heistermann; Christopher Schulte; Adam Himmelsbach; Tanja Schultz

An electromyographic (EMG) Silent Speech Interface is a system which recognizes speech by capturing the electric potentials of the human articulatory muscles, thus enabling the user to communicate silently. This study deals with the introduction of multi-channel electrode arrays to the EMG recording system, which requires careful handling of the resulting high-dimensional data. As a first application of the technology, Independent Component Analysis (ICA) is applied for automated artifact detection and removal. Without the artifact removal component, the system achieves optimal average Word Error Rates of 40.1% for 40 training sentences and 10.9% for 160 training sentences on EMG signals of audible speech. On a subset of the corpus, we evaluate the ICA artifact removal method, improving the Word Error Rate by 10.7% relative.
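
As a small aside on how the relative improvements quoted throughout these abstracts are computed, the helper below converts a baseline WER and an improved WER into a relative reduction; the numbers plugged in are purely illustrative and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, improved_wer):
    """Relative Word Error Rate reduction in percent."""
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer

# Illustrative numbers only (not results from the paper):
print(f"{relative_wer_reduction(40.0, 35.72):.1f}% relative")   # -> 10.7% relative
```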

Collaboration


Matthias Janke's top co-authors and their affiliations.

Till Heistermann, Karlsruhe Institute of Technology
Christopher Schulte, Karlsruhe Institute of Technology
Keigo Nakamura, Nara Institute of Science and Technology
Adam Himmelsbach, Karlsruhe Institute of Technology
Christian Herff, Karlsruhe Institute of Technology
Christoph Amma, Karlsruhe Institute of Technology
Dirk Gehrig, Karlsruhe Institute of Technology