M. A. Tugtekin Turan
Koç University
Publication
Featured research published by M. A. Tugtekin Turan.
Speech Communication | 2013
Can Yağlı; M. A. Tugtekin Turan; Engin Erzin
In this paper, we propose a hidden Markov model (HMM)-based wideband spectral envelope estimation method for the artificial bandwidth extension problem. The proposed HMM-based estimator decodes an optimal Viterbi path based on the temporal contour of the narrowband spectral envelope and then performs the minimum mean square error (MMSE) estimation of the wideband spectral envelope on this path. Experimental evaluations are performed to compare the proposed estimator to the state-of-the-art HMM and Gaussian mixture model based estimators using both objective and subjective evaluations. Objective evaluations are performed with the log-spectral distortion (LSD) and the wideband perceptual evaluation of speech quality (PESQ) metrics. Subjective evaluations are performed with the A/B pair comparison listening test. Both objective and subjective evaluations show that the proposed wideband spectral envelope estimator consistently improves performance over the state-of-the-art estimators.
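As an illustration of the objective metric named in this abstract, the following is a minimal sketch of one common definition of log-spectral distortion: the root-mean-square difference, in dB, between the log power spectra of a reference frame and an estimated frame, averaged over frames. The function name, the `eps` floor, and the list-of-frames input layout are assumptions made for this example; the paper's exact definition (frequency range, weighting) may differ.

```python
import math


def log_spectral_distortion(ref_frames, est_frames, eps=1e-12):
    """Mean per-frame LSD in dB between two sequences of power spectra.

    ref_frames, est_frames: lists of frames, each frame a list of
    non-negative power values on the same frequency grid (assumed layout).
    eps: small floor that avoids log(0); an assumption of this sketch.
    """
    per_frame = []
    for ref, est in zip(ref_frames, est_frames):
        # Squared difference of the log power spectra, bin by bin.
        sq_diffs = [
            (10.0 * math.log10((r + eps) / (e + eps))) ** 2
            for r, e in zip(ref, est)
        ]
        # Root-mean-square over frequency bins gives the LSD of this frame.
        per_frame.append(math.sqrt(sum(sq_diffs) / len(sq_diffs)))
    # Average the per-frame distortions over the utterance.
    return sum(per_frame) / len(per_frame)
```

A lower value indicates that the estimated envelope is closer to the reference; identical spectra give a distortion of zero.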
IEEE Transactions on Audio, Speech, and Language Processing | 2016
M. A. Tugtekin Turan; Engin Erzin
In this paper, we propose a new statistical enhancement system for throat microphone recordings through source and filter separation. Throat microphones (TM) are skin-attached piezoelectric sensors that can capture speech sound signals in the form of tissue vibrations. Due to their limited bandwidth, TM-recorded speech suffers from reduced intelligibility and naturalness. In this paper, we investigate learning phone-dependent Gaussian mixture model (GMM)-based statistical mappings using parallel recordings of acoustic microphone (AM) and TM for enhancement of the spectral envelope and excitation signals of the TM speech. The proposed mappings address the phone-dependent variability of tissue conduction with TM recordings. While the spectral envelope mapping estimates the line spectral frequency (LSF) representation of AM from TM recordings, the excitation mapping is constructed based on the spectral energy difference (SED) of AM and TM excitation signals. The excitation enhancement is modeled as an estimation of the SED features from the TM signal. The proposed enhancement system is evaluated using both objective and subjective tests. Objective evaluations are performed with the log-spectral distortion (LSD), the wideband perceptual evaluation of speech quality (PESQ) and mean-squared error (MSE) metrics. Subjective evaluations are performed with an A/B comparison test. Experimental results indicate that the proposed phone-dependent mappings exhibit enhancements over phone-independent mappings. Furthermore, enhancement of the TM excitation through statistical mappings of the SED features introduces significant objective and subjective performance improvements to the enhancement of TM recordings.
International Conference on Acoustics, Speech, and Signal Processing | 2016
Johannes Abel; Magdalena Kaniewska; Cyril Guillaume; Wouter Tirry; Hannu Pulakka; Ville Myllylä; Jari Sjoberg; Paavo Alku; Itai Katsir; David Malah; Israel Cohen; M. A. Tugtekin Turan; Engin Erzin; Thomas Schlien; Peter Vary; Amr H. Nour-Eldin; Peter Kabal; Tim Fingscheidt
In studies on artificial bandwidth extension (ABE), there is a lack of international coordination in subjective tests between multiple methods and languages. Here we present the design of absolute category rating listening tests evaluating 12 ABE variants of six approaches in multiple languages, namely in American English, Chinese, German, and Korean. Since the number of ABE variants caused a higher-than-recommended length of the listening test, ABE variants were distributed into two separate listening tests per language. The paper focuses on the listening test design, which aimed at merging the subjective scores of both tests and thus allows for a joint analysis of all ABE variants under test at once. A language-dependent analysis, evaluating ABE variants in the context of the underlying coded narrowband speech condition, showed statistically significant improvement in English, German, and Korean for some ABE solutions.
International Conference on Acoustics, Speech, and Signal Processing | 2013
M. A. Tugtekin Turan; Engin Erzin
We investigate the spectral envelope mapping problem with joint analysis of throat- and acoustic-microphone recordings to enhance throat-microphone speech. A new phone-dependent GMM-based spectral envelope mapping scheme, which performs the minimum mean square error (MMSE) estimation of the acoustic-microphone spectral envelope, has been proposed. Experimental evaluations are performed to compare the proposed mapping scheme to the state-of-the-art GMM-based estimator using both objective and subjective evaluations. Objective evaluations are performed with the log-spectral distortion (LSD) and the wideband perceptual evaluation of speech quality (PESQ) metrics. Subjective evaluations are performed with the A/B pair comparison listening test. Both objective and subjective evaluations show that the proposed phone-dependent mapping consistently improves performance over the state-of-the-art GMM estimator.
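To make the GMM-based MMSE estimation mentioned in this abstract concrete, here is a minimal sketch for the scalar case. It assumes a joint Gaussian mixture over a source feature x (for example, from the throat microphone) and a target feature y (from the acoustic microphone); the MMSE estimate is the posterior-weighted sum of the per-component conditional means. The function names, the dictionary keys, and the scalar simplification are all assumptions of this example and are not taken from the paper, which works with vector spectral envelope features. A phone-dependent variant would simply keep one such component list per phone class and pick the list that matches the current phone label.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_mmse_estimate(x, components):
    """MMSE estimate E[y | x] under a joint GMM over scalar (x, y).

    components: list of dicts with hypothetical keys
      w   : mixture weight
      mx  : mean of x,  my : mean of y
      vxx : variance of x,  cxy : covariance between x and y
    """
    # Unnormalised posterior of each component given the observed x.
    likes = [c["w"] * gaussian_pdf(x, c["mx"], c["vxx"]) for c in components]
    total = sum(likes)
    estimate = 0.0
    for like, c in zip(likes, components):
        posterior = like / total
        # Conditional mean of y given x for a jointly Gaussian pair.
        cond_mean = c["my"] + c["cxy"] / c["vxx"] * (x - c["mx"])
        estimate += posterior * cond_mean
    return estimate
```

With a single component the estimate reduces to ordinary linear regression of y on x; with several components the mapping becomes piecewise smooth, which is what lets it follow class-dependent behaviour.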
Signal Processing and Communications Applications Conference | 2016
M. A. Tugtekin Turan; Engin Erzin
Swallowing is one of the two fundamental elements of the food intake mechanism. Classification of different swallowing patterns is an important part of nutrient activity analysis. This paper is a preliminary study of ingestion monitoring. We observe that throat microphone recordings can reveal certain characteristics of different swallowing types during the food intake process. To evaluate the performance of the proposed classifiers, we recorded swallowing sounds from six different classes and extracted features through time-frequency analysis, which yields between 60% and 80% accuracy. Experimental results are encouraging for automatic detection of swallowing events in future studies.
Proceedings of the 2nd International Workshop on Multimedia for Personal Health and Health Care | 2017
M. A. Tugtekin Turan; Engin Erzin
Wearable sensor systems can deliver promising solutions to automatic monitoring of ingestive behavior. This study presents an on-body sensor system and related signal processing techniques to classify different types of food intake sounds. A piezoelectric throat microphone is used to capture food consumption sounds from the neck. The recorded signals are first segmented and decomposed using the empirical mode decomposition (EMD) analysis. EMD has been a widely implemented tool to analyze non-stationary and non-linear signals by decomposing data into a series of sub-band oscillations known as intrinsic mode functions (IMFs). For each decomposed IMF signal, time and frequency domain features are then computed to provide a multi-resolution representation of the signal. The minimum redundancy maximum relevance (mRMR) principle is utilized to investigate the most representative features for the food intake classification task, which is carried out using support vector machines. Experimental evaluations over selected groups of features and EMD achieve significant performance improvements compared to the baseline classification system without EMD.
Signal Processing and Communications Applications Conference | 2015
M. A. Tugtekin Turan; Engin Erzin
In this paper, a new approach that extends narrowband excitation signals to synthesize wideband speech is proposed. The bandwidth extension problem is analyzed using a source-filter separation framework where a speech signal is decomposed into two independent components. For spectral envelope extension, our earlier work based on a hidden Markov model is used. For excitation signal extension, the proposed method moves the spectrum based on correlation analysis where the distance between the harmonics and the structure of the excitation signal are preserved in high-bands. In experimental studies, we also apply two other well-known excitation extension techniques for comparison and evaluate the overall performance of the proposed system using the PESQ metric. Our findings indicate that the proposed extension method outperforms the other two techniques.
Signal Processing and Communications Applications Conference | 2014
M. A. Tugtekin Turan; Engin Erzin
In this analysis paper, we investigate the effect of phonetic clustering based on place and manner of articulation for the enhancement of throat-microphone speech through spectral envelope mapping. Place of articulation (PoA) and manner of articulation (MoA) dependent GMM-based spectral envelope mapping schemes have been investigated using the reflection coefficient representation of the linear prediction model. Reflection coefficients are expected to localize mapping performance within the concatenation of lossless tubes model of the vocal tract. In experimental studies, we evaluate spectral mapping performance within clusters of the PoA and MoA using the log-spectral distortion (LSD) and as a function of reflection coefficient mapping using the mean-square error distance. Our findings indicate that the highest degradations after the spectral mapping occur with stops and liquids of the MoA, and velar and alveolar classes of the PoA. The MoA classification attains higher improvements than the PoA classification.
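The reflection coefficient representation of the linear prediction model referred to in this abstract can be obtained from a frame's autocorrelation sequence with the Levinson-Durbin recursion. The sketch below is a generic textbook version, not the authors' implementation; the function name and the sign convention (prediction polynomial A(z) = 1 + a1 z^-1 + ..., with k_i equal to the last coefficient at order i) are assumptions of this example, and other texts use the opposite sign.

```python
def reflection_coefficients(autocorr, order):
    """Levinson-Durbin recursion on an autocorrelation sequence.

    autocorr: list r[0..order] of autocorrelation values, with r[0] > 0.
    Returns (ks, err): the reflection coefficients k_1..k_order and the
    final prediction error energy.
    """
    a = [1.0] + [0.0] * order  # coefficients of A(z), a[0] is fixed to 1
    err = autocorr[0]          # zeroth-order prediction error energy
    ks = []
    for i in range(1, order + 1):
        # Correlation between the current residual and the lag-i sample.
        acc = autocorr[i] + sum(a[j] * autocorr[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        # Order update of the predictor coefficients.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        # The error energy shrinks by (1 - k^2) at each order.
        err *= 1.0 - k * k
    return ks, err
```

For a valid autocorrelation sequence every coefficient has magnitude below one, and in the lossless tube interpretation each coefficient relates to the area ratio of two adjacent tube sections.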
Conference of the International Speech Communication Association | 2015
M. A. Tugtekin Turan; Engin Erzin
Conference of the International Speech Communication Association | 2013
M. A. Tugtekin Turan; Engin Erzin