
Publication


Featured research published by Levent M. Arslan.


Speech Communication | 1999

Speaker transformation algorithm using segmental codebooks (STASC)

Levent M. Arslan

This paper presents a new voice conversion algorithm which modifies the utterance of a source speaker to sound like speech from a target speaker. We refer to the method as Speaker Transformation Algorithm using Segmental Codebooks (STASC). A novel method is proposed which finds accurate alignments between source and target speaker utterances. Using the alignments, source speaker acoustic characteristics are mapped to target speaker acoustic characteristics. The acoustic parameters included in the mapping are vocal tract, excitation, intonation, energy, and duration characteristics. Informal listening tests suggest that convincing voice conversion is achieved while maintaining high speech quality. The performance of the proposed system is also evaluated on a simple Gaussian mixture model-based speaker identification system, and the results show that the transformed speech is assigned higher likelihood by the target speaker model when compared to the source speaker model.
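The codebook-mapping idea at the heart of STASC can be sketched as a weighted combination of target-speaker codebook entries, with weights derived from the source frame's similarity to the source-speaker codebook. The following is a minimal illustration, not the paper's implementation; the softmax-style weighting and the toy centroids are assumptions made for the example.

```python
import numpy as np

def codebook_convert(src_frame, src_centroids, tgt_centroids, alpha=1.0):
    """Map one source-speaker feature frame toward the target speaker
    as a weighted sum of target codebook centroids; weights come from
    the frame's (softmax-normalized) similarity to the source centroids."""
    d = np.linalg.norm(src_centroids - src_frame, axis=1)  # distance to each source centroid
    w = np.exp(-alpha * d)
    w /= w.sum()                                           # normalized similarity weights
    return w @ tgt_centroids                               # weighted target-space estimate

# Toy codebooks: two centroids in a 3-dimensional feature space.
src_cb = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
tgt_cb = np.array([[0.2, 0.2, 0.2], [1.4, 1.4, 1.4]])
converted = codebook_convert(np.array([0.0, 0.0, 0.0]), src_cb, tgt_cb)
```

Because the source frame sits on the first source centroid, the converted frame lands close to the first target centroid, which is the intended mapping behavior.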


Speech Communication | 1996

Language accent classification in American English

Levent M. Arslan; John H. L. Hansen

It is well known that speaker variability caused by accent is one factor that degrades performance of speech recognition algorithms. If knowledge of speaker accent can be estimated accurately, then a modified set of recognition models which addresses speaker accent could be employed to increase recognition accuracy. In this study, the problem of language accent classification in American English is considered. A database of foreign language accent is established that consists of words and phrases that are known to be sensitive to accent. Next, isolated word and phoneme based accent classification algorithms are developed. The feature set under consideration includes Mel-cepstrum coefficients and energy, and their first order differences. It is shown that as test utterance length increases, higher classification accuracy is achieved. Isolated word strings of 7–8 words uttered by the speaker result in an accent classification rate of 93% among four different language accents. A subjective listening test is also conducted in order to compare human performance with computer algorithm performance in accent discrimination. The results show that computer based accent classification consistently achieves superior performance over human listener responses for classification. It is shown, however, that some listeners are able to match algorithm performance for accent detection. Finally, an experimental study is performed to investigate the influence of foreign accent on speech recognition algorithms. It is shown that training separate models for each accent rather than using a single model for each word can improve recognition accuracy dramatically.
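The feature set above (Mel-cepstrum coefficients and energy plus their first-order differences) depends on delta features computed over a short window. A common regression-style formulation is sketched below for illustration; the window width and edge padding are assumptions, not details from the paper.

```python
import numpy as np

def first_order_deltas(features, width=2):
    """Regression-style first-order differences over +/- `width` frames,
    as commonly appended to cepstrum-plus-energy feature vectors.
    `features` is a (frames x dims) array."""
    T = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    num = np.zeros_like(features)
    for k in range(1, width + 1):
        # weighted difference of frames k steps ahead and behind
        num += k * (padded[width + k:width + k + T] - padded[width - k:width - k + T])
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    return num / denom
```

For a linearly increasing feature trajectory, the interior delta values recover the per-frame slope, which is the sanity check one would expect from a first-order difference.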


international conference on acoustics, speech, and signal processing | 1995

Foreign accent classification using source generator based prosodic features

John H. L. Hansen; Levent M. Arslan

Speaker accent is an important issue in the formulation of robust speaker independent recognition systems. Knowledge gained from a reliable accent classification approach could improve overall recognition performance. In this paper, a new algorithm is proposed for foreign accent classification of American English. A series of experimental studies are considered which focus on establishing how speech production is varied to convey accent. The proposed method uses a source generator framework, recently proposed for analysis and recognition of speech under stress [5]. An accent sensitive database is established using speakers of American English with foreign language accents. An initial version of the classification algorithm classified speaker accent from among four different accents with an accuracy of 81.5% in the case of unknown text, and 88.9% assuming known text. Finally, it is shown that as accent sensitive word count increases, the ability to correctly classify accent also increases, achieving an overall classification rate of 92% among four accent classes.


international conference on acoustics, speech, and signal processing | 1995

New methods for adaptive noise suppression

Levent M. Arslan; Alan V. McCree; Vishu R. Viswanathan

We propose three new adaptive noise suppression algorithms for enhancing noise-corrupted speech: smoothed spectral subtraction (SSS), vector quantization of line spectral frequencies (VQ-LSF), and modified Wiener filtering (MWF). SSS is an improved version of the well-known spectral subtraction algorithm, while the other two methods are based on generalized Wiener filtering. We have compared these three algorithms with each other and with spectral subtraction on both simulated noise and actual car noise. All three proposed methods perform substantially better than spectral subtraction, primarily because of the absence of any musical noise artifacts in the processed speech. Listening tests showed preference for MWF and SSS over VQ-LSF. Also, MWF provides a much higher mean opinion score (MOS) than does spectral subtraction. Finally, VQ-LSF provides a relatively good spectral match to the clean speech, and may, therefore, be better suited for speech recognition.
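The baselines these three methods build on, plain spectral subtraction and Wiener filtering, can be sketched as follows. The over-subtraction factor and spectral floor shown here are illustrative assumptions; the actual SSS and MWF methods add cross-frame smoothing and other refinements not reproduced in this sketch.

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, oversub=1.5, floor=0.02):
    """Magnitude-domain spectral subtraction with over-subtraction and
    a spectral floor; flooring limits the musical-noise artifacts that
    smoothed spectral subtraction further suppresses."""
    est = noisy_mag - oversub * noise_mag
    return np.maximum(est, floor * noisy_mag)

def wiener_gain(noisy_power, noise_power):
    """Frequency-domain Wiener filter gain derived from the estimated
    a priori SNR in each frequency bin."""
    snr = np.maximum(noisy_power / np.maximum(noise_power, 1e-12) - 1.0, 0.0)
    return snr / (snr + 1.0)
```

In bins where the noise estimate exceeds the noisy magnitude, the floor keeps the output strictly positive instead of clipping to zero, which is the main defense against musical noise in this family of methods.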


Journal of the Acoustical Society of America | 1997

A study of temporal features and frequency characteristics in American English foreign accent

Levent M. Arslan; John H. L. Hansen

In this paper, a detailed acoustic study of foreign accent is presented using temporal features, intonation patterns, and frequency characteristics in American English. Using a database which consists of words uttered in isolation, temporal features such as voice onset time, word-final stop closure duration, and characteristics of duration are investigated. Accent differences for native-produced versus Mandarin, German, and Turkish accented English utterances are analyzed. Of the dimensions considered, the most important accent relayer is found to be word-final stop closure duration. Mandarin accented English utterances show significant differences in terms of this feature when compared to native speaker utterances. In addition, the intonation characteristics across a set of foreign accents in American English are investigated. It is shown that Mandarin speaker utterances possess a larger negative continuative intonation slope than native speaker utterances, and German speaker utterances had a more positive intonation slope when compared to native speaker utterances. Finally, a detailed frequency analysis of foreign accented speech is conducted. It is shown that the midfrequency range (1500–2500 Hz) is the most sensitive frequency band to non-native speaker pronunciation variations. Based on this knowledge a new frequency scale for the calculation of cepstrum coefficients is formulated which is shown to outperform the Mel-scale in terms of its ability to classify accent automatically among four accent classes.
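The continuative intonation slope comparison above amounts to fitting a line to the f0 contour of an utterance. A simple least-squares version is sketched below; the frame rate and the synthetic contour are assumptions for the example, not the paper's measurement procedure.

```python
import numpy as np

def intonation_slope(f0, frame_rate=100.0):
    """Least-squares slope (Hz/s) of an f0 contour, a crude proxy for
    a continuative intonation slope measure. `f0` is one value per
    analysis frame; `frame_rate` is frames per second."""
    t = np.arange(len(f0)) / frame_rate
    return np.polyfit(t, f0, 1)[0]
```

A contour falling from 200 Hz to 100 Hz over one second yields a slope near -100 Hz/s; a more negative slope for one speaker group than another is the kind of contrast the study reports.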


Computer Speech & Language | 2006

Robust processing techniques for voice conversion

Oytun Türk; Levent M. Arslan

Differences in speaker characteristics, recording conditions, and signal processing algorithms affect output quality in voice conversion systems. This study focuses on formulating robust techniques for a codebook mapping based voice conversion algorithm. Three different methods are used to improve voice conversion performance: confidence measures, pre-emphasis, and spectral equalization. Analysis is performed for each method and the implementation details are discussed. The first method employs confidence measures in the training stage to eliminate problematic pairs of source and target speech units that might result from possible misalignments, speaking style differences or pronunciation variations. Four confidence measures are developed based on the spectral distance, fundamental frequency (f0) distance, energy distance, and duration distance between the source and target speech units. The second method focuses on the importance of pre-emphasis in line-spectral frequency (LSF) based vocal tract modeling and transformation. The last method, spectral equalization, is aimed at reducing the differences in the source and target long-term spectra when the source and target recording conditions are significantly different. The voice conversion algorithm that employs the proposed techniques is compared with the baseline voice conversion algorithm with objective tests as well as three subjective listening tests. First, similarity to the target voice is evaluated in a subjective listening test and it is shown that the proposed algorithm improves similarity to the target voice by 23.0%. An ABX test is performed and the proposed algorithm is preferred over the baseline algorithm by 76.4%. In the third test, the two algorithms are compared in terms of the subjective quality of the voice conversion output. The proposed algorithm improves the subjective output quality by 46.8% in terms of mean opinion score (MOS).
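The confidence-measure idea, pruning source/target unit pairs that look misaligned, can be sketched as a simple distance-based outlier rule. The mean-plus-n-standard-deviations threshold used here is an illustrative assumption, not the paper's exact criterion, which combines four separate distance measures.

```python
import numpy as np

def prune_pairs(src_feats, tgt_feats, n_std=2.0):
    """Drop aligned source/target frame pairs whose feature distance is
    an outlier, as a proxy for misalignment, speaking-style difference,
    or pronunciation mismatch. Inputs are (frames x dims) arrays."""
    d = np.linalg.norm(src_feats - tgt_feats, axis=1)
    keep = d <= d.mean() + n_std * d.std()   # confidence test per pair
    return src_feats[keep], tgt_feats[keep]
```

A single grossly mismatched pair among otherwise well-aligned frames is removed, so it cannot distort the learned mapping.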


IEEE Transactions on Speech and Audio Processing | 1999

Selective training for hidden Markov models with applications to speech classification

Levent M. Arslan; John H. L. Hansen

Traditional maximum likelihood estimation of hidden Markov model parameters aims at maximizing the overall probability across the training tokens of a given speech unit. As such, it disregards any interaction or biases across the models in the training procedure. Often, the resulting model parameters do not result in minimum error classification in the training set. A new selective training method is proposed that controls the influence of outliers in the training data on the generated models. The resulting models are shown to possess feature statistics which are more clearly separated for confusable patterns. The proposed selective training procedure is used for hidden Markov model training, with application to foreign accent classification, language identification, and speech recognition using the E-set alphabet. The resulting error rates are measurably improved over traditional forward-backward training under open test conditions. The proposed method is similar in terms of its goal to maximum mutual information estimation training; however, it requires less computation, and the convergence properties of maximum likelihood estimation are retained in the new formulation.
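The selective-training idea of limiting the influence of outlier tokens can be caricatured as hard selection by likelihood percentile. The actual method controls influence inside HMM re-estimation; this sketch, including the percentile cutoff, is an assumption made for illustration only.

```python
import numpy as np

def token_weights(log_likelihoods, pct=10.0):
    """Hard token selection: training tokens whose log-likelihood under
    the current model falls below the `pct` percentile are treated as
    outliers and excluded (weight 0); the rest keep full weight."""
    thr = np.percentile(log_likelihoods, pct)
    return (log_likelihoods >= thr).astype(float)
```

A token with a drastically lower likelihood than its peers receives zero weight, so re-estimation is driven by the well-matched tokens rather than the outlier.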


IEEE Transactions on Speech and Audio Processing | 1995

Robust feature-estimation and objective quality assessment for noisy speech recognition using the Credit Card corpus

John H. L. Hansen; Levent M. Arslan

The introduction of acoustic background distortion into speech causes recognition algorithms to fail. In order to improve the environmental robustness of speech recognition in adverse conditions, a novel constrained-iterative feature-estimation algorithm is considered and shown to produce improved feature characterization in a variety of actual noise conditions. In addition, an objective measure based MAP estimator is formulated as a means of predicting changes in robust recognition performance at the speech feature extraction stage. The four measures considered include (i) NIST SNR; (ii) Itakura-Saito log-likelihood; (iii) log-area-ratio; (iv) the weighted-spectral slope measure. A continuous distribution, monophone based, hidden Markov model recognition algorithm is used for objective measure based MAP estimator analysis and recognition evaluation. Evaluations were based on speech data from the Credit Card corpus (CC-DATA). It is shown that feature enhancement provides a consistent level of recognition improvement for broadband and low-frequency colored noise sources. As the stationarity assumption for a given noise source breaks down, the ability of feature enhancement to improve recognition performance decreases. Finally, the log-likelihood based MAP estimator was found to be the best predictor of recognition performance, while the NIST SNR based MAP estimator was found to be the poorest predictor across the 27 noise conditions considered.
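Of the objective measures listed, the Itakura-Saito log-likelihood measure is the easiest to sketch as a divergence between power spectra. This simplified bin-wise form is an illustration only; in practice the measure is usually computed from LPC model spectra, which this sketch omits.

```python
import numpy as np

def itakura_saito(p, q):
    """Itakura-Saito divergence between power spectra p and q:
    zero when the spectra match, growing with spectral mismatch,
    and asymmetric in its two arguments."""
    r = p / q
    return float(np.mean(r - np.log(r) - 1.0))
```

Identical spectra score exactly zero, and any mismatch produces a positive value, which is what makes the measure usable as a distortion predictor.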


international conference on acoustics, speech, and signal processing | 1997

Frequency characteristics of foreign accented speech

Levent M. Arslan; John H. L. Hansen

In this study, the frequency characteristics of foreign accented speech are investigated. Experiments are conducted to discover the relative significance of different resonant frequencies and frequency bands in terms of their accent discrimination ability. It is shown that second and third formants are more important than other resonant frequencies. A filter bank analysis of accented speech supports this statement, where the 1500-2500 Hz range was shown to be the most significant frequency range in discriminating accented speech. Based on these results, a new frequency scale is proposed in place of the commonly used Mel-scale to extract the cepstrum coefficients from the speech signal. The proposed scale results in better performance for the problems of accent classification and language identification.
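For reference, the standard Mel-scale that the proposed accent-sensitive scale replaces warps frequency as shown below; the paper's alternative scale itself is not reproduced here.

```python
import numpy as np

def hz_to_mel(f):
    """Standard Mel-scale warping: mel = 2595 * log10(1 + f/700)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse Mel-scale warping."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

The scale is constructed so that 1000 Hz maps to roughly 1000 mel, with compression above that; a scale tuned for accent discrimination would instead allocate more resolution to the 1500-2500 Hz band identified above.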


IEEE Transactions on Speech and Audio Processing | 1995

Markov model-based phoneme class partitioning for improved constrained iterative speech enhancement

John H. L. Hansen; Levent M. Arslan

Research has shown that degrading acoustic background noise influences speech quality across phoneme classes in a nonuniform manner. This results in variable quality performance of many speech enhancement algorithms in noisy environments. A phoneme classification procedure is proposed which directs single-channel constrained speech enhancement. The procedure performs broad phoneme class partitioning of noisy speech frames using a continuous mixture hidden Markov model recognizer in conjunction with a perceptually motivated cost-based decision process. Once noisy speech frames are identified, iterative speech enhancement based on all-pole parameter estimation with inter- and intra-frame spectral constraints is employed. The phoneme class-directed enhancement algorithm is evaluated using TIMIT speech data and shown to result in substantial improvement in objective speech quality over a range of signal-to-noise ratios and individual phoneme classes.

Collaboration


Dive into Levent M. Arslan's collaborations.

Top Co-Authors
John H. L. Hansen

University of Texas at Dallas

Mustafa Suphi Erden

École Polytechnique Fédérale de Lausanne
