Publication


Featured research published by Ausdang Thangthai.


IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Accent level adjustment in bilingual Thai-English text-to-speech synthesis

Chai Wutiwiwatchai; Ausdang Thangthai; Ananlada Chotimongkol; Chatchawarn Hansakunbuntheung; Nattanun Thatphithakkul

This paper introduces an accent level adjustment mechanism for Thai-English text-to-speech synthesis (TTS). English words, which often appear in modern Thai writing, can be synthesized either by a Thai TTS using corresponding Thai phones or by a separate English TTS using English phones. As many native Thai listeners may not prefer either of these extreme accent styles, a mechanism that allows selecting an accent level preference is proposed. In HMM-based TTS, the accent level is adjusted by interpolating the HMMs of purely Thai and purely English sounds. Solutions for cross-language phone alignment and HMM state mapping are addressed. Evaluations are performed with a listening test on sounds synthesized at varied accent levels. Experimental results show that the proposed method is acceptable to the majority of human listeners.
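The interpolation step can be illustrated with a minimal sketch (not the authors' implementation): given the Gaussian output parameters of two mapped HMM states, one from the purely Thai model and one from the purely English model, an intermediate accent level is a weighted combination of their means and variances. The state pairing and parameter values below are hypothetical.

```python
import numpy as np

def interpolate_state(mean_th, var_th, mean_en, var_en, alpha):
    """Blend the Gaussian output distributions of two mapped HMM states.
    alpha = 0.0 reproduces the Thai model, alpha = 1.0 the English model;
    intermediate values give intermediate accent levels."""
    mean = (1.0 - alpha) * mean_th + alpha * mean_en
    var = (1.0 - alpha) * var_th + alpha * var_en
    return mean, var

# Hypothetical 2-dimensional spectral parameters for one mapped state pair.
mean_th, var_th = np.array([1.0, 2.0]), np.array([0.5, 0.5])
mean_en, var_en = np.array([3.0, 4.0]), np.array([1.0, 1.5])

# Accent level 0.5: halfway between the Thai and English models.
mean, var = interpolate_state(mean_th, var_th, mean_en, var_en, 0.5)
```

In practice the interpolation is applied state by state after cross-language state mapping, which is what the paper's alignment step provides.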


2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) | 2011

Mongolian speech corpus for text-to-speech development

Chatchawarn Hansakunbuntheung; Ausdang Thangthai; Nattanun Thatphithakkul; Altangerel Chagnaa

This paper presents a first attempt to develop a Mongolian speech corpus designed for data-driven speech synthesis. The aim of the corpus is to support the development of a high-quality Mongolian TTS for blind users to use with a screen reader. The corpus contains nearly six hours of Mongolian speech. It provides Cyrillic text transcriptions and their phonetic transcriptions with stress marking. It also provides context information, including phone context, stress level, and syntactic position in word, phrase, and utterance, for modeling speech acoustics and characteristics for speech synthesis.


International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology | 2008

Automatic duration weighting in Thai unit-selection speech synthesis

S. Saychum; A. Rugchatjaroen; Nattanun Thatphithakkul; Chai Wutiwiwatchai; Ausdang Thangthai

This paper presents a naturalness improvement in Thai unit-selection text-to-speech synthesis (TTS) achieved by automatically weighting the target cost. The intuition behind the proposed method is that the sensitivity of human perception may vary across different phonemic and prosodic units. In this work, the unit-selection target cost of each phoneme unit is weighted differently according to its duration statistics and voicing characteristics. Two automatic weighting algorithms, based on the statistical mean and standard deviation of phoneme duration, are comparatively evaluated. A subjective test shows a 0.46 mean-opinion-score improvement over the baseline speech synthesized without target-cost weighting.
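One way to realize such a duration-statistic-based weight is sketched below. The paper's exact formula is not reproduced here, so the scheme (weights shrinking as duration variability grows) is an illustrative assumption, not the authors' algorithm.

```python
import statistics

def duration_weight(durations_ms):
    """Illustrative target-cost weight for one phoneme unit: phonemes
    whose observed durations vary widely get a lower weight, on the
    assumption that listeners are less sensitive to duration errors
    in those units."""
    mean = statistics.mean(durations_ms)
    std = statistics.pstdev(durations_ms)
    return mean / (mean + std)

# A phoneme with stable durations is weighted more heavily
# than one with highly variable durations.
stable = duration_weight([100, 100, 100])
variable = duration_weight([40, 100, 160])
```

The weight would then scale that phoneme's contribution to the target cost during unit selection.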


Conference of the International Speech Communication Association | 2016

Visual speech synthesis using dynamic visemes, contextual features and DNNs

Ausdang Thangthai; Ben Milner; Sarah Taylor

This paper examines methods to improve visual speech synthesis from a text input using a deep neural network (DNN). Two representations of the input text are considered: phoneme sequences and dynamic viseme sequences. From these sequences, contextual features are extracted that include information at varying linguistic levels, from the frame level up to the utterance level. These are extracted over a broad sliding window that captures context, and the resulting features are input to the DNN to estimate visual features. Experiments first compare the accuracy of these visual features against an HMM baseline, establishing that both the phoneme and dynamic viseme systems perform better, with the best performance obtained by a combined phoneme-dynamic viseme system. An investigation into the features then reveals the importance of the frame-level information, which avoids discontinuities in the visual feature sequence and produces a smooth and realistic output.
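The frame-level context captured by a sliding window can be sketched as simple frame stacking, a common DNN input scheme; the window radius and feature values here are illustrative, not the paper's configuration.

```python
def stack_frames(frames, radius):
    """Concatenate each frame with its +/- radius neighbours, repeating
    the edge frames, so the network sees local temporal context for
    every prediction."""
    n = len(frames)
    stacked = []
    for i in range(n):
        window = []
        for j in range(i - radius, i + radius + 1):
            # Clamp indices so the first/last frames are repeated at edges.
            window.extend(frames[min(max(j, 0), n - 1)])
        stacked.append(window)
    return stacked

# Three 2-dimensional frames, radius 1 -> three 6-dimensional vectors.
feats = stack_frames([[1, 2], [3, 4], [5, 6]], radius=1)
```

Each stacked vector would then be concatenated with the higher-level linguistic features before being fed to the DNN.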


International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology | 2011

Categorial-grammar-based phrase break prediction

S. Saychum; C. Hansakunbuntheung; Nattanun Thatphithakkul; T. Ruangrajitpakorn; Chai Wutiwiwatchai; T. Supnithi; Ananlada Chotimongkol; Ausdang Thangthai

Part-of-speech (POS) tags have been widely used as the main feature for predicting phrase breaks in text-to-speech synthesis (TTS) systems. However, POS does not clearly represent the syntactic information necessary for analyzing the grammatical tree structure of a language to assign phrase breaks. Instead of POS, this paper proposes using categorial grammar (CG), which embeds fine-grained syntactic information, as the key feature for predicting phrase breaks in Thai text. The performance of phrase break prediction using CG, POS, and their reduced sets is compared using a classification and regression tree (CART) for learning and predicting phrase break locations. The experimental results show that phrase break prediction using CG as the main feature gave the best performance among the tested features (Precision = 73.15%, Recall = 96.96%, F-measure = 83.39%).
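The reported F-measure is simply the harmonic mean of precision and recall, which can be verified from the paper's own figures:

```python
def f_measure(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# The reported P = 73.15% and R = 96.96% give F close to 83.39%.
f = f_measure(0.7315, 0.9696)
```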


Conference of the International Speech Communication Association | 2007

A learning method for Thai phonetization of English words

Ausdang Thangthai; Chai Wutiwiwatchai; Anocha Rugchatjaroen; Sittipong Saychum


Meeting of the Association for Computational Linguistics | 2010

Syllable-Based Thai-English Machine Transliteration

Chai Wutiwiwatchai; Ausdang Thangthai


Conference of the International Speech Communication Association | 2006

Automatic syllable-pattern induction in statistical Thai text-to-phone transcription

Ausdang Thangthai; Chatchawarn Hansakunbuntheung; Rungkarn Siricharoenchai; Chai Wutiwiwatchai


Conference of the International Speech Communication Association | 2008

T-tilt: a modified tilt model for F0 analysis and synthesis in tonal languages

Ausdang Thangthai; Nattanun Thatphithakkul; Chai Wutiwiwatchai; Anocha Rugchatjaroen; Sittipong Saychum


International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology | 2012

A bi-lingual Thai-English TTS system on Android mobile devices

S. Saychum; Ausdang Thangthai; P. Janjoi; Nattanun Thatphithakkul; Chai Wutiwiwatchai; Poonlap Lamsrichan; T. Kobayashi

Collaboration


Dive into Ausdang Thangthai's collaborations.

Top Co-Authors

Nattanun Thatphithakkul (King Mongkut's Institute of Technology Ladkrabang)
Anocha Rugchatjaroen (Thailand National Science and Technology Development Agency)
Chatchawarn Hansakunbuntheung (Thailand National Science and Technology Development Agency)
Rungkarn Siricharoenchai (Thailand National Science and Technology Development Agency)
Sittipong Saychum (Thailand National Science and Technology Development Agency)