Publication


Featured research published by Paisarn Charoenpornsawat.


international conference on acoustics, speech, and signal processing | 2005

Thai automatic speech recognition

Sinaporn Suebvisai; Paisarn Charoenpornsawat; Alan W. Black; Monika Woszczyna; Tanja Schultz

We describe the development of a robust and flexible Thai speech recognizer as integrated into our English-Thai speech-to-speech translation system. We focus on the discussion of the rapid deployment of ASR for Thai under limited time and data resources, including rapid data collection issues, acoustic model bootstrap, and automatic generation of pronunciations. Issues relating to the translation and overall system will be reported elsewhere.


international conference on computational linguistics | 2002

Improving translation quality of rule-based machine translation

Paisarn Charoenpornsawat; Virach Sornlertlamvanich; Thatsanee Charoenporn

This paper proposes machine learning techniques that help disambiguate word meaning. These methods consider the relationship between a word and its surroundings, described as context information in the paper. Context information, such as part-of-speech tags, semantic concepts, and case relations, is produced by rule-based translation. To automatically extract the context information, we apply three machine learning algorithms: C4.5, C4.5rule, and RIPPER. We test on ParSit, an interlingual-based machine translation system for English to Thai. To evaluate our approach, the verb "to be" is selected because it has increased in frequency and is quite difficult to translate into Thai using only linguistic rules. The results show that the accuracies of C4.5, C4.5rule, and RIPPER are 77.7%, 73.1%, and 76.1%, respectively, whereas ParSit gives an accuracy of only 48%.


north american chapter of the association for computational linguistics | 2006

Thai Grapheme-Based Speech Recognition

Paisarn Charoenpornsawat; Sanjika Hewavitharana; Tanja Schultz

In this paper we present the results for building a grapheme-based speech recognition system for Thai. We experiment with different settings for the initial context-independent system, different numbers of acoustic models, and different contexts for the speech unit. In addition, we investigate the potential of an enhanced tree clustering method as a way of sharing parameters across models. We compare our system with two phoneme-based systems: one that uses a hand-crafted dictionary and another that uses an automatically generated dictionary. Experimental results show that the grapheme-based system with enhanced tree clustering outperforms the phoneme-based system using an automatically generated dictionary, and has comparable results to the phoneme-based system with the hand-crafted dictionary.
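As a rough illustration of what "grapheme-based" means for the pronunciation dictionary, the sketch below maps each word to its sequence of written characters instead of phonemes. The function name `grapheme_dictionary` is hypothetical, and treating every Unicode code point as one unit is an assumption made here for simplicity; the abstract does not state the unit inventory the authors used.

```python
from typing import Dict, Iterable, List


def grapheme_dictionary(words: Iterable[str]) -> Dict[str, List[str]]:
    """Build a toy 'pronunciation' dictionary whose units are graphemes.

    Assumption: each Unicode code point counts as one grapheme unit.
    A phoneme-based dictionary would instead need hand-written or
    automatically generated phoneme sequences for every word.
    """
    return {word: list(word) for word in words}


# Usage: no pronunciation rules are needed, only the spelling itself.
toy_dictionary = grapheme_dictionary(["ไทย", "cat"])
```

The appeal of this design is that the dictionary can be produced for any word list without linguistic expertise, which is why it is compared against an automatically generated phoneme dictionary.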


asia pacific conference on circuits and systems | 1998

Feature-based Thai unknown word boundary identification using Winnow

Paisarn Charoenpornsawat; Boonserm Kijsirikul; Surapant Meknavin

This paper addresses the problem of Thai unknown word boundary identification. Unknown words are becoming the main problem in many natural language processing tasks, such as word segmentation, information retrieval, and part-of-speech tagging. In Thai, as words are written consecutively without delimiters, finding an unknown word boundary is difficult. We propose a feature-based approach to identifying Thai unknown word boundaries. A feature can be anything that tests for specific information in the context around the target unknown words. To automatically extract features from a training corpus, we use a machine learning algorithm, namely Winnow.
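For readers unfamiliar with Winnow, the following is a minimal sketch of the standard mistake-driven algorithm over binary features, with multiplicative promotion and demotion. It is not the authors' implementation: the class name, the threshold choice (the number of features), and the toy data are all assumptions, and the abstract does not specify the actual feature set used for Thai word boundaries.

```python
from itertools import product
from typing import List, Sequence


class Winnow:
    """Winnow classifier over binary features (illustrative sketch)."""

    def __init__(self, n_features: int, alpha: float = 2.0) -> None:
        self.weights: List[float] = [1.0] * n_features
        self.threshold = float(n_features)  # assumed threshold choice
        self.alpha = alpha

    def predict(self, x: Sequence[int]) -> bool:
        # Only active (non-zero) features contribute to the score.
        score = sum(w for w, xi in zip(self.weights, x) if xi)
        return score >= self.threshold

    def update(self, x: Sequence[int], label: bool) -> None:
        if self.predict(x) == label:
            return  # weights change only when a mistake is made
        # Promote active features on a missed positive, demote on a false positive.
        factor = self.alpha if label else 1.0 / self.alpha
        self.weights = [w * factor if xi else w for w, xi in zip(self.weights, x)]

    def fit(self, examples: Sequence[Sequence[int]], labels: Sequence[bool],
            epochs: int = 10) -> None:
        for _ in range(epochs):
            for x, y in zip(examples, labels):
                self.update(x, y)


# Toy data (hypothetical): the label is true when feature 0 or feature 2 fires.
examples = list(product([0, 1], repeat=4))
labels = [bool(x[0] or x[2]) for x in examples]
model = Winnow(n_features=4)
model.fit(examples, labels)
```

In a boundary-identification setting, each binary feature would test some property of the context around a candidate boundary; Winnow is attractive there because irrelevant features are quickly driven to low weights.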


north american chapter of the association for computational linguistics | 2003

A context-sensitive homograph disambiguation in Thai text-to-speech synthesis

Virongrong Tesprasit; Paisarn Charoenpornsawat; Virach Sornlertlamvanich

Homograph ambiguity is an original issue in Text-to-Speech (TTS). To disambiguate homographs, several efficient approaches have been proposed, such as part-of-speech (POS) n-gram, Bayesian classifier, decision tree, and Bayesian-hybrid approaches. These methods need the words and/or POS tags surrounding the homographs in question. Some languages, such as Thai, Chinese, and Japanese, have no word-boundary delimiter. Therefore, before solving homograph ambiguity, we need to identify word boundaries. In this paper, we propose a unique framework that solves both the word segmentation and homograph ambiguity problems together. Our model employs both local and long-distance contexts, which are automatically extracted by a machine learning technique called Winnow.


north american chapter of the association for computational linguistics | 2009

Incremental Adaptation of Speech-to-Speech Translation

Nguyen Bach; Roger Hsiao; Matthias Eck; Paisarn Charoenpornsawat; Stephan Vogel; Tanja Schultz; Ian R. Lane; Alex Waibel; Alan W. Black

In building practical two-way speech-to-speech translation systems, the end user will always wish to use the system in an environment different from the original training data. As with all speech systems, it is important to allow the system to adapt to the actual usage situations. This paper investigates how a speech-to-speech translation system can adapt day-to-day from data collected on day one to improve performance on day two. The platform is the CMU Iraqi-English portable two-way speech-to-speech system as developed under the DARPA TransTac program. We show how machine translation, speech recognition, and overall system performance can be improved on day two after adapting from day one in both a supervised and an unsupervised way.


spoken language technology workshop | 2008

Improving word segmentation for Thai speech translation

Paisarn Charoenpornsawat; Tanja Schultz

A vocabulary list and a language model are primary components in a speech translation system. Generating both from plain text is a straightforward task for English. However, it is quite challenging for Chinese, Japanese, or Thai, which provide no word segmentation, i.e. the text has no word boundary delimiter. For Thai word segmentation, maximal matching, a lexicon-based approach, is one of the popular methods. Nevertheless, this method relies heavily on the coverage of the lexicon. When text contains an unknown word, this method usually produces a wrong boundary. When extracting words from this segmented text, some words will not be retrieved because of wrong segmentation. In this paper, we propose statistical techniques to tackle this problem. Based on different word segmentation methods, we develop various speech translation systems and show that the proposed method can significantly improve the translation accuracy by about 6.42% BLEU points compared to the baseline system.
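The lexicon-based baseline mentioned above can be sketched as follows. This is a minimal illustration of maximal matching under the common reading "choose the segmentation with the fewest lexicon words", implemented with dynamic programming; it is not the statistical method the paper proposes. The function name, the `unknown_cost` penalty, and the Latin-alphabet toy lexicon are assumptions introduced here for illustration.

```python
from typing import List, Set


def maximal_matching(text: str, lexicon: Set[str], unknown_cost: int = 10) -> List[str]:
    """Segment text into the fewest lexicon words.

    Characters not covered by any lexicon word become single-character
    tokens and are penalised with `unknown_cost` (an assumed value).
    """
    n = len(text)
    max_len = max([1] + [len(word) for word in lexicon])
    # best[i] is the minimal cost of segmenting text[:i];
    # back[i] is where the last token of that segmentation starts.
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in lexicon:
                cost = best[start] + 1
            elif end - start == 1:
                cost = best[start] + unknown_cost
            else:
                continue
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    # Walk the back-pointers from the end of the text to recover tokens.
    tokens: List[str] = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


toy_lexicon = {"the", "them", "theme", "me", "eat"}
print(maximal_matching("themeat", toy_lexicon))  # ['them', 'eat']
```

A greedy longest-match segmenter would commit to "theme" first and strand "a" and "t", whereas the fewest-words criterion recovers "them eat". Both variants depend entirely on lexicon coverage: any character sequence outside the lexicon is broken into single-character tokens, which is the failure mode the abstract describes for unknown words.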


Proc. IWSLT, 2007 | 2007

The CMU TransTac 2007 Eyes-free and Hands-free Two-way Speech-to-Speech Translation System

Nguyen Bach; Matthias Eck; Paisarn Charoenpornsawat; Thilo Köhler; Sebastian Stüker; ThuyLinh Nguyen; Roger Hsiao; Alex Waibel; Stephan Vogel; Tanja Schultz; Alan W. Black


conference of the international speech communication association | 2006

Optimizing components for handheld two-way speech translation for an English-Iraqi Arabic system

Roger Hsiao; Ashish Venugopal; Ying Zhang; Paisarn Charoenpornsawat; Andreas Zollmann; Stephan Vogel; Alan W. Black; Tanja Schultz; Alex Waibel


international conference on the computer processing of oriental languages | 2001

Automatic Sentence Break Disambiguation for Thai

Paisarn Charoenpornsawat; Virach Sornlertlamvanich

Collaboration


Dive into Paisarn Charoenpornsawat's collaborations.

Top Co-Authors

Alan W. Black

Carnegie Mellon University

Virach Sornlertlamvanich

Sirindhorn International Institute of Technology

Roger Hsiao

Carnegie Mellon University

Stephan Vogel

Carnegie Mellon University

Alex Waibel

Karlsruhe Institute of Technology

Matthias Eck

Carnegie Mellon University

Monika Woszczyna

Carnegie Mellon University

Nguyen Bach

Carnegie Mellon University
