
Publication


Featured research published by Pongtep Angkititrakul.


IEEE Transactions on Speech and Audio Processing | 2005

SpeechFind: advances in spoken document retrieval for a National Gallery of the Spoken Word

John H. L. Hansen; Rongqing Huang; Bowen Zhou; Michael Seadle; John R. Deller; Aparna Gurijala; Mikko Kurimo; Pongtep Angkititrakul

In this study, we discuss a number of issues in audio stream phrase recognition for information retrieval for a new National Gallery of the Spoken Word (NGSW). NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings of historical content from the 20th century. We propose a system diagram and discuss critical tasks associated with effective audio information retrieval, including advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and natural language processing for text query requests. A number of questions regarding copyright assessment, metadata construction, and digital watermarking must also be addressed for a sustainable audio collection of this magnitude. Our experimental online system, entitled "SpeechFind," is presented, which allows for audio retrieval from a portion of the NGSW corpus. We discuss a number of research challenges in addressing the overall task of robust phrase searching in unrestricted audio corpora.
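
As a rough illustration of the phrase-search back-end described above, the sketch below builds an inverted index over time-stamped ASR transcripts and matches a query phrase whose words occur in order within a short time window. The data layout and function names are illustrative assumptions, not the actual SpeechFind implementation.

    # Hypothetical sketch, not SpeechFind's real API: transcripts map each
    # doc_id to a list of (start_time_seconds, word) pairs from the recognizer.
    from collections import defaultdict

    def build_index(transcripts):
        """Map each word to the (doc_id, start_time) positions where it was recognized."""
        index = defaultdict(list)
        for doc_id, words in transcripts.items():
            for start_time, word in words:
                index[word.lower()].append((doc_id, start_time))
        return index

    def find_phrase(index, phrase, window=5.0):
        """Return (doc_id, time) hits where the query words occur in order within `window` s."""
        words = phrase.lower().split()
        hits = []
        for doc_id, t0 in index.get(words[0], []):
            t_prev, ok = t0, True
            for w in words[1:]:
                later = [t for d, t in index.get(w, [])
                         if d == doc_id and t_prev < t <= t0 + window]
                if not later:
                    ok = False
                    break
                t_prev = min(later)
            if ok:
                hits.append((doc_id, t0))
        return hits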


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Advances in phone-based modeling for automatic accent classification

Pongtep Angkititrakul; John H. L. Hansen

It is suggested that algorithms capable of estimating and characterizing accent knowledge would provide valuable information in the development of more effective speech systems such as speech recognition, speaker identification, audio stream tagging in spoken document retrieval, channel monitoring, and voice conversion. Accent knowledge could be used to select alternative pronunciations in a lexicon, engage adaptation for acoustic modeling, or bias a language model in large-vocabulary speech recognition. In this paper, we propose a text-independent automatic accent classification system using phone-based models. Algorithm formulation begins with a series of experiments focused on capturing spectral evolution information as potential accent-sensitive cues. Alternative subspace representations using principal component analysis and linear discriminant analysis with projected trajectories are considered. Finally, an experimental study is performed to compare the spectral trajectory model framework to a traditional hidden Markov model recognition framework using an accent-sensitive word corpus. System evaluation is performed using a corpus representing five English speaker groups: native American English, and English spoken with Mandarin Chinese, French, Thai, and Turkish accents, for both male and female speakers.
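
The subspace-trajectory idea above can be sketched compactly: stack each phone's MFCC trajectory into a fixed-length vector, project it into a low-dimensional subspace with PCA (LDA would slot in the same way), and score the projection against one Gaussian mixture per accent class. The feature dimensions and mixture sizes below are illustrative assumptions, not the paper's configuration.

    # Illustrative sketch of subspace-projected trajectory scoring.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.mixture import GaussianMixture

    def train_accent_models(trajectories, labels, n_dims=12, n_mix=4):
        """trajectories: (N, T*D) stacked per-phone MFCC trajectories; labels: accent per row."""
        pca = PCA(n_components=n_dims).fit(trajectories)
        projected = pca.transform(trajectories)
        labels = np.asarray(labels)
        models = {accent: GaussianMixture(n_components=n_mix).fit(projected[labels == accent])
                  for accent in set(labels)}
        return pca, models

    def classify_accent(pca, models, trajectory):
        """Pick the accent whose model assigns the projected trajectory the highest likelihood."""
        x = pca.transform(trajectory.reshape(1, -1))
        return max(models, key=lambda accent: models[accent].score(x))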


IEEE Transactions on Intelligent Transportation Systems | 2011

On the Use of Stochastic Driver Behavior Model in Lane Departure Warning

Pongtep Angkititrakul; Ryuta Terashima; Toshihiro Wakita

In this paper, we propose a new framework for discriminating the initial maneuver of a lane-crossing event from a driver-correction event, which is the primary source of false warnings in lane departure prediction systems (LDPSs). The proposed algorithm validates the beginning episode of the trajectory of driving signals, i.e., whether it will lead to a lane-crossing event, by employing driver behavior models of the directional sequence of piecewise lateral slopes (DSPLS) representing lane-crossing and driver-correction events. The framework utilizes only common driving signals and allows an adaptation scheme for the driver behavior models to better represent individual driving characteristics. The experimental evaluation shows that the proposed DSPLS framework achieves a detection error as low as a 17% equal error rate. Furthermore, the proposed algorithm reduces the false-warning rate of the original lane departure prediction system with little trade-off in correct predictions.
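
A bare-bones version of the DSPLS representation: segment the lateral-position trace into short pieces, keep only the sign of each piece's slope, and compare the resulting symbol sequence under two simple first-order Markov models, one trained on lane-crossing episodes and one on correction episodes. The piece length, symbol set, and Markov scoring are simplifying assumptions; the paper's formulation is richer.

    # Illustrative sketch of directional-sequence scoring for lane-departure validation.
    import numpy as np

    def directional_sequence(lateral_pos, piece_len=10):
        """Quantize piecewise slopes of the lateral position into -1/0/+1 direction symbols."""
        symbols = []
        for i in range(0, len(lateral_pos) - piece_len + 1, piece_len):
            piece = lateral_pos[i:i + piece_len]
            slope = np.polyfit(np.arange(piece_len), piece, 1)[0]
            symbols.append(int(np.sign(round(slope, 3))))
        return symbols

    def log_score(symbols, trans):
        """Log-probability of a symbol sequence under first-order transition probabilities."""
        return sum(np.log(trans.get((a, b), 1e-6)) for a, b in zip(symbols, symbols[1:]))

    def is_lane_crossing(symbols, trans_cross, trans_correct):
        """Warn only if the crossing model explains the episode better than the correction model."""
        return log_score(symbols, trans_cross) > log_score(symbols, trans_correct)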


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Discriminative In-Set/Out-of-Set Speaker Recognition

Pongtep Angkititrakul; John H. L. Hansen

In this paper, the problem of identifying in-set versus out-of-set speakers for limited training/test data durations is addressed. The recognition objective is to decide whether an input speaker is a legitimate member of a set of enrolled speakers or an outside speaker. The general goal is to perform rapid speaker model construction from limited enrollment and test resources for in-set testing of input audio streams. In-set detection can help ensure security and proper access to private information, as well as detect and track input speakers. Application areas for these concepts include rapid speaker tagging and tracking for information retrieval, communication networks, personal device assistants, and location access. We propose an integrated system with emphasis on short enrollment data (about 5 s of speech for each enrolled speaker) and test data (2-8 s) within a text-independent mode. We present a simple yet powerful decision rule to accept or reject speakers using a discriminative vector in the decision score space, together with statistical hypothesis testing based on the conventional likelihood ratio test. Discriminative training is introduced to further improve system performance for both decision techniques, employing minimum classification error and minimum verification error frameworks. Experiments are performed using three separate corpora. Using the YOHO speaker recognition database, the alternative decision rule achieves measurable improvement over the likelihood ratio test, and discriminative training consistently enhances overall system performance, with relative improvements ranging from 11.26% to 28.68%. A further extended evaluation using TIMIT (CORPUS1) and actual noisy aircraft communications data (CORPUS2) shows measurable improvement over the traditional MAP-based scheme using the likelihood ratio test (MAP-LRT), with average EERs of 9%-23% for TIMIT and 13%-32% for noisy aircraft communications. The results confirm that an effective in-set/out-of-set speaker recognition system can be formulated using discriminative training for rapid tagging of input speakers from limited training and test data sizes.
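
The baseline likelihood-ratio decision in this setting can be written in a few lines: score the test utterance under each enrolled speaker's model and under a universal background model (UBM), and accept if the best in-set score beats the background by a threshold. The model sizes and threshold below are illustrative; in the paper, speaker models are MAP-adapted from the UBM rather than trained directly as done here.

    # Illustrative GMM/UBM likelihood-ratio sketch for in-set detection.
    from sklearn.mixture import GaussianMixture

    def train_in_set_system(ubm_features, enrollment, n_mix=8):
        """enrollment: {speaker_id: (n_frames, dim) features from ~5 s of speech}."""
        ubm = GaussianMixture(n_components=n_mix, covariance_type='diag').fit(ubm_features)
        speakers = {sid: GaussianMixture(n_components=2, covariance_type='diag').fit(feats)
                    for sid, feats in enrollment.items()}
        return ubm, speakers

    def accept_in_set(ubm, speakers, test_features, threshold=0.0):
        """Accept if the best in-set average per-frame log-likelihood ratio exceeds threshold."""
        best = max(model.score(test_features) for model in speakers.values())
        return (best - ubm.score(test_features)) > threshold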


IEEE Intelligent Vehicles Symposium | 2007

UTDrive: Driver Behavior and Speech Interactive Systems for In-Vehicle Environments

Pongtep Angkititrakul; Matteo Petracca; Amardeep Sathyanarayana; John H. L. Hansen

This paper presents an overview of the UTDrive project. UTDrive is part of an on-going international collaboration to collect and research rich multi-modal data recorded in in-vehicle environments for modeling driver behavior. The objective of the UTDrive project is to analyze behavior while the driver is interacting with speech-activated systems or performing common secondary tasks, as well as to better understand the speech characteristics of a driver under additional cognitive load. The corpus consists of audio, video, gas/brake pedal pressure, forward distance, GPS information, and CAN-Bus information. The resulting corpus, analysis, and modeling will contribute to more effective speech interactive systems which are less distracting and adjustable to the driver's cognitive capacity and driving situations.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Dialect/Accent Classification Using Unrestricted Audio

Rongqing Huang; John H. L. Hansen; Pongtep Angkititrakul

This study addresses novel advances in English dialect/accent classification. A word-based modeling technique is proposed that is shown to outperform a large vocabulary continuous speech recognition (LVCSR)-based system at significantly lower computational cost. The new algorithm, named Word-based Dialect Classification (WDC), converts the text-independent decision problem into a text-dependent decision problem and produces multiple combined decisions at the word level rather than making a single decision at the utterance level. The basic WDC algorithm also provides options for further modeling and decision-strategy improvement. Two sets of classifiers are employed for WDC: a word classifier D_W(k) and an utterance classifier D_U. D_W(k) is boosted via the AdaBoost algorithm directly in the probability space instead of the traditional feature space, while D_U is boosted via the dialect-dependency information of the words. For a small training corpus, it is difficult to obtain a robust statistical model for each word and each dialect; therefore, a context adapted training (CAT) algorithm is formulated, which adapts universal phoneme Gaussian mixture models (GMMs) to dialect-dependent word hidden Markov models (HMMs) via linear regression. Three separate dialect corpora are used in the evaluations: the Wall Street Journal (American and British English), NATO N4 (British, Canadian, Dutch, and German accented English), and IViE (eight British dialects). Significant improvement in dialect classification is achieved for all corpora tested.
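
The word-level decision fusion at the heart of WDC can be sketched simply: score each recognized word under per-dialect word models, then fuse the per-word log-likelihoods, optionally weighted by how dialect-dependent each word is, into a single utterance decision. The uniform default weights below stand in for the boosted classifiers used in the paper.

    # Illustrative word-level fusion for dialect classification.
    def classify_utterance(word_scores, word_weights=None):
        """word_scores: one {dialect: log_likelihood} dict per recognized word."""
        if word_weights is None:
            word_weights = [1.0] * len(word_scores)
        dialects = word_scores[0].keys()
        totals = {d: sum(w * scores[d] for w, scores in zip(word_weights, word_scores))
                  for d in dialects}
        return max(totals, key=totals.get)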


3rd Biennial Workshop on Digital Signal Processing for Mobile and Vehicular Systems, DSP 2007 | 2009

UTDrive: The Smart Vehicle Project

Pongtep Angkititrakul; John H. L. Hansen; Sangjo Choi; Tyler Creek; Jeremy Hayes; Jeonghee Kim; Donggu Kwak; Levi T. Noecker; Anhphuc Phan

This chapter presents research activities of UTDrive, the smart vehicle project at the Center for Robust Speech Systems, University of Texas at Dallas. The objectives of the UTDrive project are to collect and research rich multi-modal data recorded in actual car environments for analyzing and modeling driver behavior. Models of driver behavior under normal and distracted driving conditions can be used to create improved in-vehicle human–machine interactive systems and reduce vehicle accidents on the road. The UTDrive corpus consists of audio, video, brake/gas pedal pressure, headway distance, GPS information (e.g., position, velocity), and CAN-bus information (e.g., steering-wheel angle, brake position, throttle position, and vehicle speed). Here, we describe our in-vehicle data collection framework, data collection protocol, dialog and secondary task demands, data analysis, and preliminary experimental results. Finally, we discuss our proposed multi-layer data transcription procedure for in-vehicle data collection and future research directions.


Archive | 2005

CU-Move: Advanced In-Vehicle Speech Systems for Route Navigation

John H. L. Hansen; Xianxian Zhang; Murat Akbacak; Umit H. Yapanel; Bryan L. Pellom; Wayne H. Ward; Pongtep Angkititrakul

In this chapter, we present our recent advances in the formulation and development of an in-vehicle hands-free route navigation system. The system is comprised of a multi-microphone array processing front-end, an environmental sniffer (for noise analysis), a robust speech recognition system, and a dialog manager with information servers. We also present our recently completed speech corpus for in-vehicle interactive speech systems for route planning and navigation. The corpus consists of five domains: digit strings, route navigation expressions, street and location sentences, phonetically balanced sentences, and a route navigation dialog in a human Wizard-of-Oz-like scenario. Data from a total of 500 speakers were collected across the United States during a six-month period from April to September 2001. While previous attempts at in-vehicle speech systems have generally focused on isolated command words to set radio frequencies, temperature controls, etc., the CU-Move system is focused on natural conversational interaction between the user and the in-vehicle system. After presenting our proposed in-vehicle speech system, we consider advances in multi-channel array processing, environmental noise sniffing and tracking, new and more robust acoustic front-end representations, built-in speaker normalization for robust ASR, and our back-end dialog navigation information retrieval sub-system connected to the WWW. Results are presented in each sub-section, with a discussion at the end of the chapter.
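
As a minimal stand-in for the multi-microphone array front-end mentioned above, the sketch below implements a plain delay-and-sum beamformer, the textbook starting point for such systems. The CU-Move array processing itself is considerably more sophisticated, and the integer sample delays here are an illustrative assumption.

    # Illustrative delay-and-sum beamformer over M time-aligned microphone channels.
    import numpy as np

    def delay_and_sum(channels, delays):
        """channels: (M, N) array of mic signals; delays: nonnegative integer sample delays."""
        M, N = channels.shape
        out = np.zeros(N)
        for signal, d in zip(channels, delays):
            # Shift each channel by its steering delay before averaging.
            out[d:] += signal[:N - d] if d else signal
        return out / M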


IEEE Transactions on Intelligent Transportation Systems | 2012

Self-Coaching System Based on Recorded Driving Data: Learning From One's Experiences

Kazuya Takeda; Chiyomi Miyajima; Tatsuya Suzuki; Pongtep Angkititrakul; Kenji Kurumida; Yuichi Kuroyanagi; Hiroaki Ishikawa; Ryuta Terashima; Toshihiro Wakita; Masato Oikawa; Yuichi Komada

This paper describes the development of a self-coaching system to improve driving behavior by allowing drivers to review a record of their own driving activity. By employing stochastic driver-behavior modeling, the proposed system is able to detect a wide range of potentially hazardous situations which conventional event data recorders cannot capture, including those involving latent risks of which drivers themselves are unaware. Using these automatically detected hazardous situations, our web-based system offers a user-friendly interface for drivers to navigate and review each hazardous situation in detail (e.g., driving scenes are categorized into different types of hazardous situations and displayed with the corresponding multimodal driving signals). Furthermore, the system provides feedback on each risky driving behavior and suggests how users can respond safely to such situations. The proposed system establishes a cooperative relationship between the driver, the vehicle, and the driving environment, leading to the development of the next generation of safety systems and paving the way for an alternative form of driving education that could further reduce the number of fatal accidents. The system's potential benefits are demonstrated through an extensive preliminary evaluation in an on-road experiment, showing that safe-driving behavior can be significantly improved when drivers use the proposed system.
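
The detection idea can be illustrated with a simple stand-in: model windows of driving-signal features under a stochastic model of normal driving and flag the lowest-likelihood windows as potentially hazardous situations for later review. The GMM and the percentile threshold are illustrative assumptions, not the system's actual detector.

    # Illustrative low-likelihood flagging of driving-signal windows.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def flag_hazardous_windows(windows, n_mix=8, percentile=2.0):
        """windows: (N, D) per-window features (e.g., speed, pedal, and steering statistics)."""
        model = GaussianMixture(n_components=n_mix, covariance_type='diag').fit(windows)
        log_lik = model.score_samples(windows)
        # Return indices of the least likely windows for the driver to review.
        return np.where(log_lik < np.percentile(log_lik, percentile))[0]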


IEEE Intelligent Vehicles Symposium | 2012

An improved driver-behavior model with combined individual and general driving characteristics

Pongtep Angkititrakul; Chiyomi Miyajima; Kazuya Takeda

In this paper, we propose a stochastic driver-behavior modeling framework which takes into account both individual and general driving characteristics in one aggregate model. Patterns of individual driving styles are modeled using a Dirichlet process mixture model, a nonparametric Bayesian approach which automatically selects the optimal number of model components to fit the sparse observations of each particular driver's behavior. In addition, general or background driving patterns are captured with a Gaussian mixture model trained on a reasonably large amount of observed development data from several drivers. By combining both probability distributions, the aggregate driver-dependent model can better emphasize the driving characteristics of each particular driver, while also backing off to general driving behavior where the individual training observations leave the parameter space uncovered. The proposed driver-behavior model was employed to anticipate pedal-operation behavior during car-following maneuvers involving several drivers on the road. The experimental results showed advantages of the combined model over the previously proposed adapted model.
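
The combination described above can be sketched as a two-component aggregate density, p(x) = w * p_individual(x) + (1 - w) * p_general(x). Below, scikit-learn's BayesianGaussianMixture (a truncated variational approximation to a Dirichlet process mixture) stands in for the individual model, and the mixing weight w is an illustrative assumption.

    # Illustrative aggregate of an individual DPMM-style model and a background GMM.
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

    def fit_aggregate(driver_data, background_data):
        """Fit the individual (sparse, per-driver) and general (pooled) components."""
        individual = BayesianGaussianMixture(
            n_components=10, weight_concentration_prior_type='dirichlet_process'
        ).fit(driver_data)
        general = GaussianMixture(n_components=16).fit(background_data)
        return individual, general

    def aggregate_log_density(individual, general, x, w=0.7):
        """Log of w * p_individual(x) + (1 - w) * p_general(x), evaluated row-wise."""
        x = np.atleast_2d(x)
        return np.logaddexp(np.log(w) + individual.score_samples(x),
                            np.log(1 - w) + general.score_samples(x))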

Collaboration


Dive into Pongtep Angkititrakul's collaborations.

Top Co-Authors

John H. L. Hansen
University of Texas at Dallas

Wooil Kim
University of Texas at Dallas

Bryan L. Pellom
University of Colorado Boulder