Santitham Prom-on
King Mongkut's University of Technology Thonburi
Publication
Featured research published by Santitham Prom-on.
Speech Communication | 2014
Yi Xu; Santitham Prom-on
Variability has been one of the major challenges for both the theoretical understanding and the computer synthesis of speech prosody. In this paper we show that economical representation of variability is the key to effective modeling of prosody. Specifically, we report the development of PENTAtrainer, a trainable yet deterministic prosody synthesizer based on an articulatory-functional view of speech. We show with testing results on Thai, Mandarin and English that it is possible to achieve high-accuracy predictive synthesis of fundamental frequency contours with very small sets of parameters obtained through stochastic learning from real speech data. The first key component of this system is syllable-synchronized sequential target approximation, implemented as the qTA model, which is designed to simulate, for each tonal unit, a wide range of contextual variability with a single invariant target. The second key component is the automatic learning of function-specific targets through stochastic global optimization, guided by a layered pseudo-hierarchical functional annotation scheme, which requires the manual labeling of only the temporal domains of the functional units. The results in terms of synthesis accuracy demonstrate that effective modeling of contextual variability is also the key to effective modeling of function-related variability. Additionally, we show that, being both theory-based and trainable (hence data-driven), computational systems like PENTAtrainer can serve as an effective modeling tool in basic research, with which the level of falsifiability in theory testing can be raised and a closer link between basic and applied research in speech science can be developed.
Journal of the Acoustical Society of America | 2012
Santitham Prom-on; Fang Liu; Yi Xu
Post-low bouncing is a phenomenon whereby after reaching a very low pitch in a low lexical tone, F0 bounces up and then gradually drops back in the following syllables. This paper reports the results of an acoustic analysis of the phenomenon in two Mandarin Chinese corpora and presents a simple mechanical model that can effectively simulate this bouncing effect. The acoustic analysis shows that most of the F0 dynamic features profiling the bouncing effect strongly correlate with the amount of F0 lowering in the preceding low-tone syllable, and that the additional F0 raising commences at the onset of the first post-low syllable. Using the quantitative Target Approximation model, this bouncing effect was simulated by adding an acceleration adjustment to the initial F0 state of the first post-low syllable. A highly linear relation between F0 lowering and estimated acceleration adjustment was found. This relation was then used to effectively simulate the bouncing effect in both the neutral tone and the full tones. The results of the analysis and simulation are consistent with the hypothesis that the bouncing effect is due to a temporary perturbation of the balance between antagonistic forces in the laryngeal control in producing a very low pitch.
Neural Computing and Applications | 2012
Pitak Sootanan; Santitham Prom-on; Asawin Meechai; Jonathan H. Chan
The advent of high-throughput technology has made it possible to measure genome-wide expression profiles, thus providing a new basis for microarray-based diagnosis of disease states. Numerous methods have been proposed to identify biomarkers that can accurately discriminate between case and control classes. Many of these methods use only a subset of ranked genes in the pathway and may not be able to fully represent the classification boundaries for the two disease classes. This study proposes the use of negatively correlated feature sets (NCFS) to obtain more relevant features in the form of phenotype-correlated genes (PCOGs) and to infer pathway activities. The two pathway activity inference schemes that use NCFS significantly improved the power of pathway markers to discriminate between two phenotype classes in microarray expression datasets of breast cancer. In particular, the NCFS-i method provided better contrasting features for classification purposes. The improvement is consistent for all pathways used, in both within- and across-dataset validations. The results show that the two proposed methods that use NCFS clearly outperformed other pathway-based classifiers in terms of both ROC area and discriminative score. That is, the identification of PCOGs within each pathway, especially with the NCFS-i method, helps to reduce noisy or variable measurements, leading to a higher-performing and more robust classifier. In summary, we have demonstrated that effectively incorporating pathway information into expression-based disease diagnosis and using NCFS can provide more discriminative and more robust models.
International Conference on Acoustics, Speech, and Signal Processing | 2006
Santitham Prom-on; Yi Xu; Bundit Thipakorn
This paper proposes a quantitative target approximation (qTA) model for simulating tone and intonation. Building on two theoretical models, the target approximation model (Y. Xu and Q.E. Wang, 2001) and the PENTA model (Y. Xu, 2005), the qTA model incorporates several additional assumptions about the underlying articulatory mechanisms, including that (1) F0 production can be represented by a second-order overdamped system, and (2) the system is controlled by a time-delayed feedback loop to sequentially approximate underlying pitch targets. We tested the model with the dataset from Y. Xu (1999). Two experiments were conducted to validate the model and to study the effect of tone, position, and focus. The results were satisfactory in terms of the error rate and correlation.
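The idea of sequentially approximating pitch targets with an overdamped second-order system can be illustrated with a minimal numerical sketch. This is not the authors' formulation: the `Target` class, the `simulate_f0` function, the natural frequency `omega`, the damping ratio `zeta`, and the semitone units are all assumptions made for illustration, and the time-delayed feedback loop mentioned in the abstract is left out.

```python
from dataclasses import dataclass


@dataclass
class Target:
    height: float    # target pitch at syllable onset (semitones, assumed unit)
    slope: float     # target slope (semitones per second)
    duration: float  # syllable duration (seconds)


def simulate_f0(targets, f0_start, omega=40.0, zeta=1.5, dt=0.001):
    """Approach each syllable's linear target in sequence.

    Integrates x'' = -2*zeta*omega*x' - omega**2 * (x - target(t)) with
    zeta > 1 (overdamped). Pitch and velocity are carried across syllable
    boundaries, so the contour stays continuous.
    """
    if zeta <= 1.0:
        raise ValueError("zeta must exceed 1 for an overdamped system")
    f0, velocity = f0_start, 0.0
    contour = []
    for target in targets:
        steps = int(round(target.duration / dt))
        for i in range(steps):
            goal = target.height + target.slope * (i * dt)
            accel = -2.0 * zeta * omega * velocity - omega ** 2 * (f0 - goal)
            velocity += accel * dt  # semi-implicit Euler step
            f0 += velocity * dt
            contour.append(f0)
    return contour


# Hypothetical example: a high static target followed by a low static target.
contour = simulate_f0(
    [Target(5.0, 0.0, 0.3), Target(-5.0, 0.0, 0.3)], f0_start=0.0
)
```

Because the state is handed over at each boundary, the second syllable starts from wherever the first one ended, which is how a single invariant target per syllable can still yield context-dependent contours.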
Journal of Bioinformatics and Computational Biology | 2011
Santitham Prom-on; Atthawut Chanthaphan; Jonathan H. Chan; Asawin Meechai
Relationships among gene expression levels may be associated with the mechanisms of a disease. While identifying a direct association, such as a difference in expression levels between case and control groups, links genes to disease mechanisms, uncovering an indirect association in the form of a network structure may help reveal the underlying functional module associated with the disease under scrutiny. This paper presents a method to improve the biological relevance of functional module identification from gene expression microarray data by enhancing the structure of a weighted gene co-expression network using a minimum spanning tree. The enhanced network, which is called a backbone network, contains only the essential structural information needed to represent the gene co-expression network. The entire backbone network is decoupled into a number of coherent sub-networks, and the functional modules are then reconstructed from these sub-networks to ensure minimum redundancy. The method was tested with a simulated gene expression dataset and case-control expression datasets from autism spectrum disorder and colorectal cancer studies. The results indicate that the proposed method can accurately identify clusters in the simulated dataset, and that the functional modules of the backbone network are more biologically relevant than those obtained from the original approach.
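A minimum spanning tree over a co-expression network can be sketched in a few lines. The sketch below is only an illustration of the general technique, not the paper's pipeline: the edge weight `1 - |r|` (with `r` the Pearson correlation), the function names `pearson` and `backbone_edges`, and the toy profiles are assumptions, and the later steps (decoupling into sub-networks and module reconstruction) are not covered.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def backbone_edges(profiles):
    """Kruskal's algorithm on the complete graph with weight 1 - |r|."""
    n = len(profiles)
    edges = sorted(
        (1.0 - abs(pearson(profiles[u], profiles[v])), u, v)
        for u, v in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for dist, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v, dist))
    return tree


# Hypothetical toy data: genes 0 and 1 co-vary, as do genes 2 and 3.
genes = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 1.0, 4.0, 2.0, 3.0],
    [10.0, 2.0, 8.0, 4.0, 6.5],
]
tree = backbone_edges(genes)
```

For `n` genes the tree keeps `n - 1` edges, so the strongly co-expressed pairs stay connected while most weak links are discarded.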
Phonology | 2015
Yi Xu; Albert Lee; Santitham Prom-on; Fang Liu
This paper presents an overview of the Parallel Encoding and Target Approximation (PENTA) model of speech prosody, in response to an extensive critique by Arvaniti & Ladd (2009). PENTA is a framework for conceptually and computationally linking communicative meanings to fine-grained prosodic details, based on an articulatory-functional view of speech. Target Approximation simulates the articulatory realisation of underlying pitch targets – the prosodic primitives in the framework. Parallel Encoding provides an operational scheme that enables simultaneous encoding of multiple communicative functions. We also outline how PENTA can be computationally tested with a set of software tools. With the help of one of the tools, we offer a PENTA-based hypothetical account of the Greek intonational patterns reported by Arvaniti & Ladd, showing how it is possible to predict the prosodic shapes of an utterance based on the lexical and postlexical meanings it conveys.
International Conference on Computer, Control, Informatics and Its Applications | 2014
Santitham Prom-on; Sirapop Na Ranong; Patcharaporn Jenviriyakul; Thepparit Wongkaew; Nareerat Saetiew; Tiranee Achalakul
This paper presents the development of DOM, a mobile big data analytics engine for mining Thai public opinions. The engine takes in data from multiple well-known social network sources and processes them using MapReduce, a keyword-based sentiment analysis technique, and an influencer analysis algorithm to determine public opinions and sentiments on certain topics. The system's sentiment prediction accuracy was evaluated by comparing the predicted results with human-judged sentiment, and the system was tested on various case studies. The effectiveness of the approach demonstrates the practical applications of the engine.
EURASIP Journal on Audio, Speech, and Music Processing | 2014
Santitham Prom-on; Peter Birkholz; Yi Xu
This paper investigates the estimation of underlying articulatory targets of Thai vowels as invariant representations of vocal tract shapes by means of analysis-by-synthesis based on acoustic data. The basic idea is to simulate the process of learning speech production as a distal learning task, with acoustic signals of natural utterances in the form of Mel-frequency cepstral coefficients (MFCCs) as input, VocalTractLab (a 3D articulatory synthesizer controlled by target approximation models) as the learner, and stochastic gradient descent as the target training method. To test the effectiveness of this approach, a speech corpus was designed to contain contextual variations of Thai vowels by juxtaposing nine Thai long vowels in two-syllable sequences. The corpus, consisting of 81 disyllabic utterances, was recorded from a native Thai speaker. Nine vocal tract shapes, each corresponding to a vowel, were estimated by optimizing the vocal tract shape parameters of each vowel to minimize the sum of squared errors of MFCCs between original and synthesized speech. The stochastic gradient descent algorithm was used to iteratively optimize the shape parameters. The optimized vocal tract shapes were then used to synthesize Thai vowels both in monosyllables and in disyllabic sequences. The results, both numerical and perceptual, indicate that this model-based analysis strategy allows us to effectively and economically estimate the vocal tract shapes needed to synthesize accurate Thai vowels as well as smooth formant transitions between adjacent vowels.
International Conference on Neural Information Processing | 2010
Pitak Sootanan; Santitham Prom-on; Asawin Meechai; Jonathan H. Chan
The vast amount of data on gene expression that is now available through high-throughput measurement of mRNA abundance has provided a new basis for disease diagnosis. Microarray-based classification of disease states is based on gene expression profiles of patients. A large number of methods have been proposed to identify diagnostic markers that can accurately discriminate between different classes of a disease. Using only a subset of genes in the pathway, such as so-called condition-responsive genes (CORGs), may not fully represent the two classification boundaries for Case and Control classes. Negatively correlated feature sets (NCFS) for identifying CORGs and inferring pathway activities are proposed in this study. Our two proposed methods (NCFS-i and NCFS-c) achieve higher accuracy in disease classification and can identify more phenotype-correlated genes in each pathway compared with several existing pathway activity inference methods.
International Conference on Neural Information Processing | 2009
Thammakorn Saethang; Santitham Prom-on; Asawin Meechai; Jonathan H. Chan
Feature selection (FS) plays a crucial role in machine learning in building a robust model for either learning or classification from a large amount of data. Among feature selection techniques, the Relief algorithm is one of the most common due to its simplicity and effectiveness. The performance of the Relief algorithm, however, can be dramatically affected by the consistency of the data patterns. For instance, Relief-F can become less accurate in the presence of noise. The accuracy decreases further if an outlier sample is included in the dataset. Therefore, it is very important to carefully select the samples to be included in the dataset. This paper presents an effort to improve the effectiveness of the Relief algorithm by filtering samples before selecting features. This method is termed the Sample Filtering Relief Algorithm (SFRA). The main idea of this method is to separate outlier samples from the main pattern using a self-organizing map (SOM) and then proceed with feature selection using the Relief algorithm. We have tested SFRA with a gene expression dataset of interferon-α (IFN-α) response of Hepatitis B patients that contains outlier data. SFRA successfully removed outlier samples, as verified by visual inspection by experts. It also separated relevant from irrelevant features more accurately than the other feature selection methods considered.
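For readers unfamiliar with Relief, the basic weighting step can be sketched as follows. This is a simplified, deterministic variant of the original two-class Relief (it visits every sample once rather than sampling at random) and is not SFRA itself: the SOM-based sample filtering is not implemented, and the function name `relief_weights` and the toy data are assumptions for illustration. Each class is assumed to contain at least two samples.

```python
def relief_weights(samples, labels):
    """Score features by how well they separate nearest hits from misses."""
    n_samples, n_features = len(samples), len(samples[0])

    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for f in range(n_features):
        column = [row[f] for row in samples]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a, b, f):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(a, b, f) for f in range(n_features))

    weights = [0.0] * n_features
    for i, row in enumerate(samples):
        hits = [s for j, s in enumerate(samples)
                if j != i and labels[j] == labels[i]]
        misses = [s for j, s in enumerate(samples) if labels[j] != labels[i]]
        near_hit = min(hits, key=lambda s: distance(row, s))
        near_miss = min(misses, key=lambda s: distance(row, s))
        for f in range(n_features):
            # Reward features that differ on the nearest miss and
            # penalise features that differ on the nearest hit.
            weights[f] += (
                diff(row, near_miss, f) - diff(row, near_hit, f)
            ) / n_samples
    return weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.1]]
y = [0, 0, 0, 1, 1, 1]
w = relief_weights(X, y)
```

Since every update depends on a single nearest hit and nearest miss, one outlier can distort the neighbours chosen for several samples, which is the sensitivity that motivates filtering samples before running Relief.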