Constantinos Boulis
University of Washington
Publications
Featured research published by Constantinos Boulis.
European Conference on Principles of Data Mining and Knowledge Discovery | 2004
Constantinos Boulis; Mari Ostendorf
Three methods for combining multiple clustering systems are presented and evaluated, focusing on the problem of finding the correspondence between clusters of different systems. In this work, the clusters of individual systems are represented in a common space and their correspondence is estimated either by clustering the clusters or via Singular Value Decomposition. The approaches are evaluated on the task of topic discovery across three major corpora and eight different clustering algorithms, and it is shown experimentally that combination schemes almost always offer gains over single systems, although the size of the gain depends on the underlying clustering systems.
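The combination idea lends itself to a compact illustration. Below is a minimal sketch of the "clustering clusters" variant, assuming hard cluster assignments; the function and variable names are illustrative, not the authors' implementation.

```python
# A minimal sketch of the "clustering clusters" combination idea, assuming
# hard cluster assignments; names are illustrative, not the paper's code.
import numpy as np
from sklearn.cluster import KMeans

def combine_clusterings(labelings, n_clusters):
    """Combine multiple clusterings of the same n items.

    labelings  : list of 1-D integer label arrays, one per system, length n.
    n_clusters : number of clusters in the combined output.
    """
    n = len(labelings[0])
    # Represent every cluster of every system as an indicator vector over
    # the n items, so clusters of all systems live in a common space.
    cluster_vectors = []
    for labels in labelings:
        for c in np.unique(labels):
            cluster_vectors.append((labels == c).astype(float))
    V = np.array(cluster_vectors)            # (total clusters, n)

    # "Cluster the clusters": group corresponding clusters across systems.
    meta = KMeans(n_clusters=n_clusters, n_init=10).fit(V)

    # Each item votes with the meta-cluster of every cluster it belongs to.
    votes = np.zeros((n, n_clusters))
    for vec, meta_label in zip(V, meta.labels_):
        votes[vec.astype(bool), meta_label] += 1
    return votes.argmax(axis=1)
```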
IEEE Transactions on Speech and Audio Processing | 2002
Li Deng; Kuansan Wang; Alex Acero; Hsiao-Wuen Hon; Jasha Droppo; Constantinos Boulis; Ye-Yi Wang; Derek Jacoby; Milind Mahajan; Ciprian Chelba; Xuedong Huang
This paper describes the main components of MiPad (multimodal interactive PAD) and especially its distributed speech processing aspects. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel solution for data entry in PDAs or smart phones, often done by pecking with tiny styluses or typing on minuscule keyboards. Our user study indicates that the throughput of MiPad is significantly superior to that of the existing pen-based PDA interface. Acoustic modeling and noise robustness in distributed speech recognition are key components in MiPad's design and implementation. In a typical scenario, the user speaks to the device at a distance so that he or she can see the screen. The built-in microphone thus picks up a lot of background noise, which requires MiPad to be noise robust. For complex tasks, such as dictating e-mails, resource limitations demand the use of a client-server (peer-to-peer) architecture, where the PDA performs primitive feature extraction, feature quantization, and error protection, while the transmitted features are subject on the server to further speech feature enhancement, speech decoding, and understanding before a dialog is carried out and actions are rendered. Noise robustness can be achieved at the client, at the server, or both. Various speech processing aspects of this type of distributed computation, as related to MiPad's potential deployment, are presented. Previous user interface study results are also described. Finally, we point out future research directions related to several key MiPad functionalities.
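The client-server split described above can be illustrated with a toy example. The sketch below uses uniform scalar quantization as a stand-in for the real feature codec; all names are hypothetical, and none of this is the MiPad implementation.

```python
# A hedged sketch of the client/server split: the client quantizes speech
# features before transmission, the server reconstructs them. Uniform
# scalar quantization is an illustrative stand-in, not MiPad's codec.
import numpy as np

def client_encode(features, n_bits=6):
    """Client side: quantize speech features before transmission."""
    lo, hi = features.min(), features.max()
    levels = 2 ** n_bits - 1
    codes = np.round((features - lo) / (hi - lo) * levels).astype(np.uint8)
    return codes, (lo, hi)          # codes plus scale info sent to server

def server_decode(codes, scale, n_bits=6):
    """Server side: reconstruct features for enhancement and decoding."""
    lo, hi = scale
    levels = 2 ** n_bits - 1
    return codes.astype(float) / levels * (hi - lo) + lo
```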
IEEE Transactions on Speech and Audio Processing | 2002
Constantinos Boulis; Mari Ostendorf; Eve A. Riskin; Scott Otterson
This paper explores packet loss recovery for automatic speech recognition (ASR) in spoken dialog systems, assuming an architecture in which a lightweight client communicates with a remote ASR server. Speech is transmitted with source and channel codes optimized for the ASR application, i.e., to minimize word error rate. Unequal amounts of forward error correction, depending on the data's effect on ASR performance, are assigned to protect against packet loss. Experiments with simulated packet loss in a range of loss conditions are conducted on the DARPA Communicator (air travel information) task. Results show that the approach provides robust ASR performance that degrades gracefully as packet loss rates increase. Transmitting at 5.2 kbps with up to 200 ms of added delay leads to only a 7% relative degradation in word error rate, even under extremely adverse network conditions.
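As an illustration of unequal error protection, the sketch below allocates a simple repetition code in proportion to per-component importance weights; both the weights and the repetition scheme are stand-ins, not the paper's source and channel codes.

```python
# A hedged sketch of unequal forward error correction: more redundancy for
# feature components whose loss hurts word error rate most. The repetition
# code and importance weights are illustrative, not the paper's scheme.
import numpy as np

def protect(packet_features, importance, total_budget):
    """Allocate repetition-code redundancy in proportion to importance.

    packet_features : list of feature vectors in one packet.
    importance      : one weight per feature component (higher = more
                      impact on ASR performance if lost).
    total_budget    : total number of extra component copies allowed.
    """
    weights = np.asarray(importance, dtype=float)
    copies = np.floor(total_budget * weights / weights.sum()).astype(int)
    protected = []
    for vec in packet_features:
        # Repeat component i of each vector 1 + copies[i] times.
        protected.append(np.repeat(vec, 1 + copies))
    return protected
```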
Meeting of the Association for Computational Linguistics | 2005
Constantinos Boulis; Mari Ostendorf
In this work, we provide an empirical analysis of differences in word use between genders in telephone conversations, which complements the considerable body of work in sociolinguistics concerned with gender linguistic differences. Experiments are performed on a large speech corpus of roughly 12,000 conversations. We employ machine learning techniques to automatically categorize the gender of each speaker given only the transcript of his/her speech, achieving 92% accuracy. An analysis of the most characteristic words for each gender is also presented. Experiments reveal that the gender of one conversation side influences the lexical use of the other side. A surprising result is that we were able to classify male-only vs. female-only conversations with almost perfect accuracy.
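The classification setup can be summarized in a few lines. The sketch below trains a bag-of-words gender classifier on one-side transcripts; the choice of features and classifier is illustrative and makes no attempt to reproduce the 92% figure.

```python
# A minimal sketch of transcript-based gender classification with a
# bag-of-words model, in the spirit of the experiments above. The feature
# extractor and classifier are illustrative stand-ins, not the paper's.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_gender_classifier(transcripts, genders):
    """transcripts: list of one-side conversation transcripts (strings);
    genders: list of labels, e.g. "male" / "female"."""
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(transcripts, genders)
    return clf
```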
International Conference on Acoustics, Speech and Signal Processing | 1999
Enrico Bocchieri; Vassilios Digalakis; Adrian Corduneanu; Constantinos Boulis
This paper concerns rapid adaptation of hidden Markov model (HMM) based speech recognizers to a new speaker, when only a few speech samples (one minute or less) are available from the new speaker. A widely used family of adaptation algorithms defines adaptation as a linearly constrained re-estimation of the HMM Gaussians. With little speech data, tight constraints must be introduced by reducing the number of linear transforms and by specifying certain transform structures (e.g., block diagonal). We hypothesize that under these adaptation conditions, the residual errors of the adapted Gaussian parameters can be represented and corrected by dependency models, as estimated from a training corpus. Thus, after introducing a particular class of linear transforms, we develop correlation models of the transform parameters. In rapid adaptation experiments on the Switchboard corpus, the proposed algorithm performs better than both transform-constrained adaptation and adaptation by correlation modeling of the HMM parameters.
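For intuition, the sketch below applies a single block-diagonal linear transform to HMM Gaussian means, the kind of tightly constrained transform the paper starts from; the correlation modeling of transform parameters is not shown.

```python
# A minimal sketch of transform-constrained adaptation, assuming one
# block-diagonal linear transform shared by all Gaussians (illustrative;
# the paper additionally models correlations among transform parameters).
import numpy as np
from scipy.linalg import block_diag

def adapt_means(means, A_blocks, b):
    """Apply a block-diagonal transform x -> A x + b to Gaussian means.

    means    : (n_gaussians, d) array of HMM Gaussian mean vectors.
    A_blocks : list of square matrices whose sizes sum to d.
    b        : (d,) bias vector.
    """
    A = block_diag(*A_blocks)   # tight structure: few free parameters
    return means @ A.T + b
```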
Signal Processing Systems | 2004
Li Deng; Ye-Yi Wang; Kuansan Wang; Alex Acero; Hsiao-Wuen Hon; James G. Droppo; Constantinos Boulis; Milind Mahajan; Xuedong David Huang
In this paper, we describe our recent work at Microsoft Research, in the project codenamed Dr. Who, aimed at the development of enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present in detail MiPad, the first Dr. Who application, which specifically addresses the mobile user interaction scenario. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel solution to the currently prevailing problem of pecking with tiny styluses or typing on minuscule keyboards in today's PDAs or smart phones. Despite the current incomplete implementation, we have observed in the user study reported in this paper that speech and pen have the potential to significantly improve the user experience. We describe in this system-oriented paper the main components of MiPad, with a focus on the robust speech processing and spoken language understanding aspects. The detailed MiPad components discussed include: distributed speech recognition considerations for the speech processing algorithm design; a stereo-based speech feature enhancement algorithm used for noise-robust front-end speech processing; Aurora2 evaluation results for this front-end processing; speech feature compression (source coding) and error protection (channel coding) for distributed speech recognition in MiPad; HMM-based acoustic modeling for continuous speech recognition decoding; a unified language model integrating a context-free grammar and an N-gram model for speech decoding; schema-based knowledge representation for MiPad's personal information management task; a unified statistical framework that integrates speech recognition, spoken language understanding, and dialogue management; the robust natural language parser used in MiPad to process the speech recognizer's output; machine-aided grammar learning and development used for spoken language understanding in the MiPad task; Tap & Talk multimodal interaction and user interface design; back-channel communication and MiPad's error repair strategy; and finally, user study results that demonstrate the superior throughput achieved by the Tap & Talk multimodal interaction over the existing pen-only PDA interface. These user study results highlight the crucial role played by speech in enhancing the overall user experience in MiPad-like human-computer interaction devices.
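Of the components listed, the unified language model is the easiest to caricature in code. The toy below scores a word sequence with an N-gram over words and class tokens, using a small lexicon as a stand-in for the context-free grammar; it is entirely illustrative and far simpler than MiPad's model.

```python
# A toy sketch of a unified language model in the spirit described above:
# an N-gram over words and class tokens, with class tokens expanded by a
# small lexicon standing in for the CFG. Entirely illustrative.
import math

def unified_lm_score(words, ngram_prob, classes):
    """Log-probability of a word sequence under the toy unified model.

    ngram_prob : callable (history_tuple, token) -> probability.
    classes    : dict mapping class token (e.g. "<date>") to a
                 {word: p(word | class)} lexicon.
    """
    logp, history = 0.0, ()
    for w in words:
        token, p_within = w, 1.0
        for cls, lexicon in classes.items():
            if w in lexicon:                 # word covered by a CFG class
                token, p_within = cls, lexicon[w]
                break
        logp += math.log(ngram_prob(history, token) * p_within)
        history = (history + (token,))[-2:]  # trigram-style history
    return logp
```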
North American Chapter of the Association for Computational Linguistics | 2004
Constantinos Boulis
When estimating a mixture of Gaussians, there are usually two choices for the covariance type of each Gaussian component: either diagonal or full covariance. Imposing a structure, though, may be restrictive and lead to degraded performance and/or increased computation. In this work, several criteria for estimating the structure of the regression matrices of a mixture of Gaussians are introduced and evaluated. Most of the criteria attempt to estimate a discriminative structure, which is suited to classification tasks. Results are reported on the 1996 NIST speaker recognition task, and performance is compared with structural EM, a well-known non-discriminative structure-finding algorithm.
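To make the structure-selection question concrete, the sketch below picks a sparse structure for a component's regression (Cholesky) matrix using a simple magnitude heuristic; this heuristic is a stand-in, not one of the paper's discriminative criteria.

```python
# A hedged sketch of structure selection between diagonal and full
# covariance: keep only the K strongest off-diagonal dependencies of a
# component's regression (Cholesky) matrix. The magnitude heuristic is a
# stand-in for the paper's discriminative criteria.
import numpy as np

def sparsify_regression_matrix(cov, k_offdiag):
    """Return a sparsity mask for the Cholesky factor of a covariance."""
    L = np.linalg.cholesky(cov)               # regression-matrix form
    mask = np.eye(L.shape[0], dtype=bool)     # always keep the diagonal
    off = np.abs(np.tril(L, k=-1))            # strict lower triangle
    # Indices of the k_offdiag largest off-diagonal magnitudes.
    flat = np.argsort(off, axis=None)[::-1][:k_offdiag]
    mask[np.unravel_index(flat, off.shape)] = True
    return mask                                # True = parameter retained
```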
International Conference on Acoustics, Speech and Signal Processing | 1999
Vassilios Digalakis; Heather Collier; Sid Berkowitz; Adrian Corduneanu; Enrico Bocchieri; Ashvin Kannan; Constantinos Boulis; Sanjeev Khudanpur; William Byrne; Ananth Sankar
Conference of the International Speech Communication Association | 2001
Eve A. Riskin; Constantinos Boulis; Scott Otterson; Mari Ostendorf
Archive | 2005
Mari Ostendorf; Constantinos Boulis