Publication


Featured research published by Kuansan Wang.


IEEE Transactions on Information Theory | 1992

Auditory representations of acoustic signals

Xiaowei Yang; Kuansan Wang; Shihab A. Shamma

An analytically tractable framework is presented to describe mechanical and neural processing in the early stages of the auditory system. Algorithms are developed to assess the integrity of the acoustic spectrum at all processing stages. The algorithms employ wavelet representations, multiresolution processing, and the method of convex projections to construct a close replica of the input stimulus. Reconstructions using natural speech sounds demonstrate minimal loss of information along the auditory pathway. Close inspection of the final auditory patterns reveals spectral enhancements and noise suppression that have close perceptual correlates. The functional significance of the various auditory processing stages is discussed in light of the model, together with their potential applications in automatic speech recognition and low bit-rate data compression.
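
The reconstruction method named in the abstract, projection onto convex sets (POCS), alternates projections onto constraint sets until a signal consistent with all of them is found. Below is a minimal sketch of that general idea; the toy signal model (band-limited and partially observed) is an illustrative assumption, not the paper's auditory constraints.

```python
import numpy as np

# Minimal sketch of reconstruction by alternating projections onto convex
# sets (POCS). The constraints here (band-limited, partially observed)
# are a toy stand-in for the paper's auditory-stage constraints.

rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
x_true = np.sin(2 * np.pi * 3 * t / n) + 0.5 * np.sin(2 * np.pi * 7 * t / n)

observed = rng.random(n) < 0.3           # keep ~30% of the samples
cutoff = 10                              # known bandwidth (in DFT bins)

def project_bandlimited(x):
    """Project onto the convex set of signals band-limited to `cutoff` bins."""
    X = np.fft.fft(x)
    X[cutoff + 1 : n - cutoff] = 0.0
    return np.fft.ifft(X).real

def project_data(x):
    """Project onto the convex set of signals matching the observed samples."""
    y = x.copy()
    y[observed] = x_true[observed]
    return y

x = np.zeros(n)
for _ in range(200):                     # alternate the two projections
    x = project_bandlimited(project_data(x))

print("reconstruction error:", np.max(np.abs(x - x_true)))
```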


IEEE Transactions on Speech and Audio Processing | 1995

Spectral shape analysis in the central auditory system

Kuansan Wang; Shihab A. Shamma

A model of spectral shape analysis in the central auditory system is developed based on neurophysiological mappings in the primary auditory cortex and on results from psychoacoustical experiments in human subjects. The model suggests that the auditory system analyzes an input spectral pattern along three independent dimensions: a logarithmic frequency axis, a local symmetry axis, and a local spectral bandwidth axis. It is shown that this representation is equivalent to performing an affine wavelet transform of the spectral pattern and preserving both the magnitude (a measure of the scale or local bandwidth of the spectrum) and phase (a measure of the local symmetry of the spectrum). Such an analysis is in the spirit of the cepstral analysis commonly used in speech recognition systems, the major difference being that the double Fourier-like transformation that the auditory system employs is carried out in a local fashion. Examples of such a representation for various speech and synthetic signals are discussed, together with its potential significance and applications for speech and audio processing.
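
As a rough illustration of the multiscale analysis described above, the sketch below convolves a spectral profile with complex Gabor-like kernels at several scales and reads off magnitude (local bandwidth) and phase (local symmetry). The kernel shapes and scale values are assumptions made for illustration, not the paper's cortical filters.

```python
import numpy as np

# Rough sketch of multiscale spectral-shape analysis: a spectral profile
# (along a log-frequency axis) is convolved with complex kernels at several
# scales; magnitude codes local bandwidth, phase codes local symmetry.

n_chan = 128                                              # log-frequency channels
x = np.exp(-0.5 * ((np.arange(n_chan) - 50) / 6.0) ** 2)  # a spectral peak

def complex_wavelet(scale, width=4.0):
    """Gabor-like analytic kernel: Gaussian envelope times complex carrier."""
    t = np.arange(-n_chan // 2, n_chan // 2)
    env = np.exp(-0.5 * (t / (width * scale)) ** 2)
    return env * np.exp(1j * t / scale)

for scale in (2.0, 4.0, 8.0):                             # illustrative scales
    resp = np.convolve(x, complex_wavelet(scale), mode="same")
    k = np.argmax(np.abs(resp))
    print(f"scale {scale}: peak |resp| at channel {k}, "
          f"local symmetry (phase) = {np.angle(resp[k]):+.2f} rad")
```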


IEEE Transactions on Speech and Audio Processing | 1994

Self-normalization and noise-robustness in early auditory representations

Kuansan Wang; Shihab A. Shamma

A common sequence of operations in the early stages of most sensory systems is a multiscale transform followed by a compressive nonlinearity. The authors explore the contribution of these operations to the formation of robust and perceptually significant representations in the early auditory system. It is shown that the auditory representation of the acoustic spectrum is effectively a self-normalized spectral analysis, i.e., the auditory system computes a spectrum divided by a smoothed version of itself. Such a self-normalization induces significant effects such as spectral shape enhancement and robustness against scaling and noise corruption. Examples using synthesized signals and a natural speech vowel are presented to illustrate these results. Furthermore, the characteristics of the auditory representation are discussed in the context of several psychoacoustical findings, together with the possible benefits of this model for various engineering applications.
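
The abstract's central claim fits in a few lines of code: a spectrum divided by a smoothed copy of itself. A toy version follows, with the smoothing window chosen arbitrarily for illustration; the check at the end demonstrates the robustness-to-scaling property the abstract describes.

```python
import numpy as np

def self_normalize(spectrum, smooth_width=9):
    """Divide a spectrum by a moving-average-smoothed copy of itself."""
    kernel = np.ones(smooth_width) / smooth_width
    smoothed = np.convolve(spectrum, kernel, mode="same")
    return spectrum / (smoothed + 1e-12)           # guard against divide-by-zero

freqs = np.linspace(0, 1, 256)
spec = 1.0 + 5.0 * np.exp(-0.5 * ((freqs - 0.3) / 0.01) ** 2)   # sharp peak

# Scaling the input leaves the self-normalized output unchanged.
a = self_normalize(spec)
b = self_normalize(10.0 * spec)
print("scale invariance:", np.allclose(a, b))
```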


International World Wide Web Conference | 2015

An Overview of Microsoft Academic Service (MAS) and Applications

Arnab Sinha; Zhihong Shen; Yang Song; Hao Ma; Darrin Eide; Bo-June Paul Hsu; Kuansan Wang

In this paper we describe a new release of the Web-scale entity graph that serves as the backbone of Microsoft Academic Service (MAS), a major production effort that broadens the scope of the namesake vertical search engine, publicly available since 2008 as a research prototype. At the core of MAS is a heterogeneous entity graph comprising six types of entities that model scholarly activities: field of study, author, institution, paper, venue, and event. In addition to obtaining these entities from publisher feeds as in the previous effort, this version incorporates data-mining results from the Web index and an in-house knowledge base from Bing, a major commercial search engine. As a result of the Bing integration, the new MAS graph sees a significant increase in size, with fresh information streaming in automatically as the search engine discovers it. In addition, the rich entity relations included in the knowledge base provide additional signals to disambiguate and enrich the entities within and beyond the academic domain. The number of papers indexed by MAS, for instance, has grown from the low tens of millions to 83 million while maintaining above 95% accuracy on test data sets derived from academic activities at Microsoft Research. Using this data set, we demonstrate two scenarios in this work: a knowledge-driven, highly interactive dialog that seamlessly combines reactive search with proactive suggestions, and proactive heterogeneous entity recommendation.
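
To make the data model concrete, here is a minimal sketch of a heterogeneous entity graph with the six entity types the abstract lists. The schema, class names, and relation labels are invented for illustration; MAS's actual data model is not reproduced here.

```python
from dataclasses import dataclass, field

# Illustrative heterogeneous entity graph: typed nodes, labeled edges.
ENTITY_TYPES = {"field_of_study", "author", "institution", "paper", "venue", "event"}

@dataclass
class Entity:
    id: str
    type: str
    name: str

@dataclass
class Graph:
    entities: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)     # (src_id, relation, dst_id)

    def add(self, ent: Entity):
        assert ent.type in ENTITY_TYPES           # enforce the six-type schema
        self.entities[ent.id] = ent

    def link(self, src, relation, dst):
        self.edges.append((src, relation, dst))

g = Graph()
g.add(Entity("a1", "author", "Kuansan Wang"))
g.add(Entity("p1", "paper", "An Overview of Microsoft Academic Service (MAS)"))
g.add(Entity("v1", "venue", "WWW"))
g.link("a1", "wrote", "p1")
g.link("p1", "published_in", "v1")
print(len(g.entities), "entities,", len(g.edges), "edges")
```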


International World Wide Web Conference | 2010

Exploring web scale language models for search query processing

Jian Huang; Jianfeng Gao; Jiangbo Miao; Xiaolong Li; Kuansan Wang; Fritz Behr; C. Lee Giles

It has been widely observed that search queries are composed in a very different style from that of the body or the title of a document. Many techniques explicitly accounting for this language-style discrepancy have shown promising results for information retrieval, yet a large-scale analysis of the extent of the language differences has been lacking. In this paper, we present an extensive study of this issue by examining the language model properties of search queries and the three text streams associated with each web document: the body, the title, and the anchor text. Our information-theoretic analysis shows that queries are composed in a way most similar to how authors summarize documents in anchor texts or titles, offering a quantitative explanation for the observations in past work. We apply these web-scale n-gram language models to three search query processing (SQP) tasks: query spelling correction, query bracketing, and long query segmentation. By controlling the size and the order of different language models, we find the perplexity metric to be a good accuracy indicator for these query processing tasks. We show that using smoothed language models yields significant accuracy gains, for instance for query bracketing, compared to using web counts as in the literature. We also demonstrate that applying web-scale language models can have a marked accuracy advantage over smaller ones.
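
A toy bigram language model makes the perplexity metric concrete: lower perplexity on held-out queries means the model matches the query language style better. The corpus and add-one smoothing below are illustrative choices only, far simpler than the smoothed web-scale models the paper uses.

```python
import math
from collections import Counter

# Train a tiny add-one-smoothed bigram LM and score queries by perplexity.
corpus = ["cheap flights to new york", "new york hotels", "flights to boston"]
tokens = [["<s>"] + s.split() + ["</s>"] for s in corpus]
unigrams = Counter(w for s in tokens for w in s)
bigrams = Counter((s[i], s[i + 1]) for s in tokens for i in range(len(s) - 1))
V = len(unigrams)

def bigram_prob(w1, w2):
    # add-one (Laplace) smoothing so unseen bigrams get non-zero mass
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def perplexity(query):
    s = ["<s>"] + query.split() + ["</s>"]
    log_p = sum(math.log(bigram_prob(s[i], s[i + 1])) for i in range(len(s) - 1))
    return math.exp(-log_p / (len(s) - 1))

print(perplexity("flights to new york"))   # in-domain: lower perplexity
print(perplexity("quantum field theory"))  # out-of-domain: higher perplexity
```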


International World Wide Web Conference | 2013

Exploring and exploiting user search behavior on mobile and tablet devices to improve search relevance

Yang Song; Hao Ma; Hongning Wang; Kuansan Wang

In this paper, we present a log-based study comparing user search behavior on three different platforms: desktop, mobile, and tablet. We use three months of search logs from 2012 from a commercial search engine for our study. Our objective is to better understand how, and to what extent, mobile and tablet searchers behave differently from desktop users. Our study spans a variety of aspects including query categorization, query length, search time distribution, search location distribution, and user click patterns. Our data set reveals significant differences among user search patterns on these three platforms, and therefore using the same ranking system for all of them is not optimal. Consequently, we propose a framework that leverages a set of domain-specific features, along with training data from desktop search, to further improve search relevance for the mobile and tablet platforms. Experimental results demonstrate that by transferring knowledge from desktop search, search relevance on mobile and tablet can be greatly improved.
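
As a stand-in for the paper's transfer framework (not its actual method), the sketch below shows one common way to combine plentiful desktop training data with scarce mobile data: feature augmentation, where each example gets a shared feature block plus a domain-specific block, so a single linear ranker can learn what generalizes across platforms and what is platform-specific.

```python
import numpy as np

def augment(X, domain, n_domains=2):
    """Map features x -> [shared copy | domain-0 block | domain-1 block]."""
    n, d = X.shape
    out = np.zeros((n, d * (1 + n_domains)))
    out[:, :d] = X                                   # shared copy
    out[:, d * (1 + domain) : d * (2 + domain)] = X  # domain-specific copy
    return out

rng = np.random.default_rng(1)
X_desktop = rng.normal(size=(1000, 5))   # plentiful desktop examples
X_mobile = rng.normal(size=(50, 5))      # scarce mobile examples

X_train = np.vstack([augment(X_desktop, domain=0), augment(X_mobile, domain=1)])
print(X_train.shape)   # (1050, 15): ready for any linear ranker/classifier
```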


IEEE Signal Processing Letters | 1995

Lip synchronization using speech-assisted video processing

Tsuhan Chen; Hans Peter Graf; Kuansan Wang

We utilize speech information to improve the quality of audio-visual communications such as videotelephony and videoconferencing. In particular, the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion. Demonstration sequences are presented. Other applications, including speech-assisted video coding, are outlined.


IEEE Transactions on Speech and Audio Processing | 2002

Distributed speech processing in miPad's multimodal user interface

Li Deng; Kuansan Wang; Alex Acero; Hsiao-Wuen Hon; Jasha Droppo; Constantinos Boulis; Ye-Yi Wang; Derek Jacoby; Milind Mahajan; Ciprian Chelba; Xuedong Huang

This paper describes the main components of MiPad (multimodal interactive PAD) and especially its distributed speech processing aspects. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel solution for data entry in PDAs or smart phones, often done by pecking with tiny styluses or typing on minuscule keyboards. Our user study indicates that the throughput of MiPad is significantly superior to that of the existing pen-based PDA interface. Acoustic modeling and noise robustness in distributed speech recognition are key components of MiPad's design and implementation. In a typical scenario, the user speaks to the device at a distance so that he or she can see the screen. The built-in microphone thus picks up considerable background noise, which requires MiPad to be noise robust. For complex tasks, such as dictating e-mails, resource limitations demand the use of a client-server (peer-to-peer) architecture, where the PDA performs primitive feature extraction, feature quantization, and error protection, while the transmitted features are subject to further speech feature enhancement, speech decoding, and understanding at the server before a dialog is carried out and actions rendered. Noise robustness can be achieved at the client, at the server, or both. Various speech processing aspects of this type of distributed computation, as related to MiPad's potential deployment, are presented. Previous user interface study results are also described. Finally, we point out future research directions related to several key MiPad functionalities.
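
The client side of such a distributed pipeline can be illustrated in a few lines: extract compact spectral features on the device, quantize them for the uplink, and let the server dequantize, enhance, and decode. The frame size, feature type, and bit depth below are assumptions for illustration, not MiPad's actual front end.

```python
import numpy as np

def frame_log_energies(signal, frame_len=256, n_bands=16):
    """Crude per-frame log band energies (stand-in for MFCC-style features)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = spectra[:, : n_bands * 8].reshape(n_frames, n_bands, 8).sum(axis=2)
    return np.log(bands + 1e-10)

def quantize(features, bits=8):
    """Uniform scalar quantization to `bits` per coefficient for transmission."""
    lo, hi = features.min(), features.max()
    levels = 2 ** bits - 1
    q = np.round((features - lo) / (hi - lo) * levels).astype(np.uint8)
    return q, (lo, hi)   # the range must be sent along for dequantization

audio = np.random.default_rng(2).normal(size=16000)   # 1 s of fake audio
feats = frame_log_energies(audio)
q, value_range = quantize(feats)
print(feats.shape, "->", q.nbytes, "bytes on the wire")
```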


Proceedings of the IEEE | 1996

Time-frequency analysis and auditory modeling for automatic recognition of speech

James W. Pitton; Kuansan Wang; Biing-hwang Juang

Modern speech processing research may be categorized into three broad areas: statistical, physiological, and perceptual. Statistical research investigates the nature of the variability of the speech waveform from a signal processing viewpoint. This approach concerns processing speech to obtain measurements of speech characteristics that exhibit manageable variability across a wide range of the talker population, in the presence of noise or competing speakers, under the interaction of speech with the channel through which it is transmitted, and under the inherent interaction of the information content of speech with itself (i.e., the contextual factor). Physiological research aims at constructing accurate models of the articulatory and auditory processes, helping to limit the signal space for speech processing. In the perceptual realm, work focuses on understanding the psychoacoustic and possibly the psycholinguistic aspects of the speech communication process that humans so conveniently conduct. By studying this working analysis/recognition system, insights may be garnered that lead to improved methods of speech processing. Conversely, by studying the limitations of this system, particularly how it reduces the information rate of the received signal through, for example, masking and adaptation, improvements may be made in the efficiency of speech coding schemes without impacting the quality of the reconstructed speech. Thus comprehension of speech production and perception informs methods of speech processing, and vice versa. This paper enunciates such a position, focusing on how modern time-frequency signal analysis methods could help expedite needed advances in these areas.
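
The baseline time-frequency analysis such discussions build on is the short-time Fourier transform; a minimal version follows, with the window length and hop chosen arbitrarily. The chirp example shows what time-frequency analysis buys: a single global Fourier transform cannot localize a rising frequency, but the frame-by-frame spectrum tracks it directly.

```python
import numpy as np

def stft(x, win_len=256, hop=128):
    """Short-time Fourier transform: windowed frames, FFT per frame."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)      # shape: frames x frequency bins

# A chirp whose frequency rises over time.
t = np.linspace(0, 1, 8000)
x = np.sin(2 * np.pi * (100 + 400 * t) * t)
S = np.abs(stft(x))
print(S.shape, "- peak bin drifts upward frame by frame:", S.argmax(axis=1)[::10])
```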


International Conference on Acoustics, Speech, and Signal Processing | 2001

MiPad: a multimodal interaction prototype

Xuedong Huang; Alex Acero; Ciprian Chelba; Li Deng; Jasha Droppo; Doug Duchene; Joshua T. Goodman; Hsiao-Wuen Hon; Derek Jacoby; Li Jiang; Ricky Loynd; Milind Mahajan; Peter Mau; Scott Meredith; Salman Mughal; Salvado Neto; Mike Plumpe; Kuansan Steury; Gina Venolia; Kuansan Wang; Ye-Yi Wang

Dr. Who is a Microsoft research project aiming at creating a speech-centric multimodal interaction framework, which serves as the foundation for the .NET natural user interface. MiPad is the application prototype that demonstrates compelling user advantages for wireless personal digital assistant (PDA) devices. MiPad fully integrates continuous speech recognition (CSR) and spoken language understanding (SLU) to enable users to accomplish many common tasks using a multimodal interface and wireless technologies. It tries to solve the problem of pecking with tiny styluses or typing on minuscule keyboards in today's PDAs. Unlike a cellular phone, MiPad avoids speech-only interaction. It incorporates a built-in microphone that activates whenever a field is selected. As a user taps the screen or uses a built-in roller to navigate, the tapping action narrows the number of possible instructions for spoken word understanding. MiPad currently runs on a Windows CE Pocket PC paired with a Windows 2000 machine where speech recognition is performed. The Dr. Who CSR engine uses a unified CFG and n-gram language model. The Dr. Who SLU engine is based on a robust chart parser and a plan-based dialog manager. The paper discusses MiPad's design, implementation work in progress, and a preliminary user study in comparison to the existing pen-based PDA interface.
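
The "tap-and-talk" idea above can be rendered as a toy: tapping a form field selects a field-specific grammar, shrinking the recognizer's search space. The field names, phrase lists, and scoring rule below are invented for illustration and are not MiPad's actual grammars or decoder.

```python
# Toy field-conditioned recognition: the tapped field constrains hypotheses.
GRAMMARS = {
    "to":      ["alex acero", "li deng", "ye-yi wang"],   # contact names
    "subject": None,                                      # open dictation
    "date":    ["today", "tomorrow", "next monday"],
}

def recognize(utterance, active_field):
    """Pick the best hypothesis, restricted to the tapped field's grammar."""
    allowed = GRAMMARS[active_field]
    if allowed is None:                  # fall back to unconstrained decoding
        return utterance
    # stand-in scorer: prefer the allowed phrase sharing the most words
    score = lambda p: len(set(p.split()) & set(utterance.split()))
    return max(allowed, key=score)

print(recognize("uh li deng please", "to"))   # -> "li deng"
```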
