Choon Hui Teo
Yahoo!
Publication
Featured research published by Choon Hui Teo.
knowledge discovery and data mining | 2007
Choon Hui Teo; Alexander J. Smola; S. V. N. Vishwanathan; Quoc V. Le
A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different regularizers. Examples include linear Support Vector Machines (SVMs), Logistic Regression, Conditional Random Fields (CRFs), and Lasso amongst others. This paper describes the theory and implementation of a highly scalable and modular convex solver which solves all these estimation problems. It can be parallelized on a cluster of workstations, allows for data-locality, and can deal with regularizers such as l1 and l2 penalties. At present, our solver implements 20 different estimation problems, can be easily extended, scales to millions of observations, and is up to 10 times faster than specialized solvers for many applications. The open source code is freely available as part of the ELEFANT toolbox.
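The modular structure described in this abstract can be sketched as a regularized risk J(w) = (lam/2)·||w||^2 + (1/m)·sum of losses, where only the loss function changes between estimation problems. The following is a minimal illustration, not the solver from the paper: the function names (`hinge_loss`, `logistic_loss`, `regularized_risk`, `minimize`) are hypothetical, only an l2 regularizer is shown, and plain subgradient descent with step size 1/(lam·t) stands in for the actual convex solver.

```python
import math

def hinge_loss(margin):
    """Return (loss, d loss / d margin) for the hinge loss (linear SVM)."""
    return (1.0 - margin, -1.0) if margin < 1.0 else (0.0, 0.0)

def logistic_loss(margin):
    """Return (loss, d loss / d margin) for log(1 + exp(-margin)), computed stably."""
    if margin > 0:
        e = math.exp(-margin)
        return math.log1p(e), -e / (1.0 + e)
    e = math.exp(margin)
    return -margin + math.log1p(e), -1.0 / (1.0 + e)

def regularized_risk(w, data, loss, lam):
    """Return J(w) = lam/2 * ||w||^2 + mean loss, together with a (sub)gradient."""
    m = len(data)
    value = 0.5 * lam * sum(wi * wi for wi in w)
    grad = [lam * wi for wi in w]
    for x, y in data:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        l, dl = loss(margin)
        value += l / m
        for j, xj in enumerate(x):
            grad[j] += dl * y * xj / m
    return value, grad

def minimize(data, loss, lam=0.1, steps=200):
    """Assumed stand-in optimizer: subgradient descent with step 1 / (lam * t)."""
    w = [0.0] * len(data[0][0])
    for t in range(1, steps + 1):
        _, g = regularized_risk(w, data, loss, lam)
        eta = 1.0 / (lam * t)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Toy data (made up for illustration): two features, labels in {-1, +1}.
data = [([2.0, 1.0], 1), ([1.0, 2.0], 1), ([-2.0, -1.0], -1), ([-1.0, -2.0], -1)]
w_svm = minimize(data, hinge_loss)
w_logreg = minimize(data, logistic_loss)
```

Swapping `hinge_loss` for `logistic_loss` changes the estimation problem without touching the optimizer, which is the kind of modularity the abstract refers to.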
international world wide web conferences | 2011
Amr Ahmed; Qirong Ho; Jacob Eisenstein; Eric P. Xing; Alexander J. Smola; Choon Hui Teo
News clustering, categorization and analysis are key components of any news portal. They require algorithms capable of dealing with dynamic data to cluster, interpret and to temporally aggregate news articles. These three tasks are often solved separately. In this paper we present a unified framework to group incoming news articles into temporary but tightly-focused storylines, to identify prevalent topics and key entities within these stories, and to reveal the temporal structure of stories as they evolve. We achieve this by building a hybrid clustering and topic model. To deal with the available wealth of data we build an efficient parallel inference algorithm by sequential Monte Carlo estimation. Time and memory costs are nearly constant in the length of the history, and the approach scales to hundreds of thousands of documents. We demonstrate the efficiency and accuracy on the publicly available TDT dataset and data of a major internet news site.
international conference on machine learning | 2006
Choon Hui Teo; S. V. N. Vishwanathan
String kernels which compare the set of all common substrings between two given strings have recently been proposed by Vishwanathan & Smola (2004). Surprisingly, these kernels can be computed in linear time and linear space using annotated suffix trees. Even though, in theory, the suffix tree based algorithm requires O(n) space for a string of length n, in practice at least 40n bytes are required: 20n bytes for storing the suffix tree, and an additional 20n bytes for the annotation. This large memory requirement, coupled with the poor locality of memory access inherent in suffix trees, means that the performance of the suffix tree based algorithm deteriorates on large strings. In this paper, we describe a new linear time yet space efficient and scalable algorithm for computing string kernels, based on suffix arrays. Our algorithm a) is faster and easier to implement, b) requires only 19n bytes of storage on average, and c) exhibits strong locality of memory access. We show that our algorithm can be extended to perform linear time prediction on a test string, and present experiments to validate our claims.
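To make the objects in this abstract concrete, the sketch below builds a suffix array and uses it to evaluate an all-common-substrings kernel with unit weights, k(x, y) = sum over substrings s of count_s(x)·count_s(y). It is an illustration under simplifying assumptions, not the linear-time algorithm of the paper: the suffix array is built by naive sorting, each substring of `y` is looked up by binary search, and all function names are hypothetical.

```python
def suffix_array(s):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def _bound(s, sa, pattern, upper):
    """Binary search over suffixes truncated to len(pattern).

    Returns the first index whose prefix is >= pattern (upper=False)
    or > pattern (upper=True).
    """
    lo, hi, k = 0, len(sa), len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = s[sa[mid]:sa[mid] + k]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo

def count_occurrences(s, sa, pattern):
    """Number of positions in s at which pattern occurs."""
    return _bound(s, sa, pattern, True) - _bound(s, sa, pattern, False)

def substring_kernel(x, y):
    """Unit-weight kernel: sum over non-empty substrings s of count_s(x) * count_s(y)."""
    sa = suffix_array(x)
    total = 0
    for start in range(len(y)):
        for end in range(start + 1, len(y) + 1):
            c = count_occurrences(x, sa, y[start:end])
            if c == 0:
                break  # no longer substring starting here can occur in x
            total += c
    return total

sa = suffix_array("banana")           # [5, 3, 1, 0, 4, 2]
k_value = substring_kernel("abab", "ab")  # 6
```

The suffix array stores one integer per character of the string, which is the source of its compact memory footprint and sequential access pattern compared with a pointer-based suffix tree.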
web search and data mining | 2012
Amr Ahmed; Choon Hui Teo; S. V. N. Vishwanathan; Alexander J. Smola
Relevance, diversity and personalization are key issues when presenting content which is apt to pique a user's interest. This is particularly true when presenting an engaging set of news stories. In this paper we propose an efficient algorithm for selecting a small subset of relevant articles from a streaming news corpus. It offers three key improvements over past work: 1) It is based on a detailed model of a user's viewing behavior which does not require explicit feedback. 2) We use the notion of submodularity to estimate the propensity of interacting with content. This improves over the classical context-independent relevance ranking algorithms. Unlike existing methods, we learn the submodular function from the data. 3) We present an efficient online algorithm which can be adapted for personalization, story adaptation, and factorization models. Experiments show that our system yields a significant improvement over a retrieval system deployed in production.
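The role of submodularity in this abstract can be illustrated with greedy selection under a weighted topic-coverage function, a standard example of a submodular objective in which each additional article on an already covered topic adds no further value. This is a sketch under stated assumptions: the coverage function, the topic weights, and the article data are made up for illustration, whereas the paper learns its submodular function from user data.

```python
def coverage(selected, articles, topic_weight):
    """Weighted coverage: total weight of the topics touched by the selected articles."""
    covered = set()
    for name in selected:
        covered |= articles[name]
    return sum(topic_weight[t] for t in covered)

def greedy_select(articles, topic_weight, k):
    """Pick up to k articles, each time adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(min(k, len(articles))):
        base = coverage(chosen, articles, topic_weight)
        best, best_gain = None, 0.0
        for name in sorted(articles):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = coverage(chosen + [name], articles, topic_weight) - base
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing left adds value
            break
        chosen.append(best)
    return chosen

# Hypothetical articles and topic weights.
articles = {
    "a1": {"politics", "economy"},
    "a2": {"politics", "economy"},
    "a3": {"sports"},
    "a4": {"economy"},
}
topic_weight = {"politics": 3.0, "economy": 2.0, "sports": 1.0}
picked = greedy_select(articles, topic_weight, 2)  # ["a1", "a3"]
```

A context-independent ranker would place `a1` and `a2` first because both score highest in isolation; the marginal-gain rule skips `a2` once `a1` is chosen, which is how diversity enters the selection.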
web search and data mining | 2011
Choon Hui Teo; Suju Rajan; Kunal Punera; Byron Dom; Alexander J. Smola; Yi Chang; Zhaohui Zheng
In this paper, we present a system for clustering the search results of a news search engine. The news search interface presents the news articles relevant to a given query, organized into related news stories. Each cluster corresponds to a news story, and the news articles are clustered into stories. The system clusters the search results in a fast and scalable manner and is organized into three components: offline clustering, incremental clustering and realtime clustering. We propose novel techniques for clustering the search results in realtime. Experimental results on large collections of news documents show that our system is scalable and achieves good accuracy in clustering the news search results.
pacific rim international conference on artificial intelligence | 2006
Choon Hui Teo; Yong Haur Tay
Invariant object recognition (IOR) has been one of the most active research areas in computer vision. However, no technique achieves the best performance in all possible domains. Among the many techniques, the convolutional network (CN) has proven to be a good candidate in this area. Given large numbers of training samples of objects under varying conditions such as lighting, pose, background, etc., a convolutional network can learn to extract invariant features by itself. This comes at the price of lengthy training time. Hence, we propose a circular pairwise classification technique to shorten the training time. We compare the recognition accuracy and training time complexity of our approach against LeNet7, a benchmark generic object recognizer that is a monolithic convolutional network.
neural information processing systems | 2007
Arthur Gretton; Kenji Fukumizu; Choon Hui Teo; Le Song; Bernhard Schölkopf; Alexander J. Smola
Journal of Machine Learning Research | 2010
Choon Hui Teo; S. V. N. Vishwanathan; Alexander J. Smola; Quoc V. Le
neural information processing systems | 2007
Choon Hui Teo; Amir Globerson; Sam T. Roweis; Alexander J. Smola
conference on email and anti spam | 2009
Aleksander Kolcz; Choon Hui Teo