Xiaoyong Chai

University of Wisconsin-Madison

Publication

Featured research published by Xiaoyong Chai.

IEEE Transactions on Knowledge and Data Engineering | 2006

Power-efficient access-point selection for indoor location estimation

Yiqiang Chen; Qiang Yang; Jie Yin; Xiaoyong Chai

An important goal of indoor location estimation systems is to increase estimation accuracy while reducing power consumption. In this paper, we present a novel algorithm, CaDet (a clustering and decision-tree-based method), for power-efficient location estimation that intelligently selects the number of access points (APs) used for location estimation. We show that by employing machine learning techniques, CaDet can use a small subset of the APs in the environment to detect a client's location with high accuracy. CaDet combines information theory, clustering analysis, and a decision tree algorithm. By collecting data and testing our algorithms in a realistic WLAN environment in the computer science department area of the Hong Kong University of Science and Technology, we show that CaDet can achieve much higher accuracy than other methods. We also show through experiments that, by intelligently selecting APs, we can save power on the client device while achieving the same level of accuracy.
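
One ingredient named above, using information theory to pick a small, discriminative subset of APs, can be illustrated with a minimal Python sketch. It ranks APs by information gain (the reduction in location entropy obtained by splitting training samples on an AP's discretized signal-strength reading) and keeps the top k. This is an illustration under stated assumptions, not CaDet itself: the discretized readings, the function names (entropy, information_gain, select_aps), and the top-k rule are hypothetical, and CaDet's clustering and decision-tree stages are omitted.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of location labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, locations, ap):
    """Reduction in location entropy after splitting on AP `ap`'s discretized reading.

    samples: list of tuples, one discretized RSS level per AP (hypothetical encoding)
    locations: list of location labels aligned with samples
    """
    groups = defaultdict(list)
    for reading, loc in zip(samples, locations):
        groups[reading[ap]].append(loc)
    n = len(locations)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(locations) - conditional


def select_aps(samples, locations, k):
    """Return indices of the k APs with the highest information gain (ties by index)."""
    num_aps = len(samples[0])
    ranked = sorted(
        range(num_aps),
        key=lambda ap: (-information_gain(samples, locations, ap), ap),
    )
    return ranked[:k]
```

For example, with training readings [(2, 1), (2, 1), (0, 1), (0, 1)] at locations A, A, B, B, AP 0 separates the two locations perfectly (a gain of 1 bit) while AP 1 carries no information, so select_aps keeps only AP 0 when k = 1.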

IEEE Transactions on Mobile Computing | 2007

Reducing the Calibration Effort for Probabilistic Indoor Location Estimation

Xiaoyong Chai; Qiang Yang

WLAN location estimation based on 802.11 signal strength is becoming increasingly prevalent in today's pervasive computing applications. Among the well-established location determination approaches, probabilistic techniques show good performance and have thus become increasingly popular. For these techniques to achieve a high level of accuracy, however, a large number of training samples is usually required for calibration, which incurs a great deal of offline manual effort. In this paper, we aim to solve this problem by reducing both the sampling time and the number of locations sampled in constructing a radio map. We propose a novel learning algorithm that builds location-estimation systems from a small fraction of the calibration data that traditional techniques require, together with a collection of user traces that can be cheaply obtained. To handle a reduced number of sampled locations, we develop an interpolation method that effectively patches the radio map. Extensive experiments show that our proposed methods are effective in reducing the calibration effort. In particular, unlabeled user traces can be used to compensate for the effects of reducing the calibration effort and can even improve system performance. Consequently, manual effort can be reduced substantially while a high level of accuracy is still achieved.
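
The abstract does not say which interpolation method patches the radio map. As a stand-in only, the sketch below uses inverse-distance weighting (IDW), a common generic interpolator, to estimate the mean RSS vector at an unsampled location from sampled ones. The radio-map layout (a dict from (x, y) coordinates to per-AP mean RSS in dBm), the function name idw_patch, and the power parameter are assumptions, not the paper's method.

```python
import math


def idw_patch(radio_map, query_xy, power=2.0):
    """Estimate the mean RSS vector at an unsampled point by inverse-distance weighting.

    radio_map: dict mapping (x, y) of a sampled location -> list of mean RSS (dBm), one per AP
    query_xy: (x, y) of the location to patch
    power: IDW exponent; larger values favor nearer sampled locations
    """
    weighted = []
    for (x, y), rss in radio_map.items():
        d = math.hypot(x - query_xy[0], y - query_xy[1])
        if d == 0:
            return list(rss)  # query coincides with a sampled location
        weighted.append((1.0 / d ** power, rss))
    total = sum(w for w, _ in weighted)
    num_aps = len(next(iter(radio_map.values())))
    return [sum(w * rss[i] for w, rss in weighted) / total for i in range(num_aps)]
```

IDW is chosen here only because it is simple and needs no fitting; any interpolator that maps sampled fingerprints to unsampled grid points would fit the same slot.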

International Conference on Data Mining | 2004

Test-cost sensitive naive Bayes classification

Xiaoyong Chai; Lin Deng; Qiang Yang; Charles X. Ling

Inductive learning techniques such as the naive Bayes and decision tree algorithms have been extended in the past to handle different types of costs, mainly by distinguishing the costs of different classification errors. However, an equally important issue is how to handle the test costs associated with querying the missing values in a test case. When the value of an attribute is missing in a test case, it may or may not be worthwhile to make the effort to obtain that value, depending on the potential gain in classification accuracy it would bring. In this paper, we show how to obtain a test-cost sensitive naive Bayes classifier (csNB) by including a test strategy that determines which unknown attributes are selected for testing in order to minimize the sum of the misclassification costs and test costs. We propose and evaluate several potential test strategies, including one that allows several tests to be performed at once. We empirically evaluate the csNB method and show that it compares favorably with its decision tree counterpart.
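
The test-or-not decision described above can be sketched as a one-step lookahead: compare the expected misclassification cost of classifying now with the test cost plus the expected misclassification cost after observing the attribute, using naive Bayes posteriors. This is a minimal illustration of a sequential, one-attribute-at-a-time strategy under assumed data structures (prior, likelihoods[attr][value][cls], cost[pred][true]); the function names are hypothetical, and it is not claimed to be the paper's exact csNB strategy.

```python
def nb_posterior(prior, likelihoods, evidence):
    """Naive Bayes posterior P(class | evidence).

    prior: class -> P(class)
    likelihoods: attr -> value -> class -> P(attr = value | class)
    evidence: attr -> observed value (known attributes only)
    """
    scores = {}
    for cls, p in prior.items():
        for attr, val in evidence.items():
            p *= likelihoods[attr][val][cls]
        scores[cls] = p
    z = sum(scores.values())
    return {cls: s / z for cls, s in scores.items()}


def expected_misclass_cost(posterior, cost):
    """Minimum expected misclassification cost over possible predictions.

    cost[pred][true] is the cost of predicting `pred` when the true class is `true`.
    """
    return min(sum(posterior[c] * cost[pred][c] for c in posterior) for pred in cost)


def worth_testing(prior, likelihoods, evidence, attr, test_cost, cost):
    """True if testing `attr` now lowers expected total (test + misclassification) cost."""
    post = nb_posterior(prior, likelihoods, evidence)
    classify_now = expected_misclass_cost(post, cost)
    after_test = test_cost
    for val in likelihoods[attr]:
        # P(attr = val | evidence) = sum over classes of P(val | c) * P(c | evidence)
        p_val = sum(likelihoods[attr][val][c] * post[c] for c in post)
        if p_val > 0:
            new_post = nb_posterior(prior, likelihoods, {**evidence, attr: val})
            after_test += p_val * expected_misclass_cost(new_post, cost)
    return after_test < classify_now
```

With a uniform prior, a symmetric misclassification cost of 10, and an attribute that is 90% predictive of the class, classifying now costs 5 in expectation, while testing first costs the test cost plus 1. Testing is therefore worthwhile at a test cost of 2 but not at 5.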

International Conference on Management of Data | 2009

Combining keyword search and forms for ad hoc querying of databases

Eric Chu; Akanksha Baid; Xiaoyong Chai; AnHai Doan; Jeffrey F. Naughton

A common criticism of database systems is that they are hard to query for users uncomfortable with a formal query language. To address this problem, form-based interfaces and keyword search have been proposed; while both have benefits, both also have limitations. In this paper, we investigate combining the two in the hope of creating an approach that provides the best of both. Specifically, we propose to take as input a target database and then generate and index a set of query forms offline. At query time, a user with a question to be answered issues standard keyword search queries; but instead of returning tuples, the system returns forms relevant to the question. The user may then build a structured query with one of these forms and submit it back to the system for evaluation. In this paper, we address challenges that arise in form generation, keyword search over forms, and ranking and displaying these forms. We explore techniques to tackle these challenges and present experimental results suggesting that the approach of combining keyword search and form-based interfaces is promising.
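
The keyword-search-over-forms step can be illustrated with a deliberately simple ranking rule: score each precomputed form by how many query keywords appear among the schema terms it covers. The QueryForm structure, the overlap score, and rank_forms are hypothetical assumptions for illustration; the paper's actual form generation, indexing, and ranking techniques are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryForm:
    """Hypothetical precomputed query form: a name plus the schema terms it covers."""
    name: str
    terms: frozenset  # lowercased table/attribute names the form can query


def rank_forms(forms, keyword_query, top_k=3):
    """Return names of up to top_k forms, ordered by number of matched keywords."""
    keywords = set(keyword_query.lower().split())
    scored = [(len(keywords & form.terms), form.name) for form in forms]
    # Keep forms matching at least one keyword; higher score first, then name.
    hits = sorted((-score, name) for score, name in scored if score > 0)
    return [name for _, name in hits[:top_k]]
```

For the query "Movie actor", a form covering both movie and actor ranks above one covering only movie, and a form over customer orders is not returned at all.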

International Conference on Management of Data | 2009

Information extraction challenges in managing unstructured data

AnHai Doan; Jeffrey F. Naughton; Raghu Ramakrishnan; Akanksha Baid; Xiaoyong Chai; Fei Chen; Ting Chen; Eric Chu; Pedro DeRose; Byron J. Gao; Chaitanya Gokhale; Jiansheng Huang; Warren Shen; Ba-Quy Vuong

Over the past few years, we have been trying to build an end-to-end system at Wisconsin to manage unstructured data, using extraction, integration, and user interaction. This paper describes the key information extraction (IE) challenges that we have run into, and sketches our solutions. We discuss in particular developing a declarative IE language, optimizing for this language, generating IE provenance, incorporating user feedback into the IE process, developing a novel wiki-based user interface for feedback, best-effort IE, pushing IE into RDBMSs, and more. Our work suggests that IE in managing unstructured data can open up many interesting research challenges, and that these challenges can greatly benefit from the wealth of work on managing structured data that has been carried out by the database community.

Database Systems for Advanced Applications | 2004

Applying co-training to clickthrough data for search engine adaptation

Qingzhao Tan; Xiaoyong Chai; Wilfred Ng; Dik Lun Lee

The information on the World Wide Web is growing without bound. Users may have widely diverse preferences in the pages they target through a search engine. It is therefore a challenging task to adapt a search engine to suit the needs of a particular community of users who share similar interests. In this paper, we propose a new algorithm, Ranking SVM in a Co-training Framework (RSCF). Essentially, the RSCF algorithm takes as input the clickthrough data, which records the items in the search result that a user has clicked on, and generates adaptive rankers as output. By analyzing the clickthrough data, RSCF first divides the data into a labelled data set, which contains the items that have already been scanned, and an unlabelled data set, which contains the items that have not yet been scanned. The labelled data is then augmented with unlabelled data to obtain a larger data set for training the rankers. We demonstrate that the RSCF algorithm produces better ranking results than the standard Ranking SVM algorithm. Based on RSCF, we develop a metasearch engine that comprises MSNSearch, Wisenut, and Overture, and carry out an online experiment showing that our metasearch engine outperforms Google.
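
The first RSCF step, splitting clickthrough data into scanned (labelled) and unscanned (unlabelled) items, can be sketched under two common assumptions that the abstract does not state: the user scans results top-down up to the last clicked item, and a clicked item is preferred over every unclicked item ranked above it (a Joachims-style heuristic). The function name and return layout are hypothetical, and the co-training and Ranking SVM stages are omitted.

```python
def split_clickthrough(ranked_items, clicked):
    """Split one result list into scanned and unscanned items plus preference pairs.

    Assumes the user scanned every result down to the last clicked rank.
    Returns (scanned, unscanned, prefs) where each pair (a, b) means "a preferred over b".
    """
    clicked_ranks = [i for i, item in enumerate(ranked_items) if item in clicked]
    if not clicked_ranks:
        return [], list(ranked_items), []  # no clicks: nothing is known to be scanned
    last = max(clicked_ranks)
    scanned = ranked_items[: last + 1]
    unscanned = ranked_items[last + 1 :]
    # A clicked item beats every unclicked item that was ranked above it.
    prefs = [
        (item, skipped)
        for i, item in enumerate(scanned)
        if item in clicked
        for skipped in scanned[:i]
        if skipped not in clicked
    ]
    return scanned, unscanned, prefs
```

For a result list d1 to d5 where only d3 was clicked, d1 to d3 are treated as scanned, d4 and d5 as unscanned, and the preference pairs are (d3, d1) and (d3, d2).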

IEEE International Conference on Pervasive Computing and Communications | 2005

Reducing the Calibration Effort for Location Estimation Using Unlabeled Samples

Xiaoyong Chai; Qiang Yang

WLAN location estimation based on 802.11 signal strength is becoming increasingly prevalent in today's pervasive computing applications. As an alternative to the well-established deterministic approaches, probabilistic location determination techniques show good performance and have thus become increasingly popular. For these techniques to achieve a high level of accuracy, however, adequate training samples must be collected offline for calibration, which incurs a great deal of manual effort. In this paper, we aim to solve this problem by reducing both the sampling time and the number of locations sampled in constructing the radio map. We propose a learning algorithm that builds location estimation systems from a small fraction of the calibration data that traditional techniques require, together with a collection of user traces that can be cheaply obtained. Our experiments show that unlabeled user traces can be used to compensate for the effects of reducing calibration effort and can even improve system performance. Consequently, manual effort can be significantly reduced while a high level of accuracy is still achieved.
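
To make "probabilistic location determination" concrete, the minimal sketch below models each AP's RSS at each calibrated location as an independent Gaussian and returns the location with the highest log-likelihood for an observed RSS vector. The Gaussian model, the independence assumption, and the function names are illustrative assumptions; the sketch does not include the paper's use of unlabeled user traces.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log-density of a normal distribution N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def estimate_location(radio_map, observed):
    """Maximum-likelihood location under independent per-AP Gaussian RSS models.

    radio_map: location -> list of (mean_dBm, std_dBm), one pair per AP
    observed: list of measured RSS values (dBm), aligned with the APs in radio_map
    """
    def log_likelihood(loc):
        return sum(
            gaussian_log_pdf(x, m, s) for x, (m, s) in zip(observed, radio_map[loc])
        )

    return max(radio_map, key=log_likelihood)
```

Summing log-densities rather than multiplying densities avoids numerical underflow when many APs are observed.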

IEEE Transactions on Knowledge and Data Engineering | 2006

Test-cost sensitive classification on data with missing values

Qiang Yang; Charles X. Ling; Xiaoyong Chai; Rong Pan

In the area of cost-sensitive learning, inductive learning algorithms have been extended to handle different types of costs to better represent misclassification errors. Most previous work has focused only on how to deal with misclassification costs. In this paper, we address the equally important issue of how to handle the test costs associated with querying the missing values in a test case. When an attribute has a missing value in a test case, it may or may not be worthwhile to make the extra effort to obtain values for the missing attribute or attributes, depending on how much the new values would increase accuracy. In this paper, we consider how to integrate test-cost-sensitive learning with the handling of missing values in a unified framework that includes model building and a testing strategy. The testing strategies determine which attributes to test in order to minimize the sum of the classification costs and test costs. We show how to instantiate this framework in two popular machine learning algorithms: decision trees and the naive Bayesian method. We empirically evaluate the test-cost-sensitive methods for handling missing values on several data sets.
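
For the decision-tree instantiation, one way to make test costs part of model building is to judge a split by the total cost it saves: the parent's misclassification cost minus the children's misclassification costs and the test cost paid by every example reaching the node. The sketch below implements that total-cost criterion under assumed inputs (class-count dicts, cost[pred][true], a per-example test cost); it is an illustration in that spirit, with hypothetical names, not the paper's exact algorithm.

```python
def leaf_cost(counts, cost):
    """Misclassification cost of labelling a node with its cheapest class.

    counts: true class -> number of training examples at the node
    cost: cost[pred][true], the cost of predicting `pred` when the truth is `true`
    """
    return min(sum(n * cost[pred][true] for true, n in counts.items()) for pred in cost)


def split_gain(parent_counts, children_counts, test_cost, cost):
    """Total-cost reduction from testing an attribute at this node.

    Positive means the split pays off: the misclassification cost it saves exceeds
    the test cost, which is charged once per example reaching the node.
    """
    n = sum(parent_counts.values())
    after = sum(leaf_cost(child, cost) for child in children_counts) + test_cost * n
    return leaf_cost(parent_counts, cost) - after
```

With 5 positive and 5 negative examples and a symmetric error cost of 10, a perfectly separating attribute saves 50 in misclassification cost, so the split pays off at a test cost of 1 per example (gain 40) but not at 6 (gain -10).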

International Conference on Management of Data | 2009

Efficiently incorporating user feedback into information extraction and integration programs

Xiaoyong Chai; Ba-Quy Vuong; AnHai Doan; Jeffrey F. Naughton

Many applications increasingly employ information extraction and integration (IE/II) programs to infer structures from unstructured data. Automatic IE/II programs are inherently imprecise. Hence they often make many IE/II mistakes and can thus benefit significantly from user feedback. Today, however, there is no good way to automatically provide and process such feedback. When users find an IE/II mistake, they often must alert the developer team (e.g., via email or a Web form) and then wait for the team to manually examine the program internals to locate and fix the mistake, a slow, error-prone, and frustrating process. In this paper, we propose a solution that lets users directly provide feedback and lets IE/II programs automatically process that feedback. In our solution, a developer U uses hlog, a declarative IE/II language, to write an IE/II program P. Next, U writes declarative user feedback rules that specify which parts of P's data (e.g., input, intermediate, or output data) users can edit, and via which user interfaces. The so-augmented program P is then executed, after which it enters a loop of waiting for and incorporating user feedback. Given user feedback F on a data portion of P, we show how to automatically propagate F to the rest of P and to seamlessly combine F with prior user feedback. We describe the syntax and semantics of hlog, a baseline execution strategy, and various optimization techniques. Finally, we describe experiments with real-world data that demonstrate the promise of our solution.
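
A naive way to incorporate user feedback is to re-run the whole program with user corrections overriding the affected intermediate data; the paper's baseline and optimized strategies are not reproduced here. The toy two-stage pipeline below is hypothetical and is not written in hlog: stage one extracts a value per document, user corrections replace stage-one output, and stage two integrates documents by value. The helper merge_feedback lets a newer correction for the same document win, which is one simple policy for combining new feedback with prior feedback.

```python
def run_program(docs, extract, corrections):
    """Toy two-stage IE/II pipeline (hypothetical; not hlog), fully re-executed.

    docs: doc_id -> raw text
    extract: function mapping raw text to an extracted value (stage 1)
    corrections: doc_id -> user-corrected value, overriding stage-1 output
    Returns stage-2 output: value -> list of doc_ids sharing that value, in doc_id order.
    """
    extracted = {doc_id: extract(text) for doc_id, text in docs.items()}
    extracted.update(corrections)  # user feedback overrides automatic extraction
    integrated = {}
    for doc_id, value in sorted(extracted.items()):
        integrated.setdefault(value, []).append(doc_id)
    return integrated


def merge_feedback(prior, new):
    """Combine feedback rounds; for the same doc_id, the newer correction wins."""
    return {**prior, **new}
```

Re-running everything is simple and always consistent, but its cost grows with the whole program rather than with the size of the feedback, which is the gap that incremental propagation strategies aim to close.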

Mobile Data Management | 2007

PLACE*: A Distributed Spatio-Temporal Data Stream Management System for Moving Objects

Xiaopeng Xiong; Hicham G. Elmongui; Xiaoyong Chai; Walid G. Aref

In this paper, we introduce PLACE*, a distributed spatio-temporal data stream management system for moving objects. PLACE* supports continuous spatio-temporal queries that hop among a network of regional servers. To minimize the execution cost, a new Query-Track-Participate (QTP) query processing model is proposed inside PLACE*. In the QTP model, a query is continuously answered by a querying server, a tracking server, and a set of participating servers. In this paper, we focus on query plan generation, execution, and update algorithms for continuous range queries in PLACE* using QTP. An extensive experimental study demonstrates the effectiveness of the proposed algorithms in PLACE*.
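
Continuous range queries over moving objects are commonly maintained incrementally: on each location update, the system emits a positive update when an object enters the query region and a negative update when it leaves. The single-server sketch below illustrates that idea with a rectangular region and hypothetical names; it does not model PLACE*'s QTP distribution of work across querying, tracking, and participating servers.

```python
def update_range_query(result, rect, obj_id, x, y):
    """Apply one location update to a continuous range query's answer set.

    result: set of object ids currently inside the query region (mutated in place)
    rect: (xmin, ymin, xmax, ymax), inclusive bounds of the query region
    Returns ("+", obj_id) if the object entered, ("-", obj_id) if it left, else None.
    """
    xmin, ymin, xmax, ymax = rect
    inside = xmin <= x <= xmax and ymin <= y <= ymax
    if inside and obj_id not in result:
        result.add(obj_id)
        return ("+", obj_id)
    if not inside and obj_id in result:
        result.discard(obj_id)
        return ("-", obj_id)
    return None  # the update does not change the query answer
```

Sending only these positive and negative deltas, rather than the full answer set after every update, keeps the per-update output proportional to the change in the answer.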
