Kevin Chen Chuan Chang
University of Illinois at Urbana–Champaign
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Kevin Chen Chuan Chang.
Communications of The ACM | 2007
Bin He; Mitesh Pankaj Patel; Zhen Zhang; Kevin Chen Chuan Chang
Attempting to locate and quantify material on the Web that is hidden from typical search techniques.
international conference on data engineering | 2012
Rui Li; Kin Hou Lei; Ravi Khadiwala; Kevin Chen Chuan Chang
Witnessing the emergence of Twitter, we propose a Twitter-based Event Detection and Analysis System (TEDAS), which helps to (1) detect new events, to (2) analyze the spatial and temporal pattern of an event, and to (3) identify importance of events. In this demonstration, we show the overall system architecture, explain in detail the implementation of the components that crawl, classify, and rank tweets and extract location from tweets, and present some interesting results of our system.
acm multimedia | 2004
Yi Wu; Edward Y. Chang; Kevin Chen Chuan Chang; John R. Smith
Considerable research has been devoted to utilizing multimodal features for better understanding multimedia data. However, two core research issues have not yet been adequately addressed. First, given a set of features extracted from multiple media sources (e.g., extracted from the visual, audio, and caption track of videos), how do we determine the best modalities? Second, once a set of modalities has been identified, how do we best fuse them to map to semantics? In this paper, we propose a two-step approach. The first step finds <i>statistically independent modalities</i> from raw features. In the second step, we use <i>super-kernel fusion</i> to determine the optimal combination of individual modalities. We carefully analyze the tradeoffs between three design factors that affect fusion performance: <i>modality independence</i>, <i>curse of dimensionality</i>, and <i>fusion-model complexity</i>. Through analytical and empirical studies, we demonstrate that our two-step approach, which achieves a careful balance of the three design factors, can improve class-prediction accuracy over traditional techniques.
IEEE Computer | 2002
Jiawei Han; Kevin Chen Chuan Chang
Searching, comprehending, and using the semistructured HTML, XML, and database-service-engine information stored on the Web poses a significant challenge. This data is more sophisticated and dynamic than the information commercial database systems store. To supplement keyword-based indexing, researchers have applied data mining to Web-page ranking. In this context, data mining helps Web search engines find high-quality Web pages and enhances Web click stream analysis. For the Web to reach its full potential, however, we must improve its services, make it more comprehensible, and increase its usability. As researchers continue to develop data mining techniques, the authors believe this technology will play an increasingly important role in meeting the challenges of developing the intelligent Web. Ultimately, data mining for Web intelligence will make the Web a richer, friendlier, and more intelligent resource that we can all share and explore. The paper considers how data mining holds the key to uncovering and cataloging the authoritative links, traversal patterns, and semantic structures that will bring intelligence and direction to our Web interactions.
ACM Transactions on Database Systems | 2006
Bin He; Kevin Chen Chuan Chang
To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. While complex matchings are common, because of their far more complex search space, most existing techniques focus on simple 1:1 matchings. To tackle this challenge, this article takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this “deep Web ” query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that the co-occurrences patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach. In particular, we develop the DCM framework, which consists of data preprocessing, dual mining of positive and negative correlations, and finally matching construction. We evaluate the DCM framework on manually extracted interfaces and the results show good accuracy for discovering complex matchings. Further, to automate the entire matching process, we incorporate automatic techniques for interface extraction. Executing the DCM framework on automatically extracted interfaces, we find that the inevitable errors in automatic interface extraction may significantly affect the matching result. To make the DCM framework robust against such “noisy” schemas, we integrate it with a novel “ensemble” approach, which creates an ensemble of DCM matchers, by randomizing the schema data into many trials and aggregating their ranked results by taking majority voting. As a principled basis, we provide analytic justification of the robustness of the ensemble approach. Empirically, our experiments show that the “ensemblization” indeed significantly boosts the matching accuracy, over automatically extracted and thus noisy schema data. By employing the DCM framework with the ensemble approach, we thus complete an automatic process of matchings Web query interfaces.
international conference on management of data | 2006
Chengkai Li; Kevin Chen Chuan Chang; Ihab F. Ilyas
This paper presents a principled framework for efficient processing of ad-hoc top-k (ranking) aggregate queries, which provide the k groups with the highest aggregates as results. Essential support of such queries is lacking in current systems, which process the queries in a naïve materialize-group-sort scheme that can be prohibitively inefficient. Our framework is based on three fundamental principles. The Upper-Bound Principle dictates the requirements of early pruning, and the Group-Ranking and Tuple-Ranking Principles dictate group-ordering and tuple-ordering requirements. They together guide the query processor toward a provably optimal tuple schedule for aggregate query processing. We propose a new execution framework to apply the principles and requirements. We address the challenges in realizing the framework and implementing new query operators, enabling efficient group-aware and rank-aware query plans. The experimental study validates our framework by demonstrating orders of magnitude performance improvement in the new query plans, compared with the traditional plans.
ACM Transactions on Database Systems | 2008
Mohamed A. Soliman; Ihab F. Ilyas; Kevin Chen Chuan Chang
Ranking and aggregation queries are widely used in data exploration, data analysis, and decision-making scenarios. While most of the currently proposed ranking and aggregation techniques focus on deterministic data, several emerging applications involve data that is unclean or uncertain. Ranking and aggregating uncertain (probabilistic) data raises new challenges in query semantics and processing, making conventional methods inapplicable. Furthermore, uncertainty imposes probability as a new ranking dimension that does not exist in the traditional settings. In this article we introduce new probabilistic formulations for top-k and ranking-aggregate queries in probabilistic databases. Our formulations are based on marriage of traditional top-k semantics with possible worlds semantics. In the light of these formulations, we construct a generic processing framework supporting both query types, and leveraging existing query processing and indexing capabilities in current RDBMSs. The framework encapsulates a state space model and efficient search algorithms to compute query answers. Our proposed techniques minimize the number of accessed tuples and the size of materialized search space to compute query answers. Our experimental study shows the efficiency of our techniques under different data distributions with orders of magnitude improvement over naïve methods.
international conference on management of data | 2006
Zhen Zhang; Seung-won Hwang; Kevin Chen Chuan Chang; Min Wang; Christian A. Lang; Yuan Chi Chang
The wide spread of databases for managing structured data, compounded with the expanded reach of the Internet, has brought forward interesting data retrieval and analysis scenarios to RDBMS. In such settings, queries often take the form of k-constrained optimization, with a Boolean constraint and a numeric optimization expression as the goal function, retrieving only the top-k tuples. This paper proposes the concept of supporting such queries, as their nature implies, by a functional optimization machinery over the search space of multiple indices. To realize this concept, we combine the dual perspectives of discrete state search (from the view of indices) and continuous function optimization (from the view of goal functions). We present, as the marriage of the two perspectives, the OPT* framework, which encodes k-constrained optimization as an A* search over the composite space of multiple indices, driven by functional optimization for providing tight heuristics. By processing queries as optimization, OPT* significantly outperforms baseline approaches, with up to 3 orders of magnitude margins.
international world wide web conferences | 2010
Ganesh Agarwal; Govind Kabra; Kevin Chen Chuan Chang
We propose to mine structured query templates from search logs, for enabling rich query interpretation that recognizes both query intents and associated attributes. We formalize the notion of template as a sequence of keywords and domain attributes, and our objective is to discover templates with high precision and recall for matching queries in a domain of interest. Our solution bootstraps from small seed input knowledge to discover relevant query templates, by harnessing the wealth of information available in search logs. We model this information in a tri-partite QueST network of queries, sites, and templates. We propose a probabilistic inferencing framework based on the dual metrics of precision and recall- and we show that the dual inferencing correspond respectively to the random walks in backward and forward directions. We deployed and tested our algorithm over a real-world search log of 15 million queries. The algorithm achieved accuracy of as high as 90% (on F-measure), with little seed knowledge and even with incomplete domain schema.
IEEE Transactions on Knowledge and Data Engineering | 2007
Seung-won Hwang; Kevin Chen Chuan Chang
This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries can be even more important: first, ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, ranked queries often accompany Boolean ranking conditions, which may turn predicates into expensive ones, as the index structure on the predicate built on the base table may be no longer effective in retrieving the filtered objects in order. Fourth, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and must instead require per-object probe to evaluate. To enable probe minimization, we develop the problem as cost-based optimization of searching over potential probe schedules. In particular, we decouple probe scheduling into object and predicate scheduling problems and develop an analytical object scheduling optimization and a dynamic predicate scheduling optimization, which combined together form a cost-effective probe schedule