Chun Jin | Researchain

Publication

Featured research published by Chun Jin.

knowledge discovery and data mining | 2002

Topic-conditioned novelty detection

Yiming Yang; Jian Zhang; Jaime G. Carbonell; Chun Jin

Automated detection of the first document reporting each new event in temporally sequenced streams of documents is an open challenge. In this paper we propose a new approach that addresses this problem in two stages: 1) using a supervised learning algorithm to classify the online document stream into predefined broad topic categories, and 2) performing topic-conditioned novelty detection for the documents in each topic. We also focus on exploiting named entities for event-level novelty detection and on using feature-based heuristics derived from the topic histories. Evaluated on a set of broadcast news stories, these methods show substantial performance gains over the traditional one-level approach to novelty detection.
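The two-stage idea in this abstract (route each document to a topic first, then judge novelty only against earlier documents of that topic) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: the `classify` function stands in for the supervised topic classifier, novelty is decided by bag-of-words cosine similarity with a hypothetical `threshold`, and the named-entity features and topic-history heuristics from the paper are not modeled.

```python
import math
from collections import Counter, defaultdict
from typing import Callable


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TopicConditionedNoveltyDetector:
    def __init__(self, classify: Callable[[str], str], threshold: float = 0.5):
        self.classify = classify      # hypothetical stage-1 classifier: text -> topic label
        self.threshold = threshold    # hypothetical similarity cutoff for "not novel"
        self.history = defaultdict(list)  # topic -> term vectors of earlier documents

    def process(self, text: str) -> bool:
        """Return True if the document looks like the first report of a new event."""
        topic = self.classify(text)
        vec = Counter(text.lower().split())
        prior = self.history[topic]   # compare only within the same topic
        max_sim = max((cosine(vec, p) for p in prior), default=0.0)
        prior.append(vec)
        return max_sim < self.threshold
```

Conditioning on the topic shrinks the set of earlier documents each new story is compared against, which is the intuition the abstract credits for the gains over a one-level approach.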

electronic commerce | 2007

Red Opal: product-feature scoring from reviews

Christopher Scaffidi; Kevin Bierhoff; Eric Chang; Mikhael Felker; Herman Ng; Chun Jin

Online shoppers are generally highly task-driven: they have a certain goal in mind, and they are looking for a product with features that are consistent with that goal. Unfortunately, finding a product with specific features is extremely time-consuming using the search functionality provided by existing web sites. In this paper, we present a new search system called Red Opal that enables users to locate products rapidly based on features. Our fully automatic system examines prior customer reviews, identifies product features, and scores each product on each feature. Red Opal uses these scores to determine which products to show when a user specifies a desired product feature. We evaluate our system on four dimensions: precision of feature extraction, efficiency of feature extraction, precision of product scores, and estimated time savings to customers. On each dimension, Red Opal performs better than a comparison system.
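A minimal sketch of the "score each product on each feature, then rank by the requested feature" flow is shown below. It rests on assumptions that are not from the paper: features are supplied as a hypothetical fixed list (Red Opal extracts them automatically), and a product's score on a feature is simply the mean star rating of its reviews that mention that feature.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Review = Tuple[str, int, str]  # (product_id, star rating 1-5, review text)


def score_products(reviews: List[Review], features: List[str]) -> Dict[str, Dict[str, float]]:
    """Return {feature: {product_id: mean rating of reviews mentioning the feature}}.

    Assumed scoring rule for illustration only, not Red Opal's published formula.
    """
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # feature -> product -> [sum, count]
    for product, rating, text in reviews:
        words = set(re.findall(r"[a-z]+", text.lower()))  # crude tokenizer
        for feature in features:
            if feature in words:
                acc = totals[feature][product]
                acc[0] += rating
                acc[1] += 1
    return {f: {p: s / n for p, (s, n) in prods.items()} for f, prods in totals.items()}


def rank_for_feature(scores: Dict[str, Dict[str, float]], feature: str) -> List[str]:
    """Products ordered by their score on the requested feature, best first."""
    per_product = scores.get(feature, {})
    return sorted(per_product, key=per_product.get, reverse=True)


reviews = [
    ("camA", 5, "Great battery life and sharp lens"),
    ("camA", 3, "Battery is fine, lens is soft"),
    ("camB", 2, "Battery died fast"),
    ("camB", 5, "Amazing lens!"),
]
scores = score_products(reviews, ["battery", "lens"])
```

With this toy data, a shopper asking for "battery" would see `camA` first, while one asking for "lens" would see `camB` first.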

international database engineering and applications symposium | 2006

ARGUS: Efficient Scalable Continuous Query Optimization for Large-Volume Data Streams

Chun Jin; Jaime G. Carbonell

We present the architecture of ARGUS, a stream processing system implemented atop commercial DBMSs to support large-scale complex continuous queries over data streams. ARGUS supports incremental operator evaluation and incremental multi-query plan optimization as new queries arrive. The latter goes well beyond the previous state of the art through a suite of techniques: query-algebra canonicalization, indexing, and searching, and topological query network optimization with join order optimization, conditional materialization, minimal column projection, and transitivity inference. Because it is built on top of a DBMS, the system adds value to existing database applications whose stream processing needs are increasingly demanding. Compared to running the continuous queries directly on the DBMS, ARGUS achieves well over a 100-fold performance improvement.
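To make "incremental operator evaluation" concrete, the sketch below shows an incremental equi-join: each arriving tuple is stored and probed only against the stored state of the other input, so only the new join results are produced instead of recomputing the whole join. This is a generic symmetric hash join kept in memory, offered as an assumption for illustration; ARGUS keeps its state in DBMS tables, and the class name and row format here are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


class IncrementalEquiJoin:
    """Hypothetical in-memory incremental join: emits only results created by each new tuple."""

    def __init__(self, left_key: str, right_key: str):
        self.keys = {"left": left_key, "right": right_key}
        # Per-input state: join-key value -> rows seen so far on that input.
        self.state = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side: str, row: Row) -> List[Tuple[Row, Row]]:
        other = "right" if side == "left" else "left"
        k = row[self.keys[side]]
        self.state[side][k].append(row)           # remember the tuple for future arrivals
        matches = self.state[other].get(k, [])    # probe the other input's stored tuples
        # Orient each pair as (left_row, right_row) regardless of which side arrived.
        return [(row, m) if side == "left" else (m, row) for m in matches]


join = IncrementalEquiJoin("acct", "acct")
join.insert("left", {"acct": 1, "amt": 100})      # no partner yet, emits nothing
join.insert("right", {"acct": 1, "owner": "x"})   # emits the one new pair
```

The cost of each arrival depends on the number of matching stored tuples, not on the total size of the stream history.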

international symposium on methodologies for intelligent systems | 2005

ARGUS: rete + DBMS = efficient persistent profile matching on large-volume data streams

Chun Jin; Jaime G. Carbonell; Philip J. Hayes

Efficient processing of complex streaming data presents multiple challenges, especially when combined with intelligent real-time detection of hidden anomalies. We call such systems Stream Anomaly Monitoring Systems (SAMS) and describe the CMU/Dynamix ARGUS system as a new kind of SAMS that detects rare but high-value patterns combining streaming and historical data. Such patterns may correspond to hidden precursors of terrorist activity or early indicators of the onset of a dangerous disease, such as a SARS outbreak. Our method starts from an extension of the RETE algorithm for matching streaming data against multiple complex persistent queries and goes further with transitivity inference, conditional materialization of intermediate results, and other techniques that yield both accuracy and efficiency. In our evaluation, the system outperforms classical techniques such as a modern DBMS.
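Transitivity inference can be illustrated with one plausible form: deriving an extra selection predicate that filters tuples before an expensive join. Given `r1.amount > 1000000` and `r2.amount > r1.amount`, it follows that `r2.amount > 1000000`, which can be applied to `r2` on its own. The sketch below handles only strict `>` predicates between attributes and numeric constants; the predicate format and attribute names are hypothetical, and this is not presented as ARGUS's actual inference procedure.

```python
from typing import Dict, List, Tuple, Union

Term = Union[str, float]
Pred = Tuple[str, str, Term]  # (attribute, ">", attribute name or numeric constant)


def infer_lower_bounds(preds: List[Pred]) -> Dict[str, float]:
    """Derive 'attr > c' selections implied transitively by a set of '>' predicates.

    If a > b and b > c (constant), then a > c. Iterates to a fixpoint; bounds only
    ever take values from the finite set of constants, so the loop terminates.
    """
    bounds: Dict[str, float] = {}
    # Seed with explicit attribute-versus-constant predicates.
    for lhs, op, rhs in preds:
        if op == ">" and not isinstance(rhs, str):
            bounds[lhs] = max(bounds.get(lhs, float("-inf")), rhs)
    # Propagate bounds along attribute-versus-attribute predicates.
    changed = True
    while changed:
        changed = False
        for lhs, op, rhs in preds:
            if op == ">" and isinstance(rhs, str) and rhs in bounds:
                if bounds[rhs] > bounds.get(lhs, float("-inf")):
                    bounds[lhs] = bounds[rhs]
                    changed = True
    return bounds


preds = [("r1.amount", ">", 1_000_000), ("r2.amount", ">", "r1.amount")]
inferred = infer_lower_bounds(preds)  # includes r2.amount > 1000000
```

The payoff is that the inferred selection on `r2.amount` can discard most `r2` tuples before they ever reach the join.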

international symposium on methodologies for intelligent systems | 2008

Predicate indexing for incremental multi-query optimization

Chun Jin; Jaime G. Carbonell

We present a relational schema that stores the computations of a shared query evaluation plan, together with tools that search for computations shared between new queries and the schema. These are the two essential parts of the Incremental Multiple Query Optimization (IMQO) framework we proposed, which allows efficient construction of the optimal evaluation plan for multiple continuous queries.
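The core of predicate indexing is that equivalent predicates written differently by different queries must map to the same stored computation. A minimal sketch under stated assumptions follows: predicates are simple `(operand, operator, operand)` triples in which strings are column names and numbers are constants, canonicalization puts the column on the left and orders column-column operands alphabetically, and an in-memory dictionary stands in for the paper's relational schema. All names here are hypothetical.

```python
from typing import Dict, List, Tuple

Pred = Tuple[object, str, object]  # (left operand, operator, right operand)

# Operator to use when the two operands are swapped.
FLIP = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "=": "=", "!=": "!="}


def canonicalize(pred: Pred) -> Pred:
    """Rewrite a predicate into one canonical form, e.g. 5 < x becomes x > 5.

    Assumes strings are column names and non-strings are numeric constants.
    """
    left, op, right = pred
    if not isinstance(left, str) and isinstance(right, str):
        return (right, FLIP[op], left)            # column goes on the left
    if isinstance(left, str) and isinstance(right, str) and right < left:
        return (right, FLIP[op], left)            # order column-column predicates alphabetically
    return pred


class PredicateIndex:
    """Hypothetical index from canonical predicates to shared plan-node ids."""

    def __init__(self) -> None:
        self.index: Dict[Pred, int] = {}

    def register(self, query_preds: List[Pred]) -> Tuple[List[int], List[int]]:
        """Return (reused node ids, newly created node ids) for a new query."""
        reused, created = [], []
        for p in query_preds:
            key = canonicalize(p)
            if key in self.index:
                reused.append(self.index[key])    # computation already in the shared plan
            else:
                self.index[key] = len(self.index) # allocate a new plan node
                created.append(self.index[key])
        return reused, created


idx = PredicateIndex()
idx.register([("amount", ">", 1000), ("a.id", "=", "b.id")])
```

A second query written as `[(1000, "<", "amount"), ("b.id", "=", "a.id"), ("type", "=", 3)]` would reuse the first two nodes and create only one new node.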

international symposium on methodologies for intelligent systems | 2006

Incremental aggregation on multiple continuous queries

Chun Jin; Jaime G. Carbonell

Continuously monitoring large-scale aggregates over data streams is important for many stream processing applications, e.g., collaborative intelligence analysis, and presents new challenges to data management systems. The first challenge is to efficiently generate updated aggregate values and deliver the new results to users after new tuples arrive. We implemented an incremental aggregation mechanism that does so for arbitrary algebraic aggregate functions, including user-defined ones, by maintaining up-to-date finite data summaries. The second challenge is to construct shared query evaluation plans that support large-scale queries effectively. Since multiple query optimization is NP-complete and queries generally arrive asynchronously, we apply an incremental sharing approach to obtain shared plans that perform reasonably well. The system is built as part of ARGUS, a stream processing system atop a DBMS. The evaluation study shows that our approaches are effective and efficient on typical collaborative intelligence analysis data and queries.
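An algebraic aggregate is one that can be computed from a fixed-size summary of its input, which is what makes the "finite data summaries" above possible. The sketch below keeps a per-group `(count, sum, sum of squares)` summary so that AVG and population VARIANCE are refreshed in constant time per arriving tuple. The class name and group keys are hypothetical, and this in-memory version is only an illustration, not the ARGUS implementation, which runs atop a DBMS.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class IncrementalGroupStats:
    """Per-group finite summary (count, sum, sum of squares) for AVG and VARIANCE."""

    def __init__(self) -> None:
        self.summary: Dict[Hashable, List[float]] = defaultdict(lambda: [0, 0.0, 0.0])

    def add(self, group: Hashable, value: float) -> Tuple[float, float]:
        """Fold one new tuple into its group's summary and return the updated results."""
        s = self.summary[group]
        s[0] += 1
        s[1] += value
        s[2] += value * value
        return self.result(group)

    def result(self, group: Hashable) -> Tuple[float, float]:
        """Compute (mean, population variance) from the summary alone."""
        if group not in self.summary:
            raise KeyError(group)
        n, total, sq = self.summary[group]
        mean = total / n
        variance = sq / n - mean * mean
        return mean, variance


stats = IncrementalGroupStats()
stats.add("east", 2)
stats.add("east", 4)   # mean 3.0, variance 1.0, computed without revisiting old tuples
```

The sum-of-squares form is the simplest algebraic summary for variance but can lose precision on large values; Welford's update is a numerically steadier alternative that still uses a constant-size summary.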

conference of the international speech communication association | 2001