Mingsheng Hong | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Mingsheng Hong is active.

Explore More

Publication

Featured researches published by Mingsheng Hong.

extending database technology | 2006

Towards expressive publish/subscribe systems

Alan J. Demers; Johannes Gehrke; Mingsheng Hong; Mirek Riedewald; Walker M. White

Traditional content based publish/subscribe (pub/sub) systems allow users to express stateless subscriptions evaluated on individual events. However, many applications such as monitoring RSS streams, stock tickers, or management of RFID data streams require the ability to handle stateful subscriptions. In this paper, we introduce Cayuga, a stateful pub/sub system based on nondeterministic finite state automata (NFA). Cayuga allows users to express subscriptions that span multiple events, and it supports powerful language features such as parameterization and aggregation, which significantly extend the expressive power of standard pub/sub systems. Based on a set of formally defined language operators, the subscription language of Cayuga provides non-ambiguous subscription semantics as well as unique opportunities for optimizations. We experimentally demonstrate that common optimization techniques used in NFA-based systems such as state merging have only limited effectiveness, and we propose novel efficient indexing methods to speed up subscription processing. In a thorough experimental evaluation we show the efficacy of our approach.

international conference on management of data | 2007

Cayuga: a high-performance event processing engine

Lars Brenna; Alan J. Demers; Johannes Gehrke; Mingsheng Hong; Joel Ossher; Biswanath Panda; Mirek Riedewald; Mohit Thatte; Walker M. White

We propose a demonstration of Cayuga, a complex event monitoring system for high speed data streams. Our demonstration will show Cayuga applied to monitoring Web feeds; the demo will illustrate the expressiveness of the Cayuga query language, the scalability of its query processing engine to high stream rates, and a visualization of the internals of the query processing engine.

pacific-asia conference on knowledge discovery and data mining | 2004

Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining

Chen Wang; Mingsheng Hong; Jian Pei; Haofeng Zhou; Wei Wang; Baile Shi

Mining frequent tree patterns is an important research problems with broad applications in bioinformatics, digital library, e-commerce, and so on. Previous studies highly suggested that pattern-growth methods are efficient in frequent pattern mining. In this paper, we systematically develop the pattern growth methods for mining frequent tree patterns. Two algorithms, Chopper and XSpanner, are devised. An extensive performance study shows that the two newly developed algorithms outperform TreeMinerV [13], one of the fastest methods proposed before, in mining large databases. Furthermore, algorithm XSpanner is substantially faster than Chopper in many cases.

very large data bases | 2008

Transaction time indexing with version compression

David B. Lomet; Mingsheng Hong; Rimma V. Nehme; Rui Zhang

Immortal DB is a transaction time database system designed to enable high performance for temporal applications. It is built into a commercial database engine, Microsoft SQL Server. This paper describes how we integrated a temporal indexing technique, the TSB-tree, into Immortal DB to serve as the core access method. The TSB-tree provides high performance access and update for both current and historical data. A main challenge was integrating TSB-tree functionality while preserving original B+tree functionality, including concurrency control and recovery. We discuss the overall architecture, including our unique treatment of index terms, and practical issues such as uncommitted data and log management. Performance is a primary concern. To increase performance, versions are locally delta compressed, exploiting the commonality between adjacent versions of the same record. This technique is also applied to index terms in index pages. There is a tradeoff between query performance and storage space. We discuss optimizing performance regarding this tradeoff throughout the paper. The result of our efforts is a high-performance transaction time database system built into an RDBMS engine, which has not been achieved before. We include a thorough experimental study and analysis that confirms the very good performance that it achieves.

distributed event-based systems | 2009

Distributed event stream processing with non-deterministic finite automata

Lars Brenna; Johannes Gehrke; Mingsheng Hong; Dag Johansen

Efficient matching of incoming events to persistent queries is fundamental to event pattern matching, complex event processing, and publish/subscribe systems. Recent processing engines based on non-deterministic finite automata (NFAs) have demonstrated scalability in the number of queries that can be efficiently executed on a single machine. However, existing NFA based systems are limited to processing events on a single machine. Consequently, their event processing capacity cannot be increased by adding more machines. In this paper, we present an experimental evaluation of different methods for distributing an event processing system that is based on NFAs across multiple machines in a cluster. Our results show that careful input stream partitioning gives close to linear performance scaleup for CPU bound workloads.

international conference on management of data | 2007

Massively multi-query join processing in publish/subscribe systems

Mingsheng Hong; Alan J. Demers; Johannes Gehrke; Christoph Koch; Mirek Riedewald; Walker M. White

There has been much recent interest in XML publish/subscribe systems. Some systems scale to thousands of concurrent queries, but support a limited query language (usually a fragment of XPath 1.0). Other systems support more expressive languages, but do not scale well with the number of concurrent queries. In this paper, we propose a set of novel query processing techniques, referred to as Massively Multi-Query Join Processing techniques, for processing a large number of XML stream queries involving value joins over multiple XML streams and documents. These techniques enable the sharing of representations of inputs to multiple joins, and the sharing of join computation. Our techniques are also applicable to relational event processing systems and publish/subscribe systems that support join queries. We present experimental results to demonstrate the effectiveness of our techniques. We are able to process thousands of XML messages with hundreds of thousands of join queries on real RSS feed streams. Our techniques gain more than two orders of magnitude speedup compared to the naive approach of evaluating such join queries.

extending database technology | 2009

Rule-based multi-query optimization

Mingsheng Hong; Mirek Riedewald; Christoph Koch; Johannes Gehrke; Alan J. Demers

Data stream management systems usually have to process many long-running queries that are active at the same time. Multiple queries can be evaluated more efficiently together than independently, because it is often possible to share state and computation. Motivated by this observation, various Multi-Query Optimization (MQO) techniques have been proposed. However, these approaches suffer from two limitations. First, they focus on very specialized workloads. Second, integrating MQO techniques for CQL-style stream engines and those for event pattern detection engines is even harder, as the processing models of these two types of stream engines are radically different. In this paper, we propose a rule-based MQO framework. This framework incorporates a set of new abstractions, extending their counterparts, physical operators, transformation rules, and streams, in a traditional RDBMS or stream processing system. Within this framework, we can integrate new and existing MQO techniques through the use of transformation rules. This allows us to build an expressive and scalable stream system. Just as relational optimizers are crucial for the success of RDBMSes, a powerful multi-query optimizer is needed for data stream processing. This work lays the foundation for such a multi-query optimizer, creating opportunities for future research. We experimentally demonstrate the efficacy of our approach.

knowledge discovery and data mining | 2003

An efficient algorithm of frequent connected subgraph extraction

Mingsheng Hong; Haofeng Zhou; Wei Wang; Baile Shi

Mining frequent patterns from datasets is one of the key success stories of data mining research. Currently, most of the works focus on independent data, such as the items in the marketing basket. However, the objects in the real world often have close relationship with each other. How to extract frequent patterns from these relations is the objective in this paper. We use graphs to model the relations, and select a simple type for analysis. Combining the graph theory and algorithms to generate frequent patterns, a new algorithm Topology, which can mine these graphs efficiently, has been proposed. We evaluate the performance of the algorithm by doing experiments with synthetic datasets and real data. The experimental results show that Topology can do the job well. At the end of this paper, the potential improvement is mentioned.

Journal of Computer Science and Technology | 2004

Chopper: efficient algorithm for tree mining

Chen Wang; Mingsheng Hong; Wei Wang; Baile Shi

With the development of Internet, frequent pattern mining has been extended to more complex patterns like tree mining and graph mining. Such applications arise in complex domains like bioinformatics, web mining, etc. In this paper, we present a novel algorithm, namedChopper, to discover frequent subtrees from ordered labeled trees. An extensive performance study shows that the newly developed algorithm outperformsTreeMiner V, one of the fastest methods proposed previously, in mining large databases. At the end of this paper, the potential improvement ofChopper is mentioned.

Journal of Computer Science and Technology | 2004

Extracting frequent connected subgraphs from large graph sets

Wei Wang; Qingqing Yuan; Haofeng Zhou; Mingsheng Hong; Baile Shi

Mining frequent patterns from datasets is one of the key success of data mining research. Currently, most of the studies focus on the data sets in which the elements are independent, such as the items in the marketing basket. However, the objects in the real world often have close relationship with each other. How to extract frequent patterns from these relations is the objective of this paper. The authors use graphs to model the relations, and select a simple type for analysis. Combining the graph theory and algorithms to generate frequent patterns, a new algorithm called Topology, which can mine these graphs efficiently, has been proposed. The performance of the algorithm is evaluated by doing experiments with synthetic datasets and real data. The experimental results show that Topology can do the job well. At the end of this paper, the potential improvement is mentioned.

Explore More