Qinghua Zou

University of California, Los Angeles

Publication

Featured research published by Qinghua Zou.

International Conference on Data Mining | 2002

SmartMiner: a depth first algorithm guided by tail information for mining maximal frequent itemsets

Qinghua Zou; Wesley W. Chu; Baojing Lu

Maximal frequent itemsets (MFI) are crucial to many tasks in data mining. Since the MaxMiner algorithm first introduced enumeration trees for mining MFI in 1998, several methods have been proposed to use depth-first search to improve performance. To further improve the performance of mining MFI, we propose a technique that takes advantage of the information gathered from previous steps to discover new MFI. More specifically, our algorithm, called SmartMiner, gathers and passes tail information and uses a heuristic select function that relies on the tail information to choose the next node to explore. Compared with Mafia and GenMax, SmartMiner generates a smaller search tree, requires fewer support-counting operations, and does not require superset checking. Using the datasets Mushroom and Connect, our experimental study reveals that SmartMiner generates the same MFI as Mafia and GenMax, but yields an order of magnitude improvement in speed.
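For readers unfamiliar with the term, the following is a minimal brute-force sketch of what a maximal frequent itemset is: a frequent itemset with no frequent proper superset. It is not SmartMiner and uses no tail information; the function names and the toy transactions are assumptions made purely for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute force: return every itemset found in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent.append(itemset)
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of any larger size.
            break
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy data, not taken from the Mushroom or Connect datasets.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
mfi = maximal_frequent_itemsets(transactions, min_support=2)
```

Here `{a}` and `{b}` are frequent but not maximal, because `{a, b}` is also frequent. The final filter is exactly the kind of superset check that the abstract says SmartMiner avoids.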

Web Information and Data Management | 2004

Ctree: a compact tree for indexing XML data

Qinghua Zou; Shaorong Liu; Wesley W. Chu

In this paper, we propose a novel compact tree (Ctree) for XML indexing, which provides not only concise path summaries at the group level but also detailed child-parent links at the element level. Group level mapping allows efficient pruning of a large search space while element level mapping provides fast access to the parent of an element. Due to the tree nature of XML data and queries, such fast child-to-parent access is essential for efficient XML query processing. Using group-based element reference, Ctree enables the clustering of inverted lists according to groups, which provides efficient join between inverted lists and structural index group extents. Our experiments reveal that Ctree is efficient for processing both single-path and branching queries with various value predicates.
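The two levels described above can be pictured with a small sketch. This is an illustration under simplifying assumptions, not the Ctree structure from the paper: elements are grouped by their root-to-element label path (the group level), and each group stores, for every member, the position of its parent inside the parent group (the element level). The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_path_groups(xml_text):
    """Map each label path to a list of parent positions, one entry per element.

    groups[path][i] is the index, within the parent path's group, of the
    parent of the i-th element on that path (None for the root element).
    """
    root = ET.fromstring(xml_text)
    groups = {}

    def visit(elem, parent_path, parent_pos):
        path = parent_path + "/" + elem.tag
        members = groups.setdefault(path, [])
        members.append(parent_pos)
        position = len(members) - 1
        for child in elem:
            visit(child, path, position)

    visit(root, "", None)
    return groups


# Hypothetical sample document.
doc = "<lib><book><title/><author/></book><book><title/></book></lib>"
groups = build_path_groups(doc)

# Group level: the dictionary keys form a path summary of the document.
# Element level: the second title's parent is the book at position 1.
parent_of_second_title = groups["/lib/book/title"][1]
```

Looking up a parent is a single list access, which is the kind of fast child-to-parent step the abstract describes; the set of keys lets a query discard whole groups whose path cannot match.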

Knowledge and Information Systems | 2002

A pattern decomposition algorithm for data mining of frequent patterns

Qinghua Zou; Wesley W. Chu; David B. Johnson; Henry Chiu

Efficient algorithms to mine frequent patterns are crucial to many tasks in data mining. Since the Apriori algorithm was proposed in 1994, several methods have been proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. In addition, many methods do not generate all frequent patterns, making them inadequate for deriving association rules. We propose a pattern decomposition (PD) algorithm that can significantly reduce the size of the dataset on each pass, making it more efficient to mine all frequent patterns in a large dataset. The proposed algorithm avoids the costly process of candidate set generation and saves time by reducing the size of the dataset. Our empirical evaluation shows that the algorithm outperforms Apriori by one order of magnitude and is faster than the FP-tree algorithm.
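As background, the candidate generation-and-test approach that the abstract contrasts with PD can be sketched as follows. This is a simplified Apriori-style loop, not the PD algorithm; the function name and toy baskets are assumptions for illustration.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Level-wise generate-and-test; returns {itemset: support}."""

    def support(itemset):
        # Test step: scan the full dataset for every candidate.
        return sum(1 for basket in baskets if itemset <= basket)

    items = {item for basket in baskets for item in basket}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Generate: join frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical toy data.
baskets = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
patterns = apriori(baskets, min_support=2)
```

Each pass rescans the whole dataset for every surviving candidate. According to the abstract, PD avoids this by skipping candidate generation and shrinking the dataset on each pass.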

Annals of the New York Academy of Sciences | 2002

Modeling Medical Content for Automated Summarization

David B. Johnson; Qinghua Zou; John David N. Dionisio; Victor Zhenyu Liu; Wesley W. Chu

Medical information is available from a variety of new online resources. Given the number and diversity of sources, methods must be found that will enable users to quickly assimilate and determine the content of a document. Summarization is one such tool that can help users to quickly determine the main points of a document. Previous methods to automatically summarize text documents typically do not attempt to infer or define the content of a document. Rather, these systems rely on secondary features or clues that may point to content. This paper describes text summarization techniques that enable users to focus on the key content of a document. The techniques presented here analyze groups of similar documents in order to form a content model. The content model is used to select sentences forming the summary. The technique does not require additional knowledge sources; thus the method should be applicable to any set of text documents.

International Conference on Data Mining | 2001

A pattern decomposition (PD) algorithm for finding all frequent patterns in large datasets

Qinghua Zou; Wesley W. Chu; David B. Johnson; Henry Chiu

Efficient algorithms to mine frequent patterns are crucial to many tasks in data mining. Since the Apriori algorithm was proposed (R. Agrawal and R. Srikant, 1994), several methods have been proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. We propose a pattern decomposition (PD) algorithm that can significantly reduce the size of the dataset on each pass, making it more efficient to mine frequent patterns in a large dataset. The proposed algorithm avoids the costly process of candidate set generation and saves time by reducing the size of the dataset. Our empirical evaluation shows that the algorithm outperforms Apriori by one order of magnitude and is faster than FP-tree. Further, PD is more scalable than both Apriori and FP-tree.

Biomedical Information Technology | 2008

KMeX: A Knowledge-Based Digital Library for Retrieving Scenario-Specific Medical Text Documents

Wesley W. Chu; Zhenyu Liu; Wenlei Mao; Qinghua Zou

Publisher Summary: This chapter presents a new knowledge-based approach to mitigate problems related to scenario-specific information retrieval (IR). The approach uses the Metathesaurus and semantic structure in the UMLS to extract key concepts from free text for indexing, phrase-based indexing to represent similar concepts, and query expansion to improve the probability of matching query terms with the terms in the document. A scenario typically refers to a specific health care task, such as searching for treatment methods for a specific disease. Although traditional systems are useful for general information retrieval, they cannot support scenario-specific IR for three reasons: the terms in the query posed by the user may not use a standardized medical vocabulary; there is no effective technique to represent synonyms, phrases, and similar concepts in free text; and the terms used in a query and those used in a document for representing the same topic may be mismatched. A new knowledge-based approach for retrieving scenario-specific free-text documents has been developed, which consists of three integrated components: IndexFinder, phrase-based VSM, and knowledge-based query expansion. IndexFinder extracts key terms from free text, generating conceptual terms by permuting words in a sentence rather than using the traditional techniques based on NLP. The phrase-based VSM has been developed for document retrieval. Knowledge-based query expansion techniques and the phrase-based VSM can be used in conjunction to significantly improve precision and recall.

Conference on Information and Knowledge Management | 2004

Using a compact tree to index and query XML data

Qinghua Zou; Shaorong Liu; Wesley W. Chu

Indexing XML is crucial for efficient XML query processing. We propose a compact tree (Ctree) for XML indexing, which provides not only concise path summaries at the group level but also detailed child-parent relationships at the element level. Based on Ctree, we are able to measure how well XML data is structured. We also propose a three-step query processing method. Its efficiency is achieved by: (1) summarizing large XML data structures into a condensed Ctree; (2) pruning irrelevant groups to significantly reduce the search space; (3) eliminating join operations between the matches for value predicates and those for structure constraints; and (4) using Ctree properties such as regular groups to reduce query processing time. Our experiments reveal that Ctree is an effective data structure for managing XML data.

American Medical Informatics Association Annual Symposium | 2003