Dongmei Ren
North Dakota State University
Publication
Featured research published by Dongmei Ren.
international conference on data mining | 2004
Dongmei Ren; Baoying Wang; William Perrizo
Outlier detection can lead to the discovery of unexpected and interesting knowledge, which is critically important in areas such as the monitoring of criminal activities in electronic commerce, credit card fraud, etc. In this paper, we develop an efficient density-based outlier detection method for large datasets. Our contributions are: a) we introduce a relative density factor (RDF); b) based on RDF, we propose an outlier detection method that efficiently prunes the data points lying deep within clusters and detects outliers only within the remaining small subset of the data; c) the performance of our method is further improved by means of a vertical data representation, P-trees. We tested our method on NHL and NBA data. Our method shows an order-of-magnitude speed improvement over contemporary approaches.
Journal of Information & Knowledge Management | 2004
Imad Rahal; Dongmei Ren; William Perrizo
Association rule mining (ARM) is the data-mining process of finding all association rules in a dataset that satisfy user-defined measures of interest such as support and confidence. Usually, ARM proceeds by mining all frequent itemsets, a step known to be very computationally intensive, from which rules are then derived in a straightforward manner. In general, frequent-itemset mining prunes the search space by using the downward closure (or anti-monotonicity) property of support, which states that no itemset can be frequent unless all of its subsets are frequent. A large number of papers have addressed the problem of ARM, but few have focused on scalability over very large datasets (i.e., datasets containing a very large number of transactions). In this paper, we propose a new model for representing data and mining frequent itemsets that is based on P-tree technology, for compression and faster logical operations over vertically structured data, and on set enumeration trees, for fast itemset enumeration. Experimental results show substantial improvements for our approach on large datasets compared to other contemporary approaches in the literature.
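The two ideas named in this abstract, support counting over vertically structured data and pruning by downward closure, can be illustrated with a small sketch. It is not the authors' model: plain Python integers stand in for P-trees as uncompressed bit vectors, and level-wise (Apriori-style) candidate generation stands in for set enumeration trees. The function names `vertical_bitmaps`, `support`, and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def vertical_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it.

    Assumption: an uncompressed bit vector per item, not an actual P-tree.
    """
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Support count = number of set bits in the AND of the items' bit vectors."""
    mask = (1 << n_transactions) - 1
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise mining with downward-closure pruning (an Apriori-style stand-in)."""
    n = len(transactions)
    bitmaps = vertical_bitmaps(transactions)
    current = [frozenset([item]) for item in sorted(bitmaps)
               if support([item], bitmaps, n) >= min_support]
    result = {s: support(s, bitmaps, n) for s in current}
    k = 2
    while current:
        previous = set(current)
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Downward closure: drop a candidate if any (k-1)-subset is infrequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, k - 1))]
        current = []
        for candidate in candidates:
            count = support(candidate, bitmaps, n)
            if count >= min_support:
                result[candidate] = count
                current.append(candidate)
        k += 1
    return result
```

In this sketch the support of an itemset costs one bitwise AND per item, regardless of how the transactions were originally stored, which is the property that vertical representations are meant to exploit.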
international conference on tools with artificial intelligence | 2004
Dongmei Ren; Imad Rahal; William Perrizo
Outlier detection can lead to the discovery of unexpected and interesting knowledge, which is critically important in areas such as the monitoring of criminal activities in electronic commerce, credit card fraud, and the like. In this work, we propose an efficient outlier detection method that produces clusters as a by-product and works efficiently on large datasets. Our contributions are: a) we introduce a local connective factor (LCF); b) based on LCF, we propose an outlier detection method that efficiently detects outliers and groups data into clusters in a one-time process. Our method does not require a prior clustering step, which is the first step in other state-of-the-art clustering-based outlier detection methods; c) the performance of our method is further improved by means of a vertical data representation, P-trees. We tested our method on a real dataset. Our method shows roughly a fivefold speed improvement over other contemporary clustering-based outlier detection approaches.
international conference on tools with artificial intelligence | 2004
Imad Rahal; Dongmei Ren; Weihua Wu; William Perrizo
Association rule mining (ARM) finds all the association rules in data that match some measures of interest such as support and confidence. In certain situations where high support is not necessarily of interest, fixed-consequent ARM for confident rules might be favored over traditional ARM. The need for fixed-consequent ARM is becoming more evident in a number of applications such as market basket research (MBR) and precision agriculture. Highly confident rules are desired in all situations; however, support thresholds fluctuate with the applications and the datasets under study, as we shall show later. We propose an approach for mining minimal confident rules in the context of fixed-consequent ARM that relieves the user of the burden of specifying a minimum support threshold. We show that the framework suggested herein is efficient and can be easily extended by adding new pruning conditions pertaining to specific situations.
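As a rough illustration of fixed-consequent mining without a support threshold, the sketch below computes support and confidence for rules with a fixed consequent and keeps only confident rules whose antecedent has no confident proper subset. That reading of "minimal" is an assumption, as are the function names `rule_stats` and `minimal_confident_rules`; the exhaustive enumeration is for illustration only and is not the pruning framework the abstract describes.

```python
from itertools import combinations


def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent = frozenset(antecedent)
    both = antecedent | {consequent}
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def minimal_confident_rules(transactions, consequent, min_conf):
    """Enumerate antecedents by size; keep confident ones with no confident subset.

    Assumption: "minimal" means no proper subset of the antecedent is itself
    confident. No minimum support threshold is used anywhere.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions) - {consequent})
    found = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            # Skip supersets of an antecedent that is already confident.
            if any(kept <= antecedent for kept, _ in found):
                continue
            n_antecedent = sum(1 for t in transactions if antecedent <= t)
            if n_antecedent == 0:
                continue
            _, confidence = rule_stats(transactions, antecedent, consequent)
            if confidence >= min_conf:
                found.append((antecedent, confidence))
    return found
```

With a high confidence threshold, a rule can be reported even when its antecedent occurs in a single transaction, which is the kind of low-support, high-confidence rule a minimum support threshold would discard.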
acm multimedia | 2002
William Perrizo; William Jockheck; Amal Shehan Perera; Dongmei Ren; Weihua Wu; Yi Zhang
The DataSURG group at NDSU has a long-standing interest in data mining remotely sensed imagery (RSI) for agricultural, forestry and other prediction and analysis applications. A spatial data structure, the Peano count tree, was developed that provides an efficient, lossless, data-mining-ready representation of the many types of data involved in these applications. This data structure has made possible the mining of multiple very large datasets, including time sequences of RSI and multimedia land data. The Peano count tree (P-tree) technology provides an efficient way to store and mine images of any format, together with pertinent land data of still other formats. With the invention of gene chips and gene expression microarrays (MA data) for use in medicine, plant science and many other application areas, new multimedia data mining challenges appeared. MA data presents a one-time, gene expression level map of thousands of genes subjected to hundreds of conditions. An important multimedia plant science application of the near future is to integrate macro-scale analysis of RSI with the micro-scale analysis of MA and to do the latter across multiple organisms. Most of the MA research has been done for a particular organism and the results have been archived as text abstracts (e.g., Medline abstracts). It will therefore be necessary to combine text mining with most multimedia RSI and MA mining. This is truly a multimedia data mining setting. The way text is almost always mined today is to extract pertinent features into tables and then to mine the tables (i.e., extract structured records from the unstructured text first). P-trees are a convenient technology for mining all media involved in this research. In fact, in almost all multimedia data mining applications, feature extraction converts the pertinent data to relational or tabular form, and then the tuples or rows are data mined.
If multiple media are going to be mined by first converting them to a common format or medium, a good candidate common data structure for that purpose is the P-tree. The P-tree data structure is designed for just such a data mining setting.
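The abstract does not spell out how a Peano count tree is built. The sketch below assumes the usual quadrant-wise construction: a square bit matrix whose side is a power of two is split recursively into four quadrants, each node stores the count of 1-bits in its quadrant, and quadrants that are all 0 or all 1 are not subdivided. The node layout `(count, children)` and the names `build_count_tree` and `to_bits` are hypothetical.

```python
def build_count_tree(bits, row=0, col=0, size=None):
    """Build a quadrant count tree over a square bit matrix.

    Assumptions: len(bits) is a power of two and every row has that length.
    A node is (count, children); children is None for a pure quadrant.
    """
    if size is None:
        size = len(bits)
    count = sum(bits[r][c]
                for r in range(row, row + size)
                for c in range(col, col + size))
    # Pure-0 and pure-1 quadrants (and single cells) become leaves.
    if count == 0 or count == size * size or size == 1:
        return (count, None)
    half = size // 2
    # Child order: top-left, top-right, bottom-left, bottom-right.
    children = [build_count_tree(bits, row + dr, col + dc, half)
                for dr in (0, half) for dc in (0, half)]
    return (count, children)


def to_bits(node, size):
    """Rebuild the bit matrix from the tree, showing the encoding is lossless."""
    count, children = node
    if children is None:
        fill = 1 if count else 0
        return [[fill] * size for _ in range(size)]
    half = size // 2
    top_left, top_right, bottom_left, bottom_right = (
        to_bits(child, half) for child in children)
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom
```

In this layout the root count is the total number of 1-bits, and large uniform regions collapse into single leaves, which is where the compression comes from.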
Knowledge and Information Systems | 2006
Imad Rahal; Dongmei Ren; Weihua Wu; Anne M. Denton; Christopher Besemann; William Perrizo
Graphs are increasingly becoming a vital source of information within which a great deal of semantics is embedded. As the size of available graphs increases, our ability to arrive at the embedded semantics grows into a much more complicated task. One form of important hidden semantics is that which is embedded in the edges of directed graphs. Citation graphs serve as a good example in this context. This paper attempts to understand temporal aspects in publication trends through citation graphs, by identifying patterns in the subject matters of scientific publications using an efficient, vertical association rule mining model. Such patterns can (a) indicate subject-matter evolutionary history, (b) highlight subject-matter future extensions, and (c) give insights on the potential effects of current research on future research. We highlight our major differences with previous work in the areas of graph mining, citation mining, and Web-structure mining, propose an efficient vertical data representation model, introduce a new subjective interestingness measure for evaluating patterns with a special focus on those patterns that signify strong associations between properties of cited papers and citing papers, and present an efficient algorithm for the purpose of discovering rules of interest followed by a detailed experimental analysis.
acm symposium on applied computing | 2005
Imad Rahal; Dongmei Ren; Amal Shehan Perera; Hassan Najadat; William Perrizo; Riad M. Rahhal; Willy Valdivia
Data arising from genomic and proteomic experiments is accumulating rapidly, resulting in huge amounts of raw data; consequently, the need to analyze such biological data, the understanding of which still lags far behind, has become prominent in the current post-genomic era. In this paper, we attempt to analyze annotated genome data by applying a central data-mining technique known as association rule mining, with the aim of discovering rules capable of yielding deeper insights into this type of data. We propose a new technique capable of using domain knowledge in the form of queries in order to efficiently mine only the subset of associations that are of interest to the researcher, in an incremental and interactive mode.
european conference on principles of data mining and knowledge discovery | 2003
Fei Pan; Baoying Wang; Yi Zhang; Dongmei Ren; Xin Hu; William Perrizo
Data mining for spatial data has become increasingly important as more and more organizations are exposed to spatial data from sources such as remote sensing, geographical information systems, astronomy, computer cartography, environmental assessment and planning, etc. Recently, density-based clustering methods such as DENCLUE, DBSCAN, and OPTICS have been published and recognized as powerful clustering methods for data mining. These approaches have a run time complexity of \(O(n \log n)\) when using spatial index techniques such as the R+-tree and grid cells. However, these methods are known to lack scalability with respect to dimensionality. In this paper, a unique approach to efficient neighborhood search and a new efficient density-based clustering algorithm using EIN-rings are developed. Our approach exploits compressed vertical data structures, Peano Trees (P-trees), and fast P-tree logical operations to accelerate the calculation of the density function within EIN-rings. This approach stands in contrast to the ubiquitous approach of vertically scanning horizontal data structures (records). The average run time complexity of our algorithm for spatial data in d dimensions is \(O(dn\sqrt{n})\). Our proposed method has cardinality scalability comparable to that of other density-based methods for small and medium-sized data, but superior speed and dimensional scalability.
conference on information and knowledge management | 2004
Dongmei Ren; Imad Rahal; William Perrizo; Kirk Scott
international conference on management of data | 2003
Baoying Wang; Fei Pan; Dongmei Ren; Yue Cui; Qiang Ding; William Perrizo