Qin Ding
East Carolina University
Publication
Featured research published by Qin Ding.
Bioinformatics | 2013
Boya Xie; Qin Ding; Hongjin Han; Di Wu
MOTIVATION: Research interest in microRNAs has increased rapidly in the past decade. Many studies have shown that microRNAs have close relationships with various human cancers, and they potentially could be used as cancer indicators in diagnosis or as suppressors for treatment purposes. There are several databases that contain microRNA-cancer associations predicted by computational methods but few from empirical results. Despite the fact that abundant experiments investigating microRNA expressions in cancer cells have been carried out, the results have remained scattered in the literature. We propose to extract microRNA-cancer associations by text mining and store them in a database called miRCancer. RESULTS: The text mining is based on 75 rules we have constructed, which represent the common sentence structures typically used to state microRNA expressions in cancers. The microRNA-cancer association database, miRCancer, is updated regularly by running the text mining algorithm against PubMed. All miRNA-cancer associations are confirmed manually after automatic extraction. miRCancer currently documents 878 relationships between 236 microRNAs and 79 human cancers through the processing of >26,000 published articles. AVAILABILITY: miRCancer is freely available on the web at http://mircancer.ecu.edu/
knowledge discovery and data mining | 2002
Maleq Khan; Qin Ding; William Perrizo
Classification of spatial data streams is crucial, since the training dataset changes often. Building a new classifier each time can be very costly with most techniques. In this situation, k-nearest neighbor (KNN) classification is a very good choice, since no residual classifier needs to be built ahead of time. KNN is extremely simple to implement and lends itself to a wide variety of variations. We propose a new method of KNN classification for spatial data using a new, rich, data-mining-ready structure, the Peano-count-tree (P-tree). We merely perform some AND/OR operations on P-trees to find the nearest neighbors of a new sample and assign the class label. We have fast and efficient algorithms for the AND/OR operations, which reduce the classification time significantly. Instead of taking exactly the k nearest neighbors we form a closed-KNN set. Our experimental results show closed-KNN yields higher classification accuracy as well as significantly higher speed.
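The closed-KNN idea in the abstract above can be illustrated with a minimal sketch. It assumes that "closed" means every training point tied with the k-th nearest neighbor is kept in the voting set; the abstract does not spell this out, so treat it as an assumption. The function name `closed_knn_classify`, the Manhattan distance, and the plain list scan are all hypothetical choices for illustration; the paper itself finds neighbors with AND/OR operations on P-trees, which this sketch does not attempt.

```python
from collections import Counter


def manhattan(a, b):
    """Illustrative distance; the paper's actual metric is not given in the abstract."""
    return sum(abs(x - y) for x, y in zip(a, b))


def closed_knn_classify(train, sample, k, distance=manhattan):
    """Classify `sample` by majority vote over a closed-KNN set.

    train: non-empty list of (features, label) pairs; k >= 1.
    Assumption: the closed set holds every point whose distance is
    no larger than that of the k-th nearest neighbor, so ties at the
    boundary are all included.
    Returns (predicted_label, size_of_neighbor_set).
    """
    scored = sorted(
        ((distance(x, sample), label) for x, label in train),
        key=lambda pair: pair[0],
    )
    kth_distance = scored[min(k, len(scored)) - 1][0]
    neighbors = [label for d, label in scored if d <= kth_distance]
    votes = Counter(neighbors)
    return votes.most_common(1)[0][0], len(neighbors)
```

With k = 2 and two training points tied at the second-smallest distance, the neighbor set grows to three points, which can break a vote that a strict 2-nearest-neighbor rule would leave tied.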
acm symposium on applied computing | 2002
Qiang Ding; Qin Ding; William Perrizo
Many organizations have large quantities of spatial data collected in various application areas, including remote sensing, geographical information systems (GIS), astronomy, computer cartography, environmental assessment and planning, etc. These data collections are growing rapidly and can therefore be considered as spatial data streams. For data stream classification, time is a major issue. However, these spatial data sets are too large to be classified effectively in a reasonable amount of time using existing methods. In this paper, we developed a new method for decision tree classification on spatial data streams using a data structure called Peano Count Tree (P-tree). The Peano Count Tree is a spatial data organization that provides a lossless compressed representation of a spatial data set and facilitates efficient classification and other data mining techniques. Using P-tree structure, fast calculation of measurements, such as information gain, can be achieved. We compare P-tree based decision tree induction classification and a classical decision tree induction method with respect to the speed at which the classifier can be built (and rebuilt when substantial amounts of new data arrive). Experimental results show that the P-tree method is significantly faster than existing classification methods, making it the preferred method for mining on spatial data streams.
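The abstract above notes that measurements such as information gain can be calculated quickly once counts are available. As a rough sketch of why counts are all that is needed, the standard information-gain formula can be computed from per-value class counts alone. The function names and the input layout (one list of class counts per attribute value) are assumptions for illustration; how the counts are obtained from P-trees is not shown here.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a class-count vector."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(class_counts_by_value):
    """Information gain of splitting on one attribute.

    class_counts_by_value[i][j] is the number of samples that have
    the i-th attribute value and belong to the j-th class.
    """
    # Class totals over the whole data set (column sums).
    totals = [sum(column) for column in zip(*class_counts_by_value)]
    n = sum(totals)
    if n == 0:
        return 0.0
    # Weighted entropy remaining after the split.
    remainder = sum(
        (sum(part) / n) * entropy(part) for part in class_counts_by_value
    )
    return entropy(totals) - remainder
```

An attribute that separates two equally sized classes perfectly, such as `[[4, 0], [0, 4]]`, yields a gain of 1 bit, while `[[2, 2], [2, 2]]` yields 0.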
acm symposium on applied computing | 2002
Qin Ding; Maleq Khan; Amalendu Roy; William Perrizo
The Peano Count Tree (P-tree) is a quadrant-based lossless tree representation of the original spatial data. The idea of P-tree is to recursively divide the entire spatial data, such as Remotely Sensed Imagery data, into quadrants and record the count of 1-bits for each quadrant, thus forming a quadrant count tree. Using P-tree structure, all the count information can be calculated quickly. This facilitates efficient ways for data mining. In this paper, we will focus on the algebra and properties of P-tree structure and its variations. We have implemented fast algorithms for P-tree generation and P-tree operations. Our performance analysis shows P-tree has small space and time costs compared to the original data. We have also implemented some data mining algorithms using P-trees, such as Association Rule Mining, Decision Tree Classification and K-Clustering.
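The quadrant count tree described above can be sketched as follows for a single bit plane. This is a simplified illustration, not the authors' implementation: it assumes a square input whose side is a power of two, and it assumes that recursion stops at pure quadrants (all 0s or all 1s), which is one way the representation could stay compact. The names `PNode` and `build_ptree` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PNode:
    count: int                      # number of 1-bits in this quadrant
    size: int                       # side length of the quadrant
    children: List["PNode"] = field(default_factory=list)


def build_ptree(bits, row=0, col=0, size=None):
    """Build a quadrant count tree over a 2^n x 2^n matrix of 0/1 values.

    Children are ordered top-left, top-right, bottom-left, bottom-right.
    Counts are recomputed at every level for clarity, not speed.
    """
    if size is None:
        size = len(bits)
    count = sum(
        bits[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    node = PNode(count=count, size=size)
    # Assumption: pure quadrants and single cells are leaves.
    if size == 1 or count == 0 or count == size * size:
        return node
    half = size // 2
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        node.children.append(build_ptree(bits, row + dr, col + dc, half))
    return node
```

Because each node stores only a count and mixed quadrants keep their children, the original bits can be recovered from the tree, which is consistent with the lossless property stated in the abstract.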
Nature Communications | 2014
John W. Stiller; John Schreiber; Jipei Yue; Hui Guo; Qin Ding; Jinling Huang
Chromist algae include diverse photosynthetic organisms of great ecological and social importance. Despite vigorous research efforts, a clear understanding of how various chromists acquired photosynthetic organelles has been complicated by conflicting phylogenetic results, along with an undetermined number and pattern of endosymbioses, and the horizontal movement of genes that accompany them. We apply novel statistical approaches to assess impacts of endosymbiotic gene transfer on three principal chromist groups at the heart of long-standing controversies. Our results provide robust support for acquisitions of photosynthesis through serial endosymbioses, beginning with the adoption of a red alga by cryptophytes, then a cryptophyte by the ancestor of ochrophytes, and finally an ochrophyte by the ancestor of haptophytes. Resolution of how chromist algae are related through endosymbioses provides a framework for unravelling the further reticulate history of red algal-derived plastids, and for clarifying evolutionary processes that gave rise to eukaryotic photosynthetic diversity.
knowledge discovery and data mining | 2002
Qin Ding; Qiang Ding; William Perrizo
Association Rule Mining, originally proposed for market basket data, has potential applications in many areas. Remote Sensed Imagery (RSI) data is one of the promising application areas. Extracting interesting patterns and rules from datasets composed of images and associated ground data can be of importance in precision agriculture, community planning, resource discovery and other areas. However, in most cases the image data sizes are too large to be mined in a reasonable amount of time using existing algorithms. In this paper, we propose an approach to derive association rules on RSI data using the Peano Count Tree (P-tree) structure. The P-tree structure, proposed in our previous work, provides a lossless and compressed representation of image data. Based on P-trees, an efficient association rule mining algorithm, P-ARM, with fast support calculation and significant pruning techniques is introduced to improve the efficiency of the rule mining process. The P-ARM algorithm is implemented and compared with the FP-growth and Apriori algorithms. Experimental results showed that our algorithm is superior for association rule mining on RSI spatial data.
systems man and cybernetics | 2008
Qin Ding; Qiang Ding; William Perrizo
Association rule mining, originally proposed for market basket data, has potential applications in many areas. Spatial data, such as remote sensed imagery (RSI) data, is one of the promising application areas. Extracting interesting patterns and rules from spatial data sets, composed of images and associated ground data, can be of importance in precision agriculture, resource discovery, and other areas. However, in most cases, the sizes of the spatial data sets are too large to be mined in a reasonable amount of time using existing algorithms. In this paper, we propose an efficient approach to derive association rules from spatial data using Peano count tree (P-tree) structure. P-tree structure provides a lossless and compressed representation of spatial data. Based on P-trees, an efficient association rule mining algorithm PARM with fast support calculation and significant pruning techniques is introduced to improve the efficiency of the rule mining process. The P-tree based association rule mining (PARM) algorithm is implemented and compared with FP-growth and Apriori algorithms. Experimental results showed that our algorithm is superior for association rule mining on RSI spatial data.
web age information management | 2001
William Perrizo; Qin Ding; Qiang Ding; Amalendu Roy
The traditional task of association rule mining is to find all rules with high support and high confidence. In some applications, such as mining spatial datasets for natural resource location, the task is to find high confidence rules even though the support may be low. In still other applications, such as the identification of agricultural pest infestations, the task is to find high confidence rules preferably while the support is still very low. The basic Apriori algorithm cannot be used to solve these problems efficiently since it relies on first identifying all high support itemsets. In this paper, we propose a new model to derive high confidence rules for spatial data regardless of their support level. A new data structure, the Peano Count Tree (P-tree), is used in our model to represent all the information we need. P-trees represent spatial data bit-by-bit in a recursive quadrant-by-quadrant arrangement. Based on the P-tree, we build a special data cube, the Tuple Count Cube (T-cube), to derive high confidence rules. Our algorithm for deriving confident rules is fast and efficient. In addition, we discuss some strategies for avoiding over-fitting (removing redundant and misleading rules).
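The distinction between support and confidence that motivates the abstract above can be made concrete with a small sketch using the standard definitions: support is the fraction of all transactions containing both sides of a rule, and confidence is that count divided by the number of transactions containing the antecedent. The function name and the item labels in the usage note are hypothetical, and this plain scan is not the P-tree or T-cube method the paper proposes.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    transactions: non-empty list of item collections (for spatial data,
    one transaction per pixel is a natural reading, assumed here).
    """
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence
```

For example, if only 2 of 100 pixels carry a hypothetical item `"band_high"` and both of them also carry `"pest"`, the rule `band_high -> pest` has support 0.02 but confidence 1.0. A support threshold applied first would discard it, which is the situation the abstract describes for Apriori.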
acm symposium on applied computing | 2000
Jianning Dong; William Perrizo; Qin Ding; Jingkai Zhou
The explosive growth in data and databases has generated an urgent need for new techniques and tools that can intelligently and automatically transform the processed data into useful information and knowledge. Data mining is such a technique. In this paper, we consider the mining of association rules from remotely sensed data and its application in precision agriculture. Based on the characteristics of the remotely sensed data and the problem itself, we present a bit-oriented formal model and discuss the issues of partitioning quantitative attributes into equal, unequal and discontinuous partitions. We propose two new pruning techniques and compare their performance with a base algorithm. An improvement in performance is shown when using these pruning techniques.
International Journal of Business Intelligence and Data Mining | 2007
William Perrizo; Qin Ding; Maleq Khan; Anne M. Denton; Qiang Ding
The k-nearest neighbour (KNN) technique is a simple yet effective method for classification. In this paper, we propose an efficient weighted nearest neighbour classification algorithm, called PINE, using vertical data representation. A metric called HOBBit is used as the distance metric. The PINE algorithm applies a Gaussian podium function to set weights to different neighbours. We compare PINE with classical KNN methods using horizontal and vertical representation with different distance metrics. The experimental results show that PINE outperforms other KNN methods in terms of classification accuracy and running time.