Raj P. Gopalan | Researchain



Publication

Featured research published by Raj P. Gopalan.

computer and information technology | 2007

CTU-Mine: An Efficient High Utility Itemset Mining Algorithm Using the Pattern Growth Approach

Alva Erwin; Raj P. Gopalan; N. R. Achuthan

Frequent pattern mining discovers patterns in transaction databases based only on the relative frequency of occurrence of items without considering their utility. For many real world applications, however, utility of itemsets based on cost, profit or revenue is of importance. The utility mining problem is to find itemsets that have higher utility than a user specified minimum. Unlike itemset support in frequent pattern mining, itemset utility does not have the anti-monotone property and so efficient high utility mining poses a greater challenge. Recent research on utility mining has been based on the candidate-generation-and-test approach which is suitable for sparse data sets with short patterns, but not feasible for dense data sets or long patterns. In this paper we propose a new algorithm called CTU-Mine that mines high utility itemsets using the pattern growth approach. We have tested our algorithm on several dense data sets, compared it with the recent algorithms and the results show that our algorithm works efficiently.

knowledge discovery and data mining | 2008

Efficient mining of high utility itemsets from large datasets

Alva Erwin; Raj P. Gopalan; N. R. Achuthan

High utility itemsets mining extends frequent pattern mining to discover itemsets in a transaction database with utility values above a given threshold. However, mining high utility itemsets presents a greater challenge than frequent itemset mining, since high utility itemsets lack the anti-monotone property of frequent itemsets. Transaction Weighted Utility (TWU) proposed recently by researchers has anti-monotone property, but it is an overestimate of itemset utility and therefore leads to a larger search space. We propose an algorithm that uses TWU with pattern growth based on a compact utility pattern tree data structure. Our algorithm implements a parallel projection scheme to use disk storage when the main memory is inadequate for dealing with large datasets. Experimental evaluation shows that our algorithm is more efficient compared to previous algorithms and can mine larger datasets of both dense and sparse data containing long patterns.
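The difference between itemset utility and TWU can be illustrated with a small sketch. It is not the algorithm from the paper: the transactions, the profit table, and the function names are hypothetical, and the definitions used are the commonly cited ones (utility as quantity times unit profit, TWU as the summed utility of every transaction containing the itemset).

```python
# Hypothetical toy data: each transaction maps an item to its purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]
# Hypothetical unit profit per item.
PROFIT = {"a": 5, "b": 1, "c": 2}


def transaction_utility(transaction):
    """Total utility of all items in one transaction."""
    return sum(qty * PROFIT[item] for item, qty in transaction.items())


def utility(itemset):
    """Utility of an itemset, summed over the transactions that contain it."""
    return sum(
        sum(t[item] * PROFIT[item] for item in itemset)
        for t in TRANSACTIONS
        if all(item in t for item in itemset)
    )


def twu(itemset):
    """Transaction Weighted Utility: whole-transaction utility of each
    transaction that contains the itemset."""
    return sum(
        transaction_utility(t)
        for t in TRANSACTIONS
        if all(item in t for item in itemset)
    )


if __name__ == "__main__":
    for itemset in ({"b"}, {"a", "b"}):
        print(sorted(itemset), "utility:", utility(itemset), "TWU:", twu(itemset))
```

In this toy data the superset {a, b} has a higher utility than {b}, so utility cannot be used to prune supersets, whereas TWU never increases when an item is added and always bounds the utility from above.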

australasian joint conference on artificial intelligence | 2004

Effective sampling for mining association rules

Yanrong Li; Raj P. Gopalan

As discovering association rules in a very large database is time consuming, researchers have developed many algorithms to improve the efficiency. Sampling can significantly reduce the cost of mining, since the mining algorithms need to deal with only a small dataset compared to the original database. Especially, if data comes as a stream flowing at a faster rate than can be processed, sampling seems to be the only choice. How to sample the data and how big the sample size should be for a given error bound and confidence level are key issues for particular data mining tasks. In this paper, we derive the sufficient sample size based on central limit theorem for sampling large datasets with replacement. This approach requires smaller sample size than that based on the Chernoff bounds and is effective for association rules mining. The effectiveness of the method has been evaluated on both dense and sparse datasets.
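As a rough illustration of why a central-limit-theorem bound can need fewer samples than a Chernoff-style bound, the sketch below compares two textbook formulas for estimating an itemset's support to within an error eps with confidence 1 - delta. These formulas are assumptions chosen for illustration (a worst-case normal approximation and the additive Chernoff/Hoeffding bound); the abstract does not state the exact expressions derived in the paper, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def clt_sample_size(eps, delta):
    """Normal-approximation bound, using the worst case p(1 - p) <= 1/4."""
    z = NormalDist().inv_cdf(1 - delta / 2)  # two-sided critical value
    return math.ceil(z * z / (4 * eps * eps))


def chernoff_sample_size(eps, delta):
    """Additive Chernoff/Hoeffding bound: n >= ln(2 / delta) / (2 * eps^2)."""
    return math.ceil(math.log(2 / delta) / (2 * eps * eps))


if __name__ == "__main__":
    eps, delta = 0.01, 0.05
    print("CLT bound:     ", clt_sample_size(eps, delta))
    print("Chernoff bound:", chernoff_sample_size(eps, delta))
```

For an error bound of 0.01 at 95% confidence, these two formulas give roughly 9,600 and 18,400 samples respectively.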

australasian joint conference on artificial intelligence | 2004

Building a more accurate classifier based on strong frequent patterns

Yudho Giri Sucahyo; Raj P. Gopalan

The classification problem in data mining is to discover models from training data for classifying unknown instances. Associative classification builds the classifier rules using association rules and it is more accurate compared to previous methods. In this paper, a new method named CSFP that builds a classifier from strong frequent patterns without the need to generate association rules is presented. We address the rare item problem by using a partitioning method. Rules generated are stored using a compact data structure named CP-Tree and a series of pruning methods are employed to discard weak frequent patterns. Experimental results show that our classifier is more accurate than previous associative classification methods as well as other state-of-the-art non-associative classifiers.

australian joint conference on artificial intelligence | 2002

TreeITL-Mine: Mining Frequent Itemsets Using Pattern Growth, Tid Intersection, and Prefix Tree

Raj P. Gopalan; Yudho Giri Sucahyo

An important problem in data mining is the discovery of association rules that identify relationships among sets of items. Finding frequent itemsets is computationally the most expensive step in association rules mining, and so most of the research attention has been focused on it. In this paper, we present a more efficient algorithm for mining frequent itemsets. In designing our algorithm, we have combined the ideas of pattern-growth, tid-intersection and prefix trees, with significant modifications. We present performance comparisons of our algorithm against the fastest Apriori algorithm, and the recently developed H-Mine algorithm. We have tested all the algorithms using several widely used test datasets. The performance results indicate that our algorithm significantly reduces the processing time for mining frequent itemsets in dense data sets that contain relatively long patterns.
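Of the three ideas combined in the paper, tid-intersection is the simplest to show in isolation. The sketch below is not TreeITL-Mine itself: it only demonstrates, on hypothetical data and with hypothetical function names, how the support of an itemset can be counted by intersecting the sets of transaction ids (tids) in which each item occurs.

```python
from collections import defaultdict


def build_tidlists(transactions):
    """Map each item to the set of transaction ids that contain it."""
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def support(itemset, tidlists):
    """Support count of a non-empty itemset via tid-set intersection."""
    tid_sets = [tidlists.get(item, set()) for item in itemset]
    return len(set.intersection(*tid_sets))


if __name__ == "__main__":
    # Hypothetical toy database of four transactions.
    transactions = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    tidlists = build_tidlists(transactions)
    print(support({"bread", "milk"}, tidlists))
```

Once the tid lists are built, the support of any candidate itemset is obtained without rescanning the transactions.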

australasian joint conference on artificial intelligence | 2003

Efficiently mining frequent patterns from dense datasets using a cluster of computers

Yudho Giri Sucahyo; Raj P. Gopalan; Amit Rudra

Efficient mining of frequent patterns from large databases has been an active area of research since it is the most expensive step in association rules mining. In this paper, we present an algorithm for finding complete frequent patterns from very large dense datasets in a cluster environment. The data needs to be distributed to the nodes of the cluster only once and the mining can be performed in parallel many times with different parameter settings for minimum support. The algorithm is based on a master-slave scheme where a coordinator controls the data parallel programs running on a number of nodes of the cluster. The parallel program was executed on a cluster of Alpha SMPs. The performance of the algorithm was studied on small and large dense datasets. We report the results of the experiments that show both speed up and scale up of our algorithm along with our conclusions and pointers for further work.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2018

A Review on Methods for Detecting SNP Interactions in High-Dimensional Genomic Data

Suneetha Uppu; Aneesh Krishna; Raj P. Gopalan

In this era of genome-wide association studies (GWAS), the quest for understanding the genetic architecture of complex diseases is rapidly increasing more than ever before. The development of high throughput genotyping and next generation sequencing technologies enables genetic epidemiological analysis of large scale data. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) responsible for disease susceptibility. The interactions between SNPs associated with complex diseases are increasingly being explored in the current literature. These interaction studies are mathematically challenging and computationally complex. These challenges have been addressed by a number of data mining and machine learning approaches. This paper reviews the current methods and the related software packages to detect the SNP interactions that contribute to diseases. The issues that need to be considered when developing these models are addressed in this review. The paper also reviews the achievements in data simulation to evaluate the performance of these models. Further, it discusses the future of SNP interaction analysis.

australian joint conference on artificial intelligence | 2006

Clustering transactional data streams

Yanrong Li; Raj P. Gopalan

The challenge of mining data streams is three fold. Firstly, an algorithm for a particular data mining task is subject to the sequential one-pass constraint; secondly, it must work under bounded resources such as memory and disk space; thirdly, it should have capabilities to answer time-sensitive queries. Dealing with transactional data streams is even more challenging due to their high dimensionality and sparseness. In this paper, algorithms for clustering transactional data streams are proposed by incorporating the incremental clustering algorithm INCLUS into the equal-width time window model and the elastic time window model. These algorithms can efficiently cluster a transactional data stream in one pass and answer time sensitive queries at different granularities with limited resources.

intelligent data engineering and automated learning | 2003

Improving the Efficiency of Frequent Pattern Mining by Compact Data Structure Design

Raj P. Gopalan; Yudho Giri Sucahyo

Mining frequent patterns has been a topic of active research because it is computationally the most expensive step in association rule discovery. In this paper, we discuss the use of compact data structure design for improving the efficiency of frequent pattern mining. It is based on our work in developing efficient algorithms that outperform the best available frequent pattern algorithms on a number of typical data sets. We discuss improvements to the data structure design that have resulted in faster frequent pattern discovery. The performance of our algorithms is studied by comparing their running times on typical test data sets against the fastest Apriori, Eclat, FP-Growth and OpportuneProject algorithms. We discuss the performance results as well as the strengths and limitations of our algorithms.

Journal of Software | 2016

A Deep Learning Approach to Detect SNP Interactions

Suneetha Uppu; Aneesh Krishna; Raj P. Gopalan

The susceptibility of complex diseases is characterised by numerous genetic, lifestyle, and environmental causes individually or due to their interaction effects. The recent explosion in detecting genetic interacting factors is increasingly revealing the underlying biological networks behind complex diseases. Several computational methods are explored to discover interacting polymorphisms among unlinked loci. However, there has been no significant breakthrough towards solving this problem because of biomolecular complexities and computational limitations. Our previous research trained a deep multilayered feedforward neural network to predict two-locus polymorphisms due to interactions in genome-wide data. The performance of the method was studied on numerous simulated datasets and a published genome-wide dataset. In this manuscript, the performance of the trained multilayer neural network is validated by varying the parameters of the models under various scenarios. Furthermore, the observations of the previous method are confirmed in this study by evaluating on a real dataset. The experimental findings on a real dataset show a significant rise in the prediction accuracy over other conventional techniques. The result shows highly ranked interacting two-locus polymorphisms, which may be associated with susceptibility for the development of breast cancer.
