Nikhil S. Ketkar | Researchain

Publication

Featured research published by Nikhil S. Ketkar.

Proceedings of the 1st international workshop on open source data mining | 2005

Subdue: compression-based frequent pattern discovery in graph data

Nikhil S. Ketkar; Lawrence B. Holder; Diane J. Cook

Most existing algorithms for mining graph datasets target complete, frequent sub-graph discovery. We describe the graph-based data mining system Subdue, which uses a heuristic algorithm to discover sub-graphs that are not only frequent but also compress the graph dataset. The rationale for a compression-based approach to frequent pattern discovery is to produce a small number of highly interesting patterns rather than a large number of patterns from which the interesting ones must then be identified. We perform an experimental comparison of Subdue with the graph mining systems gSpan and FSG on the Chemical Toxicity and the Chemical Compounds datasets that are provided with gSpan. We present results on the performance of the Subdue system on the Mutagenesis and the KDD 2003 Citation Graph datasets. An analysis of the results indicates that Subdue can efficiently discover best-compressing frequent patterns, which are fewer in number but can be of higher interest.
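The idea of preferring patterns that compress the graph, rather than patterns that are merely frequent, can be illustrated with a small sketch. It assumes a simple size-based measure (vertex count plus edge count) and non-overlapping instances that are each collapsed into one placeholder vertex; this is a simplification for illustration, not necessarily the exact encoding Subdue uses. The function name `compression_value` and all the numbers are hypothetical.

```python
def compression_value(graph_v, graph_e, sub_v, sub_e, instances):
    """Size-based compression score for a candidate sub-graph (illustrative only).

    Assumes size = vertices + edges, and that each of the `instances`
    non-overlapping occurrences is replaced by a single placeholder vertex.
    Higher values mean the pattern compresses the graph better.
    """
    size_g = graph_v + graph_e
    size_s = sub_v + sub_e
    # Each instance loses its sub_v vertices and gains one placeholder vertex.
    compressed_v = graph_v - instances * sub_v + instances
    # Edges internal to each instance disappear; all other edges remain.
    compressed_e = graph_e - instances * sub_e
    size_g_given_s = compressed_v + compressed_e
    return size_g / (size_s + size_g_given_s)


# Hypothetical graph with 20 vertices and 30 edges.
triangle = compression_value(20, 30, sub_v=3, sub_e=3, instances=4)
single_edge = compression_value(20, 30, sub_v=2, sub_e=1, instances=6)
print(triangle > single_edge)  # the rarer but larger pattern scores higher
```

In this toy setting the single-edge pattern occurs more often, yet the triangle removes more structure per occurrence, so a compression-based ranking prefers it over a purely frequency-based one.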

Sigkdd Explorations | 2005

Comparison of graph-based and logic-based multi-relational data mining

Nikhil S. Ketkar; Lawrence B. Holder; Diane J. Cook

We perform an experimental comparison of the graph-based multi-relational data mining system, Subdue, and the inductive logic programming system, CProgol, on the Mutagenesis dataset and various artificially generated Bongard problems. Experimental results indicate that Subdue can significantly outperform CProgol when discovering structurally large multi-relational concepts. It is also observed that CProgol is better at learning semantically complicated concepts and tends to use background knowledge more effectively than Subdue. An analysis of the results indicates that the differences in the performance of the systems result from the difference in expressiveness between the logic-based and the graph-based representations. The ability of graph-based systems to learn structurally large concepts comes from the use of a weaker representation whose expressiveness is intermediate between propositional and first-order logic. This weaker representation is advantageous when learning structurally large concepts, but it limits the learning of semantically complicated concepts and the utilization of background knowledge.

Journal of Medicinal Chemistry | 2008

High confidence predictions of drug-drug interactions: predicting affinities for cytochrome P450 2C9 with multiple computational methods.

Matthew Hudelson; Nikhil S. Ketkar; Lawrence B. Holder; Timothy J. Carlson; Chi Chi Peng; Benjamin Waldher; Jeffrey P. Jones

Four different models are used to predict whether a compound will bind to 2C9 with a K_i value of less than 10 μM. A training set of 276 compounds and a diverse validation set of 50 compounds were used to build and assess each model. The modeling methods are chosen to exploit the differences in how training sets are used to develop the predictive models. Two of the four methods develop partitioning trees based on global descriptions of structure using nine descriptors. A third method uses the same descriptors to develop local descriptions that relate activity to structures with similar descriptor characteristics. The fourth method uses a graph-theoretic approach to predict activity based on molecular structure. When all of these methods agree, the predictive accuracy is 94%. An external validation set of 11 compounds gives a predictive accuracy of 91% when all methods agree.
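The "accuracy when all methods agree" figure can be read as a consensus rule: a prediction is reported only for compounds on which every model gives the same answer, and accuracy is measured over that subset. A minimal sketch of that bookkeeping follows. The function names and the toy votes are hypothetical and are not the data or models from the paper.

```python
def consensus_predictions(model_outputs):
    """Return the shared prediction per compound, or None where models disagree.

    `model_outputs` is a list with one list of boolean predictions per model
    (True meaning "predicted to bind"), all in the same compound order.
    """
    consensus = []
    for votes in zip(*model_outputs):
        consensus.append(votes[0] if len(set(votes)) == 1 else None)
    return consensus


def accuracy_when_agreeing(model_outputs, truth):
    """Accuracy restricted to the compounds on which every model agrees."""
    pairs = [(pred, actual)
             for pred, actual in zip(consensus_predictions(model_outputs), truth)
             if pred is not None]
    if not pairs:
        return None  # no compound had a unanimous prediction
    return sum(pred == actual for pred, actual in pairs) / len(pairs)


# Toy example: four hypothetical models, five hypothetical compounds.
votes = [
    [True, True, False, False, True],
    [True, False, False, False, True],
    [True, True, False, False, True],
    [True, True, False, True, True],
]
truth = [True, True, False, False, False]
print(consensus_predictions(votes))
print(accuracy_when_agreeing(votes, truth))
```

The trade-off in such a scheme is coverage: compounds on which the models disagree receive no consensus prediction at all.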

computational intelligence and data mining | 2009

Empirical comparison of graph classification algorithms

Nikhil S. Ketkar; Lawrence B. Holder; Diane J. Cook

The graph classification problem is learning to classify separate, individual graphs in a graph database into two or more categories. A number of algorithms have been introduced for the graph classification problem. We present an empirical comparison of the major approaches for graph classification introduced in the literature, namely SubdueCL, frequent subgraph mining in conjunction with SVMs, the walk-based graph kernel, frequent subgraph mining in conjunction with AdaBoost, and DT-CLGBI. Experiments are performed on five real-world data sets from the Mutagenesis and Predictive Toxicology domains, which are considered benchmark data sets for the graph classification problem. Additionally, experiments are performed on a corpus of artificial data sets constructed to investigate the performance of the algorithms across a variety of parameters of interest. Our conclusions are as follows. In datasets where the underlying concept has a high average degree, walk-based graph kernels perform poorly compared to other approaches. The hypothesis space of the kernel is walks, which is insufficient for capturing concepts involving significant structure. In datasets where the underlying concept is disconnected, SubdueCL performs poorly compared to other approaches. The hypothesis space of SubdueCL is connected graphs, which is insufficient for capturing concepts that consist of a disconnected graph. FSG+SVM, FSG+AdaBoost, and DT-CLGBI have comparable performance in most cases.

international geoscience and remote sensing symposium | 2008

The Integration of Graph-Based Knowledge Discovery with Image Segmentation Hierarchies for Data Analysis, Data Mining and Knowledge Discovery

James C. Tilton; Diane J. Cook; Nikhil S. Ketkar

Currently available pixel-based image analysis techniques do not effectively extract the information content from the increasingly available high spatial resolution remotely sensed imagery data. We are exploring an approach to object-based image analysis in which hierarchical image segmentations provided by the Recursive Hierarchical Segmentation (RHSEG) software are analyzed by the Subdue graph-based knowledge-discovery system. In this paper we discuss our initial approach to representing the RHSEG-produced hierarchical image segmentations in a graphical form understandable by Subdue, and discuss results from real and simulated data.

knowledge discovery and data mining | 2005

Qualitative comparison of graph-based and logic-based multi-relational data mining: a case study

Nikhil S. Ketkar; Lawrence B. Holder; Diane J. Cook

The goal of this paper is to generate insights about the differences between graph-based and logic-based approaches to multi-relational data mining by performing a case study of the graph-based system Subdue and the inductive logic programming system CProgol. We identify three key factors for comparing graph-based and logic-based multi-relational data mining: the ability to discover structurally large concepts, the ability to discover semantically complicated concepts, and the ability to effectively utilize background knowledge. We perform an experimental comparison of Subdue and CProgol on the Mutagenesis domain and various artificially generated Bongard problems. Experimental results indicate that Subdue can significantly outperform CProgol when discovering structurally large multi-relational concepts. It is also observed that CProgol is better at learning semantically complicated concepts and tends to use background knowledge more effectively than Subdue.

computational intelligence and data mining | 2009

Faster computation of the direct product kernel for graph classification

Nikhil S. Ketkar; Lawrence B. Holder; Diane J. Cook

The direct product kernel, introduced by Gärtner et al. for graph classification, is based on defining a feature for every possible label sequence in a labelled graph and counting how many label sequences in two given graphs are identical. Although the direct product kernel has achieved promising results in terms of accuracy, the kernel computation is not feasible for large graphs. This is because computing the direct product kernel for two graphs essentially requires either inverting or diagonalizing the adjacency matrix of the direct product of these two graphs. For two graphs with adjacency matrices of sizes m and n, the adjacency matrix of their direct product graph can be of size mn in the worst case. As both matrix inversion and matrix diagonalization are O(n^3) in the general case, computing the direct product kernel is O((mn)^3). Our survey of data sets in graph classification indicates that most graphs have adjacency matrices of sizes on the order of hundreds, which often leads to adjacency matrices of direct product graphs (of two graphs) having sizes on the order of thousands. In this work we show how the direct product kernel can be computed in O((m + n)^3). The key insight behind our result is that the language of label sequences in a labelled graph is a regular language and that regular languages are closed under union and intersection.
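To make the cost of the straightforward approach concrete, the sketch below builds the direct product graph explicitly and counts matching walks up to a fixed length with a decay weight. It is a truncated-series illustration of the naive computation, not the faster method the abstract describes, and it assumes directed graphs with vertex labels only. The function names, the `decay` and `max_len` parameters, and the example graphs are all hypothetical.

```python
from itertools import product


def direct_product_graph(labels1, edges1, labels2, edges2):
    """Build the direct product of two vertex-labelled directed graphs.

    A product vertex is a pair (u, v) whose labels match; there is an edge
    (u, v) -> (u2, v2) when u -> u2 is in graph 1 and v -> v2 is in graph 2.
    With m and n vertices this can produce up to m * n product vertices.
    """
    nodes = [(u, v)
             for u, v in product(range(len(labels1)), range(len(labels2)))
             if labels1[u] == labels2[v]]
    index = {pair: i for i, pair in enumerate(nodes)}
    adj = [[] for _ in nodes]
    for (u, u2), (v, v2) in product(edges1, edges2):
        if (u, v) in index and (u2, v2) in index:
            adj[index[(u, v)]].append(index[(u2, v2)])
    return nodes, adj


def truncated_product_kernel(g1, g2, decay=0.5, max_len=3):
    """Sum decay**k times the number of length-k walks in the product graph,
    for k = 0 .. max_len. Each such walk is a label sequence shared by both graphs."""
    nodes, adj = direct_product_graph(g1[0], g1[1], g2[0], g2[1])
    # walks[i] = number of walks of the current length starting at product vertex i
    walks = [1] * len(nodes)
    total = 0.0
    for k in range(max_len + 1):
        total += (decay ** k) * sum(walks)
        walks = [sum(walks[j] for j in adj[i]) for i in range(len(nodes))]
    return total


# Each graph is (vertex labels, directed edges as index pairs).
g1 = (["A", "B"], [(0, 1)])
g2 = (["A", "B", "B"], [(0, 1), (0, 2)])
print(truncated_product_kernel(g1, g2))  # 3 matching vertices + 0.5 * 2 matching edges = 4.0
```

Even in this truncated form, the work is driven by the size of the product graph, which is why avoiding its explicit construction matters for graphs with hundreds of vertices.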

Archive | 2006