D. K. Subramanian
Indian Institute of Science
Publications
Featured research published by D. K. Subramanian.
Pattern Recognition Letters | 2003
V. S. Ananthanarayana; M. Narasimha Murty; D. K. Subramanian
In data mining, an important goal is to generate an abstraction of the data. Such an abstraction helps in reducing the space and search time requirements of the overall decision making process. Further, it is important that the abstraction is generated from the data with a small number of disk scans. We propose a novel data structure, pattern count tree (PC-tree), that can be built by scanning the database only once. PC-tree is a minimal size complete representation of the data and it can be used to represent dynamic databases with the help of knowledge that is either static or changing. We show that further compactness can be achieved by constructing the PC-tree on segmented patterns. We exploit the flexibility offered by rough sets to realize a rough PC-tree and use it for efficient and effective rough classification. To be consistent with the sizes of the branches of the PC-tree, we use upper and lower approximations of feature sets in a manner different from the conventional rough set theory. We conducted experiments using the proposed classification scheme on a large-scale hand-written digit data set. We use the experimental results to establish the efficacy of the proposed approach.
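The single-scan construction can be pictured as a count-annotated trie over transactions. The sketch below is a minimal toy version of the idea, not the paper's exact data structure; the node layout and the canonical item ordering are assumptions:

```python
# Minimal sketch of a pattern count tree: a trie over (sorted) transaction
# items in which every node carries a count. One pass over the database
# suffices to build it, and shared prefixes are stored only once, which is
# where the compactness comes from.
class PCNode:
    def __init__(self):
        self.count = 0
        self.children = {}

def build_pc_tree(transactions):
    root = PCNode()
    for txn in transactions:          # single scan of the database
        node = root
        for item in sorted(txn):      # canonical order maximizes prefix sharing
            node = node.children.setdefault(item, PCNode())
            node.count += 1           # every pattern passing through this node
    return root
```

For example, after inserting the transactions {a,b}, {a,c} and {a,b,c}, the node for `a` has count 3 while the `b` node below it has count 2, so common prefixes are stored once with their frequencies.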
IEEE International Conference on High Performance Computing Data and Analytics | 2000
V. S. Ananthanarayana; D. K. Subramanian; M. N. Murty
We propose a novel pattern tree called the Pattern Count tree (PC-tree), which is a complete and compact representation of the database. We show that constructing this tree and then generating all large itemsets requires a single database scan, whereas current algorithms need at least two database scans. The completeness property of the PC-tree with respect to the database makes it amenable to mining association rules in the context of changing data and knowledge, which we call dynamic mining. Algorithms based on the PC-tree are scalable because the PC-tree is compact. We propose a partitioned distributed architecture and an efficient distributed association rule mining algorithm based on the PC-tree structure.
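The completeness property means the support of any itemset can be read off the tree without rescanning the database. The self-contained block below repeats a toy count trie and adds a hypothetical support query; it illustrates the principle only and is not the authors' algorithm:

```python
# Toy count trie (as in the PC-tree idea) plus a support query: the support
# of an itemset is the total count accumulated at the nodes where its last
# item is matched along any root-to-leaf path.
class PCNode:
    def __init__(self):
        self.count = 0
        self.children = {}

def build_pc_tree(transactions):
    root = PCNode()
    for txn in transactions:               # single database scan
        node = root
        for item in sorted(txn):
            node = node.children.setdefault(item, PCNode())
            node.count += 1
    return root

def support(node, items):
    """Count of transactions containing all of `items` (a sorted tuple,
    in the same canonical order used to build the tree)."""
    if not items:
        return node.count                  # last item matched at this node
    total = 0
    for item, child in node.children.items():
        if item == items[0]:
            total += support(child, items[1:])
        elif item < items[0]:              # item may still appear deeper
            total += support(child, items)
    return total
```

With transactions {a,b}, {a,c}, {a,b,c}, `support(root, ("a", "b"))` returns 2 and `support(root, ("c",))` returns 2, matching a direct count over the database.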
Pattern Recognition | 2001
V. S. Ananthanarayana; M. Narasimha Murty; D. K. Subramanian
Clustering is an activity of finding abstractions from data, and these abstractions can be used for decision making [1]. In this paper, we select cluster representatives as prototypes for efficient classification [3]. A variety of clustering algorithms have been reported in the literature. However, clustering algorithms that perform multiple scans of large databases (of terabyte size) residing on disk demand prohibitive computational times. As a consequence, there is a growing interest in designing clustering algorithms that scan the database only once. Algorithms like BIRCH [2], Leader [5] and the single-pass k-means algorithm [4] belong to this category.
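Of the single-scan algorithms cited, Leader admits a very short sketch; the distance function and threshold below are caller-supplied assumptions, not values from the paper:

```python
def leader(points, threshold, dist):
    """Single-pass leader clustering: the first point becomes a leader;
    each later point joins the nearest leader within `threshold`,
    otherwise it becomes a new leader itself."""
    leaders = []              # cluster representatives (prototypes)
    clusters = []             # cluster members, parallel to `leaders`
    for p in points:
        best, best_d = None, None
        for i, ld in enumerate(leaders):
            d = dist(p, ld)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= threshold:
            clusters[best].append(p)       # assign to an existing leader
        else:
            leaders.append(p)              # p starts a new cluster
            clusters.append([p])
    return leaders, clusters
```

Because each point is examined exactly once, the data is scanned a single time; the price is that the result depends on the presentation order of the points.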
Knowledge Based Systems | 2003
D. K. Subramanian; V. S. Ananthanarayana; M. Narasimha Murty
We introduce a knowledge-based approach to mining generalized association rules that is sound and interactive. The proposed mining is sound because our scheme uses knowledge to mine only those concepts that are of interest to the user. It is interactive because we provide a user-controllable parameter with which the user can mine interactively. For this, we use a taxonomy based on functionality and a restricted way of generalizing the items. We call such a taxonomy an A O taxonomy and the corresponding generalization A O generalization. We claim that this type of generalization is more meaningful since it is based on a semantic grouping of concepts. We use this knowledge to naturally exploit the mining of interesting negative association rules. We define the interestingness of association rules based on the level of the concepts in the taxonomy. We give an efficient algorithm based on the A O taxonomy which not only derives generalized association rules but also accesses the database only once.
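A toy sketch of mining at a user-chosen concept level might look as follows; the taxonomy contents here are invented for illustration and do not reflect the paper's A O taxonomy:

```python
from collections import Counter

# Hypothetical item-to-concept taxonomy (invented example data): items are
# lifted to their parent concept before support counting, so rules are mined
# over concepts the user cares about rather than over raw items.
parent = {"wheat bread": "bread", "rye bread": "bread",
          "skim milk": "milk", "whole milk": "milk"}

def generalize(txn, taxonomy):
    # Replace each item by its concept; duplicates collapse via the set,
    # so a transaction supports a concept at most once.
    return {taxonomy.get(item, item) for item in txn}

def generalized_support(transactions, taxonomy):
    counts = Counter()
    for txn in transactions:
        for concept in generalize(txn, taxonomy):
            counts[concept] += 1
    return counts
```

For instance, the two transactions {wheat bread, skim milk} and {rye bread} jointly give the concept "bread" a support of 2, even though no raw item reaches that count.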
Pattern Analysis and Applications | 2006
P. A. Vijaya; M. Narasimha Murty; D. K. Subramanian
In this paper, an efficient K-medians clustering (unsupervised) algorithm for prototype selection and a Supervised K-medians (SKM) classification technique for protein sequences are presented. For sequence data sets, a median string/sequence can be used as the cluster/group representative. In the K-medians clustering technique, a desired number of clusters, K, each represented by a median string/sequence, is generated, and these median sequences are used as prototypes for classifying new/test sequences, whereas in the SKM classification technique, the median sequence in each group/class of labelled protein sequences is determined and the set of median sequences is used as prototypes for classification. We found that the K-medians clustering technique outperforms the leader-based technique, and that the SKM classification technique performs better than the motif-based approach on the data sets used. We further use a simple technique to reduce the time and space requirements of protein sequence clustering and classification. During the training and testing phases, the similarity score between a pair of sequences is determined by selecting a portion of each sequence instead of the entire sequence; this is akin to selecting a subset of features for sequence data sets. The experimental results of the proposed method with the K-medians, SKM and Nearest Neighbour Classifier (NNC) techniques show that the Classification Accuracy (CA) using the generated prototypes does not degrade much, while the training and testing times are reduced significantly. Thus the results indicate that the similarity score need not be calculated over the entire length of the sequence to achieve a good CA. Space requirements are also reduced during both training and classification.
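The set-median idea can be sketched with plain edit distance standing in for the paper's similarity score (an assumption; the actual scoring scheme used for protein sequences is not specified here):

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def median_sequence(seqs):
    # The set median: the member minimizing total distance to all others.
    return min(seqs, key=lambda s: sum(edit_distance(s, t) for t in seqs))

def skm_classify(query, class_medians):
    # Supervised K-medians style decision: nearest class median wins.
    return min(class_medians, key=lambda c: edit_distance(query, class_medians[c]))
```

Training reduces to one median computation per labelled class, and classification to one distance evaluation per class prototype, rather than per training sequence.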
Pattern Recognition and Machine Intelligence | 2005
P. A. Vijaya; M. Narasimha Murty; D. K. Subramanian
In this paper, an efficient Hybrid Hierarchical Agglomerative Clustering (HHAC) technique is proposed for effective clustering and prototype selection for pattern classification. It uses the characteristics of both partitional (incremental) and Hierarchical Agglomerative Clustering (HAC) schemes. Initially, an incremental, partitional clustering algorithm, leader, is used to find the subgroups/subclusters. This reduces the time and space that would be incurred in forming the subclusters using conventional hierarchical agglomerative schemes or other methods. Then, only the subcluster representatives are merged by a hierarchical agglomerative scheme to obtain the required number of clusters, which requires less space and time than applying the scheme to the entire training set. Thus, this hybrid scheme is suitable for clustering large data sets, and it yields a hierarchical structure consisting of clusters and subclusters. The subcluster representatives of a cluster can also capture its arbitrary/non-spherical shape. The experimental results (Classification Accuracy (CA) using the prototypes obtained, and the computation time) of the proposed algorithm are promising.
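A minimal one-dimensional sketch of the two stages, assuming absolute difference as the distance and single-linkage merging of representatives (both assumptions for illustration, not the paper's exact choices):

```python
def leader(points, threshold):
    """Stage 1: single-pass leader pass yielding subcluster representatives."""
    reps = []
    for p in points:
        for r in reps:
            if abs(p - r) <= threshold:
                break                     # absorbed by an existing subcluster
        else:
            reps.append(p)                # p becomes a new representative
    return reps

def agglomerate(reps, k):
    """Stage 2: agglomerative merging of the (few) representatives only,
    using single linkage, until k groups remain."""
    groups = [[r] for r in reps]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(abs(a - b) for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i].extend(groups.pop(j))   # merge the two closest groups
    return groups
```

The expensive pairwise stage runs only on the leader representatives, which is the source of the time and space savings over running HAC on the full training set.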
International Conference on Pattern Recognition | 2004
P.A. Vijaya; M.N. Murty; D. K. Subramanian
A technique to reduce time and space during protein sequence clustering and classification is presented. During the training and testing phases, the similarity score between a pair of sequences is determined by selecting a portion of each sequence instead of the entire sequence; this is akin to selecting a subset of features for sequence data sets. The experimental results of the proposed method show that the classification accuracy (CA) using the generated prototypes does not degrade much, while the training and testing times are reduced significantly. Thus the results indicate that the similarity score need not be calculated over the entire length of the sequence to achieve a good CA. Space requirements are also reduced during the execution phase. We have tested this using the K-medians, supervised K-medians and nearest neighbour classifier (NNC) techniques.
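The portion-based scoring can be sketched as computing a score over a leading fraction of each sequence only; the fraction parameter and the simple match-count score below are assumptions for illustration, not the paper's scoring scheme:

```python
def prefix_score(a, b, frac=0.5):
    """Similarity score computed on a leading portion of each sequence.
    `frac` controls how much of the shorter sequence is used; smaller
    fractions trade a little accuracy for large time/space savings."""
    n = max(1, int(min(len(a), len(b)) * frac))
    # Toy score: count of matching positions in the selected prefix.
    return sum(x == y for x, y in zip(a[:n], b[:n]))
```

Since pairwise scoring dominates both training and testing, shrinking the compared span by a factor shrinks that cost by roughly the same factor (or more, for quadratic alignment scores).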
Pattern Recognition | 2001
V. S. Ananthanarayana; M. Narasimha Murty; D. K. Subramanian
Clustering is an activity of finding abstractions from data [1]. These abstractions are mainly used for (i) identifying outliers and (ii) other decision-making activities. In this paper, we propose a novel application, association rule mining (ARM), based on abstractions/cluster descriptions obtained using clustering. A majority of the ARM algorithms proposed in the literature are used for mining intra-transaction association rules. Typically, ARM algorithms find large itemsets in the transaction database based on the frequency of co-occurrence of items. ARM involves two main steps [3]: (i) generating large itemsets whose frequency of co-occurrence (support) is greater than or equal to the user-defined minimum support (σ), and (ii) generating association rules from the large itemsets that satisfy the user-defined minimum confidence (c). Mining the complete set of inter-transaction association rules along a single dimension, the time axis, is proposed in [3]. In this paper, we propose a clustering scheme based on multiple dimensions for mining a complete set of inter-transaction association rules. This scheme has two components: (i) generating descriptions of clusters based on multi-dimensional semantic grouping (our algorithm needs at most two database scans for this step), and (ii) exploring associations between cluster descriptions/abstractions. Clustering is a subjective process: it employs knowledge to group data. Knowledge takes the form of the similarity measure used, the values assigned to parameters like the number of clusters, assumptions on the nature of clusters, and structures that capture explicit knowledge [1]. We call clustering based on knowledge semantic clustering.
In this paper, we discuss clustering based on multiple dimensions including size of transaction, cost of transaction, and their combinations to generate inter-transaction associations.
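The two ARM steps described above, large-itemset generation against σ and rule generation against c, can be illustrated at toy scale (pairs only here; the real algorithms handle itemsets of any size):

```python
from collections import Counter
from itertools import combinations

def mine_rules(transactions, min_support, min_conf):
    """Toy ARM: count supports of 1- and 2-itemsets, keep frequent pairs,
    then emit rules X -> Y whose confidence = support(X∪Y)/support(X)
    clears the user-defined minimum confidence."""
    n = len(transactions)
    counts = Counter()
    for txn in transactions:
        items = sorted(set(txn))
        for it in items:
            counts[(it,)] += 1
        for pair in combinations(items, 2):
            counts[pair] += 1
    rules = []
    for key, c in counts.items():
        if len(key) != 2 or c / n < min_support:
            continue                       # step (i): frequent pairs only
        a, b = key
        for x, y in ((a, b), (b, a)):      # step (ii): both rule directions
            conf = c / counts[(x,)]
            if conf >= min_conf:
                rules.append((x, y, conf))
    return rules
```

On the transactions {a,b}, {a,b}, {a,c} with σ = 0.5 and c = 0.6, the pair {a,b} is the only frequent itemset, and both a → b (confidence 2/3) and b → a (confidence 1.0) survive the confidence filter.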
Pattern Recognition Letters | 2004
P. A. Vijaya; M. Narasimha Murty; D. K. Subramanian
Pattern Recognition | 2001
V. S. Ananthanarayana; M. N. Murty; D. K. Subramanian