P. Viswanath
Indian Institutes of Information Technology
Publications
Featured research published by P. Viswanath.
International Journal of Machine Learning and Cybernetics | 2013
T. Hitendra Sarma; P. Viswanath; B. Eswara Reddy
The k-means clustering method is an iterative partition-based method which, for finite data-sets, converges to a solution in finite time. The running time of this method grows linearly with the size of the data-set. Many variants have been proposed to speed up the conventional k-means clustering method. In this paper, we propose a prototype-based hybrid approach to speed up the k-means clustering method. The proposed method first partitions the data-set into small clusters (grouplets) of varying sizes. Each grouplet is represented by a prototype. The set of prototypes is then partitioned into k clusters using a modified k-means method, which is similar to the conventional k-means method but avoids empty clusters (clusters to which no pattern is assigned) during the iterative process. In each cluster of prototypes, each prototype is replaced by its corresponding set of patterns (the patterns that formed the grouplet) to derive a partition of the data-set. Since this partition can deviate from the partition obtained by running the conventional k-means method over the entire data-set, a correcting step is proposed. Both theoretically and experimentally, the conventional k-means method and the proposed hybrid method (augmented with the correcting step) are shown to yield the same result, provided the initial k seed points are the same, but the proposed method is much faster than the conventional one. Experimentally, the proposed method is compared with the conventional method and with other recent methods proposed to speed up the k-means method.
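The two-stage flow lends itself to a short sketch. The following Python sketch is illustrative only: the leaders-style grouplet step with threshold `tau`, the weighted-mean update, and the single correcting pass are simplifying assumptions, not the authors' reference implementation (which preserves extra bookkeeping to guarantee exact equivalence with k-means).

```python
import numpy as np

def leaders(X, tau):
    """Sequential leaders clustering: a pattern joins the first leader within
    distance tau of it, otherwise it becomes a new leader (grouplet)."""
    lead, members = [], []
    for i, x in enumerate(X):
        for j, l in enumerate(lead):
            if np.linalg.norm(x - X[l]) <= tau:
                members[j].append(i)
                break
        else:
            lead.append(i)
            members.append([i])
    return lead, members

def prototype_kmeans(X, k, tau, n_iter=100, seed=0):
    """Weighted k-means over leader prototypes, then a correcting step that
    reassigns every original pattern to its nearest final center."""
    rng = np.random.default_rng(seed)
    lead, members = leaders(X, tau)
    P = X[lead]                                            # one prototype per grouplet
    w = np.array([len(m) for m in members], dtype=float)   # grouplet sizes as weights
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        new_centers = centers.copy()        # keep the old center if a cluster empties
        for c in range(k):
            mask = assign == c
            if mask.any():
                new_centers[c] = np.average(P[mask], axis=0, weights=w[mask])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # correcting step: assign every original pattern to its nearest center
    labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    return labels, centers
```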
IEEE Recent Advances in Intelligent Computational Systems | 2011
P. Viswanath; T. Hitendra Sarma
Non-parametric methods like the nearest neighbor classifier (NNC) and its variants, such as the k-nearest neighbor classifier (k-NNC), are simple to use and often show good performance in practice. These methods store all training patterns and search them to find the k nearest neighbors of a given test pattern. Some fundamental improvements to k-NNC are (i) the weighted k-nearest neighbor classifier (wk-NNC), where each neighbor is given a weight that is used in the classification, and (ii) using a bootstrapped training set instead of the given training set. Hamamoto et al. [1] gave a bootstrapping method in which a training pattern is replaced by a weighted mean of a few of its neighbors from its own class of training patterns; it is shown to improve the classification accuracy in most cases. The time to create the bootstrapped set is O(n^2), where n is the number of training patterns. This paper presents a novel improvement to k-NNC called the k-nearest neighbor mean classifier (k-NNMC). k-NNMC finds the k nearest neighbors of the test pattern within each class of training patterns separately, and computes the mean of these k neighbors for each class. Classification is done according to the nearest mean pattern. It is shown experimentally, using several standard data-sets, that the proposed classifier achieves better classification accuracy than k-NNC, wk-NNC and k-NNC with Hamamoto's bootstrapped training set. Further, unlike Hamamoto's method, the proposed method has no design phase, is suitable for parallel implementation, and can easily be coupled with any indexing and space-reduction methods, making it well suited to data mining applications.
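The class-wise neighbor-mean rule is compact enough to sketch directly. A minimal Python sketch, assuming NumPy arrays and Euclidean distance (the function name is hypothetical, not the authors' code):

```python
import numpy as np

def knnmc_predict(X_train, y_train, x, k=3):
    """Classify x by the nearest of the class-wise k-neighbor means."""
    best_label, best_dist = None, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d = np.linalg.norm(Xc - x, axis=1)
        idx = np.argsort(d)[:k]            # k nearest neighbors within class c
        mean_c = Xc[idx].mean(axis=0)      # class-wise neighbor mean
        dist_c = np.linalg.norm(mean_c - x)
        if dist_c < best_dist:
            best_label, best_dist = c, dist_c
    return best_label
```

Note that, consistent with the abstract, there is no design phase: the class-wise means are computed per query, so the rule can be combined with any index over the training set.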
Pattern Recognition Letters | 2013
T. Hitendra Sarma; P. Viswanath; B. Eswara Reddy
The kernel k-means clustering method has proved effective in identifying non-isotropic and linearly inseparable clusters in the input space. However, the method is not suitable for large datasets because of its quadratic time complexity with respect to the size of the dataset. This paper presents a simple prototype-based hybrid approach to speed up the kernel k-means clustering method for large datasets. The proposed method works in two stages. First, the dataset is partitioned into a number of small grouplets using the leaders clustering method, which takes the size of each grouplet, called the threshold t, as an input parameter. The conventional leaders clustering method is modified so that these grouplets are formed in the kernel-induced feature space, while each grouplet is represented by a pattern (called its leader) in the input space. The dataset is re-indexed according to these grouplets. The kernel k-means clustering method is then applied over the set of leaders to derive a partition of the leaders set. Finally, each leader is replaced by its group to get a partition of the entire dataset. The time complexity as well as the space complexity of the proposed method is O(n+p^2), where p is the number of leaders. The overall running time and the quality of the clustering result depend on the threshold t and on the order in which the dataset is scanned. This paper presents a study of how the input parameter t affects the overall running time and the clustering quality obtained by the proposed method. Further, it is shown both theoretically and experimentally how the order of scanning of the dataset affects the clustering result. The proposed method is also compared with other recent methods proposed to speed up the kernel k-means clustering method. An experimental study with several real-world as well as synthetic datasets shows that, for an appropriate value of t, the proposed method can significantly reduce the computation time with only a small loss in clustering quality, particularly for large datasets.
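Stage one is the part that departs from the conventional leaders method, so a sketch may help. The following Python sketch, assuming an RBF kernel (the kernel choice and the names are illustrative assumptions), forms grouplets with distances computed in the kernel-induced feature space while keeping input-space leaders:

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.linalg.norm(x - y) ** 2)

def kernel_leaders(X, t, kernel=rbf):
    """Leaders clustering in feature space:
    ||phi(x) - phi(l)||^2 = K(x,x) - 2 K(x,l) + K(l,l).
    A pattern joins the first leader within feature-space distance t,
    otherwise it starts a new grouplet."""
    leads, members = [], []
    for i, x in enumerate(X):
        for j, l in enumerate(leads):
            d2 = kernel(x, x) - 2 * kernel(x, X[l]) + kernel(X[l], X[l])
            if np.sqrt(max(d2, 0.0)) <= t:
                members[j].append(i)
                break
        else:
            leads.append(i)
            members.append([i])
    return leads, members
```

Stage two then runs kernel k-means over the p leaders and copies each leader's cluster label to its grouplet members.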
IEEE Recent Advances in Intelligent Computational Systems | 2011
T. Hitendra Sarma; P. Viswanath; B. Eswara Reddy
In unsupervised classification, the kernel k-means clustering method has been shown to perform better than the conventional k-means clustering method in identifying non-isotropic clusters in a data set. The space and time requirements of this method are O(n^2), where n is the data set size. The paper proposes a two-stage hybrid approach to speed up the kernel k-means clustering method. In the first stage, the data set is divided into a number of grouplets by employing a fast clustering method called the leaders clustering method. Each grouplet is represented by a prototype called its leader. The set of leaders, which depends on a threshold parameter, can be derived in O(n) time. The paper presents a modification of the leaders clustering method in which grouplets are found in the kernel space (not in the input space) but are represented by leaders in the input space. In the second stage, the kernel k-means clustering method is applied to the set of leaders to derive a partition of the leaders set. Finally, each leader is replaced by its group to get a partition of the data set. The proposed method has a time complexity of O(n+p^2), where p is the size of the leaders set; its space complexity is also O(n+p^2). The proposed method is easy to implement. Experimental results show that, with a small loss of quality, the proposed method can significantly reduce the running time compared with the conventional kernel k-means clustering method.
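Complementing the stage-one sketch above, stage two can be sketched as standard kernel k-means over the p x p Gram matrix of the leaders; as before, this is an illustrative reading under assumed names, not the authors' code:

```python
import numpy as np

def kernel_kmeans(Kp, k, n_iter=100, seed=0):
    """Kernel k-means on a precomputed Gram matrix Kp (p x p), using
    ||phi(x_i) - m_c||^2 = K_ii - (2/n_c) sum_{j in c} K_ij
                            + (1/n_c^2) sum_{j,l in c} K_jl."""
    rng = np.random.default_rng(seed)
    p = Kp.shape[0]
    labels = rng.integers(0, k, size=p)
    for _ in range(n_iter):
        d = np.empty((p, k))
        for c in range(k):
            mask = labels == c
            nc = mask.sum()
            if nc == 0:
                d[:, c] = np.inf          # empty cluster attracts no points
                continue
            d[:, c] = (np.diag(Kp)
                       - 2 * Kp[:, mask].sum(axis=1) / nc
                       + Kp[np.ix_(mask, mask)].sum() / nc ** 2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Since Kp is only p x p, the O(n+p^2) space and time bounds stated in the abstract follow directly once p is much smaller than n.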
Pattern Recognition and Machine Intelligence | 2009
T. Hitendra Sarma; P. Viswanath
This paper is about speeding up the k-means clustering method so that it processes the data faster yet produces the same clustering result as the conventional k-means method. We present a prototype-based method in which prototypes are derived using the leaders clustering method. Along with the prototypes, called leaders, some additional information is preserved which enables the k means to be derived exactly. An experimental study compares the proposed method with recent similar methods, which are mainly based on building an index over the data-set.
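One way to read "additional information" is that each grouplet keeps its pattern count and linear sum, so that cluster means can be recovered exactly from grouplet statistics alone. A hedged Python sketch of that bookkeeping (the names are illustrative, not the paper's notation):

```python
import numpy as np

def cluster_means_from_grouplets(counts, sums, assign, k):
    """Recover the k means from grouplet statistics alone: the mean of a
    cluster is the sum of its grouplets' linear sums divided by the total
    number of patterns in those grouplets.

    counts: (p,) patterns per grouplet; sums: (p, dim) per-grouplet linear
    sums; assign: (p,) cluster index of each grouplet."""
    dim = sums.shape[1]
    means = np.zeros((k, dim))
    for c in range(k):
        mask = assign == c
        total = counts[mask].sum()
        if total > 0:
            means[c] = sums[mask].sum(axis=0) / total
    return means
```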
International Joint Conference on Neural Networks | 2016
T. Hitendra Sarma; P. Viswanath; Atul Negi
Kernel k-means is a non-linear extension of the k-means clustering method, with good performance in identifying non-isotropic and linearly inseparable clusters. However, the space and time requirements of kernel k-means are expensive, with O(n^2) complexity, making the method unsuitable for large data sets in present applications with large in-memory computations. Recently, a simple prototype-based hybrid approach was proposed to speed up the kernel k-means method for large data sets [1]. The time complexity of this method is O(n+p^2), where p is the number of prototypes; each prototype is a representative pattern of a grouplet of size (threshold) τ. The time complexity therefore depends on p, which in turn depends on the clustering threshold. Increasing the threshold value decreases the number of prototypes p, but the quality of the clustering result may suffer. Hence, fixing an appropriate value of the threshold is the major challenge in this approach. This paper presents a solution to this problem by allowing τ to vary depending on the location of the grouplet in the space. Intuitively, if a grouplet is close to one cluster center (and away from the others) then its size can be large, but if it lies somewhere between two cluster centers, then its size should be small. It is experimentally shown that this reduces the clustering time and also increases the clustering accuracy. The presented method is well suited to large data sets, as in data mining.
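One plausible rule for a location-dependent τ, sketched below in Python, grows the threshold with the gap between the nearest and second-nearest tentative cluster centers; the tentative centers, the scaling constant `alpha`, and the floor `tau_min` are assumptions for illustration, not the paper's exact rule:

```python
import numpy as np

def local_tau(x, centers, alpha=0.5, tau_min=1e-3):
    """Large grouplets near a single center, small ones between two centers:
    tau grows with the gap between the nearest and second-nearest center.
    `centers` must hold at least two tentative cluster centers (e.g., from a
    cheap k-means run on a small sample)."""
    d = np.sort(np.linalg.norm(centers - x, axis=1))
    return max(alpha * (d[1] - d[0]), tau_min)
```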
Archive | 2017
P. Viswanath; J. Rohini; Y.C.A. Padmanabha Reddy
In an information retrieval system, predicting query performance for keyword-based queries is important for giving early feedback to the user, which can lead to an improved query and, in turn, a better query result. Clarity-score-based and ranking-robustness-score-based techniques exist to solve this problem; although both show good performance, they suffer from high computational cost and are post-retrieval methods. In contrast, there exist several pre-retrieval parameters which can judge a query without executing it. Pre-retrieval parameters based on the distribution of information in query terms, which depend essentially on the inverse document frequency (idf) of the query terms, have been shown to be good predictors; among these, the standard deviation of the idf values of the query terms is known to be better. This paper generalizes this idea and proposes to use a joint idf for a set of terms together, rather than each term's idf individually. Empirical studies are done using some standard data sets. The parameters based on the proposed method are shown to be better than the previous method, which is a special case of the proposed method.
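A hedged sketch of the joint-idf idea, assuming a toy inverted index (a dict from term to the set of documents containing it); taking the standard deviation of joint idf values over r-subsets of the query is one plausible aggregation, and r=1 recovers the classical single-term predictor as a special case:

```python
import math
from itertools import combinations
from statistics import pstdev

def joint_idf(doc_sets, n_docs, terms):
    """Joint idf of a set of terms, based on the documents containing all of
    them; +1 smoothing avoids log of infinity for empty intersections."""
    docs = set.intersection(*(doc_sets[t] for t in terms))
    return math.log(n_docs / (1 + len(docs)))

def joint_idf_predictor(doc_sets, n_docs, query_terms, r=2):
    """Std-dev of joint idf over all r-subsets of the query terms."""
    vals = [joint_idf(doc_sets, n_docs, s)
            for s in combinations(query_terms, r)]
    return pstdev(vals) if len(vals) > 1 else 0.0
```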
IEEE Recent Advances in Intelligent Computational Systems | 2011
P. Viswanath; K. Rajesh; C. Lavanya; Y.C.A. Padmanabha Reddy
Transductive learning is a special case of semi-supervised learning in which class labels are found for the test patterns alone; that is, the domain of the learner is the test set alone. Transductive learners often achieve better classification accuracy, since additional information in the form of the test patterns' locations in the feature space is used. For several inductive learners there exist corresponding transductive learners; for example, for SVMs there are transductive SVMs (TSVMs). For nearest neighbor based classifiers, the corresponding transductive methods are achieved through graph mincuts or spectral graph mincuts, and these solutions are known to achieve low leave-one-out cross-validation (LOOCV) error with respect to nearest neighbor based classifiers. It is formally shown in the paper that, through a clustering method, it is possible to obtain various solutions having zero LOOCV error with respect to nearest neighbor based classifiers, some of which can have low classification accuracy. The paper therefore proposes to optimize a margin-like criterion instead of the LOOCV error. This criterion is based on the observation that similarly labeled patterns should be near each other, while dissimilarly labeled patterns should be far apart. An approximate method to solve the proposed optimization problem, called the selective incremental transductive nearest neighbor classifier (SI-TNNC), is given in the paper. SI-TNNC finds the test pattern that is very close to one class of training patterns and at the same time far away from the other class of training examples. The selected test pattern is given its nearest neighbor's label and is added to the training set; it is then removed from the test set. The process is repeated with the next best test pattern and stops only when the test set becomes empty. An algorithm implementing the SI-TNNC method, with quadratic time complexity, is given in the paper; other related solutions have either cubic time complexity or are NP-hard. Experimentally, using several standard data-sets, it is shown that the proposed transductive learner achieves on-par or better classification accuracy than its related competitors.
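A condensed Python sketch of the SI-TNNC loop, assuming Euclidean distances; the selection score below (the gap between the nearest distances to the labeled classes) is one plausible reading of the margin-like criterion, not the paper's exact formulation:

```python
import numpy as np

def si_tnnc(X_train, y_train, X_test):
    """Selective incremental transduction: repeatedly label the test pattern
    whose nearest-class distances are most separated, then absorb it into
    the labeled set and repeat until the test set is empty."""
    X = [np.asarray(x) for x in X_train]
    y = list(y_train)
    classes = sorted(set(y))
    pending = {i: np.asarray(x) for i, x in enumerate(X_test)}
    labels = {}
    while pending:
        best, best_score, best_label = None, -np.inf, None
        for i, xi in pending.items():
            # nearest labeled distance per class
            d = {c: min(np.linalg.norm(xi - xj)
                        for xj, yj in zip(X, y) if yj == c) for c in classes}
            near = min(d, key=d.get)
            score = max(d.values()) - d[near]   # margin-like gap
            if score > best_score:
                best, best_score, best_label = i, score, near
        labels[best] = best_label               # nearest neighbor's class
        X.append(pending.pop(best))
        y.append(best_label)
    return [labels[i] for i in range(len(X_test))]
```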
Computational Intelligence | 2011
T. Gokaramaiah; P. Viswanath; B. Eswara Reddy
A content-based image retrieval (CBIR) system retrieves images from a database based on the contents of a query image. Retrieval based on the shape of the 2D object present in the image is important in several applications. The shape of an object is invariant to translation, scaling, rotation and mirror reflection; hence, a representation scheme that possesses all these properties is important. The signature histogram and the k-th order augmented histogram have all of these invariance properties [17], but they are applicable only to convex shapes. That representation scheme assumes that the centroid-to-contour distance is a function of the angle (with respect to a predefined axis). This is not true for non-convex and open shapes, since for some angles there can be more than one centroid-to-contour distance. The current paper does not make this assumption, but instead considers the distribution of centroid-to-contour distances. Further, to reduce the false positive rate, the distribution of local variations of the centroid-to-contour distances is also considered. Experimental studies are done using a standard image database and a handwritten symbols database, and the present technique is compared against a similar recent technique.
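A hedged Python sketch of a descriptor in this spirit, assuming `contour` is an (m, 2) NumPy array of boundary points; the bin counts and the local-variation feature are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def ccd_descriptor(contour, bins=32):
    """Distribution of centroid-to-contour distances plus the distribution of
    their local variations. Histograms of distances are translation-invariant
    (centroid subtraction), rotation- and mirror-invariant (order of contour
    points is discarded), and scale-invariant (division by the max distance)."""
    centroid = contour.mean(axis=0)
    d = np.linalg.norm(contour - centroid, axis=1)
    d = d / d.max()                                     # scale invariance
    hist, _ = np.histogram(d, bins=bins, range=(0, 1), density=True)
    # local variations between consecutive distances (contour treated as closed)
    var = np.abs(np.diff(np.r_[d, d[0]]))
    vhist, _ = np.histogram(var, bins=bins,
                            range=(0, var.max() + 1e-9), density=True)
    return np.concatenate([hist, vhist])
```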
Sadhana - Academy Proceedings in Engineering Sciences | 2013
T. Hitendra Sarma; P. Viswanath; B. Eswara Reddy