Girish Keshav Palshikar
Tata Consultancy Services
Publication
Featured researches published by Girish Keshav Palshikar.
Data Mining and Knowledge Discovery | 2008
Girish Keshav Palshikar; Manoj Apte
Many malpractices in stock market trading, e.g., circular trading and price manipulation, use the modus operandi of collusion. Informally, a set of traders is a candidate collusion set when they have "heavy trading" among themselves, as compared to their trading with others. We formalize the problem of detecting collusion sets, if any, in a given trading database. We show that naïve approaches are inefficient for real-life situations. We adapt and apply two well-known graph clustering algorithms to this problem. We also propose a new graph clustering algorithm, specifically tailored for detecting collusion sets. A novel feature of our approach is the use of the Dempster–Shafer theory of evidence to combine the candidate collusion sets detected by the individual algorithms. Treating individual experiments as evidence, this approach allows us to quantify the confidence (or belief) in the candidate collusion sets. We present detailed simulation experiments to demonstrate the effectiveness of the proposed algorithms.
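The evidence-combination step of this approach can be sketched with Dempster's rule of combination. The mass functions below, defined over two hypothetical trader sets, are illustrative assumptions for the example, not data or code from the paper:

```python
def combine_dempster(m1, m2):
    """Combine two mass functions (dicts mapping frozenset -> mass in [0,1],
    masses summing to 1) using Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                # Agreeing evidence accumulates on the intersection.
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                # Disjoint focal sets contribute to the conflict mass.
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: masses cannot be combined")
    norm = 1.0 - conflict
    return {s: m / norm for s, m in combined.items()}
```

For instance, if two detection algorithms assign masses 0.6/0.4 and 0.7/0.3 to the candidate sets {t1, t2} and {t1, t2, t3}, the combined belief concentrates on {t1, t2}.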
Pattern Recognition and Machine Intelligence | 2007
Girish Keshav Palshikar
Keywords characterize the topics discussed in a document. Extracting a small set of keywords from a single document is an important problem in text mining. We propose a hybrid structural and statistical approach to keyword extraction. We represent the given document as an undirected graph whose vertices are the words in the document and whose edges are labelled with a dissimilarity measure between two words, derived from the frequency of their co-occurrence in the document. We propose that the central vertices of this graph are keyword candidates, and model the importance of a word in terms of its centrality in the graph. Using graph-theoretic notions of vertex centrality, we suggest several algorithms to extract keywords from a given document. We demonstrate the effectiveness of the proposed algorithms on real-life documents.
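As a rough illustration of the graph-centrality idea (not the paper's algorithms, which use dissimilarity-weighted edges and several centrality measures), the sketch below builds a co-occurrence graph over a small sliding window and ranks words by degree centrality. The window size and stopword list are assumptions for the example:

```python
from collections import defaultdict

def extract_keywords(text, window=3, top_k=3,
                     stopwords=frozenset({'the', 'a', 'of', 'in', 'and', 'to', 'is'})):
    """Rank words of a document by degree centrality in its co-occurrence graph.
    Two distinct words co-occur if they appear within `window` positions."""
    words = [w.strip('.,;:').lower() for w in text.split()]
    words = [w for w in words if w and w not in stopwords]
    neighbours = defaultdict(set)
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            if words[i] != words[j]:
                neighbours[words[i]].add(words[j])
                neighbours[words[j]].add(words[i])
    # Degree centrality = number of distinct co-occurring words.
    ranked = sorted(neighbours, key=lambda w: len(neighbours[w]), reverse=True)
    return ranked[:top_k]
```

In a real implementation the edge weights and a centrality notion such as closeness or eccentricity would replace the plain degree count used here.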
Data and Knowledge Engineering | 2007
Girish Keshav Palshikar; Mandar S. Kale; Manoj Apte
A well-known problem that limits the practical use of association rule mining algorithms is the extremely large number of rules they generate. Such a large number of rules makes the algorithms inefficient and makes the discovered rules difficult for end users to comprehend. We present the concept of a heavy itemset: an itemset A is heavy (for given support and confidence values) if all possible association rules made up of items only in A are present. We prove a simple necessary and sufficient condition for an itemset to be heavy. We present a formula for the number of possible rules for a given heavy itemset, and show that a heavy itemset compactly represents an exponential number of association rules. Along with two simple search algorithms, we present an efficient greedy algorithm to generate a collection of disjoint heavy itemsets in a given transaction database. We then present a modified Apriori algorithm that starts with a given collection of disjoint heavy itemsets and discovers further heavy itemsets, not necessarily disjoint from the given ones.
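The definition of a heavy itemset can be checked directly by brute force, enumerating every rule X → Y with X and Y nonempty disjoint subsets of the itemset. This naive sketch is for illustration only; it is not the paper's necessary-and-sufficient condition or its greedy algorithm:

```python
from itertools import combinations

def is_heavy(itemset, transactions, minsup, minconf):
    """Brute-force heaviness check: every rule X -> Y over nonempty disjoint
    subsets X, Y of `itemset` must meet the support/confidence thresholds.
    `transactions` is a list of frozensets of items."""
    items = list(itemset)
    n = len(transactions)

    def support(s):
        s = frozenset(s)
        return sum(1 for t in transactions if s <= t) / n

    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            rest = [i for i in items if i not in antecedent]
            for k in range(1, len(rest) + 1):
                for consequent in combinations(rest, k):
                    rule_sup = support(antecedent + consequent)
                    ant_sup = support(antecedent)
                    if rule_sup < minsup or ant_sup == 0 or rule_sup / ant_sup < minconf:
                        return False
    return True
```

The exhaustive check is exponential in the itemset size, which is exactly why a simple closed-form condition for heaviness matters.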
Applied Soft Computing | 2004
Girish Keshav Palshikar
Many practical applications involving spatial aspects work with finite discrete space domains, e.g., map grids, railway track layouts and road networks. Such space domains are computationally tractable and often admit specialised forms of spatial reasoning. Moreover, in such applications, the spatial information naturally includes various forms of approximation, uncertainty or inexactness, for which fuzzy representations are appropriate. In this paper, we reformulate the region connection calculus (RCC) framework for finite, discrete space domains in simple set-theoretic terms. We generalise the RCC framework and develop several fuzzy spatial concepts, such as fuzzy regions, fuzzy directions and fuzzy named distances. We propose a fuzzification of the standard spatial relations in RCC. For this purpose, we enhance fuzzy set theory with fuzzy definitions of the crisp binary relations of membership, subset and set equality between sets (fuzzy or crisp). We illustrate the approach using a discrete, finite two-dimensional map grid as the space domain.
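One plausible fuzzification of the crisp subset relation between grid-based fuzzy regions is a sketch under assumed definitions (the paper defines its own fuzzified relations): average the Łukasiewicz implication min(1, 1 − A(x) + B(x)) over the grid cells involved.

```python
def fuzzy_subset_degree(A, B):
    """Degree in [0,1] to which fuzzy region A is a subset of fuzzy region B.
    A and B map grid cells (e.g. (row, col) tuples) to membership degrees;
    missing cells have membership 0."""
    cells = set(A) | set(B)
    if not cells:
        return 1.0  # the empty region is vacuously a subset of anything
    return sum(min(1.0, 1.0 - A.get(c, 0.0) + B.get(c, 0.0))
               for c in cells) / len(cells)
```

With crisp (0/1) memberships this reduces to the ordinary subset test; partial memberships yield a graded degree of containment.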
Pattern Recognition Letters | 2001
Girish Keshav Palshikar
Many systems collect vast amounts of data over time, which is used to perform critical tasks such as diagnosis, surveillance, resource management, planning and forecasting. To use this historical data effectively, it is important to analyse it and gain insight into its significant aspects by identifying the presence and characteristics of specific patterns. We describe a fuzzy logical notation, enhanced with facilities for expressing approximate temporal patterns, to build compositional and abstract models of the syntactic structure of patterns. We present an algorithm that detects where, and how strongly, a given pattern (i.e., a formula) is present. The approach is illustrated by specifying and detecting fault patterns for trace-based diagnosis of dynamic systems.
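A minimal sketch of fuzzy temporal operators over a trace, assuming a simple piecewise-linear predicate and the usual min/max semantics for "always" and "eventually"; the paper's notation is considerably richer than this:

```python
def fuzzy_high(x, lo=0.0, hi=1.0):
    """Fuzzy predicate 'x is high': 0 at or below lo, 1 at or above hi,
    linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def eventually(trace, pred):
    """Degree to which pred holds at some point of the trace (fuzzy 'or' = max)."""
    return max((pred(x) for x in trace), default=0.0)

def always(trace, pred):
    """Degree to which pred holds throughout the trace (fuzzy 'and' = min)."""
    return min((pred(x) for x in trace), default=1.0)
```

A fault pattern such as "the signal eventually goes high" then gets a match strength in [0, 1] rather than a yes/no answer, which is the essence of approximate pattern detection.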
International Conference on Data Mining | 2010
Maitreya Natu; Girish Keshav Palshikar
Various real-life datasets can be viewed as a set of records consisting of attributes describing the records and measures evaluating them. In this paper, we address the problem of automatically discovering interesting subsets of such a dataset, i.e., subsets whose performance characteristics differ significantly from those of the rest of the dataset. We present an algorithm to discover such interesting subsets. The proposed algorithm uses a generic, domain-independent definition of interestingness and various heuristics to intelligently prune the search space, making the solution scalable to large datasets. The paper presents the application of the interesting subset discovery algorithm to four real-world case studies and demonstrates its effectiveness in extracting insights that identify problem areas and provide improvement recommendations for a wide variety of systems.
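A toy version of the subset-scoring idea, assuming interestingness is simply the absolute difference between a subset's mean measure and that of the remaining records; the paper's actual definition and pruning heuristics are more elaborate:

```python
from statistics import mean

def interesting_subsets(records, attrs, measure, top_k=2):
    """Score each attribute=value subset by how far its mean `measure`
    deviates from the mean of the rest of the records.
    `records` is a list of dicts; returns (score, attribute, value) tuples."""
    scored = []
    for a in attrs:
        for v in {r[a] for r in records}:
            inside = [r[measure] for r in records if r[a] == v]
            outside = [r[measure] for r in records if r[a] != v]
            if inside and outside:
                scored.append((abs(mean(inside) - mean(outside)), a, v))
    scored.sort(reverse=True)
    return scored[:top_k]
```

For example, in a latency dataset this would surface the server or OS whose mean latency most departs from everyone else's, which is the kind of "problem area" the algorithm is meant to highlight.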
International Conference on Data Mining | 2010
Girish Keshav Palshikar; Harrick M. Vin; Mohammed Mudassar; Maitreya Natu
Support analytics (i.e., the statistical analysis, modelling and mining of customer/operations support ticket data) is important in service industries. In this paper, we adopt a domain-driven data mining approach to support analytics, with a focus on IT Infrastructure Support (ITIS) services. We identify specific business questions and then propose algorithms for answering them. The questions are: (1) How to reduce the overall workload? (2) How to improve the effort spent on ticket processing? (3) How to improve compliance with service level agreements? We propose novel formalizations of these notions and rigorous statistics-based algorithms for answering these questions. The approach is domain-driven in the sense that the results produced are directly usable by, and easy to understand for, end users with no data-mining expertise, require no experimentation, and often reveal novel and non-obvious answers. All this aids acceptance among end users and more active use of the results produced. The algorithms have been implemented and have produced satisfactory results on more than 25 real-life ITIS datasets, one of which we use for illustration.
Information & Software Technology | 2001
Girish Keshav Palshikar
Keeping the trains and tracks in a safe state is important for railway systems that include automated control. ATO-2000 is an automated railway system that plans, operates, monitors and controls a small railway network of driverless trains within a mine. The formal specification, design and implementation of the Checker Function (CF), a software sub-system responsible for maintaining safety in ATO-2000, are described. CF is an important component in a safety-critical, real-time, distributed, mobile computing system. The formal specifications (in Z) of the core safety requirements in ATO-2000 are presented, including a new representation of the track topology. Some fault tolerance of the data received from the field is achieved through data validation constraints. Command safety constraints conservatively validate outgoing commands so that no possible future system state is unsafe. A simple approach used to integrate formal methods into the industrial software development process is discussed. The paper concludes with a review of the lessons learnt.
International Conference on Distributed Computing and Internet Technology | 2005
Girish Keshav Palshikar
Automatically finding interesting, novel or surprising patterns in time series data is useful in several applications, such as fault diagnosis and fraud detection. In this paper, we extend the notion of distance-based outliers to time series data and propose two algorithms to detect both global and local outliers in time series data. We illustrate these algorithms on some real datasets.
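The distance-based outlier notion can be sketched for a one-dimensional series as follows. The thresholds d and p and the test data are assumptions for illustration; the paper's global and local algorithms differ in detail:

```python
def distance_outliers(series, d, p):
    """Flag index i as a (global) distance-based outlier if fewer than a
    fraction p of the other values lie within distance d of series[i]."""
    n = len(series)
    outliers = []
    for i, x in enumerate(series):
        close = sum(1 for j, y in enumerate(series)
                    if j != i and abs(x - y) <= d)
        if close < p * (n - 1):
            outliers.append(i)
    return outliers
```

A local variant would apply the same test within a sliding window around each point instead of against the whole series.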
Applications of Natural Language to Data Bases | 2016
Nitin Ramrakhiyani; Sachin Pawar; Girish Keshav Palshikar; Manoj Apte
Performance appraisal (PA) is an important Human Resources exercise conducted by most organizations. The text data generated during the PA process can be a source of valuable insights for management. As a new application area, an analysis of a large PA dataset (100K sentences) of supervisor feedback text is carried out. As the first contribution, the paper redefines the notion of an aspect in the feedback text. Aspects in PA text are activities characterized by verb-noun pairs. These activities vary dynamically from employee to employee (e.g. conduct training, improve coding) and can be more challenging to identify than the static properties of products like a camera (e.g. price, battery life). Another important contribution of the paper is a novel enhancement to the Label Propagation (LP) algorithm to identify aspects from PA text, involving induction of a prior distribution for each node and iterative identification of new aspects starting from a seed set. Evaluation using a manually labelled set of 500 verb-noun pairs suggests an improvement over multiple baselines.
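A minimal sketch of label propagation over a graph of verb-noun pairs, with seed aspects clamped to score 1.0 each round. The prior-induction enhancement the paper proposes is not modelled here, and the graph and seed in the usage example are invented:

```python
def label_propagate(graph, seeds, iters=10):
    """Propagate 'aspect-ness' scores over an undirected graph.
    `graph` maps each node to a list of neighbour nodes; `seeds` is a set of
    known aspects. Each iteration a node takes the mean score of its
    neighbours, then seeds are clamped back to 1.0."""
    scores = {n: (1.0 if n in seeds else 0.0) for n in graph}
    for _ in range(iters):
        new = {}
        for n, nbrs in graph.items():
            new[n] = (sum(scores[m] for m in nbrs) / len(nbrs)
                      if nbrs else scores[n])
        for s in seeds:
            new[s] = 1.0
        scores = new
    return scores
```

Verb-noun pairs connected (e.g. by shared verbs, nouns, or embedding similarity) to known aspects thus accumulate high scores and can be promoted to the aspect set.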