Ramesh C. Agarwal | Researchain

Publication

Featured research published by Ramesh C. Agarwal.

Journal of Parallel and Distributed Computing | 2001

A Tree Projection Algorithm for Generation of Frequent Item Sets

Ramesh C. Agarwal; Charu C. Aggarwal; V. V. V. Prasad

In this paper we propose algorithms for generation of frequent item sets by successive construction of the nodes of a lexicographic tree of item sets. We discuss different strategies in generation and traversal of the lexicographic tree such as breadth-first search, depth-first search, or a combination of the two. These techniques provide different trade-offs in terms of the I/O, memory, and computational time requirements. We use the hierarchical structure of the lexicographic tree to successively project transactions at each node of the lexicographic tree and use matrix counting on this reduced set of transactions for finding frequent item sets. We tested our algorithm on both real and synthetic data. We provide an implementation of the tree projection method which is up to one order of magnitude faster than other recent techniques in the literature. The algorithm has a well-structured data access pattern which provides data locality and reuse of data for multiple levels of the cache. We also discuss methods for parallelization of the TreeProjection algorithm.
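The depth-first traversal with transaction projection described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' TreeProjection implementation: the function name `frequent_itemsets` is hypothetical, and plain per-item counting stands in for the matrix counting the abstract mentions.

```python
from collections import Counter


def frequent_itemsets(transactions, min_support):
    """Enumerate frequent itemsets depth-first over a lexicographic tree.

    Hypothetical sketch: per-item counting replaces the paper's matrix counting.
    Returns a dict mapping each frequent itemset (a sorted tuple) to its support.
    """
    results = {}

    def expand(prefix, projected):
        # Count candidate extensions using only the transactions projected
        # onto this node of the tree.
        counts = Counter(item for t in projected for item in t)
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            itemset = prefix + (item,)
            results[itemset] = counts[item]
            # Project: keep transactions that contain `item`, restricted to
            # the items that follow it in lexicographic order.
            child = [tuple(x for x in t if x > item) for t in projected if item in t]
            expand(itemset, [t for t in child if t])

    # Deduplicate and sort items so each transaction is a lexicographic tuple.
    expand((), [tuple(sorted(set(t))) for t in transactions])
    return results


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
    for itemset, support in sorted(frequent_itemsets(data, 2).items()):
        print(itemset, support)
```

Because each child node sees only the transactions that contain its prefix, the working set shrinks as the traversal goes deeper, which is the effect the abstract attributes to projection.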

knowledge discovery and data mining | 2000

Depth first generation of long patterns

Ramesh C. Agarwal; Charu C. Aggarwal; V. V. V. Prasad

In this paper we present an algorithm for mining long patterns in databases. The algorithm finds large itemsets by using depth-first search on a lexicographic tree of itemsets. The focus of this paper is to develop CPU-efficient algorithms for finding frequent itemsets in the cases when the database contains patterns which are very wide. We refer to this algorithm as DepthProject, and it achieves more than one order of magnitude speedup over the recently proposed MaxMiner algorithm for finding long patterns. These techniques may be quite useful for applications in areas such as computational biology in which the number of records is relatively small, but the itemsets are very long. This necessitates the discovery of patterns using algorithms which are especially tailored to the nature of such domains.

international conference on data mining | 2001

Evaluating boosting algorithms to classify rare classes: comparison and improvements

Mahesh V. Joshi; Vipin Kumar; Ramesh C. Agarwal

Classification of rare events has many important data mining applications. Boosting is a promising meta-technique that improves the classification performance of any weak classifier. So far, no systematic study has been conducted to evaluate how boosting performs for the task of mining rare classes. The authors evaluate three existing categories of boosting algorithms from the single viewpoint of how they update the example weights in each iteration, and discuss their possible effect on recall and precision of the rare class. We propose enhanced algorithms in two of the categories, and justify their choice of weight updating parameters theoretically. Using some specially designed synthetic datasets, we compare the capability of all the algorithms from the rare class perspective. The results support our qualitative analysis, and also indicate that our enhancements bring an extra capability for achieving better balance between recall and precision in mining rare classes.

international conference on management of data | 2001

Mining needle in a haystack: classifying rare classes via two-phase rule induction

Mahesh Joshi; Ramesh C. Agarwal; Vipin Kumar

Learning models to classify rarely occurring target classes is an important problem with applications in network intrusion detection, fraud detection, or deviation detection in general. In this paper, we analyze our previously proposed two-phase rule induction method in the context of learning complete and precise signatures of rare classes. The key feature of our method is that it separately conquers the objectives of achieving high recall and high precision for the given target class. The first phase of the method aims for high recall by inducing rules with high support and a reasonable level of accuracy. The second phase then tries to improve the precision by learning rules to remove false positives in the collection of the records covered by the first phase rules. Existing sequential covering techniques try to achieve high precision for each individual disjunct learned. In this paper, we claim that such an approach is inadequate for rare classes, because of two problems: splintered false positives and error-prone small disjuncts. Motivated by the strengths of our two-phase design, we design various synthetic data models to identify and analyze the situations in which two state-of-the-art methods, RIPPER and C4.5 rules, either fail to learn a model or learn a very poor model. In all these situations, our two-phase approach learns a model with significantly better recall and precision levels. We also present a comparison of the three methods on a challenging real-life network intrusion detection dataset. Our method is significantly better than or comparable to the best competitor in terms of achieving better balance between recall and precision.

international world wide web conferences | 2003

Dynamic maintenance of web indexes using landmarks

Lipyeow Lim; Min Wang; Sriram Padmanabhan; Jeffrey Scott Vitter; Ramesh C. Agarwal

Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchronized collection is not immediately searchable, because the keyword index is rebuilt from scratch less frequently than the collection can be refreshed. An inverted index is usually used to index documents crawled from the web. Complete index rebuild at high frequency is expensive. Previous work on incremental inverted index updates has been restricted to adding and removing documents. Updating the inverted index for previously indexed documents that have changed has not been addressed. In this paper, we propose an efficient method to update the inverted index for previously indexed documents whose contents have changed. Our method uses the idea of landmarks together with the diff algorithm to significantly reduce the number of postings in the inverted index that need to be updated. Our experiments verify that our landmark-diff method results in significant savings in the number of update operations on the inverted index.
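The abstract does not spell out how landmarks are encoded, so the sketch below rests on an assumption: each posting stores a position relative to a landmark (here, the start of a fixed-size block) instead of an absolute word offset. The function names and the fixed block size are hypothetical, and the diff step is reduced to a single known insertion. Under those assumptions, it counts how many existing postings become stale when one word is inserted into a document.

```python
def absolute_postings(words):
    """Postings keyed by absolute word position: (word, position)."""
    return {(w, i) for i, w in enumerate(words)}


def landmark_postings(blocks):
    """Postings keyed by (word, landmark id, offset from that landmark)."""
    return {(w, lid, off) for lid, blk in enumerate(blocks) for off, w in enumerate(blk)}


def stale_after_insert(words, pos, new_word, block_size):
    """Return (absolute, landmark) counts of old postings invalidated by one insertion.

    Assumes the words are distinct tokens so that each posting is unique,
    and that 0 <= pos < len(words).
    """
    edited = words[:pos] + [new_word] + words[pos:]
    stale_abs = len(absolute_postings(words) - absolute_postings(edited))

    blocks = [words[i:i + block_size] for i in range(0, len(words), block_size)]
    before = landmark_postings(blocks)
    # The insertion grows one block; later landmarks keep their ids and offsets.
    blocks[pos // block_size].insert(pos % block_size, new_word)
    stale_lm = len(before - landmark_postings(blocks))
    return stale_abs, stale_lm


if __name__ == "__main__":
    doc = [f"w{i}" for i in range(1000)]
    print(stale_after_insert(doc, 10, "new", 50))  # (990, 40)
```

With absolute positions every posting after the insertion point shifts, whereas with landmark-relative positions only the postings in the edited block do. A real system would also need to update a landmark directory, which this sketch leaves out.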

knowledge discovery and data mining | 2002

Predicting rare classes: can boosting make any weak learner strong?

Mahesh Joshi; Ramesh C. Agarwal; Vipin Kumar

Boosting is a strong ensemble-based learning algorithm with the promise of iteratively improving the classification accuracy using any base learner, as long as it satisfies the condition of yielding weighted accuracy > 0.5. In this paper, we analyze boosting with respect to this basic condition on the base learner, to see if boosting ensures prediction of rarely occurring events with high recall and precision. First we show that a base learner can satisfy the required condition even for poor recall or precision levels, especially for very rare classes. Furthermore, we show that the intelligent weight updating mechanism in boosting, even in its strong cost-sensitive form, does not prevent cases where the base learner always achieves high precision but poor recall or high recall but poor precision, when mapped to the original distribution. In either of these cases, we show that the voting mechanism of boosting fails to achieve good overall recall and precision for the ensemble. In effect, our analysis indicates that one cannot be blind to the base learner performance, and just rely on the boosting mechanism to take care of its weakness. We validate our arguments empirically on a variety of real and synthetic rare class problems. In particular, using AdaCost as the boosting algorithm, and variations of PNrule and RIPPER as the base learners, we show that if algorithm A achieves better recall-precision balance than algorithm B, then using A as the base learner in AdaCost yields significantly better performance than using B as the base learner.
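The first claim in the abstract above, that a base learner can clear the weighted accuracy > 0.5 condition while recalling very little of a rare class, can be illustrated with a small calculation. The data set sizes, the uniform example weights (as in a first boosting round), and the helper name `weighted_metrics` are all assumptions made for illustration; none of them comes from the paper.

```python
def weighted_metrics(labels, preds, weights):
    """Return (weighted accuracy, recall, precision) for the positive class (label 1)."""
    total = sum(weights)
    accuracy = sum(w for y, p, w in zip(labels, preds, weights) if y == p) / total
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, recall, precision


if __name__ == "__main__":
    # Hypothetical data: 10 rare-class records among 1000, uniform weights.
    labels = [1] * 10 + [0] * 990
    # The learner flags only 2 of the 10 rare records and nothing else.
    preds = [1] * 2 + [0] * 8 + [0] * 990
    weights = [1.0] * 1000
    accuracy, recall, precision = weighted_metrics(labels, preds, weights)
    print(f"accuracy={accuracy:.3f} recall={recall:.2f} precision={precision:.2f}")
```

Here the learner is far above the 0.5 threshold (992 of 1000 records are classified correctly) even though it finds only 2 of the 10 rare records.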

IBM Journal of Research and Development | 1994

A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication

Ramesh C. Agarwal; Fred G. Gustavson; Mohammad Zubair

In this paper, we propose a scheme for matrix-matrix multiplication on a distributed-memory parallel computer. The scheme hides almost all of the communication cost with the computation and uses the standard, optimized Level-3 BLAS operation on each node. As a result, the overall performance of the scheme is nearly equal to the performance of the Level-3 optimized BLAS operation times the number of nodes in the computer, which is the peak performance obtainable for parallel BLAS. Another feature of our algorithm is that it can give peak performance for larger matrices, even if the underlying communication network of the computer is slow.

conference on high performance computing (supercomputing) | 1992

A high performance algorithm using pre-processing for the sparse matrix-vector multiplication

Ramesh C. Agarwal; Fred G. Gustavson; Mohammad Zubair

The authors propose a feature-extraction-based algorithm (FEBA) for sparse matrix-vector multiplication. The key idea of FEBA is to exploit any regular structure present in the sparse matrix by extracting it and processing it separately. The order in which these structures are extracted is determined by the relative efficiency with which they can be processed. The authors have tested FEBA on the IBM 3090 VF for matrices from the Harwell-Boeing and OSL collections. The results obtained were on average five times faster than the ESSL routine which is based on the ITPACK storage structure.
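The idea of extracting a regular structure and processing it separately can be sketched as below. The abstract does not say which structures FEBA extracts, so the choice of the main diagonal here is an assumption, as are the function names and the dictionary-based storage. This is not the FEBA implementation.

```python
def split_diagonal(entries):
    """Split a sparse matrix into its main diagonal and the remaining entries.

    `entries` maps (row, col) to a nonzero value. The diagonal stands in for
    a "regular structure"; it is an illustrative choice, not taken from the paper.
    """
    diag = {i: v for (i, j), v in entries.items() if i == j}
    rest = {(i, j): v for (i, j), v in entries.items() if i != j}
    return diag, rest


def spmv(n, diag, rest, x):
    """Compute y = A x from the two extracted parts of an n-by-n matrix A."""
    # Regular part: one stride-1 pass with no column indirection.
    y = [diag.get(i, 0.0) * x[i] for i in range(n)]
    # Irregular remainder: general indexed accumulation.
    for (i, j), v in rest.items():
        y[i] += v * x[j]
    return y


if __name__ == "__main__":
    a = {(0, 0): 2.0, (1, 1): 3.0, (2, 2): 4.0, (0, 2): 1.0, (2, 0): 5.0}
    diag, rest = split_diagonal(a)
    print(spmv(3, diag, rest, [1.0, 2.0, 3.0]))  # [5.0, 6.0, 17.0]
```

The regular part runs as a simple contiguous loop of the kind a vector unit handles well, while only the leftover entries pay for indirect indexing.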

IBM Journal of Research and Development | 1986

Fourier transform and convolution subroutines for the IBM 3090 Vector facility

Ramesh C. Agarwal; James W. Cooley

A set of highly optimized subroutines for digital signal processing has been included in the Engineering and Scientific Subroutine Library (ESSL) for the IBM 3090 Vector Facility. These include FORTRAN-callable subroutines for Fourier transforms, convolution, and correlation. The subroutines are carefully designed and tuned for optimal vector and cache performance. Speedups of up to 9½ times over scalar performance on the 3090 have been obtained.

web age information management | 2001

Characterizing Web Document Change

Lipyeow Lim; Min Wang; Sriram Padmanabhan; Jeffrey Scott Vitter; Ramesh C. Agarwal

The World Wide Web is growing and changing at an astonishing rate. For the information in the web to be useful, web information systems such as search engines have to keep up with the growth and change of the web. In this paper we study how web documents change. In particular, we study two important characteristics of web document change that are directly related to keeping web information systems up-to-date: the degree of the change and the clusteredness of the change. We analyze the evolution of web documents with respect to these two measures and discuss the implications for web information systems update.