Chidanand Apte | Researchain


Publication

Featured research published by Chidanand Apte.

ACM Transactions on Information Systems | 1994

Automated learning of decision rules for text categorization

Chidanand Apte; Fred J. Damerau; Sholom M. Weiss

We describe the results of extensive experiments using optimized rule-based induction methods on large document collections. The goal of these methods is to discover automatically classification patterns that can be used for general document categorization or personalized filtering of free text. Previous reports indicate that human-engineered rule-based systems, requiring many man-years of developmental efforts, have been successfully built to “read” documents and assign topics to them. We show that machine-generated decision rules appear comparable to human performance, while using the identical rule-based representation. In comparison with other machine-learning techniques, results on a key benchmark from the Reuters collection show a large gain in performance, from a previously reported 67% recall/precision breakeven point to 80.5%. In the context of a very high-dimensional feature space, several methodological alternatives are examined, including universal versus local dictionaries, and binary versus frequency-related features.
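The recall/precision breakeven point quoted in this abstract is the operating point at which recall and precision are equal. A minimal sketch, assuming documents are ranked by a classifier score and the cutoff is placed at the number of truly relevant documents (where the two measures coincide by construction); the function name, scores, and labels are hypothetical and this is not the authors' evaluation code:

```python
def breakeven_point(scores, labels):
    """Precision (= recall) when exactly as many documents are retrieved
    as are truly relevant to the category.

    scores: classifier scores, higher means more likely relevant.
    labels: 1 if the document truly belongs to the category, else 0.
    """
    relevant = sum(labels)
    if relevant == 0:
        raise ValueError("need at least one relevant document")
    # Rank documents from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Retrieve the top `relevant` documents and count the correct ones.
    hits = sum(label for _, label in ranked[:relevant])
    # precision = hits / retrieved and recall = hits / relevant are equal here.
    return hits / relevant


# Hypothetical scores and ground-truth labels for six documents.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
result = breakeven_point(scores, labels)
print(round(result, 3))
```

With three relevant documents and two of them in the top three, both precision and recall are 2/3 at this cutoff.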

Future Generation Computer Systems | 1997

Data mining with decision trees and decision rules

Chidanand Apte; Sholom M. Weiss

This paper describes the use of decision tree and rule induction in data-mining applications. Of methods for classification and regression that have been developed in the fields of pattern recognition, statistics, and machine learning, these are of particular interest for data mining since they utilize symbolic and interpretable representations. Symbolic solutions can provide a high degree of insight into the decision boundaries that exist in the data, and the logic underlying them. This aspect makes these predictive-mining techniques particularly attractive in commercial and industrial data-mining applications. We present here a synopsis of some major state-of-the-art tree and rule mining methodologies, as well as some recent advances.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1994

Towards language independent automated learning of text categorization models

Chidanand Apte; Fred J. Damerau; Sholom M. Weiss

We describe the results of extensive machine learning experiments on large collections of Reuters’ English and German newswires. The goal of these experiments was to automatically discover classification patterns that can be used for assignment of topics to the individual newswires. Our results with the English newswire collection show a very large gain in performance as compared to published benchmarks, while our initial results with the German newswires appear very promising. We present our methodology, which seems to be insensitive to the language of the document collections, and discuss issues related to the differences in results that we have obtained for the two collections.

Communications of the ACM | 2002

Business applications of data mining

Chidanand Apte; Bing Liu; Edwin P. D. Pednault; Padhraic Smyth

Business applications of data mining help identify and predict individual, as well as aggregate, behavior, as illustrated by four application domains: direct mail, retail, automobile insurance, and health care.

European Conference on Principles of Data Mining and Knowledge Discovery | 2000

Lightweight document clustering

Chidanand Apte; Sholom M. Weiss; Brian F. White

A lightweight document clustering method is described that operates in high dimensions, processes tens of thousands of documents and groups them into several thousand clusters, or, by varying a single parameter, into a few dozen clusters. The method uses a reduced indexing view of the original documents, where only the k best keywords of each document are indexed. An efficient procedure for clustering is specified in two parts: (a) compute k most similar documents for each document in the collection and (b) group the documents into clusters using these similarity scores. The method has been evaluated on a database of over 50,000 customer service problem reports that are reduced to 3,000 clusters and 5,000 exemplar documents. Results demonstrate efficient clustering performance with excellent group similarity measures.
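The two-part procedure in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the published method: keywords are chosen by raw frequency, similarity is the count of shared keywords, and documents are grouped by linking neighbors that share at least `min_shared` keywords. The abstract does not give the scoring or grouping rules, so all function names, parameters, and sample reports are hypothetical.

```python
from collections import Counter, defaultdict


def top_keywords(text, k):
    """Reduced index: keep the k most frequent words (assumed scoring)."""
    counts = Counter(text.lower().split())
    # Sort by descending count, then alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:k]}


def nearest_neighbors(keyword_sets, k):
    """Part (a): the k most similar documents for each document."""
    index = defaultdict(set)  # inverted index: keyword -> document ids
    for doc_id, words in enumerate(keyword_sets):
        for word in words:
            index[word].add(doc_id)
    neighbors = []
    for doc_id, words in enumerate(keyword_sets):
        shared = Counter()  # other document -> number of shared keywords
        for word in words:
            for other in index[word]:
                if other != doc_id:
                    shared[other] += 1
        ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
        neighbors.append(ranked[:k])
    return neighbors


def group(neighbors, min_shared):
    """Part (b): merge documents linked by a strong enough similarity."""
    parent = list(range(len(neighbors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for doc_id, ranked in enumerate(neighbors):
        for other, score in ranked:
            if score >= min_shared:
                parent[find(doc_id)] = find(other)
    clusters = defaultdict(list)
    for doc_id in range(len(neighbors)):
        clusters[find(doc_id)].append(doc_id)
    return sorted(clusters.values())


# Hypothetical customer service problem reports.
docs = [
    "printer jams paper printer tray",
    "paper tray jams printer",
    "password reset login password",
    "login password expired reset",
]
keyword_sets = [top_keywords(d, k=3) for d in docs]
clusters = group(nearest_neighbors(keyword_sets, k=2), min_shared=2)
print(clusters)
```

In this sketch, raising or lowering `min_shared` plays the role of a single parameter that trades many small clusters for a few large ones; whether that matches the parameter the authors vary is an assumption.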

IBM Systems Journal | 2002

Predictive algorithms in the management of computer systems

Ricardo Vilalta; Chidanand Apte; Joseph L. Hellerstein; Sheng Ma; Sholom M. Weiss

Predictive algorithms play a crucial role in systems management by alerting the user to potential failures. We report on three case studies dealing with the prediction of failures in computer systems: (1) long-term prediction of performance variables (e.g., disk utilization), (2) short-term prediction of abnormal behavior (e.g., threshold violations), and (3) short-term prediction of system events (e.g., router failure). Empirical results show that predictive algorithms can be successfully employed in the estimation of performance variables and the prediction of critical events.

Computational Science and Engineering | 1997

Data mining: an industrial research perspective

Chidanand Apte

Just what exactly is data mining? At a broad level, it is the process by which accurate and previously unknown information is extracted from large volumes of data. This information should be in a form that can be understood, acted upon, and used for improving decision processes. Obviously, with this definition, data mining encompasses a broad set of technologies, including data warehousing, database management, data analysis algorithms, and visualization. The crux of the appeal for this new technology lies in the data analysis algorithms, since they provide automated mechanisms for sifting through data and extracting useful information. The analysis capability of these algorithms, coupled with today's data warehousing and database management technology, makes corporate and industrial data mining possible. The data representation model for such algorithms is quite straightforward. Data is considered to be a collection of records, where each record is a collection of fields. Using this tabular data model, data mining algorithms are designed to operate on the contents, under differing assumptions, and to deliver results in differing formats. The data analysis algorithms (or data mining algorithms, as they are more popularly known nowadays) can be divided into three major categories based on the nature of their information extraction: predictive modeling (also called classification or supervised learning), clustering (also called segmentation or unsupervised learning), and frequent pattern extraction.

Knowledge Discovery and Data Mining | 2001

Segmentation-based modeling for advanced targeted marketing

Chidanand Apte; Eric Bibelnieks; Ramesh Natarajan; Edwin P. D. Pednault; Fateh A. Tipu; Deb Campbell; Bryan Nelson

Fingerhut Business Intelligence (BI) has a long and successful history of building statistical models to predict consumer behavior. The models constructed are typically segmentation-based models in which the target audience is split into subpopulations (i.e., customer segments) and individually tailored statistical models are then developed for each segment. Such models are commonly employed in the direct-mail industry; however, segmentation is often performed on an ad-hoc basis without directly considering how segmentation affects the accuracy of the resulting segment models. Fingerhut BI approached IBM Research with the problem of how to build segmentation-based models more effectively so as to maximize predictive accuracy. The IBM Advanced Targeted Marketing-Single Events™ (IBM ATM-SE™) solution is the result of IBM Research and Fingerhut BI directing their efforts jointly towards solving this problem. This paper presents an evaluation of ATM-SE's modeling capabilities using data from Fingerhut's catalog mailings.
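The idea of a segmentation-based model can be illustrated with a toy sketch: split customers into segments, fit a separate (here trivial) model per segment, and compare its error with a single global model. Everything below is hypothetical, including the hand-written segmentation rule, the mean-response "model", and the data; the abstract does not describe how ATM-SE chooses segments or fits models.

```python
from collections import defaultdict


def fit_segment_models(records, segment_of):
    """Fit one trivial model (the mean response rate) per customer segment."""
    totals = defaultdict(lambda: [0.0, 0])  # segment -> [response sum, count]
    for features, response in records:
        bucket = totals[segment_of(features)]
        bucket[0] += response
        bucket[1] += 1
    return {seg: total / count for seg, (total, count) in totals.items()}


def squared_error(records, predict):
    """Sum of squared differences between responses and predictions."""
    return sum((response - predict(features)) ** 2 for features, response in records)


# Hypothetical mailing data: (customer features, 1 if the customer responded).
records = [
    ({"past_orders": 0}, 0), ({"past_orders": 0}, 0),
    ({"past_orders": 0}, 1), ({"past_orders": 0}, 0),
    ({"past_orders": 5}, 1), ({"past_orders": 7}, 1),
    ({"past_orders": 6}, 0), ({"past_orders": 9}, 1),
]


def segment_of(features):
    """Ad-hoc segmentation rule, assumed for illustration only."""
    return "repeat" if features["past_orders"] > 0 else "new"


models = fit_segment_models(records, segment_of)
global_rate = sum(response for _, response in records) / len(records)

segmented_error = squared_error(records, lambda f: models[segment_of(f)])
global_error = squared_error(records, lambda f: global_rate)
print(models, segmented_error, global_error)
```

On this toy data the per-segment models fit better than the single global rate, which is the effect a well-chosen segmentation is meant to produce; an ad-hoc split offers no such guarantee in general.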

Knowledge Discovery and Data Mining | 2004

Cross channel optimized marketing by reinforcement learning

Naoki Abe; Naval K. Verma; Chidanand Apte; Robert Schroko

The issues of cross channel integration and customer lifetime value modeling are two of the most important topics surrounding customer relationship management (CRM) today. In the present paper, we describe and evaluate a novel solution that treats these two important issues in a unified framework of Markov Decision Processes (MDP). In particular, we report on the results of a joint project between IBM Research and Saks Fifth Avenue to investigate the applicability of this technology to real world problems. The business problem we use as a testbed for our evaluation is that of optimizing direct mail campaign mailings for maximization of profits in the store channel. We identify a problem common to cross-channel CRM, which we call the Cross-Channel Challenge, due to the lack of explicit linking between the marketing actions taken in one channel and the customer responses obtained in another. We provide a solution for this problem based on old and new techniques in reinforcement learning. Our in-laboratory experimental evaluation using actual customer interaction data shows that an increase of as much as 7 to 8 percent in store profits can be expected by employing a mailing policy automatically generated by our methodology. These results confirm that our approach is valid in dealing with the cross channel CRM scenarios in the real world.
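To make the MDP framing concrete, here is a minimal sketch of tabular Q-learning run in batch over a fixed log of transitions. The two customer states, the mail/skip actions, the rewards, and the hyperparameters are all hypothetical; the abstract does not specify the state representation or the particular reinforcement learning techniques used, so this is an illustration of the general idea rather than the authors' method.

```python
def batch_q_learning(transitions, actions, gamma=0.9, alpha=0.5, sweeps=500):
    """Repeatedly apply the Q-learning update over logged transitions."""
    q = {}
    for state, _, _, next_state in transitions:
        for s in (state, next_state):
            for a in actions:
                q.setdefault((s, a), 0.0)
    for _ in range(sweeps):
        for state, action, reward, next_state in transitions:
            best_next = max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def greedy_policy(q, actions):
    """Pick the highest-valued action in every state."""
    states = {s for s, _ in q}
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


actions = ["mail", "skip"]
# Hypothetical log: (state, action, immediate profit, next state).
transitions = [
    ("inactive", "mail", -1.0, "active"),    # mailing costs money now
    ("inactive", "skip", 0.0, "inactive"),
    ("active", "mail", 4.0, "active"),       # purchase profit minus mailing cost
    ("active", "skip", 2.0, "inactive"),     # customer lapses without contact
]

q = batch_q_learning(transitions, actions)
policy = greedy_policy(q, actions)
print(policy)
```

A myopic rule would skip inactive customers because mailing them loses money immediately, whereas the learned policy mails them because the discounted long-run value of reactivating a customer outweighs the one-time cost. That is the lifetime-value effect the MDP formulation is meant to capture.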

Conference on Artificial Intelligence for Applications | 1993

Predicting defects in disk drive manufacturing: A case study in high-dimensional classification

Chidanand Apte; S. Weiss; G. Grout

The authors consider the application of several computationally intensive classification techniques to disk drive manufacturing quality control. This application is characterized by very high dimensions, with hundreds of features, and tens of thousands of cases. The efforts were directed toward a search for knowledge that would assist the engineers in providing another increment of performance to the already highly performing knowledge-based system. Two principal issues are addressed. The issues are whether a very expensive testing process can be eliminated while still maintaining high quality throughput in disk drive manufacturing, and whether the manufacturing process can be made more efficient by identifying bad disk drives prior to the expensive testing. Preliminary results indicate that although the expensive testing cannot be completely eliminated, a fraction of the disk drives can be determined to be faulty prior to further testing. This detection may improve the throughput of the manufacturing line.
