Robi Polikar | Researchain

Publication

Featured research published by Robi Polikar.

Systems, Man, and Cybernetics | 2001

Learn++: an incremental learning algorithm for supervised neural networks

Robi Polikar; L. Udpa; S. S. Udpa; Vasant G. Honavar

We introduce Learn++, an algorithm for incremental training of neural network (NN) pattern classifiers. The proposed algorithm enables supervised NN paradigms, such as the multilayer perceptron (MLP), to accommodate new data, including examples that correspond to previously unseen classes. Furthermore, the algorithm does not require access to previously used data during subsequent incremental learning sessions, yet it does not forget previously acquired knowledge. Learn++ uses an ensemble of classifiers, generating multiple hypotheses from training data sampled according to carefully tailored distributions. The outputs of the resulting classifiers are combined using a weighted majority voting procedure. We present simulation results on several benchmark datasets as well as a real-world classification task. Initial results indicate that the proposed algorithm works rather well in practice. A theoretical upper bound on the error of the classifiers constructed by Learn++ is also provided.
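To make the idea of "carefully tailored distributions" concrete, here is a minimal sketch of an AdaBoost-style distribution update of the kind the abstract alludes to. It assumes that the composite (ensemble) error E is normalized as B = E / (1 - E) and that instances the current ensemble already classifies correctly have their sampling weight multiplied by B, so later classifiers concentrate on what the ensemble still gets wrong. The function names are hypothetical, and this is an illustration rather than the paper's code.

```python
def normalized_error(error: float) -> float:
    """AdaBoost-style normalized error B = E / (1 - E); assumes 0 <= E < 0.5."""
    if not 0.0 <= error < 0.5:
        raise ValueError("error must lie in [0, 0.5)")
    return error / (1.0 - error)


def update_distribution(weights, correct):
    """Down-weight instances the composite hypothesis already gets right.

    weights: current sampling distribution over training instances (sums to 1).
    correct: booleans, True where the composite hypothesis is correct.
    """
    # Composite error = total weight of the misclassified instances.
    composite_error = sum(w for w, c in zip(weights, correct) if not c)
    if composite_error == 0.0:
        return list(weights)  # nothing left to emphasize
    b = normalized_error(composite_error)
    new = [w * (b if c else 1.0) for w, c in zip(weights, correct)]
    total = sum(new)
    # Renormalize so the result is again a probability distribution.
    return [w / total for w in new]


dist = update_distribution([0.25, 0.25, 0.25, 0.25], [True, True, True, False])
print([round(w, 4) for w in dist])  # [0.1667, 0.1667, 0.1667, 0.5]
```

The single misclassified instance ends up carrying half of the probability mass, so the next training subset drawn from this distribution is likely to include it.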

IEEE Transactions on Neural Networks | 2011

Incremental Learning of Concept Drift in Nonstationary Environments

Ryan Elwell; Robi Polikar

We introduce an ensemble-of-classifiers approach for incremental learning of concept drift, characterized by nonstationary environments (NSEs), where the underlying data distributions change over time. The proposed algorithm, named Learn++.NSE, learns from consecutive batches of data without making any assumptions on the nature or rate of drift; it can learn in environments that experience a constant or variable rate of drift, addition or deletion of concept classes, as well as cyclical drift. Like other members of the Learn++ family of algorithms, it learns incrementally, that is, without requiring access to previously seen data. Learn++.NSE trains one new classifier for each batch of data it receives and combines these classifiers using dynamically weighted majority voting. The novelty of the approach lies in determining the voting weights, based on each classifier's time-adjusted accuracy on current and past environments. This approach allows the algorithm to recognize and respond to changes in the underlying data distributions, as well as to a possible recurrence of an earlier distribution. We evaluate the algorithm on several synthetic datasets designed to simulate a variety of nonstationary environments, as well as on a real-world weather prediction dataset. Comparisons with several other approaches are also included. Results indicate that Learn++.NSE can track the changing environments very closely, regardless of the type of concept drift. To allow future use, comparison, and benchmarking by interested researchers, we also release the data used in this paper.
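The following sketch illustrates how a time-adjusted voting weight could be computed for one ensemble member. It is a simplified reading, not the published Learn++.NSE update: it assumes each batch error e is converted to a normalized error beta = e / (1 - e), that recent batches are emphasized by a decreasing logistic function of batch age whose slope `a` and offset `b` are hypothetical parameters, and that the final weight is log(1 / weighted average beta).

```python
import math


def time_adjusted_vote_weight(errors, a=1.0, b=1.0):
    """Voting weight of one ensemble member from its error history.

    errors: the member's error on each batch it has been evaluated on, oldest first.
    a, b:   hypothetical slope and offset of the logistic recency weighting.
    """
    n = len(errors)
    # Age 0 is the most recent batch; the decreasing logistic favors recent errors.
    omegas = [1.0 / (1.0 + math.exp(a * ((n - 1 - i) - b))) for i in range(n)]
    total = sum(omegas)
    omegas = [w / total for w in omegas]
    # Clip errors at 0.5 so a no-better-than-chance batch contributes beta = 1.
    betas = [min(e, 0.5) / (1.0 - min(e, 0.5)) for e in errors]
    beta_bar = sum(w * beta for w, beta in zip(omegas, betas))
    # Guard against log(1/0) when the member has been error-free.
    return math.log(1.0 / max(beta_bar, 1e-12))


recent_good = time_adjusted_vote_weight([0.4, 0.1])  # poor earlier, good now
recent_bad = time_adjusted_vote_weight([0.1, 0.4])   # good earlier, poor now
print(recent_good > recent_bad)  # True
```

A member that was accurate on recent batches outvotes one with the reverse history, and a member that is no better than chance everywhere receives a weight of zero, which is the qualitative behavior the abstract describes.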

IEEE Transactions on Knowledge and Data Engineering | 2013

Incremental Learning of Concept Drift from Streaming Imbalanced Data

Gregory Ditzler; Robi Polikar

Learning in nonstationary environments, also known as learning concept drift, is concerned with learning from data whose statistical characteristics change over time. Concept drift is further complicated if the data set is class imbalanced. While these two issues have been addressed independently, their joint treatment has remained largely unexplored. We describe two ensemble-based approaches for learning concept drift from imbalanced data. Our first approach is a logical combination of our previously introduced Learn++.NSE algorithm for concept drift with the well-established SMOTE for learning from imbalanced data. Our second approach makes two major modifications to the Learn++.NSE-SMOTE integration: it replaces SMOTE with a subensemble that makes strategic use of minority class data, and it replaces Learn++.NSE and its class-independent error weighting mechanism with a penalty constraint that forces the algorithm to balance accuracy on all classes. The primary novelty of this approach lies in determining the voting weights for combining ensemble members, based on each classifier's time- and imbalance-adjusted accuracy on current and past environments. Favorable results in comparison to other approaches indicate that both approaches are able to address this challenging problem, each with its own specific areas of strength. We also release all experimental data as a resource and benchmark for future research.
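SMOTE, which the first approach builds on, creates synthetic minority examples by interpolating between a minority instance and one of its nearest minority-class neighbors. The sketch below shows only that interpolation step in pure Python; the function name, the neighbor count `k`, and the fixed random seed are illustrative assumptions, and a real pipeline would normally rely on an established implementation.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # The k nearest minority neighbors of x, excluding x itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment from x to neighbor.
        gap = rng.random()
        synthetic.append(tuple(xa + gap * (na - xa) for xa, na in zip(x, neighbor)))
    return synthetic


new_points = smote([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)], n_new=4)
print(new_points)
```

Because every synthetic point lies between two existing minority points, the minority region is filled in rather than simply duplicated, which is the property that makes SMOTE attractive for imbalanced data.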

IEEE Signal Processing Magazine | 2007

Bootstrap-Inspired Techniques in Computational Intelligence

Robi Polikar

This article is about the success story of a seemingly simple yet extremely powerful approach that has recently reached celebrity status in statistical and engineering sciences. The hero of this story, bootstrap resampling, is relatively young, but the story itself is a familiar one within the scientific community: a mathematician or a statistician conceives and formulates a theory that is first developed by fellow mathematicians and then brought to fame by other professionals, typically engineers, who point to many applications that can benefit from just such an approach. Signal processing boasts some of the finest examples of such stories, such as the classic story of Fourier transforms or the more contemporary tale of wavelet transforms.

Systems, Man, and Cybernetics | 2007

An Ensemble-Based Incremental Learning Approach to Data Fusion

Devi Parikh; Robi Polikar

This paper introduces Learn++, an ensemble-of-classifiers algorithm originally developed for incremental learning and now adapted for information/data fusion applications. Recognizing the conceptual similarity between incremental learning and data fusion, Learn++ follows an alternative approach to data fusion, i.e., sequentially generating an ensemble of classifiers that specifically seek the most discriminating information from each data set. It was observed that Learn++-based data fusion consistently outperforms a similarly configured ensemble classifier trained on any of the individual data sources across several applications. Furthermore, even if the classifiers trained on individual data sources are fine-tuned for the given problem, Learn++ can still achieve a statistically significant improvement by combining them, if the additional data sets carry complementary information. The algorithm can also identify, albeit indirectly, those data sets that do not carry such additional information. Finally, it was shown that the algorithm can consecutively learn both the supplementary novel information coming from additional data of the same source and the complementary information coming from new data sources, without requiring access to any of the previously seen data.
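Fusion in this setting ultimately rests on weighted majority voting across classifiers trained on different data sources. A minimal sketch, assuming AdaBoost-style voting weights log(1 / beta), where beta is a classifier's normalized training error; the function and the example labels are hypothetical, and the sketch omits how Learn++ trains each source's classifiers.

```python
import math
from collections import defaultdict


def weighted_majority_vote(predictions, betas):
    """Combine class predictions using log(1 / beta) voting weights.

    predictions: one predicted label per classifier (from any data source).
    betas:       each classifier's normalized error, 0 < beta < 1.
    """
    scores = defaultdict(float)
    for label, beta in zip(predictions, betas):
        scores[label] += math.log(1.0 / beta)
    # The label with the largest accumulated weight wins.
    return max(scores, key=scores.get)


# One accurate classifier (e.g., trained on a more informative source)
# outvotes two weaker ones: log(10) > 2 * log(2).
print(weighted_majority_vote(["A", "B", "B"], [0.1, 0.5, 0.5]))  # A
```

The logarithmic weighting lets a classifier built on an informative source dominate, while sources that add little receive small weights, loosely mirroring the abstract's remark that uninformative data sets can be identified indirectly.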

Progress in Artificial Intelligence | 2012

Learning from streaming data with concept drift and imbalance: an overview

T. Ryan Hoens; Robi Polikar; Nitesh V. Chawla

The primary focus of machine learning has traditionally been on learning from data assumed to be sufficient and representative of the underlying fixed, yet unknown, distribution. Such restrictions on the problem domain paved the way for the development of elegant algorithms with theoretically provable performance guarantees. As is often the case, however, real-world problems rarely fit neatly into such restricted models. For instance, class distributions are often skewed, resulting in the “class imbalance” problem. Data drawn from non-stationary distributions is also common in real-world applications, resulting in the “concept drift” or “non-stationary learning” problem, which is often associated with streaming data scenarios. Recently, these problems have independently received increased research attention; however, the combined problem of addressing all of the above-mentioned issues has received relatively little. If the ultimate goal of intelligent machine learning algorithms is to be able to address a wide spectrum of real-world scenarios, then the need for a general framework for learning from, and adapting to, a non-stationary environment that may introduce imbalanced data can hardly be overstated. In this paper, we first present an overview of each of these challenging areas, followed by a comprehensive review of recent research for developing such a general framework.
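A short worked example shows why skewed class distributions are a problem in their own right: a classifier that always predicts the majority class can look accurate while ignoring the minority class entirely. The sketch assumes a 95:5 class split chosen purely for illustration and compares plain accuracy with two class-balanced measures (balanced accuracy and the geometric mean of per-class recalls); these metrics are common in the imbalance literature but are not taken from the paper.

```python
import math


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    recalls = {}
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return recalls


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    r = per_class_recall(y_true, y_pred)
    return sum(r.values()) / len(r)


def g_mean(y_true, y_pred):
    r = per_class_recall(y_true, y_pred)
    return math.prod(r.values()) ** (1.0 / len(r))


y_true = [0] * 95 + [1] * 5   # illustrative 95:5 imbalance
y_pred = [0] * 100            # always predict the majority class
print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred), g_mean(y_true, y_pred))
# 0.95 0.5 0.0
```

Accuracy rewards the trivial classifier, whereas both class-balanced measures expose that the minority class is never recognized.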

Instrumentation and Measurement Technology Conference | 2004

An architecture for intelligent systems based on smart sensors

John L. Schmalzel; Fernando Figueroa; Jon Morris; Shreekanth Mandayam; Robi Polikar

Based on requirements for a next-generation rocket test facility, elements of a prototype IRTF have been implemented. A key component is a set of distributed smart sensor elements integrated using a knowledgeware environment. One of the specific goals is to imbue sensors with the intelligence needed to perform self-diagnosis of health and to participate in a hierarchy of health determination at sensor, process, and system levels. The preliminary results provide the basis for future advanced development and validation using rocket test facilities at Stennis Space Center (SSC). We have identified issues important to further development of health-enabled networks, which should be of interest to others working with smart sensors and intelligent health management systems.
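To illustrate what a hierarchy of health determination at sensor, process, and system levels might look like, here is a deliberately simple sketch. Everything in it is hypothetical: the `SmartSensor` class, the range and flatline checks used for self-diagnosis, and the rule that a level is healthy only if all of its children are. It is not the architecture described in the paper, which integrates sensors through a knowledgeware environment.

```python
from dataclasses import dataclass


@dataclass
class SmartSensor:
    name: str
    low: float   # hypothetical lower bound of the valid measurement range
    high: float  # hypothetical upper bound of the valid measurement range

    def self_diagnose(self, readings):
        """Sensor level: healthy if readings stay in range and are not stuck."""
        in_range = all(self.low <= r <= self.high for r in readings)
        not_flat = len(set(readings)) > 1
        return in_range and not_flat


def process_health(sensor_readings):
    """Process level: healthy only if every (sensor, readings) pair is healthy."""
    return all(sensor.self_diagnose(r) for sensor, r in sensor_readings)


def system_health(processes):
    """System level: overall flag plus per-process status."""
    status = {name: process_health(sr) for name, sr in processes.items()}
    return all(status.values()), status


p = SmartSensor("pressure", 0.0, 100.0)
t = SmartSensor("temperature", -50.0, 150.0)
ok, detail = system_health({
    "feed": [(p, [10.0, 11.0, 10.5]), (t, [20.0, 21.0, 20.5])],
    "cooling": [(t, [30.0, 30.0, 30.0])],  # flatlined sensor
})
print(ok, detail)  # False {'feed': True, 'cooling': False}
```

Keeping the sensor-level check inside the sensor object reflects the idea of sensors that diagnose themselves, while the higher levels only aggregate reported health.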

IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control | 1998

Frequency invariant classification of ultrasonic weld inspection signals

Robi Polikar; Lalita Udpa; Satish S. Udpa; Tom Taylor

Automated signal classification systems are finding increasing use in many applications for the analysis and interpretation of large volumes of signals. Such systems show consistency of response and help reduce the effects of variability associated with human interpretation. This paper deals with the analysis of ultrasonic NDE signals obtained during weld inspection of piping in boiling water reactors. The overall approach consists of three major steps, namely, frequency invariance, multiresolution analysis, and neural network classification. The data are first preprocessed so that signals obtained using different transducer center frequencies are transformed to an equivalent reference-frequency signal. Discriminatory features are then extracted using a multiresolution analysis technique, namely, the discrete wavelet transform (DWT). The compact feature vector obtained using wavelet analysis is classified using a multilayer perceptron neural network. Two different databases containing weld inspection signals have been used to test the performance of the neural network. Initial results obtained using this approach demonstrate the effectiveness of the frequency invariance processing technique and the DWT analysis method employed for feature extraction.
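As an illustration of DWT-based feature extraction, the sketch below computes a multilevel Haar decomposition in pure Python and summarizes each detail band, plus the final approximation, by its energy. The choice of the Haar wavelet and of band energies as features are assumptions made for brevity; the paper's actual wavelet, decomposition depth, and feature definition may differ, and the frequency-invariance preprocessing step is not shown.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def dwt_energy_features(signal, levels=3):
    """Energy of each detail band (finest first), then of the final approximation."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


x = [1.0, 3.0, 2.0, 2.0, 0.0, 4.0, 1.0, 1.0]
print(dwt_energy_features(x))  # 4-element compact feature vector
```

Because the transform is orthonormal, the band energies sum to the total energy of the signal, so the compact feature vector describes how that energy is distributed across scales.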

Advances in Bioinformatics | 2008

Metagenome Fragment Classification Using N-Mer Frequency Profiles

Gail Rosen; Elaine Garbarine; Diamantino Caseiro; Robi Polikar; Bahrad A. Sokhansanj

A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the massive amounts of metagenomic data, that is, all the DNA present in an environmental sample. A major obstacle in metagenomics is the inability to obtain accurate results from technologies that yield short reads. We construct unique N-mer frequency profiles of the 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not suffer from the ambiguity of BLAST's tied top scores. We demonstrate that this approach scales to identifying any fragment from hundreds of genomes. It also performs quite well at the strain, species, and genus levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis demonstrates that species-level accuracy reaches 90% for highly represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced.
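The core idea, N-mer frequency profiles feeding a naive Bayes classifier, can be sketched in a few lines. This sketch assumes a multinomial model over overlapping N-mers with add-one (Laplace) smoothing over the 4**N possible DNA N-mers and a uniform prior over genomes; the smoothing choice, function names, and toy genomes are illustrative assumptions, not details from the paper.

```python
import math
from collections import Counter


def nmers(seq, n):
    """All overlapping substrings of length n."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def train_profiles(genomes, n):
    """Map each genome name to (N-mer counts, total number of N-mers)."""
    profiles = {}
    for name, seq in genomes.items():
        counts = Counter(nmers(seq, n))
        profiles[name] = (counts, sum(counts.values()))
    return profiles


def classify(fragment, profiles, n):
    """Return the genome with the highest naive Bayes log-likelihood."""
    vocab = 4 ** n  # number of possible DNA N-mers

    def log_likelihood(profile):
        counts, total = profile
        # Add-one smoothing keeps unseen N-mers from zeroing the likelihood.
        return sum(math.log((counts[m] + 1) / (total + vocab)) for m in nmers(fragment, n))

    return max(profiles, key=lambda name: log_likelihood(profiles[name]))


profiles = train_profiles({"genome_A": "ACGTACGTACGTAC", "genome_B": "GGGCCCGGGCCCGG"}, n=3)
print(classify("CGTACG", profiles, n=3))  # genome_A
```

Training reduces to counting, and classification to summing precomputed log-probabilities, which is what makes this kind of classifier attractive for high-throughput use.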

IEEE Transactions on Neural Networks | 2014

COMPOSE: A Semisupervised Learning Framework for Initially Labeled Nonstationary Streaming Data

Karl B. Dyer; Robert Capo; Robi Polikar

An increasing number of real-world applications are associated with streaming data drawn from drifting and nonstationary distributions that change over time. These applications demand new algorithms that can learn and adapt to such changes, also known as concept drift. Proper characterization of such data with existing approaches typically requires a substantial number of labeled instances, which may be difficult, expensive, or even impractical to obtain. In this paper, we introduce compacted object sample extraction (COMPOSE), a computational geometry-based framework to learn from nonstationary streaming data, where labels are unavailable (or presented very sporadically) after initialization. We introduce the algorithm in detail and discuss its performance on several synthetic and real-world data sets, which demonstrates the ability of the algorithm to learn under several different scenarios of initially labeled streaming environments. On carefully designed synthetic data sets, we compare the performance of COMPOSE against the optimal Bayes classifier, as well as the arbitrary subpopulation tracker algorithm, which addresses a similar environment referred to as extreme verification latency. Furthermore, using the real-world National Oceanic and Atmospheric Administration weather data set, we demonstrate that COMPOSE is competitive even with a well-established and fully supervised nonstationary learning algorithm that receives labeled data in every batch.
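The sketch below illustrates only the overall loop of learning from initially labeled streaming data: label each new unlabeled batch with the current model, then keep a compacted "core" of each class as the labeled support for the next step. COMPOSE performs the compaction with computational geometry; here that step is replaced, for brevity, by a much cruder stand-in that keeps the points closest to each class centroid, and a 1-nearest-neighbor rule plays the role of the classifier. All function names and the `keep` fraction are hypothetical.

```python
import math


def nearest_label(x, cores):
    """1-nearest-neighbor label of x among (point, label) core samples."""
    return min(cores, key=lambda pl: math.dist(x, pl[0]))[1]


def compact(points, keep=0.5):
    """Stand-in for core extraction: keep the points closest to the class centroid."""
    dim = len(points[0])
    centroid = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    ranked = sorted(points, key=lambda p: math.dist(p, centroid))
    return ranked[:max(1, int(len(ranked) * keep))]


def compose_like(initial_labeled, batches, keep=0.5):
    """Label each unlabeled batch, then carry compacted cores forward."""
    cores = list(initial_labeled)  # list of (point, label) pairs
    history = []
    for batch in batches:
        labeled = [(x, nearest_label(x, cores)) for x in batch]
        history.append([lab for _, lab in labeled])
        new_cores = []
        for lab in {lab for _, lab in labeled}:
            pts = [x for x, l in labeled if l == lab]
            new_cores.extend((p, lab) for p in compact(pts, keep))
        cores = new_cores
    return history


# Two 1-D classes drifting toward each other; only the first points are labeled.
stream = [[(0.5,), (1.0,), (9.5,), (9.0,)],
          [(1.5,), (2.0,), (8.5,), (8.0,)],
          [(2.5,), (3.0,), (7.5,), (7.0,)]]
print(compose_like([((0.0,), 0), ((10.0,), 1)], stream))
# [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
```

Because the cores move with each batch, the classes are tracked as they drift even though no labels arrive after initialization; a static classifier built only from the initial points would eventually misassign drifted samples.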