Publications


Featured research published by Aleksandar Lazarevic.


european conference on principles of data mining and knowledge discovery | 2003

SMOTEBoost: Improving Prediction of the Minority Class in Boosting

Nitesh V. Chawla; Aleksandar Lazarevic; Lawrence O. Hall; Kevin W. Bowyer

Many real-world data mining applications involve learning from imbalanced data sets. Learning from data sets that contain very few instances of the minority (or interesting) class usually produces biased classifiers that have higher predictive accuracy over the majority class(es) but poorer predictive accuracy over the minority class. SMOTE (Synthetic Minority Over-sampling Technique) is specifically designed for learning from imbalanced data sets. This paper presents a novel approach for learning from imbalanced data sets, based on a combination of the SMOTE algorithm and the boosting procedure. Unlike standard boosting, where all misclassified examples are given equal weights, SMOTEBoost creates synthetic examples from the rare or minority class, thus indirectly changing the updating weights and compensating for skewed distributions. SMOTEBoost applied to several highly and moderately imbalanced data sets shows improvement in prediction performance on the minority class and overall improved F-values.
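The oversampling step at the core of SMOTEBoost can be sketched in a few lines. This is a minimal illustration on made-up 2-D points, not the authors' implementation; `smote`, `k`, and `n_new` are illustrative names.

```python
import math
import random

def smote(minority, k=2, n_new=4, seed=0):
    """Generate synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbors of x within the minority class (excluding x itself)
        neighbors = sorted((p for p in minority if p is not x),
                           key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new_points = smote(minority)
```

In SMOTEBoost these synthetic examples are injected at each boosting round, so the weight updates see a less skewed class distribution.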


knowledge discovery and data mining | 2005

Feature bagging for outlier detection

Aleksandar Lazarevic; Vipin Kumar

Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel feature bagging approach for detecting outliers in very large, high-dimensional and noisy databases is proposed. It combines results from multiple outlier detection algorithms that are applied using different sets of features. Every outlier detection algorithm uses a small subset of features that are randomly selected from the original feature set. As a result, each outlier detector identifies different outliers, and thus assigns to all data records outlier scores that correspond to their probability of being outliers. The outlier scores computed by the individual outlier detection algorithms are then combined in order to find better-quality outliers. Experiments performed on several synthetic and real-life data sets show that the proposed methods for combining outputs from multiple outlier detection algorithms provide non-trivial improvements over the base algorithm.
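The scheme described above can be sketched roughly like this, using a k-NN distance detector in each random feature subspace and combining scores by summation (one of several combination schemes the paper studies); all names and data are illustrative.

```python
import math
import random

def knn_score(data, point_idx, features, k=2):
    """Outlier score in a feature subspace: distance to the k-th nearest neighbor."""
    x = data[point_idx]
    dists = sorted(
        math.dist([x[f] for f in features], [y[f] for f in features])
        for i, y in enumerate(data) if i != point_idx
    )
    return dists[k - 1]

def feature_bagging(data, rounds=10, k=2, seed=0):
    """Run a detector on several random feature subsets and sum the scores."""
    rng = random.Random(seed)
    n_feat = len(data[0])
    totals = [0.0] * len(data)
    for _ in range(rounds):
        size = rng.randint(max(1, n_feat // 2), n_feat)  # random subset size
        features = rng.sample(range(n_feat), size)
        for i in range(len(data)):
            totals[i] += knn_score(data, i, features, k)
    return totals

# A tight cluster plus one obvious outlier.
data = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (10, 10, 10)]
scores = feature_bagging(data)
```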


computational intelligence and data mining | 2007

Incremental Local Outlier Detection for Data Streams

Dragoljub Pokrajac; Aleksandar Lazarevic; Longin Jan Latecki

Outlier detection has recently become an important problem in many industrial and financial applications. This problem is further complicated by the fact that in many cases, outliers have to be detected from data streams that arrive at an enormous pace. In this paper, an incremental LOF (local outlier factor) algorithm, appropriate for detecting outliers in data streams, is proposed. The proposed incremental LOF algorithm provides detection performance equivalent to the iterated static LOF algorithm (applied after insertion of each data record), while requiring significantly less computational time. In addition, the incremental LOF algorithm also dynamically updates the profiles of data points. This is a very important property, since data profiles may change over time. The paper provides theoretical evidence that insertion of a new data point, as well as deletion of an old data point, influences only a limited number of its closest neighbors, and thus the number of updates per such insertion/deletion does not depend on the total number of points N in the data set. Our experiments performed on several simulated and real-life data sets have demonstrated that the proposed incremental LOF algorithm is computationally efficient, while at the same time very successful in detecting outliers and changes of distributional behavior in various data stream applications.
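For reference, the static LOF score that the incremental algorithm is designed to match can be sketched as follows; this is a simplified toy implementation on 2-D data, not the paper's incremental version, which achieves the same scores while updating only the affected neighbors on each insertion or deletion.

```python
import math

def lof(data, k=2):
    """Minimal static LOF (local outlier factor): scores near 1 are inliers,
    scores well above 1 are outliers."""
    def knn(i):
        d = sorted((math.dist(data[i], data[j]), j)
                   for j in range(len(data)) if j != i)
        return d[:k]

    neigh = [knn(i) for i in range(len(data))]
    k_dist = [neigh[i][-1][0] for i in range(len(data))]

    def reach_dist(i, j):  # reachability distance of i from j
        return max(k_dist[j], math.dist(data[i], data[j]))

    # local reachability density of each point
    lrd = [k / sum(reach_dist(i, j) for _, j in neigh[i])
           for i in range(len(data))]

    # LOF: average neighbor density relative to the point's own density
    return [sum(lrd[j] for _, j in neigh[i]) / (k * lrd[i])
            for i in range(len(data))]

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (6.0, 6.0)]
scores = lof(data)
```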


Archive | 2005

Intrusion Detection: A Survey

Aleksandar Lazarevic; Vipin Kumar; Jaideep Srivastava

This chapter provides an overview of the state of the art in intrusion detection research. Intrusion detection systems are software and/or hardware components that monitor computer systems and analyze events occurring in them for signs of intrusions. Due to the widespread diversity and complexity of computer infrastructures, it is difficult to provide a completely secure computer system. Therefore, there are numerous security systems and intrusion detection systems that address different aspects of computer security. This chapter first provides a taxonomy of computer intrusions, along with brief descriptions of major computer attack categories. Second, a common architecture of intrusion detection systems and their basic characteristics are presented. Third, a taxonomy of intrusion detection systems based on five criteria (information source, analysis strategy, time aspects, architecture, response) is given. Finally, intrusion detection systems are classified according to each of these categories and the most representative research prototypes are briefly described.


Distributed and Parallel Databases | 2002

Boosting Algorithms for Parallel and Distributed Learning

Aleksandar Lazarevic; Zoran Obradovic

The growing amount of available information and its distributed and heterogeneous nature has a major impact on the field of data mining. In this paper, we propose a framework for parallel and distributed boosting algorithms intended for efficiently integrating specialized classifiers learned over very large, distributed and possibly heterogeneous databases that cannot fit into main computer memory. Boosting is a popular technique for constructing highly accurate classifier ensembles, where the classifiers are trained serially, with the weights on the training instances adaptively set according to the performance of previous classifiers. Our parallel boosting algorithm is designed for tightly coupled shared-memory systems with a small number of processors, with an objective of achieving maximal prediction accuracy in fewer iterations than boosting on a single processor. After all processors learn classifiers in parallel at each boosting round, they are combined according to the confidence of their prediction. Our distributed boosting algorithm is proposed primarily for learning from several disjoint data sites when the data cannot be merged together, although it can also be used for parallel learning where a massive data set is partitioned into several disjoint subsets for a more efficient analysis. At each boosting round, the proposed method combines classifiers from all sites and creates a classifier ensemble on each site. The final classifier is constructed as an ensemble of all classifier ensembles built on disjoint data sets. The proposed methods, applied to several data sets, have shown that parallel boosting can achieve the same or even better prediction accuracy considerably faster than standard sequential boosting. Results from the experiments also indicate that distributed boosting has comparable or slightly improved classification accuracy over standard boosting, while requiring much less memory and computational time since it uses smaller data sets.
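A rough sketch of the distributed idea, one classifier per disjoint site combined by accuracy-weighted voting, might look like this. The decision stumps and the weighting scheme are deliberate simplifications of the paper's per-round ensemble construction; all names and data are illustrative.

```python
def train_stump(data):
    """Fit a 1-D decision stump (threshold classifier) on (x, label) pairs."""
    best_acc, best = -1.0, None
    for thr in sorted({x for x, _ in data}):
        for flip in (False, True):
            preds = [((x > thr) != flip) for x, _ in data]
            acc = sum(p == y for p, (_, y) in zip(preds, data)) / len(data)
            if acc > best_acc:
                best_acc, best = acc, (thr, flip)
    thr, flip = best
    return lambda x: (x > thr) != flip

def distributed_ensemble(sites):
    """Learn one classifier per disjoint site, then combine them by voting
    weighted by each classifier's accuracy on its own site."""
    members = []
    for site in sites:
        clf = train_stump(site)
        w = sum(clf(x) == y for x, y in site) / len(site)
        members.append((clf, w))

    def predict(x):
        vote = sum(w if clf(x) else -w for clf, w in members)
        return vote > 0
    return predict

# Two disjoint sites holding parts of the same binary classification task.
sites = [
    [(0.1, False), (0.3, False), (0.7, True), (0.9, True)],
    [(0.2, False), (0.4, False), (0.6, True), (0.8, True)],
]
model = distributed_ensemble(sites)
```

The point of the design is that only classifiers (not raw records) cross site boundaries, which keeps communication overhead small.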


knowledge discovery and data mining | 2001

The distributed boosting algorithm

Aleksandar Lazarevic; Zoran Obradovic

In this paper, we propose a general framework for distributed boosting intended for efficiently integrating specialized classifiers learned over very large and distributed homogeneous databases that cannot be merged at a single location. Our distributed boosting algorithm can also be used as a parallel classification technique, where a massive database that cannot fit into main computer memory is partitioned into disjoint subsets for a more efficient analysis. In the proposed method, at each boosting round the classifiers are first learned from disjoint data sets and then exchanged amongst the sites. Finally, the classifiers are combined into a weighted voting ensemble on each disjoint data set. The ensemble that is applied to an unseen test set represents an ensemble of ensembles built on all distributed sites. In experiments performed on four large data sets, the proposed distributed boosting method achieved classification accuracy comparable to, or even slightly better than, the standard boosting algorithm while requiring less memory and less computational time. In addition, the communication overhead of the distributed boosting algorithm is very small, making it a viable alternative to standard boosting for large-scale databases.


machine learning and data mining in pattern recognition | 2007

Outlier Detection with Kernel Density Functions

Longin Jan Latecki; Aleksandar Lazarevic; Dragoljub Pokrajac

Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel unsupervised algorithm for outlier detection with a solid statistical foundation is proposed. First, we modify a nonparametric density estimate with a variable kernel to yield a robust local density estimation. Outliers are then detected by comparing the local density of each point to the local density of its neighbors. Our experiments performed on several simulated data sets have demonstrated that the proposed approach can outperform two widely used outlier detection algorithms (LOF and LOCI).
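The density-comparison idea can be sketched in 1-D like so, using a fixed-bandwidth Gaussian KDE rather than the paper's variable-kernel estimator; names and data are illustrative.

```python
import math

def kde_density(data, x, bandwidth=1.0):
    """Gaussian kernel density estimate at x (1-D for simplicity)."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-((x - p) ** 2) / (2 * bandwidth ** 2))
               for p in data) / norm

def density_outlier_scores(data, k=2, bandwidth=1.0):
    """Score each point by the ratio of its neighbors' average local density
    to its own local density; a ratio well above 1 marks an outlier."""
    scores = []
    for x in data:
        # k nearest neighbors, skipping the point itself at index 0
        neighbors = sorted(data, key=lambda p: abs(p - x))[1:k + 1]
        own = kde_density(data, x, bandwidth)
        avg_nb = sum(kde_density(data, nb, bandwidth) for nb in neighbors) / k
        scores.append(avg_nb / own)
    return scores

# A dense run of points plus one isolated value.
data = [0.0, 0.5, 1.0, 1.5, 2.0, 10.0]
scores = density_outlier_scores(data)
```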


international symposium on neural networks | 2001

Effective pruning of neural network classifier ensembles

Aleksandar Lazarevic; Zoran Obradovic

Neural network ensemble techniques have been shown to be very accurate classification techniques. However, in some real-life applications the number of classifiers required to achieve reasonable accuracy is enormously large and hence very space-consuming. The paper proposes several methods for pruning neural network ensembles. The clustering-based approach applies k-means clustering to the entire set of classifiers in order to identify groups of similar classifiers and then eliminates redundant classifiers inside each cluster. Another proposed approach first builds a tree of classifiers depth-first according to their diversity and then prunes the tree. The proposed methods, applied to several data sets, have shown that by selecting an optimal subset of neural network classifiers, it is possible to obtain a significantly smaller ensemble of classifiers while achieving the same or even slightly better generalizability than when using the entire ensemble.
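The clustering-based pruning approach might be sketched as follows, representing each classifier by its prediction vector on validation data; the tiny k-means and the made-up prediction vectors are illustrative, not the paper's implementation.

```python
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Tiny k-means over prediction vectors (squared Euclidean distance)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(v, centers[c])))
                  for v in vectors]
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def prune_ensemble(pred_vectors, labels, k):
    """Cluster classifiers by their validation predictions and keep the most
    accurate member of each cluster, dropping the redundant ones."""
    assign = kmeans(pred_vectors, k)
    kept = []
    for c in range(k):
        members = [i for i in range(len(pred_vectors)) if assign[i] == c]
        if members:
            kept.append(max(members, key=lambda i: sum(
                p == y for p, y in zip(pred_vectors[i], labels))))
    return kept

# Four classifiers' 0/1 predictions on five validation points.
labels = [1, 0, 1, 1, 0]
preds = [
    [1, 0, 1, 1, 0],  # perfect
    [1, 0, 1, 1, 1],  # similar to the first
    [0, 1, 0, 0, 1],  # consistently opposite
    [0, 1, 0, 1, 1],  # similar to the third
]
kept = prune_ensemble(preds, labels, k=2)
```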


Artificial Intelligence in Medicine | 2005

Applying spatial distribution analysis techniques to classification of 3D medical images

Dragoljub Pokrajac; Vasileios Megalooikonomou; Aleksandar Lazarevic; Despina Kontos; Zoran Obradovic

OBJECTIVE The objective of this paper is to classify 3D medical images by analyzing spatial distributions to model and characterize the arrangement of the regions of interest (ROIs) in 3D space. METHODS AND MATERIAL Two methods are proposed for facilitating such classification. The first method uses measures of similarity, such as the Mahalanobis distance and the Kullback-Leibler (KL) divergence, to compute the difference between spatial probability distributions of ROIs in an image of a new subject and each of the considered classes represented by historical data (e.g., normal versus disease class). A new subject is predicted to belong to the class corresponding to the most similar data set. The second method employs the maximum likelihood (ML) principle to predict the class that most likely produced the data set of the new subject. RESULTS The proposed methods have been experimentally evaluated on three data sets: synthetic data (mixtures of Gaussian distributions), realistic lesion-deficit data (generated by a simulator conforming to a clinical study), and functional MRI activation data obtained from a study designed to explore neuroanatomical correlates of semantic processing in Alzheimer's disease (AD). CONCLUSION The experiments demonstrated that the approaches based on the KL divergence and the ML method provide superior accuracy compared to the Mahalanobis distance. The latter technique could still be a method of choice when the distributions differ significantly, since it is faster and less complex. The obtained classification accuracy, with errors smaller than 1%, suggests that useful diagnostic assistance could be achieved given sufficiently informative historical data and sufficient information on the new subject.
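The maximum-likelihood method can be illustrated in 1-D with Gaussian class models; this is a deliberately simplified sketch, since real ROI data is 3-D, and the historical data here is hypothetical.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and variance of a 1-D class distribution."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, v

def log_likelihood(samples, mean, var):
    """Log-likelihood of a new subject's samples under one class model."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (s - mean) ** 2 / (2 * var)
               for s in samples)

def ml_classify(subject, class_models):
    """ML rule: pick the class whose historical model most likely
    produced the new subject's data."""
    return max(class_models,
               key=lambda c: log_likelihood(subject, *class_models[c]))

# Hypothetical historical ROI positions for two classes.
models = {
    "normal":  fit_gaussian([0.9, 1.0, 1.1, 1.0]),
    "disease": fit_gaussian([2.9, 3.0, 3.1, 3.0]),
}
label = ml_classify([1.05, 0.95], models)
```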


medical image computing and computer assisted intervention | 2004

Extraction of Discriminative Functional MRI Activation Patterns and an Application to Alzheimer’s Disease

Despina Kontos; Vasileios Megalooikonomou; Dragoljub Pokrajac; Aleksandar Lazarevic; Zoran Obradovic; Orest B. Boyko; James Ford; Fillia Makedon; Andrew J. Saykin

We propose a novel Dynamic Recursive Partitioning approach for discovering discriminative patterns of functional MRI activation. The goal is to efficiently identify spatial regions that are associated with non-spatial variables through adaptive recursive partitioning of the 3D space into a number of hyper-rectangles utilizing statistical tests. As a case study, we analyze fMRI data sets obtained from a study that explores neuroanatomical correlates of semantic processing in Alzheimer's disease. We seek to discover brain activation areas that discriminate controls from patients. We evaluate the results by presenting classification experiments that utilize information extracted from these regions. The discovered areas revealed large hemispheric and lobar differences, consistent with prior findings. The overall classification accuracy based on activation patterns in these areas exceeded 90%. The proposed approach is general enough to have great potential for elucidating structure-function relationships and can be valuable for human brain mapping.
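A 1-D caricature of the recursive-partitioning idea, splitting a region and keeping sub-regions where a two-sample t statistic exceeds a threshold, might look like this; the real method works on 3-D hyper-rectangles, and the data, threshold, and depth here are made up.

```python
def t_stat(a, b):
    """Two-sample t statistic (pooled, equal-variance form)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    sp = ((len(a) - 1) * va + (len(b) - 1) * vb) / (len(a) + len(b) - 2)
    se = (sp * (1 / len(a) + 1 / len(b))) ** 0.5
    return (ma - mb) / se if se else 0.0

def region_means(subjects, lo, hi):
    """Mean activation of each subject inside the region [lo, hi)."""
    return [sum(s[lo:hi]) / (hi - lo) for s in subjects]

def drp(group_a, group_b, lo, hi, depth=3, t_thresh=2.0):
    """Recursively split a region, keeping sub-regions where the two groups'
    mean activations differ significantly (1-D sketch of the approach)."""
    ta = region_means(group_a, lo, hi)
    tb = region_means(group_b, lo, hi)
    if abs(t_stat(ta, tb)) < t_thresh:
        return []  # no group difference in this region: stop descending
    if depth == 0 or hi - lo <= 1:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    found = (drp(group_a, group_b, lo, mid, depth - 1, t_thresh) +
             drp(group_a, group_b, mid, hi, depth - 1, t_thresh))
    return found or [(lo, hi)]

# Hypothetical 8-voxel activation profiles: the groups differ only at voxels 4-5.
group_a = [[0, 0, 0, 0, 5, 5, 0, 0],
           [0, 0, 0, 0, 6, 5, 0, 0],
           [0, 0, 0, 0, 5, 6, 0, 0]]
group_b = [[0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 0, 0, 0],
           [0, 0, 0, 0, 0, 1, 0, 0]]
regions = drp(group_a, group_b, 0, 8)
```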

Collaboration


Top co-authors of Aleksandar Lazarevic:

Vipin Kumar, University of Minnesota
Jaideep Srivastava, Qatar Computing Research Institute
Levent Ertoz, University of Minnesota
Natasa Reljin, Delaware State University
Pang Ning Tan, Michigan State University
Paul Dokas, University of Minnesota