Publication


Featured research published by Andrei V. Kelarev.


Computer Methods and Programs in Biomedicine | 2013

Multistage approach for clustering and classification of ECG data

Jemal H. Abawajy; Andrei V. Kelarev; Morshed U. Chowdhury

Accurate and fast approaches for automatic ECG data classification are vital for clinical diagnosis of heart disease. To this end, we propose a novel multistage algorithm that combines various procedures for dimensionality reduction, consensus clustering of randomized samples and fast supervised classification algorithms for processing of the highly dimensional large ECG datasets. We carried out extensive experiments to study the effectiveness of the proposed multistage clustering and classification scheme using precision, recall and F-measure metrics. We evaluated the performance of numerous combinations of various methods for dimensionality reduction, consensus functions and classification algorithms incorporated in our multistage scheme. The results of the experiments demonstrate that the highest precision, recall and F-measure are achieved by the combination of the rank correlation coefficient for dimensionality reduction, HBGF consensus function and the SMO classifier with the polynomial kernel.
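The evaluation metrics named in this abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `precision_recall_f1` and the toy labels are assumptions made only for illustration, and the binary, single-positive-class setting is a simplification.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, F-measure) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy labels, not ECG data from the paper.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
toy_metrics = precision_recall_f1(toy_true, toy_pred, positive=1)
```

For a multi-class problem, one common choice is to compute these values per class and average them; the abstract does not say which averaging was used.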


Artificial Intelligence in Medicine | 2013

An approach for Ewing test selection to support the clinical assessment of cardiac autonomic neuropathy

Andrew Stranieri; Jemal H. Abawajy; Andrei V. Kelarev; Shamsul Huda; Morshed U. Chowdhury; Herbert F. Jelinek

OBJECTIVE: This article addresses the problem of determining optimal sequences of tests for the clinical assessment of cardiac autonomic neuropathy (CAN). We investigate the accuracy of using only one of the recommended Ewing tests to classify CAN and the additional accuracy obtained by adding the remaining tests of the Ewing battery. This is important because not all five Ewing tests can be applied in every situation in practice. METHODS AND MATERIAL: We used a new and unique database of the diabetes screening research initiative project, which is more than ten times larger than the data set used by Ewing in his original investigation of CAN. We utilized decision trees and the optimal decision path finder (ODPF) procedure for identifying optimal sequences of tests. RESULTS: We present experimental results on the accuracy of using each one of the recommended Ewing tests to classify CAN and the additional accuracy that can be achieved by adding the remaining tests of the Ewing battery. We found the best sequences of tests for the cost-function equal to the number of tests. The accuracies achieved by the initial segments of the optimal sequences are 80.80, 91.33, 93.97 and 94.14 for 2 categories of CAN; 79.86, 89.29, 91.16 and 91.76 for 3 categories; and 78.90, 86.21, 88.15 and 88.93 for 4 categories. They show significant improvement compared to the sequence considered previously in the literature and to the mathematical expectations of the accuracies of a random sequence of tests. The complete outcomes obtained for all subsets of the Ewing features are required for determining optimal sequences of tests for any cost-function with the use of the ODPF procedure. We have also found the two most significant additional features that can increase the accuracy when some of the Ewing attributes cannot be obtained. CONCLUSIONS: The outcomes obtained can be used to determine the optimal sequences of tests for each individual cost-function by following the ODPF procedure. The results show that the best single Ewing test for diagnosing CAN is the deep breathing heart rate variation test. Optimal sequences found for the cost-function equal to the number of tests guarantee that the best accuracy is achieved after any number of tests and provide an improvement in comparison with the previous ordering of tests or a random sequence.
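The idea of ordering tests so that each prefix of the sequence is as accurate as possible can be sketched with a simple greedy forward selection. This is a stand-in for illustration only and is not the ODPF procedure used in the article. The function `greedy_test_sequence`, the placeholder test names and the toy accuracy scores are all hypothetical.

```python
def greedy_test_sequence(tests, accuracy):
    """Order tests greedily: at each step add the test that gives the
    highest accuracy for the set of tests chosen so far.

    `accuracy` is a caller-supplied function mapping a frozenset of test
    names to an accuracy estimate (for example from cross-validation).
    """
    chosen = []
    remaining = list(tests)
    while remaining:
        best = max(remaining, key=lambda t: accuracy(frozenset(chosen + [t])))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy scores, NOT results from the article.
TOY_GAIN = {"test_a": 0.5, "test_b": 0.3, "test_c": 0.1}


def toy_accuracy(subset):
    # Assumed additive toy model; real accuracies are not additive in general.
    return sum(TOY_GAIN[t] for t in subset)


toy_order = greedy_test_sequence(["test_c", "test_a", "test_b"], toy_accuracy)
```

A greedy ordering is only guaranteed to give the best prefix at every length under special conditions such as the additive toy model here, which is one reason a dedicated procedure over all subsets is used in the article.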


IEEE Transactions on Emerging Topics in Computing | 2014

Large Iterative Multitier Ensemble Classifiers for Security of Big Data

Jemal H. Abawajy; Andrei V. Kelarev; Morshed U. Chowdhury

This paper introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very large, but are quite easy to generate and use. They can be so large that it makes sense to use them only for big data. They are generated automatically as a result of several iterations in applying ensemble meta classifiers. They incorporate diverse ensemble meta classifiers into several tiers simultaneously and combine them into one automatically generated iterative system so that many ensemble meta classifiers function as integral parts of other ensemble meta classifiers at higher tiers. In this paper, we carry out a comprehensive investigation of the performance of LIME classifiers for a problem concerning security of big data. Our experiments compare LIME classifiers with various base classifiers and standard ordinary ensemble meta classifiers. The results obtained demonstrate that LIME classifiers can significantly increase the accuracy of classifications. LIME classifiers performed better than the base classifiers and standard ensemble meta classifiers.
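The multitier idea, where ensembles act as members of other ensembles at a higher tier, can be illustrated with a small sketch. It assumes plain majority voting and trivial threshold rules as base classifiers; the abstract does not list the specific ensemble meta classifiers, so the classes `Threshold` and `MajorityVote` are hypothetical stand-ins.

```python
from collections import Counter


class Threshold:
    """Toy base classifier: predicts 1 if one feature exceeds a cutoff."""

    def __init__(self, index, cutoff):
        self.index = index
        self.cutoff = cutoff

    def predict(self, x):
        return int(x[self.index] > self.cutoff)


class MajorityVote:
    """Ensemble whose members may be base classifiers or other ensembles."""

    def __init__(self, members):
        self.members = members

    def predict(self, x):
        votes = Counter(member.predict(x) for member in self.members)
        return votes.most_common(1)[0][0]


# Lower tier: three ensembles, each voting over three threshold rules.
lower_tier = [
    MajorityVote([Threshold(i, cutoff) for i in range(3)])
    for cutoff in (0.3, 0.5, 0.7)
]
# Top tier: an ensemble whose members are the lower-tier ensembles.
two_tier = MajorityVote(lower_tier)
```

Because every tier exposes the same `predict` interface, further tiers can be stacked by wrapping `two_tier` in another `MajorityVote`, which mirrors the iterative generation described in the abstract.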


Computers in Biology and Medicine | 2013

Predicting cardiac autonomic neuropathy category for diabetic data with missing values

Jemal H. Abawajy; Andrei V. Kelarev; Morshed U. Chowdhury; Andrew Stranieri; Herbert F. Jelinek

Cardiovascular autonomic neuropathy (CAN) is a serious and well known complication of diabetes. Previous articles circumvented the problem of missing values in CAN data by deleting all records and fields with missing values and applying classifiers trained on different sets of features that were complete. Most of them also added alternative features to compensate for the deleted ones. Here we introduce and investigate a new method for classifying CAN data with missing values. In contrast to all previous papers, our new method does not delete attributes with missing values, does not use classifiers, and does not add features. Instead it is based on regression and meta-regression combined with the Ewing formula for identifying the classes of CAN. This is the first article using the Ewing formula and regression to classify CAN. We carried out extensive experiments to determine the best combination of regression and meta-regression techniques for classifying CAN data with missing values. The best outcomes have been obtained by the additive regression meta-learner based on M5Rules and combined with the Ewing formula. It has achieved the best accuracy of 99.78% for two classes of CAN, and 98.98% for three classes of CAN. These outcomes are substantially better than previous results obtained in the literature by deleting all missing attributes and applying traditional classifiers to different sets of features without regression. Another advantage of our method is that it does not require practitioners to perform more tests collecting additional alternative features.


Journal of The Australian Mathematical Society | 2009

Rees matrix constructions for clustering of data

Andrei V. Kelarev; Paul A. Watters; John Yearwood

This paper continues the investigation of semigroup constructions motivated by applications in data mining. We give a complete description of the error-correcting capabilities of a large family of clusterers based on Rees matrix semigroups well known in semigroup theory. This result strengthens and complements previous formulas recently obtained in the literature. Examples show that our theorems do not generalize to other classes of semigroups.
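For readers unfamiliar with the construction, the multiplication in a Rees matrix semigroup can be sketched as follows. Elements are triples (i, g, λ) and the product is (i, g, λ)(j, h, μ) = (i, g · p_{λj} · h, μ), where p_{λj} is an entry of the sandwich matrix. The sketch assumes the additive cyclic group Z_n for concreteness; the toy sandwich matrix is hypothetical, and how the paper derives clusterers from this structure is not shown here.

```python
from itertools import product


def rees_multiply(a, b, sandwich, modulus):
    """Multiply two elements of a Rees matrix semigroup over Z_modulus.

    `sandwich[lam][j]` is the sandwich-matrix entry p_{lam, j}.
    The group operation is addition modulo `modulus`.
    """
    i, g, lam = a
    j, h, mu = b
    return (i, (g + sandwich[lam][j] + h) % modulus, mu)


# Hypothetical toy parameters: group Z_3, two row and two column indices.
MODULUS = 3
SANDWICH = [[0, 1], [2, 0]]
ELEMENTS = [(i, g, lam) for i in range(2) for g in range(MODULUS) for lam in range(2)]


def is_associative(elements, sandwich, modulus):
    """Brute-force check that the multiplication is associative."""
    return all(
        rees_multiply(rees_multiply(a, b, sandwich, modulus), c, sandwich, modulus)
        == rees_multiply(a, rees_multiply(b, c, sandwich, modulus), sandwich, modulus)
        for a, b, c in product(elements, repeat=3)
    )
```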


Optimization | 2012

Derivative-free optimization and neural networks for robust regression

Gleb Beliakov; Andrei V. Kelarev; John Yearwood

Large outliers break down linear and nonlinear regression models. Robust regression methods allow one to filter out the outliers when building a model. By replacing the traditional least squares criterion with the least trimmed squares (LTS) criterion, in which half of the data is treated as potential outliers, one can fit accurate regression models to strongly contaminated data. High-breakdown methods have become very well established in linear regression, but have started being applied to nonlinear regression only recently. In this work, we examine the problem of fitting artificial neural networks (ANNs) to contaminated data using the LTS criterion. We introduce a penalized LTS criterion which prevents unnecessary removal of valid data. Training of ANNs leads to a challenging non-smooth global optimization problem. We compare the efficiency of several derivative-free optimization methods in solving it, and show that our approach identifies the outliers correctly when ANNs are used for nonlinear regression.
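The difference between the least squares and least trimmed squares criteria can be shown in a few lines. This is a minimal sketch of the plain LTS objective only; the penalized variant introduced in the paper is not reproduced. The default trimming size `h = n // 2 + 1` is one common convention and is an assumption here, as are the function name and the toy residuals.

```python
def lts_loss(residuals, h=None):
    """Least trimmed squares: sum of the h smallest squared residuals.

    With h equal to the number of residuals this reduces to ordinary
    least squares. The default h = n // 2 + 1 is an assumed convention.
    """
    squared = sorted(r * r for r in residuals)
    if h is None:
        h = len(squared) // 2 + 1
    return sum(squared[:h])


# Hypothetical residuals with one gross outlier.
toy_residuals = [0.1, -0.2, 0.1, 10.0]
trimmed = lts_loss(toy_residuals)                   # ignores the outlier
ordinary = lts_loss(toy_residuals, h=len(toy_residuals))  # dominated by it
```

Sorting the squared residuals makes the objective non-smooth in the model parameters, which is consistent with the abstract's remark that training leads to a non-smooth optimization problem.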


IEEE Journal of Biomedical and Health Informatics | 2016

Enhancing Predictive Accuracy of Cardiac Autonomic Neuropathy Using Blood Biochemistry Features and Iterative Multitier Ensembles

Jemal H. Abawajy; Andrei V. Kelarev; Morshed U. Chowdhury; Herbert F. Jelinek

Blood biochemistry attributes form an important class of tests, routinely collected several times per year for many patients with diabetes. The objective of this study is to investigate the role of blood biochemistry for improving the predictive accuracy of the diagnosis of cardiac autonomic neuropathy (CAN) progression. Blood biochemistry contributes to CAN, and so it is a causative factor that can provide additional power for the diagnosis of CAN, especially in the absence of a complete set of Ewing tests. We introduce automated iterative multitier ensembles (AIME) and investigate their performance in comparison to base classifiers and standard ensemble classifiers for blood biochemistry attributes. AIME incorporate diverse ensembles into several tiers simultaneously and combine them into one automatically generated integrated system so that one ensemble acts as an integral part of another ensemble. We carried out extensive experimental analysis using large datasets from the diabetes screening research initiative (DiScRi) project. The results of our experiments show that several blood biochemistry attributes can be used to supplement the Ewing battery for the detection of CAN in situations where one or more of the Ewing tests cannot be completed because of the individual difficulties faced by each patient in performing the tests. The results show that AIME provide higher accuracy as a multitier CAN classification paradigm. The best predictive accuracy of 99.57% has been obtained by the AIME combining Decorate on the top tier with bagging on the middle tier, based on random forest. Practitioners can use these findings to increase the accuracy of CAN diagnosis.


Optimization Methods & Software | 2013

Global non-smooth optimization in robust multivariate regression

Gleb Beliakov; Andrei V. Kelarev

Robust regression in statistics leads to challenging optimization problems. Here, we study one such problem, in which the objective is non-smooth, non-convex and expensive to calculate. We study the numerical performance of several derivative-free optimization algorithms with the aim of computing robust multivariate estimators. Our experiments demonstrate that the existing algorithms often fail to deliver optimal solutions. We introduce three new methods that use Powell's derivative-free algorithm. The proposed methods are reliable and can be used when processing very large data sets containing outliers.


Journal of Networks | 2014

Automatic generation of meta classifiers with large levels for distributed computing and networking

Jemal H. Abawajy; Andrei V. Kelarev; Morshed U. Chowdhury

This paper is devoted to a case study of a new construction of classifiers. These classifiers are called automatically generated multi-level meta classifiers (AGMLMC). The construction combines diverse meta classifiers in a new way to create a unified system. This original construction can be generated automatically, producing classifiers with large levels. Different meta classifiers are incorporated as low-level integral parts of another meta classifier at the top level. The construction is intended for distributed computing and networking. The AGMLMC classifiers are unified classifiers with many parts that can operate in parallel. This makes it easy to adopt them in distributed applications. This paper introduces the new construction of classifiers and undertakes an experimental study of their performance. We look at a case study of their effectiveness in the special case of the detection and filtering of phishing emails. This is a potentially important application area for such large and distributed classification systems. Our experiments investigate the effectiveness of combining diverse meta classifiers into one AGMLMC classifier in the case study of detection and filtering of phishing emails. The results show that the new classifiers with large levels achieved better performance than the base classifiers and simple meta classifiers. This demonstrates that the new technique can be applied to increase performance if diverse meta classifiers are included in the system.


Concurrency and Computation: Practice and Experience | 2017

Multilayer hybrid strategy for phishing email zero-day filtering

Morshed U. Chowdhury; Jemal H. Abawajy; Andrei V. Kelarev; Teruhisa Hochin

The cyber security threats from phishing emails have been growing, buoyed by the capacity of their distributors to fine-tune their trickery and defeat previously known filtering techniques. The detection of novel phishing emails that had not appeared previously, also known as zero-day phishing emails, remains a particular challenge. This paper proposes a multilayer hybrid strategy (MHS) for zero-day filtering of phishing emails that appear during a separate time span by using training data collected previously during another time span. This strategy creates a large ensemble of classifiers and then applies a novel method for pruning the ensemble. The majority of known pruning algorithms belong to the following three categories: ranking-based, clustering-based, and optimization-based pruning. This paper introduces and investigates a multilayer hybrid pruning. Its application in MHS combines all three approaches in one scheme: ranking, clustering, and optimization. Furthermore, we carry out a thorough empirical study of the performance of the MHS for the filtering of phishing emails. Our empirical study compares the performance of the MHS with other machine learning classifiers. The results of our empirical study demonstrate that MHS achieved the best outcomes and multilayer hybrid pruning performed better than other pruning techniques.
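Of the three pruning categories mentioned in this abstract, ranking-based pruning is the simplest to sketch: score every ensemble member on held-out data and keep the best few. The example below illustrates only that one category and is not the multilayer hybrid pruning proposed in the paper. The function `rank_prune`, the member names and the toy predictions are hypothetical.

```python
def rank_prune(predictions, labels, keep):
    """Keep the `keep` ensemble members with the highest validation accuracy.

    `predictions` maps a member name to its list of predicted labels on a
    validation set; `labels` holds the true labels for the same examples.
    """
    def accuracy(preds):
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    ranked = sorted(predictions, key=lambda name: accuracy(predictions[name]), reverse=True)
    return ranked[:keep]


# Hypothetical validation predictions for three ensemble members.
toy_labels = [1, 0, 1, 1]
toy_predictions = {
    "member_a": [1, 0, 1, 1],
    "member_b": [1, 1, 1, 1],
    "member_c": [0, 1, 0, 0],
}
kept = rank_prune(toy_predictions, toy_labels, keep=2)
```

Ranking by individual accuracy ignores diversity between members, which is one motivation for combining it with clustering-based and optimization-based pruning.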

Collaboration

Top Co-Authors

John Yearwood, Charles Sturt University

Andrew Stranieri, Federation University Australia

Joe Ryan, University of Newcastle