Shamsul Huda | Researchain

Publication

Featured research published by Shamsul Huda.

Artificial Intelligence in Medicine | 2013

An approach for Ewing test selection to support the clinical assessment of cardiac autonomic neuropathy

Andrew Stranieri; Jemal H. Abawajy; Andrei V. Kelarev; Shamsul Huda; Morshed U. Chowdhury; Herbert F. Jelinek

OBJECTIVE: This article addresses the problem of determining optimal sequences of tests for the clinical assessment of cardiac autonomic neuropathy (CAN). We investigate the accuracy of using only one of the recommended Ewing tests to classify CAN and the additional accuracy obtained by adding the remaining tests of the Ewing battery. This is important because, in practice, not all five Ewing tests can be applied in every situation.
METHODS AND MATERIAL: We used a new and unique database from the diabetes screening research initiative project, which is more than ten times larger than the data set used by Ewing in his original investigation of CAN. We used decision trees and the optimal decision path finder (ODPF) procedure to identify optimal sequences of tests.
RESULTS: We present experimental results on the accuracy of using each one of the recommended Ewing tests to classify CAN and the additional accuracy that can be achieved by adding the remaining tests of the Ewing battery. We found the best sequences of tests for a cost-function equal to the number of tests. The accuracies achieved by the initial segments of the optimal sequences are 80.80, 91.33, 93.97 and 94.14 for 2 categories of CAN; 79.86, 89.29, 91.16 and 91.76 for 3 categories; and 78.90, 86.21, 88.15 and 88.93 for 4 categories. They show significant improvement compared to the sequence considered previously in the literature and to the mathematical expectations of the accuracies of a random sequence of tests. The complete outcomes obtained for all subsets of the Ewing features are required for determining optimal sequences of tests for any cost-function with the ODPF procedure. We have also found the two most significant additional features that can increase the accuracy when some of the Ewing attributes cannot be obtained.
CONCLUSIONS: The outcomes obtained can be used to determine the optimal sequences of tests for each individual cost-function by following the ODPF procedure. The results show that the best single Ewing test for diagnosing CAN is the deep breathing heart rate variation test. Optimal sequences found for the cost-function equal to the number of tests guarantee that the best accuracy is achieved after any number of tests and provide an improvement over the previous ordering of tests or a random sequence.

Future Generation Computer Systems | 2016

Hybrids of support vector machine wrapper and filter based framework for malware detection

Shamsul Huda; Jemal H. Abawajy; Mamoun Alazab; Mali Abdollahian; Rafiqul Islam; John Yearwood

Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current-generation anti-virus (AV) engines employ a signature-template detection approach, so malware can easily evade the existing signatures in the database. This reduces the capability of current AV engines to detect malware. In this paper we propose a hybrid framework for malware detection that combines a Support Vector Machine wrapper with Maximum-Relevance-Minimum-Redundancy filter heuristics, using Application Program Interface (API) call statistics as malware features. The novelty of our hybrid framework is that it injects the filter's ranking score into the wrapper selection process and combines the properties of the wrapper, the filter and API call statistics, so that malware can be detected from the nature of its infectious actions instead of its signature. To the best of our knowledge, this kind of hybrid approach has not yet been explored in the literature in the context of feature selection and malware detection. Knowledge about the intrinsic characteristics of malicious activities is determined from the API call statistics and injected as a filter score into the wrapper's backward elimination process in order to find the most significant APIs. When the most significant APIs are used in the wrapper classification on both obfuscated malware and benign datasets, the results show that the proposed hybrid framework clearly surpasses existing models, including the independent filters and wrappers, while using only a very compact set of significant APIs. The performances of the proposed and existing models have further been compared using binary logistic regression. Various goodness-of-fit comparison criteria, such as Chi Square, Akaike's Information Criterion (AIC) and the Receiver Operating Characteristic (ROC) curve, are deployed to identify the best performing models.
Experimental outcomes based on the above criteria also show that the proposed hybrid framework outperforms other existing models of signature types, including independent wrapper and filter approaches, in identifying malware.
A signature-free malware detection approach has been proposed. A hybrid wrapper-filter based malware feature selection has been proposed. The proposed hybrid approach can take advantage of both filter and wrapper. Models have also been validated by statistical model selection criteria such as Chi Square and Akaike information criterion (AIC).
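For reference, the Akaike information criterion mentioned in this abstract has a standard closed form. In the sketch below, k stands for the number of estimated parameters of a fitted model (here, a logistic regression) and \hat{L} for its maximized likelihood; both symbols are chosen for illustration and are not taken from the paper. When candidate models are compared on the same data, the one with the lower AIC is preferred.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```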

Systems, Man and Cybernetics | 2009

A Constraint-Based Evolutionary Learning Approach to the Expectation Maximization for Optimal Estimation of the Hidden Markov Model for Speech Signal Modeling

Shamsul Huda; John Yearwood; Roberto Togneri

This paper attempts to overcome the tendency of the expectation-maximization (EM) algorithm to locate a local rather than global maximum when applied to estimate the hidden Markov model (HMM) parameters in speech signal modeling. We propose a hybrid algorithm, CEL-EM, for estimation of the HMM in automatic speech recognition (ASR) using a constraint-based evolutionary algorithm (EA) and EM. The novelty of CEL-EM is that it is applicable to the estimation of constraint-based models that use EM and have many constraints and large numbers of parameters, such as the HMM. Two constraint-based versions of CEL-EM with different fusion strategies are proposed for better estimation of the HMM in ASR. The first uses a traditional constraint-handling mechanism of the EA. The second transforms the constrained optimization problem into an unconstrained one using Lagrange multipliers. The fusion strategies for CEL-EM use a staged-fusion approach in which EM is applied periodically, after the EA has run for a specific period of time, to maintain the global sampling capabilities of the EA in the hybrid algorithm. A variable initialization approach (VIA) based on variable segmentation is proposed to provide a better initialization for the EA in CEL-EM. Experimental results on the TIMIT speech corpus show that CEL-EM obtains higher recognition accuracies than the traditional EM algorithm as well as a top-standard EM (VIA-EM, constructed by applying the VIA to EM).

Pattern Recognition Letters | 2009

A stochastic version of Expectation Maximization algorithm for better estimation of Hidden Markov Model

Shamsul Huda; John Yearwood; Roberto Togneri

This paper attempts to overcome the local convergence problem of the Expectation Maximization (EM) based training of the Hidden Markov Model (HMM) in speech recognition. We propose a hybrid algorithm, Simulated Annealing Stochastic version of EM (SASEM), combining Simulated Annealing with EM that reformulates the HMM estimation process using a stochastic step between the EM steps and the SA. The stochastic processes of SASEM inside EM can prevent EM from converging to a local maximum and find improved estimation for HMM using the global convergence properties of SA. Experiments on the TIMIT speech corpus show that SASEM obtains higher recognition accuracies than the EM.
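The idea of placing a stochastic step between deterministic improvement steps and letting simulated annealing decide which candidates to keep can be illustrated with a generic sketch. This is not the SASEM algorithm from the paper: the functions `accept` and `anneal_with_local_steps`, the geometric cooling schedule, and the callables `score`, `local_step` and `perturb` are all assumptions made for illustration, with `local_step` standing in for an EM update and `score` for a log-likelihood.

```python
import math
import random


def accept(current_score, candidate_score, temperature, rng=random):
    """Metropolis rule for maximisation: always keep an improvement,
    keep a worse candidate with probability exp(delta / temperature)."""
    delta = candidate_score - current_score
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)


def anneal_with_local_steps(theta, score, local_step, perturb,
                            t0=1.0, cooling=0.9, iters=50, seed=0):
    """Alternate a random perturbation with a deterministic local step,
    accepting candidates by the Metropolis rule while the temperature decays."""
    rng = random.Random(seed)
    best = theta
    temperature = t0
    for _ in range(iters):
        # Stochastic move first, then a local refinement (stand-in for EM).
        candidate = local_step(perturb(theta, rng))
        if accept(score(theta), score(candidate), temperature, rng):
            theta = candidate
        if score(theta) > score(best):
            best = theta
        temperature *= cooling  # geometric cooling (an assumed schedule)
    return best


# Toy usage on a one-dimensional objective with its maximum at x = 3.
toy_score = lambda x: -(x - 3.0) ** 2
toy_local_step = lambda x: x + 0.5 * (3.0 - x)
toy_perturb = lambda x, rng: x + rng.gauss(0.0, 1.0)
result = anneal_with_local_steps(0.0, toy_score, toy_local_step, toy_perturb)
```

At high temperature the rule accepts many worse candidates, which lets the search leave a local maximum; as the temperature falls it behaves more and more like plain hill climbing.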

Information Sciences | 2017

Defending unknown attacks on cyber-physical systems by semi-supervised approach and available unlabeled data

Shamsul Huda; Suruz Miah; Mohammad Mehedi Hassan; Rafiqul Islam; John Yearwood; Majed A. AlRubaian; Ahmad Almogren

Cyber-physical systems (CPS) are used increasingly in modern industrial systems. These systems currently face a significant threat from malicious software intent on exploiting the fact that the software of such industrial systems is integrated with hardware and network systems. Malicious codes dynamically and continuously change their internal structure and attack patterns using obfuscation techniques, such as polymorphism and metamorphism, in order to bypass and hide from conventional malware detection engines. This requires continuously updating the database of the malware detection engine, which in turn requires periodic manual effort from experts. This could limit the real-time protection of CPS. It also makes preserving the availability and integrity of the services provided by CPS against malicious code challenging, so there is a demand for specialized malware detection techniques for CPS. In this paper, we propose a semi-supervised approach that automatically integrates knowledge about unknown malware from already available and cheap unlabeled data into the detection system. The novelty of the proposed approach is that it does not require expert effort to update the database of the detection engine. Instead, the dynamic changes in malware attack patterns are extracted by unsupervised clustering from already available unlabeled data. The extracted geometric information about the intrinsic attack characteristics of the clusters is then integrated into the classification systems of the detection engine, which updates the detection system automatically. The proposed approach uses global K-means clustering with term frequency (TF), inverse document frequency (IDF), and cosine similarity as a distance measure to extract the cluster information and add it to a support vector machine (SVM) classification system.
The proposed approach has been tested extensively on a real malware data set for both static and dynamic malware features. The experimental results show that the proposed semi-supervised approach achieves higher accuracy than the existing supervised approaches for all classifiers. We note that the static feature-based semi-supervised approach can improve detection accuracy significantly. When the proposed semi-supervised approach is applied with the run-time characteristics of dynamic feature analysis, the combined effect of dynamic analysis and the proposed approach further increases the detection accuracy of all classifiers, up to 100% for the SVM and the random forest classifiers, thus exceeding the existing supervised approaches with similar features.
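The TF-IDF weighting and cosine similarity named in this abstract can be sketched in a few lines of standard-library Python. This is a generic textbook formulation, not the paper's exact weighting or its global K-means step; the function names `tfidf_vectors` and `cosine_similarity` and the sample token lists are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs is a list of token lists; returns one {token: weight} dict per doc.
    Weight = (relative term frequency) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            tok: (count / len(doc)) * math.log(n_docs / doc_freq[tok])
            for tok, count in term_freq.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative samples: each is a sequence of observed call names (made up).
samples = [
    ["CreateFile", "WriteFile", "CreateFile"],
    ["CreateFile", "WriteFile"],
    ["RegSetValue", "Connect"],
]
vecs = tfidf_vectors(samples)
sim_close = cosine_similarity(vecs[0], vecs[1])  # share both tokens
sim_far = cosine_similarity(vecs[0], vecs[2])    # share no tokens
```

A clustering step would typically use `1 - cosine_similarity(u, v)` as the distance between samples, so that samples with similar call profiles fall into the same cluster regardless of their length.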

IEEE Access | 2016

A Hybrid Feature Selection With Ensemble Classification for Imbalanced Healthcare Data: A Case Study for Brain Tumor Diagnosis

Shamsul Huda; John Yearwood; Herbert F. Jelinek; Mohammad Mehedi Hassan; Giancarlo Fortino; Michael E. Buckland

Electronic health records (EHRs) are providing increased access to healthcare data that can be made available for advanced data analysis. Healthcare professionals can use this analysis to make more informed decisions and provide improved quality of care. However, due to the inherent heterogeneous and imbalanced characteristics of medical data from EHRs, the data analysis task faces a major challenge. In this paper, we address the challenges of imbalanced medical data in a brain tumor diagnosis problem. Morphometric analysis of histopathological images is rapidly emerging as a valuable diagnostic tool for neuropathology. Oligodendroglioma is one type of brain tumor that responds well to treatment provided the tumor subtype is recognized accurately. The genetic variant, 1p-/19q-, has recently been found to have high chemosensitivity, and has morphological attributes that may lend it to automated image analysis and histological processing and diagnosis. This paper aims to achieve a fast, affordable, and objective diagnosis of this genetic variant of oligodendroglioma with a novel data mining approach combining feature selection and ensemble-based classification. In this paper, 63 instances of brain tumor with oligodendroglioma are obtained due to prevalence and incidence of the tumor variant. In order to minimize the effect of an imbalanced healthcare data set, a global optimization-based hybrid wrapper-filter feature selection with ensemble classification is applied. The experimental results show that, in overcoming the imbalanced characteristics of medical data, the proposed approach outperforms the standard techniques used in the brain tumor classification problem.

European Journal of Operational Research | 2014

A hybrid wrapper–filter approach to detect the source(s) of out-of-control signals in multivariate manufacturing process

Shamsul Huda; M. Abdollahian; Musa Mammadov; John Yearwood; Shafiq Ahmed; Ibrahim A. Sultan

With modern data-acquisition equipment and on-line computers used during production, it is now common to monitor several correlated quality characteristics simultaneously in multivariate processes. Multivariate control charts (MCC) are important tools for monitoring multivariate processes. One difficulty encountered with multivariate control charts is the identification of the variable or group of variables that cause an out-of-control signal. The standard approaches to detecting the sources of an out-of-control signal are expert knowledge in combination with a wrapper-based supervised classifier, or a pre-filter with a wrapper. However, gathering expert knowledge for source identification is costly and may introduce human error. Individual univariate control charts (UCC) and decomposition of T² statistics are also used simultaneously in many cases to identify the sources, but these either ignore the correlations between the sources or take more time as the number of dimensions increases. The aim of this paper is to develop a source identification approach that does not need any expert knowledge and can detect out-of-control signals with lower computational complexity. We propose a hybrid wrapper–filter based source identification approach that hybridizes a Mutual Information (MI) based Maximum Relevance (MR) filter ranking heuristic with an Artificial Neural Network (ANN) based wrapper. The Artificial Neural Network Input Gain Measurement Approximation (ANNIGMA) has been combined with MR (MR-ANNIGMA) to utilize the knowledge about the intrinsic pattern of the quality characteristics computed by the filter for directing the wrapper search process. To compute the optimal ANNIGMA score, we also propose a Global MR-ANNIGMA using a non-functional relationship between variables, which is independent of the derivative of the objective function and has the potential to overcome the local optimization problem of ANN training.
The novelty of the proposed approaches is that they combine the advantages of both filter and wrapper approaches and do not require any expert knowledge about the sources of the out-of-control signals. The heuristic score based subset generation process also reduces the search space to polynomial growth, which in turn reduces computational time. The proposed approaches were tested by exhaustive experiments using both simulated and real manufacturing data and compared to existing methods including independent filter, wrapper and Multivariate EWMA (MEWMA) methods. The results indicate that the proposed approaches can identify the sources of out-of-control signals more accurately than existing approaches.

Network and System Security | 2010

Hybrid Wrapper-Filter Approaches for Input Feature Selection Using Maximum Relevance and Artificial Neural Network Input Gain Measurement Approximation (ANNIGMA)

Shamsul Huda; John Yearwood; A. Stranieri

Feature selection is an important research problem in machine learning and data mining applications. This paper proposes a hybrid wrapper and filter feature selection algorithm that introduces the filter's feature ranking score in the wrapper stage to speed up the wrapper's search process and thereby find a more compact feature subset. The approach hybridizes a Mutual Information (MI) based Maximum Relevance (MR) filter ranking heuristic with an Artificial Neural Network (ANN) based wrapper approach, where Artificial Neural Network Input Gain Measurement Approximation (ANNIGMA) has been combined with MR (MR-ANNIGMA) to guide the search process in the wrapper. The novelty of our approach is that we use a hybrid of wrapper and filter methods that combines the filter's ranking score with the wrapper heuristic's score to take advantage of both filter and wrapper heuristics. The performance of the proposed MR-ANNIGMA has been verified using benchmark data sets and compared to both independent filter and wrapper based approaches. Experimental results show that MR-ANNIGMA achieves more compact feature sets and higher accuracies than either filter or wrapper approaches alone.
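The combination of a filter score with a wrapper score described in this abstract can be sketched as follows. The mutual information estimate is the standard one for discrete variables. The way the two scores are combined here (each normalized by its maximum, then summed) is an assumption made for illustration and may differ from the MR-ANNIGMA formula in the paper. The wrapper scores are passed in as plain numbers, since training a neural network to obtain ANNIGMA values is outside the scope of this sketch. The names `mutual_information` and `combined_ranking` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c * n) / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def combined_ranking(features, labels, wrapper_scores):
    """Rank feature names by normalized relevance plus normalized wrapper score.
    features: {name: column of discrete values}
    wrapper_scores: {name: score supplied by some wrapper heuristic}"""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    max_rel = max(relevance.values()) or 1.0   # avoid dividing by zero
    max_wrap = max(wrapper_scores.values()) or 1.0
    combined = {
        name: relevance[name] / max_rel + wrapper_scores[name] / max_wrap
        for name in features
    }
    return sorted(combined, key=combined.get, reverse=True)


# Toy data: f1 determines the label, f2 is independent of it.
toy_features = {"f1": [0, 0, 1, 1], "f2": [0, 1, 0, 1]}
toy_labels = [0, 0, 1, 1]
toy_wrapper_scores = {"f1": 0.8, "f2": 0.3}  # made-up values for illustration
ranking = combined_ranking(toy_features, toy_labels, toy_wrapper_scores)
```

In a backward elimination loop, the feature at the end of such a ranking would be the first candidate for removal before the wrapper is retrained on the remaining subset.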

IEEE Transactions on Systems, Man, and Cybernetics | 2014

Hybrid Metaheuristic Approaches to the Expectation Maximization for Estimation of the Hidden Markov Model for Signal Modeling

Shamsul Huda; John Yearwood; Roberto Togneri

Expectation maximization (EM) is the standard training algorithm for the hidden Markov model (HMM). However, EM faces a local convergence problem in HMM estimation. This paper attempts to overcome this problem and proposes hybrid metaheuristic approaches to EM for the HMM. In our earlier research, a hybrid of a constraint-based evolutionary learning approach to EM (CEL-EM) improved HMM estimation. In this paper, we propose a hybrid simulated annealing stochastic version of EM (SASEM) that combines simulated annealing (SA) with EM. The novelty of our approach is that we develop a mathematical reformulation of HMM estimation by introducing a stochastic step between the EM steps, and combine SA with EM to provide better control over the acceptance of stochastic and EM steps for better HMM estimation. We also extend our earlier work [1] and propose a second hybrid, EA-SASEM, which combines an EA with the proposed SASEM. The proposed EA-SASEM uses the best constraint-based EA strategies from CEL-EM and the stochastic reformulation of the HMM. The complementary properties of EA and SA, together with the stochastic reformulation of the HMM in SASEM, give EA-SASEM sufficient potential to find a better estimation of the HMM. To the best of our knowledge, this type of hybridization and mathematical reformulation has not been explored in the context of EM and HMM training. The proposed approaches have been evaluated through comprehensive experiments on the TIMIT speech corpus to justify their effectiveness in signal modeling. Experimental results show that the proposed approaches obtain higher recognition accuracies than both the EM algorithm and CEL-EM.

Journal of Networks | 2014

A Hybrid Wrapper-Filter Approach for Malware Detection

Mamoun Alazab; Shamsul Huda; Jemal H. Abawajy; Rafiqul Islam; John Yearwood; Sitalakshmi Venkatraman; Roderic Broadhurst

This paper presents an efficient and novel approach for malware detection. The proposed approach uses a hybrid wrapper-filter model for malware feature selection, which combines the Maximum Relevance (MR) filter heuristic and the Artificial Neural Net Input Gain Measurement Approximation (ANNIGMA) wrapper heuristic for subset selection by capitalizing on each classifier's strengths. The novelty of the proposed approach is that it injects the intrinsic characteristics of the data obtained by the filter into the wrapper stage and combines them with the wrapper's heuristic score. This in turn can reduce the search space and guide the search for the most significant malware features that assist in detection. Extensive cross-validated experimental investigations on actual malware datasets were conducted to evaluate the performance of the proposed model. The model was compared with several existing models including independent wrapper and filter approaches. The results of the model's performance on both obfuscated malware and benign datasets showed that the proposed hybrid MR-ANNIGMA model outperformed the independent filter and wrapper approaches by achieving the highest accuracy of 97%. Furthermore, this hybrid model improved execution time by using a more compact set of operation code features, and also reduced the rate of false positives.
