Publication


Featured research published by Muhammad Shoaib B. Sehgal.


Bioinformatics | 2005

Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data

Muhammad Shoaib B. Sehgal; Iqbal Gondal; Laurence S. Dooley

MOTIVATION Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate them as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented, which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least square regression and linear programming methods. RESULTS The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) datasets and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation of missing values compared with the other methods for both time series and non-time series data, for the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm. AVAILABILITY The CMVE software is available upon request from the authors.
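The evaluation protocol described in this abstract (mask entries at random with a given probability, impute, then score with NRMS error) can be sketched roughly as follows. This is an illustrative sketch, not the CMVE software: the function names are hypothetical, the row-mean imputer is only a placeholder for whichever method is under test, and NRMS is assumed here to be the Frobenius norm of the estimation error divided by the Frobenius norm of the original matrix, which is one common normalization.

```python
import math
import random

def mask_at_random(matrix, prob, rng):
    """Return a copy in which each entry is replaced by None with probability prob."""
    return [[None if rng.random() < prob else v for v in row] for row in matrix]

def row_mean_impute(masked):
    """Placeholder imputer (assumption): fill each gap with the mean of the
    observed values in the same row, or 0.0 if the row has none."""
    filled = []
    for row in masked:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled

def nrms_error(original, estimated):
    """Assumed NRMS: ||original - estimated||_F / ||original||_F."""
    diff = sum((o - e) ** 2
               for row_o, row_e in zip(original, estimated)
               for o, e in zip(row_o, row_e))
    norm = sum(o ** 2 for row_o in original for o in row_o)
    return math.sqrt(diff / norm)

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for a genes x samples expression matrix.
    toy = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]
    for prob in (0.01, 0.05, 0.1, 0.2):
        estimate = row_mean_impute(mask_at_random(toy, prob, rng))
        print(prob, round(nrms_error(toy, estimate), 4))
```

Swapping `row_mean_impute` for another imputer keeps the rest of the loop unchanged, which is what makes the comparison across methods like-for-like.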


Computational Intelligence in Bioinformatics and Computational Biology | 2004

Statistical neural networks and support vector machine for the classification of genetic mutations in ovarian cancer

Muhammad Shoaib B. Sehgal; Iqbal Gondal; Laurence S. Dooley

Optimal genetic mutation diagnosis requires proper selection of a mutation classifier. This work investigates the performance of different classification, missing value estimation (MVE) and data dimension reduction techniques for the classification of gene expression data for BRCA1, BRCA2 and Sporadic mutations of epithelial ovarian cancer. Bayesian MVE and zero imputation techniques were employed to deal with missing values, and the study showed that the Bayesian technique performed better. A novel approach is introduced that uses a generalized regression neural network (GRNN) as the genetic mutation classifier, which outperformed both the well-established support vector machine and the probabilistic neural network.
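For readers unfamiliar with the GRNN, its prediction is a Gaussian-kernel weighted average of the training targets, so a classifier can be sketched in a few lines by averaging one-hot class labels. The sketch below is illustrative only: the function names, the toy data and the smoothing parameter `sigma` are assumptions, not the configuration used in the paper.

```python
import math

def grnn_class_scores(train_x, train_y, query, sigma=1.0):
    """Kernel-weighted average of one-hot targets: one score per class, summing to 1."""
    classes = sorted(set(train_y))
    scores = {c: 0.0 for c in classes}
    for x, y in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        scores[y] += math.exp(-sq_dist / (2.0 * sigma ** 2))
    total = sum(scores.values())
    if total == 0.0:
        # Every kernel weight underflowed; fall back to a uniform answer.
        return {c: 1.0 / len(classes) for c in classes}
    return {c: s / total for c, s in scores.items()}

def grnn_classify(train_x, train_y, query, sigma=1.0):
    """Predict the class with the largest kernel-weighted score."""
    scores = grnn_class_scores(train_x, train_y, query, sigma)
    return max(sorted(scores), key=lambda c: scores[c])

if __name__ == "__main__":
    # Toy two-feature samples with hypothetical labels.
    xs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
    ys = ["BRCA1", "BRCA1", "Sporadic", "Sporadic"]
    print(grnn_classify(xs, ys, [0.2, 0.3]))
```

The only free parameter is `sigma`: small values make the prediction follow the nearest training samples, large values smooth it toward the overall class frequencies.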


Bioinformatics | 2011

A probabilistic model of nuclear import of proteins

Ahmed M. Mehdi; Muhammad Shoaib B. Sehgal; Bostjan Kobe; Timothy L. Bailey; Mikael Bodén

MOTIVATION Nucleo-cytoplasmic trafficking of proteins is a core regulatory process that sustains the integrity of the nuclear space of eukaryotic cells via an interplay between numerous factors. Despite progress on experimentally characterizing a number of nuclear localization signals, their presence alone remains an unreliable indicator of actual translocation. RESULTS This article introduces a probabilistic model that explicitly recognizes a variety of nuclear localization signals, and integrates relevant amino acid sequence and interaction data for any candidate nuclear protein. In particular, we develop and incorporate scoring functions based on distinct classes of classical nuclear localization signals. Our empirical results show that the model accurately predicts whether a protein is imported into the nucleus, surpassing the classification accuracy of similar predictors when evaluated on the mouse and yeast proteomes (area under the receiver operator characteristic curve of 0.84 and 0.80, respectively). The model also predicts the sequence position of a nuclear localization signal and whether it interacts with importin-α. AVAILABILITY http://pprowler.itee.uq.edu.au/NucImport
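The area under the ROC curve quoted above has a simple rank interpretation: it is the probability that a randomly chosen positive (imported) protein receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal sketch of that computation follows; the function name, scores and labels are made up for illustration and are unrelated to the NucImport data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 1 for positives and 0 for negatives; ties count as half a win.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

if __name__ == "__main__":
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a sort-based rank sum gives the same value for large proteomes.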


Journal of Biomedical Informatics | 2008

Ameliorative missing value imputation for robust biological knowledge inference

Muhammad Shoaib B. Sehgal; Iqbal Gondal; Laurence S. Dooley; Ross L. Coppel

Gene expression data is widely used in various post genomic analyses. The data is often probed using microarrays due to their ability to simultaneously measure the expression of thousands of genes. The expression data, however, contains significant numbers of missing values, which can affect subsequent biological analysis. To minimize the impact of these missing values, several imputation algorithms including Collateral Missing Value Estimation (CMVE), Bayesian Principal Component Analysis (BPCA), Least Square Impute (LSImpute), Local Least Square Impute (LLSImpute), and K-Nearest Neighbour (KNN) have been proposed. These algorithms, however, exploit either only the global or only the local correlation structure of the data, which can lead to higher estimation errors. This paper presents an Ameliorative Missing Value Imputation (AMVI) technique which has the ability to exploit global/local and positive/negative correlations in a given dataset by automatically selecting the optimal number of predictor genes k using a wrapper non-parametric method based on Monte Carlo simulations. The AMVI technique has the CMVE strategy at its core because CMVE, which exploits positive/negative correlations, has demonstrated improved performance compared with both low variance methods like BPCA and LLSImpute and high variance methods such as KNN and ZeroImpute. The performance of AMVI is compared with CMVE, BPCA, LLSImpute, and KNN by randomly removing between 1% and 15% of the values in eight different ovarian, breast cancer and yeast datasets. Together with the standard NRMS error metric, the True Positive (TP) rate of significant gene selection, the biological significance of the selected genes and statistical significance test results are presented to investigate the impact of missing values on subsequent biological analysis. The enhanced performance of AMVI was demonstrated by its lower NRMS error, improved TP rate, biological significance of the selected genes and statistical significance test results, when compared with the aforementioned imputation methods across all the datasets. The results show that AMVI adapted to the latent correlation structure of the data and proved to be an effective and robust approach compared with the trial and error methodology for selecting k. The results confirmed that AMVI can be successfully applied to accurately impute missing values prior to any microarray data analysis.


International Conference on Hybrid Intelligent Systems | 2004

Support vector machine and generalized regression neural network based classification fusion models for cancer diagnosis

Muhammad Shoaib B. Sehgal; Iqbal Gondal; Laurence S. Dooley

This paper presents decision-based fusion models to classify BRCA1, BRCA2 and Sporadic genetic mutations for breast and ovarian cancer. Different ensembles of base classifiers using the stacked generalization technique are proposed, including support vector machines (SVM) with linear, polynomial and radial basis function kernels. A generalized regression neural network (GRNN) is then applied to predict the mutation type based on the outputs of the base classifiers. Experimental results show that the proposed fusion methodology, which selects the best classifiers and removes weak ones, outperforms single classification models.


International Conference on Hybrid Intelligent Systems | 2004

K-ranked covariance based missing values estimation for microarray data classification

Muhammad Shoaib B. Sehgal; Iqbal Gondal; Laurence S. Dooley

Microarray data often contains multiple missing gene expression values that degrade the performance of statistical and machine learning algorithms. This paper presents a K-ranked diagonal covariance-based missing value estimation algorithm (KRCOV) that demonstrates significantly superior performance compared with the more commonly used K-nearest neighbour (KNN) imputation algorithm when applied to estimate missing values of BRCA1, BRCA2 and sporadic genetic mutation samples present in ovarian cancer. Experimental results confirm that KRCOV outperformed both KNN and zero imputation techniques in terms of classification accuracy when used to impute randomly missing values from 1% to 5%. The classifier used for this purpose was the generalized regression neural network. The paper also provides a hypothesis for why KRCOV performs better than KNN, not only for bioinformatics data but also for other data types with strongly correlated values.
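As context for the KNN baseline mentioned in this abstract, a common form of KNN imputation fills a gap in one gene with a distance-weighted average of the same column in the k most similar genes. The sketch below illustrates that baseline only (not KRCOV); the function name, the inverse-distance weighting and the toy matrix are assumptions made for the example.

```python
import math

def knn_impute(matrix, k=2):
    """Fill None entries with an inverse-distance weighted mean over the
    k nearest rows that observe the missing column.

    Distances are computed over the columns observed in both rows.
    Gaps with no usable neighbour are left as None.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_i, other in enumerate(matrix):
                if other_i == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if not nearest:
                continue
            # Small constant avoids division by zero for identical rows.
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
            filled[i][j] = weighted_sum / sum(weights)
    return filled

if __name__ == "__main__":
    genes = [[1.0, 2.0, None], [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
    print(knn_impute(genes, k=1))
```

Because neighbours are chosen by Euclidean distance, this baseline only benefits from genes with similar expression levels, which is the limitation that covariance-based estimators aim to address.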


Australasian Joint Conference on Artificial Intelligence | 2005

Collateral missing value estimation: robust missing value estimation for consequent microarray data processing

Muhammad Shoaib B. Sehgal; Iqbal Gondal; Laurence S. Dooley

Microarrays have the unique ability to probe thousands of genes at a time, which makes them a useful tool for a variety of applications ranging from diagnosis to drug discovery. However, data generated by microarrays often contains multiple missing gene expression values that affect subsequent analysis, as most of the time these missing values are ignored. This paper analyzes how accurate estimation of missing values can lead to better subsequent gene selection and class prediction. By exploiting both local/global and positive/negative correlation values, Collateral Missing Value Estimation (CMVE) demonstrates superior imputation performance compared with Bayesian Principal Component Analysis (BPCA) Impute and the K-Nearest Neighbour (KNN) algorithm when estimating missing values in the BRCA1, BRCA2 and Sporadic genetic mutation samples present in ovarian cancer. CMVE also consistently outperforms the BPCA, KNN and ZeroImpute techniques in terms of classification accuracy. The imputation is followed by gene selection using a fusion of Between Group to Within Group Sum of Squares and Weighted Partial Least Squares, with the Ridge Partial Least Squares algorithm used as the class predictor.
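The Between Group to Within Group Sum of Squares criterion named above scores each gene by how far its class means sit from the overall mean, relative to the spread inside each class. A minimal sketch of that ratio and the resulting gene ranking is given below; it covers only this one criterion (not the fusion with Weighted Partial Least Squares), and the function names and toy data are hypothetical.

```python
def bss_wss_ratio(values, labels):
    """Ratio of between-group to within-group sum of squares for one gene."""
    overall_mean = sum(values) / len(values)
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    bss = 0.0
    wss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        bss += len(members) * (group_mean - overall_mean) ** 2
        wss += sum((v - group_mean) ** 2 for v in members)
    if wss == 0.0:
        # No within-group spread: perfectly separated, or a constant gene.
        return float("inf") if bss > 0.0 else 0.0
    return bss / wss

def rank_genes(expression, labels):
    """Return gene (row) indices ordered from most to least discriminative."""
    ratios = [bss_wss_ratio(row, labels) for row in expression]
    return sorted(range(len(expression)), key=lambda g: ratios[g], reverse=True)

if __name__ == "__main__":
    labels = ["BRCA1", "BRCA1", "Sporadic", "Sporadic"]
    expression = [
        [1.0, 2.0, 1.5, 2.5],   # weakly separated gene
        [1.0, 2.0, 5.0, 6.0],   # strongly separated gene
    ]
    print(rank_genes(expression, labels))
```

Genes at the top of the ranking would then be passed to the downstream class predictor.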


Bioinformatics | 2013

DLocalMotif: A discriminative approach for discovering local motifs in protein sequences

Ahmed M. Mehdi; Muhammad Shoaib B. Sehgal; Bostjan Kobe; Timothy L. Bailey; Mikael Bodén

MOTIVATION Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. RESULTS This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. AVAILABILITY http://bioinf.scmb.uq.edu.au/dlocalmotif/


IEEE International Conference on Fuzzy Systems | 2006

AFEGRN: Adaptive Fuzzy Evolutionary Gene Regulatory Network Re-construction Framework

Muhammad Shoaib B. Sehgal; Iqbal Gondal; Laurence S. Dooley; Ross L. Coppel

Most gene regulatory network (GRN) studies are based on crisp and parametric algorithms, despite the inherently fuzzy nature of gene co-regulation. This paper presents the adaptive fuzzy evolutionary GRN reconstruction (AFEGRN) framework for modeling GRNs. AFEGRN automatically determines model parameters, such as the number of clusters for fuzzy c-means, using the fuzzy-PBM index and the estimation of Gaussian distribution algorithm. The proposed strategy was tested on breast cancer and normal GRNs. The results conformed to biological knowledge and showed that most cancer-related GRN changes were caused by differentially expressed genes. This demonstrates the effectiveness of AFEGRN for modeling GRNs.
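The "fuzzy" aspect of fuzzy c-means is that each gene belongs to every cluster with a graded membership rather than to exactly one. The standard membership update, given fixed cluster centres, can be sketched as below. This shows only that one step: the fuzzy-PBM index and the evolutionary parameter search are not reproduced, and the function name, the fuzzifier `m = 2` and the toy centres are assumptions.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), where d_i is the distance
    to centre i and m > 1 is the fuzzifier. Memberships sum to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        # The point coincides with one or more centres: share membership among them.
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]

if __name__ == "__main__":
    # One-dimensional toy example with two cluster centres.
    print(fcm_memberships([1.0], [[0.0], [3.0]]))
```

A full fuzzy c-means run alternates this update with recomputing each centre as the membership-weighted mean of the points until the memberships stop changing.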


Eurasip Journal on Bioinformatics and Systems Biology | 2009

How to improve postgenomic knowledge discovery using imputation

Muhammad Shoaib B. Sehgal; Iqbal Gondal; Laurence S. Dooley; Ross L. Coppel

While microarrays make it feasible to rapidly investigate many complex biological problems, their multistep fabrication has the proclivity for error at every stage. The standard tactic has been to either ignore or regard erroneous gene readings as missing values, though this assumption can exert a major influence upon postgenomic knowledge discovery methods like gene selection and gene regulatory network (GRN) reconstruction. This has been the catalyst for a raft of new flexible imputation algorithms including local least square impute and the recent heuristic collateral missing value imputation, which exploit the biological transactional behaviour of functionally correlated genes to afford accurate missing value estimation. This paper examines the influence of missing value imputation techniques upon postgenomic knowledge inference methods with results for various algorithms consistently corroborating that instead of ignoring missing values, recycling microarray data by flexible and robust imputation can provide substantial performance benefits for subsequent downstream procedures.

Collaboration

Top Co-Authors

Ahmed M. Mehdi, University of Queensland

Bostjan Kobe, University of Queensland

Mikael Bodén, University of Queensland