Publication


Featured research published by Md. Nurul Haque Mollah.


Neural Networks | 2010

Robust extraction of local structures by the minimum β-divergence method

Md. Nurul Haque Mollah; Nayeema Sultana; Mihoko Minami; Shinto Eguchi

This paper discusses a new, highly robust learning algorithm for exploring local principal component analysis (PCA) structures in which the observed data follow one of several heterogeneous PCA models. The proposed method is formulated by minimizing the β-divergence. It searches for a local PCA structure based on an initial location of the shifting parameter and a value for the tuning parameter β. If the initial choice of the shifting parameter belongs to a data cluster, then the proposed method detects the local PCA structure of that data cluster, ignoring data in other clusters as outliers. We discuss the selection procedures for the tuning parameter β and the initial value of the shifting parameter μ in this article. We demonstrate the performance of the proposed method by simulation. Finally, we compare the proposed method with a method based on a finite mixture model.
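The idea of starting inside one cluster and down-weighting everything else can be illustrated in one dimension. The sketch below is not the paper's local PCA algorithm; it assumes a univariate Gaussian model and the commonly used fixed-point updates for a minimum β-divergence fit, in which each point receives a weight exp(-β/2 · (x - μ)² / σ²). The function name beta_local_estimate, the toy data, and the default β = 0.5 are hypothetical choices for illustration.

```python
import math

def beta_local_estimate(xs, mu0, beta=0.5, var0=1.0, n_iter=100):
    """Fixed-point iteration for a 1-D Gaussian fit with beta-weights.

    Assumed update rules (illustrative, not taken from the paper):
      w_i  = exp(-beta/2 * (x_i - mu)^2 / var)
      mu   = sum(w_i * x_i) / sum(w_i)
      var  = (1 + beta) * sum(w_i * (x_i - mu)^2) / sum(w_i)
    """
    mu, var = mu0, var0
    for _ in range(n_iter):
        # Points far from the current centre get weights close to zero.
        w = [math.exp(-0.5 * beta * (x - mu) ** 2 / var) for x in xs]
        total = sum(w)
        mu = sum(wi * x for wi, x in zip(w, xs)) / total
        var = (1.0 + beta) * sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / total
    return mu, var

# Two well-separated toy clusters (hypothetical data).
cluster_a = [-1.0, -0.5, 0.0, 0.5, 1.0]
cluster_b = [9.0, 9.5, 10.0, 10.5, 11.0]
data = cluster_a + cluster_b

# The starting location decides which cluster is recovered.
mu_a, var_a = beta_local_estimate(data, mu0=0.3)
mu_b, var_b = beta_local_estimate(data, mu0=9.6)
print(round(mu_a, 3), round(mu_b, 3))
```

Started near 0, the iteration settles on the first cluster and effectively ignores the second; started near 10, it does the opposite. This mirrors the role of the initial shifting parameter described in the abstract.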


BioMed Research International | 2017

Metabolomic Biomarker Identification in Presence of Outliers and Missing Values

Nishith Kumar; Md. Aminul Hoque; Shahjaman; S. M. Shahinul Islam; Md. Nurul Haque Mollah

Metabolomics is a sophisticated, high-throughput technology based on the entire set of metabolites, which is known as the connector between genotypes and phenotypes. For any phenotypic change, identification of potential metabolites (biomarkers) is very important because it provides diagnostic as well as prognostic markers and can help to develop new biomolecular therapies. Biomarker identification from metabolomics data is hampered by the use of high-throughput technology, which produces high-dimensional data matrices that contain missing values as well as outliers. Missing value imputation and outlier handling techniques therefore play an important role in identifying biomarkers correctly. Although several missing value imputation techniques are available, outliers deteriorate the accuracy of imputation as well as the accuracy of biomarker identification. Therefore, in this paper we propose a new biomarker identification technique combining groupwise robust singular value decomposition, the t-test, and the fold-change approach, which can identify biomarkers more correctly from metabolomics datasets. We also compare the performance of the proposed technique with that of other traditional techniques for biomarker identification using both simulated and real data analysis in the absence and presence of outliers. Applying the proposed method to a hepatocellular carcinoma (HCC) dataset, we identified four upregulated and two downregulated metabolites as potential metabolomic biomarkers for HCC.
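As a rough illustration of the t-test and fold-change step only, the sketch below screens metabolites with Welch's t statistic and a log2 fold change. It does not include the groupwise robust singular value decomposition that the paper uses for outliers and missing values. The metabolite names, intensities, and both thresholds (t_cut, lfc_cut) are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def log2_fold_change(case, control):
    """log2 ratio of group means; assumes strictly positive intensities."""
    return math.log2(mean(case) / mean(control))

def screen(metabolites, t_cut=2.0, lfc_cut=1.0):
    """Label a metabolite 'up' or 'down' when both criteria are met."""
    selected = {}
    for name, (case, control) in metabolites.items():
        t = welch_t(case, control)
        lfc = log2_fold_change(case, control)
        if abs(t) >= t_cut and abs(lfc) >= lfc_cut:
            selected[name] = "up" if lfc > 0 else "down"
    return selected

# Hypothetical intensities: (case samples, control samples).
metabolites = {
    "M1": ([8.0, 9.0, 10.0, 9.0], [2.0, 2.5, 2.0, 2.5]),
    "M2": ([5.0, 5.2, 4.9, 5.1], [5.1, 5.0, 5.2, 4.9]),
    "M3": ([1.0, 1.2, 0.9, 1.1], [4.0, 4.4, 4.2, 4.0]),
}
result = screen(metabolites)
print(result)
```

Requiring both a large t statistic and a large fold change filters out metabolites whose difference is statistically detectable but small in magnitude, and vice versa.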


PLOS ONE | 2015

A Hybrid One-Way ANOVA Approach for the Robust and Efficient Estimation of Differential Gene Expression with Multiple Patterns

Mohammad Manir Hossain Mollah; Rahman Jamal; Norfilza Mohd Mokhtar; Roslan Harun; Md. Nurul Haque Mollah

Background: Identifying genes that are differentially expressed (DE) between two or more conditions with multiple patterns of expression is one of the primary objectives of gene expression data analysis. Several statistical approaches, including one-way analysis of variance (ANOVA), are used to identify DE genes. However, most of these methods provide misleading results for two or more conditions with multiple patterns of expression in the presence of outlying genes. In this paper, an attempt is made to develop a hybrid one-way ANOVA approach that unifies the robustness and efficiency of estimation using the minimum β-divergence method to overcome some problems that arise in the existing robust methods for both small- and large-sample cases with multiple patterns of expression.

Results: The proposed method relies on a β-weight function, which produces values between 0 and 1. The β-weight function with β = 0.2 is used as a measure of outlier detection. It assigns smaller weights (≥ 0) to outlying expressions and larger weights (≤ 1) to typical expressions. The distribution of the β-weights is used to calculate the cut-off point, which is compared to the observed β-weight of an expression to determine whether that gene expression is an outlier. This weight function plays a key role in unifying the robustness and efficiency of estimation in one-way ANOVA.

Conclusion: Analyses of simulated gene expression profiles revealed that all eight methods (ANOVA, SAM, LIMMA, EBarrays, eLNN, KW, robust BetaEB and proposed) perform almost identically for m = 2 conditions in the absence of outliers. However, the robust BetaEB method and the proposed method exhibited considerably better performance than the other six methods in the presence of outliers. In this case, the BetaEB method exhibited slightly better performance than the proposed method for the small-sample cases, but the proposed method exhibited much better performance than the BetaEB method for both the small- and large-sample cases in the presence of more than 50% outlying genes. The proposed method also exhibited better performance than the other methods for m > 2 conditions with multiple patterns of expression, a setting to which BetaEB has not been extended. Therefore, the proposed approach would be more suitable and reliable on average for the identification of DE genes between two or more conditions with multiple patterns of expression.
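The β-weight idea can be sketched for a single gene as follows. This is an illustration under stated assumptions, not the paper's procedure: the weight is taken to be exp(-β/2 · z²) for a standardized deviation z, the median and scaled MAD stand in for the minimum β-divergence estimates of location and scale, and the cut-off is a fixed hypothetical value rather than one derived from the distribution of the weights.

```python
import math
from statistics import median

def beta_weights(values, center, scale, beta=0.2):
    """Weights in (0, 1]: near 1 for typical values, near 0 for outliers."""
    return [math.exp(-0.5 * beta * ((v - center) / scale) ** 2) for v in values]

# Hypothetical expression values for one gene; the last one is an outlier.
expr = [5.1, 4.9, 5.0, 5.2, 4.8, 12.0]

# Stand-in robust estimates (assumption): median and MAD scaled by 1.4826.
center = median(expr)
scale = 1.4826 * median([abs(v - center) for v in expr])

weights = beta_weights(expr, center, scale, beta=0.2)

# Hypothetical fixed cut-off; the paper derives it from the weight distribution.
cutoff = 0.2
flagged = [v for v, w in zip(expr, weights) if w < cutoff]
print(flagged)
```

Because the weights shrink smoothly with distance, typical expressions keep almost full influence on the estimates while extreme ones contribute almost nothing, which is how a single function can serve both robustness and efficiency.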


Bioinformation | 2017

Serum and Plasma Metabolomic Biomarkers for Lung Cancer

Nishith Kumar; Md. Shahjaman; Md. Nurul Haque Mollah; S. M. Shahinul Islam; Md. Aminul Hoque

In drug discovery and early prediction of lung cancer, metabolomic biomarker detection is very important. The mortality rate can be decreased if cancer is predicted at an early stage. Recent diagnostic techniques for lung cancer are not prognostic. However, if we know the names of the metabolites whose intensity levels change considerably between cancer subjects and control subjects, then it will be easier to diagnose the disease early as well as to discover drugs. Therefore, in this paper we identify the influential metabolites in plasma and serum blood samples for lung cancer, and also identify the biomarkers that will be helpful for early disease prediction as well as for drug discovery. To identify the influential metabolites, we considered a parametric and a nonparametric test, namely Student's t-test and the Kruskal-Wallis test, respectively. We also categorized the up-regulated and down-regulated metabolites using a heatmap plot and identified the biomarkers using a support vector machine (SVM) classifier and pathway analysis. From our analysis, we obtained 27 influential (p-value < 0.05) metabolites from the plasma samples and 13 influential (p-value < 0.05) metabolites from the serum samples. Based on the SVM importance plot, pathway analysis and correlation network analysis, we declared 4 metabolites (taurine, aspartic acid, glutamine and pyruvic acid) as plasma biomarkers and 3 metabolites (aspartic acid, taurine and inosine) as serum biomarkers.
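For readers unfamiliar with the nonparametric test mentioned above, here is a minimal sketch of the Kruskal-Wallis H statistic restricted to two groups (cancer and control). With two groups the reference distribution is chi-square with one degree of freedom, whose upper tail equals erfc(sqrt(H/2)), so no external library is needed. The function names and the intensity values are hypothetical, and the sketch assumes the pooled values are not all identical.

```python
import math

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two groups, with the usual tie correction."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = midranks(pooled)
    ra, rb = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (ra ** 2 / len(a) + rb ** 2 / len(b)) - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    h = max(h / (1.0 - ties / (n ** 3 - n)), 0.0)
    # Chi-square upper tail with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p

# Hypothetical intensities of one metabolite.
cancer = [5.1, 6.3, 7.2, 6.8, 5.9]
control = [3.2, 4.1, 2.8, 3.9, 4.4]
h, p = kruskal_wallis_two_groups(cancer, control)
print(round(h, 3), p < 0.05)
```

A metabolite would count as influential in the abstract's sense when the resulting p-value falls below 0.05.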


Bioinformation | 2018

Biomarker Identification from RNA-Seq Data using a Robust Statistical Approach

Zobaer Akond; Munirul Alam; Md. Nurul Haque Mollah

Biomarker identification by differentially expressed genes (DEGs) using RNA-sequencing technology is an important task to characterize the transcriptomics data. This is possible with the advancement of next-generation sequencing (NGS) technology. There are a number of statistical techniques to identify DEGs from high-dimensional RNA-seq count data with different groups or conditions, such as edgeR, SAMSeq, voom-limma, etc. However, these methods produce high false positives and low accuracy in the presence of outliers. We describe a robust t-statistic method to overcome these drawbacks using both simulated and real RNA-seq datasets. The model performance with 61.2%, 35.2%, 21.6%, 6.9%, 74.5%, 78.4%, 93.1%, 35.2% sensitivity, specificity, MER, FDR, AUC, ACC, PPV, and NPV, respectively, at 20% outliers is reported. We identified 409 DE genes with p-values < 0.05 using the robust t-test in a real HIV viremic vs aviremic state dataset. There are 28 up-regulated genes and 381 down-regulated genes estimated by the log2 fold change (FC) approach at threshold value 1.5. The up-regulated genes form three clusters, and it is found that 11 genes are highly associated with HIV-1/AIDS. Protein-protein interaction (PPI) analysis of up-regulated genes using the STRING database found 21 genes with strong association among themselves. Thus, the identification of potential biomarkers from an RNA-seq dataset using a robust t-statistical model is demonstrated.
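The performance measures listed above follow from a confusion matrix of called versus true DE genes. The sketch below uses the standard definitions; the counts are hypothetical and are not the paper's data. AUC is omitted because it needs ranked scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard rates computed from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "MER": (fp + fn) / total,        # misclassification error rate
        "FDR": fp / (tp + fp),           # false discovery rate
        "ACC": (tp + tn) / total,        # accuracy
        "PPV": tp / (tp + fp),           # positive predictive value
        "NPV": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts for illustration only.
m = classification_metrics(tp=40, fp=10, tn=130, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Two identities hold by construction: ACC = 1 - MER and PPV = 1 - FDR. The reported figures respect both (78.4% + 21.6% and 93.1% + 6.9% each sum to 100%).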


Bioinformation | 2017

Robust Feature Selection Approach for Patient Classification using Gene Expression Data

Shahjaman; Nishith Kumar; Md. Shakil Ahmed; Anjuman Ara Begum; S. M. Shahinul Islam; Md. Nurul Haque Mollah

Patient classification through feature selection (FS) based on gene expression data (GED) has become popular in the research community. The t-test is a well-known statistical FS method in GED analysis. However, it produces more false positives and lower accuracy for small sample sizes or in the presence of outliers. To overcome the shortcomings of the t-test with small sample sizes, SAM has been applied to GED, but it is highly sensitive to outliers. Recently, robust SAM using the minimum β-divergence estimators has overcome these problems of the classical t-test and SAM, and it has been successfully applied to the identification of differentially expressed (DE) genes. However, it has not been applied to classification. Therefore, in this paper, we employ robust SAM as a feature selection approach along with classifiers for patient classification. We demonstrate the performance of robust SAM in comparison with the classical t-test and SAM, along with four popular classifiers (LDA, KNN, SVM and naive Bayes), using both simulated and real gene expression datasets. The results obtained from simulation and real data analysis confirm that the four classifiers perform better with robust SAM than with the classical t-test or SAM. From a real colon cancer dataset, we identified 21 additional DE genes using robust SAM that were not identified by the classical t-test or SAM. To reveal the biological functions and pathways of these 21 genes, we performed KEGG pathway enrichment analysis and found that these genes are involved in some important cancer-related pathways.


BioMed Research International | 2017

Robust Significance Analysis of Microarrays by Minimum β-Divergence Method

Shahjaman; Nishith Kumar; Md. Manir Hossain Mollah; Md. Shakil Ahmed; Anjuman Ara Begum; S. M. Shahinul Islam; Md. Nurul Haque Mollah

Identification of differentially expressed (DE) genes with two or more conditions is an important task for the discovery of a few biomarker genes. Significance Analysis of Microarrays (SAM) is a popular statistical approach for identification of DE genes for both small- and large-sample cases. However, it is sensitive to outlying gene expressions and produces low power in the presence of outliers. Therefore, in this paper, an attempt is made to robustify the SAM approach using the minimum β-divergence estimators instead of the maximum likelihood estimators of the parameters. We demonstrate the performance of the proposed method in comparison with other popular statistical methods such as ANOVA, SAM, LIMMA, KW, EBarrays, GaGa, and BRIDGE using both simulated and real gene expression datasets. We observe that all methods show good and almost equal performance in the absence of outliers for the large-sample cases, while in the small-sample cases only three methods (SAM, LIMMA, and proposed) show almost equal and better performance than the others with two or more conditions. However, in the presence of outliers, on average, only the proposed method performs better than the others for both small- and large-sample cases with each condition.


Data Mining in Bioinformatics | 2010

Robust QTL analysis by minimum β-divergence method

Md. Nurul Haque Mollah; Shinto Eguchi

Robustness has received too little attention in Quantitative Trait Loci (QTL) analysis in experimental crosses. This paper discusses a robust QTL mapping algorithm based on the Composite Interval Mapping (CIM) model, obtained by minimising the β-divergence using an EM-like algorithm. We investigate the robustness of the proposed method in comparison with the Interval Mapping (IM) and CIM algorithms using both synthetic and real datasets. Experimental results show that the proposed method significantly improves on the traditional IM and CIM methods for QTL analysis in the presence of outliers; otherwise, it performs equally well.


Bioinformation | 2018

Toxic Dose prediction of Chemical Compounds to Biomarkers using an ANOVA based Gene Expression Analysis

Mohammad Nazmol Hasan; Zobaer Akond; Md. Jahangir Alam; Anjuman Ara Begum; Moizur Rahman; Md. Nurul Haque Mollah

The aim of toxicogenomic studies is to optimize the toxic dose levels of chemical compounds (CCs) and their regulated biomarker genes. This is also crucial in drug discovery and development. There are popular online computational tools, such as ToxDB and Toxygates, to identify toxicogenomic biomarkers using the t-test. However, they are not suitable for identifying the dose of the corresponding CCs that regulates the biomarker genes. Hence, we describe a one-way ANOVA model together with Tukey's HSD test for the identification of toxicogenomic biomarker genes and their influencing CC dose with improved efficiency. Glutathione metabolism pathway data analysis shows high and middle doses for acetaminophen and nitrofurazone, as well as high dose for methapyrilene, as significant toxic CC doses. The corresponding top seven regulated toxicogenomic biomarker genes found in this analysis are Gstp1, Gsr, Mgst2, Gclm, G6pd, Gsta5 and Gclc.
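The one-way ANOVA step can be sketched as an F statistic comparing a gene's expression across dose groups. This is a textbook computation, not the authors' pipeline; the dose labels and expression values are hypothetical. The Tukey HSD follow-up that locates which dose differs is not shown, since it needs the studentized range distribution.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical expression of one gene at three dose levels.
doses = {
    "low": [1.0, 2.0, 3.0],
    "middle": [2.0, 3.0, 4.0],
    "high": [6.0, 7.0, 8.0],
}
f_stat = one_way_anova_f(list(doses.values()))
print(f_stat)
```

A large F indicates that at least one dose group has a different mean expression; a post hoc test such as Tukey's HSD is then what attributes the effect to a specific dose, which is the part the t-test based tools cannot provide.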


BioMed Research International | 2017

Robustification of Naïve Bayes Classifier and Its Application for Microarray Gene Expression Data Analysis

Md. Shakil Ahmed; Shahjaman; Md. Masud Rana; Md. Nurul Haque Mollah

The naïve Bayes classifier (NBC) is one of the most popular classifiers for class prediction or pattern recognition from microarray gene expression data (MGED). However, with the classical estimates of the location and scale parameters it is highly sensitive to outliers, which is one of the most important drawbacks of the classical NBC for gene expression data analysis. Gene expression datasets are often contaminated by outliers because of the several steps involved in the data generating process, from hybridization of DNA samples to image analysis. Therefore, in this paper, an attempt is made to robustify the Gaussian NBC by the minimum β-divergence method. The role of the minimum β-divergence method in this article is to produce robust estimators for the location and scale parameters based on the training dataset, and to detect and modify outliers in the test dataset. The performance of the proposed method depends on the tuning parameter β. It reduces to the traditional naïve Bayes classifier when β → 0. We investigated the performance of the proposed beta naïve Bayes classifier (β-NBC) in comparison with some popular existing classifiers (NBC, KNN, SVM, and AdaBoost) using both simulated and real gene expression datasets. We observed that the proposed method outperformed the others in the presence of outliers. Otherwise, its performance is almost equal.

Collaboration


Dive into Md. Nurul Haque Mollah's collaborations.

Top Co-Authors

Shinto Eguchi

Graduate University for Advanced Studies
