Ehsan Ullah
Qatar Computing Research Institute
Publication
Featured research published by Ehsan Ullah.
Journal of Translational Medicine | 2018
Ehsan Ullah; Raghvendra Mall; Reda Rawi; Naima M. Moustaid; Adeel A. Butt; Halima Bensmail
Background: Human tissues are invaluable resources for researchers worldwide. Biobanks are repositories of such human tissues and can have strategic importance for genetic research, clinical care, and future discoveries and treatments. One of the aims of Qatar Biobank is to improve the understanding and treatment of common diseases afflicting the Qatari population, such as obesity and diabetes. Methods: In this study we apply a panorama of state-of-the-art statistical methods and machine learning algorithms to investigate associations and risk factors for diabetes and obesity in a sample of 1000 individuals from the Qatari population. Results: Regarding diabetes, we identified pronounced associations and risk factors in the Qatari population including magnesium, chloride, c-peptide of insulin, insulin, and uric acid. Similarly, for obesity, significant associations and risk factors include insulin, c-peptide of insulin, albumin, and uric acid. Moreover, our study has revealed interactions of hypomagnesemia with HDL-C, triglycerides, and free thyroxine. Conclusions: Our study strongly confirms known associations and risk factors associated with diabetes and obesity in the Qatari population, as previously found in other population studies in different parts of the world. Moreover, interactions of hypomagnesemia with other associations and risk factors merit further investigation.
Journal of Translational Medicine | 2018
Ehsan Ullah; Raghvendra Mall; Reda Rawi; Naima Moustaid-Moussa; Adeel A. Butt; Halima Bensmail
Following publication of the original article [1], the authors reported that one of the authors’ names was processed incorrectly. In this Correction the incorrect and correct author name are shown. The original publication of this article has been corrected.
Frontiers in Neuroscience | 2018
Mohamed Ali; Fazle Rakib; Essam M. Abdelalim; Andreas Limbeck; Raghvendra Mall; Ehsan Ullah; Nasrin Mesaeli; Donald McNaughton; Tariq Ahmed; Khalid Al-Saad
Objective: Stroke is the main cause of adult disability in the world, leaving more than half of the patients dependent on daily assistance. Understanding the post-stroke biochemical and molecular changes is critical for patient survival and stroke management. The aim of this work was to investigate photo-thrombotic ischemic stroke in male rats, with particular focus on biochemical and elemental changes in the primary stroke lesion in the somatosensory cortex and surrounding areas, including the corpus callosum. Materials and Methods: Stroke brain samples were examined using FT-IR imaging spectroscopy and LA-ICPMS, and the results were compared with standard immunohistochemistry studies. Results: The FTIR results revealed that in the lesioned gray matter the relative distribution of lipid, lipid acyl and protein contents decreased significantly. Also at this locus, there was a significant increase in aggregated protein, as detected by high levels of Aβ1-42. Areas close to the stroke focus experienced a decrease in the lipid and lipid acyl contents associated with an increase in lipid ester, olefin, and methyl bio-contents, with a novel finding of Aβ1-42 in the PL-GM and L-WM. Elemental analyses revealed major changes in the different brain structures that may underscore functionality. Conclusion: FTIR bio-spectroscopy is a non-destructive, rapid, and refined technique to characterize oxidative stress markers associated with lipid degradation and protein denaturation that are not characterized by routine approaches. This technique may expedite research into stroke and offer new approaches for neurodegenerative disorders. The results suggest that a good therapeutic strategy should include a mechanism that protects against brain swelling (edema) and neurotoxicity by scavenging the lipid peroxidation end products.
Bioinformatics | 2018
Khalid Kunji; Ehsan Ullah; Alejandro Q. Nato; Ellen M. Wijsman; Mohamad Saad
Summary: Genome-wide association studies have become common over the last ten years, with a shift towards targeting rare variants, especially in pedigree data. Despite lower costs, sequencing for rare variants remains expensive. To obtain a relatively large sample at acceptable cost, imputation approaches may be used, such as GIGI for pedigree data. GIGI is an imputation method that handles large pedigrees and is particularly good for rare variant imputation. GIGI requires a subset of individuals in a pedigree to be fully sequenced, while other individuals are sequenced only at relevant markers. The imputation then infers the missing genotypes at untyped markers. Running GIGI on large pedigrees for large numbers of markers can be very time consuming. We present GIGI-Quick as a method to efficiently split GIGI's input, run GIGI in parallel and efficiently merge the output, so that runtime decreases with the number of cores. This allows imputation results, and therefore all subsequent association analyses, to be obtained faster. Availability and implementation: GIGI-Quick is open source and publicly available via: https://cse-git.qcri.org/Imputation/GIGI-Quick. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
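The split, run-in-parallel and merge pattern described in the abstract above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not GIGI-Quick's actual code: `split_markers`, `impute_chunk` and `run_parallel` are hypothetical names, and `impute_chunk` is a stand-in that does no real imputation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_markers(markers, n_chunks):
    """Split a list of markers into contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(markers)))
    size, extra = divmod(len(markers), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional marker each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(markers[start:end])
        start = end
    return chunks


def impute_chunk(chunk):
    """Hypothetical placeholder for one imputation run on a marker subset."""
    return [(marker, "imputed") for marker in chunk]


def run_parallel(markers, n_workers):
    """Split the input, process the chunks concurrently, merge in order."""
    chunks = split_markers(markers, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in the same order as the input chunks,
        # so the merged output keeps the original marker order.
        parts = list(pool.map(impute_chunk, chunks))
    return [row for part in parts for row in part]
```

Because the chunks are contiguous and the results are merged in chunk order, the output lines up with the input markers regardless of which worker finishes first.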
international conference on bioinformatics | 2017
Raghvendra Mall; Ehsan Ullah; Khalid Kunji; Fulvio D'Angelo; Halima Bensmail; Michele Ceccarelli
Motivation: Biological networks unravel the inherent structure of molecular interactions, which can lead to the discovery of driver genes and meaningful pathways, especially in the context of cancer. Often due to gene mutations, gene expression undergoes changes and the corresponding gene regulatory network sustains some amount of localized re-wiring. The ability to identify significant changes in the interaction patterns caused by the progression of the disease can lead to the revelation of novel relevant signatures. Methods: The task of identifying differential sub-networks in paired biological networks (A: control, B: case) can be re-phrased as one of finding dense communities in a single noisy differential topological (DT) graph constructed by taking the absolute difference between the topological graphs of A and B. In this paper, we propose a fast three-stage approach, namely Differential Community Detection (DCD), to identify differential sub-networks as differential communities in a de-noised version of the DT graph. In the first stage, we iteratively re-order the nodes of the DT graph to determine approximate block diagonals present in the DT adjacency matrix using neighbourhood information of the nodes and Jaccard similarity. In the second stage, the ordered DT adjacency matrix is traversed along the diagonal to remove all the edges associated with a node, if that node has no immediate edges within a window. Finally, we apply community detection methods on this de-noised DT graph to discover differential sub-networks as communities. Results: Our proposed DCD approach can effectively locate differential sub-networks in several simulated paired random-geometric networks and various paired scale-free graphs with different power-law exponents. The DCD approach easily outperforms community detection methods applied on the original noisy DT graph and recent statistical techniques in simulation studies.
We applied the DCD method on two real datasets: a) an ovarian cancer dataset, to discover differential DNA co-methylation sub-networks in patients and controls; b) a glioma dataset, to discover the difference between the regulatory networks of IDH-mutant and IDH-wild-type. We demonstrate the potential benefits of DCD for finding network-inferred bio-markers/pathways associated with a trait of interest. Conclusion: The proposed DCD approach overcomes the limitations of previous statistical techniques and the issues associated with identifying differential sub-networks by use of community detection methods on the noisy DT graph. This is reflected in the superior performance of the DCD method with respect to various metrics like Precision, Accuracy, Kappa and Specificity. The code implementing the proposed DCD method is available at https://sites.google.com/site/raghvendramallmlresearcher/codes.
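Two building blocks named in the abstract above, the absolute-difference DT graph and the Jaccard similarity of node neighbourhoods, can be illustrated as follows. This is a minimal sketch, not the authors' DCD code: the function names are hypothetical, the inputs are assumed to be equally sized symmetric weight matrices given as nested lists, and the re-ordering, windowed de-noising and community detection stages are not shown.

```python
def differential_graph(a, b):
    """Element-wise absolute difference of two equally sized weight matrices."""
    n = len(a)
    return [[abs(a[i][j] - b[i][j]) for j in range(n)] for i in range(n)]


def neighbours(adj, i, eps=0.0):
    """Nodes joined to node i by an edge with weight above eps (assumed cutoff)."""
    return {j for j, w in enumerate(adj[i]) if j != i and w > eps}


def jaccard(adj, i, j):
    """Jaccard similarity between the neighbourhoods of nodes i and j."""
    ni, nj = neighbours(adj, i), neighbours(adj, j)
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0


# Toy paired networks: A (control) and B (case).
control = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
case = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
dt = differential_graph(control, case)
```

In this toy example the edge shared by both networks cancels out, and only the re-wired edges survive in `dt`; nodes 0 and 1 then have identical neighbourhoods in the DT graph, which is the kind of signal a Jaccard-based re-ordering could use to place them next to each other.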
Journal of Health and Medical Informatics | 2017
Raghvendra Mall; Reda Rawi; Ehsan Ullah; Khalid Kunji; Abdelkrim Khadir; Ali Tiss; Jehad Abubaker; Michal A Kulinski; Mohammad M Ramzi; Mohammed Dehbi; Halima Bensmail
Background: Obesity and its co-morbidities are characterized by a chronic low-grade inflammatory state, uncontrolled expression of metabolic measurements and dysregulation of various forms of stress response. However, the contribution and correlation of inflammation, metabolism and stress responses to the disease are not fully elucidated. In this paper a cross-sectional case study was conducted on clinical data comprising 117 human male and female subjects with and without Type 2 Diabetes (T2D). Characteristics such as anthropometric, clinical and biochemical measurements were collected. Methods: Associations of these variables with T2D and BMI were assessed using penalized hierarchical linear and logistic regression. In particular, elastic net, hdi and glinternet were used as regularization models to distinguish between cases and controls. Differential network analysis using a closed-form approach was performed to identify pairwise interactions of variables that influence prediction of the phenotype. Results: For the 117 participants, physical variables such as PBF, HDL and TBW had absolute coefficients of 0.75, 0.65 and 0.34 using the glinternet approach. Biochemical variables such as MIP, ROS and RANTES were identified as determinants of obesity, with some interaction between inflammatory markers such as IL-4, IL-6, MIP, CSF, Eotaxin and ROS. Diabetes was associated with a significant increase in Thiobarbituric Acid Reactive Substances (TBARS), which are considered an index of endogenous lipid peroxidation, and an increase in two inflammatory markers, MIP-1 and RANTES. Furthermore, we obtained 13 pairwise effects. The pairwise effects include pairs from and within physical, clinical and biochemical features, in particular metabolic, inflammatory, and oxidative stress markers.
Conclusion: We show that markers of oxidative stress (derived from lipid peroxidation) such as MIP-1 and RANTES participate in the pathogenesis of diseases such as diabetes and obesity in the Arab population.
international conference on bioinformatics | 2016
Ehsan Ullah; Raghvendra Mall; Reda Rawi; Halima Bensmail
Metabolomics encompasses analysis of metabolites using profiling techniques such as mass spectrometry (MS) and nuclear magnetic resonance (NMR). Statistical analysis is performed on the profiled data to determine variations in the levels of metabolites. The goal here is to reveal relationships between the variations in the concentrations of metabolites and specific pathophysiological conditions such as diseases or external factors. Metabolomics has been widely used to characterize metabolites in various body fluids such as saliva, serum and urine in various fields of medical research including cancer [3], cardiology [6], diabetes [5], human infections [12], neurology [7], neonatology [4] and respiratory diseases [2], to name a few. In the statistical analysis of metabolomics data, many methods are used, which can be categorized as univariate and multivariate analysis methods. Univariate methods are very commonly applied due to their ease of use and interpretation. These methods consider metabolomic features (variables) one at a time, independent of each other, thus ignoring correlations with other features. Moreover, as pointed out by Alonso et al. [1], these methods ignore confounding variables such as age, gender and body mass index (BMI), which may lead to incorrect results [13, 15]. On the other hand, multivariate methods consider all the features and their correlations during data analysis. These methods include unsupervised methods such as principal component analysis (PCA), and supervised methods such as partial least squares (PLS) and support vector machine (SVM). Alonso et al. have provided a review of univariate and multivariate methods used in metabolomics. To the best of our knowledge, there are many state-of-the-art statistical methods that have not been used for metabolomic data analysis. A significant advantage of these methods over commonly used methods is their ability to process high-dimensional data.
Along with state-of-the-art statistical methods, we have used differential network analysis to identify variations at system level. In this work we have analyzed urine samples from the Qatar Metabolomics Study on Diabetes (QMDiab) for identification of potential biomarkers. QMDiab was conducted by Hamad Medical Corporation, Qatar (HMC) and Weill Cornell Medical College, Qatar in 2012 with approval from the Institutional Review Boards of HMC and Weill Cornell Medical College-Qatar (Research Protocol number 11131/11). Written informed consent was obtained from all participants. Subjects in the study included males and females of Arab and Asian ethnicities aged 17-81 years. Urine samples were sent to Chenomx Inc., Alberta, Canada for proton nuclear magnetic resonance (1H NMR). Although the original study targeted type 2 diabetes, in this paper we also focus on obesity, using BMI as a representative measure of obesity. In this work we have used regularization models and differential network analysis. We have used the elastic net, glinternet, the lasso projection and high-dimensional inference. The elastic net uses L1 and L2 penalties, resulting in a mix of ridge and lasso regression. The glinternet is a group-lasso based method developed by Lim and Hastie [9]. The method learns pairwise interactions of variables in linear regression models satisfying strong hierarchy. The lasso projection (lasso proj), or de-sparsified lasso, is a regularization-based method that performs statistical inference of low-dimensional parameters with high-dimensional data [17]. The method uses a low-dimensional projection approach to construct confidence intervals for the estimated regression parameters. The high-dimensional inference computes P-values of variables and associated confidence intervals in high-dimensional data [10].
Further, we performed differential network analysis to identify variable interactions which differentiate between diabetic and non-diabetic, or obese and lean, subjects. The network is constructed using mutual information between the variables for different groups of samples. We applied the differential network analysis algorithm dGHD, proposed by Ruan et al. [14], for detecting interaction patterns which differentiate two networks. The algorithm uses the Generalised Hamming Distance (GHD) for calculating topological differences between the networks, along with computation of their statistical significance. Notably, the proposed methods, which have not yet been applied in the field, identify in a small dataset potential biomarkers that previous studies have proposed in the literature. The results for the elastic net, the glinternet and the lasso proj are summarized in Table 1. For the diabetes analysis, identified significant variables include age, betaine, glycolate and glucose, well-known biomarkers for diabetes [8, 11]. For the obesity analysis, identified significant variables include age, dimethylamine, succinate and cis-aconitate, previously identified by [16]. The high-dimensional inference only identified age and betaine for the diabetes study. We conclude that state-of-the-art statistical and network analysis methods can be used for metabolomics data analysis for datasets with a limited number of samples. The number of metabolomic features is increasing with the advancement of technologies. The ability of these methods to handle high-dimensional data makes them suitable in settings where the number of samples is smaller than the number of features. These methods can help in the identification of potential biomarkers in future studies.
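The statement above that the elastic net mixes ridge and lasso regression can be made concrete by writing out its penalty term. The sketch below assumes the widely used parametrization with an overall strength `lam` and a mixing weight `alpha`; the abstract does not say which parametrization or software settings the authors used, and the function name is hypothetical.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty: lam * (alpha * L1 + (1 - alpha) / 2 * squared L2).

    alpha = 1 gives the pure lasso penalty, alpha = 0 the pure ridge penalty.
    The parametrization is an assumption; other conventions scale the terms
    differently.
    """
    l1 = sum(abs(b) for b in beta)       # lasso part: sum of absolute values
    l2_sq = sum(b * b for b in beta)     # ridge part: sum of squares
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)
```

For coefficients `[1.0, -2.0]` with `lam = 1`, the penalty is 3.0 at `alpha = 1` (lasso only) and 2.5 at `alpha = 0` (ridge only); intermediate values of `alpha` blend the two.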
ieee pacific visualization symposium | 2016
Michaël Aupetit; Ehsan Ullah; Reda Rawi; Halima Bensmail
Genome-Wide Association Studies (GWAS) examine genetic variants in different individuals to detect variants associated with specific diseases. The 1000 Genomes Project is one such collaborative research effort to sequence the genomes of at least 1000 participants of 26 different ethnicities, to establish a detailed summary of human genetic variation. The kinship information is a measure of individuals' ancestral relationships within the considered populations. We study the design of kinship data visualizations that allow experts to discover anomalies in GWAS data. The visual analysis of the 1000 Genomes Project kinship data reveals inconsistencies which call for a deeper analysis of the data quality within this project.
BMC Bioinformatics | 2016
Reda Rawi; Raghvendra Mall; Khalid Kunji; Mohammed El Anbari; Michaël Aupetit; Ehsan Ullah; Halima Bensmail
Background: The post-genomic era with its wealth of sequences gave rise to a broad range of protein residue-residue contact detecting methods. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, they do not completely overlap. Hence, new approaches and improvements of existing methods are needed to motivate further development and progress in the field. We present a new contact detecting method, COUSCOus, by combining the best shrinkage approach, the empirical Bayes covariance estimator, and GLasso. Results: Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long, medium and short range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short contacts and outperforms PSICOV. We also observed that COUSCOus outperforms PSICOV w.r.t. the Matthews correlation coefficient criterion on the full list of residue contacts. Furthermore, COUSCOus achieves, on average, a 10% gain in prediction accuracy over PSICOV on an independent test set composed of CASP11 protein targets. Finally, we showed that when using a simple random forest meta-classifier, by combining contact detecting techniques and sequence derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions. Conclusion: We conclude that the consideration of superior covariance shrinkage approaches will boost several research fields that apply the GLasso procedure, including the residue-residue contact prediction presented here as well as fields such as gene network reconstruction.
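As a rough illustration of what covariance shrinkage means in the abstract above, the sketch below pulls a sample covariance matrix towards its own diagonal. This is generic linear shrinkage with a fixed intensity, shown only to convey the idea; it is not the empirical Bayes estimator used in COUSCOus, and the function name, the diagonal target and the `lam` parameter are all assumptions.

```python
def shrink_covariance(s, lam):
    """Linear shrinkage (1 - lam) * S + lam * T, with T = diagonal of S.

    Off-diagonal entries are scaled by (1 - lam); diagonal entries are
    unchanged because the target T shares the diagonal of S.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    n = len(s)
    return [
        [s[i][j] if i == j else (1.0 - lam) * s[i][j] for j in range(n)]
        for i in range(n)
    ]
```

Shrinking the off-diagonal entries makes a noisy or singular sample covariance better conditioned, which is what a procedure such as GLasso generally benefits from when the number of samples is small relative to the number of variables.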
Qatar Foundation Annual Research Conference Proceedings | 2018
Ehsan Ullah; Raghvendra Mall; Halima Bensmail