Wail Ba-alawi
King Abdullah University of Science and Technology
Publication
Featured research published by Wail Ba-alawi.
BMC Genomics | 2014
Yulia A. Medvedeva; Abdullah M. Khamis; Ivan V. Kulakovskiy; Wail Ba-alawi; Shariful Islam Bhuyan; Hideya Kawaji; Timo Lassmann; Matthias Harbers; Alistair R. R. Forrest; Vladimir B. Bajic
Background: DNA methylation in promoters is closely linked to downstream gene repression. However, whether DNA methylation is a cause or a consequence of gene repression remains an open question. If it is a cause, then DNA methylation may affect the affinity of transcription factors (TFs) for their binding sites (TFBSs). If it is a consequence, then gene repression caused by chromatin modification may be stabilized by DNA methylation. Until now, these two possibilities have been supported only by non-systematic evidence and they have not been tested on a wide range of TFs. An average promoter methylation is usually used in studies, whereas recent results suggested that methylation of individual cytosines can also be important.
Results: We found that the methylation profiles of 16.6% of cytosines and the expression profiles of neighboring transcriptional start sites (TSSs) were significantly negatively correlated. We called the CpGs corresponding to such cytosines “traffic lights”. We observed a strong selection against CpG “traffic lights” within TFBSs. The negative selection was stronger for transcriptional repressors as compared with transcriptional activators or multifunctional TFs as well as for core TFBS positions as compared with flanking TFBS positions.
Conclusions: Our results indicate that direct and selective methylation of certain TFBS that prevents TF binding is restricted to special cases and cannot be considered as a general regulatory mechanism of transcription.
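The negative correlation between a cytosine's methylation profile and the expression profile of a nearby TSS can be illustrated with a rank correlation. The sketch below assumes Spearman's coefficient, which the abstract does not specify, and uses invented values for one hypothetical cytosine across five samples; it is not the authors' pipeline.

```python
import math

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented example: methylation level of one cytosine and expression of a
# neighboring TSS in the same five samples.
methylation = [0.9, 0.7, 0.5, 0.2, 0.1]
expression = [1.0, 3.0, 4.0, 8.0, 20.0]
rho = spearman(methylation, expression)
```

A strongly negative `rho` for a cytosine would mark it as a candidate "traffic light" in the sense used above; a real analysis would also need a significance test and multiple-testing correction.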
Nucleic Acids Research | 2016
Ivan V. Kulakovskiy; Ilya E. Vorontsov; Ivan S. Yevshin; Anastasiia V. Soboleva; Artem S. Kasianov; Haitham Ashoor; Wail Ba-alawi; Vladimir B. Bajic; Yulia A. Medvedeva; Fedor A. Kolpakov; Vsevolod J. Makeev
Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns, recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest up to date collection of dinucleotide PWM models for 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the validation of the resulting models on in vivo data. To facilitate a practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
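As a rough illustration of how a position weight matrix and a score threshold are used for motif finding, the sketch below converts a count matrix to log-odds weights and scans a sequence. The count matrix, pseudocount, uniform background, and threshold are all invented for the example; none of them comes from HOCOMOCO, and this is not the HOCOMOCO command-line tool.

```python
import math

ALPHABET = "ACGT"

# Hypothetical count matrix (rows: motif positions; columns: A, C, G, T).
COUNTS = [
    [8, 1, 0, 1],
    [0, 9, 1, 0],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]

def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds weights against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append([math.log2(((c + pseudocount) / total) / background) for c in row])
    return pwm

def score_window(pwm, window):
    """Sum the weights of the observed bases, one per motif position."""
    return sum(pwm[i][ALPHABET.index(base)] for i, base in enumerate(window))

def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(pwm, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits

pwm = counts_to_pwm(COUNTS)
hits = scan(pwm, "TTACGTAA", threshold=4.0)
```

Here the only window that passes the threshold is the `ACGT` occurrence starting at index 2. In practice the threshold would be one of the precomputed values distributed with the models rather than an arbitrary number.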
The ISME Journal | 2015
David Kamanda Ngugi; Jochen Blom; Intikhab Alam; Mamoon Rashid; Wail Ba-alawi; Guishan Zhang; Tyas I. Hikmawan; Yue Guan; André Antunes; Rania Siam; Hamza El Dorry; Vladimir B. Bajic; Ulrich Stingl
The bottom of the Red Sea harbors over 25 deep hypersaline anoxic basins that are geochemically distinct and characterized by vertical gradients of extreme physicochemical conditions. Because of strong changes in density, particulate and microbial debris get entrapped in the brine-seawater interface (BSI), resulting in increased dissolved organic carbon, reduced dissolved oxygen toward the brines and enhanced microbial activities in the BSI. These features coupled with the deep-sea prevalence of ammonia-oxidizing archaea (AOA) in the global ocean make the BSI a suitable environment for studying the osmotic adaptations and ecology of these important players in the marine nitrogen cycle. Using phylogenomic-based approaches, we show that the local archaeal community of five different BSI habitats (with up to 18.2% salinity) is composed mostly of a single, highly abundant Nitrosopumilus-like phylotype that is phylogenetically distinct from the bathypelagic thaumarchaea; ammonia-oxidizing bacteria were absent. The composite genome of this novel Nitrosopumilus-like subpopulation (RSA3) co-assembled from multiple single-cell amplified genomes (SAGs) from one such BSI habitat further revealed that it shares ∼54% of its predicted genomic inventory with sequenced Nitrosopumilus species. RSA3 also carries several, albeit variable gene sets that further illuminate the phylogenetic diversity and metabolic plasticity of this genus. Specifically, it encodes for a putative proline-glutamate ‘switch’ with a potential role in osmotolerance and indirect impact on carbon and energy flows. Metagenomic fragment recruitment analyses against the composite RSA3 genome, Nitrosopumilus maritimus, and SAGs of mesopelagic thaumarchaea also reiterate the divergence of the BSI genotypes from other AOA.
Scientific Reports | 2016
Romano Mwirichia; Intikhab Alam; Mamoon Rashid; Manikandan Vinu; Wail Ba-alawi; Allan Anthony Kamau; David Kamanda Ngugi; Markus Göker; Hans-Peter Klenk; Vladimir B. Bajic; Ulrich Stingl
The candidate Division MSBL1 (Mediterranean Sea Brine Lakes 1) comprises a monophyletic group of uncultured archaea found in different hypersaline environments. Previous studies propose methanogenesis as the main metabolism. Here, we describe a metabolic reconstruction of MSBL1 based on 32 single-cell amplified genomes from Brine Pools of the Red Sea (Atlantis II, Discovery, Nereus, Erba and Kebrit). Phylogeny based on rRNA genes as well as conserved single copy genes delineates the group as a putative novel lineage of archaea. Our analysis shows that MSBL1 may ferment glucose via the Embden–Meyerhof–Parnas pathway. However, in the absence of organic carbon, carbon dioxide may be fixed via the ribulose bisphosphate carboxylase, Wood-Ljungdahl pathway or reductive TCA cycle. Therefore, based on the occurrence of genes for glycolysis, absence of the core genes found in genomes of all sequenced methanogens and the phylogenetic position, we hypothesize that the MSBL1 are not methanogens, but probably sugar-fermenting organisms capable of autotrophic growth. Such a mixotrophic lifestyle would confer survival advantage (or possibly provide a unique narrow niche) when glucose and other fermentable sugars are not available.
PLOS ONE | 2015
Othman Soufan; Wail Ba-alawi; Moataz Afeef; Magbubah Essack; Valentin O. Rodionov; Panos Kalnis; Vladimir B. Bajic
High-throughput screening (HTS) experiments provide a valuable resource that reports biological activity of numerous chemical compounds relative to their molecular targets. Building computational models that accurately predict such activity status (active vs. inactive) in specific assays is a challenging task given the large volume of data and frequently small proportion of active compounds relative to the inactive ones. We developed a method, DRAMOTE, to predict activity status of chemical compounds in HTP activity assays. For a class of HTP assays, our method achieves considerably better results than the current state-of-the-art solutions. We achieved this by modification of a minority oversampling technique. To demonstrate that DRAMOTE is performing better than the other methods, we performed a comprehensive comparison analysis with several other methods and evaluated them on data from 11 PubChem assays through 1,350 experiments that involved approximately 500,000 interactions between chemicals and their target proteins. As an example of potential use, we applied DRAMOTE to develop robust models for predicting FDA approved drugs that have high probability to interact with the thyroid stimulating hormone receptor (TSHR) in humans. Our findings are further partially and indirectly supported by 3D docking results and literature information. The results based on approximately 500,000 interactions suggest that DRAMOTE has performed the best and that it can be used for developing robust virtual screening models. The datasets and implementation of all solutions are available as a MATLAB toolbox online at www.cbrc.kaust.edu.sa/dramote and can be found on Figshare.
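For readers unfamiliar with minority oversampling, the sketch below shows the generic SMOTE-style idea of creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors. It is a plain baseline written for illustration, not DRAMOTE's modified procedure, and the function name, parameters, and data points are all hypothetical.

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length tuples of floats (at least two points).
    k: number of nearest minority neighbors considered for each base point.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbors of the base point by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Invented two-feature descriptors for four "active" compounds.
actives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(actives, n_new=5)
```

Each synthetic point lies on a segment between two real minority samples, so the classifier sees more active-class examples without exact duplication.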
FEMS Microbiology Ecology | 2014
Francy Jimenez-Infante; David Kamanda Ngugi; Intikhab Alam; Mamoon Rashid; Wail Ba-alawi; Allan Anthony Kamau; Vladimir B. Bajic; Ulrich Stingl
Using dilution-to-extinction cultivation, we isolated a strain affiliated with the PS1 clade from surface waters of the Red Sea. Strain RS24 represents the second isolate of this group of marine Alphaproteobacteria after IMCC14465 that was isolated from the East (Japan) Sea. The PS1 clade is a sister group to the OCS116 clade, together forming a putatively novel order closely related to Rhizobiales. While most genomic features and most of the genetic content are conserved between RS24 and IMCC14465, their average nucleotide identity (ANI) is < 81%, suggesting two distinct species of the PS1 clade. Next to encoding two different variants of proteorhodopsin genes, they also harbor several unique genomic islands that contain genes related to degradation of aromatic compounds in IMCC14465 and in polymer degradation in RS24, possibly reflecting the physicochemical differences in the environment they were isolated from. No clear differences in abundance of the genomic content of either strain could be found in fragment recruitment analyses using different metagenomic datasets, in which both genomes were detectable albeit as minor part of the communities. The comparative genomic analysis of both isolates of the PS1 clade and the fragment recruitment analysis provide first insights into the ecology of this group.
Journal of Cheminformatics | 2016
Othman Soufan; Wail Ba-alawi; Moataz Afeef; Magbubah Essack; Panos Kalnis; Vladimir B. Bajic
Background: Mining high-throughput screening (HTS) assays is key for enhancing decisions in the area of drug repositioning and drug discovery. However, many challenges are encountered in the process of developing suitable and accurate methods for extracting useful information from these assays. Virtual screening and the wide variety of databases, methods and solutions proposed to date have not completely overcome these challenges. This study is based on a multi-label classification (MLC) technique for modeling correlations between several HTS assays, meaning that a single prediction represents a subset of assigned correlated labels instead of one label. Thus, the devised method provides an increased probability for more accurate predictions of compounds that were not tested in particular assays.
Results: Here we present DRABAL, a novel MLC solution that incorporates structure learning of a Bayesian network as a step to model dependency between the HTS assays. In this study, DRABAL was used to process more than 1.4 million interactions of over 400,000 compounds and analyze the existing relationships between five large HTS assays from the PubChem BioAssay Database. Compared to different MLC methods, DRABAL significantly improves the F1Score by about 22%, on average. We further illustrated usefulness and utility of DRABAL through screening FDA approved drugs and reported ones that have a high probability to interact with several targets, thus enabling drug-multi-target repositioning. Specifically DRABAL suggests the Thiabendazole drug as a common activator of the NPC1 and Rab-9A proteins, both of which are designed to identify treatment modalities for the Niemann–Pick type C disease.
Conclusion: We developed a novel MLC solution based on a Bayesian active learning framework to overcome the challenge of lacking fully labeled training data and exploit actual dependencies between the HTS assays. The solution is motivated by the need to model dependencies between existing experimental confirmatory HTS assays and improve prediction performance. We have pursued extensive experiments over several HTS assays and have shown the advantages of DRABAL. The datasets and programs can be downloaded from https://figshare.com/articles/DRABAL/3309562.
Nucleic Acids Research | 2018
Petr Smirnov; Victor Kofia; Alexander Maru; Mark Freeman; Chantal Ho; Nehme El-Hachem; George-Alexandru Adam; Wail Ba-alawi; Zhaleh Safikhani; Benjamin Haibe-Kains
Recent cancer pharmacogenomic studies profiled large panels of cell lines against hundreds of approved drugs and experimental chemical compounds. The overarching goal of these screens is to measure sensitivity of cell lines to chemical perturbations, correlate these measures to genomic features, and thereby develop novel predictors of drug response. However, leveraging these valuable data is challenging due to the lack of standards for annotating cell lines and chemical compounds, and quantifying drug response. Moreover, it has been recently shown that the complexity and complementarity of the experimental protocols used in the field result in high levels of technical and biological variation in the in vitro pharmacological profiles. There is therefore a need for new tools to facilitate rigorous comparison and integrative analysis of large-scale drug screening datasets. To address this issue, we have developed PharmacoDB (pharmacodb.pmgenomics.ca), a database integrating the largest cancer pharmacogenomic studies published to date. Here, we describe how the curation of cell line and chemical compound identifiers maximizes the overlap between datasets and how users can leverage such data to compare and extract robust drug phenotypes. PharmacoDB provides a unique resource to mine a compendium of curated cancer pharmacogenomic datasets that are otherwise disparate and difficult to integrate.
Scientific Reports | 2018
Othman Soufan; Wail Ba-alawi; Arturo Magana-Mora; Magbubah Essack; Vladimir B. Bajic
High-throughput screening (HTS) performs the experimental testing of a large number of chemical compounds aiming to identify those active in the considered assay. Alternatively, faster and cheaper methods of large-scale virtual screening are performed computationally through quantitative structure-activity relationship (QSAR) models. However, the vast amount of available HTS heterogeneous data and the imbalanced ratio of active to inactive compounds in an assay make this a challenging problem. Although different QSAR models have been proposed, they have certain limitations, e.g., high false positive rates, complicated user interface, and limited utilization options. Therefore, we developed DPubChem, a novel web tool for deriving QSAR models that implement the state-of-the-art machine-learning techniques to enhance the precision of the models and enable efficient analyses of experiments from PubChem BioAssay database. DPubChem also has a simple interface that provides various options to users. DPubChem predicted active compounds for 300 datasets with an average geometric mean and F1 score of 76.68% and 76.53%, respectively. Furthermore, DPubChem builds interaction networks that highlight novel predicted links between chemical compounds and biological assays. Using such a network, DPubChem successfully suggested a novel drug for the Niemann-Pick type C disease. DPubChem is freely available at www.cbrc.kaust.edu.sa/dpubchem.
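The two metrics reported above can be computed from a confusion matrix. The sketch below assumes the usual definitions, with the geometric mean taken as the square root of sensitivity times specificity and F1 as the harmonic mean of precision and recall; the abstract does not spell these out, and the labels are invented.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def g_mean_and_f1(y_true, y_pred):
    """Return (geometric mean, F1); undefined ratios are treated as 0.0."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on actives
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on inactives
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return g_mean, f1

# Invented labels: 3 active and 5 inactive compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
g_mean, f1 = g_mean_and_f1(y_true, y_pred)
```

Both metrics are less flattering than plain accuracy on imbalanced assays, because a model that labels every compound inactive scores zero on each.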
Journal of Cheminformatics | 2016
Wail Ba-alawi; Othman Soufan; Magbubah Essack; Panos Kalnis; Vladimir B. Bajic