Publication


Featured research published by Khalid Kunji.


PLOS ONE | 2015

Coevolution Analysis of HIV-1 Envelope Glycoprotein Complex

Reda Rawi; Khalid Kunji; Abdelali Haoudi; Halima Bensmail

The HIV-1 Env spike is the main protein complex that facilitates HIV-1 entry into CD4+ host cells. HIV-1 entry is a multistep process that is not yet completely understood. This process involves several protein-protein interactions between HIV-1 Env and a variety of host cell receptors along with many conformational changes within the spike. Owing to its high mutation rate and plasticity, HIV-1 Env has developed strategies to escape immense immune pressure and entry inhibitors. We applied a coevolution and residue-residue contact detecting method to identify coevolution patterns within HIV-1 Env protein sequences representing all group M subtypes. We identified 424 coevolving residue pairs within HIV-1 Env. The majority of predicted pairs are residue-residue contacts and are proximal in 3D structure. Furthermore, many of the detected pairs have functional implications due to contributions in either CD4 or coreceptor binding, or variable loop, gp120-gp41, and interdomain interactions. This study provides a new dimension of information in HIV research. The identified residue couplings may not only be important in assisting gp120 and gp41 coordinate structure prediction, but also in designing new and effective entry inhibitors that incorporate mutation patterns of HIV-1 Env.


Nucleic Acids Research | 2018

RGBM: regularized gradient boosting machines for identification of the transcriptional regulators of discrete glioma subtypes

Raghvendra Mall; Luigi Cerulo; Luciano Garofano; Veronique Frattini; Khalid Kunji; Halima Bensmail; Thais S. Sabedot; Houtan Noushmehr; Anna Lasorella; Antonio Iavarone; Michele Ceccarelli

We propose a generic framework for gene regulatory network (GRN) inference approached as a feature selection problem. GRNs obtained using machine learning (ML) techniques are often dense, whereas real GRNs are rather sparse. We use a Tikhonov regularization inspired optimal L-curve criterion that utilizes the edge weight distribution for a given target gene to determine the optimal set of transcription factors (TFs) associated with it. Our proposed framework allows the incorporation of a mechanistic active binding network based on cis-regulatory motif analysis. We evaluate our regularization framework in conjunction with two non-linear ML techniques, namely gradient boosting machines (GBM) and random forests (GENIE), resulting in regularized feature selection based methods called RGBM and RGENIE, respectively. RGBM has been used to identify the main transcription factors that are causally involved as master regulators of the gene expression signature activated in the FGFR3-TACC3-positive glioblastoma. Here, we illustrate that RGBM identifies the main regulators of the molecular subtypes of brain tumors. Our analysis reveals the identity and corresponding biological activities of the master regulators characterizing the difference between G-CIMP-high and G-CIMP-low subtypes and between PA-like and LGm6-GBM, thus providing a clue to the yet undetermined nature of the transcriptional events among these subtypes.


Bioinformatics | 2018

DeepSol: a deep learning framework for sequence-based protein solubility prediction

Sameer Khurana; Reda Rawi; Khalid Kunji; Gwo-Yu Chuang; Halima Bensmail; Raghvendra Mall

Motivation: Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence‐based protein solubility predictors. In this work, we propose DeepSol, a novel Deep Learning‐based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k‐mer structure and additional sequence and structural features extracted from the protein sequence. Results: DeepSol outperformed all known sequence‐based state‐of‐the‐art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins. Availability and implementation: DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018). Supplementary information: Supplementary data are available at Bioinformatics online.


Bioinformatics | 2018

PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine

Reda Rawi; Raghvendra Mall; Khalid Kunji; Chen-Hsiang Shen; Peter D. Kwong; Gwo-Yu Chuang

Motivation: Protein solubility can be a decisive factor in both research and production efficiency, and in silico sequence-based predictors that can accurately estimate solubility outcomes are highly sought. Results: In this study, we present a novel approach termed PRotein SolubIlity Predictor (PaRSnIP), which uses a gradient boosting machine algorithm as well as an approximation of sequence and structural features of the protein of interest. Based on an independent test set, PaRSnIP outperformed other state-of-the-art sequence-based methods by more than 9% in accuracy and 0.17 in Matthews correlation coefficient, with an overall accuracy of 74% and Matthews correlation coefficient of 0.48. Additionally, PaRSnIP provides importance scores for all features used in training. We observed that higher fractions of exposed residues associate positively with protein solubility and that tripeptide stretches with multiple histidines associate negatively with solubility. The improved prediction accuracy of PaRSnIP should enable it to predict protein solubility with greater reliability and to screen for sequence variants with enhanced manufacturability. Availability and implementation: PaRSnIP software is available for download on GitHub (https://github.com/RedaRawi/PaRSnIP). Supplementary information: Supplementary data are available at Bioinformatics online.
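Both solubility abstracts above report accuracy together with the Matthews correlation coefficient. As a reminder of how the two metrics relate to a binary confusion matrix, here is a minimal sketch. The function name and the counts in the example are hypothetical and are not taken from either paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient) for binary counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts for a soluble / insoluble classifier.
acc, mcc = accuracy_and_mcc(tp=40, tn=35, fp=15, fn=10)
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")
```

Unlike accuracy, the coefficient uses all four cells of the confusion matrix, which is presumably why both papers report it alongside accuracy.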


Bioinformatics | 2018

GIGI-Quick: a fast approach to impute missing genotypes in genome-wide association family data

Khalid Kunji; Ehsan Ullah; Alejandro Q. Nato; Ellen M. Wijsman; Mohamad Saad

Summary: Genome-wide association studies have become common over the last ten years, with a shift towards targeting rare variants, especially in pedigree data. Despite lower costs, sequencing for rare variants still remains expensive. To have a relatively large sample with acceptable cost, imputation approaches may be used, such as GIGI for pedigree data. GIGI is an imputation method that handles large pedigrees and is particularly good for rare variant imputation. GIGI requires a subset of individuals in a pedigree to be fully sequenced, while other individuals are sequenced only at relevant markers. The imputation will infer the missing genotypes at untyped markers. Running GIGI on large pedigrees for large numbers of markers can be very time consuming. We present GIGI-Quick as a method to efficiently split GIGI's input, run GIGI in parallel and efficiently merge the output, so that the runtime decreases with the number of cores. This yields imputation results faster and therefore speeds up all subsequent association analyses. Availability and implementation: GIGI-Quick is open source and publicly available via: https://cse-git.qcri.org/Imputation/GIGI-Quick. Supplementary information: Supplementary data are available at Bioinformatics online.
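The split, run in parallel, merge pattern described in this abstract can be sketched generically as follows. This is not GIGI-Quick's code: `impute_chunk` is a hypothetical placeholder for one imputation run, the marker names are invented, and a thread pool stands in for whatever mechanism the tool actually uses to launch its jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_markers(markers, n_chunks):
    """Split markers into at most n_chunks contiguous chunks, preserving order."""
    n_chunks = max(1, min(n_chunks, len(markers)))
    size, extra = divmod(len(markers), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(markers[start:end])
        start = end
    return chunks


def impute_chunk(chunk):
    """Hypothetical placeholder for one imputation run on a subset of markers."""
    return [(marker, f"imputed:{marker}") for marker in chunk]


def run_split_merge(markers, n_workers):
    """Split the input, process the chunks concurrently, and merge in order."""
    chunks = split_markers(markers, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the order of the input chunks.
        results = list(pool.map(impute_chunk, chunks))
    return [row for part in results for row in part]


markers = [f"m{i}" for i in range(10)]
merged = run_split_merge(markers, n_workers=4)
print(merged[:2])
```

Because the chunks are contiguous and the results come back in chunk order, the merged output lists markers in the same order as the unsplit input.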


International Conference on Bioinformatics | 2017

Differential Community Detection in Paired Biological Networks

Raghvendra Mall; Ehsan Ullah; Khalid Kunji; Fulvio D'Angelo; Halima Bensmail; Michele Ceccarelli

Motivation: Biological networks unravel the inherent structure of molecular interactions, which can lead to the discovery of driver genes and meaningful pathways, especially in the context of cancer. Often, due to gene mutations, gene expression changes and the corresponding gene regulatory network sustains some amount of localized re-wiring. The ability to identify significant changes in the interaction patterns caused by the progression of the disease can lead to the revelation of novel relevant signatures. Methods: The task of identifying differential sub-networks in paired biological networks (A: control, B: case) can be re-phrased as one of finding dense communities in a single noisy differential topological (DT) graph constructed by taking the absolute difference between the topological graphs of A and B. In this paper, we propose a fast three-stage approach, namely Differential Community Detection (DCD), to identify differential sub-networks as differential communities in a de-noised version of the DT graph. In the first stage, we iteratively re-order the nodes of the DT graph to determine approximate block diagonals present in the DT adjacency matrix using neighbourhood information of the nodes and Jaccard similarity. In the second stage, the ordered DT adjacency matrix is traversed along the diagonal to remove all the edges associated with a node, if that node has no immediate edges within a window. Finally, we apply community detection methods on this de-noised DT graph to discover differential sub-networks as communities. Results: Our proposed DCD approach can effectively locate differential sub-networks in several simulated paired random-geometric networks and various paired scale-free graphs with different power-law exponents. The DCD approach easily outperforms community detection methods applied on the original noisy DT graph and recent statistical techniques in simulation studies. We applied the DCD method on two real datasets: a) an ovarian cancer dataset to discover differential DNA co-methylation sub-networks in patients and controls; b) a glioma cancer dataset to discover the difference between the regulatory networks of IDH-mutant and IDH-wild-type. We demonstrate the potential benefits of DCD for finding network-inferred bio-markers/pathways associated with a trait of interest. Conclusion: The proposed DCD approach overcomes the limitations of previous statistical techniques and the issues associated with identifying differential sub-networks by use of community detection methods on the noisy DT graph. This is reflected in the superior performance of the DCD method with respect to various metrics like Precision, Accuracy, Kappa and Specificity. The code implementing the proposed DCD method is available at https://sites.google.com/site/raghvendramallmlresearcher/codes.
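Two building blocks named in the Methods section, the absolute-difference DT graph and the Jaccard similarity of node neighbourhoods, can be illustrated with a small sketch. It assumes plain weight matrices as input and invents the example graphs; the paper's topological graphs, iterative re-ordering and windowed de-noising are not reproduced here.

```python
def differential_graph(a, b):
    """Element-wise absolute difference of two equally sized square matrices."""
    n = len(a)
    return [[abs(a[i][j] - b[i][j]) for j in range(n)] for i in range(n)]


def neighbours(matrix, node, threshold=0.0):
    """Nodes connected to `node` by an edge heavier than `threshold`."""
    return {j for j, w in enumerate(matrix[node]) if j != node and w > threshold}


def jaccard(set_a, set_b):
    """Jaccard similarity |A & B| / |A | B|, defined as 0 for two empty sets."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical control (A) and case (B) networks on four nodes.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0]]
B = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

dt = differential_graph(A, B)
similarity = jaccard(neighbours(dt, 0), neighbours(dt, 1))
print(dt[2], similarity)
```

In this toy case the edge between nodes 0 and 1 is present in both networks, so it vanishes from the DT graph, while nodes 0 and 1 end up with identical DT neighbourhoods and a Jaccard similarity of 1.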


Journal of Health and Medical Informatics | 2017

Application of High-Dimensional Statistics and Network Based Visualization Techniques on Arab Diabetes and Obesity Data

Raghvendra Mall; Reda Rawi; Ehsan Ullah; Khalid Kunji; Abdelkrim Khadir; Ali Tiss; Jehad Abubaker; Michal A Kulinski; Mohammad M Ramzi; Mohammed Dehbi; Halima Bensmail

Background: Obesity and its co-morbidities are characterized by a chronic low-grade inflammatory state, uncontrolled expression of metabolic measurements and dysregulation of various forms of stress response. However, the contribution and correlation of inflammation, metabolism and stress responses to the disease are not fully elucidated. In this paper, a cross-sectional case study was conducted on clinical data comprising 117 human male and female subjects with and without Type 2 Diabetes (T2D). Characteristics such as anthropometric, clinical and biochemical measurements were collected. Methods: Association of these variables with T2D and BMI was assessed using penalized hierarchical linear and logistic regression. In particular, elastic net, hdi and glinternet were used as regularization models to distinguish between cases and controls. Differential network analysis using a closed-form approach was performed to identify pairwise interactions of variables that influence prediction of the phenotype. Results: For the 117 participants, physical variables such as PBF, HDL and TBW had absolute coefficients of 0.75, 0.65 and 0.34 using the glinternet approach; biochemical variables such as MIP, ROS and RANTES were identified as determinants of obesity, with some interaction between inflammatory markers such as IL-4, IL-6, MIP, CSF, Eotaxin and ROS. Diabetes was associated with a significant increase in Thiobarbituric Acid Reactive Substances (TBARS), which are considered an index of endogenous lipid peroxidation, and an increase in two inflammatory markers, MIP-1 and RANTES. Furthermore, we obtained 13 pairwise effects. The pairwise effects include pairs from and within physical, clinical and biochemical features, in particular metabolic, inflammatory, and oxidative stress markers. Conclusion: We show that markers of oxidative stress (derived from lipid peroxidation) such as MIP-1 and RANTES participate in the pathogenesis of diseases such as diabetes and obesity in the Arab population.


BMC Bioinformatics | 2016

COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator

Reda Rawi; Raghvendra Mall; Khalid Kunji; Mohammed El Anbari; Michaël Aupetit; Ehsan Ullah; Halima Bensmail

Background: The post-genomic era with its wealth of sequences gave rise to a broad range of protein residue-residue contact detecting methods. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, they do not completely overlap. Hence, new approaches and improvements of existing methods are needed to motivate further development and progress in the field. We present a new contact detecting method, COUSCOus, by combining the best shrinkage approach, the empirical Bayes covariance estimator and GLasso. Results: Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long, medium and short range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short contacts and outperforms PSICOV. We also observed that COUSCOus outperforms PSICOV with respect to the Matthews correlation coefficient criterion on the full list of residue contacts. Furthermore, COUSCOus achieves on average 10% more gain in prediction accuracy compared to PSICOV on an independent test set composed of CASP11 protein targets. Finally, we showed that when using a simple random forest meta-classifier, by combining contact detecting techniques and sequence derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions. Conclusion: We conclude that the consideration of superior covariance shrinkage approaches will boost several research fields that apply the GLasso procedure, among them the residue-residue contact prediction presented here as well as fields such as gene network reconstruction.
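The "top L/10" figures above refer to scoring only the highest-ranked L/10 predicted contacts for a protein of length L. A minimal sketch of such an evaluation follows. The function name, the example pairs and scores, and the separation threshold used for "long range" are all assumptions for illustration; the benchmark's exact range definitions are not given in the abstract.

```python
def top_fraction_precision(scored_pairs, true_contacts, seq_len,
                           divisor=10, min_sep=0, max_sep=None):
    """Fraction of true contacts among the top seq_len // divisor predictions.

    scored_pairs: iterable of (i, j, score) residue pairs.
    true_contacts: set of (i, j) tuples with i < j.
    min_sep / max_sep: optional bounds on the sequence separation |i - j|.
    """
    in_range = [
        (i, j, s) for i, j, s in scored_pairs
        if abs(i - j) >= min_sep and (max_sep is None or abs(i - j) <= max_sep)
    ]
    in_range.sort(key=lambda t: t[2], reverse=True)
    top = in_range[:max(1, seq_len // divisor)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)


# Hypothetical predictions for a protein of length 40 (so L/10 = 4).
pairs = [(1, 30, 0.9), (2, 28, 0.8), (5, 35, 0.7),
         (3, 10, 0.95), (4, 33, 0.6), (6, 36, 0.2)]
contacts = {(1, 30), (5, 35), (3, 10)}

overall = top_fraction_precision(pairs, contacts, seq_len=40)
# Assumed threshold: treat a separation of at least 24 residues as long range.
long_range = top_fraction_precision(pairs, contacts, seq_len=40, min_sep=24)
print(overall, long_range)
```

Filtering by separation before ranking means each range is scored on its own top L/10 list, which is one common way such per-range figures are computed.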


Bioinformatics and Biomedicine | 2015

Residue-residue contact prediction in the HIV-1 envelope glycoprotein complex

Reda Rawi; Khalid Kunji; Abdelali Haoudi; Halima Bensmail

HIV-1 Env glycoprotein complex is the key protein that mediates binding and entry of HIV-1 into human host cells. The complex entry process involves three main steps. First, the attachment, the interaction of gp120 and CD4. Second, the coreceptor binding, where gp120 binds the chemokine receptor CCR5 or CXCR4, and finally, the fusion of viral and host cell membranes. Despite the fact that several coordinate structures of HIV-1 Env in unliganded state exist (as well as in complex with CD4, CD4 mimics, or various antibodies, and of gp41 in intermediate and post-fusion state), a comprehensive understanding of structural arrangements and communication within gp120 and gp41 domains during the entry is far from complete. In this study, we applied a direct amino acid interaction detecting method to analyse the function and structure of HIV-1 Env protein sequences representing all group M subtypes. We identified more than 400 coevolving residue pairs within Env, of which the majority are real contacts and proximal in the available coordinate structures, or have functional implications such as receptor binding, variable loop, gp120-gp41, and interdomain interactions. This work provides a new dimension of information in HIV research, important in assisting protein coordinate structure prediction and in designing new and effective entry inhibitors.


F1000Research | 2018

An unsupervised disease module identification technique in biological networks using novel quality metric based on connectivity, conductance and modularity

Raghvendra Mall; Ehsan Ullah; Khalid Kunji; Michele Ceccarelli; Halima Bensmail

Collaboration


Dive into Khalid Kunji's collaborations.

Top Co-Authors

Halima Bensmail
Qatar Computing Research Institute

Raghvendra Mall
Qatar Computing Research Institute

Ehsan Ullah
Qatar Computing Research Institute

Reda Rawi
University of Duisburg-Essen

Abdelali Haoudi
Eastern Virginia Medical School

Gwo-Yu Chuang
National Institutes of Health

Anna Lasorella
Columbia University Medical Center