Hani Z. Girgis
National Institutes of Health
Publications
Featured research published by Hani Z. Girgis.
Cell | 2013
Axel Visel; Leila Taher; Hani Z. Girgis; Dalit May; Olga Golonzhka; Renée V. Hoch; Gabriel L. McKinsey; Kartik Pattabiraman; Shanni N. Silberberg; Matthew J. Blow; David V. Hansen; Alex S. Nord; Jennifer A. Akiyama; Amy Holt; Roya Hosseini; Sengthavy Phouanenavong; Ingrid Plajzer-Frick; Malak Shoukry; Veena Afzal; Tommy Kaplan; Arnold R. Kriegstein; Edward M. Rubin; Ivan Ovcharenko; Len A. Pennacchio; John L.R. Rubenstein
The mammalian telencephalon plays critical roles in cognition, motor function, and emotion. Though many of the genes required for its development have been identified, the distant-acting regulatory sequences orchestrating their in vivo expression are mostly unknown. Here, we describe a digital atlas of in vivo enhancers active in subregions of the developing telencephalon. We identified more than 4,600 candidate embryonic forebrain enhancers and studied the in vivo activity of 329 of these sequences in transgenic mouse embryos. We generated serial sets of histological brain sections for 145 reproducible forebrain enhancers, resulting in a publicly accessible web-based data collection comprising more than 32,000 sections. We also used epigenomic analysis of human and mouse cortex tissue to directly compare the genome-wide enhancer architecture in these species. These data provide a primary resource for investigating gene regulatory mechanisms of telencephalon development and enable studies of the role of distant-acting enhancers in neurodevelopmental disorders.
BMC Bioinformatics | 2015
Hani Z. Girgis
Background: With rapid advancements in technology, the sequences of thousands of species’ genomes are becoming available. Within these sequences are repeats that make up significant portions of genomes. Successful annotation thus requires accurate discovery of repeats. Because repeats are species-specific elements, those in newly sequenced genomes are likely to be unknown. Therefore, annotating newly sequenced genomes requires tools that discover repeats de novo. However, the currently available de novo tools have limitations concerning input sequence size, ease of use, sensitivity to major types of repeats, consistency of performance, speed, and false positive rate.
Results: To address these limitations, I designed and developed Red, applying machine learning. Red is the first repeat-detection tool capable of labeling its own training data and training itself automatically on an entire genome. Red is easy to install and use. It is sensitive to both transposons and simple repeats; in contrast, available tools such as RepeatScout and ReCon are sensitive to transposons, and WindowMasker to simple repeats. Red performed consistently well on seven genomes, whereas the other tools performed well only on some genomes. Red is much faster than RepeatScout and ReCon and has a much lower false positive rate than WindowMasker. On human genes with five or more copies, Red was more specific than RepeatScout by a wide margin. When tested on genomes of unusual nucleotide composition, Red located repeats with high sensitivity and maintained moderate false positive rates. Red outperformed the related tools on a bacterial genome. Red identified 46,405 novel repetitive segments in the human genome. Finally, Red is capable of processing both assembled and unassembled genomes.
Conclusions: Red’s innovative methodology and its excellent performance on seven different genomes represent a valuable advancement in the field of repeat discovery.
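Red's central idea is that it labels its own training data. The sketch below illustrates only that self-labeling step, under a loudly stated simplifying assumption: a position is treated as a candidate repeat if it is covered by a k-mer that recurs at least `min_count` times in the sequence. The function name `label_repeats`, the k-mer length, and the threshold are hypothetical; Red's actual scoring and trained model are not reproduced here.

```python
from collections import Counter


def label_repeats(seq: str, k: int = 3, min_count: int = 3) -> list[int]:
    """Label each position 1 (candidate repeat) or 0 from k-mer frequencies.

    Hypothetical self-labeling rule (not Red's exact method): a position is
    labeled 1 if any k-mer covering it occurs at least `min_count` times.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    labels = [0] * len(seq)
    for start, kmer in enumerate(kmers):
        if counts[kmer] >= min_count:
            # Mark every base covered by this frequent k-mer.
            for pos in range(start, start + k):
                labels[pos] = 1
    return labels


# Four tandem copies of "CAT" followed by a unique flank.
print(label_repeats("CATCATCATCAT" + "GGTTAACCGG"))
```

Here the twelve tandem-repeat bases are labeled 1 and the unique flank 0. Labels produced this way could then serve as targets for training a classifier without any hand-annotated data, which is the self-training idea the abstract describes.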
BMC Bioinformatics | 2012
Hani Z. Girgis; Ivan Ovcharenko
Background: Researchers seeking to unlock the genetic basis of human physiology and diseases have been studying gene transcription regulation. The temporal and spatial patterns of gene expression are controlled mainly by non-coding elements known as cis-regulatory modules (CRMs) and by epigenetic factors. CRMs modulating related genes share a regulatory signature, which consists of transcription factor (TF) binding sites (TFBSs). Identifying such CRMs is challenging because of the prohibitive number of sequence sets that need to be analyzed.
Results: We formulated the challenge as a supervised classification problem, even though experimentally validated CRMs were not required. Our efforts resulted in a software system named CrmMiner, which mines for CRMs in the vicinity of related genes. CrmMiner requires two sets of sequences: a mixed set and a control set. Sequences in the vicinity of the related genes make up the mixed set, whereas the control set includes random genomic sequences. CrmMiner assumes that a large percentage of the mixed set consists of background sequences that do not include CRMs. The system identifies pairs of closely located motifs representing vertebrate TFBSs that are enriched in the training mixed set, which consists of 50% of the gene loci. In addition, CrmMiner selects a group of the enriched pairs to represent the tissue-specific regulatory signature. The mixed and control sets are searched for candidate sequences that include any of the selected pairs. Next, an optimal Bayesian classifier is used to distinguish candidates found in the mixed set from their control counterparts. Our study proposes 62 tissue-specific regulatory signatures and putative CRMs for different human tissues and cell types. These signatures consist of assortments of ubiquitously expressed TFs and tissue-specific TFs. Under controlled settings, CrmMiner identified known CRMs in noisy sets with signal-to-noise ratios as low as 1:25. CrmMiner was 21-75% more precise than a related CRM predictor. The sensitivity of the system in locating known human heart enhancers reached up to 83%. CrmMiner's precision reached 82% when mining for CRMs specific to human CD4+ T cells. On several data sets, the system achieved 99% specificity.
Conclusion: These results suggest that CrmMiner predictions are accurate and likely to be tissue-specific CRMs. We expect that the predicted tissue-specific CRMs and regulatory signatures will broaden our knowledge of gene transcription regulation.
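CrmMiner's final step separates candidates from the mixed set from control candidates with a Bayesian classifier. As a stand-in, the sketch below uses a Bernoulli naive Bayes classifier over hypothetical binary features (presence or absence of each selected motif pair). This is a simplification chosen for illustration and is not necessarily the classifier CrmMiner uses; the feature vectors are toy data.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model.

    X: list of 0/1 feature vectors (hypothetical: motif pair present or absent).
    y: list of labels (1 = candidate from the mixed set, 0 = control).
    alpha: Laplace smoothing so no probability is exactly 0 or 1.
    """
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        log_prior = math.log(len(rows) / len(X))
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_features)]
        model[label] = (log_prior, probs)
    return model


def predict(model, x):
    """Return the label with the larger log posterior for feature vector x."""
    def log_posterior(label):
        log_prior, probs = model[label]
        return log_prior + sum(math.log(p if xi else 1.0 - p)
                               for xi, p in zip(x, probs))
    return max((0, 1), key=log_posterior)


# Toy data: three "mixed" candidates and three "control" candidates.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 1, 1, 0, 0, 0]
model = train_bernoulli_nb(X, y)
print(predict(model, [1, 1, 0]), predict(model, [0, 0, 0]))
```

Working in log space avoids underflow when many motif-pair features are multiplied together.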
Medical Image Computing and Computer-Assisted Intervention | 2009
Sharmishtaa Seshamani; Purnima Rajan; Rajesh Kumar; Hani Z. Girgis; Themistocles Dassopoulos; Gerard E. Mullin; Gregory D. Hager
A variety of pixel- and feature-based methods have been proposed for registering multiple views of anatomy visible in studies obtained using diagnostic, minimally invasive imaging. A given registration method may outperform another depending on anatomical variations, imaging conditions, and imaging sensor performance, and it is often difficult to determine a priori the best registration method for a particular application. To address this problem, we propose a registration framework that pools the results of multiple registration methods using a decision function for validating registrations. We refer to this as meta registration. We demonstrate that our framework outperforms several individual registration methods on the task of registering multiple views of Crohn's disease lesions sampled from a Capsule Endoscopy (CE) study database. We also report on preliminary work on assessing the quality of the registrations obtained, and on the possibility of using such assessment in the registration framework.
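Meta registration runs several registration methods and lets a decision function accept or reject each result. The toy sketch below shows only that pooling logic, assuming that a "registration" is an integer shift between two 1-D signals and that the decision function is a hypothetical mean-absolute-error threshold. The method names `peak_shift` and `zero_shift` are hypothetical; the paper's registration methods and decision function are not reproduced.

```python
def alignment_error(a, b, shift):
    """Mean absolute difference between a[i] and b[i + shift] over their overlap."""
    pairs = [(a[i], b[i + shift]) for i in range(len(a)) if 0 <= i + shift < len(b)]
    if not pairs:
        return float("inf")
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


def peak_shift(a, b):
    """Hypothetical method 1: align the positions of the maxima."""
    return b.index(max(b)) - a.index(max(a))


def zero_shift(a, b):
    """Hypothetical method 2: assume the views are already aligned."""
    return 0


def meta_register(a, b, methods, max_error=0.5):
    """Pool several methods, keep shifts the decision function accepts, return the best."""
    accepted = []
    for method in methods:
        shift = method(a, b)
        error = alignment_error(a, b, shift)
        if error <= max_error:  # hypothetical decision function: an error threshold
            accepted.append((error, shift))
    return min(accepted)[1] if accepted else None


a = [0, 0, 5, 1, 0, 0]
b = [0, 0, 0, 0, 5, 1]
print(meta_register(a, b, [peak_shift, zero_shift]))  # 2
print(meta_register(a, b, [zero_shift]))  # None
```

Returning `None` when no method passes validation lets the caller distinguish "no trustworthy registration" from a poor one.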
Conference on Object-Oriented Programming Systems, Languages, and Applications | 2005
Hani Z. Girgis; Bharat Jayaraman; Paul Gestwicki
We describe the results of visualizing errors in object-oriented programs and of using these results to design a set of visual queries on a program's runtime execution history. We bring together under one coherent framework different approaches such as error classification, bug patterns in object-oriented programs, and visual queries. Our work is founded on JIVE, a novel interactive visualization system for Java developed at Buffalo.
Nucleic Acids Research | 2013
Hani Z. Girgis; Sergey L. Sheetlin
Microsatellites (MSs) are DNA regions consisting of repeated short motif(s). MSs are linked to several diseases and have important biomedical applications. Thus, researchers have developed several computational tools to detect MSs. However, the currently available tools require adjusting many parameters, or depend on a list of motifs or on a library of known MSs. Therefore, two laboratories analyzing the same sequence with the same computational tool may obtain different results due to the user-adjustable parameters. Recent studies have indicated the need for a standard computational tool for detecting MSs. To this end, we applied machine-learning algorithms to develop a tool called MsDetector. The system is based on a hidden Markov model and a general linear model. The user is not obligated to optimize the parameters of MsDetector. Neither a list of motifs nor a library of known MSs is required. MsDetector is memory- and time-efficient. We applied MsDetector to several species. MsDetector located the majority of MSs found by other widely used tools. In addition, MsDetector identified novel MSs. Furthermore, the system has a very low false-positive rate resulting in a precision of up to 99%. MsDetector is expected to produce consistent results across studies analyzing the same sequence.
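MsDetector is built on a hidden Markov model. The sketch below shows log-space Viterbi decoding for a two-state HMM (0 = background, 1 = microsatellite) over a hypothetical per-base observation: whether a base matches the base one period earlier. The period, the probabilities in `START`, `TRANS`, and `EMIT`, and the observation itself are illustrative assumptions, not MsDetector's trained parameters or features.

```python
import math

# Hypothetical parameters: state 0 = background, state 1 = microsatellite.
START = [0.9, 0.1]
TRANS = [[0.95, 0.05], [0.05, 0.95]]
EMIT = [[0.75, 0.25], [0.1, 0.9]]  # EMIT[state][observation]


def period_matches(seq, period):
    """Observation per base: 1 if it equals the base `period` positions earlier."""
    return [1 if i >= period and seq[i] == seq[i - period] else 0
            for i in range(len(seq))]


def viterbi(obs, start, trans, emit):
    """Most likely hidden-state path for a discrete HMM (log-space Viterbi)."""
    n_states = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n_states)]
    back = []
    for o in obs[1:]:
        prev = score
        ptr, score = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this position.
            best = max(range(n_states), key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + math.log(emit[s][o]))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


seq = "GATCCGTAAG" + "CA" * 10 + "GTTCAGGATC"
path = viterbi(period_matches(seq, 2), START, TRANS, EMIT)
print("".join(map(str, path)))
```

Because the parameters are fixed in this sketch, the user still chooses them; the abstract's point is that MsDetector learns its parameters so users need not tune them.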
International Symposium on Biomedical Imaging | 2010
Hani Z. Girgis; Ben Mitchell; Themos Dassopoulos; Gerard E. Mullin; Gregory D. Hager
A Wireless Capsule Endoscope (WCE) is a small device capable of acquiring thousands of images as it travels through the gastrointestinal tract. WCE is becoming a widely accepted method that physicians use in the diagnosis of Crohn's disease, an inflammatory disease that occurs mainly in the small intestine. In this article, we present a novel method to detect the images showing inflammation among the thousands of images acquired by the WCE. Further, our method is capable of delineating the inflammation region(s) in each detected frame. Our system uses the mean-shift algorithm to find centers of candidate regions that may show Crohn's disease inflammation. The system then classifies these regions with a trained Support Vector Machine. We trained, validated, and tested our method on three mutually exclusive sets. Our system's testing accuracy, specificity, and sensitivity are 87%, 93%, and 80%, respectively.
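The detection step uses the mean-shift algorithm to find centers of candidate regions. Below is a minimal flat-kernel mean-shift on 2-D points, for instance coordinates of hypothetical candidate pixels. The kernel, bandwidth, and mode-merging rule are assumptions, the feature space used in the paper is not specified here, and the subsequent SVM classification is omitted.

```python
import math


def mean_shift(points, bandwidth, tol=1e-6, max_iter=100):
    """Flat-kernel mean-shift: move each start point to the mean of its
    neighbors within `bandwidth` until it stops moving, then merge modes."""
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            near = [(qx, qy) for qx, qy in points
                    if math.hypot(qx - x, qy - y) <= bandwidth]
            nx = sum(q[0] for q in near) / len(near)
            ny = sum(q[1] for q in near) / len(near)
            moved = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if moved < tol:
                break
        # Assumed merging rule: modes closer than bandwidth / 2 are the same center.
        if all(math.hypot(x - mx, y - my) > bandwidth / 2 for mx, my in modes):
            modes.append((x, y))
    return modes


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(sorted(mean_shift(points, bandwidth=2.0)))  # [(0.5, 0.5), (10.5, 10.5)]
```

Mean-shift does not need the number of regions in advance; the bandwidth alone decides how many centers emerge, which suits frames with an unknown number of lesions.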
International Conference of the IEEE Engineering in Medicine and Biology Society | 2009
Hani Z. Girgis; Jason J. Corso; Daniel Fischer
To predict the three-dimensional structure of proteins, many computational methods sample the conformational space, generating a large number of candidate structures. Such methods then rank the generated structures using a variety of model quality assessment programs in order to obtain a small set of structures that are most likely to resemble the unknown, experimentally determined structure. Model quality assessment programs suffer from two main limitations: (i) the rank-one structure is not always the best predicted structure (for example, the best predicted structure could be ranked 10th); and (ii) no single assessment method can correctly rank the predicted structures for all target proteins. However, because at least some of the methods often achieve a good ranking, a model quality assessment method based on a consensus of several model quality assessment methods is likely to perform better. We devised the STPdata algorithm, a consensus method based on five model quality assessment programs. We applied it to build an online, “custom-trained” hierarchy of general linear models to select and rank the best predicted structures. By “custom-trained”, we mean that for each target protein the STPdata algorithm trains a unique model on data related to that target protein. To evaluate our method, we participated in CASP8 as human predictors. In CASP8, the STPdata algorithm trained 128 hierarchical models, one for each of the 128 target proteins. Based on the official CASP8 results, our method outperformed the best server by 6% and placed fourth among human predictors. Our CASP results are based purely on computational methods, without any human intervention.
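STPdata combines scores from five model quality assessment (MQA) programs through per-target general linear models. As a simplified sketch, the code below fits a single ordinary-least-squares model (not the paper's hierarchy) that maps hypothetical MQA scores of training structures to a known quality value, then ranks new candidate structures by predicted quality. The two-score features, the quality target, and the names `fit_linear` and `rank_models` are illustrative assumptions.

```python
def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    n = len(A[0])
    # Augmented system (A^T A | A^T y).
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]  # [intercept, w1, w2, ...]


def rank_models(weights, scores):
    """Return candidate indices sorted from best to worst predicted quality."""
    def predict(s):
        return weights[0] + sum(w * v for w, v in zip(weights[1:], s))
    return sorted(range(len(scores)), key=lambda i: predict(scores[i]), reverse=True)


# Toy per-target training data: two hypothetical MQA scores -> known quality.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [0.5, 2.5, 1.5, 3.5, 5.5]
weights = fit_linear(X, y)
print(rank_models(weights, [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]))
```

Refitting the weights for each target, as the abstract's "custom-trained" description suggests, lets the consensus favor whichever MQA programs happen to work well for that protein.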
Archive | 2008
Jason J. Corso; Hani Z. Girgis
Archive | 2004
Bharat Jayaraman; Paul Gestwicki; Manu Pushpendran; Akshay V. Hegde; Hani Z. Girgis