Publication


Featured research published by Benjamin Georgi.


PLOS Genetics | 2013

From mouse to human: evolutionary genomics analysis of human orthologs of essential genes.

Benjamin Georgi; Benjamin F. Voight; Maja Bucan

Understanding the core set of genes that are necessary for basic developmental functions is one of the central goals in biology. Studies in model organisms identified a significant fraction of essential genes through the analysis of null-mutations that lead to lethality. Recent large-scale next-generation sequencing efforts have provided unprecedented data on genetic variation in human. However, evolutionary and genomic characteristics of human essential genes have never been directly studied on a genome-wide scale. Here we use detailed phenotypic resources available for the mouse and deep genomics sequencing data from human populations to characterize patterns of genetic variation and mutational burden in a set of 2,472 human orthologs of known essential genes in the mouse. Consistent with the action of strong, purifying selection, these genes exhibit comparatively reduced levels of sequence variation, a skew in allele frequency towards rarer variants, and increased conservation across the primate and rodent lineages relative to the remainder of genes in the genome. In individual genomes we observed ∼12 rare mutations within essential genes predicted to be damaging. Consistent with the hypothesis that mutations in essential genes are risk factors for neurodevelopmental disease, we show that de novo variants in patients with Autism Spectrum Disorder are more likely to occur in this collection of genes. While incomplete, our set of human orthologs shows characteristics fully consistent with essential function in human and thus provides a resource to inform and facilitate interpretation of sequence data in studies of human disease.


PLOS Genetics | 2014

Genomic View of Bipolar Disorder Revealed by Whole Genome Sequencing in a Genetic Isolate

Benjamin Georgi; David Craig; Rachel L. Kember; Wencheng Liu; Ingrid Lindquist; Sara Nasser; Christopher D. Brown; Janice A. Egeland; Steven M. Paul; Maja Bucan

Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.


Intelligent Systems in Molecular Biology | 2006

Context-specific independence mixture modeling for positional weight matrices

Benjamin Georgi; Alexander Schliep

Motivation: A positional weight matrix (PWM) is a statistical representation of the binding pattern of a transcription factor estimated from known binding site sequences. Previous studies showed that for factors which bind to divergent binding sites, mixtures of multiple PWMs increase performance. However, estimating a conventional mixture distribution for each position will in many cases cause overfitting. Results: We propose a context-specific independence (CSI) mixture model and a learning algorithm based on a Bayesian approach. The CSI model adjusts complexity to fit the amount of variation observed on the sequence level in each position of a site. This not only yields a more parsimonious description of binding patterns, which improves parameter estimates, it also increases robustness as the model automatically adapts the number of components to fit the data. Evaluation of the CSI model on simulated data showed favorable results compared to conventional mixtures. We demonstrate its adaptive properties in a classical model selection setup. The increased parsimony of the CSI model was shown for the transcription factor Leu3, where two binding-energy subgroups were distinguished equally well as with a conventional mixture but with 30% fewer parameters. Analysis of the human-mouse conservation of predicted binding sites of 64 JASPAR TFs showed that CSI was as good as or better than a conventional mixture for 89% of the TFs, and as good as or better than a single PWM model for 70%. Availability: http://algorithmics.molgen.mpg.de/mixture.
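
To make the notion of a positional weight matrix concrete, here is a minimal sketch of a single PWM estimated from aligned binding sites and used to score a candidate sequence. It is not the CSI mixture model from the paper and does not use the authors' software. The toy sites, the pseudocount of 1, the uniform background of 0.25, and the function names `build_pwm` and `log_odds_score` are all assumptions made for illustration.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Estimate per-position base probabilities from aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases keep nonzero probability.
        counts = {base: pseudocount for base in alphabet}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in alphabet})
    return pwm


def log_odds_score(pwm, seq, background=0.25):
    """Sum of per-position log2 ratios of PWM probability to a uniform background."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, seq))


# Hypothetical toy binding sites, not taken from any real transcription factor.
sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
pwm = build_pwm(sites)

consensus_score = log_odds_score(pwm, "TGACTC")
unrelated_score = log_odds_score(pwm, "ACGTAC")
```

A sequence close to the observed sites receives a higher log-odds score than an unrelated one. A mixture of PWMs, as discussed in the abstract, would hold several such matrices with mixing weights rather than one.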


Human Molecular Genetics | 2014

A population-based study of KCNH7 p.Arg394His and bipolar spectrum disorder

Kevin A. Strauss; Sander Markx; Benjamin Georgi; Steven M. Paul; Robert N. Jinks; Toshinori Hoshi; Ann McDonald; Michael B. First; Wencheng Liu; Abigail R. Benkert; Adam D. Heaps; Yutao Tian; Aravinda Chakravarti; Maja Bucan; Erik G. Puffenberger

We conducted blinded psychiatric assessments of 26 Amish subjects (52 ± 11 years) from four families with prevalent bipolar spectrum disorder, identified 10 potentially pathogenic alleles by exome sequencing, tested association of these alleles with clinical diagnoses in the larger Amish Study of Major Affective Disorder (ASMAD) cohort, and studied mutant potassium channels in neurons. Fourteen of 26 Amish had bipolar spectrum disorder. The only candidate allele shared among them was rs78247304, a non-synonymous variant of KCNH7 (c.1181G>A, p.Arg394His). KCNH7 c.1181G>A and nine other potentially pathogenic variants were subsequently tested within the ASMAD cohort, which consisted of 340 subjects grouped into control subjects and affected subjects from overlapping clinical categories (bipolar 1 disorder, bipolar spectrum disorder and any major affective disorder). KCNH7 c.1181G>A had the highest enrichment among individuals with bipolar spectrum disorder (χ² = 7.3) and the strongest family-based association with bipolar 1 (P = 0.021), bipolar spectrum (P = 0.031) and any major affective disorder (P = 0.016). In vitro, the p.Arg394His substitution allowed normal expression, trafficking, assembly and localization of HERG3/Kv11.3 channels, but altered the steady-state voltage dependence and kinetics of activation in neuronal cells. Although our genome-wide statistical results do not alone prove association, cumulative evidence from multiple independent sources (parallel genome-wide study cohorts, pharmacological studies of HERG-type potassium channels, electrophysiological data) implicates neuronal HERG3/Kv11.3 potassium channels in the pathophysiology of bipolar spectrum disorder. Such a finding, if corroborated by future studies, has implications for mental health services among the Amish, as well as development of drugs that specifically target HERG3/Kv11.3.


BMC Bioinformatics | 2010

PyMix--the python mixture package--a tool for clustering of heterogeneous biological data.

Benjamin Georgi; Ivan G. Costa; Alexander Schliep

Background: Cluster analysis is an important technique for the exploratory analysis of biological data. Such data is often high-dimensional, inherently noisy and contains outliers. This makes clustering challenging. Mixtures are versatile and powerful statistical models which perform robustly for clustering in the presence of noise and have been successfully applied in a wide range of applications. Results: PyMix - the Python mixture package implements algorithms and data structures for clustering with basic and advanced mixture models. The advanced models include context-specific independence mixtures, mixtures of dependence trees and semi-supervised learning. PyMix is licenced under the GNU General Public licence (GPL). PyMix has been successfully used for the analysis of biological sequence, complex disease and gene expression data. Conclusions: PyMix is a useful tool for cluster analysis of biological data. Due to the general nature of the framework, PyMix can be applied to a wide range of applications and data sets.


BMC Genetics | 2015

Copy number variants encompassing Mendelian disease genes in a large multigenerational family segregating bipolar disorder

Rachel L. Kember; Benjamin Georgi; Joan E. Bailey-Wilson; Dwight Stambolian; Steven M. Paul; Maja Bucan

Background: Bipolar affective disorder (BP) is a common, highly heritable psychiatric disorder characterized by periods of depression and mania. Using dense SNP genotype data, we characterized CNVs in 388 members of an Old Order Amish Pedigree with bipolar disorder. We identified CNV regions arising from common ancestral mutations by utilizing the pedigree information. By combining this analysis with whole genome sequence data in the same individuals, we also explored the role of compound heterozygosity. Results: Here we describe 541 inherited CNV regions, of which 268 are rare in a control population of European origin but present in a large number of Amish individuals. In addition, we highlight a set of CNVs found at higher frequencies in BP individuals, and within genes known to play a role in human development and disease. As in prior reports, we find no evidence for an increased burden of CNVs in BP individuals, but we report a trend towards a higher burden of CNVs in known Mendelian disease loci in bipolar individuals (BPI and BPII, p = 0.06). Conclusions: We conclude that CNVs may be contributing factors in the phenotypic presentation of mood disorders and co-morbid medical conditions in this family. These results reinforce the hypothesis of a complex genetic architecture underlying BP disorder, and suggest that the role of CNVs should continue to be investigated in BP data sets.


BMC Structural Biology | 2009

Partially-supervised protein subclass discovery with simultaneous annotation of functional residues

Benjamin Georgi; Jörg Schultz; Alexander Schliep

Background: The study of functional subfamilies of protein domain families and the identification of the residues which determine substrate specificity is an important question in the analysis of protein domains. One way to address this question is the use of clustering methods for protein sequence data and approaches to predict functional residues based on such clusterings. The locations of putative functional residues in known protein structures provide insights into how different substrate specificities are reflected on the protein structure level. Results: We have developed an extension of the context-specific independence mixture model clustering framework which allows for the integration of experimental data. As these are usually known only for a few proteins, our algorithm implements a partially-supervised learning approach. We discover domain subfamilies and predict functional residues for four protein domain families: phosphatases, pyridoxal dependent decarboxylases, WW and SH3 domains to demonstrate the usefulness of our approach. Conclusion: The partially-supervised clustering revealed biologically meaningful subfamilies even for highly heterogeneous domains and the predicted functional residues provide insights into the basis of the different substrate specificities.


GfKl | 2008

Mixture Model Based Group Inference in Fused Genotype and Phenotype Data

Benjamin Georgi; M. Anne Spence; Pamela Flodman; Alexander Schliep

The analysis of genetic diseases has classically been directed towards establishing direct links between cause, a genetic variation, and effect, the observable deviation of phenotype. For complex diseases which are caused by multiple factors and which show a wide spread of variations in the phenotypes this is unlikely to succeed. One example is the Attention Deficit Hyperactivity Disorder (ADHD), where it is expected that phenotypic variations will be caused by the overlapping effects of several distinct genetic mechanisms. The classical statistical models to cope with overlapping subgroups are mixture models, essentially convex combinations of density functions, which allow inference of descriptive models from data as well as the deduction of groups. An extension of conventional mixtures with attractive properties for clustering is the context-specific independence (CSI) framework. CSI allows for an automatic adaption of model complexity to avoid overfitting and yields a highly descriptive model.
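
The description of mixture models as convex combinations of density functions, and their use for deducing groups, can be illustrated with a small sketch. It assumes a one-dimensional, two-component Gaussian mixture with made-up weights and parameters. It is not the CSI framework and not the PyMix API, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, components):
    """Convex combination of component densities; weights are non-negative and sum to 1."""
    return sum(w * normal_pdf(x, m, s) for w, (m, s) in zip(weights, components))


def posteriors(x, weights, components):
    """Posterior probability of each component (group) given the observation x."""
    joint = [w * normal_pdf(x, m, s) for w, (m, s) in zip(weights, components)]
    total = sum(joint)
    return [j / total for j in joint]


# Assumed toy parameters: two overlapping subgroups with (mean, sd) each.
weights = [0.6, 0.4]
components = [(0.0, 1.0), (4.0, 1.0)]

near_first = posteriors(0.0, weights, components)
midpoint = posteriors(2.0, weights, components)
```

An observation near one component's mean is assigned to that group with high posterior probability, while an observation equidistant from both means (with equal variances) falls back to the mixing weights. In practice the weights and component parameters would be estimated from data rather than fixed by hand.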


European Conference on Principles of Data Mining and Knowledge Discovery | 2007

Context-Specific Independence Mixture Modelling for Protein Families

Benjamin Georgi; Jörg Schultz; Alexander Schliep

Protein families can be divided into subgroups with functional differences. The analysis of these subgroups and the determination of which residues convey substrate specificity is a central question in the study of these families. We present a clustering procedure using the context-specific independence mixture framework with a Dirichlet mixture prior for simultaneous inference of subgroups and prediction of specificity-determining residues based on multiple sequence alignments of protein families. Application of the method to several well-studied families revealed a good clustering performance and ample biological support for the predicted positions. The software we developed to carry out this analysis, PyMix (the Python mixture package), is available from http://www.algorithmics.molgen.mpg.de/pymix.html.


PLOS Biology | 2007

Incomplete and inaccurate vocal imitation after knockdown of FoxP2 in songbird basal ganglia nucleus area X

Sebastian Haesler; Christelle Rochefort; Benjamin Georgi; Pawel Licznerski; Pavel Osten; Constance Scharff

Collaboration


Dive into Benjamin Georgi's collaborations.

Top Co-Authors


Maja Bucan

University of Pennsylvania


Pavel Osten

Cold Spring Harbor Laboratory


Rachel L. Kember

University of Pennsylvania
