Hedi Hegyi
Hungarian Academy of Sciences
Publication
Featured research published by Hedi Hegyi.
Emerging Infectious Diseases | 2007
Gustavo Palacios; Phuong-Lan Quan; Omar J. Jabado; Sean Conlan; David L. Hirschberg; Yang Liu; Junhui Zhai; Neil Renwick; Jeffrey Hui; Hedi Hegyi; Allen Grolla; James E. Strong; Jonathan S. Towner; Thomas W. Geisbert; Peter B. Jahrling; Cornelia Büchen-Osmond; Heinz Ellerbrok; María Paz Sánchez-Seco; Yves A. Lussier; Pierre Formenty; Stuart T. Nichol; Heinz Feldmann; Thomas Briese; W. Ian Lipkin
To facilitate rapid, unbiased, differential diagnosis of infectious diseases, we designed GreeneChipPm, a panmicrobial microarray comprising 29,455 sixty-mer oligonucleotide probes for vertebrate viruses, bacteria, fungi, and parasites. Methods for nucleic acid preparation, random primed PCR amplification, and labeling were optimized to allow the sensitivity required for application with nucleic acid extracted from clinical materials and cultured isolates. Analysis of nasopharyngeal aspirates, blood, urine, and tissue from persons with various infectious diseases confirmed the presence of viruses and bacteria identified by other methods, and implicated Plasmodium falciparum in an unexplained fatal case of hemorrhagic feverlike disease during the Marburg hemorrhagic fever outbreak in Angola in 2004–2005.
Proceedings of the National Academy of Sciences of the United States of America | 2007
Michael L. Tress; Pier Luigi Martelli; Adam Frankish; Gabrielle A. Reeves; Jan Jaap Wesselink; Corin Yeats; Páll Ísólfur Ólason; Mario Albrecht; Hedi Hegyi; Alejandro Giorgetti; Domenico Raimondo; Julien Lagarde; Roman A. Laskowski; Gonzalo López; Michael I. Sadowski; James D. Watson; Piero Fariselli; Ivan Rossi; Alinda Nagy; Wang Kai; Zenia M Størling; Massimiliano Orsini; Yassen Assenov; Hagen Blankenburg; Carola Huthmacher; Fidel Ramírez; Andreas Schlicker; P. D. Jones; Samuel Kerrien; Sandra Orchard
Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.
Genome Biology | 2011
Eva Schad; Peter Tompa; Hedi Hegyi
Background: Sequencing the genomes of the first few eukaryotes created the impression that gene number shows no correlation with organism complexity, often referred to as the G-value paradox. Several attempts have previously been made to resolve this paradox, citing multifunctionality of proteins, alternative splicing, microRNAs or non-coding DNA. As intrinsic protein disorder has been linked with complex responses to environmental stimuli and communication between cells, an additional possibility is that structural disorder may effectively increase the complexity of species. Results: We revisited the G-value paradox by analyzing many new proteomes whose complexity measured with their number of distinct cell types is known. We found that complexity and proteome size measured by the total number of amino acids correlate significantly and have a power function relationship. We systematically analyzed numerous other features in relation to complexity in several organisms and tissues and found: the fraction of protein structural disorder increases significantly between prokaryotes and eukaryotes but does not further increase over the course of evolution; the number of predicted binding sites in disordered regions in a proteome increases with complexity; the fraction of protein disorder, predicted binding sites, alternative splicing and protein-protein interactions all increase with the complexity of human tissues. Conclusions: We conclude that complexity is a multi-parametric trait, determined by interaction potential, alternative splicing capacity, tissue-specific protein disorder and, above all, proteome size. The G-value paradox is only apparent when plants are grouped with metazoans, as they have a different relationship between complexity and proteome size.
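A power function relationship of the kind described in this abstract, complexity = a × size^b, is commonly estimated by a straight-line fit in log-log space. The following is a minimal sketch of that general technique, not the authors' procedure; the function name `fit_power_law` and the input values are hypothetical, and the numbers are synthetic rather than taken from the paper.

```python
import math


def fit_power_law(sizes, complexities):
    """Least-squares fit of complexity = a * size**b, done in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in complexities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept of the line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic values generated from a = 2 and b = 0.5 (assumption, not paper data).
sizes = [1e6, 4e6, 1.6e7, 6.4e7]
complexities = [2.0 * s ** 0.5 for s in sizes]
a, b = fit_power_law(sizes, complexities)
```

With real proteome sizes and cell-type counts, the fitted exponent `b` would summarize how quickly complexity grows with proteome size.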
BMC Structural Biology | 2007
Hedi Hegyi; Eva Schad; Peter Tompa
Background: The idea that the assembly of protein complexes is linked with protein disorder has been inferred from only a few large complexes, such as the viral capsid or the bacterial flagellar system. The relationship, which suggests that larger complexes have more disorder, has never been systematically tested. Recent high-throughput analyses of protein-protein interactions and protein complexes in the cell have generated data that make it possible to address this issue by bioinformatic means. Results: In this work we predicted structural disorder for both E. coli and S. cerevisiae, and correlated it with the size of complexes. Using IUPred to predict the disorder for each complex, we found a statistically significant correlation between disorder and the number of proteins assembled into complexes. The distribution of disorder has a median value of 10% in yeast for complexes of 2–4 components (6% in E. coli), but 18% for complexes in the size range of 11–100 proteins (12% in E. coli). The level of disorder as assessed for regions longer than 30 consecutive disordered residues shows an even stronger division between small and large complexes (median values about 4% for complexes of 2–4 components, but 12% for complexes of 11–100 components in yeast). The predicted correlation is also supported by experimental evidence, by observing the structural disorder in protein components of complexes that can be found in the Protein Data Bank (median values 1.5% for complexes of 2–4 components, and 9.6% for complexes of 11–100 components in yeast). Further analysis shows that this correlation is not directly linked with the increased disorder in hub proteins, but reflects a genuine systemic property of the proteins that make up the complexes. Conclusion: Overall, it is suggested and discussed that the assembly of protein-protein complexes is enabled and probably promoted by protein disorder.
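The comparison of median disorder across complex-size ranges reported in this abstract can be illustrated with a short sketch. It assumes that a disordered fraction has already been computed for each complex (for example from a disorder predictor's output); the function names, the intermediate 5–10 bin, and the toy values are hypothetical and are not data from the paper.

```python
from statistics import median


def size_bin(n_proteins):
    """Map a complex size to a bin label; the 5-10 bin is an assumption."""
    if 2 <= n_proteins <= 4:
        return "2-4"
    if 5 <= n_proteins <= 10:
        return "5-10"
    if 11 <= n_proteins <= 100:
        return "11-100"
    return None  # sizes outside the studied ranges are ignored


def median_disorder_by_bin(complexes):
    """complexes: iterable of (number_of_proteins, disordered_fraction) pairs."""
    grouped = {}
    for n_proteins, fraction in complexes:
        label = size_bin(n_proteins)
        if label is not None:
            grouped.setdefault(label, []).append(fraction)
    return {label: median(values) for label, values in grouped.items()}


# Toy input for illustration only.
toy_complexes = [(2, 0.08), (3, 0.10), (4, 0.12), (12, 0.15), (40, 0.18), (90, 0.25)]
medians = median_disorder_by_bin(toy_complexes)
```

Here `medians` maps each populated bin label to the median disordered fraction of the complexes in that bin.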
PLOS Computational Biology | 2009
Hedi Hegyi; László Buday; Peter Tompa
Chromosomal translocations, which often generate chimeric proteins by fusing segments of two distinct genes, represent the single major genetic aberration leading to cancer. We suggest that the unifying theme of these events is a high level of intrinsic structural disorder, enabling fusion proteins to evade cellular surveillance mechanisms that eliminate misfolded proteins. Predictions in 406 translocation-related human proteins show that they are significantly enriched in disorder (43.3% vs. 20.7% in all human proteins), they have fewer Pfam domains, and their translocation breakpoints tend to avoid domain splitting. The vicinity of the breakpoint is significantly more disordered than the rest of these already highly disordered fusion proteins. In the unlikely event of domain splitting in fusion it usually spares much of the domain or splits at locations where the newly exposed hydrophobic surface area approximates that of an intact domain. The mechanisms of action of fusion proteins suggest that in most cases their structural disorder is also essential to the acquired oncogenic function, enabling the long-range structural communication of remote binding and/or catalytic elements. In this respect, there are three major mechanisms that contribute to generating an oncogenic signal: (i) a phosphorylation site and a tyrosine-kinase domain are fused, and structural disorder of the intervening region enables intramolecular phosphorylation (e.g., BCR-ABL); (ii) a dimerisation domain fuses with a tyrosine kinase domain and disorder enables the two subunits within the homodimer to engage in permanent intermolecular phosphorylations (e.g., TFG-ALK); (iii) the fusion of a DNA-binding element to a transactivator domain results in an aberrant transcription factor that causes severe misregulation of transcription (e.g. EWS-ATF). Our findings also suggest novel strategies of intervention against the ensuing neoplastic transformations.
BMC Bioinformatics | 2008
Alinda Nagy; Hedi Hegyi; Krisztina Farkas; Hedvig Tordai; Evelin Kozma; László Bányai; László Patthy
Background: Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes. Results: Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON-predicted entries. Conclusion: MisPred works efficiently in identifying errors in predictions generated by the most reliable gene prediction tools such as the EnsEMBL and NCBI's GNOMON pipelines and also guides the correction of errors. We suggest that application of the MisPred approach will significantly improve the quality of gene predictions and the associated databases.
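Two of the consistency rules listed in this abstract, (ii) and (iii), lend themselves to a small illustration. The sketch below is not the MisPred implementation; the `ProteinEntry` record, its field names, and the `flag_conflicts` function are hypothetical, and it assumes that domain localization classes and transmembrane segment counts have already been obtained from upstream annotation tools.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinEntry:
    """Hypothetical record holding pre-computed annotations for one sequence."""
    name: str
    domain_locations: set = field(default_factory=set)  # e.g. {"extracellular"}
    transmembrane_segments: int = 0


def flag_conflicts(entry):
    """Return descriptions of the rule (ii) and rule (iii) conflicts found."""
    flags = []
    locations = entry.domain_locations
    # Rule (ii): extracellular and cytoplasmic domains need a membrane-spanning segment.
    if {"extracellular", "cytoplasmic"} <= locations and entry.transmembrane_segments == 0:
        flags.append("extracellular and cytoplasmic domains without transmembrane segment")
    # Rule (iii): extracellular and nuclear domains should not occur together.
    if {"extracellular", "nuclear"} <= locations:
        flags.append("extracellular and nuclear domains co-occur")
    return flags


# Toy entries for illustration only.
suspect = ProteinEntry("toy_1", {"extracellular", "cytoplasmic"}, 0)
plausible = ProteinEntry("toy_2", {"extracellular", "cytoplasmic"}, 1)
chimera_like = ProteinEntry("toy_3", {"extracellular", "nuclear"}, 0)
```

Calling `flag_conflicts(suspect)` returns one flag, while `flag_conflicts(plausible)` returns an empty list because the entry has a transmembrane segment.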
Nucleic Acids Research | 2011
Hedi Hegyi; Lajos Kalmár; Tamás Horváth; Peter Tompa
According to current estimates, ∼95% of multi-exonic human protein-coding genes undergo alternative splicing (AS). However, among 4000 human proteins in the PDB, only 14 have structures of at least two alternative isoforms. Surveying these structural isoforms revealed that the maximum insertion accommodated by an isoform of a fully ordered protein domain was 5 amino acids; other instances of domain changes involved intrinsic structural disorder. After collecting 505 minor isoforms of human proteins with evidence for their existence, we analyzed their length, protein disorder and exposed hydrophobic surface. We found that strict rules, aimed at preserving the integrity of globular domains, govern the selection of alternative splice variants: alternative splice sites (i) tend to avoid globular domains or (ii) affect them only marginally or (iii) tend to coincide with a location where the exposed hydrophobic surface is minimal or (iv) the protein is disordered. We also observed an inverse correlation between the domain fraction lost and the full length of the minor isoform containing the domain, possibly indicating a buffering effect for the isoform protein counteracting the domain truncation effect. These observations provide the basis for a prediction method (currently under development) to predict the viability of splice variants.
PLOS Computational Biology | 2008
Hedi Hegyi; Peter Tompa
Intrinsically disordered/unstructured proteins (IDPs) are extremely sensitive to proteolysis in vitro, but show no enhanced degradation rates in vivo. Their existence and functioning may be explained if IDPs are preferentially associated with chaperones in the cell, which may offer protection against degradation by proteases. To test this inference, we took pairwise interaction data from high-throughput interaction studies and analyzed them to see whether predicted disorder correlates with the tendency of proteins to bind chaperones. Our major finding is that disorder predicted by the IUPred algorithm actually shows negative correlation with chaperone binding in E. coli, S. cerevisiae, and metazoan species. Since predicted disorder positively correlates with the tendency of partner binding in the interactome, the difference between the disorder of chaperone-binding and non-binding proteins is even more pronounced if normalized to their overall tendency to be involved in pairwise protein–protein interactions. We argue that chaperone binding is primarily required for folding of globular proteins, as reflected in an increased preference for chaperones of proteins in which at least one Pfam domain exists. In terms of the functional consequences of chaperone binding of mostly disordered proteins, we suggest that its primary reason is not the assistance of folding, but promotion of assembly with partners. In support of this conclusion, we show that IDPs that bind chaperones also tend to bind other proteins.
Proteins | 2002
Hedi Hegyi; Jimmy Lin; Dov Greenbaum; Mark Gerstein
We conducted a structural genomics analysis of the folds and structural superfamilies in the first 20 completely sequenced genomes by focusing on the patterns of fold usage and trying to identify structural characteristics of typical and atypical folds. We assigned folds to sequences using PSI‐blast, run with a systematic protocol to reduce the amount of computational overhead. On average, folds could be assigned to about a fourth of the ORFs in the genomes and about a fifth of the amino acids in the proteomes. More than 80% of all the folds in the SCOP structural classification were identified in one of the 20 organisms, with worm and E. coli having the largest number of distinct folds. Folds are particularly effective at comprehensively measuring levels of gene duplication, because they group together even very remote homologues. Using folds, we find the average level of duplication varies depending on the complexity of the organism, ranging from 2.4 in M. genitalium to 32 for the worm, values significantly higher than those observed based purely on sequence similarity. We rank the common folds in the 20 organisms, finding that the top three are the P‐loop NTP hydrolase, the ferredoxin fold, and the TIM‐barrel, and discuss in detail the many factors that affect and bias these rankings. We also identify atypical folds that are “unique” to one of the organisms in our study and compare the characteristics of these folds with the most common ones. We find that common folds tend to be more multifunctional and associated with more regular, “symmetrical” structures than the unique ones. In addition, many of the unique folds are associated with proteins involved in cell defense (e.g., toxins). We analyze specific patterns of fold occurrence in the genomes by associating some of them with instances of horizontal transfer and others with gene loss. In particular, we find three possible examples of transfer between archaea and bacteria and six between eukarya and bacteria. We make available our detailed results at http://genecensus.org/20. Proteins 2002;47:126–141.
Bioinformatics | 2008
Gábor Tusnády; Lajos Kalmár; Hedi Hegyi; Peter Tompa; István Simon
Summary: The TOPDOM database is a collection of domains and sequence motifs located consistently on the same side of the membrane in α-helical transmembrane proteins. The database was created by scanning well-annotated transmembrane protein sequences in the UniProt database by specific domain or motif detecting algorithms. The identified domains or motifs were added to the database if they were uniformly annotated on the same side of the membrane of the various proteins in the UniProt database. The information about the location of the collected domains and motifs can be incorporated into constrained topology prediction algorithms, like HMMTOP, increasing the prediction accuracy. Availability: The TOPDOM database and the constrained HMMTOP prediction server are available on the page http://topdom.enzim.hu