Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where István Miklós is active.

Publication


Featured researches published by István Miklós.


Ecology | 2004

RANDOMIZATION OF PRESENCE–ABSENCE MATRICES: COMMENTS AND NEW ALGORITHMS

István Miklós; János Podani

Randomization of presence–absence data matrices with fixed row and column totals is an important tool in ecological research wherever the significance of data-based statistics (e.g., species association measures) is to be evaluated. In the current literature of numerical ecology, however, there has been no algorithm that randomizes moderately large matrices in short time such that equidistribution of results is guaranteed. We show that a simple modification of the swap algorithm, called here the “trial-swap method,” satisfies the requirement for equidistribution. Since this is relatively slow, we suggest two fast algorithms that, combined with the trial-swap method, produce all possible results with equal chance. The three procedures are illustrated using actual examples taken from bird biogeography and vegetation ecology.


PLOS Genetics | 2008

Dynamics of Genome Rearrangement in Bacterial Populations

Aaron E. Darling; István Miklós; Mark A. Ragan

Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”—inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes.


Molecular Biology and Evolution | 2009

Streamlining and Large Ancestral Genomes in Archaea Inferred with a Phylogenetic Birth-and-Death Model

Miklos Csuros; István Miklós

Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire.


Nucleic Acids Research | 2005

Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs

Irmtraud M. Meyer; István Miklós

Owing to the degeneracy of the genetic code, protein-coding regions of mRNA sequences can harbour more than only amino acid information. We search the mRNA sequences of 11 human protein-coding genes for evolutionarily conserved secondary structure elements using RNA-Decoder, a comparative secondary structure prediction program that is capable of explicitly taking the known protein-coding context of the mRNA sequences into account. We detect well-defined, conserved RNA secondary structure elements in the coding regions of the mRNA sequences and show that base-paired codons strongly correlate with sparse codons. We also investigate the role of repetitive elements in the formation of secondary structure and explain the use of alternate start codons in the caveolin-1 gene by a conserved secondary structure element overlapping the nominal start codon. We discuss the functional roles of our novel findings in regulating the gene expression on mRNA level. We also investigate the role of secondary structure on the correct splicing of the human CFTR gene. We study the wild-type version of the pre-mRNA as well as 29 variants with synonymous mutations in exon 12. By comparing our predicted secondary structures to the experimentally determined splicing efficiencies, we find with weak statistical significance that pre-mRNAs with high-splicing efficiencies have different predicted secondary structures than pre-mRNAs with low-splicing efficiencies.


Bioinformatics | 2008

StatAlign: An extendable software package for joint Bayesian estimation of alignments and evolutionary trees

Ádám Novák; István Miklós; Rune B. Lyngsø; Jotun Hein

MOTIVATION Bayesian analysis is one of the most popular methods in phylogenetic inference. The most commonly used methods fix a single multiple alignment and consider only substitutions as phylogenetically informative mutations, though alignments and phylogenies should be inferred jointly as insertions and deletions also carry informative signals. Methods addressing these issues have been developed only recently and there has not been so far a user-friendly program with a graphical interface that implements these methods. RESULTS We have developed an extendable software package in the Java programming language that samples from the joint posterior distribution of phylogenies, alignments and evolutionary parameters by applying the Markov chain Monte Carlo method. The package also offers tools for efficient on-the-fly summarization of the results. It has a graphical interface to configure, start and supervise the analysis, to track the status of the Markov chain and to save the results. The background model for insertions and deletions can be combined with any substitution model. It is easy to add new substitution models to the software package as plugins. The samples from the Markov chain can be summarized in several ways, and new postprocessing plugins may also be installed.


PLOS Computational Biology | 2007

SimulFold: Simultaneously Inferring RNA Structures Including Pseudoknots, Alignments, and Trees Using a Bayesian MCMC Framework

Irmtraud M. Meyer; István Miklós

Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFolds potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.


Ecology | 2002

RESEMBLANCE COEFFICIENTS AND THE HORSESHOE EFFECT IN PRINCIPAL COORDINATES ANALYSIS

János Podani; István Miklós

Although principal coordinates analysis is one of the most widely used or- dination methods in ecology, no study had been undertaken as yet on the combined effect of gradient type and resemblance coefficient on the results. We examine the performance of principal coordinates analysis with different choices of the resemblance function and different types of a single underlying gradient. Whereas unimodal species response to long gradients always leads to horseshoe (or arch)-shaped configurations in the first two dimensions, the converse is not true; curvilinear arrangements cannot generally be explained by the Gaussian model. Several resemblance coefficients widely used in ecology produce paradoxical arches from perfectly linear data. Species richness changes alone may also lead to a horseshoe for even more distance functions, with the noted exception of Manhattan metric. The appearance of arches is a mathematical necessity in these cases; true artifacts are introduced only if distances are treated inappropriately before eigenanalysis. Examples illustrate that similar configurations (curves and even circles) may arise from very different data structures; there- fore the shape of the point scatter is insufficient by itself to identify background ecological phenomena. The horseshoe effect may be diminished and eigenvalue extraction may be made more efficient if input measures are raised to high powers; but this operation is recommended only in combination with standard analyses, as part of a comparative approach. We derive a new distance function, implying standardization by species totals, from the chi-square dis- tance. We found that this function improves gradient recovery when there is unimodal species response and some species have their optima outside the range of study.


research in computational molecular biology | 2006

A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer

Miklós Csűrös; István Miklós

We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in O(N+hM2) time and O(N+M2) space, where N is the number of organisms, h is the height of the phylogeny, and M is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database.


Journal of Computational Biology | 2003

An Efficient Algorithm for Statistical Multiple Alignment on Arbitrary Phylogenetic Trees

Gerton Lunter; István Miklós; Yun S. Song; Jotun Hein

We present an efficient algorithm for statistical multiple alignment based on the TKF91 model of Thorne, Kishino, and Felsenstein (1991) on an arbitrary k-leaved phylogenetic tree. The existing algorithms use a hidden Markov model approach, which requires at least O( radical 5(k)) states and leads to a time complexity of O(5(k)L(k)), where L is the geometric mean sequence length. Using a combinatorial technique reminiscent of inclusion/exclusion, we are able to sum away the states, thus improving the time complexity to O(2(k)L(k)) and considerably reducing memory requirements. This makes statistical multiple alignment under the TKF91 model a definite practical possibility in the case of a phylogenetic tree with a modest number of leaves.


Journal of Physics A | 2009

Degree-based graph construction

Hyunju Kim; Zoltán Toroczkai; P. Erdős; István Miklós; László A. Székely

Degree-based graph construction is a ubiquitous problem in network modelling (Newman et al 2006 The Structure and Dynamics of Networks (Princeton Studies in Complexity) (Princeton, NJ: Princeton University Press), Boccaletti et al 2006 Phys. Rep. 424 175), ranging from social sciences to chemical compounds and biochemical reaction networks in the cell. This problem includes existence, enumeration, exhaustive construction and sampling questions with aspects that are still open today. Here we give necessary and sufficient conditions for a sequence of nonnegative integers to be realized as a simple graphs degree sequence, such that a given (but otherwise arbitrary) set of connections from an arbitrarily given node is avoided. We then use this result to present a swap-free algorithm that builds all simple graphs realizing a given degree sequence. In a wider context, we show that our result provides a greedy construction method to build all the f-factor subgraphs (Tutte 1952 Can. J. Math. 4 314) embedded within Kn Sk, where Kn is the complete graph and Sk is a star graph centred on one of the nodes.

Collaboration


Dive into the István Miklós's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar

P. Erdős

Alfréd Rényi Institute of Mathematics

View shared research outputs
Top Co-Authors

Avatar

Gerton Lunter

Wellcome Trust Centre for Human Genetics

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

János Podani

Eötvös Loránd University

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Irmtraud M Meyer

European Bioinformatics Institute

View shared research outputs
Researchain Logo
Decentralizing Knowledge