Georgy P. Karev
National Institutes of Health
Featured research published by Georgy P. Karev.
Nature | 2002
Eugene V. Koonin; Yuri I. Wolf; Georgy P. Karev
Despite the practically unlimited number of possible protein sequences, the number of basic shapes in which proteins fold seems not only to be finite, but also to be relatively small, with probably no more than 10,000 folds in existence. Moreover, the distribution of proteins among these folds is highly non-homogeneous — some folds and superfamilies are extremely abundant, but most are rare. Protein folds and families encoded in diverse genomes show similar size distributions with notable mathematical properties, which also extend to the number of connections between domains in multidomain proteins. All these distributions follow asymptotic power laws, such as have been identified in a wide variety of biological and physical systems, and which are typically associated with scale-free networks. These findings suggest that genome evolution is driven by extremely general mechanisms based on the preferential attachment principle.
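The preferential attachment principle can be illustrated with a toy simulation. This is a classical Simon-type growth process, not the model used in the paper. At each step a new single-domain family appears with probability `innovation_prob`, an assumed parameter. Otherwise a randomly chosen existing domain is duplicated, so larger families are proportionally more likely to grow. In Simon's classical analysis, such a process gives a family-size distribution whose power-law tail has an exponent of about 1 + 1/(1 - innovation_prob).

```python
import random
from collections import Counter


def simulate_family_sizes(steps, innovation_prob, seed=0):
    """Grow a toy proteome one domain at a time (Simon-type process).

    With probability innovation_prob a new single-domain family is created;
    otherwise a uniformly chosen existing domain is duplicated, so a family is
    picked with probability proportional to its size (preferential attachment).
    """
    rng = random.Random(seed)
    sizes = [1]   # sizes[f] = number of domains in family f
    owner = [0]   # owner[d] = family that domain d belongs to
    for _ in range(steps):
        if rng.random() < innovation_prob:
            sizes.append(1)                 # innovation: new singleton family
            owner.append(len(sizes) - 1)
        else:
            fam = owner[rng.randrange(len(owner))]  # duplication of a random domain
            sizes[fam] += 1
            owner.append(fam)
    return sizes


sizes = simulate_family_sizes(steps=50_000, innovation_prob=0.1)
freq = Counter(sizes)  # family size -> number of families of that size
for s in (1, 2, 4, 8, 16, 32, 64):
    print(f"size {s:>2}: {freq.get(s, 0)} families")
```

The function and parameter names are hypothetical. Plotting `freq` on log-log axes should show the roughly straight tail characteristic of a power law.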
BMC Evolutionary Biology | 2002
Georgy P. Karev; Yuri I. Wolf; Andrey Rzhetsky; Faina S. Berezovskaya; Eugene V. Koonin
Background: Power-law distributions appear in numerous biological, physical and other contexts that seem fundamentally different. In biology, power laws have been claimed to describe the distributions of the connections of enzymes and metabolites in metabolic networks, the number of interaction partners of a given protein, the number of members in paralogous families, and other quantities. In network analysis, power laws imply evolution of the network with preferential attachment, i.e. a greater likelihood of nodes being added to pre-existing hubs. Exploring different types of evolutionary models to determine which of them lead to power-law distributions has the potential to reveal non-trivial aspects of genome evolution. Results: A simple model of evolution of the domain composition of proteomes was developed, with the following elementary processes: i) domain birth (duplication with divergence), ii) death (inactivation and/or deletion), and iii) innovation (emergence from non-coding or non-globular sequences or acquisition via horizontal gene transfer). This formalism can be described as a birth, death and innovation model (BDIM). The formulas for the equilibrium frequencies of domain families of different sizes and for the total number of families at equilibrium are derived for a general BDIM. All asymptotics of the equilibrium frequencies of domain families that are possible for this type of model are found, and their dependence on model parameters is investigated. It is proved that power-law asymptotics appear if, and only if, the model is balanced, i.e. domain duplication and deletion rates are asymptotically equal up to the second order. It is further proved that a power asymptotic with a degree (exponent) other than -1 can appear only if the hypothesis that duplication/deletion rates are independent of the size of a domain family is rejected.
Specific cases of BDIMs, namely simple, linear, polynomial and rational models, are considered in detail, and the distributions of the equilibrium frequencies of domain families of different sizes are determined for each case. We apply the BDIM formalism to the analysis of domain family size distributions in prokaryotic and eukaryotic proteomes and show an excellent fit between these empirical data and a particular form of the model, the second-order balanced linear BDIM. Calculation of the parameters of these models suggests surprisingly high innovation rates, comparable to the total domain birth (duplication) and elimination rates, particularly for prokaryotic genomes. Conclusions: We show that a straightforward model of genome evolution, which does not explicitly include selection, is sufficient to explain the observed distributions of domain family sizes, in which power laws appear as asymptotics. However, for the model to be compatible with the data, there has to be a precise balance between domain birth, death and innovation rates, and this balance is likely to be maintained by selection. The developed approach is aimed at a mathematical description of the evolution of the domain composition of proteomes, but a simple reformulation could be applied to models of other evolving networks with preferential attachment.
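The equilibrium frequencies can be made concrete for a linear BDIM. The following assumptions are ours and do not follow the paper's exact notation:

- A family of size i gains a member at rate λ(i + a) and loses one at rate δ(i + b).
- New singletons arise by innovation at rate ν.
- Families cannot grow beyond a maximal size N.

At equilibrium the net flux between neighbouring size classes then vanishes. This gives f_1 = ν/δ_1 and f_{i+1} = f_i λ_i/δ_{i+1}. In the balanced case λ = δ, the product decays like i^(a - b - 1), which is a power law.

```python
import math


def linear_bdim_equilibrium(nu, lam, delta, a, b, n_max):
    """Equilibrium expected numbers of families of sizes 1..n_max.

    Assumed linear BDIM: birth rate lam*(i + a), death rate delta*(i + b),
    innovation of singletons at rate nu, no growth beyond n_max. Zero net flux
    between size classes gives f_1 = nu/delta_1, f_{i+1} = f_i*lambda_i/delta_{i+1}.
    """
    f = [nu / (delta * (1 + b))]
    for i in range(1, n_max):
        f.append(f[-1] * lam * (i + a) / (delta * (i + 1 + b)))
    return f


f = linear_bdim_equilibrium(nu=1.0, lam=1.0, delta=1.0, a=0.5, b=1.0, n_max=10_000)

# Log-log slope between two large sizes; for lam == delta it should approach a - b - 1.
i1, i2 = 1_000, 5_000
slope = (math.log(f[i2 - 1]) - math.log(f[i1 - 1])) / (math.log(i2) - math.log(i1))
print(f"estimated exponent: {slope:.4f}")
```

The function name and the illustrative values a = 0.5 and b = 1 are hypothetical. With these values the estimated exponent should lie close to a - b - 1 = -1.5.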
Proceedings of the National Academy of Sciences of the United States of America | 2009
Yuri I. Wolf; Pavel S. Novichkov; Georgy P. Karev; Eugene V. Koonin; David J. Lipman
The evolutionary rates of protein-coding genes in an organism span approximately 3 orders of magnitude and show a universal, approximately log-normal distribution in a broad variety of species from prokaryotes to mammals. This universal distribution implies a steady-state process, with identical distributions of evolutionary rates among genes that are gained and genes that are lost. A mathematical model of such a process is developed under the single assumption of the constancy of the distributions of the propensities for gene loss (PGL). This model predicts that genes of different ages, that is, genes with homologs detectable at different phylogenetic depths, substantially differ in those variables that correlate with PGL. We computationally partition protein-coding genes from humans, flies, and the fungus Aspergillus into age classes, and show that genes of different ages retain the universal log-normal distribution of evolutionary rates, with a shift toward higher rates in "younger" classes but also with a substantial overlap. The only exception involves human primate-specific genes, which show a heavy tail of rapidly evolving genes, probably owing to gene annotation artifacts. As predicted, the gene age classes differ in characteristics correlated with PGL. Compared with "young" genes (e.g., mammal-specific human ones), "old" genes (e.g., eukaryote-specific ones) are, on average, longer, expressed at a higher level, have a higher intron density, evolve more slowly on the short time scale, and are subject to stronger purifying selection. Thus, genome evolution fits a simple model with approximately uniform rates of gene gain and loss, without major bursts of genomic innovation.
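A small Monte Carlo sketch illustrates why gene age should correlate with PGL. The assumptions are ours, not the paper's model. PGL values are drawn from a log-normal distribution, and a gene with PGL r survives to age t with probability exp(-rt). Genes that persist to greater ages are therefore enriched for low PGL, so the mean PGL among survivors falls with age.

```python
import math
import random


def mean_pgl_by_age(ages, mu=0.0, sigma=1.0, n=100_000, seed=1):
    """Mean propensity for gene loss (PGL) among genes that survive to each age.

    Illustrative assumptions (not the paper's model): PGL values at gene birth
    are log-normal with parameters mu and sigma, and a gene with PGL r survives
    to age t with probability exp(-r * t).
    """
    rng = random.Random(seed)
    pgl = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    means = []
    for t in ages:
        weights = [math.exp(-r * t) for r in pgl]  # survival probabilities
        means.append(sum(w * r for w, r in zip(weights, pgl)) / sum(weights))
    return means


ages = [0.0, 0.5, 1.0, 2.0, 4.0]
means = mean_pgl_by_age(ages)
for t, m in zip(ages, means):
    print(f"age {t}: mean PGL of surviving genes = {m:.3f}")
```

The derivative of this weighted mean with respect to age equals minus the weighted variance of PGL. The printed means therefore decrease monotonically for any sample, not just on average.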
Biology Direct | 2006
Artem S. Novozhilov; Faina S. Berezovskaya; Eugene V. Koonin; Georgy P. Karev
Background: Oncolytic viruses that specifically target tumor cells are promising anti-cancer therapeutic agents. The interaction between an oncolytic virus and tumor cells is amenable to mathematical modeling using adaptations of techniques employed previously for modeling other types of virus-cell interaction. Results: A complete parametric analysis of the dynamic regimes of a conceptual model of anti-tumor virus therapy is presented. The role and limitations of mass-action kinetics are discussed. A functional response that depends on the ratio of uninfected to infected tumor cells is proposed to describe the spread of the virus infection in the tumor. One of the main mathematical features of ratio-dependent models is that the origin is a complicated equilibrium point whose characteristics determine the main properties of the model. It is shown that, in a certain region of parameter values, the trajectories of the model form a family of homoclinic orbits to the origin (a so-called elliptic sector). Biologically, this means that both infected and uninfected tumor cells can be eliminated over time, and complete recovery as a result of virus therapy is possible within the framework of deterministic models. Conclusion: Our model, in contrast to previously published models of oncolytic virus-tumor interaction, exhibits all possible outcomes of oncolytic virus infection, i.e., no effect on the tumor, stabilization or reduction of the tumor load, and complete elimination of the tumor. The parameter values that result in tumor elimination, which is obviously the desired outcome, are compatible with some of the available experimental data. Reviewers: This article was reviewed by Mikhail Blagosklonny, David Krakauer, Erik Van Nimwegen, and Ned Wingreen.
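A short numerical sketch shows what a ratio-dependent infection term looks like in practice. The equations are a hypothetical ratio-dependent model of the kind described, with logistic tumor growth. They are not claimed to be the exact system or parameter values of the paper.

The infection term b·x·y/(x + y) equals b·y·(x/y)/(1 + x/y). It therefore depends on the ratio of uninfected (x) to infected (y) cells rather than on their product, as mass action would. It must also be defined separately at the origin, where the ratio is undefined.

```python
import math


def rhs(x, y, r1=0.5, r2=0.3, K=1.0, b=2.0, a=0.4):
    """Right-hand side of an assumed ratio-dependent tumor/virus model.

    x: uninfected tumor cells, y: infected tumor cells (hypothetical parameters).
    The infection term b*x*y/(x + y) is set to 0 at the origin.
    """
    total = x + y
    infection = b * x * y / total if total > 0 else 0.0
    growth = 1.0 - total / K            # logistic limitation of the whole tumor
    dx = r1 * x * growth - infection
    dy = r2 * y * growth + infection - a * y   # a: death rate of infected cells
    return dx, dy


def simulate(x0, y0, t_end=100.0, dt=0.01):
    """Fixed-step fourth-order Runge-Kutta integration; returns the final (x, y)."""
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(x, y)
        k2 = rhs(x + dt / 2 * k1[0], y + dt / 2 * k1[1])
        k3 = rhs(x + dt / 2 * k2[0], y + dt / 2 * k2[1])
        k4 = rhs(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, y


print(simulate(x0=0.5, y0=0.05))
```

Varying `b` and `a` is a simple way to explore different outcomes numerically. A fixed-step integrator is only a rough tool near the origin, whose delicate structure is exactly what the paper's analysis addresses.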
Archive | 2006
Eugene V. Koonin; Yuri I. Wolf; Georgy P. Karev; Eivind Almaas; Albert-László Barabási; K. I. Goh; B. Kahng; Doochul Kim; Sergei Maslov; Kim Sneppen; Andreas Wagner; J. S. Bader; Nikolay V. Dokholyan; Eugene I. Shakhnovich; T. G. Dewey; David J. Galas; Sergey V. Buldyrev; Michael Kamal; S. Rackovsky; Pau Fernández; Ricard V. Solé; Itai Yanai; Erik van Nimwegen
Power Laws in Biological Networks
Graphical Analysis of Biocomplex Networks and Transport Phenomena
Large-Scale Topological Properties of Molecular Networks
The Connectivity of Large Genetic Networks
The Drosophila Protein Interaction Network May Be neither Power-Law nor Scale-Free
Birth and Death Models of Genome Evolution
Scale-Free Evolution
Gene Regulatory Networks
Power Law Correlations in DNA Sequences
Analytical Evolutionary Model for Protein Fold Occurrence in Genomes, Accounting for the Effects of Gene Duplication, Deletion, Acquisition and Selective Pressure
The Protein Universes
The Role of Computation in Complex Regulatory Networks
Neutrality and Selection in the Evolution of Gene Families
Scaling Laws in the Functional Content of Genomes
BMC Evolutionary Biology | 2004
Georgy P. Karev; Yuri I. Wolf; Faina S. Berezovskaya; Eugene V. Koonin
Background: The size distribution of gene families in a broad range of genomes is well approximated by a generalized Pareto function. Evolution of ensembles of gene families can be described with Birth, Death, and Innovation Models (BDIMs). Analysis of the properties of different versions of BDIMs has the potential to reveal important features of genome evolution. Results: In this work, we extend our previous analysis of stochastic BDIMs. In addition to the previously examined rational BDIMs, we introduce potentially more realistic logistic BDIMs, in which birth/death rates are limited for the largest families, and show that their properties are similar to those of models that include no such limitation. We show that the mean time required for the formation of the largest gene families detected in eukaryotic genomes is limited by the mean number of duplications per gene and does not increase indefinitely with the model degree. Instead, this time reaches a minimum value, which corresponds to a non-linear rational BDIM with a degree of approximately 2.7. Even for this BDIM, the mean time of formation of the largest family is orders of magnitude greater than any realistic estimate based on the timescale of life's evolution. We employed the embedding chains technique to estimate the expected number of elementary evolutionary events (gene duplications and deletions) preceding the formation of gene families of the observed size. We found that the mean number of events exceeds the family size by orders of magnitude, suggesting a highly dynamic process of genome evolution. The variance of the time required for the formation of the largest families was found to be extremely large, with a coefficient of variation much greater than 1. This indicates that some gene families might grow much faster than the mean rate, so that the minimal time required for family formation is more relevant for a realistic representation of genome evolution than the mean time.
We determined this minimal time using Monte Carlo simulations of family growth from an ensemble of simultaneously evolving singletons. In these simulations, the time elapsed before the formation of the largest family was much shorter than the estimated mean time and was compatible with the timescale of evolution of eukaryotes. Conclusions: The analysis of stochastic BDIMs presented here shows that non-linear versions of such models can approximate well not only the size distribution of gene families but also the dynamics of their formation during genome evolution. The fact that only higher-degree BDIMs are compatible with the observed characteristics of genome evolution suggests that the growth of gene families is self-accelerating, which might reflect differential selective pressure acting on different genes.
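A Gillespie-style simulation of many singletons evolving in parallel illustrates the difference between the mean and the minimal formation time. For simplicity, this sketch assumes linear duplication and deletion rates (λn and δn for a family of size n), whereas the paper favours non-linear rational BDIMs. The function names and parameter values are illustrative. The minimum over the ensemble is, by construction, no larger than the mean over the families that reach the target, and with many families it can be considerably smaller.

```python
import random


def time_to_reach(target, lam, delta, rng, t_max):
    """Gillespie simulation of one family that starts as a singleton.

    Assumes linear rates: duplication lam * n and deletion delta * n for a
    family of size n. Returns the time at which the family first reaches
    `target` members, or None if it goes extinct or t_max is exceeded.
    """
    n, t = 1, 0.0
    while 0 < n < target:
        t += rng.expovariate((lam + delta) * n)   # waiting time to next event
        if t > t_max:
            return None
        n += 1 if rng.random() < lam / (lam + delta) else -1
    return t if n >= target else None


def formation_times(n_singletons, target, lam, delta, t_max=1e6, seed=0):
    """Formation times of all families (out of n_singletons) that reach target."""
    rng = random.Random(seed)
    times = (time_to_reach(target, lam, delta, rng, t_max) for _ in range(n_singletons))
    return [t for t in times if t is not None]


reached = formation_times(n_singletons=500, target=100, lam=1.0, delta=0.9)
print(f"{len(reached)} of 500 singletons reached size 100")
print(f"minimal time: {min(reached):.1f}, mean time: {sum(reached) / len(reached):.1f}")
```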
Biology Direct | 2006
Georgy P. Karev; Artem S. Novozhilov; Eugene V. Koonin
Background: One of the mechanisms that ensure cancer robustness is tumor heterogeneity, and its effects on tumor cell dynamics have to be taken into account when studying cancer progression. There is no unifying theoretical framework in the mathematical modeling of carcinogenesis that would account for parametric heterogeneity. Results: Here we formulate a modeling approach that naturally accounts for inherent cancer cell heterogeneity and illustrate it with a model of interaction between a tumor and an oncolytic virus. We show that several phenomena that are absent in homogeneous models, such as cancer recurrence and tumor dormancy, appear in a heterogeneous setting. We also demonstrate that, within the applied modeling framework, a heterogeneous population of an oncolytic virus must be used to overcome the adverse effect of tumor cell heterogeneity on the outcome of cancer treatment. Heterogeneity in model parameters, such as tumor cell susceptibility to virus infection and the ability of an oncolytic virus to infect tumor cells, can lead to complex, irregular evolution of the tumor. Thus, quasi-chaotic behavior of the tumor-virus system can be caused not only by random perturbations but also by the heterogeneity of the tumor and the virus. Conclusion: The modeling approach described here reveals the importance of tumor cell and virus heterogeneity for the outcome of cancer therapy. It should be straightforward to apply these techniques to mathematical modeling of other types of anticancer therapy. Reviewers: Leonid Hanin (nominated by Arcady Mushegian), Natalia Komarova (nominated by Orly Alter), and David Krakauer.
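A deliberately simplified case shows how heterogeneity alone can produce recurrence. The setup is our assumption, not the tumor-virus model of the paper:

- Tumor cells grow at rate r and are killed at rate s·v under a constant virus load v.
- The susceptibility s is gamma-distributed across cells, with shape k and scale θ.

Each subpopulation then evolves as x(t, s) = x(0, s)·e^((r - sv)t), and integrating over s gives N(t) = e^(rt)·(1 + θvt)^(-k). Replacing s by its mean kθ gives the homogeneous prediction e^((r - kθv)t), which declines forever whenever kθv > r. The heterogeneous total, by contrast, eventually regrows because the most susceptible cells are removed first.

```python
import math


def heterogeneous_tumor(t, r=0.1, v=1.0, k=2.0, theta=1.0):
    """Total tumor size and mean susceptibility under constant virus pressure.

    Illustrative assumptions: growth rate r, killing rate s*v with constant
    virus load v, susceptibility s ~ Gamma(shape k, scale theta) at t = 0,
    initial total size 1. Surviving cells stay gamma-distributed with scale
    theta / (1 + theta*v*t), so their mean susceptibility is k*theta/(1 + theta*v*t).
    """
    total = math.exp(r * t) * (1.0 + theta * v * t) ** (-k)
    mean_s = k * theta / (1.0 + theta * v * t)
    return total, mean_s


for t in (0, 5, 19, 50, 100):
    n, s = heterogeneous_tumor(t)
    print(f"t={t:>3}: N={n:.4f}, mean susceptibility={s:.3f}")
```

With these illustrative values, the total falls to a minimum at t = 19 and then rises again. Meanwhile, the mean susceptibility of the surviving cells keeps declining.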
Ecological Modelling | 2003
Georgy P. Karev
It is shown that several rather different formulas of forest stand self-thinning can be considered as solutions of a single inhomogeneous Malthus extinction model, each with a corresponding initial distribution of the mortality rate of trees. The results suggest the hypothesis that the distribution of the mortality rate is specific to each tree species. The developed theory suggests new methods to improve existing self-thinning formulas and to derive new ones using appropriate distributions of the Malthusian parameter.
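As we read it here, each tree in an inhomogeneous Malthus extinction model dies at its own constant rate m. The surviving fraction of the stand is then N(t)/N(0) = E[exp(-mt)], the Laplace transform of the initial distribution of mortality rates. Different initial distributions therefore yield different thinning curves.

This sketch assumes a gamma distribution, which is an illustrative choice and not one taken from the paper. For a gamma distribution the transform is (1 + θt)^(-k), a curve with a power-law tail. The sketch compares a Monte Carlo estimate with this closed form.

```python
import math
import random


def surviving_fraction_mc(t, k, theta, n=100_000, seed=2):
    """Monte Carlo estimate of N(t)/N(0) = E[exp(-m t)].

    Each tree's mortality rate m is drawn once from a gamma distribution
    (shape k, scale theta) and stays constant, i.e. dx/dt = -m x for that tree.
    """
    rng = random.Random(seed)
    return sum(math.exp(-rng.gammavariate(k, theta) * t) for _ in range(n)) / n


def surviving_fraction_exact(t, k, theta):
    """Laplace transform of the gamma distribution: (1 + theta * t) ** (-k)."""
    return (1.0 + theta * t) ** (-k)


for t in (1, 10, 50, 200):
    mc = surviving_fraction_mc(t, k=1.5, theta=0.02)
    exact = surviving_fraction_exact(t, k=1.5, theta=0.02)
    print(f"t={t:>3}: Monte Carlo {mc:.4f}, closed form {exact:.4f}")
```

Swapping `gammavariate` for a sampler from another distribution gives the corresponding thinning curve directly, in the spirit of the correspondence described above.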
Mathematical Medicine and Biology: A Journal of the IMA | 2011
Georgy P. Karev; Artem S. Novozhilov; Faina S. Berezovskaya
Selection systems and the corresponding replicator equations model the evolution of replicators at a high level of abstraction. In this paper, we apply novel methods of analysis of selection systems to the replicator equations. To be suitable for the suggested algorithm, the interaction matrix of the replicator equation should be transformed; in particular, the standard singular value decomposition allows us to rewrite the replicator equation in a convenient form. The original n-dimensional problem is reduced to the analysis of the asymptotic behaviour of the solutions to the so-called escort system, which in some important cases can be of significantly smaller dimension than the original system. Newton diagram methods are applied to study the asymptotic behaviour of the solutions to the escort system when the interaction matrix has rank 1 or 2. A general replicator equation with an interaction matrix of rank 1 is fully analysed, and conditions are given under which the asymptotic state is a polymorphic equilibrium. As an example of a system with an interaction matrix of rank 2, we consider the problem from Adams & Sornborger (2007, Analysis of a certain class of replicator equations. J. Math. Biol., 54, 357–384), for which we show, for an arbitrary dimension of the system and under some suitable conditions, that generically one globally stable equilibrium exists on the 1-skeleton of the simplex.
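The simplification brought by a low-rank interaction matrix is already visible in the rank-1 case. If A = u v^T, then (Ax)_i = u_i(v·x) and x^T A x = (u·x)(v·x). The replicator equation therefore becomes dx_i/dt = x_i(v·x)(u_i - u·x). This sketch integrates that form numerically; it shows only this elementary reduction, not the escort-system or Newton-diagram analysis of the paper.

When all entries of v are positive, v·x > 0 on the simplex. The dynamics is then a time-rescaled selection on the fixed values u_i, so from an interior starting point the type with the largest u_i takes over. A mixed-sign v can reverse the direction of selection.

```python
def replicator_rank1(u, v, x0, t_end=20.0, dt=0.01):
    """Integrate dx_i/dt = x_i * ((A x)_i - x.A.x) for a rank-1 matrix A = u v^T.

    For A = u v^T the equation reduces to dx_i/dt = x_i * (v.x) * (u_i - u.x),
    so the n-by-n interaction enters only through the scalars u.x and v.x.
    Fixed-step RK4; x0 must lie on the simplex (non-negative, summing to 1).
    """
    def f(x):
        ux = sum(ui * xi for ui, xi in zip(u, x))
        vx = sum(vi * xi for vi, xi in zip(v, x))
        return [xi * vx * (ui - ux) for ui, xi in zip(u, x)]

    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        k1 = f(x)
        k2 = f([xi + dt / 2 * ki for xi, ki in zip(x, k1)])
        k3 = f([xi + dt / 2 * ki for xi, ki in zip(x, k2)])
        k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
        x = [xi + dt / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


x = replicator_rank1(u=[1.0, 2.0, 3.0], v=[1.0, 1.0, 1.0], x0=[1 / 3, 1 / 3, 1 / 3])
print([round(xi, 4) for xi in x])
```

The function name and the example vectors are hypothetical. Because the right-hand side sums to zero on the simplex, the integrator conserves the total frequency up to rounding error.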
Bioinformatics | 2005
Georgy P. Karev; Faina S. Berezovskaya; Eugene V. Koonin
MOTIVATION: In our previous studies, we developed discrete-space birth, death and innovation models (BDIMs) of genome evolution. These models explain the origin of the characteristic Pareto distribution of paralogous gene family sizes in genomes, and model parameters that allow these distributions to evolve within a realistic time frame have been identified. However, extracting the temporal dynamics of genome evolution from the discrete-space BDIM was not technically feasible. We sought to obtain dynamic portraits of the genome evolution process by developing a diffusion approximation of the BDIM. RESULTS: The diffusion version of the BDIM belongs to a class of continuous-state models whose dynamics are described by the Fokker-Planck equation and whose stationary solution can be any specified Pareto function. The diffusion models have time-dependent solutions of a special kind, namely generalized self-similar solutions, which describe the transition from one stationary distribution of the system to another; this makes it possible to examine the temporal dynamics of genome evolution. Analysis of the generalized self-similar solutions of the diffusion BDIM reveals a biphasic curve of genome growth, in which an initial, relatively short, self-accelerating phase is followed by a prolonged phase of slow deceleration. These dynamics were observed both when genome growth started from zero and proceeded via innovation (a potential model of primordial evolution), and when evolution proceeded from one stationary state to another. In biological terms, this regime of evolution can be tentatively interpreted as a punctuated-equilibrium-like phenomenon, whereby evolutionary transitions are accompanied by rapid gene amplification and innovation, followed by slow relaxation to a new stationary state.
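Consider a one-dimensional Fokker-Planck equation ∂p/∂t = -∂(A p)/∂x + ½ ∂²(B p)/∂x² with reflecting (zero-flux) boundaries. Its stationary density is p(x) ∝ exp(∫ 2A/B dx)/B(x). This sketch evaluates that expression numerically. The drift A(x) = μx and diffusion B(x) = σ²x² are illustrative choices, not the coefficients of the diffusion BDIM. They do show how a Pareto-type stationary solution p(x) ∝ x^(2μ/σ² - 2) arises.

```python
import math


def stationary_density(drift, diffusion, x_min, x_max, n=10_000):
    """Zero-flux stationary solution of dp/dt = -(A p)' + (1/2)(B p)''.

    Setting the probability flux A*p - (1/2)*(B*p)' to zero gives
    p(x) = C / B(x) * exp(integral of 2*A/B), computed here with the
    trapezoidal rule on a uniform grid and normalised to integrate to 1.
    """
    h = (x_max - x_min) / n
    xs = [x_min + i * h for i in range(n + 1)]
    g = [2.0 * drift(x) / diffusion(x) for x in xs]
    integral, log_p = 0.0, []
    for i, x in enumerate(xs):
        if i > 0:
            integral += 0.5 * h * (g[i - 1] + g[i])   # running integral of 2A/B
        log_p.append(integral - math.log(diffusion(x)))
    p = [math.exp(lp) for lp in log_p]
    norm = sum(0.5 * h * (p[i - 1] + p[i]) for i in range(1, len(p)))
    return xs, [pi / norm for pi in p]


mu, sigma2 = 0.3, 1.0   # hypothetical drift and diffusion coefficients
xs, p = stationary_density(lambda x: mu * x, lambda x: sigma2 * x * x, 1.0, 100.0)
slope = (math.log(p[-1]) - math.log(p[0])) / (math.log(xs[-1]) - math.log(xs[0]))
print(f"log-log slope {slope:.4f}, predicted {2 * mu / sigma2 - 2:.4f}")
```

Any other drift and diffusion functions can be passed in the same way. The log-log slope then shows whether the resulting stationary density is of Pareto type over the chosen range.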