Victor V. Solovyev

Royal Holloway, University of London

Publication

Featured research published by Victor V. Solovyev.

Nature | 2004

Community structure and metabolism through reconstruction of microbial genomes from the environment

Gene W. Tyson; Jarrod Chapman; Philip Hugenholtz; Eric E. Allen; Rachna J. Ram; Paul M. Richardson; Victor V. Solovyev; Edward M. Rubin; Daniel S. Rokhsar; Jillian F. Banfield

Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level. The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme environment.

Genome Research | 2011

Assemblathon 1: A competitive assessment of de novo short read assembly methods

Dent Earl; Keith Bradnam; John St. John; Aaron E. Darling; Dawei Lin; Joseph Fass; Hung On Ken Yu; Vince Buffalo; Daniel R. Zerbino; Mark Diekhans; Ngan Nguyen; Pramila Ariyaratne; Wing-Kin Sung; Zemin Ning; Matthias Haimel; Jared T. Simpson; Nuno A. Fonseca; Inanc Birol; T. Roderick Docking; Isaac Ho; Daniel S. Rokhsar; Rayan Chikhi; Dominique Lavenier; Guillaume Chapuis; Delphine Naquin; Nicolas Maillet; Michael C. Schatz; David R. Kelley; Adam M. Phillippy; Sergey Koren

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.

Genome Biology | 2006

Automatic annotation of eukaryotic genes, pseudogenes and promoters

Victor V. Solovyev; Peter Kosarev; Igor Seledsov; Denis Vorobyev

Background: The ENCODE gene prediction workshop (EGASP) has been organized to evaluate how well state-of-the-art automatic gene finding methods are able to reproduce the manual and experimental gene annotation of the human genome. We have used Softberry gene finding software to predict genes, pseudogenes and promoters in 44 selected ENCODE sequences representing approximately 1% (30 Mb) of the human genome. Predictions of gene finding programs were evaluated in terms of their ability to reproduce the ENCODE-HAVANA annotation. Results: The Fgenesh++ gene prediction pipeline can identify 91% of coding nucleotides with a specificity of 90%. Our automatic pseudogene finder (PSF program) found 90% of the manually annotated pseudogenes and some new ones. The Fprom promoter prediction program identifies 80% of TATA promoter sequences with one false positive prediction per 2,000 base-pairs (bp) and 50% of TATA-less promoters with one false positive prediction per 650 bp. It can be used to identify transcription start sites upstream of annotated coding parts of genes found by gene prediction software. Conclusion: We review our software and underlying methods for identifying these three important structural and functional genome components and discuss the accuracy of predictions, recent advances and open problems in annotating genomic sequences. We have demonstrated that our methods can be effectively used for initial automatic annotation of the eukaryotic genome.

Nature | 2014

The ctenophore genome and the evolutionary origins of neural systems

Leonid L. Moroz; Kevin M. Kocot; Mathew R. Citarella; Sohn Dosung; Tigran P. Norekian; Inna S. Povolotskaya; Anastasia P. Grigorenko; Christopher A. Dailey; Eugene Berezikov; Katherine M. Buckley; Andrey Ptitsyn; Denis Reshetov; Krishanu Mukherjee; Tatiana P. Moroz; Yelena Bobkova; Fahong Yu; Vladimir V. Kapitonov; Jerzy Jurka; Yuri V. Bobkov; Joshua J. Swore; David Orion Girardo; Alexander Fodor; Fedor Gusev; Rachel Sanford; Rebecca Bruders; Ellen L. W. Kittler; Claudia E. Mills; Jonathan P. Rast; Romain Derelle; Victor V. Solovyev

The origins of neural systems remain unresolved. In contrast to other basal metazoans, ctenophores (comb jellies) have both complex nervous and mesoderm-derived muscular systems. These holoplanktonic predators also have sophisticated ciliated locomotion, behaviour and distinct development. Here we present the draft genome of Pleurobrachia bachei, Pacific sea gooseberry, together with ten other ctenophore transcriptomes, and show that they are remarkably distinct from other animal genomes in their content of neurogenic, immune and developmental genes. Our integrative analyses place Ctenophora as the earliest lineage within Metazoa. This hypothesis is supported by comparative analysis of multiple gene families, including the apparent absence of HOX genes, canonical microRNA machinery, and reduced immune complement in ctenophores. Although two distinct nervous systems are well recognized in ctenophores, many bilaterian neuron-specific genes and genes of ‘classical’ neurotransmitter pathways either are absent or, if present, are not expressed in neurons. Our metabolomic and physiological data are consistent with the hypothesis that ctenophore neural systems, and possibly muscle specification, evolved independently from those in other animals.

Cell | 1995

Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila.

Richard L. Kelley; Irina Solovyeva; Laura M. Lyman; Ron Richman; Victor V. Solovyev; Mitzi I. Kuroda

Male-specific lethal-2 (msl-2) is a RING finger protein that is required for X chromosome dosage compensation in Drosophila males. Consistent with the formation of a dosage compensation protein complex, msl-2 colocalizes with the other MSL proteins on the male X chromosome and coimmunoprecipitates with msl-1 from male larval extracts. Ectopic expression of msl-2 in females results in the appearance of the other MSL dosage compensation regulators on the female X chromosomes and decreased female viability. We suggest that msl-2 RNA is the primary target of Sxl regulation in the dosage compensation pathway and present a speculative model for the regulation of two distinct modes of dosage compensation by Sxl.

Nucleic Acids Research | 2003

PlantProm: a database of plant promoter sequences

Ilham A. Shahmuradov; Alexander Gammerman; John M. Hancock; Peter M. Bramley; Victor V. Solovyev

PlantProm DB, a plant promoter database, is an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s), TSS, from various plant species. The first release (2002.01) of PlantProm DB contains 305 entries including 71, 220 and 14 promoters from monocot, dicot and other plants, respectively. It provides DNA sequence of the promoter regions (-200 : +51) with TSS on the fixed position +201, taxonomic/promoter type classification of promoters and Nucleotide Frequency Matrices (NFM) for promoter elements: TATA-box, CCAAT-box and TSS-motif (Inr). Analysis of TSS-motifs revealed that their composition is different in dicots and monocots, as well as for TATA and TATA-less promoters. The database serves as a learning set in developing plant promoter prediction programs. One such program (TSSP) based on discriminant analysis has been created by Softberry Inc. and the application of a support vector machine approach for promoter identification is under development. PlantProm DB is available at http://mendel.cs.rhul.ac.uk/ and http://www.softberry.com/.

Nature | 2004

The DNA sequence and biology of human chromosome 19

Jane Grimwood; Laurie Gordon; Anne S. Olsen; Astrid Terry; Jeremy Schmutz; Jane Lamerdin; Uffe Hellsten; David Goodstein; Olivier Couronne; Mary Tran-Gyamfi; Andrea Aerts; Michael R. Altherr; Linda Ashworth; Eva Bajorek; Stacey Black; Elbert Branscomb; Sean Caenepeel; Anthony Carrano; Yee Man Chan; Mari Christensen; Catherine A. Cleland; Alex Copeland; Eileen Dalin; Paramvir Dehal; Mirian Denys; John C. Detter; Julio Escobar; Dave Flowers; Dea Fotopulos; Carmen Garcia

Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high G + C content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in mendelian disorders, including familial hypercholesterolaemia and insulin-resistant diabetes. Nearly one-quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

Nucleic Acids Research | 2001

SpliceDB: database of canonical and non-canonical mammalian splice sites

M. Burset; Igor A. Seledtsov; Victor V. Solovyev

A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous location of splice junctions the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT-AG junctions (22 199 entries) and 0.56% have non-canonical GC-AG splice site pairs. The remainder (0.73%) occurs in a lot of small groups (with a maximum size of 0.05%). We especially studied non-canonical splice sites, which comprise 3.73% of GenBank annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conservative dinucleotides we compared sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Out of 171 human non-canonical and EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. They can be classified after sequence analysis as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG), 61 errors corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors corrected to AT-AC), one case was produced from a non-existent intron, seven cases were found in HTG that were deposited to GenBank and finally there were only two other cases left of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.
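
The grouping of splice pairs into GT-AG, GC-AG and AT-AC classes described in this abstract can be illustrated with a minimal Python sketch. It is not the SpliceDB pipeline: the function names and the toy intron sequences are hypothetical, and the only assumption is that each intron is given as a plain string running from its first base to its last.

```python
from collections import Counter


def classify_splice_pair(intron: str) -> str:
    """Label an intron by its first two and last two bases, e.g. 'GT-AG'."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron must be at least 4 bases long")
    return f"{seq[:2]}-{seq[-2:]}"


def pair_frequencies(introns):
    """Return the percentage of introns falling into each dinucleotide class."""
    counts = Counter(classify_splice_pair(s) for s in introns)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}


# Hypothetical toy introns (not taken from SpliceDB).
toy_introns = [
    "GTAAGTCCTTTCAG",  # canonical GT-AG
    "GTGAGTTTTTGCAG",  # canonical GT-AG
    "GCAAGTCTCTTTAG",  # non-canonical GC-AG
    "ATATCCTTTTCCAC",  # non-canonical AT-AC
]
freqs = pair_frequencies(toy_introns)

# Arithmetic check on the counts quoted in the abstract.
canonical_share = round(100 * 22199 / 22489, 2)
htg_match_share = round(100 * 156 / 171, 2)
```

The last two lines recompute the percentages quoted in the abstract from its own counts (22 199 of 22 489 entries, and 156 of 171 pairs).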

Gene | 2000

A novel type of RNase III family proteins in eukaryotes.

Valery Filippov; Victor V. Solovyev; Maria Filippova; Sarjeet S. Gill

The RNase III family of double-stranded RNA-specific endonucleases is characterized by the presence of a highly conserved 9 amino acid stretch in their catalytic center known as the RNase III signature motif. We isolated the drosha gene, a new member of this family in Drosophila melanogaster. Characterization of this gene revealed the presence of two RNase III signature motifs in its sequence that may indicate that it is capable of forming an active catalytic center as a monomer. The drosha protein also contains an 825 amino acid N-terminus with an unknown function. A search for the known homologues of the drosha protein revealed that it has a similarity to two adjacent annotated genes identified during C. elegans genome sequencing. Analysis of the genomic region of these genes by the Fgenesh program and sequencing of the EST cDNA clone derived from it revealed that this region encodes only one gene. This newly identified gene in nematode genome shares a high similarity to Drosophila drosha throughout its entire protein sequence. A potential drosha homologue is also found among the deposited human cDNA sequences. A comparison of these drosha proteins to other members of the RNase III family indicates that they form a new group of proteins within this family.

Genome Biology | 2003

An integrated gene annotation and transcriptional profiling approach towards the full gene content of the Drosophila genome.

Marc Hild; B. Beckmann; Stefan A. Haas; Britta Koch; Victor V. Solovyev; C. Busold; Kurt Fellenberg; Michael Boutros; Martin Vingron; F. Sauer; Jörg D. Hoheisel; Renato Paro

Background: While the genome sequences for a variety of organisms are now available, the precise number of the genes encoded is still a matter of debate. For the human genome several stringent annotation approaches have resulted in the same number of potential genes, but a careful comparison revealed only limited overlap. This indicates that only the combination of different computational prediction methods and experimental evaluation of such in silico data will provide more complete genome annotations. In order to get a more complete gene content of the Drosophila melanogaster genome, we based our new D. melanogaster whole-transcriptome microarray, the Heidelberg FlyArray, on the combination of the Berkeley Drosophila Genome Project (BDGP) annotation and a novel ab initio gene prediction of lower stringency using the Fgenesh software. Results: Here we provide evidence for the transcription of approximately 2,600 additional genes predicted by Fgenesh. Validation of the developmental profiling data by RT-PCR and in situ hybridization indicates a lower limit of 2,000 novel annotations, thus substantially raising the number of genes that make a fly. Conclusions: The successful design and application of this novel Drosophila microarray on the basis of our integrated in silico/wet biology approach confirms our expectation that in silico approaches alone will always tend to be incomplete. The identification of at least 2,000 novel genes highlights the importance of gathering experimental evidence to discover all genes within a genome. Moreover, as such an approach is independent of homology criteria, it will allow the discovery of novel genes unrelated to known protein families or those that have not been strictly conserved between species.