Publication


Featured research published by Vsevolod J. Makeev.


Nature Biotechnology | 2005

Assessing computational tools for the discovery of transcription factor binding sites

Martin Tompa; Nan Li; Timothy L. Bailey; George M. Church; Bart De Moor; Eleazar Eskin; Alexander V. Favorov; Martin C. Frith; Yutao Fu; W. James Kent; Vsevolod J. Makeev; Andrei A. Mironov; William Stafford Noble; Giulio Pavesi; Mireille Régnier; Nicolas Simonis; Saurabh Sinha; Gert Thijs; Jacques van Helden; Mathias Vandenbogaert; Zhiping Weng; Christopher T. Workman; Chun Ye; Zhou Zhu

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.


Journal of Virology | 2004

Identification of Proteins Associated with Murine Cytomegalovirus Virions

Lisa M. Kattenhorn; Ryan Mills; Markus Wagner; Alexandre Lomsadze; Vsevolod J. Makeev; Mark Borodovsky; Hidde L. Ploegh; Benedikt M. Kessler

Proteins associated with the murine cytomegalovirus (MCMV) viral particle were identified by a combined approach of proteomic and genomic methods. Purified MCMV virions were dissociated by complete denaturation and subjected to either separation by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and in-gel digestion or treated directly by in-solution tryptic digestion. Peptides were separated by nanoflow liquid chromatography and analyzed by tandem mass spectrometry (LC-MS/MS). The MS/MS spectra obtained were searched against a database of MCMV open reading frames (ORFs) predicted to be protein coding by an MCMV-specific version of the gene prediction algorithm GeneMarkS. We identified 38 proteins from the capsid, tegument, glycoprotein, replication, and immunomodulatory protein families, as well as 20 genes of unknown function. Observed irregularities in coding potential suggested possible sequence errors in the 3′-proximal ends of m20 and M31. These errors were experimentally confirmed by sequencing analysis. The MS data further indicated the presence of peptides derived from the unannotated ORFs ORFc225441-226898 (m166.5) and ORF105932-106072. Immunoblot experiments confirmed expression of m166.5 during viral infection.


BMC Genomics | 2011

De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

Maria D. Logacheva; Artem S. Kasianov; Dmitriy V. Vinogradov; Tagir H. Samigullin; Mikhail S. Gelfand; Vsevolod J. Makeev; Aleksey A. Penin

Background: Transcriptome sequencing data have become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales, a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects. Results: Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences. Conclusions: 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated.


Nucleic Acids Research | 2013

HOCOMOCO: a comprehensive collection of human transcription factor binding sites models

Ivan V. Kulakovskiy; Yulia A. Medvedeva; Ulf Schaefer; Artem S. Kasianov; Ilya E. Vorontsov; Vladimir B. Bajic; Vsevolod J. Makeev

Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically improves model quality, probably owing to partial correction of source-specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including a newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source.


Bioinformatics | 2010

Deep and wide digging for binding motifs in ChIP-Seq data

Ivan V. Kulakovskiy; Valentina Boeva; Alexander V. Favorov; Vsevolod J. Makeev

SUMMARY ChIP-Seq data are a new challenge for motif discovery. Such data typically consist of thousands of DNA segments with base-specific coverage values. We present a new version of our DNA motif discovery software ChIPMunk adapted for ChIP-Seq data. ChIPMunk is an iterative algorithm that combines greedy optimization with bootstrapping and uses coverage profiles as motif positional preferences. ChIPMunk does not require truncation of long DNA segments and it is practical for processing up to tens of thousands of data sequences. Comparison with traditional (MEME) or ChIP-Seq-oriented (HMS) motif discovery tools shows that ChIPMunk identifies the correct motifs with the same or better quality but works dramatically faster. AVAILABILITY AND IMPLEMENTATION ChIPMunk is freely available within the ru_genetika Java package: http://line.imb.ac.ru/ChIPMunk. A web-based version is also available. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.


Bioinformatics | 2005

A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length

Alexander V. Favorov; Mikhail S. Gelfand; Anna V. Gerasimova; Dmitry A. Ravcheev; Andrey A. Mironov; Vsevolod J. Makeev

MOTIVATION Transcription regulatory protein factors often bind DNA as homo-dimers or hetero-dimers. Thus they recognize structured DNA motifs that are inverted or direct repeats or spaced motif pairs. However, these motifs are often difficult to identify owing to their high divergence. Including the motif structure explicitly in the motif recognition algorithm improves recognition efficiency for highly divergent motifs as well as estimation of motif geometric parameters. RESULTS We present a modification of the Gibbs sampling motif extraction algorithm, SeSiMCMC (Sequence Similarities by Markov Chain Monte Carlo), which finds structured motifs of these types, as well as non-structured motifs, in a set of unaligned DNA sequences. It employs improved estimators of motif and spacer lengths. The probability that a sequence does not contain any motif is accounted for in a rigorous Bayesian manner. We have applied the algorithm to a set of upstream regions of genes from two Escherichia coli regulons involved in respiration. We have demonstrated that accounting for a symmetric motif structure allows the algorithm to identify weak motifs more accurately. In the examples studied, ArcA binding sites were demonstrated to have the structure of a direct spaced repeat, whereas NarP binding sites exhibited a palindromic structure. AVAILABILITY The WWW interface of the program, its FreeBSD (4.0) and Windows 32 console executables are available at http://bioinform.genetika.ru/SeSiMCMC
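
The two motif structures named in this abstract, palindromes (inverted repeats) and direct spaced repeats, can be illustrated with a minimal Python sketch. The helper names and toy sequences below are hypothetical and are not taken from SeSiMCMC; the sketch only checks exact symmetry, whereas the published sampler handles divergent sites probabilistically.

```python
# Illustrative only: hypothetical helpers and toy sites, not SeSiMCMC code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str, half: int) -> bool:
    """True if the first `half` bases equal the reverse complement
    of the last `half` bases (an inverted repeat around a spacer)."""
    site = site.upper()
    return site[:half] == reverse_complement(site[-half:])


def is_direct_repeat(site: str, half: int) -> bool:
    """True if the first `half` bases are repeated verbatim
    as the last `half` bases (a direct repeat around a spacer)."""
    site = site.upper()
    return site[:half] == site[-half:]


# Toy sites: two 4-base half-sites separated by a 2-base spacer.
palindrome = "TACCATGGTA"   # TACC .. GGTA, and GGTA is the reverse complement of TACC
direct = "TACCATTACC"       # TACC .. TACC
```

Here `is_palindromic(palindrome, 4)` and `is_direct_repeat(direct, 4)` both hold, while swapping the two sites makes each check fail.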


Nucleic Acids Research | 2016

HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models

Ivan V. Kulakovskiy; Ilya E. Vorontsov; Ivan S. Yevshin; Anastasiia V. Soboleva; Artem S. Kasianov; Haitham Ashoor; Wail Ba-alawi; Vladimir B. Bajic; Yulia A. Medvedeva; Fedor A. Kolpakov; Vsevolod J. Makeev

Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM models for 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the validation of the resulting models on in vivo data. To facilitate practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
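
As a rough illustration of how a PWM with a score threshold is used for motif finding, the following Python sketch scores every window of a sequence and reports the windows at or above a cutoff. Everything in it is an assumption made for illustration: the count matrix is invented, the log-odds conversion with a pseudocount of 1 and a uniform background is one common choice rather than the HOCOMOCO procedure, and the function names do not correspond to the HOCOMOCO command-line tools.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per position).
# These numbers are invented for illustration and favour the word TGCA.
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 16},
    {"A": 0, "C": 0, "G": 16, "T": 0},
    {"A": 0, "C": 16, "G": 0, "T": 0},
    {"A": 16, "C": 0, "G": 0, "T": 0},
]


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log-odds weights against a uniform background.
    The pseudocount keeps unseen bases from producing log(0)."""
    pwm = []
    for row in counts:
        total = sum(row.values()) + 4 * pseudocount
        pwm.append({
            base: math.log((row[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold.
    Assumes `seq` contains only uppercase A, C, G, T."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[start + j]] for j in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits


pwm = counts_to_pwm(COUNTS)
hits = scan("AATGCATT", pwm, threshold=4.0)
```

With this toy matrix only the window starting at index 2 (TGCA) passes the cutoff. In practice the threshold would come from a precomputed table such as the one the abstract mentions, not from a hand-picked value.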


PLOS Computational Biology | 2012

Exploring Massive, Genome Scale Datasets with the GenometriCorr Package

Alexander V. Favorov; Loris Mularoni; Leslie Cope; Yulia A. Medvedeva; Andrey A. Mironov; Vsevolod J. Makeev; Sarah J. Wheelan

We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor.


Bioinformatics | 2006

Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression

Valentina Boeva; Mireille Régnier; Dmitri Papatsenko; Vsevolod J. Makeev

MOTIVATION Genomic sequences are highly redundant and contain many types of repetitive DNA. Fuzzy tandem repeats (FTRs) are of particular interest. They are found in regulatory regions of eukaryotic genes and are reported to interact with transcription factors. However, accurate assessment of FTR occurrences in different genome segments requires a specific algorithm for efficient FTR identification and classification. RESULTS We have obtained formulas for P-values of FTR occurrence and developed an FTR identification algorithm implemented in TandemSWAN software. Using TandemSWAN we compared the structure and the occurrence of FTRs with short period length (up to 24 bp) in coding and non-coding regions including UTRs, heterochromatic, intergenic and enhancer sequences of Drosophila melanogaster and Drosophila pseudoobscura. Tandems with period three and its multiples were found in coding segments, whereas FTRs with periods that are multiples of six are overrepresented in all non-coding segments. Periods equal to 5-7 and 11-14 were characteristic of the enhancer regions and other non-coding regions close to genes. AVAILABILITY TandemSWAN web page, stand-alone version and documentation can be found at http://bioinform.genetika.ru/projects/swan/www/ SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.


Algorithms for Molecular Biology | 2007

Exact p-value calculation for heterotypic clusters of regulatory motifs and its application in computational annotation of cis-regulatory modules

Valentina Boeva; Julien Clement; Mireille Régnier; Mikhail A. Roytberg; Vsevolod J. Makeev

Background: cis-Regulatory modules (CRMs) of eukaryotic genes often contain multiple binding sites for transcription factors. The phenomenon that binding sites form clusters in CRMs is exploited in many algorithms to locate CRMs in a genome. This gives rise to the problem of calculating the statistical significance of the event that multiple sites, recognized by different factors, would be found simultaneously in a text of a fixed length. The main difficulty comes from overlapping occurrences of motifs. So far, no tools have been developed allowing the computation of p-values for simultaneous occurrences of different motifs which can overlap. Results: We developed and implemented an algorithm computing the p-value that s different motifs occur respectively $k_1, \ldots, k_s$ or more times, possibly overlapping, in a random text. Motifs can be represented with a majority of popular motif models, but in all cases, without indels. Zero or first order Markov chains can be adopted as a model for the random text. The computational tool was tested on the set of cis-regulatory modules involved in D. melanogaster early development, for which there exists an annotation of binding sites for transcription factors. Our test allowed us to correctly identify transcription factors cooperatively/competitively binding to DNA. Method: The algorithm that precisely computes the probability of simultaneous motif occurrences is inspired by the Aho-Corasick automaton and employs a prefix tree together with a transition function. The algorithm runs with $O\bigl(n|\Sigma|(m|\mathcal{H}| + K|\sigma|^{K})\prod_i k_i\bigr)$ time complexity, where n is the length of the text, $|\Sigma|$ is the alphabet size, m is the maximal motif length, $|\mathcal{H}|$ is the total number of words in motifs, K is the order of Markov model, and $k_i$ is the number of occurrences of the i-th motif. Conclusion: The primary objective of the program is to assess the likelihood that a given DNA segment is CRM regulated with a known set of regulatory factors. In addition, the program can also be used to select the appropriate threshold for PWM scanning. Another application is assessing similarity of different motifs. Availability: Project web page, stand-alone version and documentation can be found at http://bioinform.genetika.ru/AhoPro/
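
The idea of combining an Aho-Corasick-style automaton with a probability computation over a random text can be sketched in a much reduced form. The Python below is not the AhoPro algorithm: it assumes a single set of words, asks only for at least one occurrence, and uses a zero-order (independent letters) text model. All function names are hypothetical.

```python
from collections import deque


def build_automaton(words, alphabet):
    """Build a prefix tree over `words` and complete it, Aho-Corasick style,
    into a full transition table. Returns (goto, terminal), where
    terminal[s] is True if some word ends at state s."""
    goto = [{}]
    terminal = [False]
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                terminal.append(False)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        terminal[state] = True

    fail = [0] * len(goto)
    queue = deque()
    for ch in alphabet:
        if ch in goto[0]:
            queue.append(goto[0][ch])   # depth-1 states fail to the root
        else:
            goto[0][ch] = 0
    while queue:                        # breadth-first, so shallower states are done first
        state = queue.popleft()
        terminal[state] = terminal[state] or terminal[fail[state]]
        for ch in alphabet:
            if ch in goto[state]:
                child = goto[state][ch]
                fail[child] = goto[fail[state]][ch]
                queue.append(child)
            else:
                goto[state][ch] = goto[fail[state]][ch]
    return goto, terminal


def prob_at_least_one(words, n, probs):
    """Probability that a random text of length n, with independent letters
    drawn from `probs`, contains at least one of `words`."""
    goto, terminal = build_automaton(words, "".join(probs))
    dist = [0.0] * len(goto)            # probability mass per automaton state
    dist[0] = 1.0
    hit = 0.0                           # mass that has already seen a word
    for _ in range(n):
        new = [0.0] * len(goto)
        for state, p in enumerate(dist):
            if p == 0.0:
                continue
            for ch, q in probs.items():
                target = goto[state][ch]
                if terminal[target]:
                    hit += p * q        # absorb: a word has just ended
                else:
                    new[target] += p * q
        dist = new
    return hit


UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p_value = prob_at_least_one(["AA"], 3, UNIFORM)
```

For the word AA in a uniform text of length 3, seven of the 64 possible texts contain it, so `p_value` is 7/64. Extending this toward the published setting would mean tracking occurrence counts for each motif and conditioning transitions on a Markov context, which is where the factors in the stated complexity come from.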

Collaboration

Top Co-Authors

Ivan V. Kulakovskiy, Engelhardt Institute of Molecular Biology
Ilya E. Vorontsov, Russian Academy of Sciences
Artem S. Kasianov, Russian Academy of Sciences
Yulia A. Medvedeva, Russian Academy of Sciences
Howard S. Kruth, National Institutes of Health
Michael Bukrinsky, George Washington University
Xueting Jin, National Institutes of Health