Gregory Kucherov | Researchain

Publication

Featured research published by Gregory Kucherov.

foundations of computer science | 1999

Finding maximal repetitions in a word in linear time

Roman Kolpakov; Gregory Kucherov

A repetition in a word w is a subword with the period of at most half of the subword length. We study maximal repetitions occurring in w, that is those for which any extended subword of w has a bigger period. The set of such repetitions represents in a compact way all repetitions in w. We first prove a combinatorial result asserting that the sum of exponents of all maximal repetitions of a word of length n is bounded by a linear function in n. This implies, in particular that there is only a linear number of maximal repetitions in a word. This allows us to construct a linear-time algorithm for finding all maximal repetitions. Some consequences and applications of these results are discussed, as well as related works.
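
To make the definition concrete, here is a minimal Python sketch that lists the maximal repetitions of a short word by brute force. It is not the linear-time algorithm of the paper: it runs in quadratic time and only illustrates what is being counted. The function names and the (start, end, period) output convention are assumptions made for this example.

```python
def maximal_repetitions(w):
    """Return all maximal repetitions of w as (start, end, period) triples.

    `end` is exclusive and `period` is the minimal period of w[start:end].
    Brute-force illustration only (quadratic time), not the paper's algorithm.
    """
    n = len(w)
    found = {}  # (start, end) -> smallest period seen for that interval
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if w[k] != w[k + p]:
                k += 1
                continue
            start = k
            # Extend while the word keeps agreeing with itself shifted by p.
            while k + p < n and w[k] == w[k + p]:
                k += 1
            end = k + p
            # A repetition needs a period of at most half its length.
            if end - start >= 2 * p:
                # Periods are tried in increasing order, so the first
                # period recorded for an interval is its minimal one.
                found.setdefault((start, end), p)
    return sorted((s, e, p) for (s, e), p in found.items())


def exponent_sum(w):
    """Sum of length/period over all maximal repetitions of w."""
    return sum((e - s) / p for s, e, p in maximal_repetitions(w))
```

For example, `maximal_repetitions("mississippi")` returns `[(1, 8, 3), (2, 4, 1), (5, 7, 1), (8, 10, 1)]`: the repetition "ississi" with period 3, plus "ss", "ss" and "pp" with period 1.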

international conference on bioinformatics | 2005

YASS: enhancing the sensitivity of DNA similarity search

Laurent Noé; Gregory Kucherov

YASS is a DNA local alignment tool based on an efficient and sensitive filtering algorithm. It applies transition-constrained seeds to specify the most probable conserved motifs between homologous sequences, combined with a flexible hit criterion used to identify groups of seeds that are likely to exhibit significant alignments. A web interface is available to upload input sequences in FASTA format, query the program and visualize the results obtained in several forms (dot-plot, tabular output and others). A standalone version is available for download from the web page.

Nucleic Acids Research | 2007

NORINE: a database of nonribosomal peptides

Ségolène Caboche; Maude Pupin; Valérie Leclère; Arnaud Fontaine; Philippe Jacques; Gregory Kucherov

Norine is the first database entirely dedicated to nonribosomal peptides (NRPs). In bacteria and fungi, in addition to the traditional ribosomal proteic biosynthesis, an alternative ribosome-independent pathway called NRP synthesis allows peptide production. It is performed by huge protein complexes called nonribosomal peptide synthetases (NRPSs). The molecules synthesized by NRPS contain a high proportion of nonproteogenic amino acids. The primary structure of these peptides is not always linear but often more complex and may contain cycles and branchings. In recent years, NRPs attracted a lot of attention because of their biological activities and pharmacological properties (antibiotic, immunosuppressor, antitumor, etc.). However, few computational resources and tools dedicated to those peptides have been available so far. Norine is focused on NRPs and contains more than 700 entries. The database is freely accessible at http://bioinfo.lifl.fr/norine/. It provides a complete computational tool for systematic study of NRPs in numerous species, and as such, should permit to obtain a better knowledge of these metabolic products and underlying biological mechanisms, and ultimately to contribute to the redesigning of natural products in order to obtain new bioactive compounds for drug discovery.

fundamentals of computation theory | 1999

On Maximal Repetitions in Words

Roman Kolpakov; Gregory Kucherov

A (fractional) repetition in a word w is a subword with the period of at most half of the subword length. We study maximal repetitions occurring in w, that is those for which any extended subword of w has a bigger period. The set of such repetitions represents in a compact way all repetitions in w. We first study maximal repetitions in Fibonacci words - we count their exact number, and estimate the sum of their exponents. These quantities turn out to be linearly-bounded in the length of the word. We then prove that the maximal number of maximal repetitions in general words (on arbitrary alphabet) of length n is linearly-bounded in n, and we mention some applications and consequences of this result.
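
As a rough illustration of the quantities studied for Fibonacci words, the sketch below builds a Fibonacci word and counts its maximal repetitions and the sum of their exponents by brute force. The indexing convention (f_0 = "b", f_1 = "a") and the helper names are assumptions for this example; the paper's exact formulas are not reproduced here.

```python
def fibonacci_word(n):
    """f_0 = "b", f_1 = "a", f_n = f_{n-1} + f_{n-2} (one common convention)."""
    prev, cur = "b", "a"
    for _ in range(n - 1):
        prev, cur = cur, cur + prev
    return cur if n >= 1 else prev


def run_stats(w):
    """Return (number of maximal repetitions, sum of their exponents).

    Quadratic-time brute force, intended only for short words.
    """
    n = len(w)
    runs = {}  # (start, end) -> minimal period; end is exclusive
    for p in range(1, n // 2 + 1):
        start = 0
        for k in range(n - p + 1):
            # A stretch with period p stops at a mismatch or at the word's end.
            if k == n - p or w[k] != w[k + p]:
                if k - start >= p:  # total length k + p - start >= 2p
                    runs.setdefault((start, k + p), p)
                start = k + 1
    return len(runs), sum((e - s) / p for (s, e), p in runs.items())


if __name__ == "__main__":
    for i in range(3, 13):
        w = fibonacci_word(i)
        count, exp_sum = run_stats(w)
        print(i, len(w), count, round(exp_sum, 3))
```

With this convention, `fibonacci_word(5)` is "abaababa", which has three maximal repetitions ("aa", "abaaba" and "ababa") with exponents 2, 2 and 2.5.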

Journal of Bioinformatics and Computational Biology | 2006

A UNIFYING FRAMEWORK FOR SEED SENSITIVITY AND ITS APPLICATION TO SUBSET SEEDS

Gregory Kucherov; Laurent Noé; Mikhail A. Roytberg

We propose a general approach to compute the seed sensitivity, that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem--a set of target alignments, an associated probability distribution, and a seed model--that are specified by distinct finite automata. The approach is then applied to a new concept of subset seeds for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be efficiently designed using our approach, and can then be used in similarity search producing better results than ordinary spaced seeds.
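
For intuition about what "seed sensitivity" measures, the following sketch computes it by exhaustive enumeration for a single spaced seed under a simple Bernoulli model, where each alignment position is a match with probability p. This is not the automaton-based method of the paper, and it is only feasible for short alignments. The seed notation ('#' for a required match, '-' for a don't-care position) and the function name are assumptions for this example.

```python
from itertools import product


def seed_sensitivity(seed, length, p):
    """Probability that a random alignment of the given length is hit by seed.

    Each position is independently a match (1) with probability p.
    Exhaustive over 2**length alignments, so keep `length` small.
    """
    required = [i for i, c in enumerate(seed) if c == "#"]
    span = len(seed)
    total = 0.0
    for aln in product((0, 1), repeat=length):
        # The seed hits if, at some offset, every '#' position is a match.
        hit = any(
            all(aln[offset + i] for i in required)
            for offset in range(length - span + 1)
        )
        if hit:
            matches = sum(aln)
            total += p ** matches * (1 - p) ** (length - matches)
    return total
```

As a sanity check, the contiguous seed "##" on alignments of length 3 with p = 0.5 gives 2p^2 - p^3 = 0.375.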

BMC Bioinformatics | 2004

Improved hit criteria for DNA local alignment

Laurent Noé; Gregory Kucherov

Background: The hit criterion is a key component of heuristic local alignment algorithms. It specifies a class of patterns assumed to witness a potential similarity, and this choice is decisive for the selectivity and sensitivity of the whole method. Results: In this paper, we propose two ways to improve the hit criterion. First, we define the group criterion combining the advantages of the single-seed and double-seed approaches used in existing algorithms. Second, we introduce transition-constrained seeds that extend spaced seeds by the possibility of distinguishing transition and transversion mismatches. We provide analytical data as well as experimental results, obtained with the YASS software, supporting both improvements. Conclusions: Proposed algorithmic ideas allow to obtain a significant gain in sensitivity of similarity search without increase in execution time. The method has been implemented in YASS software available at http://www.loria.fr/projects/YASS/.
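
The idea of a transition-constrained seed can be sketched in a few lines of Python. The symbol alphabet used here is an assumption for illustration: '#' requires a match, '@' accepts a match or a transition (A<->G or C<->T), and '-' accepts anything. This is a naive quadratic scan, not the indexing scheme or the group criterion used by YASS.

```python
# Transitions are purine<->purine (A/G) and pyrimidine<->pyrimidine (C/T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def position_ok(symbol, a, b):
    """Check one seed position against the aligned nucleotides a and b."""
    if symbol == "#":  # match required
        return a == b
    if symbol == "@":  # match or transition allowed, transversion rejected
        return a == b or frozenset((a, b)) in TRANSITIONS
    if symbol == "-":  # don't-care position
        return True
    raise ValueError(f"unknown seed symbol: {symbol!r}")


def seed_hits(seed, s, t):
    """Return all (i, j) such that seed matches s[i:] against t[j:]."""
    span = len(seed)
    return [
        (i, j)
        for i in range(len(s) - span + 1)
        for j in range(len(t) - span + 1)
        if all(position_ok(c, s[i + k], t[j + k]) for k, c in enumerate(seed))
    ]
```

With the seed "#@-#", the pair "ACGT" / "ATCT" is a hit at (0, 0) because the C/T mismatch under '@' is a transition, whereas "ACGT" / "AGCT" is not, since C/G is a transversion.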

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2005

Multiseed Lossless Filtration

Gregory Kucherov; Laurent Noé; Mikhail A. Roytberg

We study a method of seed-based lossless filtration for approximate string matching and related bioinformatics applications. The method is based on a simultaneous use of several spaced seeds rather than a single seed as studied by Burkhardt and Kärkkäinen. We present algorithms to compute several important parameters of seed families, study their combinatorial properties, and describe several techniques to construct efficient families. We also report a large-scale application of the proposed technique to the problem of oligonucleotide selection for an EST sequence database.
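
The lossless property can be checked by brute force for small parameters, which may help when reading the abstract. The sketch below assumes the usual formulation: a seed family is lossless for (m, k) if every alignment of length m with at most k mismatches is hit by at least one seed at some offset. The '#' / '-' seed notation and the function name are assumptions for this example, and the exhaustive check is not one of the algorithms from the paper.

```python
from itertools import combinations


def is_lossless(seeds, m, k):
    """True if the seed family detects every length-m alignment with k mismatches.

    Assumes k <= m. Checking exactly k mismatches suffices, because removing
    a mismatch can never destroy an existing seed hit.
    """
    shapes = [[i for i, c in enumerate(s) if c == "#"] for s in seeds]
    spans = [len(s) for s in seeds]
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        hit = any(
            all(offset + i not in bad for i in shape)
            for shape, span in zip(shapes, spans)
            for offset in range(m - span + 1)
        )
        if not hit:
            return False
    return True
```

For m = 5 and k = 1, neither "##-#" nor "#-##" is lossless on its own, but the family of both seeds is, which is a small instance of the benefit of using several seeds simultaneously.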

Journal of Bacteriology | 2010

Diversity of Monomers in Nonribosomal Peptides: towards the Prediction of Origin and Biological Activity

Ségolène Caboche; Valérie Leclère; Maude Pupin; Gregory Kucherov; Philippe Jacques

Nonribosomal peptides (NRPs) are molecules produced by microorganisms that have a broad spectrum of biological activities and pharmaceutical applications (e.g., antibiotic, immunomodulating, and antitumor activities). One particularity of the NRPs is the biodiversity of their monomers, extending far beyond the 20 proteogenic amino acid residues. Norine, a comprehensive database of NRPs, allowed us to review for the first time the main characteristics of the NRPs and especially their monomer biodiversity. Our analysis highlighted a significant similarity relationship between NRPs synthesized by bacteria and those isolated from metazoa, especially from sponges, supporting the hypothesis that some NRPs isolated from sponges are actually synthesized by symbiotic bacteria rather than by the sponges themselves. A comparison of peptide monomeric compositions as a function of biological activity showed that some monomers are specific to a class of activities. An analysis of the monomer compositions of peptide products predicted from genomic information (metagenomics and high-throughput genome sequencing) or of new peptides detected by mass spectrometry analysis applied to a culture supernatant can provide indications of the origin of a peptide and/or its biological activity.

Discrete Applied Mathematics | 1998

Reconstructing a Hamiltonian cycle by querying the graph: application to DNA physical mapping

Vladimir Grebinski; Gregory Kucherov

This paper studies four mathematical models of the multiplex PCR method of genome physical mapping described in Sorokin et al. (1996). The models are expressed as combinatorial group testing problems of finding an unknown Hamiltonian cycle in the complete graph by means of queries of different type. For each model, an efficient algorithm is proposed that matches asymptotically the information-theoretic lower bound.

european symposium on algorithms | 1997

Optimal Reconstruction of Graphs Under the Additive Model

Vladimir Grebinski; Gregory Kucherov

We study the problem of combinatorial search for graphs under the additive model. The main result concerns the reconstruction of bounded degree graphs, i.e. graphs with the degree of all vertices bounded by a constant d. We show that such graphs can be reconstructed in O(dn) non-adaptive queries, that matches the information-theoretic lower bound. The proof is based on the technique of separating matrices. In particular, a new upper bound is obtained for d-separating matrices, that settles an open question stated by Lindstrom in [17]. Finally, we consider several particular classes of graphs. We show how an optimal non-adaptive solution of O(n^2/log n) queries for general graphs can be obtained.
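
A small sketch of the query model may help. It assumes the usual reading of the additive model: a query is a subset of vertices, and the answer is the number of edges of the hidden graph with both endpoints in that subset. The naive reconstruction shown uses one query per vertex pair, so it is far from the O(dn) and O(n^2/log n) bounds of the paper; the function names are hypothetical.

```python
from itertools import combinations


def make_additive_oracle(edges):
    """Build a query function for a hidden undirected graph.

    query(vertices) returns how many hidden edges lie inside the vertex subset.
    """
    edge_set = {frozenset(e) for e in edges}

    def query(vertices):
        inside = set(vertices)
        return sum(1 for e in edge_set if e <= inside)

    return query


def reconstruct_naively(n, query):
    """Recover the edge set on vertices 0..n-1 with n*(n-1)/2 pair queries."""
    return {(u, v) for u, v in combinations(range(n), 2) if query({u, v}) == 1}
```

For the path 0-1-2-3, the query on {0, 1, 2} returns 2, and the naive reconstruction recovers the three edges with six queries.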
