Volodymyr Kuleshov | Researchain

Publication

Featured research published by Volodymyr Kuleshov.

Nature Biotechnology | 2014

Whole-genome haplotyping using long reads and statistical methods

Volodymyr Kuleshov; Dan Xie; Rui Chen; Dmitry Pushkarev; Zhihai Ma; Tim Blauwkamp; Michael Kertesz; Michael Snyder

The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2–1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.

Nature Biotechnology | 2016

Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome

Volodymyr Kuleshov; Chao Jiang; Wenyu Zhou; Fereshteh Jahanbani; Serafim Batzoglou; Michael Snyder

Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of Kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.

Bioinformatics | 2014

Probabilistic single-individual haplotyping

Volodymyr Kuleshov

Motivation: Accurate haplotyping—determining from which parent particular portions of the genome are inherited—is still mostly an unresolved problem in genomics. This problem has only recently started to become tractable, thanks to the development of new long read sequencing technologies. Here, we introduce ProbHap, a haplotyping algorithm targeted at such technologies. The main algorithmic idea of ProbHap is a new dynamic programming algorithm that exactly optimizes a likelihood function specified by a probabilistic graphical model and which generalizes a popular objective called the minimum error correction. In addition to being accurate, ProbHap also provides confidence scores at phased positions. Results: On a standard benchmark dataset, ProbHap makes 11% fewer errors than current state-of-the-art methods. This accuracy can be further increased by excluding low-confidence positions, at the cost of a small drop in haplotype completeness. Availability: Our source code is freely available at: https://github.com/kuleshov/ProbHap. Contact: [email protected]
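
The minimum error correction (MEC) objective mentioned in the abstract can be illustrated with a small sketch. This is not ProbHap's dynamic program or its probabilistic model; it is a brute-force toy that assumes biallelic sites, error-prone reads given as dictionaries, and hypothetical function names (`mec_score`, `best_haplotype`). It only shows what the objective measures: the number of allele calls that would have to be flipped for every read to agree with a haplotype or its complement.

```python
from itertools import product


def mec_score(reads, haplotype):
    """Count the allele flips needed so each read matches the haplotype or its complement.

    reads: list of dicts mapping variant index -> observed allele (0 or 1).
    haplotype: tuple of 0/1 alleles, one per variant site.
    """
    total = 0
    for read in reads:
        # Mismatches against the haplotype itself.
        d0 = sum(1 for pos, allele in read.items() if allele != haplotype[pos])
        # Mismatches against the complementary haplotype.
        d1 = len(read) - d0
        total += min(d0, d1)
    return total


def best_haplotype(reads, n_sites):
    """Exhaustive search over haplotypes (illustration only; exponential in n_sites)."""
    # Fix the first allele to 0: a haplotype and its complement score identically.
    candidates = ((0,) + rest for rest in product((0, 1), repeat=n_sites - 1))
    return min(candidates, key=lambda h: mec_score(reads, h))


# Toy data (hypothetical): four consistent reads and one with a sequencing error.
reads = [
    {0: 0, 1: 0, 2: 1},
    {1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0},
    {2: 0, 3: 0},
    {0: 0, 1: 1},  # disagrees with the other reads at one site
]

hap = best_haplotype(reads, 4)
print(hap, mec_score(reads, hap))
```

The exhaustive search is only practical for a handful of sites; the abstract's point is that a dynamic program can optimize a likelihood that generalizes this objective exactly.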

Bioinformatics | 2016

Genome assembly from synthetic long read clouds

Volodymyr Kuleshov; Michael Snyder; Serafim Batzoglou

Motivation: Despite rapid progress in sequencing technology, assembling de novo the genomes of new species as well as reconstructing complex metagenomes remain major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and the inability of current assembly paradigms to cope with combinations of short and long reads. Results: Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead it assembles genomes directly from the SLR’s underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads. Availability and Implementation: Our source code is freely available at https://github.com/kuleshov/architect. Contact: [email protected]

AI Magazine | 2018

Learning with Weak Supervision from Physics and Data-Driven Constraints

Hongyu Ren; Russell Stewart; Jiaming Song; Volodymyr Kuleshov

In many applications of machine learning, labeled data is scarce and obtaining additional labels is expensive. We introduce a new approach to supervising learning algorithms without labels by enforcing a small number of domain-specific constraints over the algorithms’ outputs. The constraints can be provided explicitly based on prior knowledge — e.g. we may require that objects detected in videos satisfy the laws of physics — or implicitly extracted from data using a novel framework inspired by adversarial training. We demonstrate the effectiveness of constraint-based learning on a variety of tasks — including tracking, object detection, and human pose estimation — and we find that algorithms supervised with constraints achieve high accuracies with only a small amount of labels, or with no labels at all in some cases.
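
As a rough illustration of supervising with a physics constraint instead of labels, the sketch below scores a sequence of predicted object heights by how far it lies from the nearest free-fall trajectory. It is not the paper's implementation: the function name `free_fall_penalty`, the fixed gravity constant, the frame spacing `dt`, and the use of a plain least-squares fit are all assumptions made for the example. In a training loop, a penalty of this kind would stand in for a label-based loss.

```python
def free_fall_penalty(heights, dt, g=9.8):
    """Squared distance from predicted heights to the closest free-fall trajectory.

    A free-falling object follows y(t) = y0 + v0*t - 0.5*g*t**2, so after adding
    back the known gravity term the heights should be exactly linear in t.
    The penalty is the residual of the best linear fit (assumes >= 2 frames).
    """
    n = len(heights)
    times = [i * dt for i in range(n)]
    # Remove the known gravity term; what remains should be y0 + v0*t.
    target = [h + 0.5 * g * t * t for h, t in zip(heights, times)]

    # Closed-form ordinary least squares for intercept y0 and slope v0.
    mean_t = sum(times) / n
    mean_y = sum(target) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, target)) / var_t
    intercept = mean_y - slope * mean_t

    return sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, target))


dt = 0.1
# A physically valid trajectory (hypothetical values: y0 = 10 m, v0 = 2 m/s).
valid = [10 + 2 * (i * dt) - 4.9 * (i * dt) ** 2 for i in range(5)]
# A physically implausible prediction: the object hovers in place.
hovering = [10.0] * 5

print(free_fall_penalty(valid, dt), free_fall_penalty(hovering, dt))
```

The valid trajectory receives a penalty near zero and the hovering one a clearly positive penalty, so minimizing this quantity pushes predictions toward physically plausible outputs without any labeled heights.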

bioRxiv | 2017

GATTACA: Lightweight Metagenomic Binning With Compact Indexing Of Kmer Counts And MinHash-based Panel Selection

Victoria Popic; Volodymyr Kuleshov; Michael Snyder; Serafim Batzoglou

We introduce GATTACA, a framework for rapid and accurate binning of metagenomic contigs from a single or multiple metagenomic samples into clusters associated with individual species. The clusters are computed using co-abundance profiles within a set of reference metagenomes; unlike previous methods, GATTACA estimates these profiles from k-mer counts stored in a highly compact index. On multiple synthetic and real benchmark datasets, GATTACA produces clusters that correspond to distinct bacterial species with an accuracy that matches earlier methods, while being up to 20× faster when the reference panel index can be computed offline and 6× faster for online co-abundance estimation. Leveraging the MinHash technique to quickly compare metagenomic samples, GATTACA also provides an efficient way to identify publicly available metagenomic data that can be incorporated into the set of reference metagenomes to further improve binning accuracy. Thus, by enabling easy indexing and reuse of publicly available metagenomic datasets, GATTACA makes accurate metagenomic analyses accessible to a much wider range of researchers.
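
The MinHash comparison mentioned above can be sketched in a few lines. This is a generic textbook MinHash over k-mer sets, not GATTACA's code or index format; the function names, the choice of BLAKE2b as the hash, and the parameters `k` and `num_hashes` are assumptions for illustration. The idea is that the fraction of matching signature entries estimates the Jaccard similarity of two samples' k-mer sets.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(kmer_set, num_hashes=64):
    """One minimum per seeded hash function; the list is the sample's sketch."""
    signature = []
    for seed in range(num_hashes):
        prefix = seed.to_bytes(4, "big")  # a different prefix simulates a different hash
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(prefix + km.encode(), digest_size=8).digest(), "big"
            )
            for km in kmer_set
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures share the same minimum."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Toy sequences (hypothetical); real inputs would be reads or contigs from samples.
sample_a = "ACGTACGTTGCAACGTAGCTAGCTAGGATCCA"
sample_b = "ACGTACGTTGCAACGTAGCTAGCTAGGATGGT"  # differs from sample_a near the end
sample_c = "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT"  # shares no 8-mers with sample_a

sig_a = minhash_signature(kmers(sample_a, 8))
sig_b = minhash_signature(kmers(sample_b, 8))
sig_c = minhash_signature(kmers(sample_c, 8))

print(estimate_jaccard(sig_a, sig_b), estimate_jaccard(sig_a, sig_c))
```

Because only the fixed-size signatures are compared, the cost of comparing two samples does not grow with their size, which is what makes screening large public collections feasible.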

arXiv: Artificial Intelligence | 2014