Publications

Featured research published by Pavel Skums.


BMC Bioinformatics | 2012

Efficient error correction for next-generation sequencing of viral amplicons

Pavel Skums; Zoya Dimitrova; David S. Campo; Gilberto Vaughan; Livia Maria Gonçalves Rossi; Joseph C. Forbi; Jonny Yokosawa; Alexander Zelikovsky; Yury Khudyakov

Background: Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone, so the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that errors are randomly distributed. A recent quality assessment of amplicon sequences obtained by 454 sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, the position in the sequence and the length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing.

Results: In this paper, we present two new efficient error-correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). To evaluate their relative performance, both were compared with a previously published clustering algorithm (SHORAH) on 24 experimental datasets obtained by 454 sequencing of amplicons with known sequences. All three algorithms showed similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequencies of true ones.

Conclusions: Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454 sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and the datasets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm
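The two ideas named in this abstract can be illustrated with a minimal Python sketch. It is not the published KEC or ET code: it assumes that reads carrying a sequencing error contain k-mers that are rare across the whole sample, it only flags such reads (KEC itself also corrects them), and it takes the ET-style frequency cutoff as a plain input parameter instead of deriving it empirically. The function names, `k=5`, `min_count=2` and `threshold=0.2` are hypothetical choices for illustration.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def flag_suspect_reads(reads, k=5, min_count=2):
    """Return indices of reads containing at least one k-mer seen fewer than min_count times."""
    counts = kmer_counts(reads, k)
    suspects = []
    for idx, read in enumerate(reads):
        kmers = (read[i:i + k] for i in range(len(read) - k + 1))
        if any(counts[kmer] < min_count for kmer in kmers):
            suspects.append(idx)
    return suspects


def frequency_threshold(haplotypes, threshold):
    """Drop haplotypes below a relative-frequency cutoff, then renormalize the survivors."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    kept = {h: c for h, c in counts.items() if c / total >= threshold}
    kept_total = sum(kept.values())
    return {h: c / kept_total for h, c in kept.items()}


reads = ["ACGTACGTAC"] * 5 + ["ACGTTCGTAC"]  # the last read carries one substitution
print(flag_suspect_reads(reads))                  # [5]
print(frequency_threshold(reads, threshold=0.2))  # {'ACGTACGTAC': 1.0}
```

The k-mer filter needs no reference sequence, while the frequency filter depends entirely on how the cutoff is chosen, which is why calibrating it on amplicons of known sequence matters.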


Nature Biotechnology | 2015

Good laboratory practice for clinical next-generation sequencing informatics pipelines

Amy S. Gargis; Lisa Kalman; David P. Bick; Cristina da Silva; David Dimmock; Birgit Funke; Sivakumar Gowrisankar; Madhuri Hegde; Shashikant Kulkarni; Christopher E. Mason; Rakesh Nagarajan; Karl V. Voelkerding; Elizabeth A. Worthey; Nazneen Aziz; John Barnes; Sarah F. Bennett; Himani Bisht; Deanna M. Church; Zoya Dimitrova; Shaw R. Gargis; Nabil Hafez; Tina Hambuch; Fiona Hyland; Ruth Ann Luna; Duncan MacCannell; Tobias Mann; Megan R. McCluskey; Timothy K. McDaniel; Lilia Ganova-Raeva; Heidi L. Rehm

Affiliations: Amy S Gargis, Centers for Disease Control & Prevention; Lisa Kalman, Centers for Disease Control & Prevention; David P Bick, Medical College of Wisconsin; Cristina da Silva, Emory University; David P Dimmock, Medical College of Wisconsin; Birgit H Funke, Partners Healthcare Personalized Medicine; Sivakumar Gowrisankar, Partners Healthcare Personalized Medicine; Madhuri Hegde, Emory University; Shashikant Kulkarni, Washington University; Christopher E Mason, Cornell University


BMC Genomics | 2014

Next-generation sequencing reveals large connected networks of intra-host HCV variants

David S. Campo; Zoya Dimitrova; Lílian Hiromi Tomonari Yamasaki; Pavel Skums; Daryl Lau; Gilberto Vaughan; Joseph C. Forbi; Chong-Gee Teo; Yury Khudyakov

Background: Next-generation sequencing (NGS) allows for sampling numerous viral variants from infected patients. This provides a novel opportunity to represent and study the mutational landscape of Hepatitis C Virus (HCV) within a single host.

Results: Intra-host variants of the HCV E1/E2 region were extensively sampled from 58 chronically infected patients. After NGS error correction, the average numbers of reads and variants obtained from each sample were 3202 and 464, respectively. The distance between each pair of variants was calculated, and a network was created for each patient in which each node is a variant and two nodes are connected by a link if the nucleotide distance between them is 1. The work focused on large components containing > 5% of all reads, which on average account for 93.7% of all reads found in a patient. The distance between any two variants calculated over the component correlated strongly with nucleotide distances (r = 0.9499; p = 0.0001), a better correlation than the one obtained with Neighbour-Joining trees (r = 0.7624; p = 0.0001). In each patient, components were well separated, with the average distance between components (6.53%) being about 10 times greater than the average distance within each component (0.68%). The ratio of nonsynonymous to synonymous changes was calculated, and some patients (6.9%) showed a mixture of networks under strong negative and positive selection. All components were robust to in silico stochastic sampling: even after randomly removing 85% of all reads, the largest connected component in the new subsample still involved 82.4% of the remaining nodes. In vitro sampling showed that 93.02% of components present in the original sample were also found in experimental replicas, with 81.6% of reads found in both. When syringe-sharing transmission events were simulated, 91.2% of all simulated transmission events seeded all components present in the source.

Conclusions: Most intra-host variants are organized into distinct single-mutation components that are well separated from each other, represent genetic distances between viral variants, are robust to sampling, are reproducible and were likely seeded during transmission events. Facilitated by NGS, large components offer a novel evolutionary framework for the genetic analysis of intra-host viral populations and for understanding transmission, immune escape and drug resistance.
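The network construction described in the Results can be sketched in a few lines of Python. This is a hedged illustration, not the authors' pipeline: it assumes the variants are already aligned to equal length, treats each distinct variant as a node weighted by its read count, links two variants when their Hamming distance is 1, and keeps components holding more than 5% of all reads, the cutoff named in the abstract. The function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def one_step_components(variant_counts):
    """Connected components of the network linking variants that differ by one mutation."""
    variants = list(variant_counts)
    adj = {v: set() for v in variants}
    for a, b in combinations(variants, 2):
        if hamming(a, b) == 1:
            adj[a].add(b)
            adj[b].add(a)
    seen, components = set(), []
    for start in variants:
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(comp)
    return components


def large_components(variant_counts, min_share=0.05):
    """Keep components whose summed read count exceeds min_share of all reads."""
    total = sum(variant_counts.values())
    return [comp for comp in one_step_components(variant_counts)
            if sum(variant_counts[v] for v in comp) / total > min_share]


counts = {"AAAA": 50, "AAAT": 30, "AATT": 15, "GGGG": 4, "CCCC": 1}
print(len(one_step_components(counts)))               # 3
print([sorted(c) for c in large_components(counts)])  # [['AAAA', 'AAAT', 'AATT']]
```

The all-pairs comparison is quadratic in the number of distinct variants, which is acceptable at the scale reported here (464 variants per sample on average).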


The Journal of Infectious Diseases | 2016

Accurate Genetic Detection of Hepatitis C Virus Transmissions in Outbreak Settings

David S. Campo; Guoliang Xia; Zoya Dimitrova; Yulin Lin; Joseph C. Forbi; Lilia Ganova-Raeva; Lili Punkova; Hong Thai; Pavel Skums; Seth Sims; Inna Rytsareva; Gilberto Vaughan; Ha-Jung Roh; Michael A. Purdy; Amanda Sue; Yury Khudyakov

Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections are associated with unsafe injection practices, drug diversion and other exposures to blood, and they are difficult to detect and investigate. Here, we developed and validated a simple approach for molecular detection of HCV transmissions in outbreak settings. Using the end-point limiting-dilution (EPLD) technique, we obtained sequences of the HCV hypervariable region 1 (HVR1) from 127 cases involved in 32 epidemiologically defined HCV outbreaks and from 193 individuals with unrelated HCV strains. We compared several types of genetic distances and, using minimal Hamming distances, calculated a threshold that identifies transmission clusters in all tested outbreaks with 100% accuracy. The approach was also validated on sequences obtained by next-generation sequencing from HCV strains recovered from 239 individuals, with the same accuracy as for EPLD. On average, the nucleotide diversity of the intrahost population was 6.2 times greater in the source case than in any incident case, allowing the correct detection of transmission direction in 8 outbreaks for which the source cases were known. The simple and accurate distance-based approach developed here for detecting HCV transmissions streamlines molecular investigation of outbreaks, thus improving the public health capacity for rapid and effective control of hepatitis C.
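The two measurements in this abstract, a minimal inter-host Hamming distance compared against a threshold and a diversity comparison for transmission direction, can be sketched as follows. This is a minimal illustration under stated assumptions: sequences are aligned to equal length, distances are per-site Hamming distances, diversity is an unweighted mean over variant pairs, and the threshold `0.03` is a hypothetical value (the calibrated threshold is not given here). The names `min_distance`, `nucleotide_diversity` and `assess_pair` are illustrative.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(host_a, host_b):
    """Smallest per-site Hamming distance between any variant of host_a and any of host_b."""
    return min(hamming(a, b) / len(a) for a, b in product(host_a, host_b))


def nucleotide_diversity(variants):
    """Unweighted mean pairwise per-site distance among one host's variants."""
    pairs = list(combinations(variants, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) / len(a) for a, b in pairs) / len(pairs)


def assess_pair(host_a, host_b, threshold):
    """Return (linked, likely_source) for two hosts under a hypothetical threshold."""
    if min_distance(host_a, host_b) > threshold:
        return False, None  # no transmission link, so direction is not assessed
    div_a, div_b = nucleotide_diversity(host_a), nucleotide_diversity(host_b)
    if div_a > div_b:
        return True, "a"
    if div_b > div_a:
        return True, "b"
    return True, None  # equal diversity gives no directional signal


source_case = ["AAAAAAAAAA", "AAAAAAAAAT", "TTAAAAAAAA", "AAAAATTAAA"]
incident_case = ["AAAAAAAAAT"]
unrelated_case = ["GGGGGGGGGG"]
print(assess_pair(source_case, incident_case, threshold=0.03))   # (True, 'a')
print(assess_pair(source_case, unrelated_case, threshold=0.03))  # (False, None)
```

Using the minimum over all variant pairs, rather than a consensus-to-consensus distance, lets a single shared or near-identical variant reveal a link even when the two populations differ overall.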


Journal of Virology | 2014

Analysis of the Evolution and Structure of a Complex Intrahost Viral Population in Chronic Hepatitis C Virus Mapped by Ultradeep Pyrosequencing

Brendan A. Palmer; Zoya Dimitrova; Pavel Skums; Orla Crosbie; Elizabeth Kenny-Walsh; Liam J. Fanning

Abstract: Hepatitis C virus (HCV) causes chronic infection in 50% to 80% of infected individuals. Hypervariable region 1 (HVR1) variability is frequently studied to gain insight into the mechanisms of HCV adaptation during chronic infection, but the changes to and persistence of HCV subpopulations during intrahost evolution are poorly understood. In this study, we used ultradeep pyrosequencing (UDPS) to map the viral heterogeneity of a single patient over 9.6 years of chronic HCV genotype 4a infection. Informed error correction of the raw UDPS data was performed using a temporally matched clonal data set. The resulting data set revealed low-frequency recombinants throughout the study period, implying that recombination is an active mechanism through which HCV can explore novel sequence space. The data indicate that polyvirus infection of hepatocytes has occurred but that the fitness quotients of recombinant daughter virions are too low for them to compete against the parental genomes. The subpopulations of parental genomes contributing to the recombination events highlighted a dynamic virome in which subpopulations of variants are in competition. In addition, we provide direct evidence of subdominant populations growing to dominance in the absence of a detectable humoral response.

Importance: Analysis of ultradeep pyrosequencing data sets derived from virus amplicons frequently relies on software tools that are not optimized for amplicon analysis, assume random incorporation of sequencing errors, and favor specificity at the expense of sensitivity. Such analysis is further complicated by the presence of hypervariable regions. In this study, we used a temporally matched reference sequence data set to inform error-correction algorithms. Using this methodology, we were able to (i) detect multiple instances of hepatitis C virus intrasubtype recombination at the E1/E2 junction (a phenomenon rarely reported in the literature) and (ii) interrogate the longitudinal quasispecies complexity of the virome. In parallel with the UDPS, isolation of IgG-bound virions was found to coincide with the collapse of specific viral subpopulations.


BMC Bioinformatics | 2013

Reconstruction of viral population structure from next-generation sequencing data using multicommodity flows

Pavel Skums; Nicholas Mancuso; Alexander Artyomenko; Bassam Tork; Ion I. Mandoiu; Yuri Khudyakov; Alexander Zelikovsky

Background: Highly mutable RNA viruses exist in infected hosts as heterogeneous populations of genetically close variants known as quasispecies. Next-generation sequencing (NGS) allows for analysing a large number of viral sequences from infected patients, presenting a novel opportunity for studying the structure of a viral population and understanding virus evolution, drug resistance and immune escape. Accurate reconstruction of the genetic composition of intra-host viral populations involves assembling the NGS short reads into whole-genome sequences and estimating the frequencies of individual viral variants. Although a few approaches have been developed for this task, accurate reconstruction of quasispecies populations remains largely unresolved.

Results: Two new methods based on Multicommodity Flows (MCFs), AmpMCF and ShotMCF, were developed for reconstructing whole-genome intra-host viral variants and estimating their frequencies. AmpMCF was designed for NGS reads obtained from individual PCR amplicons, and ShotMCF for NGS shotgun reads. AmpMCF, based on a covering formulation, identifies a minimal set of quasispecies explaining all observed reads, whereas ShotMCF, based on a packing formulation, engages the maximal number of reads to generate the most probable set of quasispecies. Both methods were evaluated on simulated data in comparison with Maximum Bandwidth and ViSpA, previously developed state-of-the-art algorithms for estimating quasispecies spectra from NGS amplicon and shotgun reads, respectively. Both algorithms were accurate in estimating quasispecies frequencies, especially from large datasets.

Conclusions: The problem of viral population reconstruction from amplicon or shotgun NGS reads was solved using the MCF formulation. The two methods developed here, ShotMCF and AmpMCF, afford accurate reconstruction of the structure of intra-host viral populations from NGS reads. The implementations of the algorithms are available at http://alan.cs.gsu.edu/vira.html (AmpMCF) and http://alan.cs.gsu.edu/NGS/?q=content/shotmcf (ShotMCF).
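The covering idea behind AmpMCF, choosing a small set of quasispecies that together explain every observed read, can be illustrated with the classical greedy set-cover heuristic. This sketch swaps in greedy set cover for the paper's multicommodity-flow formulation, so it only approximates a minimal set. It assumes candidate full-length haplotypes are already enumerated and that each error-free read is given with its known start offset. It does not estimate frequencies or cover the packing formulation used by ShotMCF. The names `explains` and `greedy_cover` are hypothetical.

```python
def explains(haplotype, read):
    """A read given as (start, sequence) is explained if it matches the haplotype at that offset."""
    start, seq = read
    return haplotype[start:start + len(seq)] == seq


def greedy_cover(candidates, reads):
    """Greedily pick haplotypes until every read is explained or no candidate helps."""
    uncovered = set(range(len(reads)))
    chosen = []
    while uncovered:
        best, best_hits = None, set()
        for hap in candidates:
            hits = {i for i in uncovered if explains(hap, reads[i])}
            if len(hits) > len(best_hits):  # ties keep the earlier candidate
                best, best_hits = hap, hits
        if best is None:
            break  # leftover reads match no candidate haplotype
        chosen.append(best)
        uncovered -= best_hits
    return chosen, uncovered


candidates = ["AAAACCCC", "AAAAGGGG", "TTTTCCCC"]
reads = [(0, "AAAA"), (4, "CCCC"), (4, "GGGG"), (2, "AACC")]
print(greedy_cover(candidates, reads))  # (['AAAACCCC', 'AAAAGGGG'], set())
```

Any indices still in the returned set mark reads that no candidate explains, which in practice would point to missing candidates or residual sequencing errors.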


In Silico Biology | 2011

Reconstructing viral quasispecies from NGS amplicon reads

Nicholas Mancuso; Bassam Tork; Pavel Skums; Lilia Ganova-Raeva; Ion Măndoiu; Alexander Zelikovsky

This paper addresses the problem of reconstructing viral quasispecies from next-generation sequencing reads obtained from amplicons (i.e., reads generated from predefined amplified overlapping regions). We compare the parsimonious and likelihood models for this problem and propose several novel assembly algorithms. The proposed methods have been validated on simulated error-free HCV amplicon reads and real HBV amplicon reads. The new algorithms outperform the method of Prosperi et al. Our experiments also show that in most cases viral quasispecies can be reconstructed more accurately from amplicon reads than from shotgun reads. All algorithms have been implemented and made available at https://bitbucket.org/nmancuso/bioa/.
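Because amplicons are predefined overlapping regions, full-length candidates can be assembled by chaining variants of consecutive amplicons whose overlaps agree. The sketch below enumerates every such chain. It is a hedged illustration, not the paper's assembler: it assumes error-free reads already collapsed into distinct variants per amplicon and a single fixed overlap length, both hypothetical simplifications. The names `consistent` and `assemble_paths` are illustrative.

```python
def consistent(left, right, overlap):
    """True if the last `overlap` bases of left equal the first `overlap` bases of right."""
    return left[-overlap:] == right[:overlap]


def assemble_paths(amplicon_variants, overlap):
    """Chain variants of consecutive amplicons whose overlaps agree; return full-length sequences."""
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    # Each partial path is (assembled sequence so far, last amplicon variant used).
    partial = [(v, v) for v in amplicon_variants[0]]
    for variants in amplicon_variants[1:]:
        partial = [
            (seq + v[overlap:], v)
            for seq, last in partial
            for v in variants
            if consistent(last, v, overlap)
        ]
    return [seq for seq, _ in partial]


amplicons = [["AAACG", "AAATG"], ["CGTTT", "TGCCC"]]
print(assemble_paths(amplicons, overlap=2))  # ['AAACGTTT', 'AAATGCCC']
```

Exhaustive enumeration grows combinatorially when many variants share an overlap, which is why practical methods select among these chains under a parsimonious or likelihood model rather than reporting all of them.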


In Silico Biology | 2011

Evaluation of viral heterogeneity using next-generation sequencing, end-point limiting-dilution and mass spectrometry

Zoya Dimitrova; David S. Campo; Gilberto Vaughan; Lilia Ganova-Raeva; Yulin Lin; Joseph C. Forbi; Guoliang Xia; Pavel Skums; Brian Pearlman; Yuri Khudyakov

Hepatitis C virus sequence studies mainly focus on the viral amplicon containing hypervariable region 1 (HVR1) to obtain a sample of sequences from which several population genetics parameters can be calculated. Recent advances in sequencing methods allow for analyzing an unprecedented number of viral variants from infected patients and present a novel opportunity for understanding viral evolution, drug resistance and immune escape. In the present paper, we compared three recent technologies for amplicon analysis: (i) next-generation sequencing; (ii) clonal sequencing using end-point limiting dilution for isolation of individual sequence variants, followed by real-time PCR and sequencing; and (iii) mass spectrometry of base-specific cleavage reactions of a target sequence. These three technologies were used to assess intra-host diversity and inter-host genetic relatedness in HVR1 amplicons obtained from 38 patients (subgenotypes 1a and 1b). Assessments of intra-host diversity varied greatly between sequence-based and mass-spectrometry-based data. However, assessments of inter-host variability by all three technologies were equally accurate in identifying genetic relatedness among viral strains. These results support the application of all three technologies for molecular epidemiology and population genetics studies. Mass spectrometry is especially promising given its high throughput, low cost and results comparable with those of sequence-based methods.


Bioinformatics and Biomedicine | 2011

Viral quasispecies reconstruction from amplicon 454 pyrosequencing reads

Nicholas Mancuso; Bassam Tork; Pavel Skums; Ion I. Mandoiu; Alexander Zelikovsky

We consider the quasispecies spectrum reconstruction problem for amplicon reads. The main contribution of this paper is a set of methods for reconstructing HCV quasispecies from simulated error-free amplicon reads. Comparison with existing quasispecies spectrum reconstruction methods based on both shotgun and amplicon reads shows significant advantages of the proposed techniques. In most cases, even low coverage allows reconstruction of the majority of quasispecies and very accurate estimation of their frequencies in the simulated samples. The source code for all implemented algorithms is available at https://bitbucket.org/nmancuso/bioa/


Clinical Pharmacology & Therapeutics | 2014

Drug resistance of a viral population and its individual intrahost variants during the first 48 hours of therapy

David S. Campo; Pavel Skums; Zoya Dimitrova; Gilberto Vaughan; Joseph C. Forbi; Chong-Gee Teo; Yury Khudyakov; Daryl Lau

Using hepatitis C virus (HCV) and interferon (IFN) resistance as a proof of concept, we have devised a new method for calculating the effect of a drug on a viral population, as well as the resistance of the population's individual intrahost variants. By means of next-generation sequencing, HCV variants were obtained from sera collected at nine time points from 16 patients during the first 48 h after injection of IFN-α. IFN-resistance coefficients were calculated for individual variants using changes in their relative frequencies, and for the entire intrahost viral population using changes in viral titer. Population-wide resistance and the presence of IFN-resistant variants were highly associated with pegylated IFN-α2a/ribavirin treatment outcome at week 12 (P = 3.78 × 10^-5 and 0.0114, respectively). This new method allows accurate measurement of resistance based solely on changes in viral titer or in the relative frequency of intrahost viral variants during a short observation time.
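The abstract states that resistance coefficients come from changes in variant relative frequencies and in population viral titer over time, but it does not give the formulas. The sketch below uses one simple, clearly hypothetical stand-in: the least-squares slope of ln(relative frequency) per hour for a variant, and the slope of log10(titer) per hour for the population. It is not the paper's coefficient. It assumes all frequencies and titers are positive, and the names `variant_coefficient` and `population_coefficient` are illustrative.

```python
import math


def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx


def variant_coefficient(hours, rel_freqs):
    """Hypothetical per-variant score: slope of ln(relative frequency) per hour.

    A positive value means the variant gains share under treatment (relatively resistant).
    """
    return ls_slope(hours, [math.log(f) for f in rel_freqs])


def population_coefficient(hours, titers):
    """Hypothetical population score: slope of log10(viral titer) per hour.

    A negative value means the titer is falling (population responding to the drug).
    """
    return ls_slope(hours, [math.log10(t) for t in titers])


hours = [0, 24, 48]
print(variant_coefficient(hours, [0.1, 0.2, 0.4]))     # ~0.0289 (ln 2 per 24 h)
print(population_coefficient(hours, [1e6, 1e5, 1e4]))  # ~-0.0417 (one log10 drop per 24 h)
```

Fitting a slope over all sampled time points, rather than comparing only the first and last samples, uses every measurement and dampens the effect of a single noisy time point.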

Collaboration



Top Co-Authors

Yury Khudyakov, Centers for Disease Control and Prevention
Zoya Dimitrova, Centers for Disease Control and Prevention
David S. Campo, Centers for Disease Control and Prevention
Gilberto Vaughan, Centers for Disease Control and Prevention
Lilia Ganova-Raeva, Centers for Disease Control and Prevention
Ion I. Mandoiu, University of Connecticut
Olga Glebova, Georgia State University
Joseph C. Forbi, Centers for Disease Control and Prevention