Alexander Artyomenko
Georgia State University
Publication
Featured research published by Alexander Artyomenko.
BMC Bioinformatics | 2013
Pavel Skums; Nicholas Mancuso; Alexander Artyomenko; Bassam Tork; Ion I. Mandoiu; Yuri Khudyakov; Alexander Zelikovsky
Background: Highly mutable RNA viruses exist in infected hosts as heterogeneous populations of genetically close variants known as quasispecies. Next-generation sequencing (NGS) allows for analysing a large number of viral sequences from infected patients, presenting a novel opportunity for studying the structure of a viral population and understanding virus evolution, drug resistance and immune escape. Accurate reconstruction of the genetic composition of intra-host viral populations involves assembling the NGS short reads into whole-genome sequences and estimating frequencies of individual viral variants. Although a few approaches were developed for this task, accurate reconstruction of quasispecies populations remains largely unresolved. Results: Two new methods, AmpMCF and ShotMCF, for reconstruction of the whole-genome intra-host viral variants and estimation of their frequencies were developed, based on Multicommodity Flows (MCFs). AmpMCF was designed for NGS reads obtained from individual PCR amplicons and ShotMCF for NGS shotgun reads. While AmpMCF, based on a covering formulation, identifies a minimal set of quasispecies explaining all observed reads, ShotMCF, based on a packing formulation, engages the maximal number of reads to generate the most probable set of quasispecies. Both methods were evaluated on simulated data in comparison to Maximum Bandwidth and ViSpA, previously developed state-of-the-art algorithms for estimating quasispecies spectra from the NGS amplicon and shotgun reads, respectively. Both algorithms were accurate in estimation of quasispecies frequencies, especially from large datasets. Conclusions: The problem of viral population reconstruction from amplicon or shotgun NGS reads was solved using the MCF formulation. The two methods, ShotMCF and AmpMCF, developed here afford accurate reconstruction of the structure of intra-host viral populations from NGS reads. The implementations of the algorithms are available at http://alan.cs.gsu.edu/vira.html (AmpMCF) and http://alan.cs.gsu.edu/NGS/?q=content/shotmcf (ShotMCF).
Bioinformatics | 2015
Pavel Skums; Alexander Artyomenko; Olga Glebova; Ion I. Mandoiu; David S. Campo; Zoya Dimitrova; Alexander Zelikovsky; Yuri Khudyakov
Motivation: Next-generation sequencing (NGS) allows for analyzing a large number of viral sequences from infected patients, providing an opportunity to implement large-scale molecular surveillance of viral diseases. However, despite improvements in technology, traditional protocols for NGS of large numbers of samples are still highly cost- and labor-intensive. One possible cost-effective alternative is combinatorial pooling. Although a number of pooling strategies for consensus sequencing of DNA samples and detection of SNPs have been proposed, these strategies cannot be applied to sequencing of highly heterogeneous viral populations. Results: We developed a cost-effective and reliable protocol for sequencing of viral samples that combines NGS using barcoding and combinatorial pooling with a computational framework including algorithms for optimal virus-specific pool design and deconvolution of individual samples from sequenced pools. Evaluation of the framework on experimental and simulated data for hepatitis C virus showed that it substantially reduces sequencing costs and allows deconvolution of viral populations with high accuracy. Availability and implementation: The source code and experimental data sets are available at http://alan.cs.gsu.edu/NGS/?q=content/pooling.
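The pooling idea in this abstract can be illustrated with a toy sketch. It is not the published pool-design or deconvolution algorithm: it assumes, unrealistically, that no variant is shared between samples, and the function names (`design_pools`, `mix`, `deconvolve`) are hypothetical. Each sample is placed in a distinct set of pools, and a variant is attributed to the sample whose pool set matches the pools in which the variant was observed.

```python
from itertools import combinations

def design_pools(samples, n_pools, pools_per_sample=2):
    """Give each sample a distinct set of pools (its signature)."""
    signatures = list(combinations(range(n_pools), pools_per_sample))
    if len(signatures) < len(samples):
        raise ValueError("not enough pools for distinct signatures")
    return {s: frozenset(sig) for s, sig in zip(samples, signatures)}

def mix(design, sample_variants, n_pools):
    """Simulate sequencing: each pool holds the union of its samples' variants."""
    pools = [set() for _ in range(n_pools)]
    for sample, signature in design.items():
        for p in signature:
            pools[p] |= sample_variants[sample]
    return pools

def deconvolve(design, pools):
    """Assign each variant to the sample whose signature equals the pools it appears in.

    Assumption for this sketch: variants are unique to one sample.
    """
    by_signature = {sig: s for s, sig in design.items()}
    recovered = {s: set() for s in design}
    for variant in set().union(*pools):
        seen_in = frozenset(i for i, pool in enumerate(pools) if variant in pool)
        sample = by_signature.get(seen_in)
        if sample is not None:
            recovered[sample].add(variant)
    return recovered

# Six samples fit into four pools, since there are six ways to choose 2 of 4.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
sample_variants = {s: {f"{s}_v1", f"{s}_v2"} for s in samples}
design = design_pools(samples, n_pools=4)
pools = mix(design, sample_variants, n_pools=4)
recovered = deconvolve(design, pools)
```

In this toy setting four sequencing runs stand in for six. Real viral samples can share variants, which is why the abstract describes virus-specific design and deconvolution algorithms rather than a plain signature lookup.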
international conference on computational advances in bio and medical sciences | 2013
Alexander Artyomenko; Nicholas Mancuso; Alexander Zelikovsky; Pavel Skums; Ion I. Mandoiu
The main challenge in local viral quasispecies reconstruction is to eliminate sequencing errors while preserving the natural heterogeneity of the viral population. This paper presents a new approach to error correction via an expectation maximization (EM) method.
international conference on computational advances in bio and medical sciences | 2014
Pavel Skums; Alexander Artyomenko; Olga Glebova; Alexander Zelikovsky; David S. Campo; Zoya Dimitrova; Yury Khudyakov
We present a novel general tool for highly sensitive detection of genetic relatedness between highly heterogeneous viral samples based on the clustering of next-generation sequencing data. The tool may be used for detection of viral transmissions and outbreaks and for laboratory quality control.
BMC Genomics | 2017
Olga Glebova; Sergey Knyazev; Andrew Melnyk; Alexander Artyomenko; Yury Khudyakov; Alexander Zelikovsky; Pavel Skums
Background: RNA viruses such as HCV and HIV mutate at extremely high rates, and as a result, they exist in infected hosts as populations of genetically related variants. Recent advances in sequencing technologies make it possible to identify such populations at great depth. In particular, these technologies provide new opportunities for inference of relatedness between viral samples and for identification of transmission clusters and sources of infection, which are crucial tasks in viral outbreak investigations. Results: We present (i) an evolutionary simulation algorithm, Viral Outbreak InferenCE (VOICE), inferring genetic relatedness, (ii) an algorithm, MinDistB, detecting possible transmission using minimal distances between intra-host viral populations and sizes of their relative borders, and (iii) a non-parametric recursive clustering algorithm, Relatedness Depth (ReD), analyzing the structure of clusters to infer possible transmissions and their directions. All proposed algorithms were validated using real sequencing data from HCV outbreaks. Conclusions: All algorithms are applicable to the analysis of outbreaks of highly heterogeneous RNA viruses. Our experimental validation shows that they can successfully identify genetic relatedness between viral populations, as well as infer transmission clusters and outbreak sources.
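The minimal-distance ingredient mentioned for MinDistB can be sketched as follows. This is a simplified illustration only: it uses Hamming distance on aligned, equal-length sequences, ignores the border sizes that the abstract says MinDistB also uses, and the `threshold` parameter and function names are assumptions made for the example.

```python
def hamming(a, b):
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_distance(pop_a, pop_b):
    """Smallest Hamming distance between any variant of pop_a and any variant of pop_b."""
    return min(hamming(x, y) for x in pop_a for y in pop_b)

def possibly_related(pop_a, pop_b, threshold):
    """Flag two intra-host populations as possibly related if they come close enough.

    The threshold is a hypothetical tuning parameter, not a published value.
    """
    return min_distance(pop_a, pop_b) <= threshold

host_a = ["ACGTACGT", "ACGTACGA"]
host_b = ["ACGTACGG", "TTTTACGG"]
host_c = ["TTTTTTTT"]

close = min_distance(host_a, host_b)
far = min_distance(host_a, host_c)
```

Comparing all pairs of variants is quadratic in population size, which is acceptable for a sketch but would need care on deep sequencing data.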
research in computational molecular biology | 2016
Alexander Artyomenko; Nicholas C. Wu; Serghei Mangul; Eleazar Eskin; Ren Sun; Alexander Zelikovsky
As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous “swarm” of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct heterogeneous viral populations composed of rare, related mutant variants. In this paper, we present 2SNV, a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. 2SNV uses linkage between single nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing a titrated level of known viral mutant variants. Our method is able to accurately reconstruct clones with a frequency of 0.2% and to distinguish clones that differ in only two nucleotides distantly located on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. The open source implementation of 2SNV is freely available for download at http://alan.cs.gsu.edu/NGS/?q=content/2snv.
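The intuition behind using linkage to separate true variants from errors can be shown with a rough sketch. It is not the statistical test used by 2SNV: it simply compares how often two minor alleles co-occur on the same read with the count expected if they arose independently. The function names and the `min_ratio` and `min_support` thresholds are hypothetical.

```python
def cooccurrence(reads, i, a, j, b):
    """Return (observed, expected) counts of reads carrying allele a at i and b at j.

    The expected count assumes the two alleles occur independently,
    as random sequencing errors would.
    """
    n = len(reads)
    n_i = sum(r[i] == a for r in reads)
    n_j = sum(r[j] == b for r in reads)
    observed = sum(r[i] == a and r[j] == b for r in reads)
    expected = n_i * n_j / n
    return observed, expected

def looks_linked(reads, i, a, j, b, min_ratio=10.0, min_support=3):
    """Heuristic: the pair is linked if it co-occurs far more often than chance predicts."""
    observed, expected = cooccurrence(reads, i, a, j, b)
    return observed >= min_support and observed > min_ratio * expected

# A rare haplotype (1% of reads) carrying C at positions 0 and 3.
with_variant = ["AAAA"] * 990 + ["CAAC"] * 10
# The same allele counts, but produced by unrelated single errors.
errors_only = ["AAAA"] * 980 + ["CAAA"] * 10 + ["AAAC"] * 10

linked = looks_linked(with_variant, 0, "C", 3, "C")
unlinked = looks_linked(errors_only, 0, "C", 3, "C")
```

In the first data set the two alleles always travel together, so the observed count (10) far exceeds the count expected under independence (0.1). In the second they never share a read.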
international conference on computational advances in bio and medical sciences | 2014
Adrian Caciula; Olga Glebova; Alexander Artyomenko; Serghei Mangul; James Lindsay; Ion I. Mandoiu; Alexander Zelikovsky
We present a deterministic version of our novel Monte-Carlo Regression based method MCReg [1] for transcriptome quantification from RNA-Seq reads. Experiments on simulated and real datasets demonstrate better transcriptome frequency estimation accuracy compared to that of the existing tools which tend to skew the estimated frequency toward super-transcripts.
international conference on computational advances in bio and medical sciences | 2014
Alexander Artyomenko; Serghei Mangul; Nicholas C. Wu; Eleazar Eskin; Ren Sun; Alexander Zelikovsky
Pacific Biosciences (PacBio) sequencing provides thousands of reads with lengths of up to 10,000 bases. In most cases this length is enough to cover the entire region of interest; however, the technology has a high (≈ 15%) error rate. We propose a method for viral haplotype reconstruction that generalizes k-means clustering with Hamming distance and is capable of handling up to 25% random errors. When applied to PacBio reads from an Influenza A Virus (IAV) sample with ten variants, our method was able to reconstruct the four most frequent variants.
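A plain k-means-style loop with Hamming distance, which the method above is said to generalize, might look like the sketch below. It assumes aligned reads of equal length and uses the column-wise majority base as the cluster center; the function names and the fixed `k` are illustrative choices, not details taken from the paper.

```python
import random
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def consensus(reads):
    """Column-wise majority base of a non-empty list of equal-length reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

def cluster_reads(reads, k, iterations=20, seed=0):
    """Alternate between assigning reads to the nearest center and recomputing centers."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(reads)), k)  # start from k distinct reads
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for r in reads:
            nearest = min(range(k), key=lambda c: hamming(r, centers[c]))
            groups[nearest].append(r)
        # An empty cluster keeps its previous center.
        new_centers = [consensus(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, groups

# Two haplotypes, each observed five times cleanly and once with an error.
reads = ["AAAAAAAA"] * 5 + ["AAAAAAAC"] + ["TTTTTTTT"] * 5 + ["TTTTTTTG"]
centers, groups = cluster_reads(reads, k=2)
```

The consensus step is what absorbs random errors: an error at a given position is outvoted as long as most reads in the cluster carry the true base there.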
bioRxiv | 2018
Sergey Knyazev; Viachaslau Tsyvina; Andrew Melnyk; Alexander Artyomenko; Tatiana Malygina; Yuri Porozov; Ellsworth Campbell; William M. Switzer; Pavel Skums; Alexander Zelikovsky
Rapidly evolving RNA viruses continuously produce viral minority variants following infection that can quickly spread and become dominant variants if they are drug-resistant or can better evade the immune system. Early detection of minority viral variants may help to promptly change a patient's treatment plan, preventing potential disease complications. Next-generation sequencing (NGS) technologies used for viral testing have recently gained popularity in commercial pipelines and can efficiently identify minority variants. Unfortunately, NGS data require nontrivial computational analyses to eliminate sequencing noise, and current computational haplotyping methods do not adequately address this challenging task. To overcome this limitation, we developed CliqueSNV, which finds statistically linked mutations from the same haplotypes to detect minor mutations with an abundance below the sequencing error rates. We compared the performance of CliqueSNV with five state-of-the-art methods on six benchmarks, including three Illumina and one PacBio in vitro sequencing datasets. We show that CliqueSNV can assemble viral haplotypes with frequencies as low as 0.1%. CliqueSNV haplotype predictions were the closest to the experimental viral populations and showed up to a 2.94-fold improvement on Illumina data and a 19.3-fold improvement on PacBio data.

Highly mutable RNA viruses such as influenza A virus, human immunodeficiency virus and hepatitis C virus exist in infected hosts as highly heterogeneous populations of closely related genomic variants. The presence of low-frequency variants with few mutations with respect to major strains may result in immune escape, emergence of drug resistance, and an increase in virulence and infectivity. Next-generation sequencing technologies permit detection of intra-host viral populations at extremely great depth, thus providing an opportunity to access low-frequency variants. Long read lengths offered by single-molecule sequencing technologies allow all viral variants to be sequenced in a single pass. However, high sequencing error rates limit the ability to study heterogeneous viral populations composed of rare, closely related variants. In this article, we present CliqueSNV, a novel reference-based method for reconstruction of viral variants from NGS data. It efficiently constructs an allele graph based on linkage between single nucleotide variations and identifies true viral variants by merging cliques of that graph using combinatorial optimization techniques. The new method outperforms existing methods in both accuracy and running time on experimental and simulated NGS data for titrated levels of known viral variants. For PacBio reads, it accurately reconstructs variants with frequency as low as 0.1%. For Illumina reads, it fully reconstructs main variants. The open source implementation of CliqueSNV is freely available for download at https://github.com/vyacheslav-tsivina/CliqueSNV

Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient's treatment plan, preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing (NGS), but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that remains open. Here we propose CliqueSNV, based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identification of minority haplotypes with frequencies below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short- and long-read sequencing platforms.
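The clique step described in these abstracts can be illustrated with a generic maximal-clique enumeration. The sketch assumes the graph of linked SNVs has already been built, with each vertex standing for a (position, allele) pair and each edge for a statistically linked pair. It uses the textbook Bron–Kerbosch recursion and does not reproduce CliqueSNV's own clique merging or optimization; `maximal_cliques` is a hypothetical helper name.

```python
def maximal_cliques(edges):
    """Enumerate maximal cliques of an undirected graph (basic Bron-Kerbosch)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))  # nothing can extend this clique
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques

# Vertices are (position, allele) pairs; edges join SNVs judged to be linked.
linked_pairs = [
    ((10, "C"), (57, "T")),
    ((10, "C"), (203, "G")),
    ((57, "T"), (203, "G")),
    ((88, "A"), (140, "G")),
]
cliques = maximal_cliques(linked_pairs)
```

Each maximal clique groups SNVs that are all pairwise linked, which makes it a candidate set of mutations carried together by one haplotype.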
international symposium on bioinformatics research and applications | 2017
Alexander Artyomenko; Pelin Burcak Icer; Pavel Skums; Yury Khudyakov; Alexander Zelikovsky
Intra-host genetic diversity of hepatitis C virus (HCV) plays a crucial role in disease progression and treatment outcome. Development of new treatment strategies, generation and validation of new biomedical hypotheses, development of algorithms and models for analysis of viral data, and understanding of viral evolution all require studying thousands of intra-host viral populations. Since such amounts of experimental data are not readily available, simulated data are required. However, to the best of our knowledge, there is currently no general framework for generating realistic intra-host HCV populations that takes into account complex interactions between virus and host, the impact of dynamic selection pressures, and statistical effects such as bottlenecks and genetic drift.