Alexander Artyomenko | Researchain

Publication

Featured research published by Alexander Artyomenko.

BMC Bioinformatics | 2013

Reconstruction of viral population structure from next-generation sequencing data using multicommodity flows

Pavel Skums; Nicholas Mancuso; Alexander Artyomenko; Bassam Tork; Ion I. Mandoiu; Yuri Khudyakov; Alexander Zelikovsky

Background: Highly mutable RNA viruses exist in infected hosts as heterogeneous populations of genetically close variants known as quasispecies. Next-generation sequencing (NGS) allows for analysing a large number of viral sequences from infected patients, presenting a novel opportunity for studying the structure of a viral population and understanding virus evolution, drug resistance and immune escape. Accurate reconstruction of the genetic composition of intra-host viral populations involves assembling the NGS short reads into whole-genome sequences and estimating frequencies of individual viral variants. Although a few approaches were developed for this task, accurate reconstruction of quasispecies populations remains largely unresolved. Results: Two new methods, AmpMCF and ShotMCF, for reconstruction of the whole-genome intra-host viral variants and estimation of their frequencies were developed, based on Multicommodity Flows (MCFs). AmpMCF was designed for NGS reads obtained from individual PCR amplicons and ShotMCF for NGS shotgun reads. While AmpMCF, based on a covering formulation, identifies a minimal set of quasispecies explaining all observed reads, ShotMCF, based on a packing formulation, engages the maximal number of reads to generate the most probable set of quasispecies. Both methods were evaluated on simulated data in comparison to Maximum Bandwidth and ViSpA, previously developed state-of-the-art algorithms for estimating quasispecies spectra from the NGS amplicon and shotgun reads, respectively. Both algorithms were accurate in estimation of quasispecies frequencies, especially from large datasets. Conclusions: The problem of viral population reconstruction from amplicon or shotgun NGS reads was solved using the MCF formulation. The two methods, ShotMCF and AmpMCF, developed here afford accurate reconstruction of the structure of intra-host viral populations from NGS reads.
The implementations of the algorithms are available at http://alan.cs.gsu.edu/vira.html (AmpMCF) and http://alan.cs.gsu.edu/NGS/?q=content/shotmcf (ShotMCF).

Bioinformatics | 2015

Computational framework for next-generation sequencing of heterogeneous viral populations using combinatorial pooling

Pavel Skums; Alexander Artyomenko; Olga Glebova; Ion I. Mandoiu; David S. Campo; Zoya Dimitrova; Alexander Zelikovsky; Yuri Khudyakov

MOTIVATION: Next-generation sequencing (NGS) allows for analyzing a large number of viral sequences from infected patients, providing an opportunity to implement large-scale molecular surveillance of viral diseases. However, despite improvements in technology, traditional protocols for NGS of large numbers of samples are still highly cost- and labor-intensive. One possible cost-effective alternative is combinatorial pooling. Although a number of pooling strategies for consensus sequencing of DNA samples and detection of SNPs have been proposed, these strategies cannot be applied to sequencing of highly heterogeneous viral populations. RESULTS: We developed a cost-effective and reliable protocol for sequencing of viral samples that combines NGS using barcoding and combinatorial pooling with a computational framework including algorithms for optimal virus-specific pool design and deconvolution of individual samples from sequenced pools. Evaluation of the framework on experimental and simulated data for hepatitis C virus showed that it substantially reduces sequencing costs and allows deconvolution of viral populations with high accuracy. AVAILABILITY AND IMPLEMENTATION: The source code and experimental data sets are available at http://alan.cs.gsu.edu/NGS/?q=content/pooling.

International Conference on Computational Advances in Bio and Medical Sciences | 2013

kGEM: An EM-based algorithm for local reconstruction of viral quasispecies

Alexander Artyomenko; Nicholas Mancuso; Alexander Zelikovsky; Pavel Skums; Ion I. Mandoiu

The main challenge in local viral quasispecies reconstruction is to eliminate sequencing errors while preserving the natural heterogeneity of the viral population. This paper presents a new approach to error correction via an expectation maximization (EM) method.
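As an illustration of how an EM scheme can separate sequencing errors from true heterogeneity, here is a minimal sketch. It is not the published kGEM procedure: the function name `em_haplotypes`, the uniform per-base error rate `eps`, the fixed seed haplotypes, and the iteration count are all assumptions made for the example. The E-step assigns each read a probability of originating from each candidate haplotype, and the M-step re-estimates haplotype frequencies and replaces each haplotype with the weighted consensus of its reads.

```python
from collections import defaultdict


def em_haplotypes(reads, seeds, eps=0.01, iters=10):
    """Toy EM over aligned, equal-length reads (illustrative only)."""
    haps = list(seeds)
    freqs = [1.0 / len(haps)] * len(haps)
    for _ in range(iters):
        # E-step: responsibility of each haplotype for each read,
        # assuming independent substitution errors with rate eps.
        resp = []
        for read in reads:
            weights = []
            for hap, freq in zip(haps, freqs):
                mism = sum(x != y for x, y in zip(read, hap))
                match = len(read) - mism
                weights.append(freq * (1 - eps) ** match * (eps / 3) ** mism)
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: frequencies are mean responsibilities ...
        freqs = [sum(p[j] for p in resp) / len(reads) for j in range(len(haps))]
        # ... and each haplotype becomes the responsibility-weighted consensus.
        new_haps = []
        for j in range(len(haps)):
            bases = []
            for pos in range(len(haps[j])):
                votes = defaultdict(float)
                for read, p in zip(reads, resp):
                    votes[read[pos]] += p[j]
                bases.append(max(votes, key=votes.get))
            new_haps.append("".join(bases))
        haps = new_haps
    return haps, freqs


# Hypothetical data: two true variants plus one read carrying an error.
reads = ["AAAAAAAA"] * 6 + ["AAATAAAA"] + ["CCCCCCCC"] * 3
haps, freqs = em_haplotypes(reads, seeds=["AAATAAAA", "CCCCCCCC"])
```

In this toy input the erroneous seed is pulled back to the consensus of the reads it explains, while the second variant is preserved because no read supports merging it with the first.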

International Conference on Computational Advances in Bio and Medical Sciences | 2014

Detection of genetic relatedness between viral samples using EM-based clustering of next-generation sequencing data

Pavel Skums; Alexander Artyomenko; Olga Glebova; Alexander Zelikovsky; David S. Campo; Zoya Dimitrova; Yury Khudyakov

We present a novel general tool for highly sensitive detection of genetic relatedness between highly heterogeneous viral samples based on the clustering of next-generation sequencing data. The tool may be used for detection of viral transmissions and outbreaks and for laboratory quality control.

BMC Genomics | 2017

Inference of genetic relatedness between viral quasispecies from sequencing data

Olga Glebova; Sergey Knyazev; Andrew Melnyk; Alexander Artyomenko; Yury Khudyakov; Alexander Zelikovsky; Pavel Skums

Background: RNA viruses such as HCV and HIV mutate at extremely high rates, and as a result, they exist in infected hosts as populations of genetically related variants. Recent advances in sequencing technologies make it possible to identify such populations at great depth. In particular, these technologies provide new opportunities for inference of relatedness between viral samples and for identification of transmission clusters and sources of infection, which are crucial tasks in viral outbreak investigations. Results: We present (i) an evolutionary simulation algorithm, Viral Outbreak InferenCE (VOICE), inferring genetic relatedness, (ii) an algorithm, MinDistB, detecting possible transmission using minimal distances between intra-host viral populations and sizes of their relative borders, and (iii) a non-parametric recursive clustering algorithm, Relatedness Depth (ReD), analyzing the structure of clusters to infer possible transmissions and their directions. All proposed algorithms were validated using real sequencing data from HCV outbreaks. Conclusions: All algorithms are applicable to the analysis of outbreaks of highly heterogeneous RNA viruses. Our experimental validation shows that they can successfully identify genetic relatedness between viral populations, as well as infer transmission clusters and outbreak sources.

Research in Computational Molecular Biology | 2016

Long Single-Molecule Reads Can Resolve the Complexity of the Influenza Virus Composed of Rare, Closely Related Mutant Variants

Alexander Artyomenko; Nicholas C. Wu; Serghei Mangul; Eleazar Eskin; Ren Sun; Alexander Zelikovsky

As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous “swarm” of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct heterogeneous viral populations composed of rare, related mutant variants. In this paper, we present 2SNV, a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. 2SNV uses linkage between single nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing a titrated level of known viral mutant variants. Our method is able to accurately reconstruct a clone with a frequency of 0.2% and distinguish clones that differ in only two nucleotides distantly located on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. The open source implementation of 2SNV is freely available for download at http://alan.cs.gsu.edu/NGS/?q=content/2snv.
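The idea of using linkage to tell true variations from read errors can be sketched as follows. This is a simplified illustration, not the statistic used by 2SNV: the function name `linked_snv_pairs`, the observed-to-expected ratio threshold `min_ratio`, and the minimum support `min_count` are assumptions. The intuition is that independent random errors at two positions rarely land on the same read, whereas two mutations belonging to one variant co-occur on every read of that variant.

```python
from itertools import combinations


def linked_snv_pairs(reads, reference, min_ratio=5.0, min_count=2):
    """Return pairs of non-reference alleles that co-occur far more often
    than expected if they were independent errors (illustrative only)."""
    n = len(reads)
    # Map each (position, base) mismatch to the set of reads carrying it.
    carriers = {}
    for idx, read in enumerate(reads):
        for pos, (base, ref) in enumerate(zip(read, reference)):
            if base != ref:
                carriers.setdefault((pos, base), set()).add(idx)
    linked = []
    items = sorted(carriers.items(), key=lambda kv: kv[0])
    for (key_a, set_a), (key_b, set_b) in combinations(items, 2):
        if key_a[0] == key_b[0]:
            continue  # two alleles at one position cannot share a read
        observed = len(set_a & set_b)
        expected = len(set_a) * len(set_b) / n  # under independence
        if observed >= min_count and observed >= min_ratio * expected:
            linked.append((key_a, key_b, observed))
    return linked


# Hypothetical data: a rare variant with two linked mutations,
# plus two reads that each carry a single isolated error.
reference = "AAAAAAAA"
reads = ["ACAAAGAA"] * 2 + ["AAATAAAA", "AAAAAATA"] + ["AAAAAAAA"] * 16
pairs = linked_snv_pairs(reads, reference)
```

Here only the two mutations of the rare variant are reported as linked; the isolated mismatches never share a read and are treated as errors.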

International Conference on Computational Advances in Bio and Medical Sciences | 2014

Deterministic regression algorithm for transcriptome frequency estimation

Adrian Caciula; Olga Glebova; Alexander Artyomenko; Serghei Mangul; James Lindsay; Ion I. Mandoiu; Alexander Zelikovsky

We present a deterministic version of our novel Monte-Carlo Regression based method MCReg [1] for transcriptome quantification from RNA-Seq reads. Experiments on simulated and real datasets demonstrate better transcriptome frequency estimation accuracy compared to that of the existing tools which tend to skew the estimated frequency toward super-transcripts.

International Conference on Computational Advances in Bio and Medical Sciences | 2014

Reconstruction of influenza A virus variants from PacBio reads

Alexander Artyomenko; Serghei Mangul; Nicholas C. Wu; Eleazar Eskin; Ren Sun; Alexander Zelikovsky

Pacific Biosciences (PacBio) sequencing provides thousands of reads with lengths of up to 10,000 bases. In most cases this length is enough to cover the entire region of interest; however, the technology has a high (≈ 15%) error rate. We propose a method for viral haplotype reconstruction that generalizes k-means clustering with Hamming distance and is capable of handling up to 25% random errors. When applied to PacBio reads from an Influenza A Virus (IAV) sample with ten variants, our method was able to reconstruct the four most frequent.
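For readers unfamiliar with the baseline being generalized, plain k-means over aligned reads with Hamming distance can be sketched like this. It shows only the standard clustering loop, not the authors' generalization; the function names and the farthest-first initialization are assumptions chosen to keep the example deterministic. Cluster centers are per-column majority consensus sequences.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus(reads):
    """Per-column majority base over a non-empty list of aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def init_centers(reads, k):
    """Farthest-first initialization (an assumption of this sketch)."""
    centers = [reads[0]]
    while len(centers) < k:
        centers.append(
            max(reads, key=lambda r: min(hamming(r, c) for c in centers))
        )
    return centers


def kmeans_hamming(reads, k, iters=20):
    centers = init_centers(reads, k)
    clusters = []
    for _ in range(iters):
        # Assignment step: each read joins its nearest center.
        clusters = [[] for _ in range(k)]
        for read in reads:
            j = min(range(k), key=lambda i: hamming(read, centers[i]))
            clusters[j].append(read)
        # Update step: each center becomes the consensus of its cluster.
        new_centers = [
            consensus(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical noisy reads drawn from two haplotypes.
reads = [
    "AAAAAAAAAA", "AAAATAAAAA", "AGAAAAAAAA", "AAAAAAAACA",
    "CCCCCCCCCC", "CCCGCCCCCC", "CCCCCCCTCC",
]
centers, clusters = kmeans_hamming(reads, k=2)
```

Because each center is a column-wise majority vote, isolated errors in individual reads are outvoted as long as they do not dominate any column of a cluster.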

bioRxiv | 2018

CliqueSNV: Scalable Reconstruction of Intra-Host Viral Populations from NGS Reads

Sergey Knyazev; Viachaslau Tsyvina; Andrew Melnyk; Alexander Artyomenko; Tatiana Malygina; Yuri Porozov; Ellsworth Campbell; William M. Switzer; Pavel Skums; Alexander Zelikovsky

Rapidly evolving RNA viruses continuously produce viral minority variants following infection that can quickly spread and become dominant variants if they are drug-resistant or can better evade the immune system. Early detection of minority viral variants may help to promptly change a patient's treatment plan, preventing potential disease complications. Next-generation sequencing (NGS) technologies used for viral testing have recently gained popularity in commercial pipelines and can efficiently identify minority variants. Unfortunately, NGS data require nontrivial computational analyses to eliminate sequencing noise, and current computational haplotyping methods do not adequately address this challenging task. To overcome this limitation, we developed CliqueSNV, which finds statistically linked mutations from the same haplotypes to detect minor mutations with an abundance below the sequencing error rates. We compared the performance of CliqueSNV with five state-of-the-art methods on six benchmarks, including three Illumina and one PacBio in vitro sequencing datasets. We show that CliqueSNV can assemble viral haplotypes with frequencies as low as 0.1%. CliqueSNV haplotype predictions were the closest to the experimental viral populations and showed up to a 2.94-fold improvement on Illumina data and a 19.3-fold improvement on PacBio data.

Highly mutable RNA viruses such as influenza A virus, human immunodeficiency virus and hepatitis C virus exist in infected hosts as highly heterogeneous populations of closely related genomic variants. The presence of low-frequency variants with few mutations with respect to major strains may result in immune escape, emergence of drug resistance, and an increase of virulence and infectivity. Next-generation sequencing technologies permit detection of a sample's intra-host viral population at great depth, thus providing an opportunity to access low-frequency variants. Long read lengths offered by single-molecule sequencing technologies allow all viral variants to be sequenced in a single pass. However, high sequencing error rates limit the ability to study heterogeneous viral populations composed of rare, closely related variants. In this article, we present CliqueSNV, a novel reference-based method for reconstruction of viral variants from NGS data. It efficiently constructs an allele graph based on linkage between single nucleotide variations and identifies true viral variants by merging cliques of that graph using combinatorial optimization techniques. The new method outperforms existing methods in both accuracy and running time on experimental and simulated NGS data for titrated levels of known viral variants. For PacBio reads, it accurately reconstructs variants with frequency as low as 0.1%. For Illumina reads, it fully reconstructs main variants. The open source implementation of CliqueSNV is freely available for download at https://github.com/vyacheslav-tsivina/CliqueSNV.

Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient's treatment plan, preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing (NGS), but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that still remains open. Here we propose CliqueSNV, based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identifying minority haplotypes with a frequency below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short- and long-read sequencing platforms.

International Symposium on Bioinformatics Research and Applications | 2017

Agent-Based in Silico Evolution of HCV Quasispecies

Alexander Artyomenko; Pelin Burcak Icer; Pavel Skums; Yury Khudyakov; Alexander Zelikovsky

Intra-host genetic diversity of hepatitis C virus (HCV) plays a crucial role in disease progression and treatment outcome. Development of new treatment strategies, generation and validation of new biomedical hypotheses, development of algorithms and models for analysis of viral data, and understanding of viral evolution all require the study of thousands of intra-host viral populations. Since such amounts of experimental data are not readily available, simulated data are required. However, to the best of our knowledge, there is currently no general framework for generation of realistic intra-host HCV populations that takes into account complex interactions between virus and host, the impact of dynamic selection pressures, and statistical effects such as bottlenecks and genetic drift.
