Sergey Aganezov
George Washington University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Sergey Aganezov.
Computational Biology and Chemistry | 2015
Sergey Aganezov; Nadia Sitdykova; Max A. Alekseyev
Advances in DNA sequencing technology over the past decade have increased the volume of raw sequenced genomic data available for further assembly and analysis. While there exist many algorithms for assembly of sequenced genomic material, they often experience difficulties in constructing complete genomic sequences. Instead, they produce long genomic subsequences (scaffolds), which then become a subject to scaffold assembly aimed at reconstruction of their order along genome chromosomes. The balance between reliability and cost for scaffold assembly is not there just yet, which inspires one to seek for new approaches to address this problem. We present a new method for scaffold assembly based on the analysis of gene orders and genome rearrangements in multiple related genomes (some or even all of which may be fragmented). Evaluation of the proposed method on artificially fragmented mammalian genomes demonstrates its high reliability. We also apply our method for incomplete anophelinae genomes, which expose high fragmentation, and further validate the assembly results with referenced-based scaffolding. While the two methods demonstrate consistent results, the proposed method is able to identify more assembly points than the reference-based scaffolding.
international symposium on bioinformatics research and applications | 2016
Sergey Aganezov; Max A. Alekseyev
Advances in the DNA sequencing technology over the past decades have increased the volume of raw sequenced genomic data available for further assembly and analysis. While there exist many software tools for assembly of sequenced genomic material, they often experience difficulties with reconstructing complete chromosomes. Major obstacles include uneven read coverage and long similar subsequences (repeats) in genomes. Assemblers therefore often are able to reliably reconstruct only long subsequences, called scaffolds.
international conference on computational advances in bio and medical sciences | 2016
Sergey Aganezov; Max A. Alekseyev
Motivation: Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose positions and orientations along the genome are unknown. While there exists a number of methods for reconstruction of the genome from its scaffolds, utilizing various computational and wet-lab techniques, they often can produce only partial error-prone scaffold assemblies. It therefore becomes important to compare and merge scaffold assemblies produced by different methods, thus combining their advantages and highlighting present conflicts for further investigation. These tasks may be labor intensive if performed manually.
bioRxiv | 2018
Robert M. Waterhouse; Sergey Aganezov; Yoann Anselmetti; Jiyoung Lee; Livio Ruzzante; Maarten Jmf Reijnders; Sèverine Bérard; Phillip George; Matthew W. Hahn; Paul I. Howell; Maryam Kamali; Sergey Koren; Daniel Lawson; Gareth Maslen; Ashley Peery; Adam M. Phillippy; Maria V. Sharakhova; Eric Tannier; Maria F. Unger; Simo V. Zhang; Max A. Alekseyev; Nora J. Besansky; Cedric Chauve; Scott J. Emrich; Igor V. Sharakhov
Background New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies. Results We employed three gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: six with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and three with new assemblies based on re-scaffolding or Pacific Biosciences long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: seven for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further seven with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi. Conclusions Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our comparisons show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.While new sequencing technologies have lowered financial barriers to whole genome sequencing, resulting assemblies are often fragmented and far from 9finished9. Subsequent improvements towards chromosomal-level status can be achieved by both experimental and computational approaches. Requiring only annotated assemblies and gene orthology data, comparative genomics approaches that aim to capture evolutionary signals to predict scaffold neighbours (adjacencies) offer potentially substantive improvements without the costs associated with experimental scaffolding or re-sequencing. We leverage the combined detection power of three such gene synteny-based methods applied to 21 Anopheles mosquito assemblies with variable contiguity levels to produce consensus sets of scaffold adjacency predictions. Three complementary validations were performed on subsets of assemblies with additional supporting data: six with physical mapping data; 13 with paired-end RNA sequencing (RNAseq) data; and three with new assemblies based on re-scaffolding or incorporating Pacific Biosciences (PacBio) sequencing data. Improved assemblies were built by integrating the consensus adjacency predictions with supporting experimental data, resulting in 20 new reference assemblies with improved contiguities. Combined with physical mapping data for six anophelines, chromosomal positioning of scaffolds improved assembly anchoring by 47% for A. funestus and 38% A. stephensi. Reconciling an A. funestus PacBio assembly with synteny-based and RNAseq-based adjacencies and physical mapping data resulted in a new 81.5% chromosomally mapped reference assembly and cytogenetic photomap. While complementary experimental data are clearly key to achieving high-quality chromosomal-level assemblies, our assessments and validations of gene synteny-based computational methods highlight the utility of applying comparative genomics approaches to improve community genomic resources.
research in computational molecular biology | 2017
Sergey Aganezov; Max A. Alekseyev
Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose order and/or orientation (i.e., strand) in the genome are unknown. There exist various scaffold assembly methods, which attempt to determine the order and orientation of scaffolds along the genome chromosomes. Some of these methods (e.g., based on FISH physical mapping, chromatin conformation capture, etc.) can infer the order of scaffolds, but not necessarily their orientation. This leads to a special case of the scaffold orientation problem (i.e., deducing the orientation of each scaffold) with a known order of the scaffolds.
international conference on computational advances in bio and medical sciences | 2017
Sergey Aganezov; Max A. Alekseyev
Introduction: There exists a number of methods that attempt to reconstruct a genome from a set of scaffolds. To do so, they (i) determine the order of scaffolds; and (ii) determine the orientation (i.e., strand of origin) of scaffolds. Some methods attempt to solve these subproblems jointly by using various types of additional data including jumping libraries, long error-prone reads, homology relationships between genomes, etc. Other methods (typically based on wet-lab experiments) can often reliably reconstruct the order of scaffolds, but may fail to impose their orientation. This inspires us to consider the special case of the problem (ii) where the order of scaffolds is given.
BMC Bioinformatics | 2017
Sergey Aganezov; Max A. Alekseyev
BackgroundDespite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose positions and orientations along the genome are unknown. While there exists a number of methods for reconstruction of the genome from its scaffolds, utilizing various computational and wet-lab techniques, they often can produce only partial error-prone scaffold assemblies. It therefore becomes important to compare and merge scaffold assemblies produced by different methods, thus combining their advantages and highlighting present conflicts for further investigation. These tasks may be labor intensive if performed manually.ResultsWe present CAMSA—a tool for comparative analysis and merging of two or more given scaffold assemblies. The tool (i) creates an extensive report with several comparative quality metrics; (ii) constructs the most confident merged scaffold assembly; and (iii) provides an interactive framework for a visual comparative analysis of the given assemblies. Among the CAMSA features, only scaffold merging can be evaluated in comparison to existing methods. Namely, it resembles the functionality of assembly reconciliation tools, although their primary targets are somewhat different. Our evaluations show that CAMSA produces merged assemblies of comparable or better quality than existing assembly reconciliation tools while being the fastest in terms of the total running time.ConclusionsCAMSA addresses the current deficiency of tools for automated comparison and analysis of multiple assemblies of the same set scaffolds. Since there exist numerous methods and techniques for scaffold assembly, identifying similarities and dissimilarities across assemblies produced by different methods is beneficial both for the developers of scaffold assembly algorithms and for the researchers focused on improving draft assemblies of specific organisms.
Journal of Computational Biology | 2016
Pavel Avdeyev; Shuai Jiang; Sergey Aganezov; Fei Hu; Max A. Alekseyev
BMC Bioinformatics | 2012
Sergey Aganezov; Max A. Alekseyev
BMC Genomics | 2016
Yu-Chen Liu; Sheng-Da Hsu; Chih-Hung Chou; Wei-Yun Huang; Yu-Hung Chen; Chia-Yu Liu; Guan-Jay Lyu; Shao-Zhen Huang; Sergey Aganezov; Max A. Alekseyev; Chung-Der Hsiao; Hsien-Da Huang