Publication


Featured research published by Marghoob Mohiyuddin.


Bioinformatics | 2012

Fast and accurate read alignment for resequencing

John C. Mu; Hui Jiang; Amirhossein Kiani; Marghoob Mohiyuddin; Narges Bani Asadi; Wing Hung Wong

MOTIVATION: Next-generation sequence analysis has become an important task in both laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. The accurate alignment of reads with large indels is a computationally challenging task for researchers. RESULTS: We introduce SeqAlto as a new algorithm for read alignment. For reads longer than or equal to 100 bp, SeqAlto is up to 10× faster than existing algorithms, while retaining high accuracy and the ability to align reads with large (up to 50 bp) indels. This improvement in efficiency is particularly important in the analysis of future sequencing data where the number of reads approaches many billions. Furthermore, SeqAlto uses less than 8 GB of memory to align against the human genome. SeqAlto is benchmarked against several existing tools with both real and simulated data. AVAILABILITY: Linux and Mac OS X binaries free for academic use are available at http://www.stanford.edu/group/wonglab/seqalto CONTACT: [email protected].


Nature Communications | 2015

Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms

Alexej Abyzov; Shantao Li; Daniel Rhee Kim; Marghoob Mohiyuddin; Adrian M. Stütz; Nicholas F. Parrish; Xinmeng Jasmine Mu; Wyatt T. Clark; Ken Chen; Jan O. Korbel; Hugo Y. K. Lam; Charles Lee; Mark Gerstein

Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyze 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find breakpoints have more nearby SNPs and indels than the genomic average, likely a consequence of relaxed selection. By investigating the correlation of breakpoints with DNA methylation, Hi-C interactions, and histone marks and the substitution patterns of nucleotides near them, we find that breakpoints with the signature of non-allelic homologous recombination (NAHR) are associated with open chromatin.
We hypothesize that some NAHR deletions occur without DNA replication and cell division, in embryonic and germline cells. In contrast, breakpoints associated with non-homologous (NH) mechanisms often have sequence micro-insertions, templated from later replicating genomic sites, spaced at two characteristic distances from the breakpoint. These micro-insertions are consistent with template-switching events and suggest a particular spatiotemporal configuration for DNA during the events.


Bioinformatics | 2015

MetaSV: An accurate and integrative structural-variant caller for next generation sequencing

Marghoob Mohiyuddin; John C. Mu; Jian Li; Narges Bani Asadi; Mark Gerstein; Alexej Abyzov; Wing Hung Wong; Hugo Y. K. Lam

Summary: Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy particularly for insertions. We propose MetaSV, an integrated SV caller which leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV proceeds by merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes. Using simulation and experimental data, we demonstrate the effectiveness of MetaSV across various SV types and sizes. Availability and implementation: Code in Python is at http://bioinform.github.io/metasv/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
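The merging step described in the abstract can be illustrated with a minimal sketch. This is not MetaSV's actual code: the `SVCall` record, the reciprocal-overlap criterion, the 0.5 threshold, and the tool names in the usage note are all assumptions chosen for illustration. The sketch merges same-type calls from different callers when their intervals overlap sufficiently and records which tools support each merged call.

```python
from dataclasses import dataclass, field


@dataclass
class SVCall:
    """Hypothetical record for one structural-variant call."""
    chrom: str
    start: int      # 0-based inclusive start
    end: int        # exclusive end
    sv_type: str    # e.g. "DEL", "DUP", "INV"
    tools: set = field(default_factory=set)  # callers supporting this call


def reciprocal_overlap(a, b):
    """Overlap length divided by the longer interval's length.

    A value >= t means the overlap covers at least a fraction t
    of both intervals.
    """
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def merge_calls(calls, min_overlap=0.5):
    """Greedily merge same-type calls on the same chromosome."""
    merged = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.sv_type, c.start)):
        last = merged[-1] if merged else None
        if (last is not None
                and last.chrom == call.chrom
                and last.sv_type == call.sv_type
                and reciprocal_overlap(last, call) >= min_overlap):
            # Keep the union of the two intervals and every supporting tool.
            last.start = min(last.start, call.start)
            last.end = max(last.end, call.end)
            last.tools |= call.tools
        else:
            # Copy the call so the input list is left unmodified.
            merged.append(SVCall(call.chrom, call.start, call.end,
                                 call.sv_type, set(call.tools)))
    return merged
```

For example, two deletions at chr1:100-200 and chr1:110-210 reported by hypothetical callers `toolA` and `toolB` would collapse into a single chr1:100-210 deletion supported by both, while a duplication at the same position stays separate.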


Bioinformatics | 2015

VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications

John C. Mu; Marghoob Mohiyuddin; Jian Li; Narges Bani Asadi; Mark Gerstein; Alexej Abyzov; Wing Hung Wong; Hugo Y. K. Lam

Summary: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. Availability and implementation: Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
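The idea of comparing variants binned in size ranges can be sketched as follows. This is an illustration only, not VarSim's implementation: the bin edges, the `(chrom, pos, sv_type, length)` tuple layout, and the exact-match criterion are assumptions made to keep the example short.

```python
from bisect import bisect_right

# Hypothetical bin edges in base pairs; bin i covers [edges[i], edges[i + 1]).
BIN_EDGES = [1, 50, 1000, 100000]


def size_bin(length, edges=BIN_EDGES):
    """Return a label such as '50-999' for the bin containing `length`."""
    i = bisect_right(edges, length) - 1
    if i < 0:
        raise ValueError(f"length {length} is below the smallest bin edge")
    if i == len(edges) - 1:
        return f"{edges[-1]}+"
    return f"{edges[i]}-{edges[i + 1] - 1}"


def compare_by_size(truth, called, edges=BIN_EDGES):
    """Count true positives, false positives and false negatives per size bin.

    `truth` and `called` are sets of (chrom, pos, sv_type, length) tuples.
    A call counts as correct only if it matches a truth variant exactly,
    which is a simplification made for this sketch.
    """
    stats = {}
    for variant in truth | called:
        counts = stats.setdefault(size_bin(variant[3], edges),
                                  {"tp": 0, "fp": 0, "fn": 0})
        if variant in truth and variant in called:
            counts["tp"] += 1
        elif variant in called:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return stats
```

Reporting counts per bin rather than one global figure makes it visible when a caller performs well on small indels but poorly on large structural variants.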


Nature Communications | 2017

Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis

Sayed Mohammad Ebrahim Sahraeian; Marghoob Mohiyuddin; Robert Sebra; Hagen Tilgner; Pegah Tootoonchi Afshar; Kin Fai Au; Narges Bani Asadi; Mark Gerstein; Wing Hung Wong; Michael Snyder; Eric E. Schadt; Hugo Y. K. Lam

RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since it debuted. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome.

RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.


Genome Biology | 2015

An ensemble approach to accurately detect somatic mutations using SomaticSeq

Li Tai Fang; Pegah Tootoonchi Afshar; Aparna Chhibber; Marghoob Mohiyuddin; Yu Fan; John C. Mu; Greg Gibeling; Sharon Barr; Narges Bani Asadi; Mark Gerstein; Daniel C. Koboldt; Wenyi Wang; Wing Hung Wong; Hugo Y. K. Lam

SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated.
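To make the idea of an adaptively boosted decision-tree classifier concrete, here is a minimal AdaBoost sketch over one-level decision trees (stumps). It is not SomaticSeq's code: the two features used in the usage note (number of supporting callers, variant allele fraction), the toy data, and the labels are assumptions for illustration, and a real pipeline would use far more features and an established library.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    """Predict +1 or -1 for one feature row using a trained stump."""
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost(X, y, rounds=10):
    """Train an AdaBoost ensemble; labels in `y` must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the log below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Weighted vote of all stumps: +1 (mutation) or -1 (artifact)."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

In use, each candidate site would become one feature row, for example `[3, 0.30]` for a site reported by three callers at 30% allele fraction, and `predict` would return the ensemble's vote on whether the site is a true somatic mutation.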


bioRxiv | 2018

Deep convolutional neural networks for accurate somatic mutation detection

Sayed Mohammad Ebrahim Sahraeian; Ruolin Liu; Bayo Lau; Marghoob Mohiyuddin; Hugo Y. K. Lam

We present NeuSomatic, the first convolutional neural network approach for somatic mutation detection, which significantly outperforms previous methods on different sequencing platforms, sequencing strategies, and tumor purities. NeuSomatic summarizes sequence alignments into small matrices and incorporates more than a hundred features to capture mutation signals effectively. It can be used universally as a stand-alone somatic mutation detection method or with an ensemble of existing methods to achieve the highest accuracy.


Bioinformatics | 2016

LongISLND: in silico sequencing of lengthy and noisy datatypes

Bayo Lau; Marghoob Mohiyuddin; John C. Mu; Li Tai Fang; Narges Bani Asadi; Carolina Dallett; Hugo Y. K. Lam

Summary: LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. Availability and Implementation: LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Scientific Reports | 2015

Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods

John C. Mu; Pegah Tootoonchi Afshar; Marghoob Mohiyuddin; Xi Chen; Jian Li; Narges Bani Asadi; Mark Gerstein; Wing Hung Wong; Hugo Y. K. Lam

A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools.


BMC Genomics | 2016

svclassify: a method to establish benchmark structural variant calls

Hemang Parikh; Marghoob Mohiyuddin; Hugo Y. K. Lam; Hariharan K. Iyer; Desu Chen; Mark Pratt; Gabor Bartha; Noah Spies; Wolfgang Losert; Justin M. Zook; Marc L. Salit
