Publication


Featured research published by Ion Măndoiu.


Algorithms for Molecular Biology | 2011

Estimation of alternative splicing isoform frequencies from RNA-Seq data

Marius Nicolae; Serghei Mangul; Ion Măndoiu; Alexander Zelikovsky

Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging.
Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/.
Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
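The expectation-maximization idea named in this abstract can be illustrated with a minimal sketch. This is not the IsoEM implementation: it assumes every read is equally compatible with each isoform it maps to (weight 1), and it ignores insert sizes, base quality scores, strand and pairing information. The function name `em_isoform_frequencies` and its parameters are hypothetical.

```python
def em_isoform_frequencies(read_compat, isoform_lengths, n_iter=500, tol=1e-12):
    """Estimate relative isoform frequencies with a simplified EM loop.

    read_compat: list where each item lists the isoform ids a read is
        compatible with (assumption: all compatibilities have weight 1).
    isoform_lengths: dict mapping isoform id -> length in bases.
    Returns a dict mapping isoform id -> frequency (values sum to 1).
    """
    isoforms = sorted(isoform_lengths)
    freq = {j: 1.0 / len(isoforms) for j in isoforms}  # uniform start
    for _ in range(n_iter):
        # E-step: split each read among its compatible isoforms in
        # proportion to the current frequency estimates.
        expected = {j: 0.0 for j in isoforms}
        for compat in read_compat:
            total = sum(freq[j] for j in compat)
            if total == 0.0:
                continue  # read has no compatible isoform with support
            for j in compat:
                expected[j] += freq[j] / total
        # M-step: divide expected read counts by isoform length, then
        # renormalize so the frequencies sum to 1.
        scaled = {j: expected[j] / isoform_lengths[j] for j in isoforms}
        norm = sum(scaled.values())
        if norm == 0.0:
            return freq  # no usable reads; keep the current estimate
        new_freq = {j: scaled[j] / norm for j in isoforms}
        delta = max(abs(new_freq[j] - freq[j]) for j in isoforms)
        freq = new_freq
        if delta < tol:
            break
    return freq


# Toy input: 30 reads unique to A, 10 unique to B, 60 ambiguous.
reads = [["A"]] * 30 + [["B"]] * 10 + [["A", "B"]] * 60
print(em_isoform_frequencies(reads, {"A": 1000, "B": 1000}))
```

On this toy input the ambiguous reads end up shared in the same 3:1 ratio as the unique reads, so the estimates approach 0.75 for A and 0.25 for B.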


BMC Bioinformatics | 2011

Inferring viral quasispecies spectra from 454 pyrosequencing reads

Irina Astrovskaya; Bassam Tork; Serghei Mangul; Kelly Westbrooks; Ion Măndoiu; Peter Balfe; Alexander Zelikovsky

Background: RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software was originally designed for single genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences.
Results: In this paper, we introduce a new Viral Spectrum Assembler (ViSpA) method for quasispecies spectrum reconstruction and compare it with the state-of-the-art ShoRAH tool on both simulated and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. Experimental results show that ViSpA outperforms ShoRAH on simulated error-free reads, correctly assembling 10 out of 10 quasispecies and 29 sequences out of 40 quasispecies. While ShoRAH has a significant advantage over ViSpA on reads simulated with sequencing errors due to its advanced error correction algorithm, ViSpA is better at assembling the simulated reads after they have been corrected by ShoRAH. ViSpA also outperforms ShoRAH on real 454 reads. Indeed, the 7 most frequent sequences reconstructed by ViSpA from a real HCV dataset are viable (do not contain internal stop codons), and the most frequent sequence was within 1% of the actual open reading frame obtained by cloning and Sanger sequencing. In contrast, only one of the sequences reconstructed by ShoRAH is viable. On a real HIV dataset, ShoRAH correctly inferred only 2 quasispecies sequences with at most 4 mismatches whereas ViSpA correctly reconstructed 5 quasispecies with at most 2 mismatches, and 2 out of 5 sequences were inferred without any mismatches. ViSpA source code is available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html.
Conclusions: ViSpA enables accurate viral quasispecies spectrum reconstruction from 454 pyrosequencing reads. We are currently exploring extensions applicable to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations.


Bioinformatics | 2005

DNA-BAR: distinguisher selection for DNA barcoding

Bhaskar DasGupta; Kishori M. Konwar; Ion Măndoiu; Alexander A. Shvartsman

DNA-BAR is a software package for selecting DNA probes (henceforth referred to as distinguishers) that can be used in genomic-based identification of microorganisms. Given the genomic sequences of the microorganisms, DNA-BAR finds a near-minimum number of distinguishers yielding a distinct hybridization pattern for each microorganism. Selected distinguishers satisfy user specified bounds on length, melting temperature and GC content, as well as redundancy and cross-hybridization constraints.


BMC Genomics | 2012

Towards accurate detection and genotyping of expressed variants from whole transcriptome sequencing data

Jorge Duitama; Pramod K. Srivastava; Ion Măndoiu

Background: Massively parallel transcriptome sequencing (RNA-Seq) is becoming the method of choice for studying functional effects of genetic variability and establishing causal relationships between genetic variants and disease. However, RNA-Seq poses new technical and computational challenges compared to genome sequencing. In particular, mapping transcriptome reads onto the genome is more challenging than mapping genomic reads due to splicing. Furthermore, detection and genotyping of single nucleotide variants (SNVs) requires statistical models that are robust to variability in read coverage due to unequal transcript expression levels.
Results: In this paper we present a strategy to more reliably map transcriptome reads by taking advantage of the availability of both the genome reference sequence and transcript databases such as CCDS. We also present a novel Bayesian model for SNV discovery and genotyping based on quality scores.
Conclusions: Experimental results on RNA-Seq data generated from blood cell tissue of three HapMap individuals show that our methods yield increased accuracy compared to several widely used methods. The open source code implementing our methods, released under the GNU General Public License, is available at http://dna.engr.uconn.edu/software/NGSTools/.


Information Processing Letters | 2000

A note on the MST heuristic for bounded edge-length Steiner trees with minimum number of Steiner points

Ion Măndoiu; Alexander Zelikovsky

We give a tight analysis of the MST heuristic recently introduced by G.-H. Lin and G. Xue for approximating the Steiner tree with minimum number of Steiner points and bounded edge-lengths. The approximation factor of the heuristic is shown to be one less than the MST number of the underlying space, defined as the maximum possible degree of a minimum-degree MST spanning points from the space. In particular, on instances drawn from the rectilinear (respectively Euclidean) plane, the MST heuristic is shown to have tight approximation factors of 3, respectively 4.
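As a rough illustration of the heuristic analyzed here, the sketch below builds a minimum spanning tree over the terminals and then subdivides every edge longer than the bound R with evenly spaced Steiner points, placing ceil(d / R) - 1 points on an edge of length d. The function names, the Prim-style MST routine, and the choice to place points on the straight segment are assumptions made for this example, not details taken from the paper.

```python
import math


def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def rectilinear(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mst_edges(points, dist):
    """Prim's algorithm on the complete graph; returns (parent, child) index pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_heuristic(points, R, dist=euclidean):
    """Return the Steiner points added so that every tree edge has length <= R."""
    if R <= 0:
        raise ValueError("R must be positive")
    steiner = []
    for u, v in mst_edges(points, dist):
        (x1, y1), (x2, y2) = points[u], points[v]
        d = dist(points[u], points[v])
        k = max(0, math.ceil(d / R) - 1)  # Steiner points needed on this edge
        for i in range(1, k + 1):
            t = i / (k + 1)
            steiner.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return steiner


terminals = [(0.0, 0.0), (10.0, 0.0), (10.0, 3.0)]
print(mst_heuristic(terminals, R=2.5))
```

For these three terminals the MST has edges of length 10 and 3, so with R = 2.5 the sketch adds 3 + 1 = 4 Steiner points. Floating-point rounding in `d / R` can occasionally add one extra point when an edge length is very close to a multiple of R.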


Nucleic Acids Research | 2009

PrimerHunter: a primer design tool for PCR-based virus subtype identification

Jorge Duitama; Dipu Mohan Kumar; Edward Hemphill; Mazhar I. Khan; Ion Măndoiu; Craig E. Nelson

Rapid and reliable virus subtype identification is critical for accurate diagnosis of human infections, effective response to epidemic outbreaks and global-scale surveillance of highly pathogenic viral subtypes such as avian influenza H5N1. The polymerase chain reaction (PCR) has become the method of choice for virus subtype identification. However, designing subtype-specific PCR primer pairs is a very challenging task: on one hand, selected primer pairs must result in robust amplification in the presence of a significant degree of sequence heterogeneity within subtypes, on the other, they must discriminate between the subtype of interest and closely related subtypes. In this article, we present a new tool, called PrimerHunter, that can be used to select highly sensitive and specific primers for virus subtyping. Our tool takes as input sets of both target and nontarget sequences. Primers are selected such that they efficiently amplify any one of the target sequences, and none of the nontarget sequences. PrimerHunter ensures the desired amplification properties by using accurate estimates of melting temperature with mismatches, computed based on the nearest neighbor model via an efficient fractional programming algorithm. Validation experiments with three avian influenza HA subtypes confirm that primers selected by PrimerHunter have high sensitivity and specificity for target sequences.


BMC Genomics | 2014

Bootstrap-based differential gene expression analysis for RNA-Seq data with and without replicates

Sahar Al Seesi; Yvette Temate Tiagueu; Alexander Zelikovsky; Ion Măndoiu

A major application of RNA-Seq is to perform differential gene expression analysis. Many tools exist to analyze differentially expressed genes in the presence of biological replicates. Frequently, however, RNA-Seq experiments have no or very few biological replicates and development of methods for detecting differentially expressed genes in these scenarios is still an active research area. In this paper we introduce a novel method, called IsoDE, for differential gene expression analysis based on bootstrapping. We compared IsoDE against four existing methods (Fisher's exact test, GFOLD, edgeR and Cuffdiff) on RNA-Seq datasets generated using three different sequencing technologies, both with and without replicates. Experiments on MAQC RNA-Seq datasets without replicates show that IsoDE has consistently high accuracy as defined by the qPCR ground truth, frequently higher than that of the compared methods, particularly for low coverage data and at lower fold change thresholds. In experiments on RNA-Seq datasets with up to 7 replicates, IsoDE has also achieved high accuracy. Furthermore, unlike GFOLD and edgeR, IsoDE accuracy varies smoothly with the number of replicates, and is relatively uniform across the entire range of gene expression levels. The proposed non-parametric method based on bootstrapping has practical running time, and achieves robust performance over a broad range of technologies, number of replicates, sequencing depths, and minimum fold change thresholds.
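To give a feel for bootstrap-based differential expression calling without replicates, here is a heavily simplified sketch. It is not IsoDE: it assumes each read is already assigned to exactly one gene, uses pseudocounted read fractions as the expression measure, and calls a gene differentially expressed when a chosen fraction of all bootstrap sample pairs exceeds the fold change threshold. The names `bootstrap_de`, `fold`, `support`, and `n_boot` are hypothetical.

```python
import random
from collections import Counter


def bootstrap_expression(read_genes, genes, n_boot, rng):
    """Resample reads with replacement; return one expression dict per bootstrap sample."""
    n = len(read_genes)
    samples = []
    for _ in range(n_boot):
        counts = Counter(rng.choices(read_genes, k=n))
        # Pseudocount of 1 per gene avoids division by zero in fold changes.
        samples.append({g: (counts[g] + 1) / (n + len(genes)) for g in genes})
    return samples


def bootstrap_de(reads_a, reads_b, genes, fold=2.0, support=0.95, n_boot=50, seed=0):
    """Call a gene DE if at least `support` of all bootstrap pairs show a
    fold change of at least `fold` in the same direction."""
    rng = random.Random(seed)
    boots_a = bootstrap_expression(reads_a, genes, n_boot, rng)
    boots_b = bootstrap_expression(reads_b, genes, n_boot, rng)
    calls = {}
    for g in genes:
        up = sum(1 for a in boots_a for b in boots_b if b[g] / a[g] >= fold)
        down = sum(1 for a in boots_a for b in boots_b if a[g] / b[g] >= fold)
        calls[g] = max(up, down) / (n_boot * n_boot) >= support
    return calls


# Condition A: 500 reads per gene. Condition B: g1 drops to 100 reads, g2 rises to 900.
reads_a = ["g1"] * 500 + ["g2"] * 500
reads_b = ["g1"] * 100 + ["g2"] * 900
print(bootstrap_de(reads_a, reads_b, ["g1", "g2"]))
```

In this toy input g1 changes roughly fivefold and g2 roughly 1.8-fold, so at a twofold threshold only g1 should be called. Requiring agreement across most bootstrap pairs, rather than a single point estimate, is what makes the call depend on sampling variability.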


Journal of Computational Biology | 2008

Genotype error detection using Hidden Markov Models of haplotype diversity

Justin Kennedy; Ion Măndoiu; Bogdan Pasaniuc

The presence of genotyping errors can invalidate statistical tests for linkage and disease association, particularly for methods based on haplotype analysis. Becker et al. have recently proposed a simple likelihood ratio approach for detecting errors in trio genotype data. Under this approach, a SNP genotype is flagged as a potential error if the likelihood associated with the original trio genotype data increases by a multiplicative factor exceeding a user selected threshold when the SNP genotype under test is deleted. In this article we give improved error detection methods using the likelihood ratio test approach in conjunction with likelihood functions that can be efficiently computed based on a Hidden Markov Model of haplotype diversity in the population under study. Experimental results on both simulated and real datasets show that proposed methods have highly scalable running time and achieve significantly improved detection accuracy compared to previous methods.


International Conference on Computational Science | 2006

Minimum multicolored subgraph problem in multiplex PCR primer set selection and population haplotyping

Mohammad Taghi Hajiaghayi; Kamal Jain; Lap Chi Lau; Ion Măndoiu; Alexander Russell; Vijay V. Vazirani

In this paper we consider the minimum weight multicolored subgraph problem (MWMCSP), which is a common generalization of minimum cost multiplex PCR primer set selection and maximum likelihood population haplotyping. In this problem one is given an undirected graph G with non-negative vertex weights and a color function that assigns to each edge one or more of n given colors, and the goal is to find a minimum weight set of vertices inducing edges of all n colors. We obtain improved approximation algorithms and hardness results for MWMCSP and its variant in which the goal is to find a minimum number of vertices inducing edges of at least k colors for a given integer k ≤ n.
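The problem statement can be made concrete with a small exhaustive solver. This sketch only illustrates the definition of MWMCSP on tiny instances: it enumerates all vertex subsets, so its running time is exponential, and it is unrelated to the approximation algorithms obtained in the paper. The function name and input layout are assumptions chosen for the example.

```python
from itertools import combinations


def min_weight_multicolored_subgraph(weights, edges, colors):
    """Exhaustively find a minimum weight vertex set inducing edges of all colors.

    weights: dict vertex -> non-negative weight.
    edges: list of (u, v, colors_of_edge) with colors_of_edge an iterable.
    colors: iterable of all colors that must be covered.
    Returns (vertex_set, total_weight), or (None, inf) if no set works.
    """
    required = set(colors)
    vertices = sorted(weights)
    best_set, best_weight = None, float("inf")
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            chosen = set(subset)
            covered = set()
            for u, v, edge_colors in edges:
                # An edge is induced only if both endpoints are chosen.
                if u in chosen and v in chosen:
                    covered |= set(edge_colors)
            if covered >= required:
                weight = sum(weights[x] for x in chosen)
                if weight < best_weight:
                    best_set, best_weight = chosen, weight
    return best_set, best_weight


weights = {"a": 1, "b": 1, "c": 5, "d": 1}
edges = [("a", "b", {1}), ("b", "c", {2}), ("b", "d", {2}), ("a", "c", {1, 2})]
print(min_weight_multicolored_subgraph(weights, edges, {1, 2}))
```

Here the single edge a-c carries both colors but costs weight 6, while the set {a, b, d} covers color 1 through a-b and color 2 through b-d at weight 3, so the cheaper three-vertex set wins.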


BMC Bioinformatics | 2014

Feature selection and classifier performance on diverse biological datasets

Edward Hemphill; James Lindsay; Chih Lee; Ion Măndoiu; Craig E. Nelson

Background: There is an ever-expanding range of technologies that generate very large numbers of biomarkers for research and clinical applications. Choosing the most informative biomarkers from a high-dimensional data set, combined with identifying the most reliable and accurate classification algorithms to use with that biomarker set, can be a daunting task. Existing surveys of feature selection and classification algorithms typically focus on a single data type, such as gene expression microarrays, and rarely explore the models' performance across multiple biological data types.
Results: This paper presents the results of a large-scale empirical study whereby a large number of popular feature selection and classification algorithms are used to identify the tissue of origin for the NCI-60 cancer cell lines. A computational pipeline was implemented to maximize predictive accuracy of all models at all parameters on five different data types available for the NCI-60 cell lines. A validation experiment was conducted using external data in order to demonstrate robustness.
Conclusions: As expected, the data type and number of biomarkers have a significant effect on the performance of the predictive models. Although no model or data type uniformly outperforms the others across the entire range of tested numbers of markers, several clear trends are visible. At low numbers of biomarkers, gene and protein expression data types are able to differentiate between cancer cell lines significantly better than the other three data types, namely SNP, array comparative genome hybridization (aCGH), and microRNA data. Interestingly, as the number of selected biomarkers increases, the best-performing classifiers based on SNP data match or slightly outperform those based on gene and protein expression, while those based on aCGH and microRNA data continue to perform the worst. It is observed that one class of feature selection and classifier are consistently top performers across data types and number of markers, suggesting that well-performing feature-selection/classifier pairings are likely to be robust in biological classification problems regardless of the data type used in the analysis.

Collaboration


Dive into Ion Măndoiu's collaborations.

Top Co-Authors

Sahar Al Seesi
University of Connecticut

Craig E. Nelson
University of Connecticut

James Lindsay
University of Connecticut

Justin Kennedy
University of Connecticut

Bassam Tork
Georgia State University

Edward Hemphill
University of Connecticut