Jorge Duitama
University of Connecticut
Publications
Featured research published by Jorge Duitama.
Journal of Experimental Medicine | 2014
Fei Duan; Jorge Duitama; Sahar Al Seesi; C. M. Ayres; S. A. Corcelli; A. P. Pawashe; T. Blanchard; D. McMahon; John Sidney; Alessandro Sette; Brian M. Baker; Ion Măndoiu; Pramod K. Srivastava
Srivastava et al. define a new and improved way to predict immunoprotective cancer neoepitopes based in part on the difference in MHC-binding scores between the mutant epitope and its wild-type counterpart. Remarkably, all neoepitopes that elicited tumor regression bound to class I MHC molecules with very low affinity.
Nucleic Acids Research | 2012
Jorge Duitama; Gayle McEwen; Thomas Huebsch; Stefanie Palczewski; Sabrina Schulz; Kevin Verstrepen; Eun-Kyung Suk; Margret R. Hoehe
Determining the underlying haplotypes of individual human genomes is an essential, but currently difficult, step toward a complete understanding of genome function. Fosmid pool-based next-generation sequencing allows genome-wide generation of 40-kb haploid DNA segments, which can be phased into contiguous molecular haplotypes computationally by Single Individual Haplotyping (SIH). Many SIH algorithms have been proposed, but the accuracy of such methods has been difficult to assess due to the lack of real benchmark data. To address this problem, we generated whole genome fosmid sequence data from a HapMap trio child, NA12878, for which reliable haplotypes have already been produced. We assembled haplotypes using eight algorithms for SIH and carried out direct comparisons of their accuracy, completeness and efficiency. Our comparisons indicate that fosmid-based haplotyping can deliver highly accurate results even at low coverage and that our SIH algorithm, ReFHap, is able to efficiently produce high-quality haplotypes. We expanded the haplotypes for NA12878 by combining the current haplotypes with our fosmid-based haplotypes, producing near-to-complete new gold-standard haplotypes containing almost 98% of heterozygous SNPs. This improvement includes notable fractions of disease-related and GWA SNPs. Integrated with other molecular biological data sets, this phase information will advance the emerging field of diploid genomics.
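To make the accuracy comparison above concrete, the sketch below shows one standard way of scoring an assembled haplotype against a gold standard: counting switch errors, i.e., positions where the relative phase of consecutive heterozygous SNPs disagrees. The encoding, function name, and exact metric definition are illustrative assumptions, not code from the paper.

```python
# Minimal sketch: switch-error count between an assembled haplotype and a
# gold-standard haplotype over the same heterozygous SNP positions.
# Haplotypes are '0'/'1' strings (allele carried by one chromosome copy);
# '-' marks unphased positions. Encoding and names are illustrative assumptions.

def switch_errors(assembled: str, gold: str) -> int:
    """Count consecutive phased SNP pairs whose relative phase disagrees
    with the gold standard."""
    pairs = [(a, g) for a, g in zip(assembled, gold) if a != '-' and g != '-']
    errors = 0
    for (a1, g1), (a2, g2) in zip(pairs, pairs[1:]):
        # Relative phase between consecutive SNPs: 0 if same allele, 1 if different
        if (int(a1) ^ int(a2)) != (int(g1) ^ int(g2)):
            errors += 1
    return errors

if __name__ == "__main__":
    # A single switch between the 2nd and 3rd SNP flips the phase of the remainder.
    print(switch_errors("0011", "0000"))  # -> 1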
International Conference on Bioinformatics | 2010
Jorge Duitama; Thomas Huebsch; Gayle McEwen; Eun-Kyung Suk; Margret R. Hoehe
Full human genomic sequences have been published in the last two years for a growing number of individuals. Most of them are a mixed consensus of the two real haplotypes because it is still very expensive to separate information coming from the two copies of a chromosome. However, recent improvements and new experimental approaches promise to solve these issues and provide enough information to reconstruct the sequences of the two copies of each chromosome through bioinformatics methods such as single individual haplotyping. Full haploid sequences provide a complete understanding of the structure of the human genome, allowing accurate predictions of translation in protein-coding regions and increasing the power of association studies. In this paper we present a novel problem formulation for single individual haplotyping. We start by assigning a score to each pair of fragments based on their common allele calls, and then we use these scores to formulate the problem as a cut of fragments that maximizes an objective function, similar to the well-known max-cut problem. Our algorithm, ReFHap, initially finds the best cut using a heuristic for max-cut and then builds haplotypes consistent with that cut. We have compared both the accuracy and running time of ReFHap with other heuristic methods on both simulated and real data and found that ReFHap performs significantly faster than previous methods without loss of accuracy.
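The sketch below illustrates the max-cut-style formulation described in the abstract: fragments are scored pairwise from their shared allele calls, a greedy heuristic assigns each fragment to one side of a cut, and haplotypes are then built consistent with that cut. The scoring function, the greedy heuristic, and all names are simplified illustrative assumptions, not ReFHap's actual implementation.

```python
# Compact sketch of a max-cut-style single individual haplotyping formulation.
# Fragments are dicts {SNP index: allele 0/1}. Illustrative assumptions throughout.
def pair_score(f1, f2):
    """Positive when two fragments look like they come from different
    haplotypes (more disagreements than agreements on shared SNPs)."""
    shared = f1.keys() & f2.keys()
    agree = sum(f1[s] == f2[s] for s in shared)
    return (len(shared) - agree) - agree  # disagreements minus agreements

def greedy_cut(fragments):
    """Assign each fragment to side 0 or 1, greedily maximizing the total
    score of fragment pairs separated by the cut."""
    side = {}
    for i, f in enumerate(fragments):
        gain1 = sum(pair_score(f, fragments[j]) for j, s in side.items() if s == 0)
        gain0 = sum(pair_score(f, fragments[j]) for j, s in side.items() if s == 1)
        side[i] = 0 if gain0 >= gain1 else 1
    return side

def build_haplotype(fragments, side):
    """Majority vote per SNP for side 0; the second haplotype is its complement."""
    votes = {}
    for i, f in enumerate(fragments):
        for snp, allele in f.items():
            # Flip alleles of fragments assigned to the other haplotype
            votes.setdefault(snp, []).append(allele ^ side[i])
    return {snp: int(sum(v) > len(v) / 2) for snp, v in sorted(votes.items())}

if __name__ == "__main__":
    frags = [{0: 0, 1: 0}, {1: 0, 2: 1}, {0: 1, 1: 1}, {2: 0, 3: 0}]
    cut = greedy_cut(frags)
    print(cut, build_haplotype(frags, cut))
```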
Nucleic Acids Research | 2009
Jorge Duitama; Dipu Mohan Kumar; Edward Hemphill; Mazhar I. Khan; Ion Măndoiu; Craig E. Nelson
Rapid and reliable virus subtype identification is critical for accurate diagnosis of human infections, effective response to epidemic outbreaks and global-scale surveillance of highly pathogenic viral subtypes such as avian influenza H5N1. The polymerase chain reaction (PCR) has become the method of choice for virus subtype identification. However, designing subtype-specific PCR primer pairs is a very challenging task: on one hand, selected primer pairs must result in robust amplification in the presence of a significant degree of sequence heterogeneity within subtypes; on the other, they must discriminate between the subtype of interest and closely related subtypes. In this article, we present a new tool, called PrimerHunter, that can be used to select highly sensitive and specific primers for virus subtyping. Our tool takes as input sets of both target and nontarget sequences. Primers are selected such that they efficiently amplify any one of the target sequences, and none of the nontarget sequences. PrimerHunter ensures the desired amplification properties by using accurate estimates of melting temperature with mismatches, computed based on the nearest neighbor model via an efficient fractional programming algorithm. Validation experiments with three avian influenza HA subtypes confirm that primers selected by PrimerHunter have high sensitivity and specificity for target sequences.
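The toy sketch below shows the selection criterion in its simplest form: a candidate primer is kept only if it is predicted to bind every target sequence and none of the nontargets. The melting-temperature estimate here is the crude Wallace rule on matching bases of a perfectly aligned site, used purely as a placeholder; PrimerHunter itself uses a mismatch-aware nearest-neighbor model and fractional programming, and also considers the reverse strand, none of which is reproduced here. Function names and thresholds are illustrative assumptions.

```python
# Toy sensitivity/specificity filter for a candidate primer against target and
# nontarget templates. Placeholder Tm model; not PrimerHunter's algorithm.

def wallace_tm(seq: str) -> float:
    """2(A+T) + 4(G+C) rule-of-thumb melting temperature (deg C)."""
    return 2 * sum(seq.count(b) for b in "AT") + 4 * sum(seq.count(b) for b in "GC")

def best_site_tm(primer: str, template: str) -> float:
    """Highest placeholder Tm over all forward-strand sites, scored on matching bases only."""
    k = len(primer)
    best = 0.0
    for i in range(len(template) - k + 1):
        matched = "".join(p for p, t in zip(primer, template[i:i + k]) if p == t)
        best = max(best, wallace_tm(matched))
    return best

def primer_ok(primer, targets, nontargets, t_min=35.0, t_max=30.0):
    """Sensitive to all targets, specific against all nontargets (toy thresholds)."""
    return (all(best_site_tm(primer, t) >= t_min for t in targets)
            and all(best_site_tm(primer, n) <= t_max for n in nontargets))

if __name__ == "__main__":
    targets = ["ACGTACGTTTGCAGGCATCGGATC", "ACGTACGTTTGCAGGCATCGGTTC"]
    nontargets = ["ACGTTTTTTTACAGGAATCAGATC"]
    print(primer_ok("TTGCAGGCATCG", targets, nontargets))
```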
BMC Bioinformatics | 2011
Jorge Duitama; Justin Kennedy; Sanjiv Dinakar; Yözen Hernández; Yufeng Wu; Ion Măndoiu
Background: Recent technology advances have enabled sequencing of individual genomes, promising to revolutionize biomedical research. However, deep sequencing remains more expensive than microarrays for performing whole-genome SNP genotyping. Results: In this paper we introduce a new multi-locus statistical model and computationally efficient genotype calling algorithms that integrate shotgun sequencing data with linkage disequilibrium (LD) information extracted from reference population panels such as HapMap or the 1000 Genomes Project. Experiments on publicly available 454, Illumina, and ABI SOLiD sequencing datasets suggest that integration of LD information results in genotype calling accuracy comparable to that of microarray platforms from low-coverage sequencing data. A software package implementing our algorithm, released under the GNU General Public License, is available at http://dna.engr.uconn.edu/software/GeneSeq/. Conclusions: Integration of LD information leads to significant improvements in genotype calling accuracy compared to prior LD-oblivious methods, rendering low-coverage sequencing a viable alternative to microarrays for conducting large-scale genome-wide association studies.
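A minimal single-SNP sketch of the underlying idea follows: combine a read-based genotype likelihood with a prior derived from a reference panel. Here the prior is simply Hardy-Weinberg proportions from the panel's allele frequency; the published method is multi-locus and exploits haplotype-level LD. The binomial error model, the HWE prior, and all names are illustrative assumptions.

```python
# Sketch: panel-informed genotype calling at a single SNP (illustrative assumptions).
from math import comb

def genotype_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01):
    """P(read counts | genotype) for RR, RA, AA under a simple binomial error model."""
    n = ref_reads + alt_reads
    p_alt = {"RR": error, "RA": 0.5, "AA": 1.0 - error}
    return {g: comb(n, alt_reads) * p**alt_reads * (1 - p)**ref_reads
            for g, p in p_alt.items()}

def panel_prior(alt_freq: float):
    """Genotype prior from the panel's alternate-allele frequency (Hardy-Weinberg)."""
    return {"RR": (1 - alt_freq) ** 2,
            "RA": 2 * alt_freq * (1 - alt_freq),
            "AA": alt_freq ** 2}

def call_genotype(ref_reads, alt_reads, alt_freq):
    lik = genotype_likelihoods(ref_reads, alt_reads)
    pri = panel_prior(alt_freq)
    post = {g: lik[g] * pri[g] for g in lik}
    z = sum(post.values())
    return max(post, key=post.get), {g: p / z for g, p in post.items()}

if __name__ == "__main__":
    # Two alternate reads out of three at a common variant: the panel prior
    # supports the heterozygote call despite the thin coverage.
    print(call_genotype(ref_reads=1, alt_reads=2, alt_freq=0.3))
```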
International Conference on Computational Advances in Bio and Medical Sciences | 2011
Jorge Duitama; Pramod K. Srivastava; Ion I. Mandoiu
Massively parallel transcriptome sequencing (RNA-Seq) is becoming the method of choice for studying functional effects of genetic variability and establishing causal relationships between genetic variants and disease. However, RNA-Seq poses new technical and computational challenges compared to genome sequencing. In particular, mapping transcriptome reads onto the genome is more challenging than mapping genomic reads due to splicing. Furthermore, detection and genotyping of single nucleotide variants (SNVs) requires statistical models that are robust to variability in read coverage due to unequal transcript expression levels. In this paper we present a strategy to more reliably map transcriptome reads by taking advantage of the availability of both the genome reference sequence and transcript databases such as CCDS. We also present a novel Bayesian model for SNV discovery and genotyping based on quality scores, along with experimental results on RNA-Seq data generated from blood cell tissue of a HapMap individual showing that our methods yield increased accuracy compared to several widely used methods. The open source code implementing our methods, released under the GNU General Public License, is available at http://dna.engr.uconn.edu/software/NGSTools/.
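One concrete piece of the mapping strategy described above is lifting reads aligned to a spliced transcript back to genomic coordinates across exon boundaries before variant calling. The sketch below shows that coordinate conversion for a forward-strand transcript; exon representation, coordinate conventions, and names are illustrative assumptions, not the paper's code.

```python
# Sketch: map a position within a spliced transcript back to the genome.
# Exons are (genome_start, genome_end) intervals, 0-based half-open, forward strand.

def transcript_to_genome(pos: int, exons):
    """Convert a 0-based transcript position to its genomic coordinate."""
    offset = 0
    for start, end in exons:
        length = end - start
        if pos < offset + length:
            return start + (pos - offset)
        offset += length
    raise ValueError("position beyond transcript length")

if __name__ == "__main__":
    # Transcript built from two exons: genome [100,150) and [300,360).
    exons = [(100, 150), (300, 360)]
    print(transcript_to_genome(10, exons))   # -> 110 (within the first exon)
    print(transcript_to_genome(55, exons))   # -> 305 (crosses into the second exon)
```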
Methods in Molecular Biology | 2017
Eun-Kyung Suk; Sabrina Schulz; Birgit Mentrup; Thomas Huebsch; Jorge Duitama; Margret R. Hoehe
Haplotype resolution of human genomes is essential to describe and interpret genetic variation and its impact on biology and disease. Our approach to haplotyping relies on converting genomic DNA into a fosmid library, which represents the entire diploid genome as a collection of haploid DNA clones of ~40 kb in size. These can be partitioned into pools such that the probability that the same pool contains both parental haplotypes is reduced to ~1%. This is the key principle of this method, allowing entire pools of fosmids to be massively parallel sequenced, yielding haploid sequence output. Here, we present a detailed protocol for fosmid pool-based next generation sequencing to haplotype-resolve whole genomes including the following steps: (1) generation of high molecular weight DNA fragments of ~40 kb in size from genomic DNA; (2) fosmid cloning and partitioning into 96-well plates; (3) barcoded sequencing library preparation from fosmid pools for next generation sequencing; and (4) computational analysis of fosmid sequences and assembly into contiguous haploid sequences. This method can be used in combination with, but also without, whole genome shotgun sequencing to extensively resolve heterozygous SNPs and structural variants within genomic regions, resulting in haploid contigs of several hundred kb up to several Mb. This method has a broad range of applications including population and ancestry genetics, the clinical interpretation of mutations in personal genomes, the analysis of cancer genomes and highly complex disease gene regions such as the MHC. Moreover, haplotype-resolved genome sequencing allows description and interpretation of the diploid nature of genome biology, for example through the analysis of haploid gene forms and allele-specific phenomena. Application of this method has enabled the production of most of the molecular haplotype-resolved genomes reported to date.
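The key pooling principle mentioned above can be checked with a back-of-the-envelope simulation: if the fosmid clones covering a locus are spread at random over many pools, the chance that any single pool receives clones from both parental haplotypes becomes small. The clone counts and pool numbers below are illustrative assumptions, not the protocol's actual parameters.

```python
# Monte Carlo sketch of the fosmid pooling principle (illustrative parameters).
import random

def collision_probability(pools: int, clones_per_haplotype: int, trials: int = 100_000) -> float:
    """Estimate P(some pool gets clones of both parental haplotypes at a locus)."""
    hits = 0
    for _ in range(trials):
        pools_a = {random.randrange(pools) for _ in range(clones_per_haplotype)}
        pools_b = {random.randrange(pools) for _ in range(clones_per_haplotype)}
        if pools_a & pools_b:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # Three clones per parental haplotype at a locus, spread over 288 pools
    # (e.g., three 96-well plates): collision probability stays in the low percent range.
    print(round(collision_probability(pools=288, clones_per_haplotype=3), 3))
```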
International Conference on Computational Advances in Bio and Medical Sciences | 2011
Jorge Duitama; Eun-Kyung Suk; Sabrina Schulz; Gayle McEwen; Thomas Huebsch; Margret R. Hoehe
The process of grouping alleles of heterozygous variants coming from the same chromosome copy of an individual is called haplotyping. Haplotypes are required to accurately predict translation to proteins in coding regions and, in general, to achieve a full understanding of the structure and function of particular regions in the genome such as the MHC. Classical approaches for haplotyping need parental or population information to predict the most likely haplotype configurations. Unfortunately, these approaches do not provide accurate results in regions with low linkage disequilibrium, and moreover, they cannot be applied if the required input data is not available. On the other hand, next generation sequencing (NGS) of genomic DNA does not provide accurate haplotyping results because the reads are too short to span two contiguous heterozygous variants in most cases. Experimental approaches to solve this problem try to sequence different haploid subsamples, which can be combined to separately assemble the two chromosome copies of particular regions. However, to achieve this goal, custom bioinformatics tools need to be developed to analyze reads coming from haploid subsamples instead of whole genome DNA samples.
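A rough calculation illustrates the read-length limitation mentioned above: with heterozygous SNPs spaced roughly a kilobase apart in a human genome, a short read rarely covers two of them, so phase cannot be chained from shotgun reads alone. The read length, spacing, and Poisson model below are illustrative assumptions.

```python
# Back-of-the-envelope: fraction of short reads expected to span >= 2 het SNPs.
from math import exp

read_length = 100      # bp, a typical short NGS read (assumption)
het_spacing = 1_200    # bp, rough average distance between heterozygous SNPs (assumption)

lam = read_length / het_spacing                # expected het SNPs per read
p_two_or_more = 1 - exp(-lam) * (1 + lam)      # Poisson P(>= 2 het SNPs in one read)
print(f"{p_two_or_more:.3%} of reads span two or more heterozygous SNPs")
```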
Genome Research | 2011
Eun-Kyung Suk; Gayle McEwen; Jorge Duitama; Katja Nowick; Sabrina Schulz; Stefanie Palczewski; Stefan Schreiber; Dustin T Holloway; Stephen F. McLaughlin; Heather E. Peckham; Clarence Lee; Thomas Huebsch; Margret R. Hoehe
Archive | 2013
Sam Crauwels; Bo Zhu; Jan Steensels; Jorge Duitama; Pieter Busschaert; Hans Rediers; Gorik De Samblanx; Kathleen Marchal; Kris Willems; Kevin Verstrepen; Bart Lievens