Carl Kingsford | Researchain

Publication

Featured research published by Carl Kingsford.

Bioinformatics | 2011

A fast, lock-free approach for efficient parallel counting of occurrences of k-mers

Guillaume Marçais; Carl Kingsford

MOTIVATION Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm. RESULTS We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution. AVAILABILITY The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish.
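To make the counting task in the abstract concrete, here is a minimal single-threaded sketch in Python. It is not Jellyfish's lock-free hash table, only an illustration of what "counting every k-mer" means; the function name `count_kmers` and the example sequence are assumptions made for this illustration.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every substring of length k in `sequence`.

    Hypothetical helper for illustration: a plain dictionary-based counter,
    not the multithreaded, lock-free structure described in the abstract.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Slide a window of width k across the sequence, one position at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A sequence of length n yields n - k + 1 windows, so memory grows with the number of distinct k-mers; that growth is the pressure the abstract says motivates a more memory-efficient design.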

Nature Methods | 2017

Salmon provides fast and bias-aware quantification of transcript expression

Rob Patro; Geet Duggal; Michael I. Love; Rafael A. Irizarry; Carl Kingsford

We introduce Salmon, a lightweight method for quantifying transcript abundance from RNA-seq reads. Salmon combines a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure. It is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which, as we demonstrate here, substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.

Nature Biotechnology | 2014

Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms

Robert Patro; Stephen M. Mount; Carl Kingsford

We introduce Sailfish, a computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Because Sailfish entirely avoids mapping reads, a time-consuming step in all current methods, it provides quantification estimates much faster than do existing approaches (typically 20 times faster) without loss of accuracy. By facilitating frequent reanalysis of data and reducing the need to optimize parameters, Sailfish exemplifies the potential of lightweight algorithms for efficiently processing sequencing reads.

Bioinformatics | 2010

The power of protein interaction networks for associating genes with diseases

Saket Navlakha; Carl Kingsford

Motivation: Understanding the association between genetic diseases and their causal genes is an important problem concerning human health. With the recent influx of high-throughput data describing interactions between gene products, scientists have been provided a new avenue through which these associations can be inferred. Despite the recent interest in this problem, however, there is little understanding of the relative benefits and drawbacks underlying the proposed techniques. Results: We assessed the utility of physical protein interactions for determining gene–disease associations by examining the performance of seven recently developed computational methods (plus several of their variants). We found that random-walk approaches individually outperform clustering and neighborhood approaches, although most methods make predictions not made by any other method. We show how combining these methods into a consensus method yields Pareto optimal performance. We also quantified how a diffuse topological distribution of disease-related proteins negatively affects prediction quality and are thus able to identify diseases especially amenable to network-based predictions and others for which additional information sources are absolutely required. Availability: The predictions made by each algorithm considered are available online at http://www.cbcb.umd.edu/DiseaseNet Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
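As a rough illustration of the random-walk family mentioned in the abstract, the sketch below implements a generic random walk with restart on a small undirected network. It is not any of the seven methods the paper evaluates; the function name, the restart probability of 0.3, the iteration count, and the toy network are all assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, iterations=50):
    """Score nodes by proximity to seed nodes (hypothetical helper).

    adjacency: dict mapping each node to a list of its neighbors.
    seeds: set of nodes already associated with the disease.
    """
    nodes = sorted(adjacency)
    # The walker starts at, and restarts to, the seed nodes uniformly.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                continue
            # Spread the remaining probability evenly over the neighbors.
            share = (1.0 - restart) * scores[n] / len(neighbors)
            for m in neighbors:
                updated[m] += share
        scores = updated
    return scores


# Toy path network A - B - C - D with A as the only known disease protein.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(network, {"A"})
ranking = sorted((n for n in network if n != "A"), key=lambda n: -scores[n])
print(ranking)
```

In this toy network, candidates closer to the seed receive higher scores, which is the intuition behind ranking unannotated proteins by network proximity to known disease proteins.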

Emerging Infectious Diseases | 2007

Genome Analysis Linking Recent European and African Influenza (H5N1) Viruses

Carl Kingsford; David J. Spiro; Daniel Janies; Mona M. Aly; Ian H. Brown; Emmanuel Couacy-Hymann; Gian Mario De Mia; Do Huu Dung; Annalisa Guercio; Tony Joannis; Ali Safar Maken Ali; Azizullah Osmani; Iolanda Padalino; Magdi D. Saad; Vladimir Savić; Naomi Sengamalay; Samuel L. Yingst; Jennifer Zaborsky; Olga Zorman-Rojs; Elodie Ghedin; Ilaria Capua

Although linked, these viruses are distinct from earlier outbreak strains.

Nature Biotechnology | 2008

What are decision trees?

Carl Kingsford

Decision trees have been applied to problems such as assigning protein function and predicting splice sites. How do these classifiers work, what types of problems can they solve and what are their advantages over alternatives?

Bioinformatics | 2012

Global network alignment using multiscale spectral signatures

Robert Patro; Carl Kingsford

MOTIVATION Protein interaction networks provide an important system-level view of biological processes. One of the fundamental problems in biological network analysis is the global alignment of a pair of networks, which puts the proteins of one network into correspondence with the proteins of another network in a manner that conserves their interactions while respecting other evidence of their homology. By providing a mapping between the networks of different species, alignments can be used to inform hypotheses about the functions of unannotated proteins, the existence of unobserved interactions, the evolutionary divergence between the two species and the evolution of complexes and pathways. RESULTS We introduce GHOST, a global pairwise network aligner that uses a novel spectral signature to measure topological similarity between subnetworks. It combines a seed-and-extend global alignment phase with a local search procedure and exceeds state-of-the-art performance on several network alignment tasks. We show that the spectral signature used by GHOST is highly discriminative, whereas the alignments it produces are also robust to experimental noise. When compared with other recent approaches, we find that GHOST is able to recover larger and more biologically significant, shared subnetworks between species. AVAILABILITY An efficient and parallelized implementation of GHOST, released under the Apache 2.0 license, is available at http://cbcb.umd.edu/kingsford_group/ghost CONTACT [email protected].

BMC Bioinformatics | 2010

Assembly complexity of prokaryotic genomes using short reads

Carl Kingsford; Michael C. Schatz; Mihai Pop

Background: De Bruijn graphs are a theoretical framework underlying several modern genome assembly programs, especially those that deal with very short reads. We describe an application of de Bruijn graphs to analyze the global repeat structure of prokaryotic genomes. Results: We provide the first survey of the repeat structure of a large number of genomes. The analysis gives an upper bound on the performance of genome assemblers for de novo reconstruction of genomes across a wide range of read lengths. Further, we demonstrate that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot. The non-reconstructible genes are overwhelmingly related to mobile elements (transposons, IS elements, and prophages). Conclusions: Our results improve upon previous studies on the feasibility of assembly with short reads and provide a comprehensive benchmark against which to compare the performance of the short-read assemblers currently being developed.
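For readers unfamiliar with de Bruijn graphs, the following Python sketch builds one from a single sequence and lists the nodes where the path branches, which is where repeats create ambiguity for an assembler. It is a simplified illustration, not the analysis pipeline used in the paper; the function names and the toy sequence are assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(sequence: str, k: int) -> dict:
    """Build a de Bruijn graph whose nodes are (k-1)-mers (hypothetical helper).

    Each k-mer in the sequence contributes one edge from its prefix
    (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(set)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph: dict) -> list:
    """Return nodes with more than one distinct successor, sorted by name."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


graph = de_bruijn_graph("ACGTACGA", 3)
print(branching_nodes(graph))
```

In the toy sequence, the repeated substring ACG is followed by T the first time and A the second, so the node CG has two successors. Longer values of k resolve more repeats, which is consistent with the abstract's point that reconstructibility depends on read length.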

Informs Journal on Computing | 2004

A Semidefinite Programming Approach to Side Chain Positioning with New Rounding Strategies

Bernard Chazelle; Carl Kingsford; Mona Singh

Side chain positioning is an important subproblem of the general protein-structure-prediction problem, with applications in homology modeling and protein design. The side chain positioning problem takes a fixed backbone and a protein sequence and predicts the lowest energy conformation of the protein's side chains on this backbone. We study a widely used version of the problem where the side chain positioning procedure uses a rotamer library and an energy function that can be expressed as a sum of pairwise terms. The problem is NP-complete; we show that it cannot even be approximated. In practice, it is tackled by a variety of general search techniques and specialized heuristics. Here, we propose formulating the side chain positioning problem as an instance of semidefinite programming (SDP). We introduce two novel rounding schemes and provide theoretical justification for their effectiveness under various conditions. We apply our method on simulated data, as well as on the computational redesign of two na...

Algorithms for Molecular Biology | 2014

Identification of alternative topological domains in chromatin

Darya Filippova; Robert Patro; Geet Duggal; Carl Kingsford

Chromosome conformation capture experiments have led to the discovery of dense, contiguous, megabase-sized topological domains that are similar across cell types and conserved across species. These domains are strongly correlated with a number of chromatin markers and have since been included in a number of analyses. However, functionally relevant domains may exist at multiple length scales. We introduce a new and efficient algorithm that is able to capture persistent domains across various resolutions by adjusting a single scale parameter. The ensemble of domains we identify allows us to quantify the degree to which the domain structure is hierarchical as opposed to overlapping, and our analysis reveals a pronounced hierarchical structure in which larger stable domains tend to completely contain smaller domains. The identified novel domains are substantially different from domains reported previously and are highly enriched for insulating factor CTCF binding and histone marks at the boundaries.
