Is this you? Create Your Porfile

Ian Holmes

University of California, Berkeley

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ian Holmes is active.

Explore More

Publication

Featured researches published by Ian Holmes.

Genome Research | 2009

JBrowse: A next-generation genome browser

Mitchell E. Skinner; Andrew V. Uzilov; Lincoln Stein; Christopher J. Mungall; Ian Holmes

We describe an open source, portable, JavaScript-based genome browser, JBrowse, that can be used to navigate genome annotations over the web. JBrowse helps preserve the users sense of location by avoiding discontinuous transitions, instead offering smoothly animated panning, zooming, navigation, and track selection. Unlike most existing genome browsers, where the genome is rendered into images on the webserver and the role of the client is restricted to displaying those images, JBrowse distributes work between the server and client and therefore uses significantly less server overhead than previous genome browsers. We report benchmark results empirically comparing server- and client-side rendering strategies, review the architecture and design considerations of JBrowse, and describe a simple wiki plug-in that allows users to upload and share annotation tracks.

PLOS Computational Biology | 2009

Fast Statistical Alignment

Robert K. Bradley; Adam Roberts; Michael Smoot; Sudeep Juvekar; Jaeyoung Do; Colin N. Dewey; Ian Holmes; Lior Pachter

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment—previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches—yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.

PLOS ONE | 2012

Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics

Ian Holmes; Keith Harris; Christopher Quince

We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. This data can be represented as a frequency matrix giving the number of times each taxa is observed in each sample. The samples have different size, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods used previously to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct ‘metacommunities’, and, hence, determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for the fitting of DMM models using the ‘evidence framework’ (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from Obese and Lean twins. From the model evidence four clusters fit this data best. Two clusters were dominated by Bacteroides and were homogenous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, Obese twins were more likely to derive from the high variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the ‘Anna Karenina principle (AKP)’ applied to microbial communities: disturbed states having many more configurations than undisturbed. We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohns disease (ICD) is associated with a more variable community.

BMC Bioinformatics | 2005

Accelerated probabilistic inference of RNA structure evolution.

Ian Holmes

BackgroundPairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoffs 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered.ResultsWe demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database.ConclusionA program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License.

Genome Biology | 2013

Web Apollo: a web-based genomic annotation editing platform

Eduardo Lee; Gregg Helt; Justin T. Reese; Monica Munoz-Torres; Chris P Childers; Robert Buels; Lincoln Stein; Ian Holmes; Christine G. Elsik; Suzanna E. Lewis

Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.

Genome Biology | 2016

JBrowse: a dynamic web platform for genome visualization and analysis

Robert Buels; Eric Yao; Colin Diesh; Richard D. Hayes; Monica Munoz-Torres; Gregg Helt; David Goodstein; Christine G. Elsik; Suzanna E. Lewis; Lincoln Stein; Ian Holmes

BackgroundJBrowse is a fast and full-featured genome browser built with JavaScript and HTML5. It is easily embedded into websites or apps but can also be served as a standalone web page.ResultsOverall improvements to speed and scalability are accompanied by specific enhancements that support complex interactive queries on large track sets. Analysis functions can readily be added using the plugin framework; most visual aspects of tracks can also be customized, along with clicks, mouseovers, menus, and popup boxes. JBrowse can also be used to browse local annotation files offline and to generate high-resolution figures for publication.ConclusionsJBrowse is a mature web application suitable for genome visualization and analysis.

Briefings in Bioinformatics | 2013

Visualizing next-generation sequencing data with JBrowse

Oscar Westesson; Mitchell E. Skinner; Ian Holmes

JBrowse is a web-based genome browser, allowing many sources of data to be visualized, interpreted and navigated in a coherent visual framework. JBrowse uses efficient data structures, pre-generation of image tiles and client-side rendering to provide a fast, interactive browsing experience. Many of JBrowses design features make it well suited for visualizing high-volume data, such as aligned next-generation sequencing reads.

RNA | 2011

RNAcentral: A vision for an international database of RNA sequences

Alex Bateman; Shipra Agrawal; Ewan Birney; Elspeth A. Bruford; Janusz M. Bujnicki; Guy Cochrane; James R. Cole; Marcel E. Dinger; Anton J. Enright; Paul P. Gardner; Daniel Gautheret; Sam Griffiths-Jones; Jen Harrow; Javier Herrero; Ian Holmes; Hsien D A Huang; Krystyna A. Kelly; Paul J. Kersey; Ana Kozomara; Todd M. Lowe; Manja Marz; Simon Moxon; Kim D. Pruitt; Tore Samuelsson; Peter F. Stadler; Albert J. Vilella; Jan Hinnerk Vogel; Kelly P. Williams; Mathew W. Wright; Christian Zwieb

During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor.

Bioinformatics | 2007

Transducers: an emerging probabilistic framework for modeling indels on trees

Robert K. Bradley; Ian Holmes

When it comes to dealing with indels, molecular evolution lags heuristic bioinformatics by decades. Sophisticated alignment algorithms have been widely known since the 1960s (and in bioinformatics since 1970), but we are still struggling to understand the corresponding phylogenetic models. Big ideas drive change: as we dream of reconstructing ancestral genotypes, it is ever clearer that indels cannot be ignored. We need to develop a robust understanding of probabilistic indel analysis and its relationship to alignment. We believe that a suitable foundation for such analysis already exists, where evolutionary models meet automata theory: the framework of finite-state transducers. This framework links Hidden Markov Models (Brown et al., 1993; Churchill, 1992), sequence alignment algorithms (Gotoh, 1982; Miller andMyers, 1988; Needleman and Wunsch, 1970; Smith and Waterman, 1981), finite-state machines and Chomsky grammars (Durbin et al., 1998) and molecular phylogenetics (Miklos et al., 2004; Thorne et al., 1991). In this letter we outline this framework, also describing a preliminary analysis of one recent algorithm— Indelign—for reconstructing ancestral indel histories (Kim and Sinha, 2007). Below, we briefly review the theory of transducers, concentrating not on the details of individual algorithms but rather on their unifying qualitative character. We show that Indelign, which reconstructs maximum-likelihood indel histories, is implicitly based on a transducer model. Thus, we can compare the computational complexity of Indelign to other transducerframed algorithms, with reference to alignment data from recent comparative genomics projects in Drosophila and Eutheria (ENCODE). Finally, we discuss several programs, algorithms and resources available for working with transducers, offering an outlook on areas of bioinformatics that may benefit from this theory. 1.1 Theory of finite-state transducers

Bioinformatics | 2005

Using evolutionary Expectation Maximization to estimate indel rates

Ian Holmes

Abstract Motivation: The Expectation Maximization (EM) algorithm, in the form of the Baum–Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the EM algorithm to estimate not only the probability parameters of the stochastic grammar, but also the instantaneous mutation rates of the underlying evolutionary model (to facilitate the development of stochastic grammars based on phylogenetic trees, also known as Statistical Alignment). Recently, we showed how to do this for the point substitution component of the evolutionary process; here, we extend these results to the indel process. Results: We present an algorithm for maximum-likelihood estimation of insertion and deletion rates from multiple sequence alignments, using EM, under the single-residue indel model owing to Thorne, Kishino and Felsenstein (the ‘TKF91’ model). The algorithm converges extremely rapidly, gives accurate results on simulated data that are an improvement over parsimonious estimates (which are shown to underestimate the true indel rate), and gives plausible results on experimental data (coronavirus envelope domains). Owing to the algorithms close similarity to the Baum–Welch algorithm for training hidden Markov models, it can be used in an ‘unsupervised’ fashion to estimate rates for unaligned sequences, or estimate several sets of rates for sequences with heterogenous rates. Availability: Software implementing the algorithm and the benchmark is available under GPL from http://www.biowiki.org/ Contact: [email protected]

Explore More