Christina S. Leslie | Researchain

Publication

Featured research published by Christina S. Leslie.

Pacific Symposium on Biocomputing | 2001

The spectrum kernel: a string kernel for SVM protein classification.

Christina S. Leslie; Eleazar Eskin; William Stafford Noble

We introduce a new sequence-similarity kernel, the spectrum kernel, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. Our kernel is conceptually simple and efficient to compute and, in experiments on the SCOP database, performs well in comparison with state-of-the-art methods for homology detection. Moreover, our method produces an SVM classifier that allows linear time classification of test sequences. Our experiments provide evidence that string-based kernels, in conjunction with SVMs, could offer a viable and computationally efficient alternative to other methods of protein classification and homology detection.
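
As a rough illustration of why such a kernel is simple to compute, the spectrum kernel is usually defined as the inner product of the k-mer count vectors of two sequences. The sketch below assumes that definition; the function names and the default k = 3 are illustrative choices, not taken from the paper, and this dictionary-based version is not the authors' implementation.

```python
from collections import Counter


def spectrum_features(seq, k):
    """Count every contiguous k-mer in seq (empty if seq is shorter than k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    # Iterate over the smaller feature map; a Counter returns 0 for absent k-mers.
    if len(fx) > len(fy):
        fx, fy = fy, fx
    return sum(count * fy[kmer] for kmer, count in fx.items())
```

Each feature map is built in one pass over its sequence, so the cost grows with the sequence lengths rather than with the number of possible k-mers.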

Bioinformatics | 2004

Mismatch string kernels for discriminative protein classification

Christina S. Leslie; Eleazar Eskin; Adiel Cohen; Jason Weston; William Stafford Noble

MOTIVATION Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. RESULTS We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies.
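
The idea of counting shared fixed-length patterns while allowing mutations can be sketched with a brute-force feature map: every k-mer in a sequence contributes to all k-length strings within m mismatches of it. This is a minimal sketch under that assumption. It enumerates neighborhoods explicitly and does not use the mismatch tree described in the abstract, so it is only practical for small k, m, and alphabets. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def mismatch_neighborhood(kmer, m, alphabet):
    """All strings of the same length within Hamming distance m of kmer."""
    m = min(m, len(kmer))
    neighbors = {kmer}
    # Choose m positions and try every letter there; re-using the original
    # letter at a position covers the cases with fewer than m mismatches.
    for positions in combinations(range(len(kmer)), m):
        for letters in product(alphabet, repeat=m):
            chars = list(kmer)
            for pos, letter in zip(positions, letters):
                chars[pos] = letter
            neighbors.add("".join(chars))
    return neighbors


def mismatch_features(seq, k, m, alphabet):
    """Each k-mer of seq adds one count to every string in its neighborhood."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        for neighbor in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            feats[neighbor] += 1
    return feats


def mismatch_kernel(x, y, k, m, alphabet):
    """Inner product of the (k, m)-mismatch feature vectors of x and y."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[s] for s, count in fx.items())
```

With m = 0 this reduces to exact k-mer matching; with m = 1 the sequences "AA" and "BB" over the alphabet "AB" already share the patterns "AB" and "BA".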

Bioinformatics | 2005

Semi-supervised protein classification using cluster kernels

Jason Weston; Christina S. Leslie; Eugene Ie; Dengyong Zhou; André Elisseeff; William Stafford Noble

MOTIVATION Building an accurate protein classification system depends critically upon choosing a good representation of the input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data (examples with known 3D structures, organized into structural classes), whereas in practice, unlabeled data are far more plentiful. RESULTS In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while at the same time achieving far greater computational efficiency. AVAILABILITY Source code is available at www.kyb.tuebingen.mpg.de/bs/people/weston/semiprot. The Spider matlab package is available at www.kyb.tuebingen.mpg.de/bs/people/spider. SUPPLEMENTARY INFORMATION www.kyb.tuebingen.mpg.de/bs/people/weston/semiprot.

European Conference on Machine Learning | 2002

A kernel approach for learning from almost orthogonal patterns

Bernhard Schölkopf; Jason Weston; Eleazar Eskin; Christina S. Leslie; William Stafford Noble

In kernel methods, all the information about the training data is contained in the Gram matrix. If this matrix has large diagonal values, as happens for many types of kernels, then kernel methods do not perform well. We propose and test several methods for dealing with this problem by reducing the dynamic range of the matrix while preserving the positive definiteness of the Hessian of the quadratic programming problem that one has to solve when training a Support Vector Machine.
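
To make the large-diagonal problem concrete, the sketch below measures how strongly the diagonal of a Gram matrix dominates its off-diagonal entries, then applies one illustrative range-reduction step: an elementwise signed power followed by multiplication with the transpose, which always yields a positive semidefinite matrix. The abstract does not name the specific methods tested, so this transform, the function names, and the exponent p = 0.5 are assumptions chosen for illustration.

```python
import math


def diagonal_dominance(gram):
    """Ratio of the mean diagonal entry to the mean absolute off-diagonal entry."""
    n = len(gram)
    diag = sum(gram[i][i] for i in range(n)) / n
    off = sum(abs(gram[i][j]) for i in range(n) for j in range(n) if i != j)
    if off == 0:
        return float("inf")
    return diag / (off / (n * (n - 1)))


def compress_range(gram, p=0.5):
    """Shrink the dynamic range with sign(v) * |v|**p, then form A A^T.

    The elementwise power alone may break positive semidefiniteness;
    A A^T is positive semidefinite for any real matrix A.
    """
    n = len(gram)
    a = [[math.copysign(abs(v) ** p, v) for v in row] for row in gram]
    return [[sum(a[i][t] * a[j][t] for t in range(n)) for j in range(n)]
            for i in range(n)]
```

For the 2 x 2 matrix with 100 on the diagonal and 1 elsewhere, the dominance ratio drops from 100 to about 5 after this step.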

Genome Research | 2011

Computational and experimental identification of mirtrons in Drosophila melanogaster and Caenorhabditis elegans

Wei-Jen Chung; Phaedra Agius; Jakub Orzechowski Westholm; Michael Chen; Katsutomo Okamura; Nicolas Robine; Christina S. Leslie; Eric C. Lai

Mirtrons are intronic hairpin substrates of the dicing machinery that generate functional microRNAs. In this study, we describe experimental assays that defined the essential requirements for entry of introns into the mirtron pathway. These data informed a bioinformatic screen that effectively identified functional mirtrons from the Drosophila melanogaster transcriptome. These included 17 known and six confident novel mirtrons among the top 51 candidates, and additional candidates had limited read evidence in available small RNA data. Our computational model also proved effective on Caenorhabditis elegans, for which the identification of 14 cloned mirtrons among the top 22 candidates more than tripled the number of validated mirtrons in this species. A few low-scoring introns generated mirtron-like read patterns from atypical RNA structures, but their paucity suggests that relatively few such loci were not captured by our model. Unexpectedly, we uncovered examples of clustered mirtrons in both fly and worm genomes, including a <8-kb region in C. elegans harboring eight distinct mirtrons. Altogether, we demonstrate that discovery of functional mirtrons, unlike canonical miRNAs, is amenable to computational methods independent of evolutionary constraint.

Bioinformatics | 2004

Protein backbone angle prediction with machine learning approaches

Rui Kuang; Christina S. Leslie; An-Suei Yang

MOTIVATION Protein backbone torsion angle prediction provides useful local structural information that goes beyond conventional three-state (alpha, beta and coil) secondary structure predictions. Accurate prediction of protein backbone torsion angles will substantially improve modeling procedures for local structures of protein sequence segments, especially in modeling loop conformations that do not form regular structures as in alpha-helices or beta-strands. RESULTS We have devised two novel automated methods in protein backbone conformational state prediction: one method is based on support vector machines (SVMs); the other method combines a standard feed-forward back-propagation artificial neural network (NN) with a local structure-based sequence profile database (LSBSP1). Extensive benchmark experiments demonstrate that both methods have improved the prediction accuracy rate over the previously published methods for conformation state prediction when using an alphabet of three or four states. AVAILABILITY LSBSP1 and the NN algorithm have been implemented in PrISM.1, which is available from www.columbia.edu/~ay1/. SUPPLEMENTARY INFORMATION Supplementary data for the SVM method can be downloaded from the Website www.cs.columbia.edu/compbio/backbone.

Conference on Learning Theory | 2003

Fast Kernels for Inexact String Matching

Christina S. Leslie; Rui Kuang

We introduce several new families of string kernels designed in particular for use with support vector machines (SVMs) for classification of protein sequence data. These kernels (restricted gappy kernels, substitution kernels, and wildcard kernels) are based on feature spaces indexed by k-length subsequences from the string alphabet \(\Sigma\) (or the alphabet augmented by a wildcard character), and hence they are related to the recently presented (k,m)-mismatch kernel and string kernels used in text classification. However, for all kernels we define here, the kernel value \(K(x,y)\) can be computed in \(O(c_K (|x| + |y|))\) time, where the constant \(c_K\) depends on the parameters of the kernel but is independent of the size \(|\Sigma|\) of the alphabet. Thus the computation of these kernels is linear in the length of the sequences, like the mismatch kernel, but we improve upon the parameter-dependent constant \(c_K = k^{m+1} |\Sigma|^m\) of the mismatch kernel. We compute the kernels efficiently using a recursive function based on a trie data structure and relate our new kernels to the recently described transducer formalism. Finally, we report protein classification experiments on a benchmark SCOP dataset, where we show that our new faster kernels achieve SVM classification performance comparable to the mismatch kernel and the Fisher kernel derived from profile hidden Markov models.
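
One of the kernel families named above, the wildcard kernel, can be sketched by brute force. The sketch assumes a common formulation in which features are k-length patterns over the alphabet plus a wildcard character, each k-mer of a sequence matches every pattern obtained by replacing up to m of its positions with the wildcard, and a pattern with j wildcards is down-weighted by a factor lam ** j. The parameter names, the "*" symbol, and the weighting are assumptions here, and the code does not use the trie-based recursion the abstract describes.

```python
from collections import Counter
from itertools import combinations

WILDCARD = "*"  # assumed not to occur in the sequence alphabet


def wildcard_features(seq, k, m, lam):
    """Weighted counts of k-length patterns with at most m wildcards."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        for j in range(min(m, k) + 1):
            # Replace each choice of j positions by the wildcard character.
            for positions in combinations(range(k), j):
                chars = list(kmer)
                for pos in positions:
                    chars[pos] = WILDCARD
                feats["".join(chars)] += lam ** j
    return feats


def wildcard_kernel(x, y, k, m, lam=1.0):
    """Inner product of the wildcard feature vectors of x and y."""
    fx = wildcard_features(x, k, m, lam)
    fy = wildcard_features(y, k, m, lam)
    return sum(weight * fy[pattern] for pattern, weight in fx.items())
```

Because patterns are generated from the k-mers actually present, the work per k-mer depends on k and m but not on the alphabet size, which is in the spirit of the complexity claim above.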

Proceedings of the National Academy of Sciences of the United States of America | 2015

In vivo, Argonaute-bound microRNAs exist predominantly in a reservoir of low molecular weight complexes not associated with mRNA.

Gaspare La Rocca; Scott H. Olejniczak; Alvaro J. González; Daniel Briskin; Joana A. Vidigal; Lee Spraggon; Raymond G. DeMatteo; Megan R. Radler; Tullia Lindsten; Andrea Ventura; Thomas Tuschl; Christina S. Leslie; Craig B. Thompson

SIGNIFICANCE MicroRNAs limit gene expression by recruiting a large protein complex known as the RNA-induced silencing complex (RISC) to target mRNAs. While attempting to understand physiological regulation of RISC assembly, we found that most healthy adult tissues retain a reserve of microRNAs not stably associated with target mRNA. Recruitment of microRNAs to large mRNA-containing complexes was accompanied by an increase in their ability to repress targets and was regulated in part by phosphoinositide-3 kinase–RAC-alpha serine/threonine-protein kinase–mechanistic target of rapamycin pathway-dependent enhancement of the glycine-tryptophan protein of 182 kDa protein expression. Data presented here suggest that in vivo, many expressed microRNAs exist in an inactive reserve, allowing resting cells to use microRNAs to dynamically regulate the translation of target mRNAs in their environment.

MicroRNAs repress mRNA translation by guiding Argonaute proteins to partially complementary binding sites, primarily within the 3′ untranslated region (UTR) of target mRNAs. In cell lines, Argonaute-bound microRNAs exist mainly in high molecular weight RNA-induced silencing complexes (HMW-RISC) associated with target mRNA. Here we demonstrate that most adult tissues contain reservoirs of microRNAs in low molecular weight RISC (LMW-RISC) not bound to mRNA, suggesting that these microRNAs are not actively engaged in target repression. Consistent with this observation, the majority of individual microRNAs in primary T cells were enriched in LMW-RISC. During T-cell activation, signal transduction through the phosphoinositide-3 kinase–RAC-alpha serine/threonine-protein kinase–mechanistic target of rapamycin pathway increased the assembly of microRNAs into HMW-RISC, enhanced expression of the glycine-tryptophan protein of 182 kDa, an essential component of HMW-RISC, and improved the ability of microRNAs to repress partially complementary reporters, even when expression of targeting microRNAs did not increase. Overall, data presented here demonstrate that microRNA-mediated target repression in nontransformed cells depends not only on abundance of specific microRNAs, but also on regulation of RISC assembly by intracellular signaling.

BMC Bioinformatics | 2006

Protein Ranking by Semi-Supervised Network Propagation

Jason Weston; Rui Kuang; Christina S. Leslie; William Stafford Noble

BACKGROUND Biologists regularly search DNA or protein databases for sequences that share an evolutionary or functional relationship with a given query sequence. Traditional search methods, such as BLAST and PSI-BLAST, focus on detecting statistically significant pairwise sequence alignments and often miss more subtle sequence similarity. Recent work in the machine learning community has shown that exploiting the global structure of the network defined by these pairwise similarities can help detect more remote relationships than a purely local measure. METHODS We review RankProp, a ranking algorithm that exploits the global network structure of similarity relationships among proteins in a database by performing a diffusion operation on a protein similarity network with weighted edges. The original RankProp algorithm is unsupervised. Here, we describe a semi-supervised version of the algorithm that uses labeled examples. Three possible ways of incorporating label information are considered: (i) as a validation set for model selection, (ii) to learn a new network, by choosing which transfer function to use for a given query, and (iii) to estimate edge weights, which measure the probability of inferring structural similarity. RESULTS Benchmarked on a human-curated database of protein structures, the original RankProp algorithm provides significant improvement over local network search algorithms such as PSI-BLAST. Furthermore, we show here that labeled data can be used to learn a network without any need for estimating parameters of the transfer function, and that diffusion on this learned network produces better results than the original RankProp algorithm with a fixed network. CONCLUSION In order to gain maximal information from a network, labeled and unlabeled data should be used to extract both local and global structure.
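
The diffusion idea can be illustrated with a generic score-propagation loop on a weighted similarity network. This is a sketch in the spirit of the description above, not the published RankProp update: the normalisation of incoming edge weights, the damping factor alpha, the fixed iteration count, and the function name are all assumptions.

```python
def propagate_scores(weights, query, alpha=0.5, iterations=20):
    """Diffuse activation from the query node over a weighted network.

    weights[i][j] is the similarity on the edge from node i to node j.
    Each node keeps its direct similarity to the query and adds a damped,
    normalised share of the scores of the nodes pointing to it.
    """
    n = len(weights)
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    base = [1.0 if j == query else weights[query][j] for j in range(n)]
    scores = base[:]
    for _ in range(iterations):
        scores = [
            base[j] + alpha * sum(weights[i][j] * scores[i] for i in range(n)) / col_sums[j]
            if col_sums[j] > 0 else base[j]
            for j in range(n)
        ]
    return scores
```

On a chain A-B-C with an isolated node D, a query at A gives C a positive score through B even though A and C have no direct edge, while D stays at zero. This is the sense in which global network structure can surface relationships that a purely pairwise measure misses.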

International Conference on Machine Learning | 2005

Multi-class protein fold recognition using adaptive codes

Eugene Ie; Jason Weston; William Stafford Noble; Christina S. Leslie

We develop a novel multi-class classification method based on output codes for the problem of classifying a sequence of amino acids into one of many known protein structural classes, called folds. Our method learns relative weights between one-vs-all classifiers and encodes information about the protein structural hierarchy for multi-class prediction. Our code weighting approach significantly improves on the standard one-vs-all method for the fold recognition problem. In order to compare against widely used methods in protein sequence analysis, we also test nearest neighbor approaches based on the PSI-BLAST algorithm. Our code weight learning algorithm strongly outperforms these PSI-BLAST methods on every structure recognition problem we consider.
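
Learning relative weights between one-vs-all classifiers can be sketched as follows. The abstract does not specify the learning rule, so the perceptron-style update below is an assumption made for illustration, as are the function names and parameters. The sketch also omits the encoding of the structural hierarchy and only reweights the per-class outputs.

```python
def predict(weights, scores):
    """Pick the class whose weighted one-vs-all score is largest."""
    return max(range(len(scores)), key=lambda c: weights[c] * scores[c])


def learn_weights(examples, n_classes, epochs=10, lr=0.1):
    """Perceptron-style reweighting of one-vs-all classifier outputs.

    examples is a list of (scores, label) pairs, where scores[c] is the
    real-valued output of the classifier for class c.
    """
    weights = [1.0] * n_classes
    for _ in range(epochs):
        for scores, label in examples:
            guess = predict(weights, scores)
            if guess != label:
                # Boost the true class and damp the wrongly chosen one.
                weights[label] += lr * scores[label]
                weights[guess] -= lr * scores[guess]
    return weights
```

With unit weights this reduces to the standard one-vs-all rule. When one classifier is systematically overconfident, the learned weights shrink its influence so that the other classes can win where they should.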
