Sam T. Roweis | Researchain

Publication

Featured research published by Sam T. Roweis.

Journal of Machine Learning Research | 2003

Think globally, fit locally: unsupervised learning of low dimensional manifolds

Lawrence K. Saul; Sam T. Roweis

The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. The data, assumed to be sampled from an underlying manifold, are mapped into a single global coordinate system of lower dimensionality. The mapping is derived from the symmetries of locally linear reconstructions, and the actual computation of the embedding reduces to a sparse eigenvalue problem. Notably, the optimizations in LLE, though capable of generating highly nonlinear embeddings, are simple to implement, and they do not involve local minima. In this paper, we describe the implementation of the algorithm in detail and discuss several extensions that enhance its performance. We present results of the algorithm applied to data sampled from known manifolds, as well as to collections of images of faces, lips, and handwritten digits. These examples are used to provide extensive illustrations of the algorithm's performance (both successes and failures) and to relate the algorithm to previous and ongoing work in nonlinear dimensionality reduction.
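As a rough illustration of the "locally linear reconstructions" mentioned in this abstract, the sketch below computes reconstruction weights for a single point from its neighbors: it builds the local Gram matrix, solves a small linear system, and rescales the weights so they sum to one. The function names, the `reg` regularization parameter, and the small Gaussian-elimination helper are assumptions made for this example. The neighbor search and the final sparse eigenvalue problem are omitted, and this is not the authors' implementation.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def reconstruction_weights(point, neighbors, reg=1e-3):
    """Weights that best reconstruct `point` from `neighbors`, summing to one.

    `reg` (hypothetical default) conditions the Gram matrix, which is
    singular whenever there are more neighbors than input dimensions.
    """
    k = len(neighbors)
    diffs = [[p - q for p, q in zip(point, nb)] for nb in neighbors]
    gram = [[sum(x * y for x, y in zip(di, dj)) for dj in diffs] for di in diffs]
    trace = sum(gram[i][i] for i in range(k))
    eps = reg * trace if trace > 0 else reg
    for i in range(k):
        gram[i][i] += eps
    w = solve_linear(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]
```

Because the weights depend only on differences between the point and its neighbors, shifting all inputs by the same offset leaves them unchanged, which is one of the symmetries the embedding step relies on.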

International Conference on Computer Graphics and Interactive Techniques | 2006

Removing camera shake from a single photograph

Rob Fergus; Barun Singh; Aaron Hertzmann; Sam T. Roweis; William T. Freeman

Camera shake during exposure leads to objectionable image blur and ruins many photographs. Conventional blind deconvolution methods typically assume frequency-domain constraints on images, or overly simplified parametric forms for the motion path during camera shake. Real camera motions can follow convoluted paths, and a spatial domain prior can better maintain visually salient image characteristics. We introduce a method to remove the effects of camera shake from seriously blurred images. The method assumes a uniform camera blur over the image and negligible in-plane camera rotation. In order to estimate the blur from the camera shake, the user must specify an image region without saturation effects. We show results for a variety of digital photographs taken from personal photo collections.

The Astronomical Journal | 2007

K-Corrections and Filter Transformations in the Ultraviolet, Optical, and Near-Infrared

Michael R. Blanton; Sam T. Roweis

Template fits to observed galaxy fluxes allow calculation of K-corrections and conversions among observations of galaxies at various wavelengths. We present a method for creating model-based template sets given a set of heterogeneous photometric and spectroscopic galaxy data. Our technique, nonnegative matrix factorization, is akin to principal component analysis (PCA), except that it is constrained to produce nonnegative templates, it can use a basis set of models (rather than the delta-function basis of PCA), and it naturally handles uncertainties, missing data, and heterogeneous data (including broadband fluxes at various redshifts). The particular implementation we present here is suitable for ultraviolet, optical, and near-infrared observations in the redshift range 0 < z < 1.5. Since we base our templates on stellar population synthesis models, the results are interpretable in terms of approximate stellar masses and star formation histories. We present templates fitted with this method to data from Galaxy Evolution Explorer, Sloan Digital Sky Survey spectroscopy and photometry, the Two Micron All Sky Survey, the Deep Extragalactic Evolutionary Probe, and the Great Observatories Origins Deep Survey. In addition, we present software for using such data to estimate K-corrections.
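To make the idea of a nonnegative factorization that tolerates uncertainties and missing data more concrete, here is a minimal sketch using weighted multiplicative updates, where each entry carries an inverse-variance weight and a weight of zero marks a missing value. This is a generic weighted NMF, swapped in for illustration; it is not the authors' software, it does not use a basis of stellar population models, and all names (`nmf_step`, `fit`, `ivar`, `eps`) are assumptions.

```python
import random


def model_matrix(coeffs, templates):
    """Return the product coeffs @ templates as nested lists."""
    k = len(templates)
    return [[sum(row[a] * templates[a][j] for a in range(k))
             for j in range(len(templates[0]))] for row in coeffs]


def weighted_chi2(data, ivar, coeffs, templates):
    """Sum of inverse-variance-weighted squared residuals."""
    model = model_matrix(coeffs, templates)
    return sum(ivar[i][j] * (data[i][j] - model[i][j]) ** 2
               for i in range(len(data)) for j in range(len(data[0])))


def nmf_step(data, ivar, coeffs, templates, eps=1e-12):
    """One multiplicative update of the templates, then of the coefficients."""
    n, m, k = len(data), len(data[0]), len(templates)
    model = model_matrix(coeffs, templates)
    templates = [[templates[a][j]
                  * sum(coeffs[i][a] * ivar[i][j] * data[i][j] for i in range(n))
                  / (sum(coeffs[i][a] * ivar[i][j] * model[i][j] for i in range(n)) + eps)
                  for j in range(m)] for a in range(k)]
    model = model_matrix(coeffs, templates)
    coeffs = [[coeffs[i][a]
               * sum(ivar[i][j] * data[i][j] * templates[a][j] for j in range(m))
               / (sum(ivar[i][j] * model[i][j] * templates[a][j] for j in range(m)) + eps)
               for a in range(k)] for i in range(n)]
    return coeffs, templates


def fit(data, ivar, k, steps=200, seed=0):
    """Fit k nonnegative templates from a positive random start."""
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    coeffs = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    templates = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    for _ in range(steps):
        coeffs, templates = nmf_step(data, ivar, coeffs, templates)
    return coeffs, templates
```

Since every update multiplies a nonnegative value by a ratio of nonnegative sums, the factors stay nonnegative without any explicit projection, and zero-weight entries simply drop out of both sums.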

Neural Computation | 1999

A unifying review of linear Gaussian models

Sam T. Roweis; Zoubin Ghahramani

Factor analysis, principal component analysis, mixtures of Gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of Gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
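The "single basic generative model" can be pictured as a linear dynamical system with Gaussian noise: a hidden state evolves linearly and is observed through a second linear map. The sketch below samples from a scalar version of such a model. The parameter names (`a`, `c`, `q_sd`, `r_sd`, `x0`) are assumptions chosen for this example, and the scalar restriction is a simplification rather than the paper's pseudocode.

```python
import random


def sample_linear_gaussian(a, c, q_sd, r_sd, x0, steps, seed=0):
    """Sample hidden states and observations from a scalar linear Gaussian model.

    State update:  x[t+1] = a * x[t] + state noise (std q_sd)
    Observation:   y[t]   = c * x[t] + observation noise (std r_sd)
    """
    rng = random.Random(seed)
    x = x0
    states, observations = [], []
    for _ in range(steps):
        states.append(x)
        observations.append(c * x + rng.gauss(0.0, r_sd))
        x = a * x + rng.gauss(0.0, q_sd)
    return states, observations
```

With `a = 0` each state after the first is an independent draw, which corresponds to the static-data case (the family that includes factor analysis and PCA); a nonzero `a` gives the time-series case handled by Kalman filter models.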

Cell | 2003

A Panoramic View of Yeast Noncoding RNA Processing

Wen Tao Peng; Mark D. Robinson; Sanie Mnaimneh; Nevan J. Krogan; Gerard Cagney; Quaid Morris; Armaity P. Davierwala; Jörg Grigull; Xueqi Yang; Wen Zhang; Nicholas Mitsakakis; Owen Ryan; Nira Datta; Vladimir Jojic; Chris Pal; Veronica Canadien; Dawn Richards; Bryan Beattie; Lani F. Wu; Steven J. Altschuler; Sam T. Roweis; Brendan J. Frey; Andrew Emili; Jack Greenblatt; Timothy R. Hughes

Predictive analysis using publicly available yeast functional genomics and proteomics data suggests that many more proteins may be involved in biogenesis of ribonucleoproteins than are currently known. Using a microarray that monitors abundance and processing of noncoding RNAs, we analyzed 468 yeast strains carrying mutations in protein-coding genes, most of which have not previously been associated with RNA or RNP synthesis. Many strains mutated in uncharacterized genes displayed aberrant noncoding RNA profiles. Ten factors involved in noncoding RNA biogenesis were verified by further experimentation, including a protein required for 20S pre-rRNA processing (Tsr2p), a protein associated with the nuclear exosome (Lrp1p), and a factor required for box C/D snoRNA accumulation (Bcd1p). These data present a global view of yeast noncoding RNA processing and confirm that many currently uncharacterized yeast proteins are involved in biogenesis of noncoding RNA.

International Conference on Machine Learning | 2006

Nightmare at test time: robust learning by feature deletion

Amir Globerson; Sam T. Roweis

When constructing a classifier from labeled data, it is important not to assign too much weight to any single input feature, in order to increase the robustness of the classifier. This is particularly important in domains with nonstationary feature distributions or with input sensor failures. A common approach to achieving such robustness is to introduce regularization which spreads the weight more evenly between the features. However, this strategy is very generic, and cannot induce robustness specifically tailored to the classification task at hand. In this work, we introduce a new algorithm for avoiding single feature over-weighting by analyzing robustness using a game theoretic formalization. We develop classifiers which are optimally resilient to deletion of features in a minimax sense, and show how to construct such classifiers using quadratic programming. We illustrate the applicability of our methods on spam filtering and handwritten digit recognition tasks, where feature deletion is indeed a realistic noise model.
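A small sketch can show why spreading weight across features matters under feature deletion. It assumes a linear classifier, treats deletion as setting a feature to zero, ignores any bias term, and lets an adversary delete up to `k` features. It only evaluates the worst case for a fixed weight vector; it does not implement the quadratic program used to train the classifiers, and the function name is hypothetical.

```python
def worst_case_margin(weights, features, label, k):
    """Margin of a linear classifier after an adversary zeroes up to k features.

    `label` is +1 or -1. Each feature contributes label * w_i * x_i to the
    margin; the adversary removes the k largest positive contributions.
    """
    contributions = [label * w * x for w, x in zip(weights, features)]
    removable = sorted((c for c in contributions if c > 0), reverse=True)[:k]
    return sum(contributions) - sum(removable)


# Two classifiers with the same clean margin but different robustness.
concentrated = worst_case_margin([3.0, 0.0, 0.0], [1.0, 1.0, 1.0], 1, k=1)
spread = worst_case_margin([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 1, k=1)
```

Here both weight vectors give a margin of 3 on the intact input, but after one deletion the concentrated classifier drops to 0 while the spread one keeps a margin of 2.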

Journal of Computational Biology | 1998

A Sticker-Based Model for DNA Computation

Sam T. Roweis; Erik Winfree; Richard Burgoyne; Nickolas Chelyapov; Myron F. Goodman; Paul W. K. Rothemund; Leonard M. Adleman

We introduce a new model of molecular computation that we call the sticker model. Like many previous proposals it makes use of DNA strands as the physical substrate in which information is represented and of separation by hybridization as a central mechanism. However, unlike previous models, the stickers model has a random access memory that requires no strand extension and uses no enzymes; also (at least in theory), its materials are reusable. The paper describes computation under the stickers model and discusses possible means for physically implementing each operation. Finally, we go on to propose a specific machine architecture for implementing the stickers model as a microprocessor-controlled parallel robotic workstation. In the course of this development a number of previous general concerns about molecular computation (Smith, 1996; Hartmanis, 1995; Linial et al., 1995) are addressed. First, it is clear that general-purpose algorithms can be implemented by DNA-based computers, potentially solving a wide class of search problems. Second, we find that there are challenging problems for which only modest volumes of DNA should suffice. Third, we demonstrate that the formation and breaking of covalent bonds is not intrinsic to DNA-based computation. Fourth, we show that a single essential biotechnology, sequence-specific separation, suffices for constructing a general-purpose molecular computer. Concerns about errors in this separation operation and means to reduce them are addressed elsewhere (Karp et al., 1995; Roweis and Winfree, 1999). Despite these encouraging theoretical advances, we emphasize that substantial engineering challenges remain at almost all stages and that the ultimate success or failure of DNA computing will certainly depend on whether these challenges can be met in laboratory investigations.
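The flavor of computation in such a model can be simulated in software by treating each memory strand as a tuple of bits and each test tube as a list of strands. The sketch below assumes four tube-level operations (combine, separate, set, clear), following the usual description of the sticker model; the abstract itself only names separation explicitly, so the other function names and the bit-tuple representation are assumptions for illustration.

```python
from itertools import product


def combine(tube_a, tube_b):
    """Pour two tubes together."""
    return tube_a + tube_b


def separate(tube, bit):
    """Split a tube into strands with `bit` on and strands with `bit` off."""
    on = [s for s in tube if s[bit] == 1]
    off = [s for s in tube if s[bit] == 0]
    return on, off


def set_bit(tube, bit):
    """Turn `bit` on for every strand in the tube."""
    return [s[:bit] + (1,) + s[bit + 1:] for s in tube]


def clear_bit(tube, bit):
    """Turn `bit` off for every strand in the tube."""
    return [s[:bit] + (0,) + s[bit + 1:] for s in tube]


def and_into_bit2(tube):
    """Write (bit 0 AND bit 1) into bit 2 of every strand, in parallel."""
    on0, off0 = separate(tube, 0)
    both, only0 = separate(on0, 1)
    both = set_bit(both, 2)
    return combine(combine(both, only0), off0)


# A library holding every combination of the two input bits, output bit off.
library = [bits + (0,) for bits in product((0, 1), repeat=2)]
result = and_into_bit2(library)
```

Each operation acts on every strand in a tube at once, which is where the parallelism of the physical model would come from; the simulation simply loops over the strands.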

Journal of Computational Biology | 1999

On Applying Molecular Computation to the Data Encryption Standard

Leonard M. Adleman; Paul W. K. Rothemund; Sam T. Roweis; Erik Winfree

Recently, Boneh, Dunworth, and Lipton (1996) described the potential use of molecular computation in attacking the United States Data Encryption Standard (DES). Here, we provide a description of such an attack using the sticker model of molecular computation. Our analysis suggests that such an attack might be mounted on a tabletop machine using approximately a gram of DNA and might succeed even in the presence of a large number of errors.

Bioinformatics | 2007

Difference detection in LC-MS data for protein biomarker discovery

Jennifer Listgarten; Radford M. Neal; Sam T. Roweis; Peter Y. Wong; Andrew Emili

MOTIVATION: There is a pressing need for improved proteomic screening methods allowing for earlier diagnosis of disease, systematic monitoring of physiological responses and the uncovering of fundamental mechanisms of drug action. The combined platform of LC-MS (Liquid-Chromatography-Mass-Spectrometry) has shown promise in moving toward a solution in these areas. In this paper we present a technique for discovering differences in protein signal between two classes of samples of LC-MS serum proteomic data without use of tandem mass spectrometry, gels or labeling. This method works on data from a lower-precision MS instrument, the type routinely used by and available to the community at large today. We test our technique on a controlled (spike-in) but realistic (serum biomarker discovery) experiment which is therefore verifiable. We also develop a new method for helping to assess the difficulty of a given spike-in problem. Lastly, we show that the problem of class prediction, sometimes mistaken as a solution to biomarker discovery, is actually a much simpler problem.

RESULTS: Using precision-recall curves with experimentally extracted ground truth, we show that (1) our technique has good performance using seven replicates from each class, (2) performance degrades with decreasing number of replicates, (3) the signal that we are teasing out is not trivially available (i.e. the differences are not so large that the task is easy). Lastly, we easily obtain perfect classification results for data in which the problem of extracting differences does not produce absolutely perfect results. This emphasizes the different nature of the two problems and also their relative difficulties.

AVAILABILITY: Our data are publicly available as a benchmark for further studies of this nature at http://www.cs.toronto.edu/~jenn/LCMS

The Annals of Applied Statistics | 2011

Extreme deconvolution: inferring complete distribution functions from noisy, heterogeneous and incomplete observations

Jo Bovy; David W. Hogg; Sam T. Roweis

We generalize the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation-Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point carries an individual d-dimensional uncertainty covariance and has unique missing data properties. This algorithm reconstructs the error-deconvolved or “underlying” distribution function common to all samples, even when the individual data points are samples from different distributions, obtained by convolving the underlying distribution with the unique uncertainty distribution of the data point and projecting out the missing data directions. We show how this basic algorithm can be extended with Bayesian priors on all of the model parameters and a “split-and-merge” procedure designed to avoid local maxima of the likelihood. We apply this technique to a few typical astrophysical applications.
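To show how per-point uncertainties enter the Expectation-Maximization updates, here is a minimal one-dimensional sketch with no missing data. Each observation is modeled as a draw from a mixture component whose variance is the underlying variance plus that point's own noise variance; the E-step then estimates the noise-free value behind each observation. The function and variable names are assumptions, and the priors and the split-and-merge procedure from the abstract are not included.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def xd_step(data, noise_vars, amps, means, variances):
    """One EM iteration for a 1-D Gaussian mixture with per-point noise variances."""
    n, k = len(data), len(amps)
    posteriors = []  # per point: list of (responsibility, mean, variance) per component
    for w, s in zip(data, noise_vars):
        # Each point is compared with the noise-convolved component (variance V + S_i).
        like = [amps[j] * normal_pdf(w, means[j], variances[j] + s) for j in range(k)]
        total = sum(like)
        row = []
        for j in range(k):
            gain = variances[j] / (variances[j] + s)
            b = means[j] + gain * (w - means[j])   # posterior mean of the true value
            big_b = variances[j] * (1.0 - gain)    # posterior variance of the true value
            row.append((like[j] / total, b, big_b))
        posteriors.append(row)

    new_amps, new_means, new_vars = [], [], []
    for j in range(k):
        sum_q = sum(row[j][0] for row in posteriors)
        mean_j = sum(row[j][0] * row[j][1] for row in posteriors) / sum_q
        var_j = sum(row[j][0] * ((mean_j - row[j][1]) ** 2 + row[j][2])
                    for row in posteriors) / sum_q
        new_amps.append(sum_q / n)
        new_means.append(mean_j)
        new_vars.append(var_j)
    return new_amps, new_means, new_vars
```

With all noise variances set to zero the update reduces to the ordinary mixture-of-Gaussians EM step; with nonzero noise the fitted variance converges to a value smaller than the raw sample variance, because the noise contribution is deconvolved.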
