Robert R. Snapp | Researchain

Publication

Featured research published by Robert R. Snapp.

Neural Computation | 1990

Generalizing smoothness constraints from discrete samples

Chuanyi Ji; Robert R. Snapp; Demetri Psaltis

We study how certain smoothness constraints, for example, piecewise continuity, can be generalized from a discrete set of analog-valued data by modifying the error backpropagation learning algorithm. Numerical simulations demonstrate that by imposing two heuristic objectives, (1) reducing the number of hidden units and (2) minimizing the magnitudes of the weights in the network during the learning process, one obtains a network with a response function that smoothly interpolates between the training data.

International Symposium on Information Theory | 1993

On the finite sample performance of the nearest neighbor classifier

Demetri Psaltis; Robert R. Snapp; Santosh S. Venkatesh

The finite sample performance of a nearest neighbor classifier is analyzed for a two-class pattern recognition problem. An exact integral expression is derived for the m-sample risk $R_m$ given that a reference m-sample of labeled points is available to the classifier. The statistical setup assumes that the pattern classes arise in nature with fixed a priori probabilities and that points representing the classes are drawn from Euclidean n-space according to fixed class-conditional probability distributions. The sample is assumed to consist of m independently generated class-labeled points. For a family of smooth class-conditional distributions characterized by asymptotic expansions in general form, it is shown that the m-sample risk $R_m$ has a complete asymptotic series expansion $R_m \sim R_\infty + \sum_{k=2}^{\infty} c_k m^{-k/n}$ $(m \to \infty)$, where $R_\infty$ denotes the nearest neighbor risk in the infinite-sample limit and the coefficients $c_k$ are distribution-dependent constants independent of the sample size m. The analysis thus provides further analytic validation of Bellman's curse of dimensionality. Numerical simulations corroborating the formal results are included, and extensions of the theory discussed. The analysis also contains a novel application of Laplace's asymptotic method of integration to a multidimensional integral where the integrand attains its maximum on a continuum of points.
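
To make the role of the dimension n in this expansion concrete, the following minimal Python sketch evaluates a truncated version of the series. The function name `truncated_nn_risk` and all numeric values are assumptions chosen for illustration; the abstract does not give the coefficients, which depend on the underlying distributions.

```python
def truncated_nn_risk(m, n, r_inf, coeffs):
    """Evaluate r_inf + sum_{k=2}^{K} c_k * m**(-k/n) for a finite coefficient list.

    coeffs[0] plays the role of c_2, coeffs[1] of c_3, and so on.
    These are hypothetical inputs, not values from the paper.
    """
    return r_inf + sum(c * m ** (-k / n) for k, c in enumerate(coeffs, start=2))


if __name__ == "__main__":
    r_inf, coeffs = 0.10, [0.5]  # made-up numbers, for illustration only
    for n in (2, 20):
        for m in (100, 10_000):
            print(f"n={n:2d} m={m:6d} risk={truncated_nn_risk(m, n, r_inf, coeffs):.4f}")
```

Because the leading correction decays like m^(-2/n), the same increase in sample size reduces the excess risk far less when n is large, which is the sense in which the expansion reflects the curse of dimensionality.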

Journal of Structural Biology | 2010

Probabilistic principal component analysis with expectation maximization (PPCA-EM) facilitates volume classification and estimates the missing data

L. Yu; Robert R. Snapp; Teresa Ruiz; Michael Radermacher

We have developed a new method for classifying 3D reconstructions with missing data obtained by electron microscopy techniques. The method is based on principal component analysis (PCA) combined with expectation maximization. The missing data, together with the principal components, are treated as hidden variables that are estimated by maximizing a likelihood function. PCA in 3D is similar to PCA for 2D image analysis. A lower dimensional subspace of significant features is selected, into which the data are projected, and if desired, subsequently classified. In addition, our new algorithm estimates the missing data for each individual volume within the lower dimensional subspace. Application to both a large model data set and cryo-electron microscopy experimental data demonstrates the good performance of the algorithm and illustrates its potential for studying macromolecular assemblies with continuous conformational variations.

International Journal of Agricultural and Environmental Information Systems | 2012

Service Path Attribution Networks (SPANs): A Network Flow Approach to Ecosystem Service Assessment

Gary W. Johnson; Kenneth J. Bagstad; Robert R. Snapp; Ferdinando Villa

Ecosystem services are the effects on human well-being of the flow of benefits from ecosystems to people over given extents of space and time. The Service Path Attribution Network (SPAN) model provides a spatial framework for determining the topology and strength of these flows and identifies the human and ecological features which give rise to them. As an aid to decision-making, this approach discovers dependencies between provision and usage endpoints, spatial competition among users for scarce resources, and areas of highest likely impact on ecosystem service flows. Particularly novel is the model's ability to quantify services provided by the absence of a flow. SPAN models have been developed for a number of services (scenic views, proximity to open space, carbon sequestration, flood mitigation, nutrient cycling, and avoided sedimentation/deposition), which vary in scale of effect, mechanism of provision and use, and type of flow. Results using real world data are shown for the US Puget Sound region.

ACM Transactions on Database Systems | 2005

Self-tuning cost modeling of user-defined functions in an object-relational DBMS

Zhen He; Byung Suk Lee; Robert R. Snapp

Query optimizers in object-relational database management systems typically require users to provide the execution cost models of user-defined functions (UDFs). Despite this need, however, there has been little work done to provide such a model. The existing approaches are static in that they require users to train the model a priori with pregenerated UDF execution cost data. Static approaches cannot adapt to changing UDF execution patterns and thus degrade in accuracy when the UDF executions used for generating training data do not reflect the patterns of those performed during operation. This article proposes a new approach based on the recent trend of self-tuning DBMS by which the cost model is maintained dynamically and incrementally as UDFs are being executed online. In the context of UDF cost modeling, our approach faces a number of challenges, that is, it should work with limited memory, work with limited computation time, and adjust to the fluctuations in the execution costs (e.g., caching effect). In this article, we first provide a set of guidelines for developing techniques that meet these challenges, while achieving accurate and fast cost prediction with small overheads. Then, we present two concrete techniques developed under the guidelines. One is an instance-based technique based on the conventional k-nearest neighbor (KNN) technique which uses a multidimensional index like the R*-tree. The other is a summary-based technique which uses the quadtree to store summary values at multiple resolutions. We have performed extensive performance evaluations comparing these two techniques against existing histogram-based techniques and the KNN technique, using both real and synthetic UDFs/data sets. The results show our techniques provide better performance in most situations considered.
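
The instance-based idea described above, predicting a cost from the nearest previously observed executions and then recording the actual cost, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `KnnCostModel`, its `capacity` parameter, and the drop-oldest eviction rule are hypothetical, and a brute-force scan stands in for the R*-tree index mentioned in the abstract.

```python
import math


class KnnCostModel:
    """Brute-force k-nearest-neighbor cost predictor (no spatial index)."""

    def __init__(self, k=3, capacity=1000):
        self.k = k
        self.capacity = capacity  # hypothetical memory cap: keep only the newest points
        self.points = []          # list of (argument_vector, observed_cost)

    def predict(self, args):
        """Average the costs of the k stored executions closest to args."""
        if not self.points:
            return None
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], args))[: self.k]
        return sum(cost for _, cost in nearest) / len(nearest)

    def observe(self, args, cost):
        """Record an actual execution so later predictions reflect it."""
        self.points.append((tuple(args), cost))
        if len(self.points) > self.capacity:
            self.points.pop(0)  # simplistic eviction: drop the oldest point


if __name__ == "__main__":
    model = KnnCostModel(k=2)
    model.observe((0, 0), 10.0)
    model.observe((1, 0), 20.0)
    model.observe((10, 10), 100.0)
    print(model.predict((0.4, 0)))  # averages the two closest observations
```

Calling `predict` before each UDF execution and `observe` after it gives the online, incremental update loop that the abstract contrasts with static, pre-trained models.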

International Conference on Computational Science and Its Applications | 2010

Service path attribution networks (SPANs): spatially quantifying the flow of ecosystem services from landscapes to people

Gary W. Johnson; Kenneth J. Bagstad; Robert R. Snapp; Ferdinando Villa

Ecosystem services are the effects on human well-being of the flow of benefits from ecosystems to people over given extents of space and time. The Service Path Attribution Network (SPAN) model provides a spatial framework for quantifying these flows, providing a new means of estimating these economic benefits. This approach discovers dependencies between provision and usage endpoints, spatial competition among users for scarce resources, and landscape effects on ecosystem service flows. Particularly novel is the model's ability to identify the relative density of these flows throughout landscapes and to determine which areas are affected by upstream flow depletion. SPAN descriptions have been developed for a number of services (aesthetic viewsheds, proximity to open space, carbon sequestration, flood mitigation, nutrient cycling, and avoided sedimentation/deposition), which vary in scale of effect, mechanism of provision and use, and type of flow. Results using real world data are shown for the US Puget Sound region.

ACM/IEEE Joint Conference on Digital Libraries | 2001

Approximate ad-hoc query engine for simulation data

Ghaleb Abdulla; Chuck Baldwin; Terence Critchlow; Roy Kamimura; Ida Lozares; Ron Musick; Nu Ai Tang; Byung Suk Lee; Robert R. Snapp

In this paper, we describe AQSim, an ongoing effort to design and implement a system to manage terabytes of scientific simulation data. The goal of this project is to reduce data storage requirements and access times while permitting ad-hoc queries using statistical and mathematical models of the data. In order to facilitate data exchange between models based on different representations, we are evaluating using the ASCI common data model that is comprised of several layers of increasing semantic complexity. To support queries over the spatial-temporal mesh structured data we are in the process of defining and implementing a grammar for MeshSQL.

Extending Database Technology | 2004

Self-tuning UDF Cost Modeling Using the Memory-Limited Quadtree

Zhen He; Byung Suk Lee; Robert R. Snapp

Query optimizers in object-relational database management systems require users to provide the execution cost models of user-defined functions (UDFs). Despite this need, however, there has been little work done to provide such a model. Furthermore, none of the existing work is self-tuning and, therefore, cannot adapt to changing UDF execution patterns. This paper addresses this problem by introducing a self-tuning cost modeling approach based on the quadtree. The quadtree has the inherent desirable properties to (1) perform fast retrievals, (2) allow for fast incremental updates (without storing individual data points), and (3) store information at different resolutions. We take advantage of these properties of the quadtree and add the following in order to make the quadtree useful for UDF cost modeling: the abilities to (1) adapt to changing UDF execution patterns and (2) use limited memory. To this end, we have developed a novel technique we call the memory-limited quadtree (MLQ). In MLQ, each instance of UDF execution is mapped to a query point in a multi-dimensional space. Then, a prediction is made at the query point, and the actual value at the point is inserted as a new data point. The quadtree is then used to store summary information of the data points at different resolutions based on the distribution of the data points. This information is used to make predictions, guide the insertion of new data points, and guide the compression of the quadtree when the memory limit is reached. We have conducted extensive performance evaluations comparing MLQ with the existing (static) approach.
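
The predict-then-insert cycle over multi-resolution summaries can be illustrated with a small Python sketch. It is a simplified two-dimensional quadtree, not the authors' MLQ: the names `Node` and `SummaryQuadtree`, the `max_depth` and `min_count` parameters, and the rule of answering from the finest cell holding enough points are all assumptions, and the compression step triggered by the memory limit is omitted.

```python
class Node:
    """One quadtree cell over [x0, x1) x [y0, y1) holding a cost summary."""

    def __init__(self, x0, y0, x1, y1, depth):
        self.bounds = (x0, y0, x1, y1)
        self.depth = depth
        self.count = 0       # number of data points summarized in this cell
        self.total = 0.0     # sum of their costs (no individual points are stored)
        self.children = {}   # quadrant index (0..3) -> Node, created lazily

    def quadrant(self, x, y):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return (1 if x >= mx else 0) + (2 if y >= my else 0)

    def child_bounds(self, q):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return (mx if q & 1 else x0, my if q & 2 else y0,
                x1 if q & 1 else mx, y1 if q & 2 else my)


class SummaryQuadtree:
    def __init__(self, bounds=(0.0, 0.0, 1.0, 1.0), max_depth=4, min_count=2):
        self.root = Node(*bounds, depth=0)
        self.max_depth = max_depth  # hypothetical cap on resolution
        self.min_count = min_count  # points a cell needs before it is trusted

    def insert(self, x, y, cost):
        """Update the summaries of every cell on the path to the point."""
        node = self.root
        while True:
            node.count += 1
            node.total += cost
            if node.depth == self.max_depth:
                return
            q = node.quadrant(x, y)
            if q not in node.children:
                node.children[q] = Node(*node.child_bounds(q), depth=node.depth + 1)
            node = node.children[q]

    def predict(self, x, y):
        """Return the mean cost of the finest cell with at least min_count points."""
        node, best = self.root, None
        while node is not None and node.count >= self.min_count:
            best = node.total / node.count
            node = node.children.get(node.quadrant(x, y))
        return best
```

A caller would invoke `predict` at the query point before running the UDF and `insert` with the measured cost afterwards; regions with dense data are answered from fine cells, while sparse regions fall back to coarser averages.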

Database Systems for Advanced Applications | 2001

Toward a query language on simulation mesh data: an object-oriented approach

Byung Suk Lee; Robert R. Snapp; Ron Musick

As simulation is gaining popularity as an inexpensive means of experimentation in diverse fields of industry and government, the attention to the data generated by scientific simulation is also increasing. Scientific simulation generates mesh data, i.e. data configured in a grid structure, in a sequence of time steps. Its model is complex - understanding it involves mathematical topology and geometry in addition to fields (in the relational sense). Moreover, there is no query language developed on mesh data at all. We develop a comprehensive model of mesh data in an object-oriented manner, propose a set of primitive algebraic operators, show their object-oriented implementation and demonstrate that the well-known object query language OQL (from the ODMG) is powerful enough to express queries on mesh data, whether the queries are on a mesh topology, geometry, fields, or a combination of them. Finally, we discuss some physical implementation issues that are pertinent to executing queries efficiently.

International Conference on Pattern Recognition | 1994

Asymptotic predictions of the finite-sample risk of the k-nearest-neighbor classifier

Robert R. Snapp; Santosh S. Venkatesh

The finite-sample risk of the k-nearest-neighbor classifier is analyzed for a family of two-class problems in which patterns are randomly generated from smooth probability distributions in an n-dimensional Euclidean feature space. First, an exact integral expression for the m-sample risk is obtained for a k-nearest-neighbor classifier that uses a reference sample of m labeled feature vectors. Using a multidimensional application of Laplace's method of integration, this integral can be represented as an asymptotic expansion in negative rational powers of m. The leading terms of this asymptotic expansion elucidate the curse of dimensionality and other properties of the finite-sample risk.
