Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Andreas Buja is active.

Publication


Featured research published by Andreas Buja.


Journal of the American Statistical Association | 1994

Flexible Discriminant Analysis by Optimal Scoring

Trevor Hastie; Robert Tibshirani; Andreas Buja

Abstract Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that are “optimal” for separating the groups. With two such functions, one can produce a classification map that partitions the reduced space into regions that are identified with group membership, and the decision boundaries are linear. This article is about richer nonlinear classification schemes. Linear discriminant analysis is equivalent to multiresponse linear regression using optimal scorings to represent the groups. In this paper, we obtain nonparametric versions of discriminant analysis by replacing linear regression by any nonparametric regression method. In this way, any multiresponse regression technique (such as MARS or neural networks) can be postprocessed to improve its classification performance.
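
A rough illustration of the regression-plus-scoring recipe described in the abstract is sketched below: the classes are encoded as an indicator response matrix, a multiresponse regression is fitted, and a discriminant step is applied to the fitted values. The synthetic data, variable names, and the use of scikit-learn are assumptions for illustration; the discriminant step stands in for the paper's optimal-scoring eigen-decomposition, and swapping the linear regressor for a nonparametric multiresponse regressor (e.g., MARS or a neural network) gives the flexible, nonlinear version.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative synthetic data (not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] ** 2 + 0.5 * rng.normal(size=300) > 1).astype(int)

# Step 1: represent the groups by an indicator (one-hot) response matrix.
classes, codes = np.unique(y, return_inverse=True)
Y = np.eye(len(classes))[codes]

# Step 2: multiresponse regression of the indicators on the predictors.
# Replacing this regressor with a nonparametric one is what makes the
# procedure "flexible".
regressor = LinearRegression()
Yhat = regressor.fit(X, Y).predict(X)

# Step 3: postprocess the fitted values with a discriminant step (a simplified
# stand-in for the optimal-scoring eigen-decomposition).  The fitted indicator
# columns sum to one, so one redundant column is dropped.
post = LinearDiscriminantAnalysis().fit(Yhat[:, :-1], y)
print("training accuracy:", post.score(Yhat[:, :-1], y))
```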


Journal of Computational and Graphical Statistics | 1996

Interactive High-Dimensional Data Visualization

Andreas Buja; Dianne Cook

Abstract We propose a rudimentary taxonomy of interactive data visualization based on a triad of data analytic tasks: finding Gestalt, posing queries, and making comparisons. These tasks are supported by three classes of interactive view manipulations: focusing, linking, and arranging views. This discussion extends earlier work on the principles of focusing and linking and sets them on a firmer base. Next, we give a high-level introduction to a particular system for multivariate data visualization—XGobi. This introduction is not comprehensive but emphasizes XGobi tools that are examples of focusing, linking, and arranging views; namely, high-dimensional projections, linked scatterplot brushing, and matrices of conditional plots. Finally, in a series of case studies in data visualization, we show the powers and limitations of particular focusing, linking, and arranging tools. The discussion is dominated by high-dimensional projections that form an extremely well-developed part of XGobi. Of particular inter...


Multivariate Behavioral Research | 1992

Remarks on Parallel Analysis

Andreas Buja; Nermin Eyuboglu

We investigate parallel analysis (PA), a selection rule for the number-of-factors problem, from the point of view of permutation assessment. The idea of applying permutation test ideas to PA leads to a quasi-inferential, non-parametric version of PA which accounts not only for finite-sample bias but also for sampling variability. We give evidence, however, that quasi-inferential PA based on normal random variates (as opposed to data permutations) is surprisingly independent of distributional assumptions, and therefore enjoys certain non-parametric properties as well. This is a justification for providing tables for quasi-inferential PA. Based on permutation theory, we compare PA of principal components with PA of principal factor analysis and show that PA of principal factors may tend to select too many factors. We also apply parallel analysis to so-called resistant correlations and give evidence that this yields a slightly more conservative factor selection method. Finally, we apply PA to loadings and show how this provides benchmark values for loadings which are sensitive to the number of variables, number of subjects, and order of factors. These values therefore improve on conventional fixed thresholds such as 0.5 or 0.8, which are used irrespective of the size of the data.
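
A minimal sketch of permutation-based parallel analysis along the lines described above is given below. The function name, the number of permutations, and the 95% quantile are illustrative choices rather than the paper's tabled values.

```python
import numpy as np

def parallel_analysis(X, n_perm=200, quantile=0.95, seed=0):
    """Permutation-based parallel analysis: retain components whose observed
    correlation eigenvalues exceed the chosen quantile of eigenvalues obtained
    after independently permuting each column of the data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]   # descending
    perm = np.empty((n_perm, p))
    for b in range(n_perm):
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        perm[b] = np.linalg.eigvalsh(np.corrcoef(Xp, rowvar=False))[::-1]
    thresh = np.quantile(perm, quantile, axis=0)
    return int(np.sum(obs > thresh)), obs, thresh

# Illustrative usage on synthetic data with two underlying factors.
rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 2))
data = scores @ rng.normal(size=(2, 8)) + 0.7 * rng.normal(size=(200, 8))
n_retained, observed, threshold = parallel_analysis(data)
```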


IEEE Visualization | 1991

Interactive data visualization using focusing and linking

Andreas Buja; John Alan McDonald; John Michalak; Werner Stuetzle

Two basic principles for interactive visualization of high-dimensional data, focusing and linking, are discussed. Focusing techniques may involve selecting subsets, dimension reduction, or some more general manipulation of the layout of information on the page or screen. A consequence of focusing is that each view conveys only partial information about the data and needs to be linked so that the information contained in individual views can be integrated into a coherent image of the data as a whole. Examples are given of how graphical data analysis methods based on focusing and linking are used in applications including linguistics, geographic information systems, time series analysis, and the analysis of multi-channel images arising in radiology and remote sensing.
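
The sketch below is a static, non-interactive illustration of the two principles on synthetic data: each scatterplot is a focused view of two variables, and a single shared selection mask provides the linking, so a subset highlighted in one view is highlighted in all views. The data, variable choices, and matplotlib usage are assumptions for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative synthetic data: 200 cases, four variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# Focusing: each view shows only a low-dimensional slice of the data.
# Linking: one shared selection mask drives the highlighting in every view.
selected = X[:, 0] > 1.0        # e.g. a brushed subset defined in the first view

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, (i, j) in zip(axes, [(0, 1), (2, 3)]):
    ax.scatter(X[:, i], X[:, j], c=np.where(selected, "red", "grey"), s=10)
    ax.set_xlabel(f"variable {i}")
    ax.set_ylabel(f"variable {j}")
plt.show()
```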


Journal of Computational and Graphical Statistics | 1998

XGobi: Interactive Dynamic Data Visualization in the X Window System

Deborah F. Swayne; Dianne Cook; Andreas Buja

Abstract XGobi is a data visualization system with state-of-the-art interactive and dynamic methods for the manipulation of views of data. It implements 2-D displays of projections of points and lines in high-dimensional spaces, as well as parallel coordinate displays and textual views thereof. Projection tools include dotplots of single variables, plots of pairs of variables, 3-D data rotations, various grand tours, and interactive projection pursuit. Views of the data can be reshaped. Points can be labeled and brushed with glyphs and colors. Lines can be edited and colored. Several XGobi processes can be run simultaneously and linked for labeling, brushing, and sharing of projections. Missing data are accommodated and their patterns can be examined; multiple imputations can be given to XGobi for rapid visual diagnostics. XGobi includes an extensive online help facility. XGobi can be integrated in other software systems, as has been done for the data analysis language S, the geographic information system...


Computational Statistics & Data Analysis | 2003

GGobi: evolving from XGobi into an extensible framework for interactive data visualization

Deborah F. Swayne; Duncan Temple Lang; Andreas Buja; Dianne Cook

GGobi is a direct descendant of a data visualization system called XGobi that has been around since the early 1990s. GGobi's new features include multiple plotting windows, a color lookup table manager, and an Extensible Markup Language file format for data. Perhaps the biggest advance is that GGobi can be easily extended, either by being embedded in other software or by the addition of plugins; either way, it can be controlled using an Application Programming Interface. An illustration of its extensibility is that it can be embedded in R. The result is a full marriage between GGobi's direct-manipulation graphical environment and R's familiar extensible environment for statistical data analysis.


Journal of Computational and Graphical Statistics | 1995

Grand Tour and Projection Pursuit

Dianne Cook; Andreas Buja; Javier Cabrera; Catherine B. Hurley

Abstract The grand tour and projection pursuit are two methods for exploring multivariate data. We show how to combine them into a dynamic graphical tool for exploratory data analysis, called a projection pursuit guided tour. This tool assists in clustering data when clusters are oddly shaped and in finding general low-dimensional structure in high-dimensional, and in particular, sparse data. An example shows that the method, which is projection-based, can be quite powerful in situations that may cause grief for methods based on kernel smoothing. The projection pursuit guided tour is also useful for comparing and developing projection pursuit indexes and illustrating some types of asymptotic results.
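
A very rough sketch of the underlying idea follows, under stated assumptions: random two-dimensional orthonormal projection frames stand in for the tour path (the smooth geodesic interpolation of an actual tour is omitted), and a simple hole-seeking index stands in for a projection pursuit index. Both the index formula and the random-search loop are illustrative simplifications, not the paper's method.

```python
import numpy as np

def random_frame(p, d=2, rng=None):
    """Random p x d orthonormal projection frame (QR of a Gaussian matrix)."""
    rng = np.random.default_rng() if rng is None else rng
    Q, _ = np.linalg.qr(rng.normal(size=(p, d)))
    return Q

def holes_type_index(Y):
    """Simplified hole-seeking projection pursuit index on a 2-D projection Y:
    larger when the center of the projection is empty (not the exact holes index)."""
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return 1.0 - np.exp(-0.5 * (Y ** 2).sum(axis=1)).mean()

# Crude "guided tour": evaluate many random projection frames and keep the one
# with the best index value (a real guided tour moves smoothly between frames).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[:250, :2] += 3.0                    # plant two clusters in the first two variables
frames = (random_frame(X.shape[1], rng=rng) for _ in range(200))
best = max(frames, key=lambda F: holes_type_index(X @ F))
projection = X @ best                 # 2-D view suggested by the index
```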


Proceedings of the National Academy of Sciences of the United States of America | 2011

Dosage-dependent phenotypes in models of 16p11.2 lesions found in autism

Guy Horev; Jacob Ellegood; Jason P. Lerch; Young-Eun E. Son; Lakshmi Muthuswamy; Hannes Vogel; Abba M. Krieger; Andreas Buja; R. Mark Henkelman; Michael Wigler; Alea A. Mills

Recurrent copy number variations (CNVs) of human 16p11.2 have been associated with a variety of developmental/neurocognitive syndromes. In particular, deletion of 16p11.2 is found in patients with autism, developmental delay, and obesity. Patients with deletions or duplications have a wide range of clinical features, and siblings carrying the same deletion often have diverse symptoms. To study the consequence of 16p11.2 CNVs in a systematic manner, we used chromosome engineering to generate mice harboring deletion of the chromosomal region corresponding to 16p11.2, as well as mice harboring the reciprocal duplication. These 16p11.2 CNV models have dosage-dependent changes in gene expression, viability, brain architecture, and behavior. For each phenotype, the consequence of the deletion is more severe than that of the duplication. Of particular note is that half of the 16p11.2 deletion mice die postnatally; those that survive to adulthood are healthy and fertile, but have alterations in the hypothalamus and exhibit a “behavior trap” phenotype—a specific behavior characteristic of rodents with lateral hypothalamic and nigrostriatal lesions. These findings indicate that 16p11.2 CNVs cause brain and behavioral anomalies, providing insight into human neurodevelopmental disorders.


Journal of Computational and Graphical Statistics | 2008

Data Visualization With Multidimensional Scaling

Andreas Buja; Deborah F. Swayne; Michael L. Littman; Nathaniel Dean; Heike Hofmann; Lisha Chen

We discuss methodology for multidimensional scaling (MDS) and its implementation in two software systems, GGvis and XGvis. MDS is a visualization technique for proximity data, that is, data in the form of N × N dissimilarity matrices. MDS constructs maps (“configurations,” “embeddings”) in ℝ^k by interpreting the dissimilarities as distances. Two frequent sources of dissimilarities are high-dimensional data and graphs. When the dissimilarities are distances between high-dimensional objects, MDS acts as an (often nonlinear) dimension-reduction technique. When the dissimilarities are shortest-path distances in a graph, MDS acts as a graph layout technique. MDS has received recent attention in machine learning, motivated by image databases (“Isomap”). MDS is also of interest in view of the popularity of “kernelizing” approaches inspired by Support Vector Machines (SVMs; “kernel PCA”). This article discusses the following general topics: (1) the stability and multiplicity of MDS solutions; (2) the analysis of structure within and between subsets of objects with missing value schemes in dissimilarity matrices; (3) gradient descent for optimizing general MDS loss functions (“Strain” and “Stress”); (4) a unification of classical (Strain-based) and distance (Stress-based) MDS. Particular topics include the following: (1) blending of automatic optimization with interactive displacement of configuration points to assist in the search for global optima; (2) forming groups of objects with interactive brushing to create patterned missing values in MDS loss functions; (3) optimizing MDS loss functions for large numbers of objects relative to a small set of anchor points (“external unfolding”); and (4) a non-metric version of classical MDS. We show applications to the mapping of computer usage data, to the dimension reduction of marketing segmentation data, to the layout of mathematical graphs and social networks, and finally to the spatial reconstruction of molecules.
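
As a concrete, heavily simplified illustration of Stress-based MDS via gradient descent (general topic 3 above), the sketch below minimizes the raw Stress of a configuration for a precomputed dissimilarity matrix. The step size, iteration count, and synthetic data are assumptions, and none of the interactive machinery of GGvis/XGvis is represented; a line search or SMACOF-style updates would be more robust than a fixed step.

```python
import numpy as np

def mds_stress_descent(D, dim=2, steps=1000, lr=0.005, seed=0):
    """Minimal metric MDS: plain gradient descent on the raw Stress
    sum_{i<j} (D_ij - ||x_i - x_j||)^2 over a configuration X."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.normal(scale=0.1, size=(n, dim))
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]     # pairwise differences x_i - x_j
        dist = np.linalg.norm(diff, axis=-1)     # distances in the configuration
        np.fill_diagonal(dist, 1.0)              # avoid division by zero on the diagonal
        resid = dist - D
        np.fill_diagonal(resid, 0.0)
        grad = 2.0 * (resid / dist)[:, :, None] * diff
        X -= lr * grad.sum(axis=1)               # gradient of Stress w.r.t. each x_i
    return X

# Illustrative use: dissimilarities taken as distances between synthetic
# high-dimensional points, so MDS acts as a dimension-reduction technique.
pts = np.random.default_rng(1).normal(size=(50, 5))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
embedding = mds_stress_descent(D)
```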


Journal of the American Statistical Association | 2009

Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Drawing, and Proximity Analysis

Lisha Chen; Andreas Buja

In the past decade there has been a resurgence of interest in nonlinear dimension reduction. Among new proposals are “Local Linear Embedding,” “Isomap,” and Kernel Principal Components Analysis which all construct global low-dimensional embeddings from local affine or metric information. We introduce a competing method called “Local Multidimensional Scaling” (LMDS). Like LLE, Isomap, and KPCA, LMDS constructs its global embedding from local information, but it uses instead a combination of MDS and “force-directed” graph drawing. We apply the force paradigm to create localized versions of MDS stress functions with a tuning parameter to adjust the strength of nonlocal repulsive forces. We solve the problem of tuning parameter selection with a meta-criterion that measures how well the sets of K-nearest neighbors agree between the data and the embedding. Tuned LMDS seems to be able to outperform MDS, PCA, LLE, Isomap, and KPCA, as illustrated with two well-known image datasets. The meta-criterion can also be used in a pointwise version as a diagnostic tool for measuring the local adequacy of embeddings and thereby detect local problems in dimension reductions.
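
The tuning-parameter meta-criterion lends itself to a short sketch: measure, for each point, how many of its K nearest neighbors in the original data are also among its K nearest neighbors in the embedding. The function below is a simplified reading of that idea (the paper's exact definition and normalization may differ), and the scikit-learn dependency is an assumption.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_agreement(X_high, X_embed, k=10):
    """Per-point K-nearest-neighbor agreement between the original data and an
    embedding: the fraction of each point's k nearest neighbors that are shared."""
    nn_high = NearestNeighbors(n_neighbors=k + 1).fit(X_high)
    nn_emb = NearestNeighbors(n_neighbors=k + 1).fit(X_embed)
    idx_high = nn_high.kneighbors(X_high, return_distance=False)[:, 1:]  # drop self
    idx_emb = nn_emb.kneighbors(X_embed, return_distance=False)[:, 1:]
    overlap = [len(set(a) & set(b)) / k for a, b in zip(idx_high, idx_emb)]
    return np.asarray(overlap)   # pointwise diagnostic; the mean gives a global score
```

Averaging the pointwise values gives a single score that can be compared across settings of the repulsion tuning parameter, which is the role the abstract describes for the meta-criterion; the pointwise values themselves serve as the local diagnostic.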

Collaboration


Dive into Andreas Buja's collaborations.

Top Co-Authors

Lawrence D. Brown
University of Pennsylvania

Linda H. Zhao
University of Pennsylvania

Richard A. Berk
University of Pennsylvania

Edward I. George
University of Pennsylvania

Emil Pitkin
University of Pennsylvania

Abba M. Krieger
University of Pennsylvania