Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Justin Talbot is active.

Publication


Featured research published by Justin Talbot.


Proceedings of the Second International Workshop on MapReduce and Its Applications | 2011

Phoenix++: modular MapReduce for shared-memory systems

Justin Talbot; Richard M. Yoo; Christos Kozyrakis

This paper describes our rewrite of Phoenix, a MapReduce framework for shared-memory CMPs and SMPs. Despite successfully demonstrating the applicability of a MapReduce-style pipeline to shared-memory machines, Phoenix has a number of limitations: its uniform intermediate storage of key-value pairs, inefficient combiner implementation, and poor task overhead amortization fail to efficiently support a wide range of MapReduce applications, encouraging users to manually circumvent the framework. We describe an alternative implementation, Phoenix++, that provides a modular, flexible pipeline that can be easily adapted by the user to the characteristics of a particular workload. Compared to Phoenix, this new approach achieves a 4.7-fold performance improvement and increased scalability, while allowing users to write simple, strict MapReduce code.
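
As a concrete illustration of the per-worker combiner idea described above, here is a minimal Python sketch of a shared-memory map/combine/reduce pipeline. It shows the concept only; it is not the Phoenix++ C++ API, whose types and interfaces differ.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_combine(chunk):
    """Map phase with a local hash-based combiner: each worker aggregates
    its own key-value pairs instead of emitting them to uniform storage."""
    local = Counter()
    for line in chunk:
        for word in line.split():
            local[word] += 1
    return local

def reduce_all(partials):
    """Reduce phase: merge the per-worker combiner outputs."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total

lines = ["the quick brown fox", "the lazy dog", "the fox"]
chunks = [lines[i::2] for i in range(2)]    # split input across 2 workers
with ThreadPoolExecutor(max_workers=2) as pool:
    counts = reduce_all(pool.map(map_combine, chunks))
print(counts.most_common(3))
```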


Human Factors in Computing Systems | 2009

EnsembleMatrix: interactive visualization to support machine learning with multiple classifiers

Justin Talbot; Bongshin Lee; Ashish Kapoor; Desney S. Tan

Machine learning is an increasingly used computational tool within human-computer interaction research. While most researchers currently utilize an iterative approach to refining classifier models and performance, we propose that ensemble classification techniques may be a viable and even preferable alternative. In ensemble learning, algorithms combine multiple classifiers to build one that is superior to its components. In this paper, we present EnsembleMatrix, an interactive visualization system that presents a graphical view of confusion matrices to help users understand relative merits of various classifiers. EnsembleMatrix allows users to directly interact with the visualizations in order to explore and build combination models. We evaluate the efficacy of the system and the approach in a user study. Results show that users are able to quickly combine multiple classifiers operating on multiple feature sets to produce an ensemble classifier with accuracy that approaches best-reported performance classifying images in the Caltech-101 dataset.
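
To make the combination step concrete, the following hypothetical sketch linearly combines the class-probability outputs of two classifiers under user-chosen weights and inspects the result. scikit-learn and the digits dataset stand in for the paper's classifiers and Caltech-101, and `ensemble` is an invented helper.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=2000), GaussianNB()]
probas = [m.fit(Xtr, ytr).predict_proba(Xte) for m in models]

def ensemble(weights):
    """Weighted linear combination of per-classifier class probabilities."""
    return sum(w * p for w, p in zip(weights, probas)).argmax(axis=1)

for w in [(1.0, 0.0), (0.0, 1.0), (0.7, 0.3)]:   # weights a user might try
    pred = ensemble(w)
    cm = confusion_matrix(yte, pred)             # what the tool visualizes
    print(w, "accuracy:", round(accuracy_score(yte, pred), 3),
          "correct:", cm.trace())
```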


IEEE Transactions on Visualization and Computer Graphics | 2007

Visualization of Heterogeneous Data

Mike Cammarano; Xin Luna Dong; Bryan Chan; Jeff Klingner; Justin Talbot; Alon Y. Halevy; Pat Hanrahan

Both the resource description framework (RDF), used in the semantic web, and Maya Viz u-forms represent data as a graph of objects connected by labeled edges. Existing systems for flexible visualization of this kind of data require manual specification of the possible visualization roles for each data attribute. When the schema is large and unfamiliar, this requirement inhibits exploratory visualization by requiring a costly up-front data integration step. To eliminate this step, we propose an automatic technique for mapping data attributes to visualization attributes. We formulate this as a schema matching problem, finding appropriate paths in the data model for each required visualization attribute in a visualization template.
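
The path-finding step can be illustrated with a toy sketch: given a small data graph, breadth-first search returns the shortest edge-label path supplying each attribute a map template needs. The graph, node names, and attribute names below are invented for illustration.

```python
from collections import deque

# A tiny invented data model: node -> [(edge label, target node)]
GRAPH = {
    "City":    [("name", "Literal"), ("locatedIn", "Country"), ("coord", "Point")],
    "Point":   [("lat", "Literal"), ("long", "Literal")],
    "Country": [("name", "Literal")],
}

def find_path(start, wanted, max_depth=3):
    """Breadth-first search for the shortest edge-label path ending in `wanted`."""
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) >= max_depth:
            continue
        for label, target in GRAPH.get(node, []):
            if label == wanted:
                return path + [label]
            queue.append((target, path + [label]))
    return None

# A map template needs latitude, longitude, and a text label for each object.
for attr in ["lat", "long", "name"]:
    print(attr, "->", find_path("City", attr))
```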


IEEE Transactions on Visualization and Computer Graphics | 2012

An Empirical Model of Slope Ratio Comparisons

Justin Talbot; John Gerth; Pat Hanrahan

Comparing slopes is a fundamental graph reading task, and the aspect ratio chosen for a plot influences how easy these comparisons are to make. According to Banking to 45°, a classic design guideline first proposed and studied by Cleveland et al., aspect ratios that center slopes around 45° minimize errors in visual judgments of slope ratios. This paper revisits this earlier work. Through exploratory pilot studies that expand Cleveland et al.'s experimental design, we develop an empirical model of slope ratio estimation that fits more extreme slope ratio judgments and two common slope ratio estimation strategies. We then run two experiments to validate our model. In the first, we show that our model fits more generally than the one proposed by Cleveland et al., and we find that, in general, slope ratio errors are not minimized around 45°. In the second experiment, we explore a novel hypothesis raised by our model: that visible baselines can substantially mitigate errors made in slope judgments. We conclude with an application of our model to aspect ratio selection.
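
For context, the classic banking heuristic this paper revisits can be sketched in a few lines: choose the aspect ratio that makes the median absolute segment slope display at 45°. This is the Cleveland-style baseline, not the paper's empirical slope-ratio model.

```python
import numpy as np

def median_slope_banking(x, y):
    """Return the height/width ratio that banks the median slope to 45°."""
    dx = np.diff(x) / (x.max() - x.min())   # segment slopes measured in
    dy = np.diff(y) / (y.max() - y.min())   # normalized data coordinates
    slopes = np.abs(dy / dx)
    slopes = slopes[np.isfinite(slopes) & (slopes > 0)]
    # Scaling plot height by 1/median makes the median displayed slope 1.
    return 1.0 / np.median(slopes)

x = np.linspace(0, 4 * np.pi, 200)
print("aspect ratio (h/w):", round(median_slope_banking(x, np.sin(x)), 3))
```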


International Conference on Parallel Architectures and Compilation Techniques | 2012

Riposte: a trace-driven compiler and parallel VM for vector code in R

Justin Talbot; Zachary DeVito; Pat Hanrahan

There is a growing utilization gap between modern hardware and modern programming languages for data analysis. Due to power and other constraints, recent processor design has sought improved performance through increased SIMD and multi-core parallelism. At the same time, high-level, dynamically typed languages for data analysis have become popular. These languages emphasize ease of use and high productivity, but have, in general, low performance and limited support for exploiting hardware parallelism. In this paper, we describe Riposte, a new runtime for the R language, which bridges this gap. Riposte uses tracing, a technique commonly used to accelerate scalar code, to dynamically discover and extract sequences of vector operations from arbitrary R code. Once extracted, we can fuse traces to eliminate unnecessary memory traffic, compile them to use hardware SIMD units, and schedule them to run across multiple cores, allowing us to fully utilize the available parallelism on modern shared-memory machines. Our evaluation shows that Riposte can run vector R code near the speed of hand-optimized C, 5–50× faster than the open source implementation of R, and can also linearly scale to 32 cores for some tasks. Across 12 different workloads we achieve an overall average speed-up of over 150× without explicit programmer parallelization.
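
The deferred-evaluation idea can be sketched in miniature: vector operations record a trace instead of executing, and evaluation runs a single fused pass with no intermediate temporaries. The toy `Lazy` class below is illustrative Python, not Riposte's R virtual machine.

```python
import numpy as np

class Lazy:
    """A vector whose operations are recorded, not executed."""
    def __init__(self, fn):
        self.fn = fn                     # element-wise kernel: index -> value
    def __add__(self, other):
        return Lazy(lambda i: self.fn(i) + other.fn(i))
    def __mul__(self, other):
        return Lazy(lambda i: self.fn(i) * other.fn(i))
    def evaluate(self, n):
        # One fused pass over the data: no temporaries for a+b, (a+b)*c, ...
        return np.fromiter((self.fn(i) for i in range(n)), float, count=n)

def vec(data):
    d = np.asarray(data, dtype=float)
    return Lazy(lambda i: d[i])

a, b, c = vec([1, 2, 3]), vec([4, 5, 6]), vec([7, 8, 9])
expr = (a + b) * c          # builds a trace; nothing is computed yet
print(expr.evaluate(3))     # single fused pass -> [35. 56. 81.]
```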


IEEE Transactions on Visualization and Computer Graphics | 2010

An Extension of Wilkinson’s Algorithm for Positioning Tick Labels on Axes

Justin Talbot; Sharon Lin; Pat Hanrahan

The non-data components of a visualization, such as axes and legends, can often be just as important as the data itself. They provide contextual information essential to interpreting the data. In this paper, we describe an automated system for choosing positions and labels for axis tick marks. Our system extends Wilkinson's optimization-based labeling approach to create a more robust, full-featured axis labeler. We define an expanded space of axis labelings by automatically generating additional nice numbers as needed and by permitting the extreme labels to occur inside the data range. These changes provide flexibility in problematic cases, without degrading quality elsewhere. We also propose an additional optimization criterion, legibility, which allows us to simultaneously optimize over label formatting, font size, and orientation. To solve this revised optimization problem, we describe the optimization function and an efficient search algorithm. Finally, we compare our method to previous work using both quantitative and qualitative metrics. This paper is a good example of how ideas from automated graphic design can be applied to information visualization.
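
A greatly simplified sketch of the search-and-score idea follows: enumerate nice step sizes, generate candidate tick sequences, and score each on simplicity, coverage, and density. The scoring terms and weights below are invented simplifications; the actual algorithm also optimizes legibility and label formatting.

```python
import math

NICE = [1, 5, 2, 2.5, 4, 3]   # candidate step multipliers, most preferred first

def candidate_labelings(lo, hi, target_ticks=5):
    span = hi - lo
    for rank, q in enumerate(NICE):
        for power in range(-3, 4):
            step = q * 10 ** power
            if span / step > 12:          # would need too many ticks
                continue
            t = math.floor(lo / step) * step
            ticks = [round(t, 10)]
            while ticks[-1] < hi - 1e-9:  # extend until the data is covered
                t += step
                ticks.append(round(t, 10))
            if len(ticks) < 2:
                continue
            simplicity = 1 - rank / len(NICE)          # prefer nicer steps
            coverage = span / (ticks[-1] - ticks[0])   # prefer tight labels
            density = 1 - abs(len(ticks) - target_ticks) / target_ticks
            yield simplicity + coverage + density, ticks

def label_axis(lo, hi):
    return max(candidate_labelings(lo, hi))[1]

print(label_axis(0.0, 73.0))   # -> [0, 20, 40, 60, 80]
```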


IEEE Transactions on Visualization and Computer Graphics | 2011

Arc Length-Based Aspect Ratio Selection

Justin Talbot; John Gerth; Pat Hanrahan

The aspect ratio of a plot has a dramatic impact on our ability to perceive trends and patterns in the data. Previous approaches for automatically selecting the aspect ratio have been based on adjusting the orientations or angles of the line segments in the plot. In contrast, we recommend a simple, effective method for selecting the aspect ratio: minimize the arc length of the data curve while keeping the area of the plot constant. The approach is parameterization invariant, robust to a wide range of inputs, preserves visual symmetries in the data, and is a compromise between previously proposed techniques. Further, we demonstrate that it can be effectively used to select the aspect ratio of contour plots. We believe arc length should become the default aspect ratio selection method.
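
The method is simple enough to sketch directly: normalize the curve, then scan aspect ratios at constant plot area for the one minimizing total arc length. A minimal Python version, with an invented grid of candidate ratios:

```python
import numpy as np

def arc_length_aspect(x, y, ratios=np.logspace(-2, 2, 400)):
    """Pick the height/width ratio minimizing the curve's drawn arc length."""
    xs = (x - x.min()) / (x.max() - x.min())   # normalize data to the
    ys = (y - y.min()) / (y.max() - y.min())   # unit square
    dx, dy = np.diff(xs), np.diff(ys)
    best_r, best_len = None, np.inf
    for r in ratios:
        w, h = 1 / np.sqrt(r), np.sqrt(r)      # hold plot area w*h fixed at 1
        length = np.sum(np.sqrt((dx * w) ** 2 + (dy * h) ** 2))
        if length < best_len:
            best_r, best_len = r, length
    return best_r

x = np.linspace(0, 4 * np.pi, 500)
print("aspect ratio (h/w):", round(arc_length_aspect(x, np.sin(x)), 3))
```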


International Conference on Management of Data | 2009

Vispedia: on-demand data integration for interactive visualization and exploration

Bryan Chan; Justin Talbot; Leslie Wu; Nathan Sakunkoo; Mike Cammarano; Pat Hanrahan

Wikipedia is an example of the large, collaborative, semi-structured data sets emerging on the Web. Typically, before these data sets can be used, they must be transformed into structured tables via data integration. We present Vispedia, a Web-based visualization system which incorporates data integration into an iterative, interactive data exploration and analysis process. This reduces the upfront cost of using heterogeneous data sets like Wikipedia. Vispedia is driven by a keyword-query-based integration interface implemented using a fast graph search. The search occurs interactively over DBpedia's semantic graph of Wikipedia, without depending on the existence of a structured ontology. This combination of data integration and visualization enables a broad class of non-expert users to more effectively use the semi-structured data available on the Web.
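
The keyword-query integration step can be illustrated with a toy best-first search that finds the path whose edge labels cover the most query terms. The miniature graph and scoring below are invented, whereas the real system searches DBpedia's semantic graph of Wikipedia.

```python
import heapq

# Invented miniature semantic graph: node -> [(edge label, target node)]
GRAPH = {
    "Country": [("capital city", "City"), ("population total", "Literal")],
    "City":    [("population total", "Literal"), ("mayor name", "Person")],
}

def keyword_search(start, query, max_hops=3):
    """Best-first search for the path whose edge labels cover the query."""
    terms = set(query.lower().split())
    best = (0.0, None)
    heap = [(0.0, start, [], frozenset())]
    while heap:
        _, node, path, covered = heapq.heappop(heap)
        if len(path) >= max_hops:
            continue
        for label, target in GRAPH.get(node, []):
            now = covered | (terms & set(label.split()))
            score = len(now) / len(terms)
            if score > best[0]:
                best = (score, path + [label])
            heapq.heappush(heap, (-score, target, path + [label], now))
    return best

# Which path through the graph answers "capital population"?
print(keyword_search("Country", "capital population"))
```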


IEEE Transactions on Visualization and Computer Graphics | 2016

Automatic Selection of Partitioning Variables for Small Multiple Displays

Anushka Anand; Justin Talbot

Effective small multiple displays are created by partitioning a visualization on variables that reveal interesting conditional structure in the data. We propose a method that automatically ranks partitioning variables, allowing analysts to focus on the most promising small multiple displays. Our approach is based on a randomized, non-parametric permutation test, which allows us to handle a wide range of quality measures for visual patterns defined on many different visualization types, while discounting spurious patterns. We demonstrate the effectiveness of our approach on scatterplots of real-world, multidimensional datasets.
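
A minimal sketch of the ranking idea: a partitioning variable scores highly when its true partition yields a stronger pattern measure than random permutations of that variable, which discounts spurious patterns. The quality measure below (mean within-panel correlation) is just one example measure, and the synthetic data is invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def panel_quality(x, y, groups):
    """Example pattern measure: mean within-panel |Pearson correlation|."""
    scores = [abs(np.corrcoef(x[groups == g], y[groups == g])[0, 1])
              for g in np.unique(groups) if (groups == g).sum() > 2]
    return np.mean(scores)

def rank_score(x, y, groups, n_perm=200):
    """Fraction of random permutations the true partition beats."""
    observed = panel_quality(x, y, groups)
    return sum(panel_quality(x, y, rng.permutation(groups)) < observed
               for _ in range(n_perm)) / n_perm

# Synthetic data: the slope of y vs. x flips with the informative variable.
n = 400
x = rng.normal(size=n)
g = rng.integers(0, 2, size=n)          # informative partitioning variable
y = np.where(g == 1, x, -x) + rng.normal(scale=2.0, size=n)
noise = rng.integers(0, 2, size=n)      # uninformative partitioning variable
print("informative:", rank_score(x, y, g))       # close to 1.0
print("noise:      ", rank_score(x, y, noise))   # near chance
```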


Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming | 2014

Just-in-time Length Specialization of Dynamic Vector Code

Justin Talbot; Zachary DeVito; Pat Hanrahan

Dynamically typed vector languages are popular in data analytics and statistical computing. In these languages, vectors have both dynamic type and dynamic length, making static generation of efficient machine code difficult. In this paper, we describe a trace-based just-in-time compilation strategy that performs partial length specialization of dynamically typed vector code. This selective specialization is designed to avoid excessive compilation overhead while still enabling the generation of efficient machine code through length-based optimizations such as vector fusion, vector copy elimination, and the use of hardware SIMD units. We have implemented our approach in a virtual machine for a subset of R, a vector-based statistical computing language. Across a variety of workloads containing both scalar and vector code, we achieve performance near that of autovectorized C over a large range of vector sizes.
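
The selective-specialization idea can be sketched as compiling one kernel per length class rather than per exact length, balancing compilation cost against length-based optimization. The classes, thresholds, and kernels below are invented for illustration and are not the VM's actual policy.

```python
import numpy as np

_cache = {}

def length_class(n):
    """Invented length classes; the real VM's policy differs."""
    if n == 1:
        return "scalar"   # fold into scalar code, no loop needed
    if n <= 64:
        return "short"    # small fixed trip count, cheap to unroll
    return "long"         # strided loop, candidate for SIMD and threads

def compile_add(cls):
    """Stand-in for JIT code generation: one kernel per length class."""
    print("compiling add kernel for class:", cls)
    if cls == "scalar":
        return lambda a, b: a[0] + b[0]
    return lambda a, b: np.add(a, b)    # fused, vectorized path

def vec_add(a, b):
    cls = length_class(len(a))
    if cls not in _cache:               # compile once per class, then
        _cache[cls] = compile_add(cls)  # reuse for any length in the class
    return _cache[cls](a, b)

print(vec_add(np.array([1.0]), np.array([2.0])))          # scalar kernel
print(vec_add(np.arange(4.0), np.arange(4.0)))            # short kernel
print(vec_add(np.arange(1000.0), np.arange(1000.0))[:3])  # long kernel
```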

Collaboration


Dive into Justin Talbot's collaborations.

Top Co-Authors

Anushka Anand

University of Illinois at Chicago
