Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Carlos Scheidegger is active.

Publication


Featured research published by Carlos Scheidegger.


Knowledge Discovery and Data Mining | 2015

Certifying and Removing Disparate Impact

Michael Feldman; Sorelle A. Friedler; John Moeller; Carlos Scheidegger; Suresh Venkatasubramanian

What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender) and an explicit description of the process. When computers are involved, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the process, we propose making inferences based on the data it uses. We present four contributions. First, we link disparate impact to a measure of classification accuracy that, while known, has received relatively little attention. Second, we propose a test for disparate impact based on how well the protected class can be predicted from the other attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
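The prediction-based test in the paper is more involved, but the disparate impact ratio it builds on is easy to state in code. Below is a minimal sketch, assuming a pandas DataFrame with hypothetical binary columns "protected" and "hired"; it illustrates the legal "80% rule" the paper starts from, not the authors' released implementation.

```python
import pandas as pd

def disparate_impact(df: pd.DataFrame, protected: str, outcome: str) -> float:
    """Ratio of positive-outcome rates: protected group vs. the rest.

    Values below 0.8 correspond to the EEOC "80% rule" threshold
    commonly used to flag disparate impact.
    """
    rate_protected = df.loc[df[protected] == 1, outcome].mean()
    rate_other = df.loc[df[protected] == 0, outcome].mean()
    return rate_protected / rate_other

# Toy data: hypothetical column names, 0/1 coded.
df = pd.DataFrame({"protected": [1, 1, 1, 0, 0, 0],
                   "hired":     [0, 0, 1, 1, 1, 0]})
print(disparate_impact(df, "protected", "hired"))  # 0.5 -> would flag disparate impact
```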


IEEE Transactions on Visualization and Computer Graphics | 2013

Nanocubes for Real-Time Exploration of Spatiotemporal Datasets

Lauro Didier Lins; James T. Klosowski; Carlos Scheidegger

Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
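To make the data cube idea concrete, here is a toy Python sketch that precomputes counts for every combination of binned location, hour, and category, with None standing in for "aggregate over all". The actual nanocube achieves its compactness by sharing subtrees across dimensions, which this flat dictionary deliberately ignores; the bin sizes and record layout are hypothetical.

```python
from collections import Counter
from itertools import product

def build_cube(records, lat_bins=4, lon_bins=4):
    """Precompute counts for every (spatial, temporal, categorical) combination."""
    cube = Counter()
    for lat, lon, hour, category in records:
        spatial = (int(lat * lat_bins), int(lon * lon_bins))
        # None acts as a wildcard: "aggregated over all values of this dimension".
        for s, t, c in product((spatial, None), (hour, None), (category, None)):
            cube[(s, t, c)] += 1
    return cube

records = [(0.1, 0.2, 9, "theft"), (0.1, 0.2, 9, "theft"), (0.8, 0.9, 17, "fraud")]
cube = build_cube(records)
print(cube[((0, 0), 9, "theft")])   # one fully specified cell -> 2
print(cube[(None, None, "theft")])  # same attribute, all space and time -> 2
```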


IEEE Transactions on Visualization and Computer Graphics | 2014

An Algebraic Process for Visualization Design

Gordon L. Kindlmann; Carlos Scheidegger

We present a model of visualization design based on algebraic considerations of the visualization process. The model helps characterize visual encodings, guide their design, evaluate their effectiveness, and highlight their shortcomings. The model has three components: the underlying mathematical structure of the data or object being visualized, the concrete representation of the data in a computer, and (to the extent possible) a mathematical description of how humans perceive the visualization. Because we believe the value of our model lies in its practical application, we propose three general principles for good visualization design. We work through a collection of examples where our model helps explain the known properties of existing visualization methods, both good and not-so-good, and suggests some novel methods. We describe how to use the model alongside experimental user studies, since it can help frame experiment outcomes in an actionable manner. Exploring the implications and applications of our model and its design principles should provide many directions for future visualization research.
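The core of the model can be summarized as a commutativity requirement: a well-understood change in the data should correspond to a well-understood change in the picture. The LaTeX below is a schematic paraphrase of that relation, with v standing for the full pipeline from data to image, alpha for a transformation of the data, and omega for the corresponding transformation of the visualization; the notation is illustrative, not lifted from the paper.

```latex
% Commutativity: transforming the data then visualizing should equal
% visualizing then transforming the image (notation is a paraphrase).
\[
  \begin{array}{ccc}
    D & \xrightarrow{\;\alpha\;} & D \\
    \downarrow v & & \downarrow v \\
    V & \xrightarrow{\;\omega\;} & V
  \end{array}
  \qquad\text{i.e.}\qquad
  v(\alpha(d)) = \omega(v(d)).
\]
```

When no meaningful omega exists for a meaningful alpha, the encoding hides real structure; when a visible omega is produced by a meaningless alpha, the encoding manufactures structure that is not in the data.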


IEEE Transactions on Visualization and Computer Graphics | 2017

Hashedcubes: Simple, Low Memory, Real-Time Visual Exploration of Big Data

Cícero A.L. Pahins; Sean A. Stephens; Carlos Scheidegger; João Luiz Dihl Comba

We propose Hashedcubes, a data structure that enables real-time visual exploration of large datasets and improves on the state of the art through its low memory requirements, low query latencies, and implementation simplicity. In some instances, Hashedcubes notably requires two orders of magnitude less space than recent data cube visualization proposals. In this paper, we describe the algorithms to build and query Hashedcubes, and how it can drive well-known interactive visualizations such as binned scatterplots, linked histograms and heatmaps. We report memory usage, build time and query latencies for a variety of synthetic and real-world datasets, and find that although Hashedcubes sometimes offers slightly slower query times than the state of the art, the typical query is answered fast enough to easily sustain interaction. In datasets with hundreds of millions of elements, only about 2% of the queries take longer than 40ms. Finally, we discuss the limitations of the data structure, potential space-time tradeoffs, and future research directions.
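The central trick is that after sorting, every queryable subset occupies a contiguous range of a single array, so a count is just a pair of indices. Below is a minimal one-dimensional Python sketch of this sorted-array-plus-pivots idea; the real structure nests pivots across several dimensions, and the names here are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_pivots(records, key):
    """Sort records by key; map each distinct key to its [begin, end) range."""
    records.sort(key=key)
    keys = [key(r) for r in records]
    pivots = {k: (bisect_left(keys, k), bisect_right(keys, k)) for k in set(keys)}
    return records, pivots

records = [{"cat": "a"}, {"cat": "b"}, {"cat": "a"}, {"cat": "c"}]
records, pivots = build_pivots(records, key=lambda r: r["cat"])

begin, end = pivots["a"]
print(end - begin)  # count of category "a" via one lookup, no scan -> 2
```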


IEEE Transactions on Visualization and Computer Graphics | 2017

Gaussian Cubes: Real-Time Modeling for Visual Exploration of Large Multidimensional Datasets

Zhe Wang; Nivan Ferreira; Youhao Wei; Aarthy Sankari Bhaskar; Carlos Scheidegger

Recently proposed techniques have finally made it possible for analysts to interactively explore very large datasets in real time. However powerful, the class of analyses these systems enable is somewhat limited: specifically, one can only quickly obtain plots such as histograms and heatmaps. In this paper, we contribute Gaussian Cubes, which significantly improves on state-of-the-art systems by providing interactive modeling capabilities, which include but are not limited to linear least squares and principal components analysis (PCA). The fundamental insight in Gaussian Cubes is that instead of precomputing counts of many data subsets (as state-of-the-art systems do), Gaussian Cubes precomputes the best multivariate Gaussian for the respective data subsets. As an example, Gaussian Cubes can fit hundreds of models over millions of data points in well under a second, enabling novel types of visual exploration of such large datasets. We present three case studies that highlight the visualization and analysis capabilities in Gaussian Cubes, using earthquake safety simulations, astronomical catalogs, and transportation statistics. The datasets contain on the order of one hundred million elements each, with 5 to 10 dimensions. We present extensive performance results, a discussion of the limitations in Gaussian Cubes, and future research directions.
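The key observation is that a multivariate Gaussian is determined by sufficient statistics (count, per-dimension sums, sum of outer products) that merge by simple addition, so a model over any union of precomputed subsets can be assembled without revisiting the raw points. The NumPy sketch below illustrates that mergeability; it is a minimal sketch of the idea, not the authors' implementation.

```python
import numpy as np

class GaussianSummary:
    """Sufficient statistics of a multivariate Gaussian over a data subset."""

    def __init__(self, X):             # X: (n, d) array of raw points
        self.n = len(X)
        self.s = X.sum(axis=0)          # per-dimension sums
        self.S = X.T @ X                # sum of outer products

    def merge(self, other):
        """Summary of the union of two disjoint subsets: plain addition."""
        out = object.__new__(GaussianSummary)
        out.n, out.s, out.S = self.n + other.n, self.s + other.s, self.S + other.S
        return out

    def mean(self):
        return self.s / self.n

    def cov(self):
        m = self.mean()
        return self.S / self.n - np.outer(m, m)

a = GaussianSummary(np.random.randn(500, 3))
b = GaussianSummary(np.random.randn(500, 3))
print(a.merge(b).cov().shape)  # covariance of the union from summaries alone -> (3, 3)
```

From the merged covariance, least-squares coefficients and PCA components follow by standard linear algebra, which is what makes interactive model fitting over arbitrary selections feasible.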


Eurographics | 2015

Map-based Visualizations Increase Recall Accuracy of Data

Bahador Saket; Carlos Scheidegger; Stephen G. Kobourov; Katy Börner

We investigate the memorability of data represented in two different visualization designs. In contrast to recent studies that examine which types of visual information make visualizations memorable, we examine the effect of different visualizations on the time and accuracy of recall of the displayed data, minutes and days after interaction with the visualizations. In particular, we describe the results of an evaluation comparing the memorability of two different visualizations of the same relational data: node-link diagrams and map-based visualizations. We find significant differences in the accuracy of the tasks performed, and these differences persist days after the original exposure to the visualizations. Specifically, participants in the study recalled the data better when exposed to map-based visualizations as opposed to node-link diagrams. We discuss the scope of the study and its limitations, possible implications, and future directions.


IEEE VGTC Conference on Visualization | 2016

Comparing node-link and node-link-group visualizations from an enjoyment perspective

Bahador Saket; Carlos Scheidegger; Stephen G. Kobourov

While evaluation studies in visualization often involve traditional performance measurements, there has been a concerted effort to move beyond time and accuracy. Of these alternative aspects, memorability and recall of visualizations have been recently considered, but other aspects such as enjoyment and engagement are not as well explored. We study the enjoyment of two different visualization methods through a user study. In particular, we describe the results of a three‐phase experiment comparing the enjoyment of two different visualizations of the same relational data: node‐link and node‐link‐group visualizations. The results indicate that the participants in this study found node‐link‐group visualizations more enjoyable than node‐link visualizations.


IEEE Transactions on Visualization and Computer Graphics | 2016

A Simple Approach for Boundary Improvement of Euler Diagrams

Paolo Simonetto; Daniel W. Archambault; Carlos Scheidegger

General methods for drawing Euler diagrams tend to generate irregular polygons. Yet, empirical evidence indicates that smoother contours make these diagrams easier to read. In this paper, we present a simple method to smooth the boundaries of any Euler diagram drawing. When refining the diagram, the method must ensure that set elements remain inside their appropriate boundaries and that no region is removed or created in the diagram. Our approach uses a force system that improves the diagram while at the same time ensuring its topological structure does not change. We demonstrate the effectiveness of the approach through case studies and quantitative evaluations.
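As a rough illustration of the kind of force that rounds a contour, the sketch below applies plain Laplacian smoothing to a closed polygon: each vertex is pulled toward the midpoint of its neighbours. The paper's actual method couples such forces with checks that set elements stay inside their contours and that no region appears or disappears; those topology guards are omitted here.

```python
import numpy as np

def smooth_step(poly: np.ndarray, strength: float = 0.5) -> np.ndarray:
    """One Laplacian smoothing iteration on a closed polygon of shape (n, 2)."""
    prev = np.roll(poly, 1, axis=0)    # previous vertex along the contour
    nxt = np.roll(poly, -1, axis=0)    # next vertex along the contour
    target = (prev + nxt) / 2.0        # midpoint of the two neighbours
    return poly + strength * (target - poly)

# A square with edge midpoints; repeated smoothing rounds the corners
# (and, without constraints, slowly shrinks the shape).
poly = np.array([[0, 0], [0.5, 0], [1, 0], [1, 0.5],
                 [1, 1], [0.5, 1], [0, 1], [0, 0.5]], dtype=float)
for _ in range(10):
    poly = smooth_step(poly)
print(poly.round(3))
```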


IEEE Transactions on Visualization and Computer Graphics | 2008

Edge Groups: An Approach to Understanding the Mesh Quality of Marching Methods

Carlos A. Dietrich; Carlos Scheidegger; João Luiz Dihl Comba; Luciana Porcher Nedel; Cláudio T. Silva

Marching cubes is the most popular isosurface extraction algorithm due to its simplicity, efficiency and robustness. It has been widely studied, improved, and extended. While much early work was concerned with efficiency and correctness issues, lately there has been a push to improve the quality of marching cubes meshes so that they can be used in computational codes. In this work we present a new classification of MC cases that we call edge groups, which helps elucidate the issues that impact the triangle quality of the meshes that the method generates. This formulation allows a more systematic way to bound the triangle quality, and is general enough to extend to other polyhedral cell shapes used in other polygonization algorithms. Using this analysis, we also discuss ways to improve the quality of the resulting triangle mesh, including some that require only minor modifications of the original algorithm.
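Triangle quality in this setting is usually quantified with a shape measure such as the radius ratio, which equals 1 for an equilateral triangle and tends to 0 for slivers. The sketch below computes that standard metric from side lengths; it shows only the measure itself, not the paper's per-edge-group analysis of the bounds marching cubes can guarantee.

```python
import math

def radius_ratio(a: float, b: float, c: float) -> float:
    """Triangle quality from side lengths: 2 * inradius / circumradius."""
    s = (a + b + c) / 2.0                                    # semi-perimeter
    area = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))  # Heron's formula
    if area == 0.0:
        return 0.0                                           # degenerate triangle
    inradius = area / s
    circumradius = (a * b * c) / (4.0 * area)
    return 2.0 * inradius / circumradius

print(radius_ratio(1, 1, 1))      # equilateral -> 1.0
print(radius_ratio(1, 1, 1.999))  # near-degenerate sliver -> close to 0
```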


arXiv: Human-Computer Interaction | 2015

Towards Understanding Enjoyment and Flow in Information Visualization

Bahador Saket; Carlos Scheidegger; Stephen G. Kobourov

Traditionally, evaluation studies in information visualization have measured effectiveness by assessing performance time and accuracy. More recently, there has been a concerted effort to understand aspects beyond time and errors. In this paper we study enjoyment, which, while arguably not the primary goal of visualization, has been shown to impact performance and memorability. Different models of enjoyment have been proposed in psychology, education and gaming; yet there is no standard approach to evaluate and measure enjoyment in visualization. In this paper we relate the flow model of Csikszentmihalyi to Munzner's nested model of visualization evaluation and to previous work in the area. We suggest that, even though previous papers have tackled individual elements of flow, understanding what specifically makes a visualization enjoyable might require measuring all of its elements.

Collaboration


Dive into Carlos Scheidegger's collaborations.

Top Co-Authors

Zhe Wang
University of Arizona

Bahador Saket
Georgia Institute of Technology

João Luiz Dihl Comba
Universidade Federal do Rio Grande do Sul