Huy T. Vo
New York University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Huy T. Vo.
international conference on management of data | 2006
Steven P. Callahan; Juliana Freire; Emanuele Santos; Carlos Eduardo Scheidegger; Cláudio T. Silva; Huy T. Vo
Scientists are now faced with an incredible volume of data to analyze. To successfully analyze and validate various hypothesis, it is necessary to pose several queries, correlate disparate data, and create insightful visualizations of both the simulated processes and observed phenomena. Often, insight comes from comparing the results of multiple visualizations. Unfortunately, today this process is far from interactive and contains many error-prone and time-consuming tasks. As a result, the generation and maintenance of visualizations is a major bottleneck in the scientific process, hindering both the ability to mine scientific data and the actual use of the data. The VisTrails system represents our initial attempt to improve the scientific discovery process and reduce the time to insight. In VisTrails, we address the problem of visualization from a data management perspective: VisTrails manages the data and metadata of a visualization product. In this demonstration, we show the power and flexibility of our system by presenting actual scenarios in which scientific visualization is used and showing how our system improves usability, enables reproducibility, and greatly reduces the time required to create scientific visualizations.
ieee visualization | 2005
Louis Bavoil; Steven P. Callahan; Patricia Crossno; Juliana Freire; Carlos Eduardo Scheidegger; Cláudio T. Silva; Huy T. Vo
VisTrails is a new system that enables interactive multiple-view visualizations by simplifying the creation and maintenance of visualization pipelines, and by optimizing their execution. It provides a general infrastructure that can be combined with existing visualization systems and libraries. A key component of VisTrails is the visualization trail (vistrail), a formal specification of a pipeline. Unlike existing dataflow-based systems, in VisTrails there is a clear separation between the specification of a pipeline and its execution instances. This separation enables powerful scripting capabilities and provides a scalable mechanism for generating a large number of visualizations. VisTrails also leverages the vistrail specification to identify and avoid redundant operations. This optimization is especially useful while exploring multiple visualizations. When variations of the same pipeline need to be executed, substantial speedups can be obtained by caching the results of overlapping subsequences of the pipelines. In this paper, we describe the design and implementation of VisTrails, and show its effectiveness in different application scenarios.
international provenance and annotation workshop | 2006
Juliana Freire; Cláudio T. Silva; Steven P. Callahan; Emanuele Santos; Carlos Eduardo Scheidegger; Huy T. Vo
We give an overview of VisTrails, a system that provides an infrastructure for systematically capturing detailed provenance and streamlining the data exploration process. A key feature that sets VisTrails apart from previous visualization and scientific workflow systems is a novel action-based mechanism that uniformly captures provenance for data products and workflows used to generate these products. This mechanism not only ensures reproducibility of results, but it also simplifies data exploration by allowing scientists to easily navigate through the space of workflows and parameter settings for an exploration task.
IEEE Transactions on Visualization and Computer Graphics | 2013
Nivan Ferreira; Jorge Poco; Huy T. Vo; Juliana Freire; Cláudio T. Silva
As increasing volumes of urban data are captured and become available, new opportunities arise for data-driven analysis that can lead to improvements in the lives of citizens through evidence-based decision making and policies. In this paper, we focus on a particularly important urban data set: taxi trips. Taxis are valuable sensors and information associated with taxi trips can provide unprecedented insight into many different aspects of city life, from economic activity and human behavior to mobility patterns. But analyzing these data presents many challenges. The data are complex, containing geographical and temporal components in addition to multiple variables associated with each trip. Consequently, it is hard to specify exploratory queries and to perform comparative analyses (e.g., compare different regions over time). This problem is compounded due to the size of the data-there are on average 500,000 taxi trips each day in NYC. We propose a new model that allows users to visually query taxi trips. Besides standard analytics queries, the model supports origin-destination queries that enable the study of mobility across the city. We show that this model is able to express a wide range of spatio-temporal queries, and it is also flexible in that not only can queries be composed but also different aggregations and visual representations can be applied, allowing users to explore and compare results. We have built a scalable system that implements this model which supports interactive response times; makes use of an adaptive level-of-detail rendering strategy to generate clutter-free visualization for large results; and shows hidden details to the users in a summary through the use of overlay heat maps. We present a series of case studies motivated by traffic engineers and economists that show how our model and system enable domain experts to perform tasks that were previously unattainable for them.
IEEE Transactions on Visualization and Computer Graphics | 2007
Carlos Eduardo Scheidegger; Huy T. Vo; David Koop; Juliana Freire; Cláudio T. Silva
While there have been advances in visualization systems, particularly in multi-view visualizations and visual exploration, the process of building visualizations remains a major bottleneck in data exploration. We show that provenance metadata collected during the creation of pipelines can be reused to suggest similar content in related visualizations and guide semi-automated changes. We introduce the idea of query-by-example in the context of an ensemble of visualizations, and the use of analogies as first-class operations in a system to guide scalable interactions. We describe an implementation of these techniques in VisTrails, a publicly-available, open-source system.
international conference on data engineering | 2006
Steven P. Callahan; Juliana Freire; Emanuele Santos; Carlos Eduardo Scheidegger; Cláudio T. Silva; Huy T. Vo
Scientists are now faced with an incredible volume of data to analyze. To successfully analyze and validate various hypotheses, it is necessary to pose several queries, correlate disparate data, and create insightful visualizations of both the simulated processes and observed phenomena. Data exploration through visualization requires scientists to go through several steps. In essence, they need to assemble complex workflows that consist of dataset selection, specification of series of operations that need to be applied to the data, and the creation of appropriate visual representations, before they can finally view and analyze the results. Often, insight comes from comparing the results of multiple visualizations that are created during the data exploration process.
international conference on management of data | 2008
Carlos Eduardo Scheidegger; Huy T. Vo; David Koop; Juliana Freire; Cláudio T. Silva
We show how work flow systems can be augmented to leverage provenance information to enhance usability. In particular, we will demonstrate new mechanisms and intuitive user interfaces designed to allow users to query work flows by example and to refine work flows by analogies. These techniques are implemented in VisTrails, an open-source provenance-enabled scientific work flow system that can be combined with a wide range of tools, libraries, and visualization systems. We will show di erent scenarios where these techniques can be used to simplify the notoriously hard tasks of creating and refining work flows.
international conference on conceptual structures | 2011
David Koop; Emanuele Santos; Phillip Mates; Huy T. Vo; Philippe Bonnet; Bela Bauer; Brigitte Surer; Matthias Troyer; Dean N. Williams; Joel E. Tohline; Juliana Freire; Cláudio T. Silva
As publishers establish a greater online presence as well as infrastructure to support the distribution of more varied information, the idea of an executable paper that enables greater interaction has developed. An executable paper provides more information for computational experiments and results than the text, tables, and figures of standard papers. Executable papers can bundle computational content that allow readers and reviewers to interact, validate, and explore experiments. By including such content, authors facilitate future discoveries by lowering the barrier to reproducing and extending results. We present an infrastructure for creating, disseminating, and maintaining executable papers. Our approach is rooted in provenance, the documentation of exactly how data, experiments, and results were generated. We seek to improve the experience for everyone involved in the life cycle of an executable paper. The automated capture of provenance information allows authors to easily integrate and update results into papers as they write, and also helps reviewers better evaluate approaches by enabling them to explore experimental results by varying parameters or data. With a provenance-based system, readers are able to examine exactly how a result was developed to better understand and extend published findings.
ieee symposium on large data analysis and visualization | 2011
Huy T. Vo; Jonathan R. Bronson; Brian Summa; João Luiz Dihl Comba; Juliana Freire; Bill Howe; Valerio Pascucci; Cláudio T. Silva
Large-scale visualization systems are typically designed to efficiently “push” datasets through the graphics hardware. However, exploratory visualization systems are increasingly expected to support scalable data manipulation, restructuring, and querying capabilities in addition to core visualization algorithms. We posit that new emerging abstractions for parallel data processing, in particular computing clouds, can be leveraged to support large-scale data exploration through visualization. In this paper, we take a first step in evaluating the suitability of the MapReduce framework to implement large-scale visualization techniques. MapReduce is a lightweight, scalable, general-purpose parallel data processing framework increasingly popular in the context of cloud computing. Specifically, we implement and evaluate a representative suite of visualization tasks (mesh rendering, isosurface extraction, and mesh simplification) as MapReduce programs, and report quantitative performance results applying these algorithms to realistic datasets. For example, we perform isosurface extraction of up to l6 isovalues for volumes composed of 27 billion voxels, simplification of meshes with 30GBs of data and subsequent rendering with image resolutions up to 800002 pixels. Our results indicate that the parallel scalability, ease of use, ease of access to computing resources, and fault-tolerance of MapReduce offer a promising foundation for a combined data manipulation and data visualization system deployed in a public cloud or a local commodity cluster.
international provenance and annotation workshop | 2008
Steven P. Callahan; Juliana Freire; Carlos Eduardo Scheidegger; Cláudio T. Silva; Huy T. Vo
Currently, there are no general provenance management systems or tools available for existing applications. Our goal is to develop provenance technology that is flexible and adaptable to the wide range of requirements of software applications. By consolidating provenance information for a variety of applications, we can provide a uniform environment for querying, sharing, and re-using provenance in large-scale, collaborative settings. In this paper, we describe our framework for provenance-enabling existing applications. Our approach is applicable to a variety of software systems that are process driven. As a concrete example, we describe a working plug-in for an open source application in scientific visualization.