Jaegul Choo

Georgia Institute of Technology

Publication

Featured research published by Jaegul Choo.

IEEE Transactions on Visualization and Computer Graphics | 2013

UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization

Jaegul Choo; Changhyun Lee; Chandan K. Reddy; Haesun Park

Topic modeling has been widely used for analyzing text document collections. Recently, there have been significant advancements in various topic modeling techniques, particularly in the form of probabilistic graphical modeling. State-of-the-art techniques such as Latent Dirichlet Allocation (LDA) have been successfully applied in visual text analytics. However, most of the widely-used methods based on probabilistic modeling have drawbacks in terms of consistency from multiple runs and empirical convergence. Furthermore, due to the complexity of the formulation and the algorithm, LDA cannot easily incorporate various types of user feedback. To tackle this problem, we propose a reliable and flexible visual analytics system for topic modeling called UTOPIAN (User-driven Topic modeling based on Interactive Nonnegative Matrix Factorization). Centered around its semi-supervised formulation, UTOPIAN enables users to interact with the topic modeling method and steer the result in a user-driven manner. We demonstrate the capability of UTOPIAN via several usage scenarios with real-world document corpora such as the InfoVis/VAST paper data set and product review data sets.

Visual Analytics Science and Technology | 2010

iVisClassifier: An interactive visual analytics system for classification based on supervised dimension reduction

Jaegul Choo; Hanseung Lee; Jaeyeon Kihm; Haesun Park

We present an interactive visual analytics system for classification, iVisClassifier, based on a supervised dimension reduction method, linear discriminant analysis (LDA). Given high-dimensional data and associated cluster labels, LDA gives their reduced dimensional representation, which provides a good overview of the cluster structure. Instead of a single two- or three-dimensional scatter plot, iVisClassifier fully interacts with all the reduced dimensions obtained by LDA through parallel coordinates and a scatter plot. Furthermore, it significantly improves the interactivity and interpretability of LDA. LDA enables users to understand each of the reduced dimensions and how they influence the data by reconstructing the basis vector into the original data domain. By using heat maps, iVisClassifier gives an overview of the cluster relationship in terms of pairwise distances between cluster centroids both in the original space and in the reduced dimensional space. Equipped with these functionalities, iVisClassifier supports users' classification tasks in an efficient way. Using several facial image data sets, we show how the above analysis is performed.

Computer Graphics Forum | 2012

iVisClustering: An Interactive Visual Document Clustering via Topic Modeling

Hanseung Lee; Jaeyeon Kihm; Jaegul Choo; John T. Stasko; Haesun Park

Clustering plays an important role in many large‐scale data analyses, providing users with an overall understanding of their data. Nonetheless, clustering is not an easy task due to noisy features and outliers existing in the data, and thus the clustering results obtained from automatic algorithms often do not make clear sense. To remedy this problem, automatic clustering should be complemented with interactive visualization strategies. This paper proposes an interactive visual analytics system for document clustering, called iVisClustering, based on a widely‐used topic modeling method, latent Dirichlet allocation (LDA). iVisClustering provides a summary of each cluster in terms of its most representative keywords and visualizes soft clustering results in parallel coordinates. The main view of the system provides a 2D plot that visualizes cluster similarities and the relation among data items with a graph‐based representation. iVisClustering provides several other views, which contain useful interaction methods. With the help of these visualization modules, we can interactively refine the clustering results in various ways. Keywords can be adjusted so that they characterize each cluster better. In addition, our system can filter out noisy data and re‐cluster the data accordingly. Cluster hierarchy can be constructed using a tree structure and for this purpose, the system supports cluster‐level interactions such as sub‐clustering, removing unimportant clusters, merging the clusters that have similar meanings, and moving certain clusters to any other node in the tree structure. Furthermore, the system provides document‐level interactions such as moving mis‐clustered documents to another cluster and removing useless documents. Finally, we present how interactive clustering is performed via iVisClustering by using real‐world document data sets.

IEEE Transactions on Visualization and Computer Graphics | 2013

Combining Computational Analyses and Interactive Visualization for Document Exploration and Sensemaking in Jigsaw

Carsten Görg; Zhicheng Liu; Jaeyeon Kihm; Jaegul Choo; Haesun Park; John T. Stasko

Investigators across many disciplines and organizations must sift through large collections of text documents to understand and piece together information. Whether they are fighting crime, curing diseases, deciding what car to buy, or researching a new field, inevitably investigators will encounter text documents. Taking a visual analytics approach, we integrate multiple text analysis algorithms with a suite of interactive visualizations to provide a flexible and powerful environment that allows analysts to explore collections of documents while sensemaking. Our particular focus is on the process of integrating automated analyses with interactive visualizations in a smooth and fluid manner. We illustrate this integration through two example scenarios: An academic researcher examining InfoVis and VAST conference papers and a consumer exploring car reviews while pondering a purchase decision. Finally, we provide lessons learned toward the design and implementation of visual analytics systems for document exploration and understanding.

IEEE Computer Graphics and Applications | 2013

Customizing Computational Methods for Visual Analytics with Big Data

Jaegul Choo; Haesun Park

The volume of available data has been growing exponentially, increasing data problems' complexity and obscurity. In response, visual analytics (VA) has gained attention, yet its solutions haven't scaled well for big data. Computational methods can improve VA's scalability by giving users compact, meaningful information about the input data. However, the significant computation time these methods require hinders real-time interactive visualization of big data. By addressing crucial discrepancies between these methods and VA regarding precision and convergence, researchers have proposed ways to customize them for VA. These approaches, which include low-precision computation and iteration-level interactive visualization, ensure real-time interactive VA for big data.

Visual Analytics Science and Technology | 2009

Two-stage framework for visualization of clustered high dimensional data

Jaegul Choo; Shawn J. Bohn; Haesun Park

In this paper, we discuss dimension reduction methods for 2D visualization of high dimensional clustered data. We propose a two-stage framework for visualizing such data based on dimension reduction methods. In the first stage, we obtain the reduced dimensional data by applying a supervised dimension reduction method such as linear discriminant analysis which preserves the original cluster structure in terms of its criteria. The resulting optimal reduced dimension depends on the optimization criteria and is often larger than 2. In the second stage, the dimension is further reduced to 2 for visualization purposes by another dimension reduction method such as principal component analysis. The role of the second stage is to minimize the loss of information due to reducing the dimension all the way to 2. Using this framework, we propose several two-stage methods, and present their theoretical characteristics as well as experimental comparisons on both artificial and real-world text data sets.
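
The two stages described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes classical linear discriminant analysis for the first stage and plain principal component analysis for the second, and the function names (`lda_reduce`, `pca_reduce`, `two_stage`), the regularization constant `reg`, and the synthetic data are hypothetical choices made only for illustration.

```python
import numpy as np

def lda_reduce(X, y, reg=1e-6):
    """Stage 1 (assumed: classical LDA). Projects X, shaped
    (n_samples, n_features), onto n_classes - 1 discriminant directions."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-cluster scatter
    Sb = np.zeros((d, d))  # between-cluster scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Whiten with the Cholesky factor of the regularized Sw so that the
    # generalized eigenproblem Sb v = lambda Sw v becomes symmetric.
    L = np.linalg.cholesky(Sw + reg * np.eye(d))
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)  # ascending order
    top = np.argsort(evals)[::-1][: len(classes) - 1]
    V = Linv.T @ evecs[:, top]
    return (X - mean_all) @ V

def pca_reduce(Z, k=2):
    """Stage 2 (assumed: PCA via SVD). Keeps the k directions of
    largest variance in the stage-1 output."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:k].T

def two_stage(X, y, k=2):
    """Supervised reduction first, then reduction to k dimensions."""
    return pca_reduce(lda_reduce(X, y), k)

# Hypothetical synthetic data: 4 clusters in 10 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(4, 10))
y = np.repeat(np.arange(4), 25)
X = centers[y] + rng.normal(size=(100, 10))

Z1 = lda_reduce(X, y)    # 3 dimensions, which is more than 2
Z2 = two_stage(X, y)     # 2 dimensions, ready for a scatter plot
print(Z1.shape, Z2.shape)
```

With four clusters the first stage yields three dimensions, which matches the abstract's remark that the reduced dimension is often larger than 2 and motivates the second stage.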

Archive | 2015

Nonnegative Matrix Factorization for Interactive Topic Modeling and Document Clustering

Da Kuang; Jaegul Choo; Haesun Park

Nonnegative matrix factorization (NMF) approximates a nonnegative matrix by the product of two low-rank nonnegative matrices. Since it gives semantically meaningful results that are easily interpretable in clustering applications, NMF has been widely used as a clustering method, especially for document data, and as a topic modeling method. We describe several fundamental facts of NMF and introduce its optimization framework called block coordinate descent. In the context of clustering, our framework provides a flexible way to extend NMF such as the sparse NMF and the weakly-supervised NMF. The former provides succinct representations for better interpretations while the latter flexibly incorporates extra information and user feedback in NMF, which effectively works as the basis for the visual analytic topic modeling system that we present. Using real-world text data sets, we present quantitative experimental results showing the superiority of our framework from the following aspects: fast convergence, high clustering accuracy, sparse representation, consistent output, and user interactivity. In addition, we present a visual analytic system called UTOPIAN (User-driven Topic modeling based on Interactive NMF) and show several usage scenarios. Overall, our book chapter covers the broad spectrum of NMF in the context of clustering and topic modeling, from fundamental algorithmic behaviors to practical visual analytics systems.
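
As a rough illustration of NMF solved by block coordinate descent, the sketch below uses hierarchical alternating least squares (HALS), one well-known instance of that framework in which each column of W and each row of H is treated as a block. Whether this matches the chapter's exact algorithm is an assumption; the function name `nmf_hals`, the floor `eps`, and the synthetic matrix are hypothetical.

```python
import numpy as np

def nmf_hals(A, k, n_iter=100, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix A (m x n) by W @ H with
    W (m x k) >= 0 and H (k x n) >= 0, using HALS updates."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    errors = []
    for _ in range(n_iter):
        # Blocks of W: update one column at a time with H fixed.
        AHt = A @ H.T
        HHt = H @ H.T
        for j in range(k):
            step = (AHt[:, j] - W @ HHt[:, j]) / max(HHt[j, j], eps)
            W[:, j] = np.maximum(eps, W[:, j] + step)
        # Blocks of H: update one row at a time with W fixed.
        WtA = W.T @ A
        WtW = W.T @ W
        for j in range(k):
            step = (WtA[j, :] - WtW[j, :] @ H) / max(WtW[j, j], eps)
            H[j, :] = np.maximum(eps, H[j, :] + step)
        errors.append(np.linalg.norm(A - W @ H))
    return W, H, errors

# Hypothetical data: a 20 x 15 nonnegative matrix of exact rank 3.
rng = np.random.default_rng(1)
A = rng.random((20, 3)) @ rng.random((3, 15))
W, H, errors = nmf_hals(A, k=3)
print(errors[0], errors[-1])
```

For a term-by-document matrix A, the columns of W can be read as topics (weights over terms) and the columns of H as each document's weights over those topics, which is what makes the factors usable for clustering.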

Visualization and Data Analysis | 2013

An interactive visual testbed system for dimension reduction and clustering of large-scale high-dimensional data

Jaegul Choo; Hanseung Lee; Zhicheng Liu; John T. Stasko; Haesun Park

Many of the modern data sets such as text and image data can be represented in high-dimensional vector spaces and have benefited from advanced computational methods. Visual analytics approaches have contributed greatly to data understanding and analysis due to their capability of leveraging humans’ ability for quick visual perception. However, visual analytics targeting large-scale data such as text and image data has been challenging due to the limited screen space in terms of both the numbers of data points and features to represent. Among various computational methods supporting visual analytics, dimension reduction and clustering have played essential roles by reducing these numbers in an intelligent way to visually manageable sizes. Given numerous dimension reduction and clustering methods available, however, the decision on the choice of algorithms and their parameters becomes difficult. In this paper, we present an interactive visual testbed system for dimension reduction and clustering in a large-scale high-dimensional data analysis. The testbed system enables users to apply various dimension reduction and clustering methods with different settings, visually compare the results from different algorithmic methods to obtain rich knowledge for the data and tasks at hand, and eventually choose the most appropriate path for a collection of algorithms and parameters. Using various data sets such as documents, images, and others that are already encoded in vectors, we demonstrate how the testbed system can support these tasks.

Web Search and Data Mining | 2014

Understanding and promoting micro-finance activities in Kiva.org

Jaegul Choo; Changhyun Lee; Daniel Lee; Hongyuan Zha; Haesun Park

Non-profit micro-finance organizations provide loaning opportunities to eradicate poverty by financially equipping impoverished, yet skilled entrepreneurs who are in desperate need of an institution that lends to those who have little. Kiva.org, a widely-used crowd-funded micro-financial service, provides researchers with an extensive amount of publicly available data containing a rich set of heterogeneous information regarding micro-financial transactions. Our objective in this paper is to identify the key factors that encourage people to make micro-financing donations, and ultimately, to keep them actively involved. In our contribution to further promote a healthy micro-finance ecosystem, we detail our personalized loan recommendation system which we formulate as a supervised learning problem where we try to predict how likely a given lender is to fund a new loan. We construct the features for each data item by utilizing the available connectivity relationships in order to integrate all the available Kiva data sources. For those lenders with no such relationships, e.g., first-time lenders, we propose a novel method of feature construction by computing joint nonnegative matrix factorizations. Utilizing gradient boosting tree methods, a state-of-the-art prediction model, we are able to achieve up to 0.92 AUC (area under the curve) value, which shows the potential of our methods for practical deployment. Finally, we point out several interesting phenomena on lenders' social behaviors in micro-finance activities.
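
To make the AUC figure concrete, here is a small pure-Python sketch of how the area under the ROC curve can be computed for a binary prediction task such as "will this lender fund this loan". It relies on the standard equivalence between AUC and the fraction of positive/negative pairs that the scores rank correctly (ties count as half). The function name `auc_score` and the example labels and scores are hypothetical and are not taken from the paper's data.

```python
def auc_score(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative),
    computed as the fraction of correctly ranked positive/negative pairs."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 means the lender funded the loan.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, scores))  # 0.75: three of the four pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the reported 0.92 means that a funded loan outranks an unfunded one in roughly 92% of such pairs.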

International World Wide Web Conferences | 2014

To gather together for a better world: understanding and leveraging communities in micro-lending recommendation

Jaegul Choo; Daniel Lee; Bistra Dilkina; Hongyuan Zha; Haesun Park

Micro-finance organizations provide non-profit lending opportunities to mitigate poverty by financially supporting impoverished, yet skilled entrepreneurs who are in desperate need of an institution that lends to them. In Kiva.org, a widely-used crowd-funded micro-financial service, a vast amount of micro-financial activities are done by lending teams, and thus, understanding their diverse characteristics is crucial in maintaining a healthy micro-finance ecosystem. As the first step for this goal, we model different lending teams by using a maximum-entropy distribution approach based on a rich set of heterogeneous information regarding micro-financial transactions available at Kiva. Based on this approach, we achieved a competitive performance in predicting the lending activities for the top 200 teams. Furthermore, we provide deep insight about the characteristics of lending teams by analyzing the resulting team-specific lending models. We found that lending teams are generally more careful in selecting loans by a loan's geo-location, a borrower's gender, a field partner's reliability, etc., when compared to lenders without team affiliations. In addition, we identified interesting lending behaviors of different lending teams based on lenders' background and interest such as their ethnic, religious, linguistic, educational, regional, and occupational aspects. Finally, using our proposed model, we tackled a novel problem of lending team recommendation and showed its promising performance results.
