Jason Chuang | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jason Chuang is active.

Explore More

Publication

Featured researches published by Jason Chuang.

advanced visual interfaces | 2012

Termite: visualization techniques for assessing textual topic models

Jason Chuang; Christopher D. Manning; Jeffrey Heer

Topic models aid analysis of text corpora by identifying latent topics based on co-occurring words. Real-world deployments of topic models, however, often require intensive expert verification and model refinement. In this paper we present Termite, a visual analysis tool for assessing topic model quality. Termite uses a tabular layout to promote comparison of terms both within and across latent topics. We contribute a novel saliency measure for selecting relevant terms and a seriation algorithm that both reveals clustering structure and promotes the legibility of related terms. In a series of examples, we demonstrate how Termite allows analysts to identify coherent and significant themes.

Genome Research | 2013

RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development

Meng How Tan; Kin Fai Au; Arielle L. Yablonovitch; Andrea E. Wills; Jason Chuang; Julie C. Baker; Wing Hung Wong; Jin Billy Li

The Xenopus embryo has provided key insights into fate specification, the cell cycle, and other fundamental developmental and cellular processes, yet a comprehensive understanding of its transcriptome is lacking. Here, we used paired end RNA sequencing (RNA-seq) to explore the transcriptome of Xenopus tropicalis in 23 distinct developmental stages. We determined expression levels of all genes annotated in RefSeq and Ensembl and showed for the first time on a genome-wide scale that, despite a general state of transcriptional silence in the earliest stages of development, approximately 150 genes are transcribed prior to the midblastula transition. In addition, our splicing analysis uncovered more than 10,000 novel splice junctions at each stage and revealed that many known genes have additional unannotated isoforms. Furthermore, we used Cufflinks to reconstruct transcripts from our RNA-seq data and found that ∼13.5% of the final contigs are derived from novel transcribed regions, both within introns and in intergenic regions. We then developed a filtering pipeline to separate protein-coding transcripts from noncoding RNAs and identified a confident set of 6686 noncoding transcripts in 3859 genomic loci. Since the current reference genome, XenTro3, consists of hundreds of scaffolds instead of full chromosomes, we also performed de novo reconstruction of the transcriptome using Trinity and uncovered hundreds of transcripts that are missing from the genome. Collectively, our data will not only aid in completing the assembly of the Xenopus tropicalis genome but will also serve as a valuable resource for gene discovery and for unraveling the fundamental mechanisms of vertebrate embryogenesis.

ACM Transactions on Computer-Human Interaction | 2012

“Without the clutter of unimportant words”: Descriptive keyphrases for text visualization

Jason Chuang; Christopher D. Manning; Jeffrey Heer

Keyphrases aid the exploration of text collections by communicating salient aspects of documents and are often used to create effective visualizations of text. While prior work in HCI and visualization has proposed a variety of ways of presenting keyphrases, less attention has been paid to selecting the best descriptive terms. In this article, we investigate the statistical and linguistic properties of keyphrases chosen by human judges and determine which features are most predictive of high-quality descriptive phrases. Based on 5,611 responses from 69 graduate students describing a corpus of dissertation abstracts, we analyze characteristics of human-generated keyphrases, including phrase length, commonness, position, and part of speech. Next, we systematically assess the contribution of each feature within statistical models of keyphrase quality. We then introduce a method for grouping similar terms and varying the specificity of displayed phrases so that applications can select phrases dynamically based on the available screen space and current context of interaction. Precision-recall measures find that our technique generates keyphrases that match those selected by human judges. Crowdsourced ratings of tag cloud visualizations rank our approach above other automatic techniques. Finally, we discuss the role of HCI methods in developing new algorithmic techniques suitable for user-facing applications.

empirical methods in natural language processing | 2014

Human Effort and Machine Learnability in Computer Aided Translation

Spence Green; Sida I. Wang; Jason Chuang; Jeffrey Heer; Sebastian Schuster; Christopher D. Manning

Analyses of computer aided translation typically focus on either frontend interfaces and human effort, or backend translation and machine learnability of corrections. However, this distinction is artificial in practice since the frontend and backend must work in concert. We present the first holistic, quantitative evaluation of these issues by contrasting two assistive modes: postediting and interactive machine translation (MT). We describe a new translator interface, extensive modifications to a phrasebased MT system, and a novel objective function for re-tuning to human corrections. Evaluation with professional bilingual translators shows that post-edit is faster than interactive at the cost of translation quality for French-English and EnglishGerman. However, re-tuning the MT system to interactive output leads to larger, statistically significant reductions in HTER versus re-tuning to post-edit. Analysis shows that tuning directly to HTER results in fine-grained corrections to subsequent machine output.

user interface software and technology | 2014

Predictive translation memory: a mixed-initiative system for human language translation

Spence Green; Jason Chuang; Jeffrey Heer; Christopher D. Manning

The standard approach to computer-aided language translation is post-editing: a machine generates a single translation that a human translator corrects. Recent studies have shown this simple technique to be surprisingly effective, yet it underutilizes the complementary strengths of precision-oriented humans and recall-oriented machines. We present Predictive Translation Memory, an interactive, mixed-initiative system for human language translation. Translators build translations incrementally by considering machine suggestions that update according to the users current partial translation. In a large-scale study, we find that professional translators are slightly slower in the interactive mode yet produce slightly higher quality translations despite significant prior experience with the baseline post-editing condition. Our analysis identifies significant predictors of time and quality, and also characterizes interactive aid usage. Subjects entered over 99% of characters via interactive aids, a significantly higher fraction than that shown in previous work.

north american chapter of the association for computational linguistics | 2015

TopicCheck: Interactive Alignment for Assessing Topic Model Stability.

Jason Chuang; Margaret E. Roberts; Brandon M. Stewart; Rebecca Weiss; Dustin Tingley; Justin Grimmer; Jeffrey Heer

Content analysis, a widely-applied social science research method, is increasingly being supplemented by topic modeling. However, while the discourse on content analysis centers heavily on reproducibility, computer scientists often focus more on scalability and less on coding reliability, leading to growing skepticism on the usefulness of topic models for automated content analysis. In response, we introduce TopicCheck, an interactive tool for assessing topic model stability. Our contributions are threefold. First, from established guidelines on reproducible content analysis, we distill a set of design requirements on how to computationally assess the stability of an automated coding process. Second, we devise an interactive alignment algorithm for matching latent topics from multiple models, and enable sensitivity evaluation across a large number of models. Finally, we demonstrate that our tool enables social scientists to gain novel insights into three active research questions.

Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces | 2014

Concurrent Visualization of Relationships between Words and Topics in Topic Models

Alison Smith; Jason Chuang; Yuening Hu; Jordan L. Boyd-Graber; Leah Findlater

Analysis tools based on topic models are often used as a means to explore large amounts of unstructured data. Users often reason about the correctness of a model using relationships between words within the topics or topics within the model. We compute this useful contextual information as term co-occurrence and topic covariance and overlay it on top of standard topic model output via an intuitive interactive visualization. This is a work in progress with the end goal to combine the visual representation with interactions and online learning, so the users can directly explore (a) why a model may not align with their intuition and (b) modify the model as needed.

empirical methods in natural language processing | 2013