Tae Yano
Carnegie Mellon University
Publications
Featured research published by Tae Yano.
North American Chapter of the Association for Computational Linguistics | 2009
Tae Yano; William W. Cohen; Noah A. Smith
In this paper we model discussions in online political blogs. To do this, we extend Latent Dirichlet Allocation (Blei et al., 2003) in various ways to capture different characteristics of the data. Our models jointly describe the generation of the primary documents (posts) as well as the authorship and, optionally, the contents of the blog community's verbal reactions to each post (comments). We evaluate our model on a novel comment prediction task, in which the models are used to predict which blog users will leave comments on a given post. We also provide a qualitative discussion of what the models discover.
Conference on Information and Knowledge Management | 2013
Michael Gamon; Tae Yano; Xinying Song; Johnson Apacible; Patrick Pantel
We propose a system that determines the salience of entities within web documents. Many recent advances in commercial search engines leverage the identification of entities in web pages. However, for many pages only a small subset of entities is central to the document, which can degrade relevance for entity-triggered experiences. We address this problem by devising a system that scores each entity on a web page according to its centrality to the page content. We propose salience classification functions that incorporate various cues from document content, web search logs, and a large web graph. To train the models cost-effectively, we introduce a soft labeling methodology that generates a set of annotations based on user behaviors observed in web search logs. We evaluate several variations of our model via a large-scale empirical study conducted over a test set, which we release publicly to the research community. We demonstrate that our methods significantly outperform competitive baselines and the previous state of the art, while keeping the human annotation cost to a minimum.
Knowledge Discovery and Data Mining | 2013
Daniel Preoţiuc-Pietro; Justin Cranshaw; Tae Yano
In this work we explore the use of incidentally generated social network data for the folksonomic characterization of cities by the types of amenities located within them. Using data collected about venue categories in various cities, we examine the effect of different granularities of spatial aggregation and data normalization when representing a city as a collection of its venues. We introduce three vector-based representations of a city, in which venue categories are aggregated within a grid structure, within the city's municipal neighborhoods, and across the city as a whole. We apply our methods to a novel dataset consisting of Foursquare venue data from 17 cities across the United States, totaling over 1 million venues. Our preliminary investigation demonstrates that different assumptions about urban perception can lead to qualitative, yet distinctive, variations in the induced city description and categorization.
Archive | 2007
Rebecca J. Passonneau; Tae Yano; Judith L. Klavans; Rachael Bradley; Carolyn Sheffield; Eileen G. Abels; Laura Jenemann
We describe a series of studies aimed at identifying specifications for a text extraction module of an image indexer’s toolkit. The materials used in the studies consist of images paired with paragraph sequences that describe the images. We administered a pilot survey to visual resource center professionals at three universities to determine what types of paragraphs would be preferred for metadata selection. Respondents generally showed a strong preference for one of the two paragraphs they were presented with, indicating that not all paragraphs that describe images are seen as good sources of metadata. We developed a set of semantic category labels to assign to spans of text in order to distinguish between different types of information about the images, and thus to classify metadata contexts. Human agreement on metadata is notoriously variable. To maximize agreement, we conducted four human labeling experiments using the seven semantic category labels we developed. A subset of our labelers had much higher inter-annotator reliability, and the highest reliability occurs when labelers can pick two labels per text unit.
International Conference on Weblogs and Social Media | 2010
Tae Yano; Noah A. Smith
North American Chapter of the Association for Computational Linguistics | 2010
Tae Yano; Philip Resnik; Noah A. Smith
North American Chapter of the Association for Computational Linguistics | 2012
Tae Yano; Noah A. Smith; John Wilkerson
International Conference on Weblogs and Social Media | 2013
Tae Yano; Dani Yogatama; Noah A. Smith
Empirical Methods in Natural Language Processing | 2011
Jacob Eisenstein; Tae Yano; William W. Cohen; Noah A. Smith; Eric P. Xing
Archive | 2013
Michael Gamon; Tae Yano; Xinying Song; Johnson Apacible; Patrick Pantel