Yangfeng Ji
Georgia Institute of Technology
Publications
Featured research published by Yangfeng Ji.
Meeting of the Association for Computational Linguistics | 2014
Yangfeng Ji; Jacob Eisenstein
Text-level discourse parsing is notoriously difficult, as distinctions between discourse relations require subtle semantic judgments that are not easily captured using standard features. In this paper, we present a representation learning approach, in which we transform surface features into a latent space that facilitates RST discourse parsing. By combining the machinery of large-margin transition-based structured prediction with representation learning, our method jointly learns to parse discourse while at the same time learning a discourse-driven projection of surface features. The resulting shift-reduce discourse parser obtains substantial improvements over the previous state-of-the-art in predicting relations and nuclearity on the RST Treebank.
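As a rough illustration of the idea (not the authors' code), the sketch below scores shift-reduce parser actions on a learned projection of sparse surface features into a low-dimensional latent space; all dimensions and parameter values are made up for the example.

```python
import numpy as np

# Minimal sketch: score shift-reduce actions on a discourse-driven projection
# of surface features. In the paper the projection and the large-margin action
# scorer are trained jointly; here both are random placeholders.

rng = np.random.default_rng(0)
n_surface, n_latent, n_actions = 1000, 50, 3   # illustrative sizes

A = rng.normal(scale=0.01, size=(n_latent, n_surface))   # projection into the latent space
W = rng.normal(scale=0.01, size=(n_actions, n_latent))   # action scorer (e.g. shift / reduce variants)

def score_actions(surface_features: np.ndarray) -> np.ndarray:
    """Project sparse surface features, then score each candidate parser action."""
    latent = A @ surface_features
    return W @ latent

x = rng.random(n_surface)                      # stand-in features for the current parser state
best_action = int(np.argmax(score_actions(x)))
```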
Empirical Methods in Natural Language Processing | 2015
Parminder Bhatia; Yangfeng Ji; Jacob Eisenstein
Discourse structure is the hidden link between surface features and document-level properties, such as sentiment polarity. We show that the discourse analyses produced by Rhetorical Structure Theory (RST) parsers can improve document-level sentiment analysis, via composition of local information up the discourse tree. First, we show that reweighting discourse units according to their position in a dependency representation of the rhetorical structure can yield substantial improvements on lexicon-based sentiment analysis. Next, we present a recursive neural network over the RST structure, which offers significant improvements over classification-based methods.
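The depth-based reweighting can be pictured with a small sketch; the decay factor and unit scores below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: lexicon-based sentiment where each discourse unit's score is
# down-weighted by its depth in a dependency representation of the RST tree.

lambda_ = 0.5   # assumed decay per level of embedding

def document_sentiment(units):
    """units: list of (lexicon_score, depth) pairs, one per elementary discourse unit."""
    return sum(score * (lambda_ ** depth) for score, depth in units)

# The nucleus of the root relation (depth 0) counts fully; deeply embedded satellites count less.
print(document_sentiment([(+1.0, 0), (-0.8, 2), (+0.3, 3)]))
```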
North American Chapter of the Association for Computational Linguistics | 2016
Yangfeng Ji; Gholamreza Haffari; Jacob Eisenstein
This paper presents a novel latent variable recurrent neural network architecture for jointly modeling sequences of words and (possibly latent) discourse relations between adjacent sentences. A recurrent neural network generates individual words, thus reaping the benefits of discriminatively-trained vector representations. The discourse relations are represented with a latent variable, which can be predicted or marginalized, depending on the task. The resulting model can therefore employ a training objective that includes not only discourse relation classification, but also word prediction. As a result, it outperforms state-of-the-art alternatives for two tasks: implicit discourse relation classification in the Penn Discourse Treebank, and dialog act classification in the Switchboard corpus. Furthermore, by marginalizing over latent discourse relations at test time, we obtain a discourse-informed language model, which improves over a strong LSTM baseline.
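The marginalization over latent discourse relations used for language modeling can be sketched as a log-sum-exp over relation-specific sentence likelihoods; the distributions and values below are placeholders, not quantities from the paper.

```python
import numpy as np

# Minimal sketch of the marginalization step: the log-probability of the next
# sentence is a mixture over K latent discourse relations z, weighted by a
# prior p(z | context) predicted from the preceding sentence.

def sentence_log_prob(p_z_given_context, log_p_words_given_z):
    """p_z_given_context: (K,) prior over discourse relations.
    log_p_words_given_z: (K,) log-likelihood of the sentence under each relation-specific RNN."""
    log_joint = np.log(p_z_given_context) + log_p_words_given_z
    m = log_joint.max()
    return m + np.log(np.exp(log_joint - m).sum())   # stable log-sum-exp over z

print(sentence_log_prob(np.array([0.7, 0.2, 0.1]), np.array([-42.0, -40.5, -44.0])))
```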
Empirical Methods in Natural Language Processing | 2015
Yangfeng Ji; Gongbo Zhang; Jacob Eisenstein
Many discourse relations are explicitly marked with discourse connectives, and these examples could potentially serve as a plentiful source of training data for recognizing implicit discourse relations. However, there are important linguistic differences between explicit and implicit discourse relations, which limit the accuracy of such an approach. We account for these differences by applying techniques from domain adaptation, treating implicitly and explicitly-marked discourse relations as separate domains. The distribution of surface features varies across these two domains, so we apply a marginalized denoising autoencoder to induce a dense, domain-general representation. The label distribution is also domain-specific, so we apply a resampling technique that is similar to instance weighting. In combination with a set of automatically-labeled data, these improvements eliminate more than 80% of the transfer loss incurred by training an implicit discourse relation classifier on explicitly-marked discourse relations.
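A single-layer marginalized denoising autoencoder in the spirit of Chen et al. (2012) has a closed-form solution; the sketch below illustrates it under assumed dimensions and corruption rate, and is not the paper's implementation.

```python
import numpy as np

# Minimal sketch of a marginalized denoising autoencoder: features are
# notionally "corrupted" (dropped with probability p), and the reconstruction
# weights are computed in closed form by marginalizing over all corruptions.

def mda(X, p=0.5, eps=1e-5):
    """X: (d, n) feature matrix pooled over explicit and implicit examples."""
    d, _ = X.shape
    q = np.full(d, 1.0 - p)                       # survival probability of each feature
    S = X @ X.T                                   # scatter matrix
    Q = S * np.outer(q, q)                        # E[corrupted x corrupted^T], off-diagonal
    np.fill_diagonal(Q, q * np.diag(S))           # diagonal uses a single survival factor
    P = S * q[np.newaxis, :]                      # E[clean x corrupted^T]
    W = P @ np.linalg.inv(Q + eps * np.eye(d))    # closed-form reconstruction weights
    return np.tanh(W @ X)                         # dense, domain-general representation

X = np.random.default_rng(0).random((20, 100))    # toy feature matrix
H = mda(X)
```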
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality | 2014
Yangfeng Ji; Hwajung Hong; Rosa I. Arriaga; Agata Rozga; Gregory D. Abowd; Jacob Eisenstein
Discussion forums offer a new source of insight into the experiences and challenges faced by individuals affected by mental disorders. Language technology can help domain experts gather insight from these forums, by aggregating themes and user behaviors across thousands of conversations. We present a novel model for web forums, which captures both thematic content and user-specific interests. Applying this model to the Aspies Central forum (which covers issues related to Asperger's syndrome and autism spectrum disorder), we identify several topics of concern to individuals who report being on the autism spectrum. We evaluate on data collected from the Aspies Central forum, including 1,939 threads, 29,947 posts, and 972 users. Quantitative evaluations demonstrate that the topics extracted by this model improve substantially over those obtained by Latent Dirichlet Allocation and the Author-Topic Model. Qualitative analysis by subject-matter experts suggests intriguing directions for future investigation.
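Purely as an illustration of how thread-level themes and user-specific interests might combine when generating a post's topics (the mixing scheme and all values below are assumptions, not the paper's model), consider:

```python
import numpy as np

# Illustrative sketch only: each post draws its topic distribution as a mixture
# of the thread's thematic content and the posting user's interests.

rng = np.random.default_rng(0)
n_topics = 10
thread_theme = rng.dirichlet(np.ones(n_topics))    # thematic content of the thread
user_interest = rng.dirichlet(np.ones(n_topics))   # the posting user's interests
mix = 0.6                                          # assumed weight on the thread theme

post_topic_dist = mix * thread_theme + (1 - mix) * user_interest
topic_for_word = rng.choice(n_topics, p=post_topic_dist)
```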
Empirical Methods in Natural Language Processing | 2013
Yangfeng Ji; Jacob Eisenstein
arXiv: Machine Learning | 2017
Graham Neubig; Chris Dyer; Yoav Goldberg; Austin Matthews; Waleed Ammar; Antonios Anastasopoulos; Miguel Ballesteros; David Chiang; Daniel Clothiaux; Trevor Cohn; Kevin Duh; Manaal Faruqui; Cynthia Gan; Dan Garrette; Yangfeng Ji; Lingpeng Kong; Adhiguna Kuncoro; Gaurav Kumar; Chaitanya Malaviya; Paul Michel; Yusuke Oda; Matthew Richardson; Naomi Saphra; Swabha Swayamdipta; Pengcheng Yin
Transactions of the Association for Computational Linguistics | 2015
Yangfeng Ji; Jacob Eisenstein
arXiv: Computation and Language | 2015
Yangfeng Ji; Trevor Cohn; Lingpeng Kong; Chris Dyer; Jacob Eisenstein
Archive | 2015
Yangfeng Ji; Gongbo Zhang; Jacob Eisenstein