Frances Yung
Nara Institute of Science and Technology
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Frances Yung.
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications | 2015
Enrico Santus; Frances Yung; Alessandro Lenci; Chu-Ren Huang
In this paper, we introduce EVALution 1.0, a dataset designed for the training and the evaluation of Distributional Semantic Models (DSMs). This version consists of almost 7.5K tuples, instantiating several semantic relations between word pairs (including hypernymy, synonymy, antonymy, meronymy). The dataset is enriched with a large amount of additional information (i.e. relation domain, word frequency, word POS, word semantic field, etc.) that can be used for either filtering the pairs or performing an in-depth analysis of the results. The tuples were extracted from a combination of ConceptNet 5.0 and WordNet 4.0, and subsequently filtered through automatic methods and crowdsourcing in order to ensure their quality. The dataset is freely downloadable1. An extension in RDF format, including also scripts for data processing, is under development.
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing | 2015
Frances Yung; Kevin Duh; Yuji Matsumoto
We propose a linguistically driven approach to represent discourse relations in Chinese text as sequences. We observe that certain surface characteristics of Chinese texts, such as the order of clauses, are overt markers of discourse structures, yet existing annotation proposals adapted from formalism constructed for English do not fully incorporate these characteristics. We present an annotated resource consisting of 325 articles in the Chinese Treebank. In addition, using this annotation, we introduce a discourse chunker based on a cascade of classifiers and report 70% top-level discourse sense accuracy.
meeting of the association for computational linguistics | 2016
Frances Yung; Kevin Duh; Taku Komura; Yuji Matsumoto
We propose a framework to model human comprehension of discourse connectives. Following the Bayesian pragmatic paradigm, we advocate that discourse connectives are interpreted based on a simulation of the production process by the speaker, who, in turn, considers the ease of interpretation for the listener when choosing connectives. Evaluation against the sense annotation of the Penn Discourse Treebank confirms the superiority of the model over literal comprehension. A further experiment demonstrates that the proposed model also improves automatic discourse parsing.
empirical methods in natural language processing | 2015
Frances Yung; Kevin Duh; Yuji Matsumoto
Usage of discourse connectives (DCs) differs across languages, thus addition and omission of connectives are common in translation. We investigate how implicit (omitted) DCs in the source text impacts various machine translation (MT) systems, and whether a discourse parser is needed as a preprocessor to explicitate implicit DCs. Based on the manual annotation and alignment of 7266 pairs of discourse relations in a Chinese-English translation corpus, we evaluate whether a preprocessing step that inserts explicit DCs at positions of implicit relations can improve MT. Results show that, without modifying the translation model, explicitating implicit relations in the input source text has limited effect on MT evaluation scores. In addition, translation spotting analysis shows that it is crucial to identify DCs that should be explicitly translated in order to improve implicit-to-explicit DC translation. On the other hand, further analysis reveals that the disambiguation as well as explicitation of implicit relations are subject to a certain level of optionality, suggesting the limitation to learn and evaluate this linguistic phenomenon using standard parallel corpora.
meeting of the association for computational linguistics | 2014
Frances Yung
Translation of discourse relations is one of the recent efforts of incorporating discourse information to statistical machine translation (SMT). While existing works focus on disambiguation of ambiguous discourse connectives, or transformation of discourse trees, only explicit discourse relations are tackled. A greater challenge exists in machine translation of Chinese, since implicit discourse relations are abundant and occur both inside and outside a sentence. This thesis proposal describes ongoing work on bilingual discourse annotation and plans towards incorporating discourse relation knowledge to a ChineseEnglish SMT system with consideration of implicit discourse relations. The final goal is a discourse-unit-based translation model unbounded by the traditional assumption of sentence-to-sentence translation.
conference on computational natural language learning | 2016
Frances Yung; Kevin Duh; Taku Komura; Yuji Matsumoto
Discourse relations can either be implicit or explicitly expressed by markers, such as ’therefore’ and ’but’. How a speaker makes this choice is a question that is not well understood. We propose a psycholinguistic model that predicts whether a speaker will produce an explicit marker given the discourse relation s/he wishes to express. Based on the framework of the Rational Speech Acts model, we quantify the utility of producing a marker based on the information-theoretic measure of surprisal, the cost of production, and a bias to maintain uniform information density throughout the utterance. Experiments based on the Penn Discourse Treebank show that our approach outperforms stateof-the-art approaches, while giving an explanatory account of the speaker’s choice.
conference of the european chapter of the association for computational linguistics | 2014
Frances Yung; Kevin Duh; Yuji Matsumoto
Professional human translators usually do not employ the concept of word alignments, producing translations ‘sense-forsense’ instead of ‘word-for-word’. This suggests that unalignable words may be prevalent in the parallel text used for machine translation (MT). We analyze this phenomenon in-depth for Chinese-English translation. We further propose a simple and effective method to improve automatic word alignment by pre-removing unalignable words, and show improvements on hierarchical MT systems in both translation directions. 1 Motivation It is generally acknowledged that absolute equivalence between two languages is impossible, since concept lexicalization varies across languages. Major translation theories thus argue that texts should be translated ‘sense-for-sense’ instead of ‘word-for-word’ (Nida, 1964). This suggests that unalignable words may be an issue for the parallel text used to train current statistical machine translation (SMT) systems. Although existing automatic word alignment methods have some mechanism to handle the lack of exact word-for-word alignment (e.g. null probabilities, fertility in the IBM models (Brown et al., 1993)), they may be too coarse-grained to model the ’sense-for-sense’ translations created by professional human translators.
north american chapter of the association for computational linguistics | 2013
Yutaro Shigeto; Ai Azuma; Sorami Hisamoto; Shuhei Kondo; Tomoya Kouse; Keisuke Sakaguchi; Akifumi Yoshimoto; Frances Yung; Yuji Matsumoto
Dialogue and Discourse | 2017
Frances Yung; Kevin Duh; Taku Komura; Yuji Matsumoto
arXiv: Computation and Language | 2018
Wei Shi; Frances Yung; Vera Demberg