
Publication


Featured research published by Isao Goto.


International Conference on Computational Linguistics | 2004

Back transliteration from Japanese to English using target English context

Isao Goto; Naoto Kato; Terumasa Ehara; Hideki Tanaka

This paper proposes a method for the automatic back transliteration of proper nouns, in which a Japanese transliterated word is restored to its original English word. The English words are generated as sequences of letters, so our method can produce new English words that are not registered in dictionaries or English word lists. When a katakana character is converted into English letters, there are many candidate alphabetic strings. To ensure adequate conversion, the proposed method uses the target English context to calculate the probability of an English character or string corresponding to a Japanese katakana character or string. We confirmed the effectiveness of using the target English context through a personal-name back-transliteration experiment.
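The core idea can be sketched in a few lines: each katakana unit maps to several candidate letter strings, and a character-level model trained on target-English text scores the resulting spellings. The candidate table, the name list, and the bigram model below are all invented for illustration and are much simpler than the paper's method.

```python
import math
from collections import Counter
from itertools import product

def char_bigram_model(words):
    # Count character bigrams with word-boundary markers ^ and $.
    bigrams, contexts = Counter(), Counter()
    for w in words:
        chars = ["^"] + list(w) + ["$"]
        for a, b in zip(chars, chars[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts

def log_prob(word, bigrams, contexts, alphabet_size=28):
    # Add-one smoothed character bigram log-probability of a spelling.
    chars = ["^"] + list(word) + ["$"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + alphabet_size))
        for a, b in zip(chars, chars[1:])
    )

def back_transliterate(units, table, bigrams, contexts):
    # Enumerate every combination of letter candidates and keep the
    # spelling the target-English character model finds most probable.
    spellings = ("".join(p) for p in product(*(table[u] for u in units)))
    return max(spellings, key=lambda w: log_prob(w, bigrams, contexts))

# Toy target-English name list and a hypothetical katakana-to-letters table.
names = ["smith", "keith", "faith", "edith", "seth", "thomas", "sue"]
bi, ctx = char_bigram_model(names)
table = {"ス": ["s", "su", "th"], "ミ": ["mi", "me"]}
print(back_transliterate(["ス", "ミ", "ス"], table, bi, ctx))  # → smith
```

Without the target-English model, nothing prefers "th" over "su" for the final ス; the context model is what selects the English-like spelling.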


ACM Transactions on Asian and Low-Resource Language Information Processing | 2015

Preordering using a Target-Language Parser via Cross-Language Syntactic Projection for Statistical Machine Translation

Isao Goto; Masao Utiyama; Eiichiro Sumita; Sadao Kurohashi

When translating between languages with widely different word orders, word reordering can present a major challenge. Although some word reordering methods do not employ source-language syntactic structures, such structures are inherently useful for word reordering. However, high-quality syntactic parsers are not available for many languages. We propose a preordering method using a target-language syntactic parser to process source-language syntactic structures without a source-language syntactic parser. To train our preordering model based on ITG, we produced syntactic constituent structures for source-language training sentences by (1) parsing target-language training sentences, (2) projecting constituent structures of the target-language sentences to the corresponding source-language sentences, (3) selecting parallel sentences with highly synchronized parallel structures, (4) producing probabilistic models for parsing using the projected partial structures and the Pitman-Yor process, and (5) parsing to produce full binary syntactic structures maximally synchronized with the corresponding target-language syntactic structures, using the constraints of the projected partial structures and the probabilistic models. Our ITG-based preordering model is trained using the produced binary syntactic structures and word alignments. The proposed method facilitates the learning of ITG by producing highly synchronized parallel syntactic structures based on cross-language syntactic projection and sentence selection. The preordering model jointly parses input sentences and identifies their reordered structures. Experiments with Japanese--English and Chinese--English patent translation indicate that our method outperforms existing methods, including string-to-tree syntax-based SMT, a preordering method that does not require a parser, and a preordering method that uses a source-language dependency parser.


ACM Transactions on Asian Language Information Processing | 2013

Post-Ordering by Parsing with ITG for Japanese-English Statistical Machine Translation

Isao Goto; Masao Utiyama; Eiichiro Sumita

Word reordering is a difficult task for translation between languages with widely different word orders, such as Japanese and English. A previously proposed post-ordering method for Japanese-to-English translation first translates a Japanese sentence into a sequence of English words in a word order similar to that of Japanese, then reorders the sequence into an English word order. We employed this post-ordering framework and improved upon its reordering method. The existing post-ordering method reorders the sequence of English words via SMT, whereas our method reorders the sequence by (1) parsing the sequence using ITG to obtain syntactic structures that are similar to Japanese syntactic structures, and (2) transferring the obtained syntactic structures into English syntactic structures according to the ITG. Experiments on Japanese-to-English patent translation demonstrated the effectiveness of our method, improving both RIBES and BLEU scores over the compared methods.
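The transfer step (2) is easy to visualize: an ITG derivation is a binary tree whose internal nodes are labeled "straight" (keep child order) or "inverted" (swap children). The tree and labels below are hand-built for illustration; in the paper they are induced by parsing the Japanese-ordered English sequence.

```python
# Reorder a Japanese-ordered English word sequence into English order by
# walking an ITG tree and swapping the children of "inverted" nodes.
def reorder(node):
    if isinstance(node, str):          # leaf: one English word
        return [node]
    label, left, right = node
    l, r = reorder(left), reorder(right)
    return l + r if label == "straight" else r + l

# English words in a Japanese-like (head-final) order: "john an apple ate".
tree = ("straight",
        "john",
        ("inverted", ("straight", "an", "apple"), "ate"))
print(" ".join(reorder(tree)))  # → john ate an apple
```

A single "inverted" label at the verb-phrase node moves the verb from clause-final (Japanese order) to its English position.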


ACM Transactions on Asian and Low-Resource Language Information Processing | 2016

Converting Continuous-Space Language Models into N -gram Language Models with Efficient Bilingual Pruning for Statistical Machine Translation

Rui Wang; Masao Utiyama; Isao Goto; Eiichiro Sumita; Hai Zhao; Bao-Liang Lu

The Language Model (LM) is an essential component of Statistical Machine Translation (SMT). In this article, we focus on developing efficient methods for LM construction. Our main contribution is that we propose a Natural N-grams based Converting (NNGC) method for transforming a Continuous-Space Language Model (CSLM) to a Back-off N-gram Language Model (BNLM). Furthermore, a Bilingual LM Pruning (BLMP) approach is developed for enhancing LMs in SMT decoding and speeding up CSLM converting. The proposed pruning and converting methods can convert a large LM efficiently by working jointly. That is, a LM can be effectively pruned before it is converted from CSLM without sacrificing performance, and further improved if an additional corpus contains out-of-domain information. For different SMT tasks, our experimental results indicate that the proposed NNGC and BLMP methods outperform the existing counterpart approaches significantly in BLEU and computational cost.
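The overall shape of converting with pruning can be sketched as follows. This is an illustration of the general idea, not the paper's NNGC or BLMP algorithms: an existing n-gram list is re-scored with a continuous model (a stand-in probability table here), and entries the model finds unlikely are pruned before emitting ARPA-style records.

```python
import math

def stand_in_cslm(context, word):
    # Stand-in for a continuous-space LM's conditional probability;
    # in practice this would be a neural model's softmax output.
    table = {("the", "cat"): 0.40, ("the", "dog"): 0.35, ("the", "quark"): 0.002}
    return table.get((context, word), 0.01)

def convert_and_prune(bigrams, threshold=0.005):
    # Re-score each listed n-gram with the continuous model and drop
    # low-probability entries so the converted LM stays compact.
    converted = {}
    for context, word in bigrams:
        p = stand_in_cslm(context, word)
        if p >= threshold:
            converted[(context, word)] = round(math.log10(p), 4)  # ARPA uses log10
    return converted

lm = convert_and_prune([("the", "cat"), ("the", "dog"), ("the", "quark")])
for (ctx, w), lp in sorted(lm.items()):
    print(f"{lp}\t{ctx} {w}")
```

Pruning before conversion matters in practice because querying a neural model once per surviving n-gram is the dominant cost; every pruned entry is a query saved.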


ACM Transactions on Asian Language Information Processing | 2014

Distortion Model Based on Word Sequence Labeling for Statistical Machine Translation

Isao Goto; Masao Utiyama; Eiichiro Sumita; Akihiro Tamura; Sadao Kurohashi

This article proposes a new distortion model for phrase-based statistical machine translation. In decoding, a distortion model estimates the source word position to be translated next (subsequent position; SP) given the last translated source word position (current position; CP). We propose a distortion model that can simultaneously consider the word at the CP, the word at an SP candidate, the context of the CP and an SP candidate, relative word order among the SP candidates, and the words between the CP and an SP candidate. These considered elements are called rich context. Our model considers rich context by discriminating label sequences that specify spans from the CP to each SP candidate. It enables our model to learn the effect of relative word order among SP candidates as well as to learn the effect of distances from the training data. In contrast to the learning strategy of existing methods, our learning strategy is that the model learns preference relations among SP candidates in each sentence of the training data. This learning strategy enables consideration of all of the rich context simultaneously. In our experiments, our model had higher BLEU and RIBES scores for Japanese-English, Chinese-English, and German-English translation than the lexical reordering models.
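The decoding-time question the model answers can be rendered as a toy scorer. The features and weights below are invented for illustration (the paper learns a far richer label-sequence model): given the current position CP, each candidate subsequent position SP is scored using the CP word, the SP word, the words in between, and the jump distance.

```python
def score_sp(words, cp, sp, weights):
    # Linear score over simple "rich context" features for one SP candidate.
    feats = [("cp_word", words[cp]), ("sp_word", words[sp]),
             ("distance", min(abs(sp - cp), 5))]
    feats += [("between", w) for w in words[min(cp, sp) + 1 : max(cp, sp)]]
    return sum(weights.get(f, 0.0) for f in feats)

words = ["kare", "wa", "ringo", "o", "tabeta"]  # "he TOP apple ACC ate"
weights = {("distance", 1): 0.2, ("distance", 2): 0.1,
           ("sp_word", "tabeta"): 0.5,           # prefer jumping to the verb
           ("between", "o"): 0.3}                # case marker between CP and SP
candidates = [2, 4]                              # untranslated positions
best = max(candidates, key=lambda sp: score_sp(words, 0, sp, weights))
print(best)  # → 4: the verb position is chosen next
```

Scoring candidates against each other within a sentence, rather than in isolation, is what lets such a model capture relative word order among the SP candidates.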


International Universal Communication Symposium | 2010

Head- and relation-driven tree-to-tree translation using phrases in a monolingual corpus

Isao Goto; Eiichiro Sumita

We propose an extension of context-based machine translation (CBMT) [1] to deal with distant language pairs such as Japanese and English, incorporating a syntactic transfer approach. Our method uses a tree structure where a node is a head and an edge is a dependency with a relation between heads. We retrieve partial trees from a monolingual corpus using a bilingual dictionary to generate candidate translation phrases, and build a tree by overlapping their heads. Word orders of a verb and its elements are decided based on a structural monolingual corpus in the target language. In our experiment with Japanese to English patent translation, human evaluation results showed that our method was better than phrase-based and hierarchical phrase-based statistical machine translation methods.


Meeting of the Association for Computational Linguistics | 2017

Detecting Untranslated Content for Neural Machine Translation

Isao Goto; Hideki Tanaka

Despite its promise, neural machine translation (NMT) has a serious problem in that source content may be mistakenly left untranslated. The ability to detect untranslated content is important for the practical use of NMT. We evaluate two types of probability with which to detect untranslated content: the cumulative attention (ATN) probability and back translation (BT) probability from the target sentence to the source sentence. Experiments on detecting untranslated content in Japanese-English patent translations show that ATN and BT are each more effective than random choice, BT is more effective than ATN, and the combination of the two provides further improvements. We also confirmed the effectiveness of using ATN and BT to rerank the n-best NMT outputs.
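The cumulative-attention (ATN) check can be sketched directly: sum the attention each source token receives across all target steps and flag tokens whose total stays low. The attention matrix and the 0.5 threshold below are invented for illustration.

```python
def untranslated_by_attention(attn, src_tokens, threshold=0.5):
    """attn[t][s] = attention weight on source token s at target step t."""
    # Cumulative attention per source token across the whole output.
    totals = [sum(step[s] for step in attn) for s in range(len(src_tokens))]
    return [tok for tok, total in zip(src_tokens, totals) if total < threshold]

attn = [[0.7, 0.2, 0.1],   # one row per generated target word
        [0.6, 0.3, 0.1],
        [0.1, 0.8, 0.1]]
src = ["特許", "翻訳", "装置"]
print(untranslated_by_attention(attn, src))  # → ['装置']
```

The third source token accumulates only 0.3 attention over the whole translation, so it is flagged as possibly untranslated; the BT probability in the paper provides an independent second signal.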


Meeting of the Association for Computational Linguistics | 2015

The "News Web Easy'' news service as a resource for teaching and learning Japanese: An assessment of the comprehension difficulty of Japanese sentence-end expressions

Hideki Tanaka; Tadashi Kumano; Isao Goto

Japan’s public broadcasting corporation, NHK, launched “News Web Easy” in April 2012. The web service provides users with five simplified news scripts (easy Japanese news) on a daily basis. Since its inception, the service has been favorably received both in Japan and overseas; users particularly appreciate its value as a resource for learning and teaching Japanese. In this paper, we discuss this service and its possible contribution to language education. We focus on the difficulty levels of sentence-end expressions compiled from the news, which create ambiguity and problems when rewriting news items. These expressions are analyzed and compared between regular news and News Web Easy, and their difficulty is assessed based on Japanese learners’ reading comprehension levels. Our results revealed that the current rewriting of sentence-end expressions in News Web Easy is appropriate. We further identified features of these expressions that contribute to comprehension difficulty.


Journal of Information Processing | 2012

An Empirical Comparison of Parsers in Constraining Reordering for E-J Patent Machine Translation

Isao Goto; Masao Utiyama; Takashi Onishi; Eiichiro Sumita

Machine translation of patent documents is very important from a practical point of view. One of the key technologies for improving machine translation quality is the utilization of syntax. It is difficult to select the appropriate parser for English to Japanese patent machine translation because the effects of each parser on patent translation are not clear. This paper provides an empirical comparative evaluation of several state-of-the-art parsers for English, focusing on the effects on patent machine translation from English to Japanese. We add syntax to a method that constrains the reordering of noun phrases for phrase-based statistical machine translation. There are two methods for obtaining the noun phrases from input sentences: 1) an input sentence is directly parsed by a parser and 2) noun phrases from an input sentence are determined by a method using the parsing results of the context document that contains the input sentence. We measured how much each parser contributed to improving the translation quality for each of the two methods and how much a combination of parsers contributed to improving the translation quality for the second method. We conducted experiments using the NTCIR-8 patent translation task dataset. Most of the parsers improved translation quality. Combinations of parsers using the method based on context documents achieved the best translation quality.


NTCIR | 2011

Overview of the Patent Machine Translation Task at the NTCIR-10 Workshop

Isao Goto; Bin Lu; Ka Po Chow; Eiichiro Sumita; Benjamin K. Tsou

Collaboration


Dive into Isao Goto's collaborations.

Top Co-Authors

Eiichiro Sumita (National Institute of Information and Communications Technology)

Masao Utiyama (National Institute of Information and Communications Technology)

Hideya Mino (National Institute of Information and Communications Technology)

Graham Neubig (Carnegie Mellon University)

Terumasa Ehara (Tokyo University of Science)

Benjamin K. Tsou (City University of Hong Kong)

Bin Lu (City University of Hong Kong)