Is this you? Create Your Porfile

Yusuke Oda

Nara Institute of Science and Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Yusuke Oda is active.

Explore More

Publication

Featured researches published by Yusuke Oda.

automated software engineering | 2015

Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T)

Yusuke Oda; Hiroyuki Fudaba; Graham Neubig; Hideaki Hata; Sakriani Sakti; Tomoki Toda; Satoshi Nakamura

Pseudo-code written in natural language can aid the comprehension of source code in unfamiliar programming languages. However, the great majority of source code has no corresponding pseudo-code, because pseudo-code is redundant and laborious to create. If pseudo-code could be generated automatically and instantly from given source code, we could allow for on-demand production of pseudo-code without human effort. In this paper, we propose a method to automatically generate pseudo-code from source code, specifically adopting the statistical machine translation (SMT) framework. SMT, which was originally designed to translate between two natural languages, allows us to automatically learn the relationship between source code/pseudo-code pairs, making it possible to create a pseudo-code generator with less human effort. In experiments, we generated English or Japanese pseudo-code from Python statements using SMT, and find that the generated pseudo-code is largely accurate, and aids code understanding.

meeting of the association for computational linguistics | 2014

Optimizing Segmentation Strategies for Simultaneous Speech Translation

Yusuke Oda; Graham Neubig; Sakriani Sakti; Tomoki Toda; Satoshi Nakamura

In this paper, we propose new algorithms for learning segmentation strategies for simultaneous speech translation. In contrast to previously proposed heuristic methods, our method finds a segmentation that directly maximizes the performance of the machine translation system. We describe two methods based on greedy search and dynamic programming that search for the optimal segmentation strategy. An experimental evaluation finds that our algorithm is able to segment the input two to three times more frequently than conventional methods in terms of number of words, while maintaining the same score of automatic evaluation. 1

north american chapter of the association for computational linguistics | 2015

Ckylark: A More Robust PCFG-LA Parser

Yusuke Oda; Graham Neubig; Sakriani Sakti; Tomoki Toda; Satoshi Nakamura

This paper describes Ckylark, a PCFG-LA style phrase structure parser that is more robust than other parsers in the genre. PCFG-LA parsers are known to achieve highly competitive performance, but sometimes the parsing process fails completely, and no parses can be generated. Ckylark introduces three new techniques that prevent possible causes for parsing failure: outputting intermediate results when coarse-to-fine analysis fails, smoothing lexicon probabilities, and scaling probabilities to avoid underflow. An experiment shows that this allows millions of sentences can be parsed without any failures, in contrast to other publicly available PCFG-LA parsers. Ckylark is implemented in C++, and is available opensource under the LGPL license.1

international joint conference on natural language processing | 2015

Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents

Yusuke Oda; Graham Neubig; Sakriani Sakti; Tomoki Toda; Satoshi Nakamura

Simultaneous translation is a method to reduce the latency of communication through machine translation (MT) by dividing the input into short segments before performing translation. However, short segments pose problems for syntaxbased translation methods, as it is difficult to generate accurate parse trees for sub-sentential segments. In this paper, we perform the first experiments applying syntax-based SMT to simultaneous translation, and propose two methods to prevent degradations in accuracy: a method to predict unseen syntactic constituents that help generate complete parse trees, and a method that waits for more input when the current utterance is not enough to generate a fluent translation. Experiments on English-Japanese translation show that the proposed methods allow for improvements in accuracy, particularly with regards to word order of the target sentences.

automated software engineering | 2015

Pseudogen: A Tool to Automatically Generate Pseudo-Code from Source Code

Hiroyuki Fudaba; Yusuke Oda; Koichi Akabe; Graham Neubig; Hideaki Hata; Sakriani Sakti; Tomoki Toda; Satoshi Nakamura

Understanding the behavior of source code written in an unfamiliar programming language is difficult. One way to aid understanding of difficult code is to add corresponding pseudo-code, which describes in detail the workings of the code in a natural language such as English. In spite of its usefulness, most source code does not have corresponding pseudo-code because it is tedious to create. This paper demonstrates a tool Pseudogen that makes it possible to automatically generate pseudo-code from source code using statistical machine translation (SMT). Pseudogen currently supports generation of English or Japanese pseudo-code from Python source code, and the SMT framework makes it easy for users to create new generators for their preferred source code/pseudo-code pairs.

meeting of the association for computational linguistics | 2017

An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation

Makoto Morishita; Yusuke Oda; Graham Neubig; Koichiro Yoshino; Katsuhito Sudoh; Satoshi Nakamura

Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essential step in NMT training, widely used NMT toolkits implement disparate strategies for doing so, which have not been empirically validated or compared. This work investigates mini-batch creation strategies with experiments over two different datasets. Our results suggest that the choice of a mini-batch creation strategy has a large effect on NMT training and some length-based sorting strategies do not always work well compared with simple shuffling.

meeting of the association for computational linguistics | 2017

Neural Machine Translation via Binary Code Prediction

Yusuke Oda; Philip Arthur; Graham Neubig; Koichiro Yoshino; Satoshi Nakamura

In this paper, we propose a new method for calculating the output layer in neural machine translation systems. The method is based on predicting a binary code for each word and can reduce computation time/memory requirements of the output layer to be logarithmic in vocabulary size in the best case. In addition, we also introduce two advanced approaches to improve the robustness of the proposed model: using error-correcting codes and combining softmax and binary codes. Experiments on two English-Japanese bidirectional translation tasks show proposed models achieve BLEU scores that approach the softmax, while reducing memory usage to the order of less than 1/10 and improving decoding speed on CPUs by x5 to x10.

arXiv: Machine Learning | 2017

DyNet: The Dynamic Neural Network Toolkit.

Graham Neubig; Chris Dyer; Yoav Goldberg; Austin Matthews; Waleed Ammar; Antonios Anastasopoulos; Miguel Ballesteros; David Chiang; Daniel Clothiaux; Trevor Cohn; Kevin Duh; Manaal Faruqui; Cynthia Gan; Dan Garrette; Yangfeng Ji; Lingpeng Kong; Adhiguna Kuncoro; Gaurav Kumar; Chaitanya Malaviya; Paul Michel; Yusuke Oda; Matthew Richardson; Naomi Saphra; Swabha Swayamdipta; Pengcheng Yin

international conference on computational linguistics | 2014