Takehiko Maruyama | Researchain

Publication

Featured research published by Takehiko Maruyama.

Language Resources and Evaluation | 2014

Balanced corpus of contemporary written Japanese

Kikuo Maekawa; Makoto Yamazaki; Toshinobu Ogiso; Takehiko Maruyama; Hideki Ogura; Wakako Kashino; Hanae Koiso; Masaya Yamaguchi; Makiro Tanaka; Yasuharu Den

Abstract: The Balanced Corpus of Contemporary Written Japanese (BCCWJ) is Japan's first balanced corpus of 100 million words. It consists of three subcorpora (a publication subcorpus, a library subcorpus, and a special-purpose subcorpus) and covers a wide range of text registers, including general books, magazines, newspapers, governmental white papers, best-selling books, an Internet bulletin board, a blog, school textbooks, minutes of the National Diet, publicity newsletters of local governments, laws, and poetry. Random sampling is used wherever possible to maximize the representativeness of the corpus. The corpus is annotated with dual POS analysis, document structure, and bibliographical information. The BCCWJ is currently accessible in three ways, including Chunagon, a web-based interface to the dual POS analysis data. Finally, the results of a pilot evaluation of the corpus's textual diversity are reported. The analyses cover POS distribution, word-class distribution, entropy of orthography, sentence length, and variation of the adjective predicate. High textual diversity is observed in all of these analyses.

Meeting of the Association for Computational Linguistics | 2006

Dependency Parsing of Japanese Spoken Monologue Based on Clause Boundaries

Tomohiro Ohno; Shigeki Matsubara; Hideki Kashioka; Takehiko Maruyama; Yasuyoshi Inagaki

Applications of spoken monologue processing, such as simultaneous machine interpretation and real-time caption generation, require incremental parsing. This paper proposes a technique for incremental dependency parsing of Japanese spoken monologue on a clause-by-clause basis. The technique identifies clauses through clause boundary analysis, analyzes their internal dependency structures, and determines their dependency relations with other clauses, all while the monologue speech is still being input. Because the dependency relations are generated before the entire monologue has been input, the technique can be used for parsing in simultaneous Japanese speech understanding. An experiment on Japanese monologues showed that the technique performs comparably to conventional dependency parsing of monologue sentences.

Language Resources and Evaluation | 2007

Dependency parsing of Japanese monologue using clause boundaries

Tomohiro Ohno; Shigeki Matsubara; Hideki Kashioka; Takehiko Maruyama; Hideki Tanaka; Yasuyoshi Inagaki

Spoken monologues feature longer sentences and greater structural complexity than spoken dialogues. To achieve high parsing performance on spoken monologues, simplifying the structure by dividing each sentence into suitable language units could prove effective. This paper proposes a method for dependency parsing of Japanese spoken monologues based on sentence segmentation. In this method, dependency parsing is executed in two stages: at the clause level and at the sentence level. First, dependencies within each clause are identified by dividing a sentence into clauses and running stochastic dependency parsing on each clause. Next, dependencies across clause boundaries are identified stochastically, completing the dependency structure of the entire sentence. An experiment on a spoken monologue corpus shows that this method is effective for efficient dependency parsing of Japanese monologue sentences.

Journal of Psycholinguistic Research | 2017

Self Addressed Questions and Filled Pauses: A Cross-linguistic Investigation

Ye Tian; Takehiko Maruyama; Jonathan Ginzburg

There is an ongoing debate over whether disfluency phenomena (such as filled pauses) are produced communicatively. Clark and Fox Tree (Cognition 84(1):73–111, 2002) propose that filled pauses are words and that different forms signal different lengths of delay. This paper evaluates this Filler-As-Words hypothesis by analyzing the distribution of self-addressed questions (SAQs, such as "what's the word") in relation to filled pauses. We found that SAQs address different problems in different languages: most frequently memory retrieval in English and Chinese, and appropriateness in Japanese. With respect to filled pauses, British English, but not American English, uses "um" to signal a more severe problem than "uh". Chinese uses different filled pauses to signal the syntactic category of the problem constituent. Japanese uses different filled pauses to signal levels of interaction with the interlocutor. Overall, our data support the Filler-As-Words hypothesis that filled pauses are used communicatively. However, the dimensions of their meaning vary across languages and dialects.

Language Resources and Evaluation | 2010