Terumasa Ehara | Researchain

Publication

Featured research published by Terumasa Ehara.

Information Processing and Management | 1999

An efficient document clustering algorithm and its application to a document browser

Hideki Tanaka; Tadashi Kumano; Noriyoshi Uratani; Terumasa Ehara

We present an efficient document clustering algorithm that uses a term frequency vector for each document instead of a huge proximity matrix. The algorithm has the following features: (1) it requires a relatively small amount of memory and runs fast, (2) it produces a hierarchy in the form of a document classification tree, and (3) the hierarchy it produces explicitly reveals the structure of the collection. We confirm these features, and thus show the algorithm's feasibility, through clustering experiments on two collections of Japanese documents containing 83,099 and 14,701 documents. We also introduce an application of this algorithm to a document browser. This browser is used in our Japanese-to-English translation aid system. The browsing module of the system consists of a huge database of Japanese news articles and their English translations. The Japanese article collection is clustered into a hierarchy by our method. Since each node in the hierarchy corresponds to a topic in the collection, we can use the hierarchy to access articles directly by topic. A user can learn general translation knowledge for each topic by browsing the Japanese articles and their English translations. We also discuss techniques for presenting a large tree-structured hierarchy on a computer screen.
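As a rough illustration of clustering directly on term frequency vectors rather than on a pairwise proximity matrix, the following minimal sketch builds a small classification tree by recursive bisection. It is not the algorithm from the paper, whose details the abstract does not give. The whitespace tokenizer, the cosine similarity measure, the seeding rule, and all function names (`tf_vector`, `bisect`, `build_tree`) are assumptions made for this example.

```python
from collections import Counter
import math


def tf_vector(text):
    """Term-frequency vector as a Counter (whitespace tokens; an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summed(vectors):
    """Sum of vectors; cosine ignores scale, so this acts as a centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def bisect(doc_ids, vectors, rounds=5):
    """Split doc_ids in two with a tiny 2-means; no N x N matrix is built."""
    first = doc_ids[0]
    # Hypothetical seeding rule: the document least similar to the first one.
    far = min(doc_ids[1:], key=lambda d: cosine(vectors[first], vectors[d]))
    seeds = [vectors[first], vectors[far]]
    left, right = [], []
    for _ in range(rounds):
        left, right = [], []
        for d in doc_ids:
            closer_to_first = cosine(vectors[d], seeds[0]) >= cosine(vectors[d], seeds[1])
            (left if closer_to_first else right).append(d)
        if not left or not right:
            break
        seeds = [summed(vectors[d] for d in left), summed(vectors[d] for d in right)]
    return left, right


def build_tree(doc_ids, vectors, min_size=2):
    """Recursively bisect until a node holds at most min_size documents."""
    if len(doc_ids) <= max(min_size, 1):
        return {"docs": doc_ids}
    left, right = bisect(doc_ids, vectors)
    if not left or not right:
        return {"docs": doc_ids}
    return {
        "docs": doc_ids,
        "children": [build_tree(left, vectors, min_size),
                     build_tree(right, vectors, min_size)],
    }


# Toy documents, invented for this example.
texts = [
    "stock market shares rise",
    "market shares fall stock",
    "soccer match goal win",
    "goal soccer team match",
]
vectors = {i: tf_vector(t) for i, t in enumerate(texts)}
tree = build_tree(list(vectors), vectors)
```

Each node of `tree` lists the documents it covers, so an inner node plays the role of a topic and its children the subtopics. Memory use grows with the number of distinct terms per document rather than with the square of the collection size.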

International Conference on Computational Linguistics | 2004

Back transliteration from Japanese to English using target English context

Isao Goto; Naoto Kato; Terumasa Ehara; Hideki Tanaka

This paper proposes a method of automatic back transliteration of proper nouns, in which a transliterated Japanese word is restored to the original English word. The English words are created from a sequence of letters; thus our method can create new English words that are not registered in dictionaries or English word lists. When a katakana character is converted into English letters, there are many candidate alphabetic characters. To ensure adequate conversion, the proposed method uses the target English context to calculate the probability that an English character or string corresponds to a Japanese katakana character or string. We confirmed the effectiveness of using the target English context in a personal-name back-transliteration experiment.

Systems and Computers in Japan | 2003

Automatic closed-caption production system on TV programs for hearing-impaired people

Takao Monma; Eiji Sawamura; Takahiro Fukushima; Ichiro Maruyama; Terumasa Ehara; Katsuhiko Shirai

Increasing the number of closed-captioned television programs is a social responsibility in the sense of providing information. For the manual process of creating closed-captioned television programs, there is considerable hope that the time involved can be reduced and the burden on workers eased. The system the authors report on automates three processes in the creation of closed-captioned television programs: summarization, synchronization, and closed-caption screen creation. From an electronic manuscript, it yields closed-caption data applicable to current closed-captioned broadcasts. The authors created closed captions for 12 types of news programs and one documentary program, confirming that the process of creating a closed-captioned television program could be completed in three to six times the program length, excluding creation of the electronic manuscript and testing/editing. The authors demonstrate the validity of their system in that the time needed to create closed captions with it was about 70% of the time needed to create them by hand, excluding testing and editing.

Systems and Computers in Japan | 2002

A translation aid system by retrieving bilingual news database

Tadashi Kumano; Isao Goto; Hideki Tanaka; Noriyoshi Uratani; Terumasa Ehara

Machine translation technology is currently incapable of producing translations of the high quality required for purposes such as broadcast news. Such translations still require skilled human translators. We have developed a translation aid system to support translators in such tasks. The system retrieves news articles by answering user queries, and shows the entire article together with the corresponding translated article. The system does not require manual alignment of each sentence with its translation when storing articles in a database. Thus, it is capable of handling flexible translations. Moreover, the system helps users learn not just the translations for queried expressions, but also the facts described in the articles, which can aid in producing good translations. The results of a user inquiry demonstrated the validity of the system.

Journal of Natural Language Processing | 1999

Partitioning long sentences for text summarization

Takahiro Fukushima; Terumasa Ehara; Katsuhiko Shirai

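Compared with newspaper articles, TV news scripts contain fewer sentences per article and more characters per sentence. As a result, when important-sentence extraction is used for automatic summarization, selection is made sentence by sentence and a large amount of information is lost. In this paper, we define conditions under which long sentences in an article can be partitioned and, when the conditions are met, split them into shorter sentences (short-sentence partitioning). We then examine how much this processing affects basic techniques of automatic summarization. Short-sentence partitioning basically targets continuative phrases of verbs, adjectival verbs, and predicate nouns. To assess the effect of partitioning on automatic summarization, we used two evaluation measures: ranking of sentences by importance, and character-count compression (deletion of unnecessary parts). The sentence-ranking evaluation used texts whose sentences had been ranked by importance both manually and by the system. We examined what ranks the resulting sentences received when a sentence judged important by humans was partitioned. For important sentences that were partitioned, cases in which the ranks after partitioning differed by "3" or more outnumbered cases with no gap in rank, that is, a rank difference of "1", which shows the effect of short-sentence partitioning. Next, taking all sentences in an article rather than only the important ones, we compared the manual and system rankings before and after partitioning using Spearman's rank correlation coefficient. Partitioning increased Spearman's coefficient by 0.0318 to 0.065, showing that the human and system rankings became closer. Finally, in the character-count compression evaluation, we examined how partitioning changed the compression process, which identifies unnecessary parts and deletes or paraphrases character strings. Partitioning increased the number of characters deleted, and the compression ratio (the character count after compression divided by the character count of the original article) decreased by 2.71% to 2.78%, showing that short-sentence partitioning has a good effect on character-count compression.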
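The two measures named in this abstract, Spearman's rank correlation coefficient and the compression ratio, can be sketched as below. This is a minimal illustration that uses the textbook formula for rankings without ties. The function names and the sample rankings are hypothetical and are not data from the paper.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings of the same items.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the per-item
    rank difference. With tied ranks this shortcut formula is not exact.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have the same length, at least 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def compression_ratio(original_chars, compressed_chars):
    """Character count after compression divided by the original count."""
    if original_chars <= 0:
        raise ValueError("original_chars must be positive")
    return compressed_chars / original_chars


# Hypothetical importance rankings of five sentences (1 = most important).
human_ranks = [1, 2, 3, 4, 5]
system_ranks = [2, 1, 3, 4, 5]
rho = spearman_rho(human_ranks, system_ranks)

# Hypothetical article: 200 characters before compression, 150 after.
ratio = compression_ratio(200, 150)
```

A coefficient closer to 1 means the system ranking agrees more closely with the human ranking, and a lower compression ratio means more characters were removed.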

NLPRS | 2001