Yoshihiko Nitta | Researchain

Publication

Featured research published by Yoshihiko Nitta.

international conference on computational linguistics | 1994

Co-occurrence vectors from corpora vs. distance vectors from dictionaries

Yoshiki Niwa; Yoshihiko Nitta

A comparison was made of vectors derived by using ordinary co-occurrence statistics from large text corpora and of vectors derived by measuring the interword distances in dictionary definitions. The precision of word sense disambiguation by using co-occurrence vectors from the 1987 Wall Street Journal (20M total words) was higher than that by using distance vectors from the Collins English Dictionary (60K head words + 1.6M definition words). However, other experimental results suggest that distance vectors contain some different semantic information from co-occurrence vectors.
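As a rough illustration of what an ordinary co-occurrence vector looks like, the sketch below counts the words that appear within a fixed window of each word and compares two such vectors by cosine similarity. The window size, the toy sentence, and the function names `cooccurrence_vectors` and `cosine` are assumptions made for this example; the abstract does not specify the paper's exact construction.

```python
import math
from collections import Counter


def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = vectors.setdefault(word, Counter())
        for j in range(lo, hi):
            if j != i:
                context[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical); a real experiment would use millions of words.
tokens = "the bank raised interest rates while the river bank flooded".split()
vectors = cooccurrence_vectors(tokens, window=2)
similarity = cosine(vectors["interest"], vectors["rates"])
```

With a corpus of realistic size, words used in similar contexts tend to end up with similar vectors, which is the property a disambiguation experiment can exploit.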

international conference on computational linguistics | 1994

A grammatico-statistical approach to discourse partitioning

Tadashi Nomoto; Yoshihiko Nitta

The paper presents a new approach to text segmentation, which concerns dividing a text into coherent discourse units. The approach builds on the theory of discourse segment (Nomoto and Nitta, 1993), incorporating ideas from the research on information retrieval (Salton, 1988). A discourse segment has to do with a structure of Japanese discourse; it could be thought of as a linguistic unit demarcated by wa, a Japanese topic particle, which may extend over several sentences. The segmentation works with discourse segments and makes use of a coherence measure based on tf-idf, a standard information retrieval measurement (Salton, 1988; Hearst, 1993). Experiments have been done with a Japanese newspaper corpus. It has been found that the present approach is quite successful in recovering articles from the unstructured corpus.
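One plausible reading of a tf-idf coherence measure can be sketched as follows: each discourse segment becomes a tf-idf weighted term vector, and the cosine similarity between neighbouring segments is taken as their coherence, with a low score hinting at a boundary. This is an assumption for illustration only; the abstract does not give the paper's actual formula, and the pre-tokenized English segments and the names `tfidf_vectors` and `adjacent_coherence` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(segments):
    """Turn each segment (a list of terms) into a dict of tf-idf weights."""
    n = len(segments)
    doc_freq = Counter()
    for seg in segments:
        doc_freq.update(set(seg))
    vectors = []
    for seg in segments:
        term_freq = Counter(seg)
        vectors.append(
            {t: c * math.log(n / doc_freq[t]) for t, c in term_freq.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors (0.0 if either has zero norm)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacent_coherence(segments):
    """Coherence score between each pair of neighbouring segments."""
    vecs = tfidf_vectors(segments)
    return [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]


# Hypothetical, already-tokenized segments: two on an election, two on a storm.
segments = [
    ["election", "vote", "party"],
    ["party", "vote", "candidate"],
    ["typhoon", "rain", "coast"],
    ["rain", "coast", "flood"],
]
scores = adjacent_coherence(segments)
# The gap with the lowest coherence is the most likely article boundary.
boundary = min(range(len(scores)), key=scores.__getitem__)
```

In this toy input the lowest score falls between the second and third segments, where the topic changes.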

international conference on computational linguistics | 1982

A heuristic approach to English-into-Japanese machine translation

Yoshihiko Nitta; Atsushi Okajima; Fumiyuki Yamano; Koichiro Ishihara

Practical machine translation must be considered from a heuristic point of view rather than from a purely rigid analytical linguistic method. An English-into-Japanese translation system named ATHENE based on a Heuristic Parsing Model (HPM) has been developed. The experiment shows some advantageous points such as simplification of the transforming and generating phases, semilocalization of multiple meaning resolution, and extendability for future grammatical refinement. The HPM-based parsing process, parsed tree, grammatical data representation, and translation results are also described.

international conference on computational linguistics | 1996

Analysis of Japanese compound nouns by direct text scanning

Toru Hisamitsu; Yoshihiko Nitta

This paper aims to analyze word dependency structure in compound nouns appearing in Japanese newspaper articles. The analysis is a difficult problem because such compound nouns can be quite long, have no word boundaries between contained nouns, and often contain unregistered words such as abbreviations. The non-segmentation property and unregistered words cause initial segmentation errors which result in erroneous analysis. This paper presents a corpus-based approach which scans a corpus with a set of pattern matchers and gathers cooccurrence examples to analyze compound nouns. It employs boot-strapping search to cope with unregistered words: if an unregistered word is found in the process of searching the examples, it is recorded and invokes additional searches to gather the examples containing it. This makes it possible to correct initial oversegmentation errors, and leads to higher accuracy. The accuracy of the method is evaluated using the compound nouns of length 5, 6, 7, and 8. A baseline is also introduced and compared.

meeting of the association for computational linguistics | 1984

A PROPER TREATMENT OF SYNTAX AND SEMANTICS IN MACHINE TRANSLATION

Yoshihiko Nitta; Atsushi Okajima; Hiroyuki Kaji; Youichi Hidano; Koichiro Ishihara

A proper treatment of syntax and semantics in machine translation is introduced and discussed from the empirical viewpoint. For English-Japanese machine translation, the syntax directed approach is effective where the Heuristic Parsing Model (HPM) and the Syntactic Role System play important roles. For Japanese-English translation, the semantics directed approach is powerful where the Conceptual Dependency Diagram (CDD) and the Augmented Case Marker System (which is a kind of Semantic Role System) play essential roles. Some examples of the difference between Japanese sentence structure and English sentence structure, which is vital to machine translation, are also discussed together with various interesting ambiguities.

international conference on computational linguistics | 1994

An efficient treatment of Japanese verb inflection for morphological analysis

Toru Hisamitsu; Yoshihiko Nitta

Because of its simple appearance, Japanese verb inflection has never been treated seriously. In this paper we reconsider traditional lexical treatments of Japanese verb inflection, and propose a new treatment of verb inflection which uses newly devised segmenting units. We show that our proposed treatment minimizes the number of lexical entries and avoids useless segmentation. It requires 20 to 40% less chart parsing computation and it is also suitable for error correction in optical character readers.

Future Generation Computer Systems | 1986

Problems of machine translation system - effect of cultural differences on sentence structure

Yoshihiko Nitta

The potential and the limitation of current machine translation are discussed by comparing the output of human translation and that of virtual machine translation. Here, “virtual machine translation” means a kind of syntax-oriented literal translation which may be regarded as an idealized competence of today's practical machine translation. The above comparison shows that the main reason for the limitation or the incompleteness of current practical machine translation systems is the insufficient ability to treat “structural idiosyncrasies” of sentences. Also, some translation examples tell us that, without “understanding” the total meaning of the source sentence, it is quite difficult to manipulate the idiosyncrasies in sentence structure. Idiosyncratic gaps between source and target sentence structure usually originate in cultural differences, so that the computational treatment of these gaps is a very difficult problem. But the translation examples also give us some encouraging evidence that the principal technologies of today's not-yet-completed machine translation have sufficient potential for producing barely acceptable translation. The current practical efforts to treat such structural idiosyncrasies are also mentioned together with some long-range, basic-research type of approaches.

international conference on document analysis and recognition | 1995

Optimal techniques in OCR error correction for Japanese texts

Toru Hisamitsu; Katsumi Marukawa; Yoshihiro Shima; Hiromichi Fujisawa; Yoshihiko Nitta

This paper investigates three fundamental techniques in OCR error correction for Japanese texts using morphological analysis: (1) an optimal method for candidate word extraction from a candidate character lattice, (2) optimal word entries for Japanese verb inflection analysis, and (3) a new method of word matching cost calculation which is more suitable for use with linguistic criteria. Comparative evaluation shows that the combination of these techniques requires 84% less computation, captures 2.6% more candidate words, reduces the chart parsing computation by 20%, and attains a 25% higher error correction rate than a commonly used method.

conference of the european chapter of the association for computational linguistics | 1993

Resolving zero anaphora in Japanese

Tadashi Nomoto; Yoshihiko Nitta

The paper presents a computational theory for resolving Japanese zero anaphora, based on the notion of discourse segment. We see that the discourse segment reduces the domain of antecedents for zero anaphora and thus leads to their efficient resolution. Also we make crucial use of functional notions such as empathy hierarchy and minimal semantics thesis to resolve reference for zero anaphora [Kuno, 1987]. Our approach differs from the Centering analysis [Walker et al., 1990] in that the resolution works by matching one empathy hierarchy against another, which makes it possible to deal with discourses with no explicit topic and those with cataphora [Halliday and Hassan, 1990]. The theory is formalized through the definite clause grammar (DCG) formalism [Pereira and Warren, 1980], [Gazdar and Mellish, 1989; Longacre, 1979]. Finally, we show that graphology i.e., quotation mark, spacing, has an important effect on the interpretation of zero anaphora in Japanese discourse.

Systems and Computers in Japan | 1995

A generalized algorithm for Japanese morphological analysis and a comparative evaluation of some heuristics

Toru Hisamitsu; Yoshihiko Nitta

In ordinary written Japanese, words are not separated by spaces. Therefore morphological analysis involves segmenting and tagging sentences. Since each sentence has a huge number of possible tagged segmentations, various criteria have been proposed for making plausible decisions. However, there are still no unified frameworks that incorporate various heuristics, and there has been no comparative evaluation of commonly used heuristics. This paper presents a clear framework to describe various heuristics, and an N-best algorithm for extracting optimal solutions. The time complexity of this algorithm is $O(nN \log_2(1 + N))$, where n is the sentence length. The advantage of the N-best algorithm over the standard beam search algorithm is also discussed. This paper also presents a comparative evaluation of three major heuristics, and proposes a precise and portable rule-based heuristic. Estimation was done using the aforementioned algorithm and six criteria. The newly proposed heuristic is based upon the Extended Least Bunsetsu (Phrase) Number method