Toru Sumitomo | Researchain

Publication

Featured research published by Toru Sumitomo.

Information Processing and Management | 2002

Extraction of field-coherent passages

Samuel Sangkon Lee; Masami Shishibori; Toru Sumitomo; Jun-ichi Aoe

It is important to identify text that is substantially independent of adjacent material. This paper presents a technique for dividing text into field-coherent passages. The method presented is based upon extracting field-associated words or phrases from the text by determining how topics grow, shrink and shift from sentence to sentence. We propose measures of topic continuity and transition and suggest how those may be used to find the passage boundaries. After collecting 12,500 documents, we obtained an average precision of 88% and recall of 78% in a training document set.

Information Processing and Management | 2007

A compact static double-array keeping character codes

Susumu Yata; Masaki Oono; Kazuhiro Morita; Masao Fuketa; Toru Sumitomo; Jun-ichi Aoe

A trie represented by a double-array supports fast key search in a small amount of space. However, the double-array uses extra space so that it can be updated dynamically. This paper presents a compact structure for a static double-array. The new structure keeps character codes instead of indices in order to compress the elements of the double-array. In addition, the new structure unifies common suffixes and consists of fewer elements than the old structure. Experimental results for English keys show that the new structure reduces the space usage of the double-array by up to 40%.
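
For readers unfamiliar with the structure, the following is a minimal sketch of the classic double-array idea that this abstract builds on: a transition from state `s` on character code `c` goes to slot `t = BASE[s] + c` and is valid only if `CHECK[t] == s`. It is not the compact structure proposed in the paper, which stores character codes in place of indices. The names `build_double_array`, `contains`, `END`, `FREE` and `ROOT` are assumptions made for this illustration, and the slot search is deliberately naive.

```python
END = 0      # hypothetical terminator code appended to every key
FREE = -1    # CHECK value of an unused slot
ROOT = -2    # CHECK value reserved for the root slot


def build_double_array(keys):
    """Build BASE/CHECK arrays for a static trie over `keys` (unoptimized)."""
    # Build a plain nested-dict trie over character codes first.
    trie = {}
    for key in keys:
        node = trie
        for code in [ord(ch) for ch in key] + [END]:
            node = node.setdefault(code, {})

    base, check = [0], [ROOT]
    pending = [(0, trie)]          # (state index, children of that state)
    while pending:
        state, children = pending.pop()
        if not children:
            continue               # leaf: no outgoing arcs, BASE stays 0
        codes = sorted(children)
        b = 1
        while True:
            # Grow the arrays so that slot b + max(code) exists.
            while len(base) <= b + codes[-1]:
                base.append(0)
                check.append(FREE)
            if all(check[b + c] == FREE for c in codes):
                break              # every child slot is free for this base
            b += 1
        base[state] = b
        for c in codes:
            check[b + c] = state   # claim the slot for this parent
            pending.append((b + c, children[c]))
    return base, check


def contains(base, check, key):
    """Return True if `key` was one of the keys used to build the arrays."""
    state = 0
    for code in [ord(ch) for ch in key] + [END]:
        t = base[state] + code
        if t >= len(check) or check[t] != state:
            return False
        state = t
    return True


base, check = build_double_array(["trie", "tree", "try", "array"])
print(contains(base, check, "tree"), contains(base, check, "tri"))
```

Each lookup step costs one addition and one comparison, which is where the speed of the double-array comes from. The unused slots left between claimed ones are the space overhead that compression schemes such as the one in this paper target.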

International Journal of Computer Mathematics | 2005

A sentence classification technique using intention association expressions

Yuki Kadoya; Kazuhiro Morita; Masao Fuketa; Masaki Oono; El-Sayed Atlam; Toru Sumitomo; Jun-ichi Aoe

Although there are many text classification techniques depending on the vector space, it is difficult to detect the meaning related to the user’s intention (complaint, encouragement, request, invitation, etc.). The approach discussed in this paper is very useful for understanding focus points in conversation. We present a technique for determining the speaker’s intention for sentences in conversation. Intention association expressions are introduced, and formal descriptions with weights are defined using these expressions to construct an intention classification. A deterministic multi-attribute pattern-matching algorithm is used to determine the intention class efficiently. In simulation results for 681 email messages of 5859 sentences, the multi-attribute pattern-matching algorithm is about 44.5 times faster than the Aho and Corasick method. The precision and recall of intention classification of sentences are 91% and 95%, respectively. The precision and recall of extraction of unnecessary sentences are 98% and 96%, respectively. The precision and recall of the classification of each email are 88% and 89%, respectively.

Systems, Man and Cybernetics | 1998

A fast and compact data structure of storing multi-attribute relations among words

Kazuhiro Morita; Toru Sumitomo; Masao Fuketa; Jun-ichi Aoe

Word relations are primitive knowledge and are very useful for natural language processing systems. In traditional systems each knowledge dictionary is constructed separately, but recent natural language applications have become more complex by integrating these multi-attribute relationships. This paper presents an efficient data structure by introducing a trie that can define the linkage among leaves. The linkage enables us to share the basic words required for multi-attribute relations. Theoretical observations show that the worst-case time complexity of retrieving multi-attribute relations is constant. Simulation results show that the presented structure is about 1/3 smaller than those of competing methods.

Systems, Man and Cybernetics | 1998

A method for understanding time expressions

Shoji Mizobuchi; Toru Sumitomo; Masao Fuketa; Jun-ichi Aoe

This paper proposes a method for understanding time expressions. Thirty-one concepts are defined for the components that make up time expressions. We provide formal representations for these concepts and propose an algorithm to analyse the meaning of time expressions. The representation includes both conceptual and quantitative aspects. Experiments on about 2,000 time expressions extracted from actual documents show that the proposed method yields the correct interpretation in about 93% of the cases.

International Journal of Computer Mathematics | 2004

A new compression method of double array for compact dictionaries

Masao Fuketa; Kazuhiro Morita; Toru Sumitomo; Shinkaku Kashiji; El-Sayed Atlam; Jun-ichi Aoe

Speed and storage capacity are very important issues in information retrieval systems. In natural language analysis, the double array is a well-known data structure for implementing the trie, which is widely used to retrieve strings in a dictionary. Moreover, the double array combines the fast access of a matrix table with the compactness of a list form. To obtain a very compact structure for information retrieval, this paper presents a compression method that divides the constructed trie into several pieces (pages). This compression method reduces the number of bits representing entries of the double array. However, the resulting trie must follow state connections between pages, which slows retrieval. To solve this problem, this paper proposes a new trie construction method that compresses the trie and minimizes the number of state connections. Experimental results for a large set of keys show that the storage capacity has been reduced to 50%. Moreover, the new approach has the same retrieval speed as the old one.

Information Sciences | 2004

A compression algorithm using integrated record information for translation dictionaries

Yuki Kadoya; Masao Fuketa; El-Sayed Atlam; Kazuhiro Morita; Toru Sumitomo; Jun-ichi Aoe

A trie structure is a well-known method for retrieving natural language (NL) dictionaries for morphological analysis, machine translation and so on. With the development of a variety of NL processing systems, several types of dictionaries stored on a computer's hard disk share a lot of common information. This paper presents a method of merging individual dictionaries into a generalized dictionary. It reduces the total dictionary size and lets other applications use the individual dictionaries. For key retrieval in the merged dictionary, there are many long strings, such as compound words and idioms, which take much space for a huge set of keys when stored in the trie. A fast trie structure called the double-array is therefore introduced, and a compression scheme is proposed that replaces long strings with the corresponding leaf node numbers of the trie. Although the size of the presented records grows, their total number decreases sharply because common information is merged. Experimental results for nine dictionaries show that the new method is more efficient than previous ones.

Information Sciences | 2000

An efficient representation for implementing finite state machines based on the double-array

Shoji Mizobuchi; Toru Sumitomo; Masao Fuketa; Jun-ichi Aoe

This paper describes an extension of the double-array structure that allows it to be applied to general finite state machines. The double-array is an efficient data structure which combines time efficiency and space efficiency. However, its application has been limited to areas where a directed tree with labeled edges is used as the data structure. Here, the double-array structure is extended so that it can represent a graph structure and, furthermore, be operated on dynamically. The presented method has been evaluated by theoretical observations, and its space efficiency is verified experimentally.

International Conference on Computing & Informatics | 2006

Double-array compression by pruning twin leaves and unifying common suffixes

Susumu Yata; Masaki Oono; Kazuhiro Morita; Toru Sumitomo; Jun-ichi Aoe

A minimal prefix (MP) trie is a tree structure for key retrieval, and a double-array is an efficient data structure for the MP trie. This paper presents two compression methods for the double-array. One method removes leaf nodes following two-way arcs (named twin leaves) from the MP trie. The other method unifies common suffixes. Experimental results show that space usage of the double-array is reduced to about 60% by the two methods.
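
As a rough illustration of the second method only, the sketch below splits each key into the shortest prefix that distinguishes it (the part an MP trie keeps as nodes) and the remaining tail, then stores identical tails once. This is a simplification assumed for the example: the paper's actual unification procedure and its twin-leaf pruning are not reproduced here, and the names `minimal_prefix_split`, `unify_tails` and `TERMINATOR` are hypothetical.

```python
TERMINATOR = "$"  # assumed end marker that does not occur inside any key


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def minimal_prefix_split(keys):
    """Map each key to (distinguishing prefix, tail), terminator included."""
    terminated = sorted({k + TERMINATOR for k in keys})
    splits = {}
    for i, key in enumerate(terminated):
        # In sorted order, the longest shared prefix is with a neighbour.
        shared = 0
        if i > 0:
            shared = max(shared, common_prefix_len(key, terminated[i - 1]))
        if i + 1 < len(terminated):
            shared = max(shared, common_prefix_len(key, terminated[i + 1]))
        cut = min(shared + 1, len(key))  # one arc past the shared path
        splits[key[:-1]] = (key[:cut], key[cut:])
    return splits


def unify_tails(splits):
    """Store each distinct tail once and refer to it by index."""
    index_of, store, refs = {}, [], {}
    for key, (prefix, tail) in splits.items():
        if tail not in index_of:
            index_of[tail] = len(store)
            store.append(tail)
        refs[key] = (prefix, index_of[tail])
    return refs, store


refs, store = unify_tails(
    minimal_prefix_split(["nation", "station", "motion", "notion"])
)
print(refs)
print(store)
```

In this example "nation" and "notion" end up pointing at the same stored tail, so four keys need only three tail entries. Sharing of partially overlapping tails, which a fuller suffix-unification scheme could exploit, is outside the scope of this sketch.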

International Journal of Computer Mathematics | 2002

An Efficient Trie Construction for Natural Language Dictionaries

Toru Sumitomo; Kazuhiro Morita; Masao Fuketa; Hidekazu Tokunaga; Jun-ichi Aoe

A trie structure is frequently used for various applications, such as natural language dictionaries, database systems and compilers. However, the total number of nodes in the trie becomes large, and the trie takes a lot of space for a huge set of keys. The space cost becomes a serious problem if long strings, or compound words, are stored in the trie. To resolve this disadvantage, this paper presents a compression method for these long strings that uses the trie arcs for single words. The idea of the compression scheme is to replace long strings with the corresponding leaf node numbers of the trie. The double-array structure is introduced because this approach requires fast backward tracing of the trie. Theoretical and experimental observations show that the presented method is more practical than existing ones.
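
The idea of replacing long strings with leaf node numbers can be sketched as follows, assuming that each single word already has a leaf number and that a compound is a space-separated sequence of such words. A Python dictionary and list stand in for the trie and its backward tracing, so this shows only the encoding idea, not the paper's data structure; the class `CompoundDictionary` and its methods are hypothetical names.

```python
class CompoundDictionary:
    """Stores compounds as tuples of single-word leaf numbers."""

    def __init__(self, single_words):
        # word_of[n] recovers a word from its leaf number; this plays the
        # role of backward tracing from a leaf to the root of the trie.
        self.word_of = sorted(set(single_words))
        self.leaf_of = {word: n for n, word in enumerate(self.word_of)}
        self.compounds = set()

    def encode(self, phrase):
        """Replace each word by its leaf number (KeyError if unknown)."""
        return tuple(self.leaf_of[word] for word in phrase.split())

    def decode(self, codes):
        """Rebuild the phrase from a tuple of leaf numbers."""
        return " ".join(self.word_of[n] for n in codes)

    def add_compound(self, phrase):
        self.compounds.add(self.encode(phrase))

    def contains(self, phrase):
        try:
            return self.encode(phrase) in self.compounds
        except KeyError:
            return False  # some word is not in the single-word dictionary


d = CompoundDictionary(
    ["natural", "language", "processing", "machine", "translation"]
)
d.add_compound("natural language processing")
print(d.contains("natural language processing"))
print(d.decode(d.encode("machine translation")))
```

A compound of three words is stored as three small integers instead of its full character string, which is the space saving the abstract describes. Recovering the original text requires mapping a leaf number back to its word, which is why fast backward tracing matters in the trie-based version.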
