Jun-ichi Aoe | Researchain

Publication

Featured research published by Jun-ichi Aoe.

IEEE Transactions on Software Engineering | 1989

An efficient digital search algorithm by using a double-array structure

Jun-ichi Aoe

An efficient digital search algorithm that is based on an internal array structure called a double array, which combines the fast access of a matrix form with the compactness of a list form, is presented. Each arc of a digital search tree, called a DS-tree, can be computed from the double array in O(1) time; that is to say, the worst-case time complexity for retrieving a key becomes O(k) for the length k of that key. The double array is modified to make the size compact while maintaining fast access, and algorithms for retrieval, insertion, and deletion are presented. If the size of the double array is n+cm, where n is the number of nodes of the DS-tree, m is the number of input symbols, and c is a constant particular to each double array, then it is theoretically proved that the worst-case times of deletion and insertion are proportional to cm and cm^2, respectively, and are independent of n. Experimental results of building the double array incrementally for various sets of keys show that c has an extremely small value, ranging from 0.17 to 1.13.
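The O(1) arc computation described in this abstract can be illustrated with a minimal sketch. It assumes the two arrays are named `base` and `check`, that an arc from state `s` on a symbol with code `c` leads to `t = base[s] + c` and is valid when `check[t] == s`, and that keys end with a hypothetical `#` marker. The first-fit builder below is a simplification written for this illustration; it is static and does not reproduce the paper's incremental insertion, deletion, or compaction.

```python
from collections import deque

END = "#"  # hypothetical end-of-key marker, assumed absent from the keys


def build_double_array(keys):
    """Build base/check arrays with a simple static first-fit placement."""
    # 1. Build an ordinary trie as nested dicts.
    root = {}
    for key in keys:
        node = root
        for ch in key + END:
            node = node.setdefault(ch, {})

    # 2. Give every symbol a positive integer code.
    symbols = sorted({ch for key in keys for ch in key} | {END})
    code = {ch: i + 1 for i, ch in enumerate(symbols)}

    # Index 0 is unused, state 1 is the root; check == 0 marks a free slot.
    base, check = [0, 0], [0, -1]

    def ensure(size):
        while len(base) < size:
            base.append(0)
            check.append(0)

    # 3. Place each node's children breadth-first at the first free offset.
    queue = deque([(1, root)])
    while queue:
        state, node = queue.popleft()
        if not node:
            continue  # leaf: nothing to place
        codes = [code[ch] for ch in node]
        b = 1
        while True:
            ensure(b + max(codes) + 1)
            if all(check[b + c] == 0 for c in codes):
                break
            b += 1
        base[state] = b
        for ch, child in node.items():
            t = b + code[ch]
            check[t] = state  # record the parent so the arc can be verified
            queue.append((t, child))
    return base, check, code


def contains(base, check, code, key):
    """Follow one arc per symbol; each step is two array reads."""
    state = 1
    for ch in key + END:
        c = code.get(ch)
        if c is None:
            return False
        t = base[state] + c
        if t >= len(check) or check[t] != state:
            return False
        state = t
    return True
```

For example, after `base, check, code = build_double_array(["bachelor", "baby", "badge", "jar"])`, the call `contains(base, check, code, "baby")` follows five arcs (four letters plus the end marker), which matches the O(k) retrieval bound stated above.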

Software - Practice and Experience | 1992

An efficient implementation of trie structures

Jun-ichi Aoe; Katsushi Morimoto; Takashi Sato

A new internal array structure, called a double‐array, implementing a trie structure is presented. The double‐array combines the fast access of a matrix form with the compactness of a list form. The algorithms for retrieval, insertion and deletion are introduced through examples. Although insertion is rather slow, it is still practical, and both the deletion and the retrieval time can be improved from the list form. From the comparison with the list for various large sets of keys, it is shown that the size of the double‐array can be about 17 per cent smaller than that of the list, and that the retrieval speed of the double‐array can be from 3.1 to 5.1 times faster than that of the list.

Information Processing and Management | 2003

Documents similarity measurement using field association terms

El-Sayed Atlam; Masao Fuketa; Kazuhiro Morita; Jun-ichi Aoe

Conventional approaches to text analysis and information retrieval, which measure document similarity by considering all of the information in texts, are relatively inefficient for processing large text collections in heterogeneous subject areas. This paper outlines a new text manipulation system, FA-Sim, that is useful for retrieving information in large heterogeneous texts and for recognizing content similarity in text excerpts. FA-Sim is based on flexible text matching procedures carried out in various contexts and various field ranks. FA-Sim measures text similarity by using specific field association (FA) terms instead of comparing all text information. With FA-Sim, similarity between texts is computed faster and is higher than with two other analysis methods. Recall and Precision improve significantly, by 39% and 37% respectively, over these two traditional methods.
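The idea of comparing only FA terms rather than all text can be sketched as follows. The abstract does not state FA-Sim's actual similarity formula, contexts, or field ranks, so this example substitutes a plain cosine similarity over FA-term counts; the function name `fa_similarity`, the whitespace tokenization, and the sample term set are all assumptions made for illustration.

```python
import math
from collections import Counter


def fa_similarity(text_a, text_b, fa_terms):
    """Cosine similarity over FA-term counts only; all other words are ignored.

    This is an illustrative stand-in, not the FA-Sim measure itself.
    """
    def profile(text):
        # Keep only words that belong to the (hypothetical) FA-term set.
        return Counter(w for w in text.lower().split() if w in fa_terms)

    a, b = profile(text_a), profile(text_b)
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no FA terms in at least one text
    return dot / (norm_a * norm_b)


# Hypothetical FA terms for two fields (sports and politics).
FA_TERMS = {"pitcher", "inning", "election", "ballot"}
```

Because each text is reduced to a handful of FA-term counts, the comparison touches far fewer features than a measure built on every word, which is the efficiency argument the abstract makes.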

IEEE Transactions on Software Engineering | 1989

An efficient implementation of static string pattern matching machines

Jun-ichi Aoe

A technique for implementing a static transition table of a string pattern matching machine which locates all occurrences of a finite number of keywords in a string is described. The approach is based on S.C. Johnson's (1975) storage and retrieval method of the transition table of a finite-state machine. By restricting the transition table of the finite-state machine to that of the string pattern-matching machine, the triple arrays of Johnson's data structure can be reduced to two arrays. The retrieval program of the reduced data structure can be sped up by a finite straight program without loops.

IEEE Transactions on Software Engineering | 1984

A Method for Improving String Pattern Matching Machines

Jun-ichi Aoe; Y. Yamamoto; Ryosaku Shimada

This correspondence describes an efficient string pattern matching machine to locate all occurrences of any of a finite number of keywords and phrases in an arbitrary text string. Some conditions are defined on the states of the machine in order to improve the speed and size of the machine by Aho and Corasick [1]. The pattern matching algorithm is partitioned into various cases by combining these conditions. Finally, the correspondence illustrates the proposed approach by applying it to the analysis of the machines for a simple search.
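For context, the baseline machine of Aho and Corasick that this correspondence sets out to improve can be sketched as below. This is the classical goto/failure/output construction, not the improved machine proposed in the correspondence; the names `build_machine` and `find_all` are chosen for this illustration.

```python
from collections import deque


def build_machine(keywords):
    """Build goto, failure, and output tables for a set of keywords."""
    goto = [{}]       # goto[state][symbol] -> next state; state 0 is the root
    output = [set()]  # keywords recognised on entering each state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)

    # Failure links are computed breadth-first; depth-1 states fail to the root.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure state.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def find_all(text, machine):
    """Return sorted (start_index, keyword) pairs for every occurrence."""
    goto, fail, output = machine
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            hits.append((i - len(word) + 1, word))
    return sorted(hits)
```

Calling `find_all("ushers", build_machine(["he", "she", "his", "hers"]))` reports "she" starting at index 1 and both "he" and "hers" starting at index 2, scanning the text once regardless of how many keywords are loaded.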

Information Processing and Management | 2002

A new method for selecting English field association terms of compound words and its knowledge representation

El-Sayed Atlam; Kazuhiro Morita; Masao Fuketa; Jun-ichi Aoe

This paper presents a strategy for building a morphological machine dictionary of English that infers the meaning of derivations by considering morphological affixes and their semantic classification. Derivations are grouped into a frame that is accessible to semantic stem and knowledge base. This paper also proposes an efficient method for selecting compound Field Association (FA) terms from a large pool of single FA terms for some specialized fields. For single FA terms, five levels of association are defined and two ranks are defined, based on stability and inheritance. About 85% of redundant compound FA terms can be removed effectively by using the levels and ranks proposed in this paper. Recall averages of 60-80% are achieved, depending on the type of text. The proposed methods are applied to 22,000 relationships between verbs and nouns extracted from a large tagged corpus.

Information Processing and Management | 2004

Word classification and hierarchy using co-occurrence word information

Kazuhiro Morita; El-Sayed Atlam; Masao Fuketa; Kazuhiko Tsuda; Masaki Oono; Jun-ichi Aoe

With the development of computers in recent years, complex, advanced processing at high speed has become possible. Moreover, natural language processing (NLP) systems use a great deal of linguistic knowledge to improve their performance. The need for co-occurrence word information in NLP systems is therefore growing, and much research makes use of it. A dictionary is also indispensable in NLP, because the capability of the entire system is governed by the size and quality of the dictionary. This paper describes the importance of co-occurrence word information in NLP systems. A technique for classifying co-occurrence words (receiving words) and co-occurrence frequencies is described, and the classified groups are expressed hierarchically. Moreover, this paper proposes a technique for an automatic construction system and a complete thesaurus. The system is tested experimentally, and the effectiveness of the proposed technique is verified.

Information Processing and Management | 2006

Automatic building of new field association word candidates using search engine

El-Sayed Atlam; Ghada Elmarhomy; Kazuhiro Morita; Masao Fuketa; Jun-ichi Aoe

With the increasing popularity of the Internet and the tremendous amount of on-line text, automatic document classification is important for organizing huge amounts of data. Readers can identify the subject of many document fields by reading only some specific Field Association (FA) words. Document fields can be decided efficiently if there are many FA words and if the frequency rate is high. This paper proposes a method for automatically building new FA words. A WWW search engine is used to extract FA word candidates from document corpora. New FA word candidates in each field are automatically compared with previously determined FA words. Then new FA words are appended to an FA word dictionary. According to the experimental results, the new system can automatically append around 44% of new FA words to the existing FA word dictionary. Moreover, a concentration ratio of 0.9 is also effective for extracting the relevant FA words that are needed for the system design to build FA words automatically.

Information Processing and Management | 2002

Extraction of field-coherent passages

Samuel Sangkon Lee; Masami Shishibori; Toru Sumitomo; Jun-ichi Aoe

It is important to identify text that is substantially independent of adjacent material. This paper presents a technique for dividing text into field-coherent passages. The method presented is based upon extracting field-associated words or phrases from the text by determining how topics grow, shrink and shift from sentence to sentence. We propose measures of topic continuity and transition and suggest how those may be used to find the passage boundaries. After collecting 12,500 documents, we obtained an average precision of 88% and recall of 78% in a training document set.

Information Processing and Management | 2002

A dynamic construction algorithm for the compact patricia trie using the hierarchical structure

Minsoo Jung; Masami Shishibori; Yasuhiro Tanaka; Jun-ichi Aoe

We need to access target information efficiently and to retrieve arbitrary strings in text at high speed. Among key retrieval strategies, the binary trie is often used because it supports fast, ordered access. In particular, the Patricia trie (Pat tree) is known as the fastest access method among binary tries, because it has the shallowest tree structure. However, the Pat tree requires a great deal of physical storage space in memory if the registered key set is large, so storing the trie in main memory becomes costly. We previously proposed a method that compresses a Pat tree into a compact bit stream to solve this problem; the result is called the Compact Patricia trie (CPat tree). The CPat tree needs very little memory. However, as the key set grows, the time needed to search for and update keys gradually increases. This paper proposes a new CPat tree structure that avoids long search and update times for large key sets, together with a method for constructing the new CPat tree dynamically and efficiently. The method divides a CPat tree, represented as a bit string, at a fixed depth and arranges the divided CPat trees hierarchically. With the construction algorithm, an update requires altering only one of the divided trees. Experimental results using 120,000 English substantives and 70,000 Japanese substantives show that updates are more than 40 times faster than with the traditional method, while the memory space required increases by only about 35%.
