Masao Fuketa | Researchain

Publication

Featured research published by Masao Fuketa.

Information Processing and Management | 2003

Documents similarity measurement using field association terms

El-Sayed Atlam; Masao Fuketa; Kazuhiro Morita; Jun-ichi Aoe

Conventional approaches to text analysis and information retrieval, which measure document similarity by considering all of the information in the texts, are relatively inefficient for processing large text collections in heterogeneous subject areas. This paper outlines a new text manipulation system, FA-Sim, that is useful for retrieving information in large heterogeneous texts and for recognizing content similarity in text excerpts. FA-Sim is based on flexible text matching procedures carried out in various contexts and at various field ranks. FA-Sim measures text similarity by using specific field association (FA) terms instead of comparing all text information. Measuring similarity with FA-Sim is faster, and gives higher similarity scores, than two other analysis methods. Recall and precision improved significantly, by 39% and 37%, over these two traditional methods.
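
The idea of comparing texts through FA terms only, rather than through every word, can be illustrated with a minimal sketch. It is not FA-Sim itself: the tiny `FA_TERMS` dictionary, the function names, and the choice of cosine similarity over per-field counts are all assumptions made for illustration, and the contexts and field ranks mentioned in the abstract are left out.

```python
import math
from collections import Counter

# Hypothetical FA-term dictionary (term -> field); not taken from the paper.
FA_TERMS = {
    "pitcher": "sports", "inning": "sports", "goalkeeper": "sports",
    "election": "politics", "parliament": "politics", "ballot": "politics",
}

def field_profile(text):
    """Count FA-term hits per field, ignoring every non-FA word."""
    words = text.lower().split()
    return Counter(FA_TERMS[w] for w in words if w in FA_TERMS)

def fa_similarity(text_a, text_b):
    """Cosine similarity between the field profiles of two texts."""
    pa, pb = field_profile(text_a), field_profile(text_b)
    dot = sum(pa[f] * pb[f] for f in pa)
    norm = (math.sqrt(sum(v * v for v in pa.values()))
            * math.sqrt(sum(v * v for v in pb.values())))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    a = "the pitcher threw in the ninth inning"
    b = "the goalkeeper saved the match"
    c = "the parliament called an election"
    print(fa_similarity(a, b), fa_similarity(a, c))
```

Because only dictionary hits are counted, the cost per document depends on the number of FA terms found rather than on the full vocabulary, which is the efficiency argument the abstract makes.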

Information Processing and Management | 2002

A new method for selecting English field association terms of compound words and its knowledge representation

El-Sayed Atlam; Kazuhiro Morita; Masao Fuketa; Jun-ichi Aoe

This paper presents a strategy for building a morphological machine dictionary of English that infers the meaning of derivations by considering morphological affixes and their semantic classification. Derivations are grouped into a frame that is accessible to the semantic stem and the knowledge base. This paper also proposes an efficient method for selecting compound Field Association (FA) terms from a large pool of single FA terms for some specialized fields. For single FA terms, five levels of association and two ranks are defined, based on stability and inheritance. About 85% of redundant compound FA terms can be removed effectively by using the levels and ranks proposed in this paper. Recall averages of 60-80% are achieved, depending on the type of text. The proposed methods are applied to 22,000 relationships between verbs and nouns extracted from a large tagged corpus.

Information Processing and Management | 2006

Automatic building of new field association word candidates using search engine

El-Sayed Atlam; Ghada Elmarhomy; Kazuhiro Morita; Masao Fuketa; Jun-ichi Aoe

With the increasing popularity of the Internet and the tremendous amount of on-line text, automatic document classification is important for organizing huge amounts of data. Readers can identify the subject of many document fields by reading only some specific Field Association (FA) words. Document fields can be decided efficiently if there are many FA words and if their frequency rate is high. This paper proposes a method for automatically building new FA words. A WWW search engine is used to extract FA word candidates from document corpora. New FA word candidates in each field are automatically compared with previously determined FA words. New FA words are then appended to an FA word dictionary. According to the experimental results, the new system can automatically append around 44% of new FA words to the existing FA word dictionary. Moreover, a concentration ratio of 0.9 is effective for extracting the relevant FA words needed for the system to build FA words automatically.
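
The abstract does not define the concentration ratio. The sketch below assumes it is the share of a candidate word's occurrences that fall within one field, and keeps candidates at or above the 0.9 threshold the abstract mentions. The function names and the sample frequencies are hypothetical.

```python
def concentration_ratio(freq_by_field, field):
    """Assumed definition: occurrences in `field` / occurrences in all fields."""
    total = sum(freq_by_field.values())
    return freq_by_field.get(field, 0) / total if total else 0.0

def select_candidates(word_freqs, field, threshold=0.9):
    """Return the words whose concentration in `field` reaches the threshold."""
    return sorted(
        word for word, freqs in word_freqs.items()
        if concentration_ratio(freqs, field) >= threshold
    )

if __name__ == "__main__":
    # Made-up counts of each candidate word per field.
    word_freqs = {
        "inning": {"sports": 19, "politics": 1},
        "vote":   {"sports": 3,  "politics": 7},
        "match":  {"sports": 8,  "politics": 2},
    }
    print(select_candidates(word_freqs, "sports"))
```

Under this assumed definition a high threshold favours words that occur almost exclusively in one field, at the cost of rejecting words shared between related fields.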

Information Processing and Management | 2007

A compact static double-array keeping character codes

Susumu Yata; Masaki Oono; Kazuhiro Morita; Masao Fuketa; Toru Sumitomo; Jun-ichi Aoe

A trie represented by a double-array enables fast key lookup in a small amount of space. However, the double-array uses extra space so that it can be updated dynamically. This paper presents a compact structure for a static double-array. The new structure keeps character codes instead of indices in order to compress the elements of the double-array. In addition, the new structure unifies common suffixes and consists of fewer elements than the old structure. Experimental results for English keys show that the new structure reduces the space usage of the double-array by up to 40%.
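
For readers unfamiliar with the structure, the following is a minimal sketch of a conventional static double-array, in which a transition from state `s` on character code `c` goes to `t = BASE[s] + c` and is valid only if `CHECK[t] == s`. It is not the compact structure proposed in the paper: here CHECK stores parent indices rather than character codes, and common suffixes are not unified. The `#` terminator, the lowercase-only alphabet, and the first-fit placement strategy are assumptions made for illustration.

```python
TERMINATOR = "#"  # end-of-key marker; assumes keys never contain '#'
ALPHABET = TERMINATOR + "abcdefghijklmnopqrstuvwxyz"
CODE = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # codes start at 1

def build_double_array(keys):
    """Build BASE and CHECK arrays for a set of lowercase keys."""
    # Step 1: build an ordinary nested-dict trie.
    root = {}
    for key in keys:
        node = root
        for ch in key + TERMINATOR:
            node = node.setdefault(ch, {})
    # Step 2: place the children of each node with a first-fit search.
    base, check = [0], [0]   # slot 0 is the root; -1 in CHECK marks a free slot
    queue = [(root, 0)]      # (trie node, its index in the arrays)
    while queue:
        node, s = queue.pop(0)
        if not node:
            continue         # leaf reached through the terminator
        codes = [CODE[ch] for ch in node]
        b = 1
        while True:
            need = b + max(codes) + 1
            if need > len(check):
                base.extend([0] * (need - len(base)))
                check.extend([-1] * (need - len(check)))
            if all(check[b + c] == -1 for c in codes):
                break
            b += 1
        base[s] = b
        for ch, child in node.items():
            t = b + CODE[ch]
            check[t] = s     # conventional CHECK: index of the parent state
            queue.append((child, t))
    return base, check

def contains(base, check, key):
    """Follow t = BASE[s] + code, accepting only if CHECK[t] == s."""
    s = 0
    for ch in key + TERMINATOR:
        c = CODE.get(ch)
        if c is None:
            return False
        t = base[s] + c
        if t >= len(check) or check[t] != s:
            return False
        s = t
    return True

if __name__ == "__main__":
    base, check = build_double_array(["bachelor", "badge", "jar"])
    print(contains(base, check, "badge"), contains(base, check, "bad"))
```

Each lookup step costs one addition and one comparison, which is where the retrieval speed comes from; the free slots left by the placement search are the space overhead that compact variants try to reduce.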

International Journal of Computer Mathematics | 2005

A sentence classification technique using intention association expressions

Yuki Kadoya; Kazuhiro Morita; Masao Fuketa; Masaki Oono; El-Sayed Atlam; Toru Sumitomo; Jun-ichi Aoe

Although there are many text classification techniques based on the vector space, it is difficult for them to detect meaning related to the user’s intention (complaint, encouragement, request, invitation, etc.). The approach discussed in this paper is very useful for understanding focus points in conversation. We present a technique for determining the speaker’s intention for sentences in conversation. Intention association expressions are introduced, and formal descriptions with weights are defined using these expressions to construct an intention classification. A deterministic multi-attribute pattern-matching algorithm is used to determine the intention class efficiently. In simulation results for 681 email messages containing 5859 sentences, the multi-attribute pattern-matching algorithm is about 44.5 times faster than the Aho and Corasick method. The precision and recall of intention classification of sentences are 91% and 95%, respectively. The precision and recall of extraction of unnecessary sentences are 98% and 96%, respectively. The precision and recall of the classification of each email are 88% and 89%, respectively.

International Journal of Computer Mathematics | 1995

An incremental algorithm for string pattern matching machines

Kazuhiko Tsuda; Masao Fuketa; Jun-ichi Aoe

Aho and Corasick presented a string pattern matching machine (hereafter called machine AC) to locate multiple keywords. However, machine AC must be reconstructed from scratch whenever a keyword is appended. This paper proposes an efficient algorithm for appending a keyword to machine AC and compares its time efficiency with the original algorithm using simulation results. The simulation results show a speed-up factor of between 25 and 270 over the original algorithm by Aho and Corasick, which requires reconstruction of the entire machine AC.
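
As background, a minimal sketch of the baseline machine AC (goto, failure, and output functions) is given below. It illustrates the limitation the abstract describes: in this plain version, adding a keyword means calling the builder again on the whole keyword list. The paper's incremental algorithm is not described in the abstract and is not reproduced here; the function names `build_ac` and `search` are chosen for illustration.

```python
from collections import deque

def build_ac(keywords):
    """Build goto, failure and output tables for the given keywords."""
    goto, fail, output = [{}], [0], [set()]
    # Phase 1: goto function (a trie over the keywords).
    for word in keywords:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].add(word)
    # Phase 2: failure function, computed breadth-first from depth 1.
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            output[s] |= output[fail[s]]  # inherit matches ending here
    return goto, fail, output

def search(machine, text):
    """Return sorted (start_index, keyword) pairs for every match in text."""
    goto, fail, output = machine
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for word in output[s]:
            hits.append((i - len(word) + 1, word))
    return sorted(hits)

if __name__ == "__main__":
    keywords = ["he", "she", "his", "hers"]
    machine = build_ac(keywords)
    print(search(machine, "ushers"))
    # Appending a keyword: the baseline rebuilds the entire machine.
    machine = build_ac(keywords + ["us"])
    print(search(machine, "ushers"))
```

The failure links of existing states can change when a keyword is added, which is why the baseline recomputes them all; avoiding that full recomputation is the problem the paper addresses.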

Information Processing and Management | 2007

Improvement of building field association term dictionary using passage retrieval

Uddin Sharif; Elmarhomy Ghada; El-Sayed Atlam; Masao Fuketa; Kazuhiro Morita; Jun-ichi Aoe

Field Association (FA) terms are a limited set of discriminating terms that can specify document fields. Document fields can be decided efficiently if a document contains many relevant FA terms. An earlier approach built an FA term dictionary using a WWW search engine, but that dictionary contained irrelevant FA terms because the approach extracted FA terms from whole documents. This paper proposes a new approach that extracts FA terms from passages (portions of a document's text) rather than from whole documents. This approach extracts FA terms more accurately than the earlier approach. The proposed approach is evaluated on 38,372 articles from a large tagged corpus. According to the experimental results, the new approach appends about 24% more relevant FA terms to the earlier FA term dictionary and deletes around 32% of irrelevant FA terms. Moreover, the new approach achieves precision and recall of 98% and 94%, respectively.

Software - Practice and Experience | 2007

An efficient deletion method for a minimal prefix double array

Susumu Yata; Masaki Oono; Kazuhiro Morita; Masao Fuketa; Jun-ichi Aoe

A minimal prefix (MP) double array is an efficient data structure for a trie. The MP double array requires only a small amount of space and enables fast retrieval. However, the space efficiency of the MP double array is degraded by deletion. This paper presents a fast and compact adaptive deletion method for the MP double array. The presented method is implemented in C. Simulation results for English and Japanese keys show that the adaptive method is faster than the conventional method and maintains higher space efficiency.

Information Processing and Management | 2000

Similarity measurement using term negative weight and its application to word similarity

El-Sayed Atlam; Masao Fuketa; Kazuhiro Morita; Jun-ichi Aoe

Term weighting is a useful technique for keyword extraction and document classification. The traditional approach depends on high-frequency terms and is called the positive weight (PW) function. This paper presents a new weighting method that depends on low-frequency terms, called the negative weight (NW) function. Word similarity for typical verbs and objects is taken as an example application. A negative weighted inverse verb frequency (NWIVF) function is defined in this study, and a new similarity measurement is presented that combines the NWIVF and PWIVF (positive weighted inverse verb frequency) functions. The proposed method is applied to 11,000 relationships between verbs and nouns extracted from a large tagged corpus. With this new method, recall and precision improved by 33% and 18%, respectively, over the positive weight method.

Information Processing and Management | 2014

Compression of double array structures for fixed length keywords

Masao Fuketa; Hiroya Kitagawa; Takuki Ogawa; Kazuhiro Morita; Jun-ichi Aoe

A trie is one of the data structures used for keyword matching. It is used in natural language processing, IP address routing, and so on. It can be represented by the matrix form, the link form, the double array, and LOUDS. The double array representation combines the retrieval speed of the matrix form with the compactness of the list form. LOUDS is a succinct data structure based on a bit string. LOUDS does not retrieve faster than the double array, but its space usage is smaller. This paper proposes a compressed version of the double array that divides the trie into multiple levels and removes the BASE array from the double array. A retrieval algorithm and a construction algorithm are also proposed. According to the experimental results for pseudo and real data sets, the retrieval speed of the presented method is almost the same as that of the double array, and its space usage is compressed to 66% compared with LOUDS for a large set of fixed-length keywords.
