Masami Shishibori
University of Tokushima
Publication
Featured research published by Masami Shishibori.
systems man and cybernetics | 2001
Satoru Tsuge; Masami Shishibori; Shingo Kuroiwa; Kenji Kita
The vector space model (VSM) is a conventional information retrieval model that represents a document collection by a term-by-document matrix. Because term-by-document matrices are usually high-dimensional and sparse, they are susceptible to noise, and it is difficult to capture the underlying semantic structure from them. Additionally, storing and processing such matrices places great demands on computing resources. Dimensionality reduction is a way to overcome these problems. Principal component analysis (PCA) and singular value decomposition (SVD) are popular dimensionality reduction techniques based on matrix decomposition; however, their decomposed matrices contain both positive and negative values. In the work described here, we use non-negative matrix factorization (NMF) for dimensionality reduction of the vector space model. Since matrices decomposed by NMF contain only non-negative values, the original data are represented by only additive, not subtractive, combinations of the basis vectors. This parts-based representation is appealing because it reflects the intuitive notion of combining parts to form a whole. Also, because NMF computation is based on a simple iterative algorithm, it is advantageous for applications involving large matrices. Using the MEDLINE collection, we experimentally showed that NMF offers great improvement over the vector space model.
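The "simple iterative algorithm" mentioned above can be illustrated with the widely used multiplicative update rules for NMF. The following is a minimal pure-Python sketch, not the authors' implementation: the function names (`nmf`, `frobenius_error`), the random initialization, the iteration count, and the small `eps` guard are all assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative term-by-document matrix V into W (terms x rank)
    and H (rank x documents) using multiplicative updates."""
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    # Strictly positive random start so every update stays non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_docs)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_terms)]
    return w, h
```

Each column of `h` is then a reduced, `rank`-dimensional representation of one document, and because the updates only multiply non-negative numbers, no negative entries can appear in either factor.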
Information Processing and Management | 2002
Samuel Sangkon Lee; Masami Shishibori; Toru Sumitomo; Jun-ichi Aoe
It is important to identify text that is substantially independent of adjacent material. This paper presents a technique for dividing text into field-coherent passages. The method presented is based upon extracting field-associated words or phrases from the text by determining how topics grow, shrink and shift from sentence to sentence. We propose measures of topic continuity and transition and suggest how those may be used to find the passage boundaries. After collecting 12,500 documents, we obtained an average precision of 88% and recall of 78% in a training document set.
Information Processing and Management | 2002
Minsoo Jung; Masami Shishibori; Yasuhiro Tanaka; Jun-ichi Aoe
Target information, including arbitrary strings in a text, must be accessed efficiently and at high speed. Among key retrieval strategies, the binary trie is often used because it supports fast, ordered access. In particular, the Patricia trie (Pat tree) is known as the fastest access method among binary tries because it has the shallowest tree structure. However, the Pat tree requires a large amount of physical memory when the registered key set is large, so storing the trie in main memory becomes costly. To solve this problem, we previously proposed a method that compresses a Pat tree into a compact bit stream, called the Compact Patricia trie (CPat tree). The CPat tree needs very little memory. However, as the key set grows, the time cost of searching for and updating keys gradually increases. This paper proposes a new CPat tree structure that avoids long search and update times for large key sets, together with a method for constructing the new CPat tree dynamically and efficiently. The method divides the CPat tree, which is represented as a bit string, at a fixed depth and arranges the resulting subtrees hierarchically. With this construction algorithm, an update requires altering only one of the divided trees. Experiments using 120,000 English nouns and 70,000 Japanese nouns show that updates are more than 40 times faster than with the traditional method. Moreover, memory usage increases by only about 35% compared with the traditional method.
Information Sciences | 1998
Jun-ichi Aoe; Kazuaki Anda; Toshiharu Kinoshita; Masami Shishibori
Aho and Corasick presented a string pattern matching machine to locate multiple keywords. However, the AC machine cannot match multi-attribute information. This paper describes an efficient multi-attribute pattern matching machine that locates all occurrences of any of a finite number of sequences of rule structures (called matching rules) in a sequence of input structures. The proposed algorithm enables us to match set representations containing multiple attributes. A transition is therefore confirmed according to whether the input structure includes the rule structure. Finally, the pattern matching algorithm is evaluated by theoretical analysis, and the evaluation is supported by simulation results with rules for the extraction of keywords.
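The inclusion test that decides a transition can be sketched with Python sets. This is a naive scan that only illustrates the matching condition (an input structure matches when it contains every attribute of the rule structure); it is not the automaton-based machine the paper proposes, and the names `matches` and `find_rule_occurrences` as well as the attribute strings are hypothetical.

```python
def matches(rule_structure, input_structure):
    """A rule structure matches when the input structure includes all of its attributes."""
    return rule_structure <= input_structure

def find_rule_occurrences(rules, inputs):
    """Return (start_index, rule_name) for every place where a matching rule,
    given as a sequence of attribute sets, matches the input sequence."""
    found = []
    for start in range(len(inputs)):
        for name, rule in rules.items():
            window = inputs[start:start + len(rule)]
            if len(window) == len(rule) and all(
                matches(r, s) for r, s in zip(rule, window)
            ):
                found.append((start, name))
    return found

# Hypothetical rule: an adjective followed by a noun, regardless of other attributes.
rules = {"adj_noun": [frozenset({"pos=adj"}), frozenset({"pos=noun"})]}
inputs = [
    frozenset({"pos=det"}),
    frozenset({"pos=adj", "type=color"}),
    frozenset({"pos=noun", "num=plural"}),
    frozenset({"pos=verb"}),
]
occurrences = find_rule_occurrences(rules, inputs)
```

The naive scan re-examines each input position once per rule; an Aho-Corasick-style machine avoids that repeated work by sharing states across rules.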
systems man and cybernetics | 1997
Masami Shishibori; M. Okuno; Kazuaki Ando; Jun-ichi Aoe
Information retrieval is an important research field for many applications. Among key retrieval strategies, the trie is well known as a fast access method that can retrieve keys in order. In particular, the Patricia trie gives the shallowest trie by eliminating all single descendant nodes, and for this reason it is often used as an index in information retrieval systems. When trie structures are implemented, however, the more keys are registered, the more storage is required. Jonge et al. (1987) proposed a method for converting the normal binary trie into a compact bit stream. This paper presents a method for compressing the Patricia trie into a new bit stream. The theoretical and experimental results show that this method generates a bit stream 40 to 60 percent shorter than that of the traditional method. The method thus provides more compact storage and faster access than the traditional method.
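To make the idea of storing a binary trie as a bit stream concrete, here is a simplified sketch. It is an assumption-laden illustration, not the encoding of Jonge et al. or of this paper: each node is written in pre-order as two bits (has a 0-child, has a 1-child), keys are assumed to be equal-length bit strings from a non-empty key set, and the function names are hypothetical.

```python
def build_trie(keys):
    """Build a binary trie as nested dicts keyed by '0' and '1'."""
    root = {}
    for key in keys:
        node = root
        for bit in key:
            node = node.setdefault(bit, {})
    return root

def encode(node):
    """Pre-order bit stream: two bits per node, one per possible child."""
    bits = ("1" if "0" in node else "0") + ("1" if "1" in node else "0")
    for bit in "01":
        if bit in node:
            bits += encode(node[bit])
    return bits

def decode(stream):
    """Recover the keys, in sorted order, from the bit stream alone."""
    keys = []
    pos = 0

    def walk(prefix):
        nonlocal pos
        has_zero, has_one = stream[pos] == "1", stream[pos + 1] == "1"
        pos += 2
        if not (has_zero or has_one):  # "00" marks a leaf, i.e. a complete key
            keys.append(prefix)
            return
        if has_zero:
            walk(prefix + "0")
        if has_one:
            walk(prefix + "1")

    walk("")
    return keys
```

In this sketch every node costs two bits, including nodes with a single child. A Patricia trie removes exactly those single descendant nodes, which is why compressing it requires storing extra information about the skipped bits.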
International Journal of Computer Mathematics | 1994
Ki-Hong Park; Jun-ichi Aoe; Katsushi Morimoto; Masami Shishibori
SCOPE: Algorithms, Information storage and retrieval. A trie is a search tree obtained by merging the common prefixes of the key set. It has the advantage that all keys that are prefixes of an input string can be retrieved at high speed. As the key set grows, however, the number of transitions increases, and with it the storage capacity required. This paper proposes an algorithm that dynamically constructs DAWGs (Directed Acyclic Word Graphs) for handling dynamic key sets. It also solves the problem of the increasing number of transitions in the trie structure. The proposed method constructs a DAWG by locally separating common suffixes when a key is updated and, after the update is finished, locally merging the transitions of common suffixes. The proposed algorithm is theoretically evaluated and the data structure for the implementation is discussed. Experimental results show that the number of transitions in the DAWG is reduced by approx. 50 to 70% ...
international conference on acoustics, speech, and signal processing | 2006
Satoru Tsuge; Masami Shishibori; Kenji Kita; Fuji Ren; Shingo Kuroiwa
In this paper, we describe a Japanese speech corpus collected for investigating the speech variability of a specific speaker over short and long time periods, and then report the variability of speech recognition performance over short and long time periods. Even when speakers use a speaker-dependent speech recognition system, it is known that speech recognition performance varies depending on when the utterance was spoken. This is because speech quality varies by occasion even if the speaker and utterance remain constant. However, the relationships between intra-speaker speech variability and speech recognition performance are not clear. Hence, we have been collecting speech data to investigate these relationships since November 2002. In this paper, we introduce our speech corpus and report speech recognition experiments using it. Experimental results show that the variability of recognition performance over different days is larger than the variability of recognition performance within a day.
International Journal of Computer Processing of Languages | 2002
Samuel Sangkon Lee; Masami Shishibori
This paper presents a technique for dividing text into field-coherent passages. The method is based on extracting field-associated terms from the text and measuring how topics grow, shrink and shift from sentence to sentence. We propose measures of topic continuity and of topic transition and suggest how they could be used to find the boundaries between passages. After collecting 12,500 documents, we obtained an average precision of 88% and a recall of 78% on the training set.
ieee international conference on intelligent processing systems | 1997
Masami Shishibori; Kazuaki Ando; Makoto Okada; Jun-ichi Aoe
Among key retrieval strategies, the Patricia trie yields the shallowest trie by eliminating all nodes that have only one arc, which are called single descendant nodes. For this reason, it can retrieve keys faster than other trie strategies. The Patricia trie, however, must store information about the eliminated nodes, so an implementation of this structure requires a large amount of storage. This paper presents a retrieval algorithm that uses the compact Patricia trie, which is represented by a bit stream.
korea japan joint workshop on frontiers of computer vision | 2011
Masami Shishibori; Samuel Sangkon Lee; Kenji Kita
In multimedia databases, indexing methods for multi-dimensional data spaces are used to provide fast access. However, because these methods presuppose the Euclidean distance as the distance measure, they lack flexibility. Metric indexing methods, on the other hand, require only that the distance axioms be satisfied. Since they can also be applied to distance measures other than the Euclidean distance, they are highly flexible. This paper proposes an improvement to the VP-tree, which is one of the metric indexing methods. During a search, the VP-tree follows the nodes that fit the search range, starting from the root node. The distances between the query and all objects linked from the leaf node finally reached are then computed to determine whether each object lies within the search range. However, search becomes slow if the number of distance calculations in a leaf node increases. We therefore focused on the candidate selection method that uses the triangle inequality in a leaf node. As the improvement, we propose using the object nearest to the query as the reference point for the triangle inequality. This makes it possible to shrink the search range and reduce the number of distance calculations. Evaluation experiments using 10,000 images showed that the proposed method cut search time by 5% to 12% compared with the traditional method.
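The candidate selection step in a leaf node can be sketched as follows. This is a generic triangle-inequality filter, assuming each leaf stores precomputed distances from its objects to a reference point (`pivot`); it does not reproduce the paper's specific choice of reference point, and all names here are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_leaf(objects, pivot, dist=euclidean):
    """Store each object together with its precomputed distance to the pivot."""
    return [(obj, dist(pivot, obj)) for obj in objects]

def range_search_leaf(leaf, pivot, query, radius, dist=euclidean):
    """Return (hits, number of distance computations) for a range query.

    By the triangle inequality, d(query, obj) >= |d(query, pivot) - d(pivot, obj)|,
    so any object whose lower bound exceeds the radius is skipped without
    computing its actual distance to the query.
    """
    d_query_pivot = dist(query, pivot)
    computations = 1
    hits = []
    for obj, d_pivot_obj in leaf:
        if abs(d_query_pivot - d_pivot_obj) > radius:
            continue  # pruned: cannot lie within the search range
        computations += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, computations

objects = [(0, 0), (1, 0), (5, 5), (9, 9), (10, 10)]
pivot = (0, 0)
leaf = build_leaf(objects, pivot)
hits, computations = range_search_leaf(leaf, pivot, query=(1, 1), radius=1.5)
```

The filter only relies on the distance axioms, so `dist` can be replaced by any metric. How many objects are pruned depends on the reference point: the closer it is to the query, the tighter the lower bound tends to be.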