Kyoji Umemura
Toyohashi University of Technology
Publication
Featured research published by Kyoji Umemura.
international conference on computational linguistics | 1994
Kumiko Tanaka; Kyoji Umemura
When using a third language to construct a bilingual dictionary, it is necessary to discriminate equivalencies from inappropriate words derived as a result of ambiguity in the third language. We propose a method to treat this by utilizing the structures of dictionaries to measure the nearness of the meanings of words. The resulting dictionary is a word-to-word bilingual dictionary of nouns and can be used to refine the entries and equivalencies in published bilingual dictionaries.
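The ambiguity problem described in this abstract can be illustrated with a small sketch. It is not the paper's structure-based measure: it swaps in a simpler stand-in heuristic that ranks each candidate equivalent by how many distinct third-language words lead to it. The toy dictionaries and the function names `pivot_candidates` and `rank_candidates` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy dictionaries (illustrative data, not taken from the paper).
# Source language -> third (pivot) language, and pivot -> target language.
JA_EN = {"ginko": {"bank"}, "dote": {"bank", "embankment"}}
EN_FR = {"bank": {"banque", "rive"}, "embankment": {"rive", "talus"}}


def pivot_candidates(word, src_to_pivot, pivot_to_tgt):
    """Map each target-language candidate to the pivot words that produce it."""
    support = defaultdict(set)
    for pivot in src_to_pivot.get(word, ()):
        for target in pivot_to_tgt.get(pivot, ()):
            support[target].add(pivot)
    return dict(support)


def rank_candidates(word, src_to_pivot, pivot_to_tgt):
    """Rank candidates by the number of supporting pivot words (ties by name)."""
    support = pivot_candidates(word, src_to_pivot, pivot_to_tgt)
    return sorted(support, key=lambda t: (-len(support[t]), t))


if __name__ == "__main__":
    # "dote" reaches "rive" through both pivot words, so "rive" ranks first;
    # "banque" is only reachable through the ambiguous pivot word "bank".
    print(rank_candidates("dote", JA_EN, EN_FR))
```

A candidate supported by a single ambiguous pivot word, such as "banque" for "dote", is the kind of inappropriate word the abstract says must be discriminated from true equivalents.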
Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages | 2003
Eiko Yamamoto; Masahiro Kishida; Yoshinori Takenami; Yoshiyuki Takeda; Kyoji Umemura
Though dynamic programming matching can carry out approximate string matching when there may be deletions or insertions in a document, its effectiveness and efficiency are usually too poor to use it for large-scale information retrieval. In this paper, we propose a method of dynamic programming matching for information retrieval. This method is as effective as a conventional information retrieval system, even though it is capable of approximate matching. It is also as efficient as a conventional system.
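For readers unfamiliar with dynamic programming matching, the following is a minimal sketch of the textbook approach (approximate substring matching with unit edit costs). It is not the retrieval method proposed in the paper, and the function name `best_approximate_match` is a hypothetical one chosen for this example.

```python
def best_approximate_match(pattern, text):
    """Return (edits, end) for the best approximate occurrence of pattern in text.

    edits is the smallest number of insertions, deletions, and substitutions
    needed to turn pattern into some substring of text; end is the 1-based
    position in text where the first such best match ends.
    """
    m = len(pattern)
    # prev[i]: cost of matching pattern[:i] against a substring ending at the
    # previous text position. Before any text is read, that cost is i.
    prev = list(range(m + 1))
    best = (prev[m], 0)
    for j, ch in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitute
                prev[i] + 1,         # extra character in the text
                cur[i - 1] + 1,      # pattern character missing from the text
            )
        if cur[m] < best[0]:
            best = (cur[m], j)
        prev = cur
    return best


if __name__ == "__main__":
    print(best_approximate_match("abc", "xxabcxx"))
    print(best_approximate_match("abcd", "xxabdxx"))
```

This sketch runs in time proportional to the pattern length times the text length for every query, which illustrates why the abstract calls plain dynamic programming matching too inefficient for large-scale retrieval.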
practical aspects of knowledge management | 2004
Takafumi Yamaya; Toramatsu Shintani; Tadachika Ozono; Yusuke Hiraoka; Hiromitsu Hattori; Takayuki Ito; Naoki Fukuta; Kyoji Umemura
The Internet is a very popular information-sharing technology because users can share information within organizations and communities. In this paper, we present MiNet, a flexible peer-to-peer networking technology for information sharing on the Internet. Within a given community, MiNet can construct an ad-hoc network for information sharing. MiNet enables users to share information based on mobile agents, which are implemented in the mobile agent framework MiLog. MiNet can construct ad-hoc peer-to-peer networks by encapsulating information and sending it as mobile agents that can migrate in MiNet beyond firewalls, proxies, and NATs in LANs. Therefore, MiNet can construct VPNs, which consist of several LANs covered by firewalls, etc. MiNet agents can automatically choose a destination platform according to their policies. We show the document sharing system MiDoc as an application based on MiNet. Since MiDoc is implemented using MiNet, MiDoc users can share any document among any LANs.
empirical methods in natural language processing | 2005
Masayuki Okabe; Kyoji Umemura; Seiji Yamada
Query expansion techniques generally select new query terms from a set of top-ranked documents. Although a user's manual judgment of those documents would greatly help in selecting good expansion terms, it is difficult to get enough feedback from users in practical situations. In this paper we propose a query expansion technique which performs well even if a user marks just one relevant document and one non-relevant document. In order to tackle this specific condition, we introduce two refinements to a well-known query expansion technique. One is the application of a transductive learning technique in order to increase the number of relevant documents. The other is a modified parameter estimation method which overlays the predictions from multiple learning trials and tries to differentiate the importance of candidate terms for expansion in relevant documents. Experimental results show that our technique outperforms some traditional query expansion methods in several evaluation measures.
empirical methods in natural language processing | 2000
Kyoji Umemura; Kenneth Ward Church
We propose an empirical method for estimating term weights directly from relevance judgments, avoiding various standard but potentially troublesome assumptions. It is common to assume, for example, that weights vary with term frequency (tf) and inverse document frequency (idf) in a particular way, e.g., tf·idf, but the fact that there are so many variants of this formula in the literature suggests that there remains considerable uncertainty about these assumptions. Our method is similar to the Berkeley regression method where labeled relevance judgments are fit as a linear combination of (transforms of) tf, idf, etc. Training methods not only improve performance, but also extend naturally to include additional factors such as burstiness and query expansion. The proposed histogram-based training method provides a simple way to model complicated interactions among factors such as tf, idf, burstiness and expansion frequency (a generalization of query expansion). Expanded terms are handled correctly on the basis of statistical information. Expansion frequency dramatically improves performance from a level comparable to BKJJBIDS, Berkeley's entry in the Japanese NACSIS NTCIR-1 evaluation for short queries, to the level of JCB1, the top system in the evaluation. JCB1 uses sophisticated (and proprietary) natural language processing techniques developed by Just System, a leader in the Japanese word-processing industry. We are encouraged that the proposed method, which is simple to understand and replicate, can reach this level of performance.
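As a point of reference for the tf·idf assumption this abstract questions, here is a minimal sketch of one common variant: raw term frequency multiplied by the natural log of inverse document frequency. It is only one of the many variants the abstract alludes to, not the histogram-based training method the paper proposes, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf * log(N / df) for every term in every tokenized document.

    docs is a list of token lists; the result is one {term: weight} dict
    per document, in the same order.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


if __name__ == "__main__":
    sample = [["term", "weight", "term"], ["weight", "query"]]
    for doc_weights in tf_idf(sample):
        print(doc_weights)
```

In this variant a term that occurs in every document, such as "weight" in the sample, receives weight zero regardless of how often it appears. A fixed assumption of this kind is what estimating weights from relevance judgments avoids.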
international acm sigir conference on research and development in information retrieval | 2002
Yoshiyuki Takeda; Kyoji Umemura
It is not easy to tokenize agglutinative languages like Japanese and Chinese into words. Many IR systems start with a dictionary-based morphology program like ChaSen [4]. Unfortunately, dictionaries cannot cover all possible words; unknown words such as proper nouns are important for IR. This paper proposes a statistical dictionary-free method for selecting index strings based on recent work on adaptive language modeling.
network aware data management | 2011
Chunghan Lee; Hirotake Abe; Toshio Hirotsu; Kyoji Umemura
Grid applications are becoming increasingly dependent on network resources. Predicted network throughput is a useful parameter for network-aware scheduling of such applications. Although throughput prediction methods have been proposed, many of them suffer from the fact that the probability distribution of traffic is unclear and the scale and bandwidth of networks are constantly changing. Furthermore, virtual machines have been used as platforms for grid computing, and they can affect network measurement. A prediction method that uses pairs of differently sized connections has been proposed. This method, which we call connection pair, features a small probe transfer that predicts the throughput of a large data transfer. We propose a throughput prediction method based on the connection pair that uses ν-support vector regression (SVR) and a polynomial kernel to deal with prediction models represented as a non-linear and continuous monotonic function. The prediction accuracy of our method is higher than that of a previous prediction method. Moreover, the drop in accuracy under an unstable network state is smaller than that of the previous method. We also clarify the prediction accuracy with other probe sizes for the connection pair. Accuracy decreases with a small probe and does not change with a large probe. These results show that our method is accurate, robust, and suitable for its purpose.
computer and information technology | 2010
Chunghan Lee; Hirotake Abe; Toshio Hirotsu; Kyoji Umemura
To clarify useful parameters for avoiding unstable conditions in network experiments on a virtualized testbed, we used PlanetLab as the virtualized testbed and measured network throughput using a combination of probe and data transfers. Although PlanetLab has been widely used as a testbed for overlay networks, distributed systems, and network measurement, it is provided to users as a virtualized environment. A set of these environments on different nodes is called a ‘slice’, and multiple slices run simultaneously on each node. We found that network throughput occasionally decreased even though the network condition was stable. The cause of the throughput decrease was an anomaly: unintended large packet spacing. Although the cause of the anomaly is known to be unstable CPU scheduling latency, no clear conditions for anomaly avoidance had previously been given. We investigated throughput measurement with resource monitoring to clarify anomaly avoidance conditions. When the CPUs at a node are shared by many slices, slices are frequently scheduled off the CPUs, and the anomaly occurs. If network throughput is decreased by the anomaly, the measurement results should be discarded.
Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages | 2003
Yinghui Xu; Kyoji Umemura
In this paper, we focus on performing LSI with very low SVD dimensions. The results show that there is a nearly linear surface in the local query region. Using low-dimensional LSI on the local query region, we can capture such a linear surface, obtain much better performance than VSM, and perform comparably to global LSI. The surprisingly small SVD dimension required removes the computational restrictions. Moreover, when several relevant sample documents are available, applying low-dimensional LSI to these documents yields IR performance comparable to local RF, but in a different manner.
asia information retrieval symposium | 2005
Masayuki Okabe; Kyoji Umemura; Seiji Yamada
Query expansion techniques generally select new query terms from a set of top-ranked documents. Although a user’s manual judgment of those documents would greatly help in selecting good expansion terms, it is difficult to get enough feedback from users in practical situations. In this paper we propose a query expansion technique which performs well even if a user marks just one relevant document and one non-relevant document. In order to tackle this specific condition, we introduce two refinements to a well-known query expansion technique. One is to increase the number of possibly relevant documents with a transductive learning method, because more relevant documents produce better performance. The other is a modified term scoring scheme based on the results of the learning method and a simple function. Experimental results show that our technique outperforms some traditional methods in standard precision and recall criteria.