Fumihito Nishino | Researchain

Publication

Featured research published by Fumihito Nishino.

meeting of the association for computational linguistics | 2006

Chinese-English Term Translation Mining Based on Semantic Prediction

Gaolin Fang; Hao Yu; Fumihito Nishino

Using abundant Web resources to mine Chinese term translations can be applied in many fields, such as reading/writing assistance, machine translation, and cross-language information retrieval. In mining English translations of Chinese terms, obtaining effective Web pages and evaluating translation candidates are two challenging issues. In this paper, an approach based on semantic prediction is first proposed to obtain effective Web pages. The proposed method predicts possible English meanings from each constituent unit of a Chinese term and expands these English items with semantically relevant knowledge for searching. Refined related terms are extracted from the top retrieved documents through feedback learning to construct a new query expansion for acquiring more effective Web pages. To obtain a correct translation list, a translation evaluation method based on a weighted sum of multiple features is presented to rank the candidates estimated from the effective Web pages. Experimental results demonstrate that the proposed method performs well in Chinese-English term translation acquisition, achieving 82.9% accuracy.
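The weighted-sum ranking step mentioned in this abstract can be illustrated with a minimal sketch. The feature names, weights, and candidate values below are hypothetical, since the abstract does not list the features actually used; the sketch only shows the general shape of scoring candidates by a weighted sum and sorting them.

```python
from typing import Dict, List, Tuple

# Hypothetical feature names and weights (assumptions, not taken from the paper).
WEIGHTS: Dict[str, float] = {"frequency": 0.5, "distance": 0.3, "length_ratio": 0.2}


def score(features: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of a candidate's feature values."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def rank_candidates(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Score every candidate translation and sort from best to worst."""
    scored = [(term, score(feats)) for term, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative candidates with made-up, normalized feature values.
candidates = {
    "machine translate": {"frequency": 0.4, "distance": 0.6, "length_ratio": 0.9},
    "machine translation": {"frequency": 0.9, "distance": 0.8, "length_ratio": 1.0},
}
ranked = rank_candidates(candidates)
for term, value in ranked:
    print(f"{term}: {value:.2f}")
```

In practice the weights would have to be tuned on held-out data; the fixed values here are placeholders.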

document engineering | 2003

Effective text extraction and recognition for WWW images

Jun Sun; Zhulong Wang; Hao Yu; Fumihito Nishino; Yutaka Katsuyama; Satoshi Naoi

Images play a very important role in web content delivery. Many WWW images contain text information that can be used for web indexing and searching. A new text extraction and recognition algorithm is proposed in this paper. The character strokes in the image are first extracted by color clustering and connected component analysis. A novel stroke verification algorithm is used to effectively remove non-character strokes. The verified strokes are then used to build the binary text line image, which is segmented and recognized by dynamic programming. Since the text in a WWW image usually has a close relationship with the webpage content, approximate string matching is used to revise the recognition result by matching the content of the webpage against the content of the image. This post-processing not only improves recognition performance but can also be used in other applications, such as finding correspondences between images and webpage paragraphs.
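The post-processing idea in this abstract, revising recognized text by approximate matching against the surrounding webpage, can be sketched as follows. This is not the authors' algorithm: it swaps in Python's standard `difflib` similarity matching at word level, and the function name `revise_ocr_result` and the `cutoff` value are assumptions made for illustration.

```python
import difflib


def revise_ocr_result(recognized: str, page_text: str, cutoff: float = 0.6) -> str:
    """Replace each recognized word with its closest match in the page text.

    Words with no sufficiently similar match on the page are kept unchanged.
    """
    # De-duplicate page words while preserving their order.
    page_words = list(dict.fromkeys(page_text.split()))
    revised = []
    for word in recognized.split():
        matches = difflib.get_close_matches(word, page_words, n=1, cutoff=cutoff)
        revised.append(matches[0] if matches else word)
    return " ".join(revised)


# Illustrative input: two character-level recognition errors ("l" for "i").
page = "Effective text extraction and recognition for web images"
revised = revise_ocr_result("Effectlve text extractlon", page)
print(revised)
```

A higher `cutoff` makes the correction more conservative, which matters when the image text is only loosely related to the page.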

international joint conference on natural language processing | 2005

Web-based terminology translation mining

Gaolin Fang; Hao Yu; Fumihito Nishino

Mining terminology translations from a large amount of Web data can be applied in many fields, such as reading/writing assistance, machine translation, and cross-language information retrieval. The challenging issues are how to find more comprehensive results on the Web and determine the boundaries of candidate translations, and how to remove irrelevant noise and rank the remaining candidates. In this paper, after reviewing and analyzing possible methods of acquiring translations, a feasible statistics-based method is proposed to mine terminology translations from the Web. In the proposed method, based on an analysis of the different forms of term translation distributions, character-based string frequency estimation is used to construct term translation candidates, exploring more translations and their boundaries. Sort-based subset deletion and mutual information methods are then proposed to deal, respectively, with the subset redundancy and the prefix/suffix redundancy introduced during estimation. Extensive experiments on two test sets of 401 and 3511 English terms validate that our system has better performance.

international conference on human centered design held as part of hci international | 2009

Web Orchestration: Customization and Sharing Tool for Web Information

Lei Fu; Terunobu Kume; Fumihito Nishino

In this paper, we present a tool, Web Orchestration, which allows people to customize and share web information in a simple way. Our work is based on web annotation and web scraping techniques. It adopts a browser/server (B/S) architecture and has a user-friendly interface. It can be used for many purposes, such as web information monitoring, sharing, integration, and recombination. As an application of Web 2.0 techniques, it is easy to use and simple but powerful; it can enhance collaboration among users and make web information sharing and personalized web information customization much easier.

ieee international conference semantic computing | 2015

Link scientific publications using linked data

Qingliang Miao; Yao Meng; Lu Fang; Fumihito Nishino; Nobuyuki Igata

Scientific publication management services are changing drastically. On the one hand, researchers demand intelligent search services to discover scientific publications. On the other hand, publishers need to incorporate semantic information to better organize their digital assets and make publications more discoverable. For this purpose, we investigate how to manage scientific publications using Linked Data and introduce FELinker, an entity linking component that links scientific publications with DBpedia. In particular, this paper introduces the advantages of linking scientific publications with Linked Data, discusses the major challenges, and outlines the proposed method. Experiments show that the proposed method achieves promising performance in scientific publication linkage.

international joint conference on natural language processing | 2005

A lexicon-constrained character model for Chinese morphological analysis

Yao Meng; Hao Yu; Fumihito Nishino

This paper proposes a lexicon-constrained character model that combines both word and character features to solve complicated issues in Chinese morphological analysis. A Chinese character-based model constrained by a lexicon is built to acquire word-building rules. Each character in a Chinese sentence is assigned a tag by the proposed model. The word segmentation and part-of-speech tagging results are then generated from the character tags. The proposed method solves such problems as unknown word identification, data sparseness, and estimation bias in an integrated, unified framework. Preliminary experiments indicate that the proposed method outperforms the best SIGHAN word segmentation systems in the open track on 3 out of the 4 test corpora. Additionally, our method can be conveniently integrated with other Chinese morphological analysis systems as a post-processing module, leading to significant improvement in performance.
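The step of generating a segmentation from per-character tags can be sketched as below. The abstract does not specify the tag set, so the common B/M/E/S scheme (begin, middle, end, single-character word) is assumed here, and part-of-speech labels are omitted; the function name `tags_to_words` is hypothetical.

```python
from typing import List


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Group characters into words using assumed B/M/E/S position tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        # A new word starts at B or S; flush any unfinished word first.
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # A word ends at E or S.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # Tolerate a tag sequence that ends mid-word.
        words.append(current)
    return words


# Illustrative sentence: "我爱北京" tagged as two single-character words and one two-character word.
words = tags_to_words("我爱北京", ["S", "S", "B", "E"])
print(words)
```

The tagger that produces the tags is the substance of the paper; this sketch covers only the final decoding from tags to words.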

international symposium on neural networks | 2004

Exploring Various Features to Optimize Hot Topic Retrieval on WEB

Lan You; Xuanjing Huang; Lide Wu; Hao Yu; Jun Wang; Fumihito Nishino

A BBS is an electronic forum on the Web where people discuss many topics, and retrieving hot topics from it is a challenging problem. Hot topics have various features. Although the count of posts about a topic on the BBS is a simple and effective feature of the topic's hotness, the paper shows that a better result can be obtained if irrelevant posts are filtered out before counting, and a much better result can be obtained if more features are combined using a back-propagation neural network (BPNN).

international joint conference on natural language processing | 2004

Chinese new word finding using character-based parsing model

Yao Meng; Hao Yu; Fumihito Nishino

New word finding is a difficult and indispensable task in Chinese segmentation. Traditional methods use string statistical information to identify new words in a large-scale corpus, but this is neither convenient nor powerful enough to describe the laws governing the words' internal and external structure, and it is even less effective when the new words occur with very low frequency in the corpus. In this paper, we present a novel method that uses parsing information to find new words. A character-level PCFG model is trained on the People's Daily corpus and the Penn Chinese Treebank. The characters are fed into the character parsing system, and the words are determined automatically from the parse tree. Our method describes the word-building rules within full sentences and takes advantage of rich context to find new words. This is especially effective in identifying occasional or rarely used words, which usually have low frequency. Preliminary experiments indicate that our method can substantially improve the precision and recall of the new word finding process.

Archive | 1991