Wen Hsiang Lu | Researchain

Publication

Featured research published by Wen Hsiang Lu.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

Translating unknown queries with web corpora for cross-language information retrieval

Pu-Jen Cheng; Jei Wen Teng; Ruei Cheng Chen; Jenq-Haur Wang; Wen Hsiang Lu; Lee-Feng Chien

It is crucial for cross-language information retrieval (CLIR) systems to handle the translation of unknown queries, because real queries tend to be short. The purpose of this paper is to investigate the feasibility of exploiting the Web as the corpus source to translate unknown queries for CLIR. We propose an online translation approach that determines effective translations for unknown query terms by mining bilingual search-result pages obtained from Web search engines. This approach can alleviate the lack of large bilingual corpora, translate many unknown query terms, provide flexible query specifications, and extract semantically close translations to benefit CLIR tasks, especially cross-language Web search.

Expert Systems With Applications | 2011

An effective tree structure for mining high utility itemsets

Chun Wei Lin; Tzung-Pei Hong; Wen Hsiang Lu

In the past, many algorithms were proposed to mine association rules, most of which were based on item frequency values. Because a customer may buy many copies of an item and each item may have a different profit, mining frequent patterns from a traditional database is not suitable for some real-world applications. Utility mining was thus proposed to consider costs, profits and other measures according to user preference. In this paper, the high utility pattern tree (HUP tree) is designed and the HUP-growth mining algorithm is proposed to derive high utility patterns effectively and efficiently. The proposed approach integrates the previous two-phase procedure for utility mining and the FP-tree concept to utilize the downward-closure property and generate a compressed tree structure. Experimental results also show that the proposed approach outperforms Liu et al.'s two-phase algorithm in execution time. Finally, the numbers of tree nodes generated by three different item ordering methods are compared, with results showing that frequency ordering produces fewer tree nodes than the other two.
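The downward-closure property mentioned in this abstract does not hold for utility itself, which is why two-phase utility mining works with the transaction-weighted utility (TWU) as an upper bound. The following is a minimal Python sketch of that idea, not the HUP-tree or HUP-growth algorithm from the paper. The item profits, the transactions, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: profit per item, and transactions as {item: quantity}.
PROFIT = {"A": 3, "B": 10, "C": 1}
TRANSACTIONS = [
    {"A": 1, "B": 1},
    {"A": 2, "C": 6},
    {"B": 2, "C": 1},
]


def contains(tx, itemset):
    """True if every item of the itemset occurs in the transaction."""
    return all(item in tx for item in itemset)


def transaction_utility(tx):
    """Total utility (quantity * profit) of all items in one transaction."""
    return sum(qty * PROFIT[item] for item, qty in tx.items())


def utility(itemset, transactions):
    """Actual utility of an itemset: only its own items are counted."""
    return sum(
        sum(tx[item] * PROFIT[item] for item in itemset)
        for tx in transactions
        if contains(tx, itemset)
    )


def twu(itemset, transactions):
    """Transaction-weighted utility: whole-transaction utility of every
    transaction that contains the itemset (an upper bound on utility)."""
    return sum(transaction_utility(tx) for tx in transactions if contains(tx, itemset))


u_a = utility(("A",), TRANSACTIONS)
u_ab = utility(("A", "B"), TRANSACTIONS)
twu_a = twu(("A",), TRANSACTIONS)
twu_ab = twu(("A", "B"), TRANSACTIONS)
```

In this toy data the superset {A, B} has a higher utility than {A}, so utility alone cannot be used for pruning, whereas the TWU of {A, B} never exceeds the TWU of {A}. That bound is what allows a tree-based or level-wise miner to discard candidates safely before computing exact utilities.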

ACM Transactions on Information Systems | 2004

Anchor text mining for translation of Web queries: A transitive translation approach

Wen Hsiang Lu; Lee-Feng Chien; Hsi-Jian Lee

To discover translation knowledge in diverse data resources on the Web, this article proposes an effective approach to finding translation equivalents of query terms and constructing multilingual lexicons through the mining of Web anchor texts and link structures. Although Web anchor texts are wide-scoped hypertext resources, not every particular pair of languages contains sufficient anchor texts for effective extraction of translations for Web queries. For more generalized applications, the approach is designed based on a transitive translation model. The translation equivalents of a query term can be extracted via its translation in an intermediate language. To reduce interference from translation errors, the approach further integrates a competitive linking algorithm into the process of determining the most probable translation. A series of experiments has been conducted, including performance tests on term translation extraction, cross-language information retrieval, and translation suggestions for practical Web search services, respectively. The obtained experimental results have shown that the proposed approach is effective in extracting translations of unknown queries, is easy to combine with the probabilistic retrieval model to improve the cross-language retrieval performance, and is very useful when the considered language pairs lack a sufficient number of anchor texts. Based on the approach, an experimental system called LiveTrans has been developed for English--Chinese cross-language Web search.
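The transitive step described in this abstract can be illustrated with a small Python sketch that scores each target-language candidate by summing over intermediate-language translations. This is a generic formulation assumed for illustration, not the article's exact model: the probability tables, the term labels, and the function name `transitive_scores` are hypothetical, and the competitive linking step is not shown.

```python
from collections import defaultdict


def transitive_scores(src_to_mid, mid_to_tgt, source_term):
    """Score target candidates via an intermediate language:
    score(t) = sum over m of P(m | s) * P(t | m)."""
    scores = defaultdict(float)
    for mid_term, p_sm in src_to_mid.get(source_term, {}).items():
        for tgt_term, p_mt in mid_to_tgt.get(mid_term, {}).items():
            scores[tgt_term] += p_sm * p_mt
    return dict(scores)


# Hypothetical translation probabilities (placeholders, not real data).
src_to_mid = {"s1": {"m1": 0.8, "m2": 0.2}}
mid_to_tgt = {"m1": {"t1": 0.9, "t2": 0.1}, "m2": {"t2": 1.0}}

scores = transitive_scores(src_to_mid, mid_to_tgt, "s1")
best = max(scores, key=scores.get)
```

The appeal of this design is that the source and target languages never need to share anchor texts directly; each only needs enough anchor-text evidence paired with the intermediate language.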

ACM Transactions on Asian Language Information Processing | 2002

Translation of web queries using anchor text mining

Wen Hsiang Lu; Lee-Feng Chien; Hsi-Jian Lee

This article presents an approach to automatically extracting translations of Web query terms through mining of Web anchor texts and link structures. One of the existing difficulties in cross-language information retrieval (CLIR) and Web search is the lack of appropriate translations of new terminology and proper names. The proposed approach successfully exploits the anchor-text resources and reduces the existing difficulties of query term translation. Many query terms that cannot be obtained in general-purpose translation dictionaries are, therefore, extracted.

Expert Systems With Applications | 2009

The Pre-FUFP algorithm for incremental mining

Chun Wei Lin; Tzung-Pei Hong; Wen Hsiang Lu

The frequent pattern tree (FP-tree) is an efficient data structure for association-rule mining without generation of candidate itemsets. It is used to compress a database into a tree structure that stores only large items. It, however, needs to process all transactions in batch mode. In real-world applications, new transactions are usually inserted into databases incrementally. In the past, we proposed a Fast Updated FP-tree (FUFP-tree) structure to handle new transactions efficiently and to make the tree update process easier. In this paper, we attempt to modify the FUFP-tree construction based on the concept of pre-large itemsets. Pre-large itemsets are defined by a lower support threshold and an upper support threshold. With them, the approach does not need to rescan the original database until a number of new transactions have been inserted. The proposed approach can thus achieve a good execution time for tree construction, especially when only a small number of transactions are inserted each time. Experimental results also show that the proposed Pre-FUFP maintenance algorithm performs well when incrementally handling new transactions.
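As a rough illustration of the two-threshold definition in this abstract, the sketch below classifies an itemset as large, pre-large, or small from its support count. It is a minimal sketch under assumed names and example thresholds, and it does not reproduce the Pre-FUFP tree maintenance procedure itself.

```python
def classify(count, n_transactions, lower, upper):
    """Classify an itemset by its support ratio against two thresholds.

    lower and upper are the lower and upper support thresholds
    (hypothetical example values are used below).
    """
    if not 0 <= lower <= upper <= 1:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    support = count / n_transactions
    if support >= upper:
        return "large"
    if support >= lower:
        return "pre-large"
    return "small"


# Example with 10 transactions, lower threshold 0.3, upper threshold 0.5.
labels = [classify(c, 10, 0.3, 0.5) for c in (6, 4, 2)]
```

Pre-large itemsets act as a buffer: an itemset that is small in the original database has to cross the whole gap between the two thresholds before it can become large, which is why a limited number of insertions can be absorbed without rescanning the original database.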

Meeting of the Association for Computational Linguistics | 2004

Creating Multilingual Translation Lexicons with Regional Variations Using Web Corpora

Pu-Jen Cheng; Wen Hsiang Lu; Jei-Wen Teng; Lee-Feng Chien

The purpose of this paper is to automatically create multilingual translation lexicons with regional variations. We propose a transitive translation approach to determine translation variations across languages that have insufficient corpora for translation via the mining of bilingual search-result pages and clues of geographic information obtained from Web search engines. The experimental results have shown the feasibility of the proposed approach in efficiently generating translation equivalents of various terms not covered by general translation dictionaries. It also revealed that the created translation lexicons can reflect different cultural aspects across regions such as Taiwan, Hong Kong and mainland China.

Expert Systems With Applications | 2010

Linguistic data mining with fuzzy FP-trees

Chun Wei Lin; Tzung-Pei Hong; Wen Hsiang Lu

Due to the increasing occurrence of very large databases, mining useful information and knowledge from transactions is evolving into an important research area. In the past, many algorithms were proposed for mining association rules, most of which were based on items with binary values. Transactions with quantitative values are, however, commonly seen in real-world applications. In this paper, the frequent fuzzy pattern tree (fuzzy FP-tree) is proposed for extracting frequent fuzzy itemsets from the transactions with quantitative values. When extending the FP-tree to handle fuzzy data, the processing becomes much more complex than the original since fuzzy intersection in each transaction has to be handled. The fuzzy FP-tree construction algorithm is thus designed, and the mining process based on the tree is presented. Experimental results on three different numbers of fuzzy regions also show the performance of the proposed approach.
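To make the notion of fuzzy intersection within a transaction concrete, the sketch below maps quantities to fuzzy regions and takes the minimum membership as the intersection, which is the standard choice in fuzzy set theory. It does not build the fuzzy FP-tree from the paper. The triangular membership functions, the three region boundaries, and the sample transactions are all assumptions made for illustration.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy regions shared by all items: (a, b, c) per region.
REGIONS = {"low": (0, 1, 6), "middle": (1, 6, 11), "high": (6, 11, 16)}


def fuzzify(tx):
    """Map {item: quantity} to {(item, region): membership}, dropping zeros."""
    fuzzy = {}
    for item, qty in tx.items():
        for region, (a, b, c) in REGIONS.items():
            degree = triangular(qty, a, b, c)
            if degree > 0:
                fuzzy[(item, region)] = degree
    return fuzzy


def fuzzy_support(fuzzy_itemset, transactions):
    """Average over transactions of the minimum membership in the itemset."""
    total = 0.0
    for tx in transactions:
        fuzzy = fuzzify(tx)
        total += min(fuzzy.get(term, 0.0) for term in fuzzy_itemset)
    return total / len(transactions)


transactions = [{"A": 3, "B": 8}, {"A": 1, "B": 6}]
support = fuzzy_support([("A", "low"), ("B", "middle")], transactions)
```

Because the minimum has to be evaluated per transaction rather than derived from per-item totals, a tree node cannot simply store one count per item, which hints at why the fuzzy extension is more involved than the original FP-tree.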

International Conference on Data Mining | 2001

Anchor text mining for translation of Web queries

Wen Hsiang Lu; Lee-Feng Chien; Hsi-Jian Lee

The paper presents an approach to automatically extracting translations of Web query terms through mining of Web anchor texts and link structures. One of the existing difficulties in cross-language information retrieval (CLIR) and Web search is the lack of appropriate translations of new terminology and proper names. The proposed approach effectively alleviates this problem, and anchor texts on the Web prove to be a valuable corpus for this kind of term translation.

Web Intelligence | 2007

Improving Identification of Latent User Goals through Search-Result Snippet Classification

Kuan Yu He; Yao Sheng Chang; Wen Hsiang Lu

In this paper, we propose an enhanced approach to improving our previous method which employs syntactic structures (verb-object pairs) to identify latent user goals. Our new approach employs a supervised-learning method to learn hint verbs and considers URL information and title information to classify snippets into three coarse categories, which are resource-seeking, informational, and navigational. Also, we propose three different models to identify three different categories of specific latent user goals from the classified snippets.

Intelligent Systems Design and Applications | 2008

An Incremental FUSP-Tree Maintenance Algorithm

Chun Wei Lin; Tzung-Pei Hong; Wen Hsiang Lu; Wen-Yang Lin

In this paper, we attempt to handle the maintenance of sequential patterns. New transactions may come from both new and old customers. A fast updated sequential pattern tree (called FUSP-tree) structure is proposed to make the tree update process easy. An incremental FUSP-tree maintenance algorithm is also proposed to reduce the execution time of reconstructing the tree. The proposed approach is expected to achieve a good trade-off between execution time and tree complexity.
