Publication


Featured research published by Pu-Jen Cheng.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

Translating unknown queries with web corpora for cross-language information retrieval

Pu-Jen Cheng; Jei Wen Teng; Ruei Cheng Chen; Jenq-Haur Wang; Wen Hsiang Lu; Lee-Feng Chien

It is crucial for cross-language information retrieval (CLIR) systems to handle the translation of unknown queries, because real queries tend to be short. The purpose of this paper is to investigate the feasibility of exploiting the Web as a corpus source for translating unknown queries in CLIR. We propose an online translation approach that determines effective translations for unknown query terms by mining bilingual search-result pages obtained from Web search engines. This approach can alleviate the lack of large bilingual corpora, translate many unknown query terms, provide flexible query specifications, and extract semantically close translations that benefit CLIR tasks, especially cross-language Web search.
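To make the idea of mining bilingual search-result pages more concrete, here is a minimal Python sketch, assuming the mixed-language result snippets for an English query term have already been fetched. It counts Chinese character n-grams that appear within a few characters of the term and ranks them by the number of snippets they occur in. This raw co-occurrence count is a simplification, not the scoring used in the paper; the function name `translation_candidates` and the `window`, `min_len` and `max_len` parameters are hypothetical.

```python
import re
from collections import Counter

# Runs of CJK Unified Ideographs (basic block only; an assumption of this sketch).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def translation_candidates(term, snippets, min_len=2, max_len=4, window=10):
    """Rank CJK n-grams that appear near `term` in mixed-language snippets.

    Each candidate is counted at most once per snippet, so its score is the
    number of snippets in which it co-occurs with the term.
    """
    term_lower = term.lower()
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        seen = set()
        start = text.find(term_lower)
        while start != -1:
            end = start + len(term_lower)
            # Inspect a fixed-size character window on each side of the term.
            for context in (text[max(0, start - window):start], text[end:end + window]):
                for run in CJK_RUN.findall(context):
                    for n in range(min_len, max_len + 1):
                        for i in range(len(run) - n + 1):
                            seen.add(run[i:i + n])
            start = text.find(term_lower, end)
        counts.update(seen)
    return counts.most_common()
```

With toy snippets such as "雅虎 (Yahoo) 搜尋", a candidate like 雅虎 that recurs next to the term across snippets rises to the top, while one-off context words keep a count of one.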


Meeting of the Association for Computational Linguistics | 2004

Creating Multilingual Translation Lexicons with Regional Variations Using Web Corpora

Pu-Jen Cheng; Wen Hsiang Lu; Jei-Wen Teng; Lee-Feng Chien

The purpose of this paper is to automatically create multilingual translation lexicons with regional variations. We propose a transitive translation approach that determines translation variations across languages with insufficient corpora for translation, by mining bilingual search-result pages and clues of geographic information obtained from Web search engines. Experimental results show that the proposed approach can efficiently generate translation equivalents for various terms not covered by general translation dictionaries. They also reveal that the created translation lexicons can reflect different cultural aspects across regions such as Taiwan, Hong Kong and mainland China.
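The core of a transitive (pivot-based) translation step can be sketched as composing two probabilistic translation tables through an intermediate language. The Python below is a generic illustration under that assumption; it does not model the regional variations or geographic clues the paper mines, and the function name and table layout are hypothetical.

```python
from collections import defaultdict


def transitive_translations(src_to_pivot, pivot_to_tgt):
    """Compose two probabilistic translation tables through a pivot language.

    P(target | source) is approximated as the sum over pivot terms of
    P(pivot | source) * P(target | pivot).
    """
    result = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot in pivots.items():
            for tgt, p_tgt in pivot_to_tgt.get(pivot, {}).items():
                result[src][tgt] += p_pivot * p_tgt
    # Convert to plain dicts so callers do not get auto-created entries.
    return {src: dict(tgts) for src, tgts in result.items()}
```

Source terms whose pivot translations are missing from the second table simply produce no entry, which is one reason pivot-based lexicons need additional evidence, such as search-result mining, for coverage.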


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014

Learning user reformulation behavior for query auto-completion

Jyun-Yu Jiang; Yen-Yu Ke; Pao-Yu Chien; Pu-Jen Cheng

It is crucial for query auto-completion to accurately predict what a user is typing. Given a query prefix and its context (e.g., previous queries), conventional context-aware approaches often produce queries relevant to the context. The purpose of this paper is to investigate the feasibility of exploiting the context to learn user reformulation behavior for boosting prediction performance. We first conduct an in-depth analysis of how users reformulate their queries. Based on this analysis, we propose a supervised approach to query auto-completion that considers three kinds of reformulation-related features: term-level, query-level and session-level features. These features capture how users change preceding queries over the course of a query session. Extensive experiments on the large-scale query log of a commercial search engine demonstrate a significant improvement over four competitive baselines.
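As a rough illustration of term-level reformulation features, the Python sketch below compares a candidate completion with the user's previous query (terms added, removed and kept, plus Jaccard overlap) and, as a toy ranker, orders prefix-matching candidates by that overlap. The paper feeds term-, query- and session-level features into a supervised model; the feature names and the single-feature ranking here are hypothetical simplifications.

```python
def reformulation_features(prev_query, candidate):
    """Term-level overlap features between the previous query and a candidate."""
    prev_terms = set(prev_query.lower().split())
    cand_terms = set(candidate.lower().split())
    kept = prev_terms & cand_terms
    union = prev_terms | cand_terms
    return {
        "terms_added": len(cand_terms - prev_terms),
        "terms_removed": len(prev_terms - cand_terms),
        "terms_kept": len(kept),
        "jaccard": len(kept) / len(union) if union else 0.0,
    }


def rank_completions(prefix, prev_query, candidates):
    """Keep candidates matching the typed prefix, most similar to the previous query first."""
    matching = [c for c in candidates if c.lower().startswith(prefix.lower())]
    # sorted() is stable, so ties keep their original candidate order.
    return sorted(
        matching,
        key=lambda c: reformulation_features(prev_query, c)["jaccard"],
        reverse=True,
    )
```

For example, after the query "cheap flights to tokyo", typing "ho" would promote "hotels in tokyo" over "hotmail login", because it keeps a term from the preceding query.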


Conference on Information and Knowledge Management | 2012

Visualizing timelines: evolutionary summarization via iterative reinforcement between text and image streams

Rui Yan; Xiaojun Wan; Mirella Lapata; Wayne Xin Zhao; Pu-Jen Cheng; Xiaoming Li

We present a novel graph-based framework for timeline summarization, the task of creating different summaries of the same topic for different timestamps. Our work extends timeline summarization to a multimodal setting and creates timelines that are both textual and visual. Our approach exploits the fact that news documents are often accompanied by pictures and that the two share some common content. Our model jointly optimizes local summary creation and global timeline generation through an iterative approach based on mutual reinforcement and co-ranking. Individual summaries are generated by taking into account the mutual dependencies between sentences and images, and are iteratively refined by considering how they contribute to the global timeline and its coherence. Experiments on real-world datasets show that the timelines produced by our model outperform several competitive baselines, both in terms of ROUGE and in assessments by human evaluators.
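Mutual reinforcement between two modalities can be illustrated with a HITS-style co-ranking loop: an image gains score from similar high-scoring sentences and vice versa, with each side anchored to a prior. This is a generic sketch, not the paper's model, which also couples local summaries to global timeline coherence; the affinity matrix, the priors and the mixing weight `lam` are assumptions.

```python
def _normalize(xs):
    total = sum(xs)
    return [x / total for x in xs] if total > 0 else list(xs)


def co_rank(affinity, sent_prior, img_prior, lam=0.5, iters=100, tol=1e-9):
    """HITS-style mutual reinforcement between sentences and images.

    affinity[i][j] >= 0 is the similarity of sentence i and image j.
    Priors are non-negative scores (e.g. from text-only or image-only ranking);
    lam controls how strongly each side is reinforced by the other.
    """
    m, n = len(affinity), len(affinity[0])
    s_prior, v_prior = _normalize(sent_prior), _normalize(img_prior)
    s, v = list(s_prior), list(v_prior)
    for _ in range(iters):
        # An image scores highly if it is similar to highly scored sentences.
        v_new = _normalize([sum(affinity[i][j] * s[i] for i in range(m)) for j in range(n)])
        v_new = [(1 - lam) * p + lam * x for p, x in zip(v_prior, v_new)]
        # A sentence scores highly if it is similar to highly scored images.
        s_new = _normalize([sum(affinity[i][j] * v_new[j] for j in range(n)) for i in range(m)])
        s_new = [(1 - lam) * p + lam * x for p, x in zip(s_prior, s_new)]
        delta = max(abs(a - b) for a, b in zip(s_new + v_new, s + v))
        s, v = s_new, v_new
        if delta < tol:
            break
    return s, v
```

Keeping `lam` below 1 stops either modality from completely overriding its own prior, which is one common way to keep such iterations stable.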


Conference on Information and Knowledge Management | 2009

Learning to rank from Bayesian decision inference

Jen-Wei Kuo; Pu-Jen Cheng; Hsin-Min Wang

Ranking is a key problem in many information retrieval (IR) applications, such as document retrieval and collaborative filtering. In this paper, we address learning to rank for document retrieval. Learning-based methods, such as RankNet, RankSVM, and RankBoost, automatically create ranking functions from training data. Recently, several learning-to-rank methods have been proposed that directly optimize the performance of IR applications in terms of various evaluation measures. Although they provide statistically significant improvements over conventional methods, from a decision-making viewpoint most of them do not minimize the Bayes risk of the IR system. To fill this research gap, we propose a novel framework that directly optimizes the Bayes risk related to ranking accuracy in terms of IR evaluation measures. Experiments on the LETOR collections show that the framework outperforms several existing methods in most cases.
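To show what minimizing Bayes risk means for a ranking decision, the sketch below takes, for each document, a distribution over relevance grades and uses the loss -DCG. By linearity of expectation, expected DCG depends only on each document's expected gain, and because the discounts 1/log2(pos + 1) decrease with position, sorting by expected gain is the Bayes-optimal ranking. This illustrates only the decision rule under an assumed loss; it is not the paper's learning framework, and the function names are hypothetical.

```python
import math
from itertools import permutations


def expected_gain(grade_probs):
    """Expected DCG gain 2^g - 1 under a distribution over relevance grades g."""
    return sum(p * (2 ** g - 1) for g, p in grade_probs.items())


def expected_dcg(ranking, label_probs):
    """Expected DCG of a ranking; by linearity only per-document marginals matter."""
    return sum(
        expected_gain(label_probs[doc]) / math.log2(pos + 1)
        for pos, doc in enumerate(ranking, start=1)
    )


def bayes_ranking(label_probs):
    """Ranking that minimizes Bayes risk under the loss -DCG (sort by expected gain)."""
    return sorted(label_probs, key=lambda d: expected_gain(label_probs[d]), reverse=True)


def brute_force_best(label_probs):
    """Exhaustive check for small inputs: the permutation with maximal expected DCG."""
    return max(permutations(label_probs), key=lambda r: expected_dcg(r, label_probs))
```

Note that two documents with the same expected grade can still differ in expected gain, because the gain 2^g - 1 is convex in the grade.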


Asia Information Retrieval Symposium | 2009

Selecting Effective Terms for Query Formulation

Chia-Jung Lee; Yi-Chun Lin; Ruey-Cheng Chen; Pu-Jen Cheng

It is difficult for users to formulate appropriate search queries. In this paper, we propose an approach to query term selection that measures the effectiveness of a query term in IR systems based on its linguistic and statistical properties in document collections. We present two query formulation algorithms for improving IR performance. Experiments on the NTCIR-4 and NTCIR-5 ad-hoc IR tasks show that the algorithms significantly improve retrieval performance, by 9.2% on average, compared with the original queries given in the benchmarks.
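A very simple statistical baseline for query term selection keeps the terms that are rarest in the collection, using inverse document frequency (IDF). The Python sketch below does exactly that. IDF is a crude stand-in for the term-effectiveness measure described above, which combines several linguistic and statistical properties; the smoothing formula, the function name and `k` are assumptions.

```python
import math


def select_query_terms(query, documents, k=2):
    """Keep the k query terms with the highest smoothed IDF in the collection."""
    doc_terms = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_terms)

    def idf(term):
        df = sum(1 for terms in doc_terms if term in terms)
        return math.log((n_docs + 1) / (df + 1))

    # dict.fromkeys drops duplicate terms while preserving query order.
    terms = list(dict.fromkeys(query.lower().split()))
    return sorted(terms, key=idf, reverse=True)[:k]
```

A frequent function word such as "the" is dropped first, while rarer content words survive; a learned effectiveness model would refine this by also weighing features like part of speech.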


International Conference on Asian Digital Libraries | 2003

Effective image annotation for search using multi-level semantics

Pu-Jen Cheng; Lee-Feng Chien

There is an increasing need for automatic tools that annotate images for effective image search in digital libraries. In this paper, we present a novel probabilistic model for image annotation based on content-based image retrieval techniques and statistical analysis. One key obstacle in applying statistical methods to image annotation is that the number of manually labeled images available for training is normally insufficient. As a result, many keywords cannot be correctly assigned to appropriate images because they are sparse or missing in the labeled image database. We further propose an enhanced model to deal with this problem. In this model, the keywords annotated to a new image are determined by their similarity at different semantic levels, including the image level, keyword level and concept level. To avoid missing relevant keywords, the model prefers to assign keywords that share the same concepts to the new image. Experimental results show that the proposed models effectively help users annotate images across training data of different quality.


Web Intelligence | 2011

Clustering and Visualizing Geographic Data Using Geo-tree

Che-An Lu; Chin-Hui Chen; Pu-Jen Cheng

Plotting many geographic data points usually clutters a map. In this paper, we propose an approach that provides a summary view of geographic data through efficient clustering. We present a novel data structure, called Geo-tree, which extends the quadtree, and then develop two algorithms that use the Geo-tree to cluster geographic data and to visualize the clusters with a heat-map-like representation. Experimental results show that our approach is very efficient at large scale compared with K-means and hierarchical agglomerative clustering (HAC), while producing comparable clustering results.
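The basic quadtree idea behind this kind of clustering can be sketched as follows: recursively split a bounding box into four quadrants until a cell holds few enough points (or a depth limit is reached), then summarize each non-empty leaf by its centroid and point count, which could drive a heat-map-style display. This is a plain point-region quadtree, not the Geo-tree structure from the paper; `max_points`, `max_depth` and the `Cluster` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    lat: float  # centroid latitude
    lon: float  # centroid longitude
    count: int  # number of points summarized by this leaf


def quadtree_clusters(points, bounds, max_points=4, max_depth=8):
    """Cluster (lat, lon) points by recursive quadrant splitting.

    bounds is (min_lat, min_lon, max_lat, max_lon).
    """
    clusters = []

    def recurse(pts, box, depth):
        if not pts:
            return
        if len(pts) <= max_points or depth == max_depth:
            n = len(pts)
            clusters.append(Cluster(sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n, n))
            return
        min_lat, min_lon, max_lat, max_lon = box
        mid_lat, mid_lon = (min_lat + max_lat) / 2, (min_lon + max_lon) / 2
        quads = [[], [], [], []]
        for lat, lon in pts:
            # Index: bit 1 = northern half, bit 0 = eastern half.
            quads[(2 if lat >= mid_lat else 0) + (1 if lon >= mid_lon else 0)].append((lat, lon))
        sub_boxes = [
            (min_lat, min_lon, mid_lat, mid_lon),  # south-west
            (min_lat, mid_lon, mid_lat, max_lon),  # south-east
            (mid_lat, min_lon, max_lat, mid_lon),  # north-west
            (mid_lat, mid_lon, max_lat, max_lon),  # north-east
        ]
        for quad, sub_box in zip(quads, sub_boxes):
            recurse(quad, sub_box, depth + 1)

    recurse(list(points), bounds, 0)
    return clusters
```

Because each point is visited once per level, the cost grows roughly with the number of points times the tree depth, which helps explain why tree-based summarization scales better than repeatedly recomputing K-means assignments or HAC merges.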


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

To translate or not to translate

Chia-Jung Lee; Chin-Hui Chen; Shao-Hang Kao; Pu-Jen Cheng

Query translation is an important task in cross-language information retrieval (CLIR) that aims to translate queries into the languages used in the documents. The purpose of this paper is to investigate whether query terms need to be translated, which may differ from one term to another: some untranslated terms cause an irreparable drop in performance while others do not. We propose an approach that estimates the translation probability of a query term, which helps decide whether it should be translated. The approach learns regression and classification models based on a rich set of linguistic and statistical properties of the term. Experiments on the NTCIR-4 and NTCIR-5 English-Chinese CLIR tasks demonstrate that the proposed approach significantly improves CLIR performance. We also provide an in-depth analysis of how untranslated out-of-vocabulary (OOV) query terms and the translation quality of non-OOV query terms affect CLIR performance.


Conference on Information and Knowledge Management | 2015

Improving Ranking Consistency for Web Search by Leveraging a Knowledge Base and Search Logs

Jyun-Yu Jiang; Jing Liu; Chin-Yew Lin; Pu-Jen Cheng

In this paper, we propose a new idea called ranking consistency in web search. Relevance ranking is one of the biggest problems in building an effective web search system. Given queries with similar search intents, conventional approaches typically optimize ranking models for each query separately, which leads to inconsistent rankings in modern search engines. Ideally, the search results of different queries with similar search intents should preserve ranking consistency. The aim of this paper is to learn consistent rankings in search results in order to improve relevance ranking in web search. To this end, we propose a re-ranking model that simultaneously improves relevance ranking and ranking consistency by leveraging knowledge bases and search logs. To the best of our knowledge, our work offers the first solution to improving relevance rankings with ranking consistency. Extensive experiments were conducted using the Freebase knowledge base and the large-scale query log of a commercial search engine. The results show that our approach significantly improves both relevance ranking and ranking consistency. Two user surveys on Amazon Mechanical Turk also show that users are sensitive to ranking consistency and prefer the consistent rankings generated by our model.
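One way to quantify how consistently two similar-intent queries rank the same results is Kendall's tau over the documents both rankings share: +1 means identical relative order, -1 means fully reversed. The sketch below computes that; it is a hypothetical measure chosen for illustration and is not necessarily how the paper defines ranking consistency.

```python
from itertools import combinations


def ranking_consistency(ranking_a, ranking_b):
    """Kendall's tau between two rankings, restricted to their shared documents.

    Returns None when fewer than two documents are shared (tau is undefined).
    Assumes each ranking lists a document at most once.
    """
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    shared = [doc for doc in ranking_a if doc in pos_b]  # kept in ranking_a order
    if len(shared) < 2:
        return None
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        # x precedes y in ranking_a; the pair is concordant if ranking_b agrees.
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)
```

Restricting to shared documents ignores results that appear for only one query; a fuller measure might also penalize such disagreements in which documents are returned.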

Collaboration


An overview of Pu-Jen Cheng's collaborations.

Top Co-Authors

Jyun-Yu Jiang, University of California
Ruey-Cheng Chen, National Taiwan University
Chin-Hui Chen, National Taiwan University
Pei-Ying Huang, National Taiwan University
Chia-Jung Lee, University of Massachusetts Amherst
Hsin-Yu Liu, National Taiwan University
Shao-Hang Kao, National Taiwan University
Wei-Yen Day, National Taiwan University
Wen Hsiang Lu, National Cheng Kung University