Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ming-Feng Tsai is active.

Publication


Featured research published by Ming-Feng Tsai.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

FRank: a ranking method with fidelity loss

Ming-Feng Tsai; Tie-Yan Liu; Tao Qin; Hsin-Hsi Chen; Wei-Ying Ma

The ranking problem is becoming important in many fields, especially in information retrieval (IR). Many machine learning techniques have been proposed for the ranking problem, such as RankSVM, RankBoost, and RankNet. Among them, RankNet, which is based on a probabilistic ranking framework, is leading to promising results and has been applied to a commercial Web search engine. In this paper we conduct a further study on the probabilistic ranking framework and provide a novel loss function named fidelity loss for measuring the loss of ranking. The fidelity loss not only inherits effective properties of the probabilistic ranking framework in RankNet, but also possesses new properties that are helpful for ranking: the fidelity loss can reach zero for each document pair, and it has a finite upper bound, which is necessary for conducting query-level normalization. We also propose an algorithm named FRank, based on a generalized additive model, for minimizing the fidelity loss and learning an effective ranking function. We evaluated the proposed algorithm on two datasets: a TREC dataset and a real Web search dataset. The experimental results show that the proposed FRank algorithm outperforms other learning-based ranking methods on both the conventional IR problem and Web search.
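As a hedged illustration of the idea (not code from the paper), the pairwise fidelity loss can be sketched as follows, assuming a RankNet-style pairwise probability derived from score differences; `pair_prob` and all variable names are illustrative choices:

```python
import math

def pair_prob(s_i, s_j):
    # RankNet-style probability that document i should rank above document j,
    # derived from the difference of the model's scores.
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

def fidelity_loss(p_target, p_model):
    # Fidelity loss between the target and model pair probabilities.
    # It reaches zero when p_model == p_target and is bounded above by 1,
    # matching the two properties highlighted in the abstract.
    return 1.0 - (math.sqrt(p_target * p_model)
                  + math.sqrt((1.0 - p_target) * (1.0 - p_model)))
```

With a target probability of 1 (the first document should rank higher), the loss vanishes as the model probability approaches 1 and never exceeds 1, which is what makes query-level normalization feasible.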


Information Processing and Management | 2008

Query-level loss functions for information retrieval

Tao Qin; Xudong Zhang; Ming-Feng Tsai; De-Sheng Wang; Tie-Yan Liu; Hang Li

Many machine learning technologies, such as support vector machines, boosting, and neural networks, have been applied to the ranking problem in information retrieval. However, since these methods were not originally developed for this task, their loss functions do not directly link to the criteria used in the evaluation of ranking. Specifically, the loss functions are defined at the level of documents or document pairs, whereas the evaluation criteria are defined at the level of queries. Therefore, minimizing the loss functions does not necessarily imply enhancing ranking performance. To solve this problem, we propose using query-level loss functions in learning ranking functions. We discuss the basic properties that a query-level loss function should have and propose a query-level loss function based on the cosine similarity between a ranking list and the corresponding ground truth. We further design a coordinate descent algorithm, referred to as RankCosine, which utilizes the proposed loss function to create a generalized additive ranking model. We also discuss whether the loss functions of existing ranking algorithms can be extended to the query level. Experimental results on the datasets of the TREC Web track, OHSUMED, and a commercial Web search engine show that with the proposed query-level loss function we can significantly improve ranking accuracy. Furthermore, we found that it is difficult to extend document-level loss functions to query-level loss functions.
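The cosine-based query-level loss described above can be sketched roughly as follows; the 1/2 scaling and the use of plain score vectors are assumptions for illustration, not the paper's exact formulation:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length score vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def query_level_cosine_loss(ground_truth, scores):
    # Query-level loss: one value per query, computed over the whole
    # ranking list rather than per document or per document pair.
    # Zero when the predicted scores are proportional to the ground truth.
    return 0.5 * (1.0 - cosine(ground_truth, scores))
```

Because the loss is one number per query regardless of list length, every query contributes equally to training, which is the query-level normalization property the abstract argues for.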


Asia Information Retrieval Symposium | 2006

Query expansion with ConceptNet and WordNet: an intrinsic comparison

Ming-Hung Hsu; Ming-Feng Tsai; Hsin-Hsi Chen

This paper compares the use of ConceptNet and WordNet in query expansion. Spreading activation selects candidate terms for query expansion from these two resources. Three measures, namely discrimination ability, concept diversity, and retrieval performance, are used for the comparison. The topics and document collections of the ad hoc track of TREC-6, TREC-7, and TREC-8 are adopted in the experiments. The results show that ConceptNet and WordNet are complementary: queries expanded with WordNet have higher discrimination ability, whereas queries expanded with ConceptNet have higher concept diversity. Queries expanded by selecting candidate terms from both ConceptNet and WordNet outperform both queries without expansion and queries expanded with a single resource.
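A minimal, hypothetical sketch of spreading activation for selecting expansion candidates; the toy relation graph, decay factor, and hop limit are all illustrative assumptions, not the paper's configuration:

```python
def spread_activation(graph, seeds, decay=0.5, hops=2):
    # Toy spreading activation over a term-relation graph: each seed term
    # starts with activation 1.0, and activation propagates to neighbours
    # with a multiplicative decay at each hop.  graph maps a term to a list
    # of related terms (e.g. drawn from ConceptNet or WordNet relations).
    activation = {t: 1.0 for t in seeds}
    frontier = dict(activation)
    for _ in range(hops):
        nxt = {}
        for term, act in frontier.items():
            for nb in graph.get(term, []):
                nxt[nb] = max(nxt.get(nb, 0.0), act * decay)
        for term, act in nxt.items():
            if act > activation.get(term, 0.0):
                activation[term] = act
        frontier = nxt
    # Candidate expansion terms: activated terms that are not seed terms,
    # ordered by decreasing activation.
    return sorted((t for t in activation if t not in seeds),
                  key=lambda t: -activation[t])

relations = {"car": ["vehicle", "wheel"], "vehicle": ["transport"]}
candidates = spread_activation(relations, {"car"})
```

Terms one hop away receive higher activation than terms two hops away, so nearer relations are preferred as expansion candidates.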


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

A study of learning a merge model for multilingual information retrieval

Ming-Feng Tsai; Yu-Ting Wang; Hsin-Hsi Chen

This paper proposes a learning approach for the merging process in multilingual information retrieval (MLIR). To conduct the learning approach, we also present a large number of features that may influence the MLIR merging process; these features are mainly extracted from three levels: query, document, and translation. After the feature extraction, we then use the FRank ranking algorithm to construct a merge model; to our knowledge, this practice is the first attempt to use a learning-based ranking algorithm to construct a merge model for MLIR merging. In our experiments, three test collections for the task of cross-lingual information retrieval (CLIR) in NTCIR-3, 4, and 5 are employed to assess the performance of the proposed method; moreover, several merging methods are also carried out for comparison, including traditional merging methods, the 2-step merging strategy, and a merging method based on logistic regression. The experimental results show that our method can significantly improve merging quality on two different types of datasets. Beyond its effectiveness, through the merge model generated by FRank, our method can further identify the key factors that influence the merging process; this information may provide more insight into MLIR merging.


Information Retrieval | 2013

Mining subtopics from different aspects for diversifying search results

Chieh-Jen Wang; Yung-Wei Lin; Ming-Feng Tsai; Hsin-Hsi Chen

User queries to the Web tend to have more than one interpretation due to their ambiguity and other characteristics. How to diversify the ranking results to meet users' various potential information needs has attracted considerable attention recently. This paper aims at mining the subtopics of a query, either indirectly from the returned results of retrieval systems or directly from the query itself, to diversify the search results. For the indirect subtopic mining approach, clustering the retrieval results and summarizing the content of the clusters is investigated; in addition, labeling topic categories and concept tags on each returned document is explored. For the direct subtopic mining approach, several external resources, such as Wikipedia, the Open Directory Project, search query logs, and the related search services of search engines, are consulted. Furthermore, we propose a diversified retrieval model to rank documents with respect to the mined subtopics, balancing relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC 2009 and TREC 2010 Web Track diversity tasks. Experimental results show that the proposed subtopic-based diversification algorithm significantly outperforms state-of-the-art models in these tasks. The best performance our algorithm achieves is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on TREC 2009, as well as α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on TREC 2010. The results indicate that subtopic mining with up-to-date user search query logs is the most effective way to generate the subtopics of a query, and that the proposed subtopic-based diversification algorithm can select documents covering various subtopics.
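A greedy subtopic-aware re-ranking of this general flavour (in the spirit of standard relevance/diversity trade-off models, not the paper's exact algorithm) can be sketched as follows; `lam`, the scores, and the coverage sets are illustrative assumptions:

```python
def diversify(docs, rel, coverage, k, lam=0.5):
    # Greedy re-ranking that balances relevance and subtopic diversity:
    # at each step, pick the document with the best combination of its
    # relevance score and its coverage of subtopics not yet covered.
    # rel[d] is a relevance score; coverage[d] is the set of subtopics
    # that document d covers.
    selected, covered = [], set()
    remaining = list(docs)
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda d: lam * rel[d]
                                 + (1 - lam) * len(coverage[d] - covered))
        selected.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return selected
```

A slightly less relevant document covering a new subtopic can be promoted over a more relevant document whose subtopic is already covered, which is the behaviour the diversity metrics (α-nDCG, IA-P) reward.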


Information Processing and Management | 2011

Learning a merge model for multilingual information retrieval

Ming-Feng Tsai; Hsin-Hsi Chen; Yu-Ting Wang

This paper proposes a learning approach for the merging process in multilingual information retrieval (MLIR). To conduct the learning approach, we present a number of features that may influence the MLIR merging process. These features are mainly extracted from three levels: query, document, and translation. After the feature extraction, we then use the FRank ranking algorithm to construct a merge model. To the best of our knowledge, this practice is the first attempt to use a learning-based ranking algorithm to construct a merge model for MLIR merging. In our experiments, three test collections for the task of cross-lingual information retrieval (CLIR) in NTCIR-3, 4, and 5 are employed to assess the performance of the proposed method. Moreover, several merging methods are also carried out for comparison, including traditional merging methods, the 2-step merging strategy, and a merging method based on logistic regression. The experimental results show that the proposed method can significantly improve merging quality on two different types of datasets. Beyond its effectiveness, through the merge model generated by FRank, our method can further identify the key factors that influence the merging process. This information may provide more insight into MLIR merging.


European Conference on Information Retrieval | 2004

Identification of Relevant and Novel Sentences Using Reference Corpus

Hsin-Hsi Chen; Ming-Feng Tsai; Ming-Hung Hsu

The major challenge in determining the relevance and novelty of sentences is the small amount of information available for similarity computation among sentences. An information retrieval (IR) approach with a reference corpus is proposed: a sentence is treated as a query to a reference corpus, and similarity is measured in terms of the weighting vectors of the document lists ranked by IR systems. Two sentences are regarded as similar if IR systems return similar document lists for them. A dynamic threshold-setting method is also presented. Besides IR with a reference corpus, we also use IR systems to retrieve sentences directly from the given sentences. The corpus-based approach with dynamic thresholds outperforms the direct retrieval approach. The average F-measures of relevance and novelty detection using the Okapi system were 0.212 and 0.207, i.e., 57.14% and 58.64% of human performance, respectively.
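The ranked-list representation described above can be sketched as follows; the reciprocal-rank weighting and all names are illustrative assumptions, not the paper's exact weighting scheme:

```python
import math

def ranked_list_vector(ranked_doc_ids, n):
    # Represent a sentence by the document list an IR system returns when
    # the sentence is submitted as a query: each of the top-n retrieved
    # documents gets a weight that decreases with its rank (a simple
    # reciprocal-rank weighting, chosen here for illustration).
    return {doc: 1.0 / (rank + 1)
            for rank, doc in enumerate(ranked_doc_ids[:n])}

def list_similarity(vec_a, vec_b):
    # Cosine similarity between two sparse ranked-list vectors: two
    # sentences count as similar when IR returns similar document lists.
    dot = sum(w * vec_b.get(doc, 0.0) for doc, w in vec_a.items())
    na = math.sqrt(sum(w * w for w in vec_a.values()))
    nb = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Sentences that retrieve overlapping document lists score close to 1, while sentences retrieving disjoint lists score 0, which is the signal the relevance and novelty detectors threshold on.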


Asia Information Retrieval Symposium | 2004

Multilingual relevant sentence detection using reference corpus

Ming-Hung Hsu; Ming-Feng Tsai; Hsin-Hsi Chen

IR with a reference corpus is one approach to relevant sentence detection: the result of IR is taken as the representation of the query (sentence). Lack of information and language difference are two major issues in relevance detection among multilingual sentences. This paper uses a parallel corpus for information expansion and translation, and introduces different representations, i.e., sentence-vector, document-vector, and term-vector. Both a sentence-aligned and a document-aligned corpus, i.e., the Sinorama corpus and the HKSAR corpus, are used. The factors of alignment granularity, corpus domain, corpus size, language basis, and term selection strategy are addressed. The experimental results show that an MRR of 0.839 is achieved for similarity computation between multilingual sentences when a larger, finer-grained parallel corpus of the same domain as the test data is adopted. Generally speaking, the sentence-vector approach is superior to the term-vector approach when a sentence-aligned corpus is employed, and the document-vector approach is better than the term-vector approach when a document-aligned corpus is used. Considering the language issue, a Chinese basis is more suitable than an English basis in our experiments. We also employ the translated TREC novelty test bed to evaluate the overall performance. The experimental results show that multilingual relevance detection achieves 80% of the performance of monolingual relevance detection, indicating the feasibility of the IR-with-reference-corpus approach to relevant sentence detection.


International Conference on Machine Learning | 2007

Learning to rank: from pairwise approach to listwise approach

Zhe Cao; Tao Qin; Tie-Yan Liu; Ming-Feng Tsai; Hang Li
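This paper introduces the ListNet approach. As a hedged sketch of the core idea only (a listwise loss defined via top-one probabilities, not the paper's full training procedure), and with all names illustrative:

```python
import math

def top_one_probs(scores):
    # Softmax over the list's scores: the probability of each document
    # being ranked first, computed in a numerically stable way.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_loss(true_scores, model_scores):
    # Cross entropy between the two top-one distributions: a loss defined
    # on the whole ranking list rather than on document pairs, which is
    # the shift from the pairwise to the listwise approach.
    p = top_one_probs(true_scores)
    q = top_one_probs(model_scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

Because the loss compares whole-list distributions, a model that scores the list in the correct order incurs a lower loss than one that reverses it, without ever enumerating document pairs.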


Asia Information Retrieval Symposium | 2008

Combining WordNet and ConceptNet for automatic query expansion: a learning approach

Ming-Hung Hsu; Ming-Feng Tsai; Hsin-Hsi Chen

Collaboration


Dive into Ming-Feng Tsai's collaborations.

Top Co-Authors

Hsin-Hsi Chen (National Taiwan University)
Ming-Hung Hsu (National Taiwan University)
Chieh-Jen Wang (National Taiwan University)
Yu-Ting Wang (National Taiwan University)
Yung-Wei Lin (National Taiwan University)
Chih Lee (National Taiwan University)
Chun-Yuan Teng (National Taiwan University)