King-Lup Liu
DePaul University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by King-Lup Liu.
ACM Computing Surveys | 2002
Weiyi Meng; Clement T. Yu; King-Lup Liu
Frequently a users information needs are stored in the databases of multiple search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents from the returned results. To support unified access to multiple search engines, a metasearch engine can be constructed. When a metasearch engine receives a query from a user, it invokes the underlying search engines to retrieve useful information for the user. Metasearch engines have other benefits as a search tool such as increasing the search coverage of the Web and improving the scalability of the search. In this article, we survey techniques that have been proposed to tackle several underlying challenges for building a good metasearch engine. Among the main challenges, the database selection problem is to identify search engines that are likely to return useful documents to a given query. The document selection problem is to determine what documents to retrieve from each identified search engine. The result merging problem is to combine the documents returned from multiple search engines. We will also point out some problems that need to be further researched.
international conference on data engineering | 1999
Weiyi Meng; King-Lup Liu; Clement T. Yu; Wensheng Wu; N. Naphtali Rishe
In this paper, we present a statistical method to estimate the usefulness of a search engine for any given query. The estimates can be used by a metasearch engine to choose local search engines to invoke. For a given query, the usefulness of a search engine in this paper is defined to be a combination of the number of documents in the search engine that are sufficiently similar to the query and the average similarity of these documents. Experimental results indicate that the proposed estimation method is quite accurate.
conference on information and knowledge management | 2001
King-Lup Liu; Adrain Santoso; Clement T. Yu; Weiyi Meng
Given a large number of search engines on the Internet, it is difficult for a person to determine which search engines could serve his/her information needs. A common solution is to construct a metasearch engine on top of the search engines. Upon receiving a user query, the metasearch engine sends it to those underlying search engines which are likely to return the desired documents for the query. The selection algorithm used by a metasearch engine to determine whether a search engine should be sent the query typically makes the decision based on the search-engine representative, which contains characteristic information about the database of a search engine. However, an underlying search engine may not be willing to provide the needed information to the metasearch engine. This paper shows that the needed information can be estimated from an uncooperative search engine with good accuracy. Two pieces of information which permit accurate search engine selection are the number of documents indexed by the search engine and the maximum weight of each term. In this paper, we present techniques for the estimation of these two pieces of information.
international conference on management of data | 2001
Clement T. Yu; Weiyi Meng; Wensheng Wu; King-Lup Liu
Linkages among documents have a significant impact on the importance of documents, as it can be argued that important documents are pointed to by many documents or by other important documents. Metasearch engines can be used to facilitate ordinary users for retrieving information from multiple local sources (text databases). There is a search engine associated with each database. In a large-scale metasearch engine, the contents of each local database is represented by a representative. Each user query is evaluated against he set of representatives of all databases in order to determine the appropriate databases (search engines) to search (invoke) In previous word, the linkage information between documents has not been utilized in determining the appropriate databases to search. In this paper, such information is employed to determine the degree of relevance of a document with respect to a given query. Specifically, the importance (rank) of each document as determined by the linkages is integrated in each database representative to facilitate the selection of databases for each given query. We establish a necessary and sufficient condition to rank databases optimally, while incorporating the linkage information. A method is provided to estimate the desired quantities stated in the necessary and sufficient condition. The estimation method runs in time linearly proportional to the number of query terms. Experimental results are provided to demonstrate the high retrieval effectiveness of the method.
conference on information and knowledge management | 1999
Clement T. Yu; Weiyi Meng; King-Lup Liu; Wensheng Wu; Naphtali Rishe
Metasearch engines can be used to facilitate ordinary users for retrieving information from multiple local sources (text databases). In a metasearch engine, the contents of each local database is represented by a representative. Each user query is evaluated against the set of representatives of all databases in order to determine the appropriate databases to search. When the number of databases is very large, say in the order of tens of thousands or more, then a traditional metasearch engine may become inefficient as each query needs to be evaluated against too many database representatives. Furthermore, the storage requirement on the site containing the metasearch engine can be very large. In this paper, we propose to use a hierarchy of database representatives to improve the efficiency. We provide an algorithm to search the hierarchy. We show that the retrieval effectiveness of our algorithm is the same as that of evaluating the user query against all database representatives. We also show that our algorithm is efficient. In addition, we propose an alternative way of allocating representatives to sites so that the storage burden on the site containing the metasearch engine is much reduced.
web information systems engineering | 2005
Yiyao Lu; Weiyi Meng; Liangcai Shu; Clement T. Yu; King-Lup Liu
Result merging is a key component in a metasearch engine. Once the results from various search engines are collected, the metasearch system merges them into a single ranked list. The effectiveness of a metasearch engine is closely related to the result merging algorithm it employs. In this paper, we investigate a variety of resulting merging algorithms based on a wide range of available information about the retrieved results, from their local ranks, their titles and snippets, to the full documents of these results. The effectiveness of these algorithms is then compared experimentally based on 50 queries from the TREC Web track and 10 most popular general-purpose search engines. Our experiments yield two important results. First, simple result merging strategies can outperform Google. Second, merging based on the titles and snippets of retrieved results can outperform that based on the full documents.
IEEE Transactions on Knowledge and Data Engineering | 2002
Clement T. Yu; King-Lup Liu; Weiyi Meng; Zonghuan Wu; Naphtali Rishe
This paper presents a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, the contents of databases are indicated approximately by database representatives. Databases are ranked using their representatives with respect to the given query. We provide a necessary and sufficient condition to rank the databases optimally. In order to satisfy this condition, we provide three estimation methods. One estimation method is intended for short queries; the other two are for all queries. Second, we provide an algorithm, OptDocRetrv, to retrieve documents from the databases according to their rank and in a particular way. We show that if the databases containing the n most similar documents for a given query are ranked ahead of other databases, our methodology will guarantee the retrieval of the n most similar documents for the query. When the number of databases is large, we propose to organize database representatives into a hierarchy and employ a best-search algorithm to search the hierarchy. It is shown that the effectiveness of the best-search algorithm is the same as that of evaluating the user query against all database representatives.
IEEE Transactions on Knowledge and Data Engineering | 2002
King-Lup Liu; Clement T. Yu; Weiyi Meng; Wensheng Wu; Naphtali Rishe
Searching desired data on the Internet is one of the most common ways the Internet is used. No single search engine is capable of searching all data on the Internet. The approach that provides an interface for invoking multiple search engines for each user query has the potential to satisfy more users. When the number of search engines under the interface is large, invoking all search engines for each query is often not cost effective because it creates unnecessary network traffic by sending the query to a large number of useless search engines and searching these useless search engines wastes local resources. The problem can be overcome if the usefulness of every search engine with respect to each query can be predicted. We present a statistical method to estimate the usefulness of a search engine for any given query. For a given query, the usefulness of a search engine in this paper is defined to be a combination of the number of documents in the search engine that are sufficiently similar to the query and the average similarity of these documents. Experimental results indicate that our estimation method is much more accurate than existing methods.
Proceedings IEEE Forum on Research and Technology Advances in Digital Libraries | 1999
Clement T. Yu; King-Lup Liu; Wensheng Wu; Weiyi Meng; Naphtali Rishe
We present a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, databases are ranked in a certain order. Next, documents are retrieved from the databases according to the order and in a particular way. If the databases containing the n most similar documents for a given query can be ranked ahead of other databases, the methodology will guarantee the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. Then, a number of strategies are presented to retrieve documents from the identified databases. Experimental results are given to illustrate the relative performance of different strategies.
cooperative information systems | 1999
Weiyi Meng; Clement T. Yu; King-Lup Liu
As the number of text retrieval systems (search engines) grows rapidly on the World Wide Web, there is an increasing need to build search brokers (metasearch engines) on top of them. Often, the task of building an effective and efficient metasearch engine is hindered by the heterogeneities among the underlying local search engines. We first analyze the impact of various heterogeneities on building a metasearch engine. We then present some techniques that can be used to detect the most prominent heterogeneities among multiple search engines. Applications of utilizing the detected heterogeneities in building better metasearch engines are provided.