Shiren Ye
National University of Singapore
Publications
Featured research published by Shiren Ye.
Information Processing and Management | 2007
Shiren Ye; Tat-Seng Chua; Min-Yen Kan; Long Qiu
We argue that the quality of a summary can be evaluated by how many concepts in the original document(s) are preserved after summarization. Here, a concept refers to an abstract or concrete entity, or its action, often expressed by diverse terms in text. Summary generation can thus be considered an optimization problem of selecting a set of sentences with minimal answer loss. In this paper, we propose a document concept lattice that indexes the hierarchy of local topics tied to a set of frequent concepts and the corresponding sentences containing these topics. The local topics specify the promising sub-spaces related to the selected concepts and sentences. Based on this lattice, the summary is an optimized selection of a set of distinct and salient local topics that leads to maximal coverage of concepts within the given number of sentences. Our summarizer based on the concept lattice has demonstrated competitive performance in the Document Understanding Conference 2005 and 2006 evaluations as well as in follow-on tests.
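The idea of choosing a fixed number of sentences to maximize concept coverage can be illustrated with a small sketch. This is not the document concept lattice described in the abstract; it is a plain greedy maximum-coverage heuristic used as a stand-in, and the function name `select_sentences`, the `budget` parameter, and the toy concept sets are all assumptions made for illustration.

```python
def select_sentences(sentence_concepts, budget):
    """Greedily pick up to `budget` sentences that cover the most new concepts.

    `sentence_concepts` is a list of sets, one set of concepts per sentence.
    This is a simple greedy stand-in, not the lattice-based method.
    """
    covered = set()
    chosen = []
    remaining = dict(enumerate(sentence_concepts))
    while remaining and len(chosen) < budget:
        # Choose the sentence adding the most uncovered concepts;
        # ties go to the sentence with the lowest index.
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining sentence adds anything new
        covered |= gain
        chosen.append(best)
    return chosen, covered


# Hypothetical toy input: each sentence reduced to its set of concepts.
sentences = [
    {"earthquake", "city"},
    {"earthquake", "rescue", "casualties"},
    {"city", "rescue"},
    {"aid", "government"},
]
chosen, covered = select_sentences(sentences, budget=2)
print(chosen, sorted(covered))
```

With a budget of two sentences, the sketch picks the sentence with the largest concept set first and then the one that adds the most concepts not yet covered, which mirrors the coverage objective stated in the abstract in its simplest form.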
International Conference on Computational Linguistics | 2002
Shiren Ye; Tat-Seng Chua; Liu Jimin
Chinese NE (Named Entity) recognition is a difficult problem because of the uncertainty in word segmentation and the flexibility of language structure. This paper proposes the use of a rationality model in a multi-agent framework to tackle this problem. We employ a greedy strategy and use the NE rationality model to evaluate and detect all possible NEs in the text. We then treat the process of selecting the best possible NEs as a multi-agent negotiation problem. The resulting system is robust and is able to handle different types of NEs effectively. Our test on the MET-2 test corpus indicates that our system achieves F1 values above 92% on all NE types.
International Joint Conference on Natural Language Processing | 2009
Shiren Ye; Tat-Seng Chua; Jie Lu
Wikipedia provides a wealth of knowledge, where the first sentence, the infobox (and relevant sentences), and even the entire document of a wiki article can be considered diverse versions of summaries (definitions) of the target topic. We explore how to generate a series of summaries of various lengths based on them. To obtain more reliable associations between sentences, we introduce wiki concepts according to the internal links in Wikipedia. In addition, we develop an extended document concept lattice model to combine wiki concepts and non-textual features such as the outline and infobox. The model can concatenate representative sentences from non-overlapping salient local topics for summary generation. We test our model on our annotated wiki articles, whose topics come from the TREC-QA 2004-2006 evaluations. The results show that the model is effective in summarization and definition QA.
Web Intelligence | 2003
Shiren Ye; Tat-Seng Chua; Jeremy R. Kei
One of the most frequent Web surfing tasks is to search for names of persons and organizations. Such names are often common and non-unique, so a single name may be mapped to several entities. We describe a methodology to cluster the Web pages returned by the search engine so that pages belonging to different entities are clustered into different groups. The algorithm uses a combination of named entities, link-based and structure-based information as features to partition the document set into direct and indirect pages using a decision model. It then uses the distinct direct pages as seeds to cluster the document set into different clusters. The algorithm has been found to be effective for Web-based applications.
Web Intelligence | 2004
Shiren Ye; Tat-Seng Chua
This paper presents an automated approach to detect and partition data objects or product descriptions from complex Web pages. First, we derive the common page structure by comparing similar pages, and then identify the data region covering the descriptions of data objects. Second, we partition the nodes belonging to different data objects in the data region and construct self-explanatory XML output files. The experiments indicate that our technique is effective.
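The first step, deriving the common structure by comparing similar pages, can be sketched roughly as follows. The abstract does not say how pages are compared, so this example assumes a flat token sequence per page and uses the standard library's `difflib.SequenceMatcher` as a stand-in; the function name `varying_spans` and the two toy pages are hypothetical.

```python
from difflib import SequenceMatcher


def varying_spans(page_a, page_b):
    """Return the token spans that differ between two similar pages.

    Tokens shared by both pages are treated as the common template;
    the differing spans are candidates for the data region.
    This flat diff is an assumption, not the paper's structural method.
    """
    matcher = SequenceMatcher(a=page_a, b=page_b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((page_a[i1:i2], page_b[j1:j2]))
    return spans


# Hypothetical pages from the same site, already split into tokens.
page_a = ["<html>", "<h1>", "Shop", "</h1>", "<td>", "Camera", "</td>",
          "<td>", "$199", "</td>", "</html>"]
page_b = ["<html>", "<h1>", "Shop", "</h1>", "<td>", "Laptop", "</td>",
          "<td>", "$899", "</td>", "</html>"]

for left, right in varying_spans(page_a, page_b):
    print(left, right)
```

A real system would compare tree-structured pages rather than flat token lists, but the sketch shows the underlying intuition: what stays constant across similar pages is layout, and what changes is likely data.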
International Journal on Semantic Web and Information Systems | 2007
Shiren Ye; Tat-Seng Chua
Automatically Integrating Heterogeneous Ontologies from Structured Web Pages. Introduction: Ontology integration is one of the most important research topics in knowledge integration and knowledge reconciliation. It has been widely applied in knowledge engineering, database and data warehouse, e-commerce as well as semantic Web (Dill, Eiron, Gibson, Gruhl & Guha, 2003; Maedche, 2002). As more and more organizations in industry, academia and government publish their information (such as services and products) on the Web, how to add semantic annotation for machine-access and thus enhance the interoperability among these autonomous, distributed, and semantically heterogeneous information sources becomes more urgent. Usually, it is feasible to utilize weighted string-matching method to handle the explicit ...
International Conference on Asian Digital Libraries | 2002
Shiren Ye; Tat-Seng Chua; Jimin Liu; Jeremy R. Kei
Information extraction on the Web permits users to retrieve specific information on a person or an organization. As names are non-unique, the same name may be mapped to multiple entities. The aim of this paper is to describe an algorithm to cluster Web pages returned by search engines so that pages belonging to different entities are clustered into different groups. The algorithm uses named entities as the features to divide the document set into direct and indirect pages. It then uses distinct direct pages as seeds of clusters to group indirect pages into different clusters. The algorithm has been found to be effective for Web-based applications.
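The seeding step described above can be sketched in a few lines. The abstract does not give the similarity measure or any threshold, so this example assumes Jaccard overlap between named-entity sets and a hypothetical cutoff; `cluster_by_seeds`, `threshold`, and the sample data are illustrative names, not part of the published algorithm.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_seeds(seeds, indirect_pages, threshold=0.1):
    """Assign each indirect page to the most similar seed (direct page).

    `seeds` and `indirect_pages` map an id to a set of named entities.
    Pages below the assumed similarity `threshold` stay unassigned.
    """
    clusters = {name: [] for name in seeds}
    unassigned = []
    for page_id, entities in indirect_pages.items():
        best = max(seeds, key=lambda s: jaccard(entities, seeds[s]))
        if jaccard(entities, seeds[best]) >= threshold:
            clusters[best].append(page_id)
        else:
            unassigned.append(page_id)
    return clusters, unassigned


# Hypothetical direct pages for two people sharing one name.
seeds = {
    "professor": {"NUS", "Singapore", "ACL"},
    "athlete": {"Olympics", "Beijing", "swimming"},
}
indirect = {
    "p1": {"NUS", "ACL", "SIGIR"},
    "p2": {"Olympics", "swimming"},
    "p3": {"recipe", "Paris"},
}
clusters, unassigned = cluster_by_seeds(seeds, indirect)
print(clusters, unassigned)
```

Keeping low-similarity pages unassigned, rather than forcing them into the nearest cluster, is a design choice of this sketch; it avoids attaching unrelated pages to an entity on the strength of no shared evidence.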
Archive | 2005
Shiren Ye; Long Qiu; Tat-Seng Chua; Min-Yen Kan
IEEE Transactions on Knowledge and Data Engineering | 2006
Shiren Ye; Tat-Seng Chua
Workshop on Materialized | 2007
Ziheng Lin; Tat-Seng Chua; Min-Yen Kan; Wee Sun Lee; Long Qiu; Shiren Ye