Jianhan Zhu | Researchain

Publication

Featured researches published by Jianhan Zhu.

wissensmanagement | 2005

ESpotter: adaptive named entity recognition for web browsing

Jianhan Zhu; Victoria S. Uren; Enrico Motta

Browsing constitutes an important part of the user information searching process on the Web. In this paper, we present a browser plug-in called ESpotter, which recognizes entities of various types on Web pages and highlights them according to their types to assist user browsing. ESpotter uses a range of standard named entity recognition techniques. In addition, a key new feature of ESpotter is that it addresses the problem of multiple domains on the Web by adapting lexicon and patterns to these domains.

ACM Transactions on Internet Technology | 2004

PageCluster: Mining conceptual link hierarchies from Web log files for adaptive Web site navigation

Jianhan Zhu; Jun Hong; John G. Hughes

User traversals on hyperlinks between Web pages can reveal semantic relationships between these pages. We use user traversals on hyperlinks as weights to measure semantic relationships between Web pages. On the basis of these weights, we propose a novel method to put Web pages on a Web site onto different conceptual levels in a link hierarchy. We develop a clustering algorithm called PageCluster, which clusters conceptually-related pages on each conceptual level of the link hierarchy based on their in-link and out-link similarities. Clusters are then used to construct a conceptual link hierarchy, which is visualized in a prototype called Online Navigation Explorer (ONE) for adaptive Web site navigation. Our experiments show that our method can put Web pages onto conceptual levels of a link hierarchy more accurately than both the breadth-first search method and the shortest-weighted-path method, and PageCluster can cluster conceptually-related pages more accurately than the bibliographic analysis method. Our user study also shows that the conceptual link hierarchy visualized in ONE can help users find information more effectively and efficiently as the task of finding information becomes less specific and involves more Web pages on multiple conceptual levels.

Knowledge and Information Systems | 2010

Integrating multiple document features in language models for expert finding

Jianhan Zhu; Xiangji Huang; Dawei Song; Stefan M. Rüger

We argue that expert finding is sensitive to multiple document features in an organizational intranet. These document features include multiple levels of associations between experts and a query topic from sentence, paragraph, up to document levels, document authority information such as the PageRank, indegree, and URL length of documents, and internal document structures that indicate the experts’ relationship with the content of documents. Our assumption is that expert finding can largely benefit from the incorporation of these document features. However, existing language modeling approaches for expert finding have not sufficiently taken into account these document features. We propose a novel language modeling approach, which integrates multiple document features, for expert finding. Our experiments on two large scale TREC Enterprise Track datasets, i.e., the W3C and CSIRO datasets, demonstrate that the natures of the two organizational intranets and two types of expert finding tasks, i.e., key contact finding for CSIRO and knowledgeable person finding for W3C, influence the effectiveness of different document features. Our work provides insights into which document features work for certain types of expert finding tasks, and helps design expert finding strategies that are effective for different scenarios. Our main contribution is to develop an effective formal method for modeling multiple document features in expert finding, and conduct a systematic investigation of their effects. It is worth noting that our novel approach achieves better results in terms of MAP than previous language model based approaches and the best automatic runs in both the TREC2006 and TREC2007 expert search tasks, respectively.

web intelligence | 2005

Mining Web Data for Competency Management

Jianhan Zhu; Alexandre Leopoldo Gonçalves; Victoria S. Uren; Enrico Motta; Roberto Carlos dos Santos Pacheco

We present CORDER (Community Relation Discovery by named Entity Recognition), an unsupervised machine learning algorithm that exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments.

european semantic web conference | 2006

An infrastructure for acquiring high quality semantic metadata

Yuangui Lei; Marta Sabou; Vanessa Lopez; Jianhan Zhu; Victoria S. Uren; Enrico Motta

Because metadata that underlies semantic web applications is gathered from distributed and heterogeneous data sources, it is important to ensure its quality (i.e., reduce duplicates, spelling errors, ambiguities). However, current infrastructures that acquire and integrate semantic data have only marginally addressed the issue of metadata quality. In this paper we present our metadata acquisition infrastructure, ASDI, which pays special attention to ensuring that high quality metadata is derived. Central to the architecture of ASDI is a verification engine that relies on several semantic web tools to check the quality of the derived data. We tested our prototype in the context of building a semantic web portal for our lab, KMi. An experimental evaluation comparing the automatically extracted data against manual annotations indicates that the verification engine enhances the quality of the extracted semantic metadata.

web age information management | 2006

LRD: latent relation discovery for vector space expansion and information retrieval

Alexandre Leopoldo Gonçalves; Jianhan Zhu; Dawei Song; Victoria S. Uren; Roberto Carlos dos Santos Pacheco

In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD discovers other entities closely related to the target effectively and efficiently. With respect to such relatedness, a measure of relation strength between entities is defined. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query-based IR on documents and for clustering documents in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query-based IR show that our LRD method performed significantly better than the traditional vector space model and five other standard statistical methods for vector expansion.
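
The abstract does not give the formula for relation strength, so the following is only a minimal sketch of the general idea of scoring entity pairs by co-occurrence. It assumes entities have already been extracted per document, and it uses the Dice coefficient as a stand-in measure; the function names `cooccurrence_strength` and `related_entities` and the sample data are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_strength(documents):
    """Score each entity pair by the Dice coefficient of document co-occurrence.

    Assumption: `documents` is a list of entity lists, one per document.
    The Dice coefficient is a stand-in, not the measure defined by LRD.
    """
    entity_freq = Counter()
    pair_freq = Counter()
    for entities in documents:
        unique = sorted(set(entities))  # count each entity once per document
        entity_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return {
        pair: 2 * n / (entity_freq[pair[0]] + entity_freq[pair[1]])
        for pair, n in pair_freq.items()
    }


def related_entities(strengths, target):
    """Return entities that co-occur with `target`, strongest first."""
    related = []
    for (a, b), score in strengths.items():
        if a == target:
            related.append((b, score))
        elif b == target:
            related.append((a, score))
    return sorted(related, key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpus: each inner list holds the entities found in one document.
documents = [
    ["Alice", "KMi", "ProjectX"],
    ["Alice", "KMi"],
    ["Bob", "ProjectX"],
]
strengths = cooccurrence_strength(documents)
alice_related = related_entities(strengths, "Alice")
```

In this toy corpus, "KMi" ranks above "ProjectX" for the target "Alice" because the two appear together in every document that mentions either of them.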

international conference on user modeling, adaptation, and personalization | 2001

Using Markov Chains for Structural Link Prediction in Adaptive Web Sites

Jianhan Zhu

My research investigates using Markov chains for link prediction, and using the transition matrix derived from them to acquire structural knowledge about Web sites. The structural knowledge is acquired in the form of three types of clusters: hierarchical clusters, reference clusters, and grid clusters. The predicted Web pages and acquired Web structures are further integrated to assist Web users in navigating the Web site.
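
As a rough illustration of link prediction with a Markov chain, the sketch below estimates a first-order transition matrix from page sequences and ranks the most likely next pages. It assumes user sessions are already available as ordered lists of page names; the function names and sample sessions are hypothetical, and the clustering step described in the abstract is not covered.

```python
from collections import Counter, defaultdict


def build_transition_matrix(sessions):
    """Estimate first-order transition probabilities from page sequences."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each observed step from one page to the page visited next.
        for current_page, next_page in zip(session, session[1:]):
            counts[current_page][next_page] += 1
    matrix = {}
    for page, followers in counts.items():
        total = sum(followers.values())
        matrix[page] = {nxt: n / total for nxt, n in followers.items()}
    return matrix


def predict_next(matrix, page, top_k=1):
    """Return up to `top_k` most probable next pages (ties broken by name)."""
    followers = matrix.get(page, {})
    ranked = sorted(followers.items(), key=lambda item: (-item[1], item[0]))
    return [nxt for nxt, _ in ranked[:top_k]]


# Hypothetical sessions, each an ordered list of visited pages.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "reviews"],
    ["home", "about"],
    ["products", "cart"],
]
matrix = build_transition_matrix(sessions)
next_from_home = predict_next(matrix, "home")
next_from_products = predict_next(matrix, "products", top_k=2)
```

A page with no observed outgoing steps, such as "cart" here, yields an empty prediction list rather than an error.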

Focused Access to XML Documents | 2008

Integrating Document Features for Entity Ranking

Jianhan Zhu; Dawei Song; Stefan M. Rüger

The Knowledge Media Institute of the Open University participated in the entity ranking and entity list completion tasks of the Entity Ranking Track in INEX 2007. In both the entity ranking and entity list completion tasks, we have considered document features in addition to a basic document content based relevance model. These document features include categorizations of documents, relevance of category names to the query, and hierarchical relations between categories. Furthermore, based on our TREC2006 and 2007 expert search approach, we applied a co-occurrence based entity association discovery model to the two tasks based on the assumption that relevant entities often co-occur with query terms or given relevant entities in documents. Our initial experimental results show that, by considering the predefined category, its children and grandchildren in the document content based relevance model, the performance of our entity ranking approach can be significantly improved. Consideration of the predefined category's parents, a category name based relevance model, and the co-occurrence model is not shown to be helpful in entity ranking and list completion, respectively.

acm conference on hypertext | 2001

PageRate: counting Web users' votes

Jianhan Zhu; Jun Hong; John G. Hughes

We propose a PageRate method to give Web pages on a Web site ratings based on the Web link structure and user usage data, which are both recorded in the Web log files. The method is an improvement over PageRank [1, 6]. PageRate can be used to objectively evaluate the importance of pages. A PageClustering algorithm is proposed to cluster Web pages with similar incoming links and ratings. The results are used to integrate with search results returned by search engines.

web information systems engineering | 2000

Online Web mining transactions association rules using frame metadata model

Joseph Fong; John G. Hughes; Jianhan Zhu

Introduces a frame metadata model to facilitate the continuous derivation of association rules from Web transactions. A new set of association rules can be derived whenever Web transactions update the Web log file in the frame metadata model. This model consists of two types of classes: static classes and active classes. The static classes describe the Web transactions of the association rule table. The active classes are event-driven, obtaining Web transactions when invoked by a certain event. Whenever an update occurs in the existing Web transactions in the Web log file, a corresponding update is invoked by an event attribute in the method class, which computes the association rules continuously. The result is active Web mining that is capable of deriving association rules of Web transactions continuously or incrementally using the frame metadata model.
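
The idea of deriving association rules incrementally as new transactions arrive can be sketched as follows. This is not the frame metadata model itself: the class `IncrementalRuleMiner`, its thresholds, and the sample transactions are hypothetical, and the sketch only covers rules between pairs of pages, using the standard support and confidence definitions.

```python
from collections import Counter
from itertools import combinations


class IncrementalRuleMiner:
    """Keep running counts so rules can be re-derived after each update."""

    def __init__(self, min_support=0.5, min_confidence=0.7):
        self.min_support = min_support
        self.min_confidence = min_confidence
        self.n_transactions = 0
        self.item_counts = Counter()
        self.pair_counts = Counter()

    def add_transaction(self, pages):
        """Update the counts with one new Web transaction (a set of pages)."""
        items = sorted(set(pages))
        self.n_transactions += 1
        self.item_counts.update(items)
        self.pair_counts.update(combinations(items, 2))

    def rules(self):
        """Return (lhs, rhs, support, confidence) for rules above both thresholds."""
        found = []
        for (a, b), n_ab in self.pair_counts.items():
            support = n_ab / self.n_transactions
            if support < self.min_support:
                continue
            for lhs, rhs in ((a, b), (b, a)):
                confidence = n_ab / self.item_counts[lhs]
                if confidence >= self.min_confidence:
                    found.append((lhs, rhs, support, confidence))
        return sorted(found)


# Hypothetical transactions, each the set of pages visited in one session.
miner = IncrementalRuleMiner(min_support=0.5, min_confidence=0.7)
for pages in (["home", "products"], ["home", "products", "cart"], ["home", "about"]):
    miner.add_transaction(pages)
rules_before = miner.rules()

# A new transaction arrives; the rule set is re-derived from the updated counts.
miner.add_transaction(["home", "products"])
rules_after = miner.rules()
```

Because only counts are stored, each new transaction costs a small update rather than a rescan of the whole log, and the rule "home implies products" appears only once the extra transaction lifts its confidence over the threshold.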
