Zhaohui Wu | Researchain

Publication

Featured research published by Zhaohui Wu.

knowledge discovery and data mining | 2010

Efficient deep web crawling using reinforcement learning

Lu Jiang; Zhaohui Wu; Qian Feng; Jun Liu; Qinghua Zheng

The deep web is the hidden part of the Web that standard Web crawlers cannot reach. Obtaining deep web content is challenging and has been acknowledged as a significant gap in search engine coverage. To this end, the paper proposes a novel deep web crawling framework based on reinforcement learning, in which the crawler is regarded as an agent and the deep web database as the environment. The agent perceives its current state and selects an action (a query) to submit to the environment according to its Q-value. The framework not only enables crawlers to learn a promising crawling strategy from their own experience, but also allows them to exploit diverse features of query keywords. Experimental results show that the method outperforms state-of-the-art methods in crawling capability and removes the full-text search assumption implied by existing methods.

Information Systems | 2013

Learning to crawl deep web

Qinghua Zheng; Zhaohui Wu; Xiaocheng Cheng; Lu Jiang; Jun Liu

The deep web, or hidden web, is the part of the Web (usually residing in structured databases) that standard Web crawlers cannot reach. Obtaining deep web content is challenging and has been acknowledged as a significant gap in search engine coverage. The paper proposes a novel deep web crawling framework based on reinforcement learning, in which the crawler is regarded as an agent and the deep web database as the environment. The agent perceives its current state and selects an action (a query) to submit to the environment according to its Q-value. Existing methods assume that all deep web databases offer full-text search interfaces, and they rely solely on statistics (TF or DF) of the acquired data records to generate the next query. In contrast, the reinforcement learning framework not only enables crawlers to learn a promising crawling strategy from their own experience, but also allows them to exploit diverse features of query keywords. Experimental results show that the method outperforms state-of-the-art methods in crawling capability and relaxes the full-text search assumption implied by existing methods.
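The agent loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy in-memory database, the two-element feature vector, the linear Q-value estimate and the one-step (myopic) update rule are all assumptions made for illustration, and every function name is hypothetical.

```python
from typing import List, Set, Tuple

Record = Set[str]  # toy assumption: a record is just a set of keywords


def query(db: List[Record], keyword: str) -> Set[int]:
    """Simulated search interface: ids of records that contain the keyword."""
    return {i for i, rec in enumerate(db) if keyword in rec}


def features(keyword: str, db: List[Record], harvested: Set[int]) -> List[float]:
    """Hypothetical features: a bias term and the local document frequency."""
    if not harvested:
        return [1.0, 0.0]
    df = sum(1 for i in harvested if keyword in db[i])
    return [1.0, df / len(harvested)]


def q_value(weights: List[float], feats: List[float]) -> float:
    """Linear estimate of the value of issuing a keyword (an assumption)."""
    return sum(w * f for w, f in zip(weights, feats))


def crawl(db: List[Record], seed: str, budget: int,
          alpha: float = 0.1) -> Tuple[List[str], Set[int]]:
    weights = [0.0, 0.0]
    harvested: Set[int] = set()
    issued: List[str] = []
    candidates = {seed}
    while candidates and len(issued) < budget:
        # Action selection: the candidate keyword with the highest estimated
        # Q-value; sorting makes tie-breaking deterministic.
        action = max(sorted(candidates),
                     key=lambda k: q_value(weights, features(k, db, harvested)))
        feats = features(action, db, harvested)
        result = query(db, action)
        new = result - harvested
        reward = len(new) / len(db)  # fraction of the database newly harvested
        # One-step update towards the observed reward (no discounted future).
        error = reward - q_value(weights, feats)
        weights = [w + alpha * error * f for w, f in zip(weights, feats)]
        harvested |= result
        issued.append(action)
        for i in new:
            candidates |= db[i]  # new records expose new candidate keywords
        candidates -= set(issued)
    return issued, harvested
```

The state here is simply the set of harvested records, and each action is a keyword that is issued at most once. A fuller treatment would estimate long-term return instead of the immediate reward.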

web intelligence | 2009

Learning Deep Web Crawling with Diverse Features

Lu Jiang; Zhaohui Wu; Qinghua Zheng; Jun Liu

The key to Deep Web crawling is to submit promising keywords to a query form and retrieve Deep Web content efficiently. To select keywords, existing methods rely on statistics derived from the TF and DF of keywords in locally acquired records. They therefore work well only on textual databases that provide full-text search interfaces, and poorly on structured databases with multi-attribute or field-restricted search interfaces. This paper proposes a novel Deep Web crawling method. Each keyword is encoded as a tuple of its linguistic, statistical and HTML features, so that a harvest rate evaluation model learned from issued keywords can be applied to keywords not yet issued. The method removes the plain-text search assumption made by existing methods. Experimental results show that the method outperforms state-of-the-art methods.
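One way to picture the feature-tuple idea is the following minimal sketch. The abstract does not specify the model, so the sketch substitutes a simple lookup of mean harvest rates per feature tuple; the feature values and the names `KeywordFeatures`, `fit` and `predict` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class KeywordFeatures(NamedTuple):
    linguistic: str  # e.g. a part-of-speech tag (assumed encoding)
    statistic: str   # e.g. a bucketed document frequency (assumed encoding)
    html: str        # e.g. the tag enclosing the keyword (assumed encoding)


Model = Tuple[Dict[KeywordFeatures, float], float]


def fit(observations: List[Tuple[KeywordFeatures, float]]) -> Model:
    """Learn the mean harvest rate per feature tuple from issued keywords."""
    sums: Dict[KeywordFeatures, float] = defaultdict(float)
    counts: Dict[KeywordFeatures, int] = defaultdict(int)
    for feats, rate in observations:
        sums[feats] += rate
        counts[feats] += 1
    table = {feats: sums[feats] / counts[feats] for feats in sums}
    overall = sum(rate for _, rate in observations) / len(observations)
    return table, overall


def predict(model: Model, feats: KeywordFeatures) -> float:
    """Estimate the harvest rate of an un-issued keyword from its features."""
    table, overall = model
    # Fall back to the overall mean for feature tuples never seen before.
    return table.get(feats, overall)
```

A crawler could rank un-issued keywords by `predict` and issue the highest-scoring one next, refitting as new observations arrive.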

very large data bases | 2011

Mining learning-dependency between knowledge units from text

Jun Liu; Lu Jiang; Zhaohui Wu; Qinghua Zheng; Yanan Qian

Identifying learning-dependencies among knowledge units (KUs) is a prerequisite for navigation learning. Methods based on link mining cannot discover such dependencies among knowledge units that are arranged linearly in text. In this paper, we propose a method for mining learning-dependencies among KUs from text documents. The method is based on two features that we observed and studied in KUs and their learning-dependencies: the distributional asymmetry of domain terms and the local nature of learning-dependency. It consists of three stages: (1) build document association relationships by calculating the distributional asymmetry of domain terms; (2) generate candidate KU-pairs by measuring the locality of the dependencies; (3) use a classification algorithm to identify the learning-dependency between KU-pairs. Experimental results show that the method extracts learning-dependencies efficiently and reduces computational complexity.
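The two features named in this abstract can be sketched in a few lines. This is a heavily simplified illustration, not the published method: the asymmetry score is a hypothetical formula, documents and knowledge units are collapsed into one level, and the classification stage is replaced by a plain threshold.

```python
from typing import List, Set, Tuple

# A knowledge unit: (name, core domain terms it introduces, tokens of its text).
KU = Tuple[str, Set[str], List[str]]


def asymmetry(a: KU, b: KU) -> float:
    """Hypothetical score: how much more b's text uses a's core terms
    than a's text uses b's core terms."""
    _, core_a, tokens_a = a
    _, core_b, tokens_b = b
    a_in_b = sum(t in core_a for t in tokens_b) / max(len(tokens_b), 1)
    b_in_a = sum(t in core_b for t in tokens_a) / max(len(tokens_a), 1)
    return a_in_b - b_in_a


def candidate_pairs(n: int, window: int) -> List[Tuple[int, int]]:
    """Locality: pair each KU only with the next `window` KUs in text order."""
    return [(i, j) for i in range(n)
            for j in range(i + 1, min(i + window, n - 1) + 1)]


def mine_dependencies(units: List[KU], window: int,
                      threshold: float = 0.0) -> List[Tuple[str, str]]:
    """Return (prerequisite, dependent) name pairs.
    The threshold stands in for the classifier of the third stage."""
    found = []
    for i, j in candidate_pairs(len(units), window):
        if asymmetry(units[i], units[j]) > threshold:
            found.append((units[i][0], units[j][0]))
    return found
```

Restricting candidates to a window keeps the number of pairs linear in the number of KUs for a fixed window, instead of quadratic when every pair is compared.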

computer supported cooperative work in design | 2009

ETM Toolkit: A development tool based on Extended Topic Map

Lu Jiang; Jun Liu; Zhaohui Wu; Qinghua Zheng; Yanan Qian

Based on a study of the Topic Map standard, the Extended Topic Map (ETM) is proposed as a novel model for organizing and managing massive knowledge resources in E-learning. Based on this model, an Extended Topic Map Toolkit is designed and implemented, which supports operations such as exploration, search and consistency checking. The ETM Toolkit not only provides learners with visual navigation and search over massive E-learning resources, but also offers instructors an efficient way to build shareable and reusable domain knowledge. Using the ETM Toolkit, an extended topic map of a certain scale on Computer Networks has been built and is currently available to students in our university.

intelligent information systems | 2011

Deep Web adaptive crawling based on minimum executable pattern

Jun Liu; Lu Jiang; Zhaohui Wu; Qinghua Zheng

The key to Deep Web crawling is to submit valid input values to a query form and retrieve Deep Web content efficiently. In the literature, related work focuses only on generic text boxes or entire query forms, which leads to the problem of “data islands” or to low validity of query submission. This paper proposes the concept of a Minimum Executable Pattern (MEP), a minimal combination of elements in a query form that can conduct a successful query, and then presents a MEP generation method and a MEP-based Deep Web adaptive crawling method. The query form is parsed and partitioned into a MEP set, and locally optimal queries are then generated by choosing a MEP from the set and a keyword vector for that MEP. Furthermore, the crawler can decide when to terminate, balancing high content coverage against resource consumption. Adopting MEPs is expected to improve the validity of query submission, and adaptive selection among multiple MEPs is effective in overcoming the problem of “data islands”. We present a set of experiments to validate the effectiveness of the proposed method. Experimental results show that our method outperforms state-of-the-art methods in query capability and applicability and that, on average, it achieves good coverage by issuing only a few hundred queries.
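The MEP definition above (a minimal combination of form elements that can conduct a successful query) can be illustrated with a brute-force sketch. It is not the paper's MEP generation method: the exhaustive enumeration and the caller-supplied `is_executable` probe are assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence


def minimum_executable_patterns(
    elements: Sequence[str],
    is_executable: Callable[[FrozenSet[str]], bool],
) -> List[FrozenSet[str]]:
    """Enumerate element subsets by increasing size and keep those that are
    executable and contain no smaller executable pattern."""
    meps: List[FrozenSet[str]] = []
    for size in range(1, len(elements) + 1):
        for combo in combinations(elements, size):
            candidate = frozenset(combo)
            # Skip supersets of a known pattern: they are not minimal.
            if any(mep <= candidate for mep in meps):
                continue
            if is_executable(candidate):  # hypothetical probe of the form
                meps.append(candidate)
    return meps
```

For a hypothetical form whose query succeeds when either a keyword is given or both ends of a year range are filled in, the function returns two patterns, one per independent way of querying. The enumeration is exponential in the number of form elements, so it only suits small forms.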

acm symposium on applied computing | 2010

Mining preorder relation between knowledge units from text

Jun Liu; Lu Jiang; Zhaohui Wu; Qinghua Zheng; Yanan Qian

The preorder relation between Knowledge Units (KUs) is a precondition for navigation learning. Although existing link mining methods are possible solutions, they cannot mine preorder relations between knowledge units that are arranged linearly in text. Through the analysis of sample data, we discovered and studied two characteristics of knowledge units: the locality of the preorder relation and the distribution asymmetry of domain terms. Based on these two characteristics, a method is presented for mining preorder relations between knowledge units from text documents, which proceeds in three stages. First, associations between text documents are established according to the distribution asymmetry of domain terms. Second, candidate KU-pairs are generated according to the locality of the preorder relation. Finally, the preorder relations between KU-pairs are identified using classification methods. The experimental results show that the method can efficiently extract the preorder relation and reduce the computational complexity caused by the quadratic problem of link mining.

computer supported cooperative work in design | 2009

A collaborative knowledge construction system design for massive knowledge resources

Qinghua Zheng; Zhaohui Wu; Lu Jiang; Jun Liu

To address the deficiencies of existing knowledge resource management systems, we designed a new collaborative knowledge construction system for massive knowledge resources. Collaborative knowledge building proceeds in three phases: acquisition of knowledge factors, generation of local extended topic maps, and integration of the global extended topic map. In this way we connected the concept level and the knowledge element level at different knowledge granularities, and integrated topic maps across distributed nodes. With these methods we established the global extended topic map of a specific domain from knowledge resources.

Archive | 2010