Publication


Featured research published by Pengpeng Zhao.


knowledge science engineering and management | 2007

Ontology-based focused crawling of deep web sources

Wei Fang; Zhiming Cui; Pengpeng Zhao

Deep Web source discovery is one of the critical steps toward large-scale information integration. In this paper, we present ontology-based Deep Web source crawling, an enhanced crawling methodology. This ontology-based focused crawling method avoids downloading a large number of irrelevant pages. Evaluation showed that the new approach yields promising results.
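The general idea of focused crawling can be illustrated with a minimal sketch. It is not the method from the paper: the "ontology" is reduced to a flat set of concept terms, relevance is a simple term-overlap ratio, and the web is simulated by an in-memory dictionary. The names `relevance`, `focused_crawl`, `pages`, and `threshold` are all hypothetical.

```python
import heapq

def relevance(text, ontology_terms):
    """Fraction of ontology terms that occur in the page text (assumed scoring rule)."""
    tokens = set(text.lower().split())
    return len(tokens & ontology_terms) / len(ontology_terms)

def focused_crawl(pages, seed, ontology_terms, threshold=0.5):
    """Crawl a simulated web, expanding only pages judged relevant.

    pages maps a URL to (text, outlinks); this stands in for real HTTP fetches.
    """
    frontier = [(-1.0, seed)]  # min-heap on negated priority: best candidates first
    seen = {seed}
    relevant = []
    while frontier:
        _, url = heapq.heappop(frontier)
        text, links = pages[url]
        score = relevance(text, ontology_terms)
        if score < threshold:
            continue  # irrelevant page: its outlinks are never followed
        relevant.append(url)
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                # A link inherits the relevance of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return relevant

# Hypothetical toy web for a "books" domain.
pages = {
    "seed": ("book search author title", ["books", "sports"]),
    "books": ("find a book by author or title", ["forms"]),
    "sports": ("football scores today", ["forms2"]),
    "forms": ("book title author isbn query", []),
    "forms2": ("book author title", []),
}
ontology = {"book", "author", "title"}
crawled = focused_crawl(pages, "seed", ontology)
print(crawled)
```

In this toy run the page `forms2` is never fetched, because it is only reachable through a page that scored below the threshold. That is the saving a focused crawler aims for, at the cost of possibly missing relevant pages behind irrelevant ones.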


international symposium on information processing | 2008

From Wrapping to Knowledge: Domain Ontology Learning from Deep Web

Zhiming Cui; Pengpeng Zhao; Wei Fang; Chao Lin

The next generation of the Web, called the Semantic Web, has to enrich the Web with semantic page annotations to enable knowledge-level querying and search. However, manually constructing the ontologies this requires is a time-consuming and difficult task. In this paper, we describe an automatic extraction method that learns domain ontologies for the Semantic Web from the Deep Web. Our approach first learns a base ontology from Deep Web query interfaces, then grows the current ontology by probing the sources and discovering additional concepts and instances from the result pages. We have evaluated our approach in several real-world domains. Preliminary results indicate that the proposed extraction method discovers concepts and instances with high accuracy.


pacific-asia conference on web mining and web-based application | 2009

Data Source Selection for Large-Scale Deep Web Data Integration

Xuefeng Xian; Pengpeng Zhao; Wei Fang; Jie Xin; Zhiming Cui

The Deep Web has become an important resource on the web due to its rich, high-quality information, giving rise to a new application area in data mining and integration. There may be hundreds or thousands of data sources providing data relevant to a particular domain on the web, so a primary challenge for large-scale Deep Web data integration is determining the order in which users should integrate candidate data sources. In this paper, we develop a most-benefit approach (MBA) for ordering candidate data sources for user integration. At the core of this approach is a utility function that quantifies the utility of a given state of the integration system; we devise such a utility function based on the number of query results. We show in practice how to efficiently apply MBA in concert with this utility function to order data sources. A detailed experimental evaluation on real datasets shows that the ordering of data sources produced by MBA yields an integration system with significantly higher utility than a wide range of other ordering strategies.
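One way to picture utility-driven source ordering is a greedy loop that always integrates the source with the largest marginal gain. The sketch below is an illustration only, not the MBA algorithm from the paper: it assumes the utility of an integration state is the number of distinct query results covered so far, and the function and variable names are hypothetical.

```python
def order_sources_by_marginal_benefit(source_results):
    """Greedy ordering: repeatedly pick the source that adds the most new results.

    source_results maps a source name to the set of result identifiers it returns.
    Assumed utility of a state: number of distinct results covered so far.
    """
    remaining = dict(source_results)
    covered = set()
    ordering = []
    while remaining:
        # Marginal benefit = results not yet covered; ties go to the
        # alphabetically first name because candidates are sorted.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        ordering.append(best)
        covered |= remaining.pop(best)
    return ordering

# Hypothetical sources with overlapping result sets.
sources = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {1, 2, 3, 4, 5},
    "D": {6},
}
ordering = order_sources_by_marginal_benefit(sources)
print(ordering)
```

Here source `C` is integrated first because it covers the most results, and `D` comes second because it is the only remaining source that contributes anything new; `A` and `B` add nothing once `C` is in place.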


international conference on machine learning and cybernetics | 2009

Quality-based data source selection for web-scale Deep Web data integration

Xuefeng Xian; Pengpeng Zhao; Wei Fang; Jie Xin; Zhiming Cui

The Deep Web has become an important resource on the web due to its rich, high-quality information, giving rise to a new application area in data mining, information retrieval, and integration. In web-scale Deep Web data integration tasks, where there may be hundreds or thousands of data sources providing data relevant to a particular domain, it is inefficient to integrate all available Deep Web sources. This paper proposes a data source selection approach based on the quality of Deep Web sources. It is used to automatically find the highest-quality set of Deep Web sources related to a particular domain, which is a prerequisite for effective Deep Web data integration. The quality of a data source is assessed by evaluating quality dimensions that represent the characteristics of Deep Web sources. Experiments on real Deep Web sources collected from the internet show that our approach provides an effective and scalable solution for selecting data sources for Deep Web data integration.


Frontiers of Computer Science in China | 2015

Active transfer learning of matching query results across multiple sources

Jie Xin; Zhiming Cui; Pengpeng Zhao; Tianxu He

Entity resolution (ER) is the problem of identifying and grouping different manifestations of the same real-world object. Algorithmic approaches have been developed, and for most tasks supervised learning offers superior performance. However, the prohibitive cost of labeling training data is still a huge obstacle for detecting duplicate query records from online sources. Furthermore, the unique combinations of noisy data with missing elements make ER tasks more challenging. To address this, transfer learning has been adopted to adaptively share learned common structures of similarity scoring problems between multiple sources. Although such techniques reduce the labeling cost so that it is linear with respect to the number of sources, their random sampling strategy is not successful enough to handle the ordinary sample imbalance problem. In this paper, we present a novel multi-source active transfer learning framework to jointly select fewer data instances from all sources to train classifiers with constant precision/recall. The intuition behind our approach is to actively label the most informative samples while adaptively transferring collective knowledge between sources. In this way, the classifiers that are learned can be both label-economical and flexible even for imbalanced or quality-diverse sources. We compare our method with state-of-the-art approaches on real-world datasets. Our experimental results demonstrate that our active transfer learning algorithm can achieve impressive performance with far fewer labeled samples for record matching with numerous and varied sources.
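The phrase "actively label the most informative samples" can be made concrete with uncertainty sampling, a common active-learning heuristic. The sketch below is an assumption about what "informative" might mean, not the selection rule of the paper, and it omits the transfer-learning part entirely. It takes candidate record pairs already scored by some classifier with a match probability and picks those closest to 0.5; `select_most_uncertain` and the sample data are hypothetical.

```python
def select_most_uncertain(scored_pairs, budget):
    """Return the ids of the `budget` pairs whose match probability is nearest 0.5.

    scored_pairs is a list of (pair_id, probability) tuples produced by
    some existing classifier (assumed, not shown here).
    """
    ranked = sorted(scored_pairs, key=lambda item: abs(item[1] - 0.5))
    return [pair_id for pair_id, _ in ranked[:budget]]

# Hypothetical candidate pairs with classifier match probabilities.
candidates = [("p1", 0.95), ("p2", 0.52), ("p3", 0.10), ("p4", 0.45)]
to_label = select_most_uncertain(candidates, budget=2)
print(to_label)
```

Pairs `p1` and `p3` are already confidently classified, so a label for them would teach the classifier little; the labeling budget goes to `p2` and `p4` instead.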


international conference on convergence information technology | 2007

A Hybrid Object Matching Method for Deep Web Information Integration

Pengpeng Zhao; Chao Lin; Wei Fang; Zhiming Cui

Object matching is a crucial step in the integration of Deep Web sources. Existing methods assume that record extraction and attribute segmentation are highly accurate. However, because of the limitations of extraction techniques, the information they produce is often incomplete. If we match objects based on noisy and incomplete information, we cannot achieve satisfactory performance. This paper proposes a hybrid object matching method, which considers structured and unstructured features and multi-level errors in extraction. We compare the performance of the unstructured, structured, and hybrid object matching models in our prototype system; the results indicate that the hybrid method performs best.
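A minimal sketch can show why combining structured and unstructured evidence helps when extraction is unreliable. It is an assumption-laden illustration, not the model from the paper: structured similarity is the share of common attributes with equal values, unstructured similarity is Jaccard overlap of raw text tokens, and the two are mixed with a hypothetical `weight` parameter.

```python
def jaccard(a, b):
    """Jaccard overlap of two token collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def structured_similarity(rec1, rec2):
    """Share of attributes present in both records whose values agree."""
    shared = [k for k in rec1["fields"] if k in rec2["fields"]]
    if not shared:
        return 0.0  # attribute segmentation failed or no common attributes
    agree = sum(rec1["fields"][k].lower() == rec2["fields"][k].lower() for k in shared)
    return agree / len(shared)

def unstructured_similarity(rec1, rec2):
    """Token overlap of the raw, unsegmented record text."""
    return jaccard(rec1["text"].lower().split(), rec2["text"].lower().split())

def hybrid_similarity(rec1, rec2, weight=0.5):
    """Weighted mix of structured and unstructured similarity (weight is assumed)."""
    return (weight * structured_similarity(rec1, rec2)
            + (1 - weight) * unstructured_similarity(rec1, rec2))

# Hypothetical records: segmentation failed for `broken`, so it has no fields.
clean = {"fields": {"title": "Deep Web Crawling", "year": "2007"},
         "text": "deep web crawling 2007"}
broken = {"fields": {}, "text": "Deep Web Crawling 2007"}
score = hybrid_similarity(clean, broken)
print(score)
```

A purely structured matcher would score this pair 0.0 because no attributes were extracted from `broken`, while the hybrid score still reflects the identical raw text.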


asia-pacific web conference | 2009

Extension of OWL with Dynamic Fuzzy Logic

Zhiming Cui; Wei Fang; Xuefeng Xian; Shukui Zhang; Pengpeng Zhao

In recent years, ontology has played a major role in knowledge representation. Ontology languages are based on description logics. Though they are expressive enough, they cannot express and reason with fuzzy and dynamic knowledge on the Semantic Web. To deal with uncertain and dynamic knowledge on the Semantic Web and its applications, new fuzzy extensions of description logics, ontology, and OWL based on dynamic fuzzy logic are presented: dynamic fuzzy description logic (DFDL), dynamic fuzzy ontology (DFO), and dynamic fuzzy OWL (DFOWL). The syntax and semantics of DFDL, DFO, and DFOWL are formally defined, and the forms of axioms and assertions are specified. The research indicates that DFOWL provides more expressive power for the Semantic Web and overcomes the insufficiency of OWL as the ontology language for the Semantic Web.


international symposium on information processing | 2008

SDWS: Semantic Search for Deep Web Data

Wei Fang; Zhiming Cui; Pengyu Hu; Li Huang; Pengpeng Zhao

A large amount of rich, high-quality data is hidden in backend databases that search engines cannot index; this content is called the Deep Web. It is mostly accessible through query interfaces. SDWS, a semantic search engine for the Deep Web, is presented. We study and apply Semantic Web technology to each stage of Deep Web information integration, with a focus on Deep Web source discovery, query result annotation, and information integration. The novel approach promises better access to the Deep Web.


advanced data mining and applications | 2008

Organizing Structured Deep Web by Clustering Query Interfaces Link Graph

Pengpeng Zhao; Li Huang; Wei Fang; Zhiming Cui


enterprise information systems and web technologies | 2007

Deep Web Sources Focused Crawling.

Pengpeng Zhao; Chao Lin; Ling Gao; Zhiming Cui
