Meng Xiaofeng | Researchain

Publication

Featured research published by Meng Xiaofeng.

International Workshop on Challenges in Web Information Retrieval and Integration | 2005

Postal Address Detection from Web Documents

Lin Can; Zhang Qian; Meng Xiaofeng; Liu Wenyin

An approach to detecting postal addresses in Web pages is proposed. The Web pages are first segmented into text blocks based on their visual similarity. The text content of each block then undergoes recognition using a syntactic approach, for which grammars covering almost all possible postal address patterns are built. Preliminary experiments on 44 Web pages containing 56 true addresses show that the approach detects postal addresses with high precision (89.3%) and a low false alarm rate (3.8%).
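The recognition step applies address grammars to the text of each segmented block. The abstract does not give those grammars, so the following is only a minimal sketch under stated assumptions: one hypothetical US-style pattern (`<number> <street> <type>, <city>, <state> <zip>`) written as a regular expression, with blocks assumed to be already segmented and passed in as plain strings. The names `STREET_TYPES`, `ADDRESS_RE`, and `detect_addresses` are illustrative, not from the paper.

```python
import re

# Hypothetical toy grammar for ONE address pattern; the paper's grammars are not reproduced here.
STREET_TYPES = r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr)"
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+"                         # house number
    r"(?:[A-Z][a-z]+\s+){1,3}"              # one to three capitalized street-name words
    + STREET_TYPES
    + r"\.?,\s*"                            # optional abbreviation period, then a comma
    r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*,\s*"   # city name of one or more words
    r"[A-Z]{2}\s+\d{5}(?:-\d{4})?\b"        # two-letter state code and ZIP or ZIP+4
)


def detect_addresses(blocks):
    """Return (block_index, matched_text) for every address-like match in the text blocks."""
    hits = []
    for i, text in enumerate(blocks):
        for m in ADDRESS_RE.finditer(text):
            hits.append((i, m.group(0)))
    return hits
```

For example, `detect_addresses(["No address here.", "Contact us at 123 Main Street, Springfield, IL 62701 for details."])` reports one hit in block 1. A full grammar-based recognizer would combine many such patterns (and non-US formats) rather than a single expression.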

Journal of Computer Science and Technology | 2002

Data extraction from the web based on pre-defined schema

Meng Xiaofeng; Lu Hongjun; Wang Haiyan; Gu Mingzhe

With the development of the Internet, the World Wide Web has become an invaluable information source for most organizations. However, most documents available on the Web are in HTML, which was originally designed for document formatting with little consideration of content. Effectively extracting data from such documents remains a non-trivial task. In this paper, we present a schema-guided approach to extracting data from HTML pages. Under this approach, the user defines a schema specifying what is to be extracted and provides sample mappings between the schema and the HTML page. The system then induces the mapping rules and generates a wrapper that takes an HTML page as input and produces the required data as XML conforming to the user-defined schema. A prototype system implementing the approach has been developed. Preliminary experiments indicate that the proposed semi-automatic approach is not only easy to use but also able to produce a wrapper that extracts the required data from input pages with high accuracy.
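The abstract does not specify the rule language the system induces, so here is a sketch that swaps in a simple left/right-delimiter wrapper (in the spirit of classic LR wrapper induction), not the paper's method. For each schema field, it learns the fixed-width text to the left of the user-marked sample value and the tag that follows it. It then applies those delimiters to a new page with the same layout and emits XML. `CONTEXT`, `induce_rules`, `apply_wrapper`, and the field names are hypothetical.

```python
import xml.etree.ElementTree as ET

CONTEXT = 16  # hypothetical: characters of left context kept as the left delimiter


def induce_rules(sample_html, sample_mapping):
    """Learn a (left, right) delimiter pair per schema field from one sample page.

    sample_mapping maps each schema field to the value the user marked on the page.
    """
    rules = {}
    for field, value in sample_mapping.items():
        start = sample_html.index(value)                          # first occurrence of the value
        end = start + len(value)
        left = sample_html[max(0, start - CONTEXT):start]         # fixed-width left context
        right = sample_html[end:sample_html.index(">", end) + 1]  # text up to the end of the next tag
        rules[field] = (left, right)
    return rules


def apply_wrapper(html, rules, root_tag="record"):
    """Extract each field from a page with the same layout and return it as an XML string."""
    root = ET.Element(root_tag)
    for field, (left, right) in rules.items():
        i = html.find(left)
        if i < 0:
            continue                      # field not present on this page
        start = i + len(left)
        end = html.find(right, start)
        if end < 0:
            continue                      # right delimiter missing: skip rather than guess
        ET.SubElement(root, field).text = html[start:end]
    return ET.tostring(root, encoding="unicode")
```

With a sample page `<tr><td>Title</td><td>Dune</td></tr>...` and the mapping `{"title": "Dune", "author": "Frank Herbert"}`, the induced rules extract `<record><title>Neuromancer</title><author>William Gibson</author></record>` from a page with the same structure. Keeping the label text inside the left context is what lets fields that share identical markup be told apart.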

Wuhan University Journal of Natural Sciences | 2006

A Deep Web Data Integration System for Job Search

Liu Wei; Li Xian; Ling Yan-Yan; Zhang Xiaoyu; Meng Xiaofeng

With the rapid development of the Web, more and more Web databases are available for users to access. Job seekers, however, often have difficulty first finding the right sources and then querying them, so an integrated job search system over Web databases has become a Web application in high demand. With this in mind, we built a deep Web data integration system that acts as a job meta-search engine, giving users unified access to multiple job Web sites. In this paper, the architecture of the system is presented first, and then its key components are introduced.
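At its core, a meta-search engine sends one unified query to many sources and merges what comes back. Here is a minimal sketch under loud assumptions: each job site is modeled as a hypothetical Python callable that returns job dictionaries, whereas the real system submits queries through the sites' Web query interfaces. Duplicates are removed by a case-insensitive `(title, company)` key, which is an illustrative choice, not the paper's.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query, sources):
    """Send one unified query to every source concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        result_lists = list(pool.map(lambda src: src(query), sources))  # keeps source order
    merged, seen = [], set()
    for results in result_lists:
        for job in results:
            key = (job["title"].lower(), job["company"].lower())
            if key not in seen:           # drop the same job posted on several sites
                seen.add(key)
                merged.append(job)
    return merged


# Hypothetical stand-ins for two job Web databases.
def site_a(query):
    jobs = [{"title": "Data Engineer", "company": "Acme", "site": "A"},
            {"title": "Web Developer", "company": "Initech", "site": "A"}]
    return [j for j in jobs if query.lower() in j["title"].lower()]


def site_b(query):
    jobs = [{"title": "data engineer", "company": "ACME", "site": "B"},
            {"title": "Data Analyst", "company": "Globex", "site": "B"}]
    return [j for j in jobs if query.lower() in j["title"].lower()]
```

Calling `meta_search("data", [site_a, site_b])` returns the Acme posting once (from site A) plus the Globex posting from site B. Querying sources in parallel keeps response time close to that of the slowest single site.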

Journal of Computer Science and Technology | 2002

A transactional asynchronous replication scheme for mobile database systems

Ding Zhiming; Meng Xiaofeng; Wang Shan

In mobile database systems, user mobility has a significant impact on data replication. As a result, the replica control protocols that exist today in traditional distributed and multidatabase environments are no longer suitable. To solve this problem, this paper puts forward a new mobile database replication scheme, the Transaction-Level Result-Set Propagation (TLRSP) model. The conflict detection and resolution strategy based on TLRSP is discussed in detail, and an implementation algorithm is proposed. To compare the performance of the TLRSP model with that of other mobile replication schemes, we developed a detailed simulation model. Experimental results show that the TLRSP model provides efficient support for replicated mobile database systems by reducing reprocessing overhead and maintaining database consistency.
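The abstract names conflict detection for propagated result sets but does not give the algorithm. The following is a sketch of one plausible scheme, explicitly an assumption borrowed from optimistic, version-based validation rather than the paper's TLRSP algorithm. A transaction run on a disconnected mobile host records the version of every item it read and the values it produced. On reconnection, the server applies that result set directly, with no re-execution, only if none of those items has changed since. `MobileTxn`, `Server`, and `propagate` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class MobileTxn:
    """A transaction executed on a disconnected mobile host (hypothetical structure)."""
    read_versions: dict   # item -> version the mobile host saw when it read the item
    result_set: dict      # item -> new value produced by the transaction


@dataclass
class Server:
    values: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)

    def propagate(self, txn):
        """Apply txn's result set if nothing it read has changed; otherwise report a conflict."""
        stale = [k for k, v in txn.read_versions.items() if self.versions.get(k, 0) != v]
        if stale:
            return ("conflict", stale)    # caller would reprocess or reconcile these items
        for k, val in txn.result_set.items():
            self.values[k] = val
            self.versions[k] = self.versions.get(k, 0) + 1
        return ("committed", [])
```

Propagating the result set instead of re-running the transaction is what avoids reprocessing in the common, conflict-free case. Only transactions that read stale data fall back to a resolution step.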

Wuhan University Journal of Natural Sciences | 2007

TwigStack+: Holistic Twig Join Pruning Using Extended Solution Extension

Zhou Junfeng; Xie Min; Meng Xiaofeng

XML has been used extensively in many applications as a de facto standard for information representation and exchange over the Internet. Huge volumes of data are organized or exported in tree-structured form, and the desired information can be obtained by traversing the whole tree structure using a twig pattern query. This paper proposes a new definition, Extended Solution Extension, to check the usefulness of an element in both the forward and backward directions. Based on it, a novel algorithm, TwigStack+, is proposed to reduce query processing cost by checking whether other elements can be processed together with the current one. Compared with existing methods, this can greatly reduce query evaluation cost. Experimental results on various datasets indicate that the proposed algorithm performs significantly better than existing ones.
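Holistic twig join algorithms of the TwigStack family generally work on element streams labeled with a region encoding (start, end, level). With this encoding, ancestor-descendant and parent-child tests reduce to integer comparisons. The sketch below shows only that encoding and the two structural tests. It does not implement TwigStack+ or Extended Solution Extension, whose details the abstract does not give, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET


def region_encode(xml_text):
    """Label each element with (tag, start, end, level) using a depth-first counter."""
    root = ET.fromstring(xml_text)
    labels = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        entry = [elem.tag, counter, None, level]   # start assigned on entry
        labels.append(entry)
        for child in elem:
            visit(child, level + 1)
        counter += 1
        entry[2] = counter                          # end assigned after all descendants

    visit(root, 1)
    return [tuple(e) for e in labels]               # document order, i.e. sorted by start


def is_ancestor(a, d):
    """a is an ancestor of d iff d's interval lies strictly inside a's."""
    return a[1] < d[1] and d[2] < a[2]


def is_parent(a, d):
    """Parent-child additionally requires d to be exactly one level below a."""
    return is_ancestor(a, d) and d[3] == a[3] + 1
```

For `<book><title/><chapter><section/></chapter></book>` the labels are `book(1,8,1)`, `title(2,3,2)`, `chapter(4,7,2)`, and `section(5,6,3)`. So `book//section` holds but `book/section` does not. Twig join algorithms read one such start-ordered stream per query node and use these tests to decide which elements can join.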

Wuhan University Journal of Natural Sciences | 2006

A survey of Web information technology and application

Meng Xiaofeng; Xu Baowen; Liu Qing; Yu Ge; Shen Jun-yi; Lu Zhengding; He Yanxiang

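The astonishing growth of the Internet, combined with the rapid development of Web technologies and the increasing emergence of Web information systems and applications, brings us both great opportunities and great challenges. Because the Web gives a huge user population cross-platform, universal access to resources, even greater demands are placed on managing data and services effectively. Web information systems and applications are driving design methodologies to adapt to the management of ever larger numbers of users and to the needs of distributed, decentralized, integrated Web applications and systems, with a user-centered orientation. Developing and managing Web information systems and applications therefore faces great challenges compared with classical software. Moreover, the increasing difficulty and complexity of information retrieval, dissemination, security, and management, under demanding schedules and operating environments, call for further attention and action. The aim of the conference (WISA2005) is to bring together members of the Web technology, information systems, e-government, and office automation communities to discuss technologies for Web information systems. WISA2005 is the second conference on Web Information Systems and Applications, organized by the OA & E-Government Society of the China Computer Federation (CCF).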

Wuhan University Journal of Natural Sciences | 2004

OrientI: A strategy of Web information integration

Yi Lei; Meng Xiaofeng; Hu Dong-dong; Yu Jun-tao; Li Yu

We propose the OrientI approach for effectively building flexible information integration applications. The system provides a fully visual development environment for building applications. With the OrientI system, a user only needs to concentrate on composing components to build the InterPlan, while the detailed underlying operations and data streams remain invisible to the user. A prototype system has been implemented and partially demonstrates the convenience of the OrientI approach.

Journal of Computer Science and Technology | 2000

Word Segmentation Based on Database Semantics in NChiql

Meng Xiaofeng; Liu Shuang; Wang Shan

In this paper, a novel word-segmentation algorithm is presented for delimiting words in Chinese natural language queries in NChiql, a Chinese natural language query interface to databases. Although there is a sizable literature on Chinese word segmentation, existing methods cannot satisfy the particular requirements of this system. The new algorithm is based on database semantics, namely the Semantic Conceptual Model (SCM), which captures domain-specific knowledge. Based on the SCM, the segmenter labels words directly with database semantics, which eases disambiguation and translation (from natural language to database query) in NChiql.
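The abstract says the segmenter uses SCM domain knowledge and labels words directly with database semantics, but it does not describe the SCM's structure or the matching strategy. The following sketch substitutes forward maximum matching, a standard dictionary-based Chinese segmentation baseline, over a hypothetical lexicon whose entries map words to database elements (table, column, value). It is not NChiql's algorithm. Characters not covered by the lexicon become single-character tokens labeled `None`.

```python
# Hypothetical lexicon standing in for SCM knowledge: word -> (kind, database element).
LEXICON = {
    "计算机": ("concept", "computer"),
    "计算机系": ("value", "department = 'CS'"),
    "学生": ("table", "student"),
    "姓名": ("column", "student.name"),
}
MAX_LEN = max(len(w) for w in LEXICON)


def segment(query, lexicon=LEXICON, max_len=MAX_LEN):
    """Forward maximum matching: at each position take the longest lexicon word."""
    tokens, i = [], 0
    while i < len(query):
        for size in range(min(max_len, len(query) - i), 0, -1):
            word = query[i:i + size]
            if word in lexicon or size == 1:            # fall back to one character
                tokens.append((word, lexicon.get(word)))  # label is None if unknown
                i += size
                break
    return tokens
```

For the query 计算机系学生的姓名 ("names of students in the CS department"), the segmenter prefers 计算机系 over the shorter 计算机 and yields 计算机系 / 学生 / 的 / 姓名, each already tagged with a value, table, or column. Attaching database labels during segmentation is what makes the later translation into a database query straightforward.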

Wuhan University Journal of Natural Sciences | 2007

Query Translation on the Fly in Deep Web Integration

Jiang Fangjiao; Jia Linlin; Meng Xiaofeng

To help users access the information they need, much research has been dedicated to Deep Web (i.e. Web database) integration. We focus on query translation, an important part of Deep Web integration. Our aim is to automatically construct a set of constraint mapping rules with which the system can translate a query from the integrated interface to the Web database interfaces. We construct a concept hierarchy for the attributes of the query interfaces, in particular storing the synonyms and the type (e.g. Number, Text) of every concept. We also construct data hierarchies for some concepts where necessary. We then present an algorithm that generates the constraint mapping rules based on these hierarchies. The approach scales well for such applications and, being domain-independent, can easily be extended from one domain to another. Experimental results show its effectiveness and efficiency.
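The rule-generation algorithm itself is not given in the abstract. The following is a simplified sketch under stated assumptions. Each concept carries a hypothetical synonym set and a type. A source field is mapped to a concept when its label is one of that concept's synonyms. A Number constraint is rewritten into the narrowest predefined range the source offers that still covers the query range; this "minimal covering range" choice is an assumption, and results would then be filtered afterwards. `CONCEPTS`, `generate_rules`, and `translate` are illustrative names.

```python
# Hypothetical concept table: concept -> (synonyms, type).
CONCEPTS = {
    "title": ({"title", "book title", "name"}, "Text"),
    "price": ({"price", "cost", "price range"}, "Number"),
}


def concept_of(label):
    """Return the concept whose synonym set contains the field label, if any."""
    label = label.strip().lower()
    for concept, (synonyms, _) in CONCEPTS.items():
        if label in synonyms:
            return concept
    return None


def generate_rules(source_fields):
    """Map each concept to the source field (and its options) whose label is a synonym."""
    rules = {}
    for field_label, options in source_fields.items():
        concept = concept_of(field_label)
        if concept is not None:
            rules[concept] = (field_label, options)
    return rules


def translate(query, rules):
    """Rewrite {concept: constraint} from the integrated interface into source form."""
    out = {}
    for concept, constraint in query.items():
        if concept not in rules:
            continue                          # source cannot express it; filter results later
        field_label, options = rules[concept]
        if CONCEPTS[concept][1] == "Number" and options:
            lo, hi = constraint
            # narrowest predefined range that still covers [lo, hi]
            covering = [(a, b) for a, b in options if a <= lo and hi <= b]
            if covering:
                out[field_label] = min(covering, key=lambda r: r[1] - r[0])
        else:
            out[field_label] = constraint     # Text (or free-form Number) passes through
    return out
```

For a source with fields `Book Title` and `Price Range` (options 0-20, 0-50, 20-50, 50-100), the integrated query `{"title": "XML", "price": (25, 40)}` becomes `{"Book Title": "XML", "Price Range": (20, 50)}`. Because the concept table, not the code, holds the domain knowledge, moving to another domain mainly means supplying a new table.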

Wuhan University Journal of Natural Sciences | 2006

A framework of web data integrated LBS middleware

Meng Xiaofeng; Yin Shaoyi; Xiao Zhen

In this paper, we propose a flexible location-based service (LBS) middleware framework that makes the development and deployment of new location-based applications much easier. Treating the World Wide Web as a huge source of location-related information, we integrate commonly used Web data extraction techniques into the middleware framework, exposing a unified Web data interface that makes upper-layer applications more attractive. In addition, the framework addresses several common LBS issues, including positioning, location modeling, location-dependent query processing, privacy, and security management.
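Location-dependent query processing is one of the issues the framework addresses. The following is a minimal sketch of the simplest such query, a range query: return the points of interest (for instance, ones extracted from Web pages) within a given radius of the user's current position, nearest first, using the haversine great-circle distance. The function names and sample data are hypothetical. A real middleware would use a spatial index instead of a linear scan.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(user_lat, user_lon, pois, radius_km):
    """Return names of POIs within radius_km of the user, nearest first."""
    hits = []
    for name, lat, lon in pois:
        d = haversine_km(user_lat, user_lon, lat, lon)
        if d <= radius_km:
            hits.append((d, name))
    return [name for d, name in sorted(hits)]
```

With POIs at the user's position, about 1.1 km north, and about 67 km north, a 5 km query returns only the first two, in that order. The answer changes as the user moves, which is what makes the query location-dependent.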