Wenxin Liang | Researchain

Publication

Featured research published by Wenxin Liang.

british national conference on databases | 2005

LAX: an efficient approximate XML join based on clustered leaf nodes for XML data integration

Wenxin Liang; Haruo Yokota

Recently, more and more data are published and exchanged as XML on the Internet. However, different XML data sources might contain the same data but have different structures. An efficient method is therefore required to integrate such XML data sources so that users can conveniently access more complete and useful information. The tree edit distance is regarded as an effective metric for evaluating the structural similarity of XML documents. However, its computational cost is extremely high, and the traditional wisdom in join algorithms cannot be applied easily. In this paper, we propose LAX (Leaf-clustering based Approximate XML join algorithm), in which the two XML document trees are clustered into subtrees representing independent items and the similarity between them is determined by calculating the similarity degree based on the leaf nodes of each pair of subtrees. We also propose an effective algorithm for clustering the XML document for LAX. We show that it is easy to apply the traditional wisdom in join algorithms to LAX and that the join result contains complete information of the two documents. We then conduct experiments to compare LAX with the tree edit distance and evaluate its performance using both synthetic and real data sets. Our experimental results show that LAX is more efficient in performance and more effective for measuring the approximate similarity between XML documents than the tree edit distance.
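
As a rough illustration of the leaf-based idea in the abstract above, the following Python sketch compares two subtrees by the text of their leaf nodes only, ignoring element names and nesting. The overlap formula (shared leaf values divided by the size of the smaller leaf set) and the names `leaf_values` and `leaf_similarity` are assumptions made for this example; the abstract does not state the exact similarity degree used by LAX, and the clustering step is not shown.

```python
import xml.etree.ElementTree as ET


def leaf_values(subtree):
    """Collect the trimmed, non-empty text of every leaf element under `subtree`."""
    return {
        (el.text or "").strip()
        for el in subtree.iter()
        if len(el) == 0 and (el.text or "").strip()
    }


def leaf_similarity(base, target):
    """Share of the smaller leaf set that also appears in the other subtree.

    Assumed formula for illustration only, not the one defined in the paper.
    """
    a, b = leaf_values(base), leaf_values(target)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Two subtrees with different structure but mostly the same leaf data.
base = ET.fromstring(
    "<article><title>LAX</title><year>2005</year><author>Liang</author></article>"
)
target = ET.fromstring(
    "<paper><info><name>LAX</name><date>2005</date></info><venue>BNCOD</venue></paper>"
)
score = leaf_similarity(base, target)
```

Because only leaf values are compared, the two subtrees score well despite their different tag names and depth, which is the situation the abstract describes for sources that hold the same data under different structures.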

international conference on data engineering | 2005

VLEI code: an efficient labeling method for handling XML documents in an RDB

Kazuhito Kobayashi; Wenxin Liang; D. Kobayashi; Akitsugu Watanabe; Haruo Yokota

A number of XML labeling methods have been proposed to store XML documents in relational databases. However, they share a weakness in insertion operations. We propose the variable length endless insertable (VLEI) code and apply it to XML labeling to reduce the cost of insertion operations. Results of our experiments indicate that a combination of the VLEI code and Dewey order is effective for handling skewed insertions.

database and expert systems applications | 2009

A Low-Storage-Consumption XML Labeling Method for Efficient Structural Information Extraction

Wenxin Liang; Akihiro Takahashi; Haruo Yokota

Recently, labeling methods to extract and reconstruct the structural information of XML data, which are important for many applications such as XPath query and keyword search, are becoming more attractive. To achieve efficient structural information extraction, in this paper we propose C-DO-VLEI code, a novel update-friendly bit-vector encoding scheme based on register-length bit operations combined with the properties of Dewey Order numbers, which cannot be implemented in other relevant existing schemes such as ORDPATH. The proposed method also achieves lower storage consumption because it requires neither a prefix schema nor any reserved codes for node insertion. We performed experiments to evaluate and compare the performance and storage consumption of the proposed method with those of the ORDPATH method. Experimental results show that the execution times for extracting depth information and parent node labels using the C-DO-VLEI code are about 25% and 15% less, respectively, and the average label size using the C-DO-VLEI code is about 24% smaller, compared with ORDPATH.
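
The structural information mentioned above (depth and parent label) follows directly from the properties of Dewey Order numbers, where a node's label is its parent's label plus one more component. The Python sketch below shows this on plain tuples of integers. It is only an illustration of the Dewey Order properties; it does not implement the C-DO-VLEI bit-vector encoding or its register-length bit operations, and the function names are hypothetical.

```python
def parse_label(text):
    """Turn a dotted Dewey label such as "1.3.2" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def depth(label):
    """Depth of a node is the number of components in its label."""
    return len(label)


def parent(label):
    """Parent label is the label without its last component (None for the root)."""
    return label[:-1] if len(label) > 1 else None


def is_ancestor(a, b):
    """`a` is an ancestor of `b` when `a` is a proper prefix of `b`."""
    return len(a) < len(b) and b[: len(a)] == a


node = parse_label("1.3.2")
node_depth = depth(node)      # three components, so depth 3
node_parent = parent(node)    # label of the parent node
```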

international conference on data engineering | 2006

A Path-sequence Based Discrimination for Subtree Matching in Approximate XML Joins

Wenxin Liang; Haruo Yokota

In this paper, we discuss the one-to-multiple matching problem in leaf-clustering based approximate XML join algorithms and propose a path-sequence based discrimination method to solve this problem. In our method, each path sequence from the top node to the matched leaf in the base and target subtree is extracted, and the most similar target subtree for the base one is determined by the path-sequence based subtree similarity degree. We conduct experiments to evaluate our method by using both real bibliography and bioinformatics XML documents. The experimental results show that our method can effectively decrease the occurrence rate of one-to-multiple matching for both bibliography and bioinformatics XML data, and hence improve the precision of the leaf-clustering based approximate XML join algorithms.

international conference on digital information management | 2007

An XML subtree segmentation method based on syntactic segmentation rate

Wenxin Liang; Xiangyong Ouyang; Haruo Yokota

In this paper, we propose an effective method for segmenting large XML documents into independent meaningful subtrees based on two syntactic segmentation rates: vertical segmentation rate and horizontal segmentation rate. In the proposed method, we use DO-VLEI code to calculate the required parameters for the subtree segmentation. We conduct experiments to observe the effectiveness of the proposed subtree segmentation method using real bibliography XML documents stored in RDBs. We apply our previously proposed subtree matching algorithm SLAX to match the segmented subtrees and evaluate how the matching threshold impacts the precision and recall of subtree matching. We also integrate the matched subtrees determined by SLAX using our previously proposed subtree integration algorithm. The experimental results indicate that the proposed subtree segmentation method is effective for segmenting XML documents into independent meaningful subtrees and that our previously proposed subtree matching algorithm achieves reasonable matching precision and recall using the segmented subtrees.

web age information management | 2008

Exploiting Path Information for Syntax-Based XML Subtree Matching in RDBs

Wenxin Liang; Haruo Yokota

In this paper, we propose two methods that exploit path information, a direct-parent based method and a full-path based method, for syntax-based XML subtree matching in RDBs. For each proposed method, we discuss two ways of using the path information. The first uses the path information after matching the leaf nodes. The second uses the path information together with the PCDATA value of the leaf node as the join object. We perform experiments using real bibliography XML documents stored in RDBs to evaluate the execution time, precision and recall of subtree matching. The experimental results indicate that both proposed path-based methods can effectively improve the precision and recall of subtree matching compared with the original SLAX algorithm.

database and expert systems applications | 2008

Superimposed Code-Based Indexing Method for Extracting MCTs from XML Documents

Wenxin Liang; Takeshi Miki; Haruo Yokota

With the exponential increase in the amount of XML data on the Internet, information retrieval techniques on tree-structured XML documents such as keyword search become important. The search results for this retrieval technique are often represented by minimum connecting trees (MCTs) rooted at the lowest common ancestors (LCAs) of the nodes containing all the search keywords. Recently, effective methods such as the stack-based algorithm for generating the lowest grouped distance MCTs (GDMCTs), which derive a more compact representation of the query results, have been proposed. However, when the XML documents and the number of search keywords become large, these methods are still expensive. To achieve more efficient algorithms for extracting MCTs, especially lowest GDMCTs, we first consider two straightforward LCA detection methods: keyword B+trees with Dewey-order labels and superimposed code-based indexing methods. Then, we propose a method for efficiently detecting the LCAs, which combines the two straightforward indexing methods for LCA detection. We also present an effective solution for the false drop problem caused by the superimposed code. Finally, the proposed LCA detection methods are applied to generate the lowest GDMCTs. We conduct detailed experiments to evaluate the benefits of our proposed algorithms and show that the proposed combined method can completely solve the false drop problem and outperforms the stack-based algorithm in extracting the lowest GDMCTs.
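
The two straightforward ingredients named in the abstract above can be sketched in Python under stated assumptions. With Dewey-order labels, the LCA of several nodes is the longest common prefix of their labels. With a superimposed code, each keyword sets a few bits in a fixed-width signature, and a node's signature is the bitwise OR of its keywords' codes; a query can then produce false drops (the bits match although the keyword is absent). The signature width, the number of bits per keyword, the use of SHA-256, and all function names are choices made for this example only; the paper's combined method and its false-drop solution are not reproduced here.

```python
import hashlib


def lca(labels):
    """Label of the lowest common ancestor: longest common prefix of Dewey labels."""
    prefix = labels[0]
    for label in labels[1:]:
        n = 0
        while n < min(len(prefix), len(label)) and prefix[n] == label[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def keyword_code(word, width=16, bits=2):
    """Map a keyword to a `width`-bit pattern with at most `bits` bits set."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    code = 0
    for i in range(bits):
        code |= 1 << (digest[i] % width)
    return code


def signature(words, width=16, bits=2):
    """Superimpose (bitwise OR) the codes of all keywords under a node."""
    sig = 0
    for word in words:
        sig |= keyword_code(word, width, bits)
    return sig


def may_contain(sig, word, width=16, bits=2):
    """True if every bit of the keyword's code is set; may be a false drop."""
    code = keyword_code(word, width, bits)
    return (sig & code) == code


keywords = ["xml", "join", "label"]
node_sig = signature(keywords)
common = lca([(1, 2, 3), (1, 2, 5, 1)])  # nodes 1.2.3 and 1.2.5.1 meet at 1.2
```

A negative answer from `may_contain` is always reliable, while a positive answer has to be checked against the actual keywords, which is why the false drop problem needs a separate solution.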

international conference on digital information management | 2007

Storage consumption of variable-length XML labels uninfluenced by insertions

Akihiro Takahashi; Wenxin Liang; Haruo Yokota

In recent years, the method of assigning labels to the nodes of an XML tree has been attracting more attention. Various functions in an RDBMS can be easily utilized by storing the labeled XML documents in the RDB. However, in traditional labeling methods, a number of nodes need to be relabeled when the XML documents are updated. To address this problem, we proposed DO-VLEI code, which combines VLEI code with the Dewey Order method. DO-VLEI code is effective in reducing the update cost, but the label size increases rapidly when handling large XML documents. To reduce the label size, we presented Compressed-bit-string DO-VLEI (C-DO-VLEI) code. However, it is difficult to handle the length of C-DO-VLEI because it is a variable-length code. In this paper, we propose two effective methods, VLEI-ABL and VLEI-EOL, for handling the code length of C-DO-VLEI. We perform experiments to compare the storage consumption of the proposed methods with that of the previously known ORDPATH. The experimental results show that our methods considerably outperform ORDPATH.

Journal of Information Processing | 2006