Shuyun Wang | Researchain

Publication

Featured research published by Shuyun Wang.

annual acis international conference on computer and information science | 2008

Subspace Clustering of High Dimensional Data Streams

Shuyun Wang; Yingjie Fan; Chenghong Zhang; HeXiang Xu; Xiulan Hao; Yunfa Hu

This paper presents SOStream, a novel subspace-based algorithm for clustering high-dimensional online data streams. SOStream partitions the data space into grids and maintains a superset of all dense units in an online fashion. Deterministic lower and upper bounds on the selectivity of each maintained unit are also given. Using the maintained potentially dense units, SOStream can discover clusters of arbitrary shape in different subspaces of a high-dimensional data stream. Experimental results on real and synthetic datasets demonstrate the effectiveness of the approach.
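The grid idea in this abstract can be illustrated with a minimal sketch. It counts points per grid unit exactly in every low-dimensional subspace and reports units whose selectivity (fraction of points) reaches a threshold. This is not SOStream itself: the abstract describes maintaining an approximate superset of dense units with error bounds, which is not reproduced here. The names `cell_of`, `count_units`, `dense_units`, the fixed cell `width`, and the `max_subspace_dim` cap are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cell_of(point, dims, width):
    """Map a point to its grid cell within the subspace spanned by dims."""
    return tuple(int(point[d] // width) for d in dims)


def count_units(stream, n_dims, width, max_subspace_dim=2):
    """Count points per grid unit in every subspace up to max_subspace_dim.

    Exact counting is an assumption for this sketch; a streaming algorithm
    would bound memory by keeping only potentially dense units.
    """
    counts = Counter()
    total = 0
    for point in stream:
        total += 1
        for k in range(1, max_subspace_dim + 1):
            for dims in combinations(range(n_dims), k):
                counts[(dims, cell_of(point, dims, width))] += 1
    return counts, total


def dense_units(counts, total, threshold):
    """Return the units whose selectivity meets the density threshold."""
    return {unit for unit, c in counts.items() if c / total >= threshold}
```

Each unit is identified by the pair (subspace dimensions, cell coordinates), so the same point contributes to one unit in every subspace it is projected onto.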

international conference on machine learning and cybernetics | 2007

A Method of Deep Web Classification

HeXiang Xu; Xiulan Hao; Shuyun Wang; Yunfa Hu

Research on deep Web classification is an important area in large-scale deep Web integration, which is still at an early stage. Many deep Web sources are structured, providing structured query interfaces and results. Classifying such structured sources into domains is one of the critical steps toward the integration of heterogeneous Web sources. In this paper, we present an ontology-based deep Web classification method, which includes a category ontology model and a deep Web vector space model (VSM). The experimental results show that the method achieves good performance, with an average precision of 91.6% and an average recall of 92.4%.
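As a rough illustration of how a vector space model can assign a source to a domain, the sketch below represents a query interface and each category as term-frequency vectors and picks the category with the highest cosine similarity. The abstract does not give the paper's ontology model or weighting scheme, so the plain term counts, the function names `cosine` and `classify`, and the example vocabularies in the usage note are assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(interface_terms, category_terms):
    """Return the category whose term vector is closest to the interface.

    interface_terms: list of terms taken from a query interface (assumed input).
    category_terms: dict mapping category name to a list of typical terms.
    """
    vec = Counter(interface_terms)
    return max(category_terms,
               key=lambda c: cosine(vec, Counter(category_terms[c])))
```

For example, with hypothetical categories `{"books": ["title", "author", "isbn"], "flights": ["departure", "arrival", "airline"]}`, an interface with the terms `["author", "title", "keyword"]` is assigned to `"books"`.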

annual acis international conference on computer and information science | 2008

An Improved Condensing Algorithm

Xiulan Hao; Chenghong Zhang; HeXiang Xu; Xiaopeng Tao; Shuyun Wang; Yunfa Hu

The kNN classifier is widely used in text categorization. However, kNN has large computational and storage requirements, and its performance also suffers from uneven distribution of training data. Condensing techniques are usually used to reduce noise in the training data and to decrease the cost in time and space. The traditional condensing technique picks samples at random during initialization. Although random sampling is one way to reduce outliers, its high randomness can sometimes lead to poor performance, suppressing the advantages of sampling. To avoid this, we propose a variation of the traditional condensing technique. Experimental results show that this strategy solves the above problems effectively.
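For context, the traditional condensing step with random initialization can be sketched as below, in the style of Hart's condensed nearest neighbor rule. This shows only the baseline that the abstract criticizes; the proposed variation is not described in enough detail to reproduce. The function names, the squared Euclidean distance, and the `seed` parameter are assumptions.

```python
import random


def nearest_label(store, x):
    """Label of the stored sample closest to x (squared Euclidean distance)."""
    best = min(store, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return best[1]


def condense(samples, seed=0):
    """Keep only the samples that the current store misclassifies.

    samples: list of (feature_tuple, label) pairs.
    The pool is shuffled first, which is the random initialization step.
    """
    rng = random.Random(seed)
    pool = list(samples)
    rng.shuffle(pool)
    store = [pool[0]]
    changed = True
    while changed:
        changed = False
        for x, y in pool:
            # Skip samples already stored so the loop always terminates.
            if (x, y) not in store and nearest_label(store, x) != y:
                store.append((x, y))
                changed = True
    return store
```

Because the result depends on which sample seeds the store and on the visiting order, different seeds can give different condensed sets, which is the sensitivity the abstract points out.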

annual acis international conference on computer and information science | 2008

Entropy Based Clustering of Data Streams with Mixed Numeric and Categorical Values

Shuyun Wang; Yingjie Fan; Chenghong Zhang; HeXiang Xu; Xiulan Hao; Yunfa Hu

In this paper, a novel algorithm for clustering data streams with mixed numeric and categorical attributes (CNC-Stream) is proposed. A new entropy-based similarity measure, which determines the similarity between objects (data points in the stream or micro-clusters in memory), is also presented; it is what makes CNC-Stream work. Experiments conducted on real and synthetic data sets show that the proposed method is of high quality.

international conference on emerging technologies | 2007

Finding frequent items in data streams using ESBF

Shuyun Wang; Xiulan Hao; HeXiang Xu; Yunfa Hu

In this paper, we introduce a novel data structure, ESBF (Extensible and Scalable Bloom Filter), and the algorithm FI-ESBF (Finding frequent Items using ESBF) for estimating the frequent items in data streams. FI-ESBF achieves high precision while using much less memory than the best reported algorithm, considering the large number of distinct items in the stream. ESBF is an extension of the counting Bloom filter (CBF). It allows the amount of memory used to be adjusted dynamically according to the data distribution and the number of distinct items in the data streams, so prior knowledge of the data distribution and of the number of distinct elements to be stored is not required.
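The counting Bloom filter that ESBF extends can be sketched as follows. This is a plain fixed-size CBF used for frequency estimation, not ESBF: the dynamic resizing described in the abstract is not shown. The class name, the SHA-256 based hashing, and the default `size` and `num_hashes` values are assumptions chosen for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Fixed-size counting Bloom filter for approximate item counts."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, item):
        # Derive num_hashes positions by salting one cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        """Record one occurrence of item."""
        for pos in self._positions(item):
            self.counters[pos] += 1

    def estimate(self, item):
        """Estimated count: the minimum counter, never below the true count."""
        return min(self.counters[pos] for pos in self._positions(item))
```

Collisions can only inflate counters, so `estimate` may overcount but does not undercount. An item can then be reported as frequent when its estimate exceeds a chosen fraction of the stream length.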

fuzzy systems and knowledge discovery | 2007

Mining Frequent Items Based on Bloom Filter

Shuyun Wang; Xiulan Hao; HeXiang Xu; Yunfa Hu

This paper introduces the algorithm MIBFD (mining frequent items using Bloom filter based on damped model) for mining recent frequent items in data streams. Based on an efficient data structure named extensible and scalable Bloom filter (ESBF), MIBFD is able to adjust the amount of memory used dynamically. Theoretical analysis and experiments show that MIBFD is efficient in both processing time and memory usage.
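The damped model mentioned here is commonly understood as exponential decay of older arrivals, so that recent items weigh more. A minimal sketch of that idea, using an ordinary dictionary rather than a Bloom filter, is shown below. The class name `DampedCounter`, the per-arrival time step, and the `decay` parameter are assumptions, and the sketch does not reproduce MIBFD.

```python
class DampedCounter:
    """Exponentially decayed item weights, one time step per arrival."""

    def __init__(self, decay=0.5):
        self.decay = decay
        self.time = 0
        self.state = {}  # item -> (weight at last update, last update time)

    def add(self, item):
        """Record an arrival, decaying the item's old weight lazily."""
        self.time += 1
        weight, last = self.state.get(item, (0.0, self.time))
        weight = weight * self.decay ** (self.time - last) + 1.0
        self.state[item] = (weight, self.time)

    def weight(self, item):
        """Current decayed weight of item (0.0 if never seen)."""
        weight, last = self.state.get(item, (0.0, self.time))
        return weight * self.decay ** (self.time - last)
```

Decay is applied lazily, only when an item is touched or queried, so each arrival costs constant time regardless of how many items are tracked.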

international conference on machine learning and cybernetics | 2007

Efficient KNN Text Categorization Based on Multiedit and Condensing Techniques

Xiulan Hao; Chenghong Zhang; Shuyun Wang; Xiaopeng Tao; Yunfa Hu

As a simple and effective classification approach, KNN is widely used in text categorization. However, the KNN classifier not only has large computational and storage requirements, but its classification performance also deteriorates when the training data are unevenly distributed. In this paper, we present a combined technique, using multi-edit nearest neighbor and condensing techniques, to reduce noise in the training data and to decrease the cost in time and space. Our experimental results show that this strategy solves the above problems effectively.

fuzzy systems and knowledge discovery | 2007

Accurate Chinese Text Classification via Multiple Strategies

Xiulan Hao; Chenghong Zhang; Xiaopeng Tao; Shuyun Wang; Yunfa Hu

Text classification is one means of understanding text content. It is widely used in information retrieval, spam filtering, monitoring malicious rumors, and blocking pornographic and harmful messages. kNN is widely used in text categorization, but it suffers from biased training data sets. While developing the Prototype of Internet Information Security for the Shanghai Council of Information and Security, we found that when the training data set is biased, almost all test documents of some rare (smaller) categories are classified into common (larger) ones by the traditional kNN classifier. In this case, the performance of text classification cannot satisfy users' requirements. To alleviate this problem, we adopt two measures to boost the kNN classifier. First, we optimize features by removing some candidate features. Second, we modify the traditional decision rules by integrating the number of training samples of each category into them. Extensive experiments show that the adapted kNN achieves a significant improvement in classification performance on biased corpora.
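One way a decision rule could take category sizes into account is to divide each category's neighbor votes by its number of training samples. The sketch below shows that idea only; the abstract does not state the paper's actual rule, so this normalization, the function name `knn_predict`, and the squared Euclidean distance are assumptions.

```python
from collections import Counter


def knn_predict(train, x, k=3, normalize=True):
    """Predict a label for x from its k nearest training samples.

    train: list of (feature_tuple, label) pairs.
    normalize: if True, divide each label's votes by its training-set size,
    an assumed adjustment that favors rare categories.
    """
    sizes = Counter(label for _, label in train)
    neighbors = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x))
    )[:k]
    votes = Counter(label for _, label in neighbors)
    if normalize:
        scores = {label: v / sizes[label] for label, v in votes.items()}
    else:
        scores = dict(votes)
    return max(scores, key=scores.get)
```

With a biased training set, a query near the single sample of a rare category can be outvoted by two more distant samples of a common category under plain voting, while the size-normalized score restores the rare label.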

web age information management | 2008

An Efficient Structural Index for Branching Path Queries

Yingjie Fan; Shuyun Wang; Chenghong Zhang; Hai-Bing Ma; Yunfa Hu

A structural index, which acts as a schema, plays an important role in query optimization over XML and semi-structured data. We extend inter-relevant successive trees (IRST) to the XML data graph, introduce the new equivalence relation of k-l-similarity, and propose the IRST(k, l)-index, an adaptive structural index that supports branching or simple path queries efficiently. Our experiments show that, compared with indexes of the same kind, the IRST(k, l)-index performs more efficiently in terms of space consumption and query performance for branching or simple paths, while requiring significantly less construction time.

annual acis international conference on computer and information science | 2008

An Adaptive Index of XML for Frequent Branching Path Queries

Yingjie Fan; Chenghong Zhang; Shuyun Wang; Xiulan Hao; Yunfa Hu

A structural index, which acts as a structural summary, plays an important role in query optimization over XML and semi-structured data. To speed up branching path queries, we introduce the notion of k-l-bisimilarity into the M(k)-index and propose an adaptive structural index, the MBF(k, l)-index, which efficiently supports branching paths of any complexity and inherits from the M(k)-index the advantage of avoiding over-refinement. Our experiments show that, compared with indexes of the same kind, the index performs more efficiently in terms of space consumption and query performance for branching or simple paths.
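As background for the bisimilarity notion, the sketch below partitions the nodes of a labeled graph by plain backward k-bisimilarity: nodes start grouped by label, and each round splits groups whose members have parents in different groups. This is the general idea behind k-bounded structural summaries, not the paper's k-l-bisimilarity or the MBF(k, l)-index, whose definitions are not given in the abstract. The function name and the input format are assumptions.

```python
def k_bisimulation(labels, parents, k):
    """Assign each node a block id so that equal ids mean k-bisimilar.

    labels: dict mapping node -> label.
    parents: dict mapping node -> list of parent nodes (missing means none).
    """
    block = dict(labels)  # round 0: nodes are equivalent if labels match
    for _ in range(k):
        # A node's signature is its own block plus the set of parent blocks.
        signature = {
            n: (block[n], frozenset(block[p] for p in parents.get(n, ())))
            for n in labels
        }
        ids = {}
        block = {n: ids.setdefault(sig, len(ids))
                 for n, sig in signature.items()}
    return block
```

A larger k distinguishes nodes by longer incoming paths, which gives a more precise but larger summary; adaptive indexes try to refine only where the query workload needs it.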
