Weining Qian | Researchain

Publication

Featured research published by Weining Qian.

Conference on Information and Knowledge Management | 2003

Dynamically maintaining frequent items over a data stream

Cheqing Jin; Weining Qian; Chaofeng Sha; Jeffrey Xu Yu; Aoying Zhou

It is a challenge to maintain frequent items over a data stream, with small bounded memory, in a dynamic environment where both insertion and deletion of items are allowed. In this paper, we propose a novel algorithm, called hCount, which can handle both insertion and deletion of items with much less memory space than the best reported algorithm. Our algorithm is also superior in terms of precision, recall, and processing time. In addition, our approach does not require prior knowledge of the size of the value range of a data stream, and it can handle range extension dynamically. With a small modification, hCount can be improved to hCount*, which achieves significantly better performance.
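The abstract does not give hCount's internals, so the following is not the paper's algorithm. It is a minimal sketch of a generic hash-based counter table (in the style of a count-min sketch) that shows how bounded memory can absorb both insertions and deletions. The class name `CountingSketch` and the parameters `width` and `depth` are assumptions made for illustration.

```python
import hashlib


class CountingSketch:
    """Generic hash-based counter table (illustrative; not the paper's hCount)."""

    def __init__(self, width=1024, depth=4):
        self.width = width    # counters per row (assumed parameter)
        self.depth = depth    # number of independent hash rows (assumed parameter)
        self.table = [[0] * width for _ in range(depth)]

    def _cell(self, row, item):
        # Derive one hash per row by salting the item with the row index.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, item, delta):
        # Insertion adds +1 and deletion adds -1 to one counter in every row.
        for row in range(self.depth):
            self.table[row][self._cell(row, item)] += delta

    def insert(self, item):
        self.update(item, 1)

    def delete(self, item):
        self.update(item, -1)

    def estimate(self, item):
        # Collisions can only inflate a counter (as long as every delete
        # matches an earlier insert), so the minimum is the tightest estimate.
        return min(self.table[row][self._cell(row, item)]
                   for row in range(self.depth))


sketch = CountingSketch()
for _ in range(5):
    sketch.insert("a")
sketch.delete("a")
sketch.delete("a")
print(sketch.estimate("a"))
```

Memory use is fixed at `width * depth` counters regardless of stream length, which is the property the abstract emphasizes; how hCount sizes and queries its own structure is described in the paper itself.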

International Universal Communication Symposium | 2010

Services in the Cloud Computing era: A survey

Minqi Zhou; Rong Zhang; Dadan Zeng; Weining Qian

Cloud Computing has become a well-known buzzword. As a new infrastructure for offering services, Cloud Computing systems have many advantages over traditional service provision, such as reduced upfront investment, expected performance, high availability, infinite scalability, and tremendous fault-tolerance capability, and they are consequently pursued by most IT companies, such as Google, Amazon, Microsoft, and Salesforce.com. Given their dominance in traditional service provision and their accumulated capital, most of these IT companies have a better chance of adapting their services to the new Cloud Computing environment early. On the other hand, a large number of new companies have emerged with competitive services that rely on the Cloud Computing systems these providers offer. In this paper, we divide these services into six categories according to what they provide: Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), Identity and Policy Management as a Service (IPMaaS), Network as a Service (NaaS), and Infrastructure as a Service (IaaS). A detailed analysis of these services is provided, along with the companies that offer each service category.

Knowledge and Information Systems | 2008

Tracking clusters in evolving data streams over sliding windows

Aoying Zhou; Feng Cao; Weining Qian; Cheqing Jin

Mining data streams poses great challenges due to limited memory availability and real-time query response requirements. Clustering an evolving data stream is especially interesting because it captures not only the changing distribution of clusters but also the evolving behaviors of individual clusters. In this paper, we present a novel method for tracking the evolution of clusters over sliding windows. In our SWClustering algorithm, we combine the exponential histogram with temporal cluster features and propose a novel data structure, the Exponential Histogram of Cluster Features (EHCF). The exponential histogram is used to handle the in-cluster evolution, and the temporal cluster features represent the change of the cluster distribution. Our approach has several advantages over existing methods: (1) the quality of the clusters is improved because the EHCF captures the distribution of recent records precisely; (2) compared with previous methods, the mechanism employed to adaptively maintain the in-cluster synopsis can track the cluster evolution better, while consuming much less memory; (3) the EHCF provides a flexible framework for analyzing the cluster evolution and tracking a specific cluster efficiently without interfering with other clusters, thus reducing the consumption of computing resources for data stream clustering. Both the theoretical analysis and extensive experiments show the effectiveness and efficiency of the proposed method.

International Conference on Data Engineering | 2005

Bloom filter-based XML packets filtering for millions of path queries

Xueqing Gong; Weining Qian; Ying Yan; Aoying Zhou

The filtering of XML data is the basis of many complex applications, and many algorithms have been proposed to solve this problem. One important challenge is that the number of path queries is huge, so an efficient data structure is needed to represent them. Another challenge is that these path queries usually vary with time, so the way path queries are maintained determines the flexibility and capacity of a filtering system. In this paper, we introduce a novel approximate method for XML data filtering, which uses Bloom filters to represent path queries. With this method, millions of path queries can be stored efficiently. At the same time, it is easy to deal with changes to these path queries. To improve the filtering performance, we introduce a new data structure, Prefix Filters, to decrease the number of candidate paths. Experiments show that our Bloom filter-based method takes less time to build the routing table than the automaton-based method, and that it performs well, with an acceptable false positive rate, when filtering XML packets of relatively small depth against millions of path queries.
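As a rough illustration of the core idea (path queries stored in a Bloom filter and packet paths tested against it), here is a minimal sketch. It is not the paper's system: Prefix Filters and query maintenance are omitted, path queries are treated as plain strings without wildcards, and the names `BloomFilter` and `matching_paths` and the parameters `num_bits` and `num_hashes` are assumptions.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over string keys (illustrative sketch)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits        # size of the bit array (assumed parameter)
        self.num_hashes = num_hashes    # hash functions per key (assumed parameter)
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Salt the key with a seed to obtain several hash positions.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def matching_paths(packet_paths, query_filter):
    """Return the packet paths that may match a stored path query."""
    return [path for path in packet_paths if path in query_filter]


queries = BloomFilter()
for query in ["/order/item/price", "/order/customer/name"]:
    queries.add(query)

packet = ["/order/item/price", "/order/item/quantity"]
print(matching_paths(packet, queries))
```

A Bloom filter never produces false negatives, so every packet path that was registered as a query is reported; it can produce false positives, which is the approximation the abstract refers to.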

Cloud Data Management | 2009

Query processing of massive trajectory data based on mapreduce

Qiang Ma; Bin Yang; Weining Qian; Aoying Zhou

With the development of positioning technologies and the growing deployment of inexpensive location-aware sensors, large volumes of trajectory data have emerged. However, efficient and scalable query processing over trajectory data remains a big challenge. In this paper, we explore a new approach to this problem, presenting a new framework for query processing over trajectory data based on MapReduce. Traditional trajectory data partitioning, indexing, and query processing technologies are extended so that they can fully utilize the highly parallel processing power of large-scale clusters. We also show that the append-only scheme of the MapReduce storage model can be a good basis for handling updates of moving objects. Preliminary experiments show that this framework scales well with the size of the trajectory data set. We also discuss the limitations of traditional trajectory data processing techniques and our future research directions.

Database Systems for Advanced Applications | 2003

M-kernel merging: towards density estimation over data streams

Aoying Zhou; Zhiyuan Cai; Li Wei; Weining Qian

Density estimation is a costly operation for computing the distribution information of data sets, and it underlies many important data mining applications, such as clustering and biased sampling. However, traditional density estimation methods are inapplicable to streaming data, which arrive continuously in large volumes, because they require linear storage and quadratic computation. This shortcoming limits the application of many existing effective algorithms to data streams, for which mining is both an urgent need for applications and a challenge for research. In this paper, the problem of computing density functions over data streams is examined. A novel method that addresses this shortcoming of existing methods is developed to enable density estimation for large volumes of data in linear time and fixed-size memory, without loss of accuracy. The method is based on M-Kernel merging, so that the limited set of kernel functions to be maintained is determined intelligently. The application of the new method to different streaming data models is discussed, and the results of intensive experiments are presented. The analytical and empirical results show that this new density estimation algorithm for data streams can calculate density functions on demand at any time with high accuracy for different streaming data models.

Knowledge and Information Systems | 2006

Finding centric local outliers in categorical/numerical spaces

Jeffrey Xu Yu; Weining Qian; Hongjun Lu; Aoying Zhou

Outlier detection techniques are widely used in many applications such as credit-card fraud detection and monitoring criminal activities in electronic commerce. These applications attempt to identify outliers as noise, exceptions, or objects around the border. Existing density-based local outlier detection assigns the degree to which an object is an outlier in a numerical space. In this paper, we propose a novel mutual-reinforcement-based local outlier detection approach. Instead of detecting local outliers as noise, we attempt to identify local outliers in the center, where they are similar to some clusters of objects on one hand, and are unique on the other. Our technique can be used for bank investment to identify a unique body, similar to many good competitors, in which to invest. We attempt to detect local outliers in categorical, ordinal, and numerical data. In categorical data, the challenge is that there are many similar but different ways to specify relationships among the data items. Our mutual-reinforcement-based approach is stable under similar but different user-defined relationships. Our technique can reduce the burden on users of determining the relationships among data items, and it can provide explanations of why the outliers are found. We conducted extensive experimental studies using real datasets.

International World Wide Web Conferences | 2009

Detecting Overlapping Community Structures in Networks

Fang Wei; Weining Qian; Chen Wang; Aoying Zhou

Community structure has been recognized as an important statistical feature of networked systems over the past decade. A lot of work has been done to discover isolated communities in a network, with the focus on developing algorithms of high quality and good performance. However, less work has been done on the discovery of overlapping community structure, even though it could better capture the nature of the network in some real-world applications. For example, people have varying characteristics and interests, and are able to join very different communities in their social network. In this context, we present a novel algorithm for detecting overlapping community structures, which first finds the seed sets by spectral partitioning and then extends them with a special random walk technique. At every expansion step, the modularity function Q is used to measure the expanded structures. The function has become one of the popular standards in community detection and is defined in Newman and Girvan (Phys. Rev. E 69:026113, 2004). We also give a theoretical analysis of the whole expansion process and prove that our algorithm obtains the best community structures greedily. Extensive experiments are conducted on real-world networks of various sizes. The results show that overlapping is important for finding the complete community structures and that our method outperforms C-means in quality.
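For readers unfamiliar with the modularity function Q mentioned above, the following minimal sketch computes the standard Newman-Girvan modularity for a disjoint partition of an undirected, unweighted graph: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. It does not cover overlapping communities, the seed-set step, or the random-walk expansion from the paper. The function name `modularity` and the input layout are assumptions.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a disjoint partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = {}   # number of edges with both endpoints in the community
    degree = {}     # total degree of the nodes in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (internal edge fraction - expected fraction).
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, partition))
```

For this example each community has 3 of the 7 edges inside it and 7 of the 14 edge endpoints, so Q = 2 * (3/7 - 1/4) = 5/14. Placing all six nodes in one community gives Q = 0, which is why a higher Q indicates a more pronounced community structure.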

Frontiers of Computer Science in China | 2013

Towards modeling popularity of microblogs

Haixin Ma; Weining Qian; Fan Xia; Xiaofeng He; Jun Xu; Aoying Zhou

As one kind of social media, microblogs are widely used for sensing the real world. The popularity of a microblog is an important measure of the influence of a piece of information. The models and modeling techniques for the popularity of microblogs are studied in this paper. A huge data set based on Sina Weibo, one of the most popular microblogging services, is used in the study. First, two different types of popularity, namely the number of retweets and the number of possible views, are defined, and their relationships are discussed. Then, the temporal dynamics of tweets’ popularity, including lifecycles and tipping points, are studied. For modeling the temporal dynamics, a piecewise sigmoid model is used. Empirical studies show the effectiveness of our modeling methods.

International Conference on Data Engineering | 2009

Efficient Indices Using Graph Partitioning in RDF Triple Stores

Ying Yan; Chen Wang; Aoying Zhou; Weining Qian; Li Ma; Yue Pan

With the advance of the Semantic Web, a growing variety of RDF data is being generated, published, queried, and reused via the Web. For example, DBpedia, a community effort to extract structured data from Wikipedia articles, broke 100 million RDF triples in its latest release. Likewise, the Linking Open Data (LOD) project, initiated by Tim Berners-Lee, has published and interlinked many open-licence datasets, which so far consist of over 2 billion RDF triples. In this context, fast query response over such large-scale data is one of the challenges facing existing RDF data stores. In this paper, we propose a novel triple indexing scheme that helps an RDF query engine quickly locate instances within a small scope. Treating the RDF data as a graph, we partition the graph into multiple subgraphs and store them individually, and we build a signature tree over them to index the URIs. When a query arrives, the signature tree index is used to quickly locate the partitions that might contain matches for the query's constant URIs. Our experiments indicate that the indexing scheme dramatically reduces query processing time in most cases, because many partitions are filtered out early and the expensive exact matching is performed only over a small portion of the original dataset.
