Minqi Zhou | Researchain

Publication

Featured research published by Minqi Zhou.

International Universal Communication Symposium | 2010

Services in the Cloud Computing era: A survey

Minqi Zhou; Rong Zhang; Dadan Zeng; Weining Qian

Cloud Computing has become a well-known buzzword. As a new infrastructure for offering services, Cloud Computing systems have many advantages over traditional service provision, such as reduced upfront investment, expected performance, high availability, infinite scalability, and tremendous fault-tolerance capability, and they are consequently pursued by most IT companies, such as Google, Amazon, Microsoft, and Salesforce.com. Given their dominance in traditional service provision and their accumulated capital, most of these IT companies are well placed to adapt their services early to this new environment. On the other hand, a large number of new companies have emerged with competitive services that rely on those Cloud Computing systems. In this paper, we divide these services into six categories according to what they provide: Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), Identity and Policy Management as a Service (IPMaaS), Network as a Service (NaaS), and Infrastructure as a Service (IaaS). A detailed analysis of these services is provided, along with the companies that offer each service category.

Future Generation Computer Systems | 2009

An efficient peer-to-peer indexing tree structure for multidimensional data

Rong Zhang; Weining Qian; Aoying Zhou; Minqi Zhou

As one of the most important technologies for implementing large-scale distributed systems, peer-to-peer (P2P) computing has attracted much attention in both the research and industrial communities for advantages such as high availability, high performance, and high flexibility under network dynamics. However, multidimensional data indexing remains a big challenge for P2P computing, because existing index structures are complicated and make search and network maintenance inefficient, which greatly limits the scalability of applications and the dimensionality of the data to be indexed. We propose SDI (Swift tree structure for multidimensional Data Indexing), a swift index scheme with a simple tree structure for multidimensional data indexing in large-scale distributed systems. While keeping the query efficiency at O(log N) in terms of routing hops, SDI has extremely low maintenance costs, which is proved through theoretical analysis. Furthermore, SDI overcomes the root-bottleneck problem found in most other tree-based distributed indexing systems. An extensive empirical study verifies the superiority of SDI in both query and maintenance performance.

Semantics, Knowledge and Grid | 2010

Join Optimization in the MapReduce Environment for Column-wise Data Store

Minqi Zhou; Rong Zhang; Dadan Zeng; Weining Qian; Aoying Zhou

Chain join processing, which combines records from two or more tables sequentially, has been well studied in centralized databases. However, it has seldom been discussed in the cloud computing era and remains an urgent problem, especially where structured (or relational) data are stored in a column-wise (attribute-wise) fashion in distributed file systems (e.g., Google File System) over hundreds or even thousands of commodity PCs. In this paper, we propose a novel method for chain join processing, which is one of the common primitives in the cloud era for analyzing column-wise stored data. By effectively selecting the dedicated records (tuples) for the chain join based on the information exploited within the bipartite join graph, the communication cost for record transmission can be reduced dramatically. A bushy tree structure is deployed to regulate the chain join sequence, which further reduces the number of intermediate results generated and transmitted and exploits higher parallelism in join processing, resulting in more efficient join processing. Our extensive performance study confirms the effectiveness and efficiency of our methods.

International World Wide Web Conferences | 2012

Exploiting shopping and reviewing behavior to re-score online evaluations

Rong Zhang; Chaofeng Sha; Minqi Zhou; Aoying Zhou

Analysis of product reviews has attracted great attention from both academia and industry. Generally, the evaluation scores of reviews are used to generate the average scores of products and shops for future potential users. However, in the real world, evaluation scores are often inconsistent with review content, and some customers do not give fair reviews. In this work, we focus on detecting the credibility of customers by analyzing online shopping and review behaviors, and then we re-score the reviews for products and shops. Finally, we evaluate our algorithm on a real data set from Taobao, the biggest e-commerce site in China.

Distributed and Parallel Databases | 2009

Multi-dimensional data density estimation in P2P networks

Minqi Zhou; Weining Qian; Xueqing Gong; Aoying Zhou

Estimating the global data distribution in Peer-to-Peer (P2P) networks is an important issue that has not yet been well addressed. It can benefit many P2P applications, such as load balancing analysis, query processing, and data mining. In this paper, we propose a novel algorithm based on compact multi-dimensional histogram information that achieves high estimation accuracy at low estimation cost. The data distribution is maintained in a multi-dimensional histogram that is spread among peers without overlap, and each part is further condensed into a set of discrete cosine transform coefficients. Each peer is able to hierarchically accumulate this compact information into the entire histogram through information exchange, and consequently to estimate the global data density accurately and efficiently. Algorithms for hierarchically accumulating discrete cosine transform coefficients, as well as the density estimation error, are presented with detailed theoretical analysis and proof. Our extensive performance study confirms the effectiveness and efficiency of our methods on density estimation in dynamic P2P networks.
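
The idea of condensing non-overlapping histogram parts into discrete cosine transform (DCT) coefficients and then accumulating them can be illustrated with a small one-dimensional sketch. This is not the algorithm from the paper: it only relies on the linearity of the DCT, and it assumes each peer embeds its part in the global bucket range with zeros elsewhere. The function names (`dct2`, `idct`, `merge`), the bucket counts, and the choice of `K` are all hypothetical.

```python
import math
from typing import List


def dct2(x: List[float], k: int) -> List[float]:
    """Return the first k unnormalized DCT-II coefficients of x."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5) * j) for i in range(n))
        for j in range(k)
    ]


def idct(coeffs: List[float], n: int) -> List[float]:
    """Invert dct2 for n buckets; coefficients beyond len(coeffs) count as 0."""
    return [
        coeffs[0] / n
        + 2.0 / n * sum(
            coeffs[j] * math.cos(math.pi / n * (i + 0.5) * j)
            for j in range(1, len(coeffs))
        )
        for i in range(n)
    ]


def merge(a: List[float], b: List[float]) -> List[float]:
    """Accumulate two coefficient vectors; valid because the DCT is linear."""
    return [p + q for p, q in zip(a, b)]


# Hypothetical data: 8 global buckets split between two peers without overlap.
GLOBAL_BUCKETS = 8
peer_a = [4.0, 6.0, 5.0, 3.0, 0.0, 0.0, 0.0, 0.0]
peer_b = [0.0, 0.0, 0.0, 0.0, 2.0, 7.0, 8.0, 1.0]

K = 4  # each peer ships only K coefficients instead of 8 bucket counts
merged = merge(dct2(peer_a, K), dct2(peer_b, K))

global_hist = [a + b for a, b in zip(peer_a, peer_b)]
estimate = idct(merged, GLOBAL_BUCKETS)  # smoothed estimate of global_hist
```

In this sketch the merged coefficients equal the truncated DCT of the global histogram, so peers can exchange `K` numbers each rather than full bucket counts; the estimate gets coarser as `K` shrinks.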

International Conference on Data Engineering | 2015

Chronos: An elastic parallel framework for stream benchmark generation and simulation

Ling Gu; Minqi Zhou; Zhenjie Zhang; Ming-Chien Shan; Aoying Zhou; Marianne Winslett

In the coming big data era, stress testing of IT systems under extreme data volumes is crucial to the adoption of computing technologies in every corner of the cyber world. Appropriately generated benchmark datasets allow administrators to evaluate the capacity of their systems when real datasets are hard to obtain or do not contain extreme cases. Traditional benchmark data generators, however, mainly target producing relational tables of arbitrary size following fixed distributions. The output of such generators is insufficient for measuring the stability of an architecture under extremely dynamic and heavy workloads caused by complicated or hidden factors in real-world generation mechanisms, e.g., dependencies between stocks in the trading market and collaborative human behaviors on social networks. In this paper, we present a new framework, called Chronos, to support new demands on streaming data benchmarking by generating and simulating realistic and fast data streams in an elastic manner. Given a small group of samples with timestamps, Chronos reproduces new data streams with characteristics similar to those of the samples, preserving column-wise correlations, temporal dependency, and order statistics of the snapshot distributions at the same time. To meet these requirements, we propose 1) a column decomposition optimization technique to partition the original relation table into small sub-tables with minimal loss of correlation information, 2) a generative and extensible model based on Latent Dirichlet Allocation to capture temporal dependency while preserving order statistics of the snapshot distribution, and 3) a new generation and assembling method to efficiently build tuples following the expected distribution on the snapshots. To fulfill the vision of elasticity, we also present a new parallel stream data generation mechanism that lets distributed nodes collaboratively generate tuples with minimal synchronization overhead and excellent load balancing. Our extensive experimental studies on real-world data domains confirm the efficiency and effectiveness of Chronos on stream benchmark generation and simulation.

Archive | 2011

At the Frontiers of Information and Software as Services

K. Selçuk Candan; Wen Syan Li; Thomas Phan; Minqi Zhou

The high cost of creating and maintaining software and hardware infrastructures for delivering services to businesses has led to a notable trend toward the use of third-party service providers, which rent out network presence, computation power, and data storage space to clients with infrastructural needs. These third-party service providers can act as data stores as well as entire software suites for improved availability and system scalability, reducing the burden on small and medium businesses of managing complex infrastructures. This is called information/application outsourcing or software as a service (SaaS). The emergence of enabling technologies, such as service-oriented architectures (SOA), virtual machines, and cloud computing, contributes to this trend. Scientific Grid computing, online software services, and business service networks are typical examples leveraging the database and software as a service paradigm. In this paper, we survey the technologies used to enable the SaaS paradigm as well as the current offerings on the market. We also outline research directions in the field.

International Universal Communication Symposium | 2010

Searching XML data by SLCA on a MapReduce cluster

Mengjie Zhou; Haoji Hu; Minqi Zhou

XML keyword search is a popular research topic, and the Smallest Lowest Common Ancestor (SLCA) concept is fundamental to XML keyword search algorithms. With the rapid growth of XML data on the Internet, we are confronted with big data issues, and managing massive XML data is becoming a new research direction. Conventional centralized data management technologies are limited in efficiency, throughput, and maintenance cost. The MapReduce framework is a recent trend for processing large-scale data. It is implemented on clusters built from many business machines and overcomes the limitations mentioned above through parallel computation. In this paper, we provide an SLCA-based keyword search implementation for large-scale XML data sets on a MapReduce cluster. The main steps of our implementation include XML data partitioning, parsing and sorting, index setup, and SLCA computation. We conduct experiments to evaluate the effectiveness of the proposed method.
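
To make the SLCA concept concrete, here is a minimal single-machine sketch; it is not the MapReduce implementation described in the abstract. It assumes each XML node is identified by a Dewey ID (a tuple giving the path of child positions from the root) and that an index already maps each keyword to the nodes that contain it. The function name `slca` and the sample postings are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Dewey = Tuple[int, ...]  # path from the root, e.g. (1, 2, 2) = root/2nd child/2nd child


def slca(keyword_nodes: Dict[str, List[Dewey]]) -> List[Dewey]:
    """Return the SLCA nodes for a keyword -> Dewey-ID postings index."""
    if not keyword_nodes or any(not ids for ids in keyword_nodes.values()):
        return []  # some keyword never occurs, so no node covers all of them

    # For every node and each of its ancestors, record the keywords
    # that occur somewhere in that node's subtree.
    covered: Dict[Dewey, Set[str]] = {}
    for keyword, ids in keyword_nodes.items():
        for dewey in ids:
            for depth in range(1, len(dewey) + 1):
                covered.setdefault(dewey[:depth], set()).add(keyword)

    all_keywords = set(keyword_nodes)
    full = {node for node, kws in covered.items() if kws == all_keywords}

    # A node is "smallest" if none of its proper descendants also covers all keywords.
    has_full_descendant: Set[Dewey] = set()
    for node in full:
        for depth in range(1, len(node)):
            has_full_descendant.add(node[:depth])

    return sorted(full - has_full_descendant)


# Hypothetical postings for the query {"xml", "search"}.
postings = {
    "xml": [(1, 1, 1), (1, 2, 1), (1, 2, 2, 2)],
    "search": [(1, 1, 2), (1, 2, 2, 1)],
}
result = slca(postings)
```

Here the root `(1,)` and node `(1, 2)` both cover the two keywords, but each has a descendant that also does, so only the deepest covering nodes are kept in `result`.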

Database Systems for Advanced Applications | 2007

GChord: indexing for multi-attribute query in P2P system with low maintenance cost

Minqi Zhou; Rong Zhang; Weining Qian; Aoying Zhou

Providing complex query processing in peer-to-peer systems has attracted much attention in both the academic and industrial communities. We present GChord, a scalable technique for evaluating multi-attribute queries. Both exact-match and range queries can be handled by GChord. It has an advantage over existing methods in that each tuple only needs to be indexed once, while query efficiency is still guaranteed. Thus, index maintenance cost and search efficiency are balanced. Additional optimization techniques further improve the performance of GChord. Extensive experiments are conducted to validate the efficiency of the proposed method.

International Conference on Management of Data | 2016

Elastic Pipelining in an In-Memory Database Cluster

Li Wang; Minqi Zhou; Zhenjie Zhang; Yin Yang; Aoying Zhou; Dina Bitton

An in-memory database cluster consists of multiple interconnected nodes with a large capacity of RAM and modern multi-core CPUs. As a conventional query processing strategy, pipelining remains a promising solution for in-memory parallel database systems, as it avoids expensive intermediate result materialization and parallelizes the data processing among nodes. However, to fully unleash the power of pipelining in a cluster with multi-core nodes, it is crucial for the query optimizer to generate good query plans with appropriate intra-node parallelism, in order to maximize CPU and network bandwidth utilization. A suboptimal plan, by contrast, causes load imbalance in the pipelines and consequently degrades query performance. Parallelism assignment optimization at compile time is nearly impossible, as the workload in each node is affected by numerous factors and is highly dynamic during query evaluation. To tackle this problem, we propose elastic pipelining, which makes it possible to optimize intra-node parallelism assignments in the pipelines based on the actual workload at runtime. This is achieved by adopting a new elastic iterator model and a fully optimized dynamic scheduler. The elastic iterator model extends the traditional iterator model with a new capability for dynamic multi-core execution adjustment. The dynamic scheduler efficiently provisions CPU cores to query execution segments in the pipelines based on lightweight measurements on the operators. Extensive experiments on real and synthetic (TPC-H) data show that our proposal achieves almost full CPU utilization on typical decision-making analytical queries, outperforming state-of-the-art open-source systems by a huge margin.
