Zhenyu Dai
Guizhou University
Publications
Featured research published by Zhenyu Dai.
International Conference on Algorithms and Architectures for Parallel Processing | 2015
Hui Li; Nengjun Qiu; Mei Chen; Hongyuan Li; Zhenyu Dai; Ming Zhu; Menglin Huang
With the development of science and technology, the size and complexity of scientific data have increased rapidly, making efficient storage and parallel analysis of scientific data a major challenge. Previous techniques that combine a traditional relational database with analysis software often cannot meet the performance requirements of large-scale scientific data analysis. In this paper, we present FASTDB, a distributed array database system optimized for massive scientific data management that provides shared-nothing, parallel array processing. To demonstrate the intrinsic performance characteristics of FASTDB, we applied it to the interactive analysis of data from astronomical surveys and designed a series of experiments with scientific analysis tasks. The experimental results show that FASTDB can be significantly faster than the traditional database-backed SkyServer in many typical analytical scenarios.
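The paper does not publish FASTDB's API, but the shared-nothing idea it describes can be illustrated in a few lines: partition an array into chunks, aggregate each chunk independently in its own worker process, and let a coordinator merge the partial results. Everything below (the data, the chunk count, the aggregate) is an invented stand-in, not FASTDB code.

```python
# Illustrative shared-nothing aggregation: each worker owns one array
# chunk and aggregates it locally; the coordinator merges partial sums.
import numpy as np
from multiprocessing import Pool

def chunk_sum(chunk: np.ndarray) -> float:
    # Local aggregation executed independently on each "node" (process).
    return float(chunk.sum())

if __name__ == "__main__":
    data = np.random.rand(1_000_000)      # stand-in for survey measurements
    chunks = np.array_split(data, 4)      # shared-nothing partitioning

    with Pool(processes=4) as pool:
        partials = pool.map(chunk_sum, chunks)

    print("total:", sum(partials))        # coordinator combines the results
```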
International Conference on Algorithms and Architectures for Parallel Processing | 2015
Hui Li; Xiaohuan Hou; Mei Chen; Zhenyu Dai; Ming Zhu; Menglin Huang
To store and process data at large scale, distributed databases partition data and process it in parallel on distributed nodes in a cluster. When a database executes heterogeneous query workloads concurrently, performance prediction is needed. However, running queries in a distributed database incurs significant network overhead due to data transmission between cluster nodes. Hence, in this work we take network latency into account when predicting concurrent query performance. We propose a linear regression model to estimate query interactions when executing concurrent analytical workloads in a distributed database system. Since network latency and local processing overhead are the two most significant factors in query execution, we analyze query behavior with multivariate regression on both of them at different degrees of concurrency. In addition, we use sampling techniques to obtain various query mixes as the concurrency level increases. We evaluated our prediction model over a PostgreSQL database cluster with representative analytical workloads from TPC-H; the experimental results demonstrate that the model's query latency predictions keep the relative error within 14% on average.
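A minimal sketch of the regression idea, assuming synthetic measurements: predict concurrent query latency from network latency, local processing cost, and the concurrency level with ordinary least squares. The coefficients and noise model below are invented for illustration, not the paper's measured data.

```python
# Fit latency = w0 + w1*network + w2*local + w3*concurrency by least
# squares, then report the mean relative prediction error.
import numpy as np

rng = np.random.default_rng(0)
n = 200
net_ms   = rng.uniform(1, 50, n)      # network latency per query (ms)
local_ms = rng.uniform(10, 200, n)    # local processing cost (ms)
mpl      = rng.integers(1, 10, n)     # multiprogramming (concurrency) level

# Invented ground truth with noise, standing in for measured latencies.
latency = 1.5 * net_ms + 1.1 * local_ms + 4.0 * mpl + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), net_ms, local_ms, mpl])
w, *_ = np.linalg.lstsq(X, latency, rcond=None)

rel_err = np.abs(X @ w - latency) / latency
print(f"mean relative error: {100 * rel_err.mean():.2f}%")
```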
Archive | 2019
Yang Chen; Hui Li; Mei Chen; Zhenyu Dai; Huanjun Li; Ming Zhu
Feature selection is an important data analysis technique used to reduce the redundancy of features and exploit hidden information in high-dimensional data. In this paper we propose a similarity-metric-based feature selection method named Fesim. We use the Euclidean distance to measure the similarity among all features, and then apply the density-based DBSCAN algorithm to cluster features that are relevant to each other. Moreover, we present a strategy that accurately chooses representative features from each cluster. We conducted comprehensive experiments to evaluate the proposed approach, and the results on different datasets demonstrate its superiority.
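A minimal sketch of the Fesim pipeline using scikit-learn's DBSCAN: treat each feature (column) as a vector, cluster the features by Euclidean distance, and keep one representative per cluster. The representative rule used here (the member closest to the cluster centroid) and all parameter values are assumptions; the paper defines its own selection strategy.

```python
# Cluster features (not samples) with DBSCAN over Euclidean distance,
# then keep one representative feature per cluster plus all outliers.
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(500, 20)                  # 500 samples, 20 features
feats = X.T                                  # one row per feature vector

labels = DBSCAN(eps=8.5, min_samples=2, metric="euclidean").fit_predict(feats)

selected = [i for i, lbl in enumerate(labels) if lbl == -1]   # noise: keep
for lbl in set(labels) - {-1}:
    members = np.where(labels == lbl)[0]
    centroid = feats[members].mean(axis=0)
    dists = np.linalg.norm(feats[members] - centroid, axis=1)
    selected.append(int(members[np.argmin(dists)]))           # representative

print("selected feature indices:", sorted(selected))
```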
Computer Science On-line Conference | 2018
Junpeng Zhu; Hui Li; Mei Chen; Zhenyu Dai; Ming Zhu
Sampling has become one of the recent research focuses in graph-related fields. Most existing graph sampling algorithms tend to favor high-degree or low-degree nodes in complex networks because such networks are scale-free, i.e., node degrees follow a power-law distribution, so the degrees of the sampled nodes differ significantly. In this paper, we propose the idea of an approximate degree distribution and devise a stratification strategy based on it for complex networks. We also develop two graph sampling algorithms that combine a node selection method with the stratification strategy. The experimental results show that our sampling algorithms preserve several properties of different graphs and are more accurate than other algorithms. Further, we show that the proposed algorithms are superior to off-the-shelf algorithms in terms of degree unbiasedness and more efficient than the state-of-the-art FFS and ES-i algorithms.
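A minimal sketch of degree-stratified node sampling on a scale-free graph, using networkx. The log-scale degree buckets and the uniform per-stratum rate are illustrative assumptions; the paper's algorithms derive their strata from an approximate degree distribution and combine them with a node selection method.

```python
# Stratify nodes by degree bucket, then sample the same fraction from
# every stratum so high- and low-degree nodes are both represented.
import random
import networkx as nx

G = nx.barabasi_albert_graph(n=10_000, m=3, seed=42)   # scale-free graph
rate = 0.1                                             # overall sample rate

strata = {}
for node, deg in G.degree():
    bucket = deg.bit_length()          # log-scale buckets: 1-1, 2-3, 4-7, ...
    strata.setdefault(bucket, []).append(node)

sampled = []
for nodes in strata.values():
    k = max(1, round(rate * len(nodes)))
    sampled.extend(random.sample(nodes, k))

S = G.subgraph(sampled)
print(S.number_of_nodes(), "nodes,", S.number_of_edges(), "edges sampled")
```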
Computer Science On-line Conference | 2018
Jianfeng Zhang; Hui Li; Mei Chen; Zhenyu Dai; Ming Zhu
To reduce the large network overhead and the heavy cost of cross-matching astronomical catalogs in a database cluster, we propose a novel cross-match method based on Roaring Bitmaps. First, we store the astronomical catalog data in column-oriented storage with compression enabled to reduce the I/O overhead of field access in the parallel database system. Second, we create a spatial index that maps 2D coordinates to integers, and then use Roaring Bitmaps to convert the spatial index into a bitmap index. Finally, the spatial range searches received for cross-matching are translated into bitmap operations to achieve batch processing. Experiments over real large-scale astronomical data show that the proposed method is 4 to 10 times faster than the traditional method while consuming less than 10% of the memory.
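A minimal sketch of the bitmap cross-match idea: grid the sky, encode each cell as one integer, store each catalog's occupied cells in a Roaring Bitmap, and reduce the range search to a bitmap intersection. The pyroaring library and the simple row-major cell index below are our stand-ins; the paper builds its own spatial index over the parallel database.

```python
# Candidate cross-match pairs live in grid cells occupied by both
# catalogs; only those survivors need an exact per-object distance check.
from pyroaring import BitMap

GRID = 3600  # cells per axis; a finer grid gives tighter candidate sets

def cell_id(ra: float, dec: float) -> int:
    # Map 2D sky coordinates (degrees) to a single integer cell index.
    x = int(ra / 360.0 * GRID) % GRID
    y = int((dec + 90.0) / 180.0 * GRID) % GRID
    return y * GRID + x

catalog_a = [(10.684, 41.269), (83.822, -5.391), (266.417, -29.008)]
catalog_b = [(10.684, 41.269), (201.365, -43.019)]

bitmap_a = BitMap(cell_id(ra, dec) for ra, dec in catalog_a)
bitmap_b = BitMap(cell_id(ra, dec) for ra, dec in catalog_b)

candidates = bitmap_a & bitmap_b       # range search as bitmap intersection
print("cells with potential matches:", list(candidates))
```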
Computer Science On-line Conference | 2018
Jianping Zhang; Hui Li; Xiaoping Zhang; Mei Chen; Zhenyu Dai; Ming Zhu
In heterogeneous data processing, varied data models often make it hard for analytic tasks to achieve optimal performance, so it is necessary to unify heterogeneous data under the same data model. How to determine the proper intermediate data model and unify the heterogeneous data models involved in an analytical task is an urgent problem. In this paper, we propose a model determination method based on cost estimation. It evaluates the execution cost of query tasks on different data models, takes that cost as the criterion for comparing the models, and chooses the data model with the least cost as the intermediate representation during data processing. Experimental results on BigBench datasets show that the proposed cost-estimation-based method can appropriately determine the data model, making heterogeneous data processing efficient.
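A minimal sketch of cost-based model determination: estimate the execution cost of a query task under each candidate data model and pick the cheapest as the intermediate representation. The model names and cost formulas below are invented placeholders, not the paper's cost model.

```python
# Pick the intermediate data model with the least estimated cost.
from typing import Callable, Dict

# Hypothetical per-model cost estimators over a simple task description.
estimators: Dict[str, Callable[[dict], float]] = {
    "relational": lambda q: 1.0 * q["joins"] + 0.2 * q["scans"],
    "document":   lambda q: 2.5 * q["joins"] + 0.1 * q["scans"],
    "graph":      lambda q: 0.3 * q["joins"] + 0.8 * q["scans"],
}

def choose_model(query: dict) -> str:
    return min(estimators, key=lambda m: estimators[m](query))

task = {"joins": 4, "scans": 10}
print("chosen intermediate model:", choose_model(task))
```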
Computer Science On-line Conference | 2018
Qingnan Zhao; Hui Li; Mei Chen; Zhenyu Dai; Ming Zhu
Data exploration has proved to be an efficient way to learn interesting new insights from a dataset intuitively. Typically, discovering interesting patterns and objects in a high-dimensional dataset is difficult because of the large search space. In this paper, we develop a data exploration method named Decision Analysis of Cross Clustering (DACC) based on subspace clustering. It characterizes data objects as decision trees over the partitioned clustering subspaces, which helps users quickly understand the patterns in the data and makes interactive exploration easier. We conducted a series of experiments on real-world datasets; the results show that DACC is superior to representative data exploration approaches in terms of efficiency and accuracy, and that it is applicable to interactive exploratory analysis of high-dimensional datasets.
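A minimal sketch of the cluster-then-describe idea behind DACC: cluster the data, then fit a shallow decision tree to the cluster labels so each cluster is summarized by a few readable split rules. KMeans stands in for the paper's subspace clustering step, which is not reproduced here.

```python
# Describe clusters with a depth-limited decision tree so the rules stay
# short enough for interactive exploration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 5)), rng.normal(4, 1, (200, 5))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

tree = DecisionTreeClassifier(max_depth=2).fit(X, labels)
print(export_text(tree, feature_names=[f"f{i}" for i in range(5)]))
```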
Computer Science On-line Conference | 2018
Ruping Wang; Hui Li; Mei Chen; Zhenyu Dai; Ming Zhu
Clustering algorithms often use a distance measure to quantify the similarity between point pairs, which makes it difficult for them to cope with the curse of dimensionality in high-dimensional space. To address this common issue, we propose to replace the distance measure in the k-means clustering algorithm with the maximal information coefficient (MIC), and we implement the resulting MIC-kmeans algorithm for high-dimensional clustering. MIC-kmeans clusters data by correlation, avoiding the failure of distance measures in high-dimensional space. Experimental results on synthetic and real datasets show that MIC-kmeans is superior to the distance-based k-means clustering algorithm.
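A minimal sketch of a MIC-based k-means loop, assuming the minepy library for computing MIC (the paper does not name its implementation): points are assigned to the centroid with the highest MIC rather than the lowest distance, while centroids are still updated as cluster means.

```python
# k-means with MIC-based assignment: MIC between a point and a centroid
# treats the feature dimensions as paired samples, so it only makes
# sense when the dimensionality is reasonably high.
import numpy as np
from minepy import MINE

def mic(u: np.ndarray, v: np.ndarray) -> float:
    m = MINE()
    m.compute_score(u, v)
    return m.mic()

def mic_kmeans(X: np.ndarray, k: int, iters: int = 10) -> np.ndarray:
    rng = np.random.default_rng(0)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each point to the most MIC-similar centroid.
        labels = np.array([np.argmax([mic(x, c) for c in centroids])
                           for x in X])
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

X = np.random.rand(100, 50)   # 100 points in a 50-dimensional space
print(np.bincount(mic_kmeans(X, k=3)))
```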
Archive | 2018
Li Tang; Hui Li; Mei Chen; Zhenyu Dai; Ming Zhu
In representative ETL software such as Informatica and DataStage, the ETL task schedulers support only timed scheduling; moreover, they neither take resource consumption into consideration nor make it easy for users to configure a resource utilization strategy, which often makes concurrent task scheduling inefficient. In this paper, we propose a concurrent ETL task scheduling approach named Aetsa, based on a long-term altruism strategy, to solve this problem. To ensure that critical jobs have the resources they need to execute efficiently, Aetsa can pause certain jobs temporarily and schedule other jobs at a higher priority; once appropriate resources become available for the paused jobs, Aetsa resumes them. We evaluated the efficiency of Aetsa in real medical data integration scenarios involving medical datasets from more than 1600 primary health care institutions in Guizhou province, China. Our experimental results show that Aetsa's average waiting time is very close to that of the well-known SJF scheduling solution, while compared with FCFS it achieves satisfactory improvements in both average response time and efficiency.
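A minimal sketch of the long-term altruism idea: when a critical job cannot get the resources it needs, running non-critical jobs are paused, and they are resumed once capacity frees up. The Job fields, the single-resource model, and the job names are illustrative; Aetsa's real policy is richer than this.

```python
# Pause-and-resume scheduling: non-critical jobs yield resources to a
# critical job, then come back when capacity is freed.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    need: int                 # resource units required
    critical: bool = False

class AltruisticScheduler:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.running: list[Job] = []
        self.paused: list[Job] = []

    def free(self) -> int:
        return self.capacity - sum(j.need for j in self.running)

    def submit(self, job: Job) -> None:
        if job.critical:
            # Altruism: pause non-critical jobs until the critical one fits.
            for victim in [j for j in self.running if not j.critical]:
                if self.free() >= job.need:
                    break
                self.running.remove(victim)
                self.paused.append(victim)
                print(f"paused {victim.name} for {job.name}")
        if self.free() >= job.need:
            self.running.append(job)
            print(f"running {job.name}")

    def finish(self, job: Job) -> None:
        self.running.remove(job)
        # Resume paused jobs that now fit into the freed capacity.
        for j in list(self.paused):
            if self.free() >= j.need:
                self.paused.remove(j)
                self.running.append(j)
                print(f"resumed {j.name}")

sched = AltruisticScheduler(capacity=10)
etl = Job("nightly-etl", need=8)
report = Job("patient-report", need=6, critical=True)
sched.submit(etl)       # runs
sched.submit(report)    # pauses the ETL job to make room
sched.finish(report)    # the ETL job is resumed
```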
International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage | 2016
Shengtian Min; Hui Li; Mei Chen; Zhenyu Dai; Ming Zhu
Highly concurrent analytic applications such as SaaS-based BI services face the problem of meeting performance SLAs (Service Level Agreements) as the number of users and the degree of concurrency increase. To reduce task processing overhead and service response time, analytic applications tend to rely heavily on various main-memory data management and caching techniques. In this paper, we design a cost-conscious cache replacement approach named CRSR (Cost-conscious Result Sets Replacement), which takes task result sets as the essential data unit and replaces existing result sets according to a specialized cost estimation strategy. We conducted a series of evaluations comparing the proposed CRSR approach with representative cache management methods; the experiments show that in most cases the CRSR algorithm can efficiently reduce the response time of highly concurrent analysis services and outperform its competitors.
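A minimal sketch of cost-conscious result-set replacement: when the cache is full, evict the cached result set with the smallest estimated benefit, here taken as recomputation cost per byte. That benefit formula is our assumption; CRSR uses its own specialized cost estimation strategy.

```python
# Result-set cache that evicts by estimated benefit rather than recency.
from dataclasses import dataclass

@dataclass
class CachedResult:
    key: str
    size: int          # bytes occupied by the result set
    compute_ms: float  # cost to recompute the result set

class CostConsciousCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: dict[str, CachedResult] = {}

    def _used(self) -> int:
        return sum(r.size for r in self.items.values())

    def put(self, result: CachedResult) -> None:
        # Evict the lowest-benefit results until the new one fits.
        while self._used() + result.size > self.capacity and self.items:
            victim = min(self.items.values(),
                         key=lambda r: r.compute_ms / r.size)
            del self.items[victim.key]
            print("evicted", victim.key)
        if result.size <= self.capacity:
            self.items[result.key] = result

cache = CostConsciousCache(capacity=1000)
cache.put(CachedResult("q1", size=600, compute_ms=50.0))
cache.put(CachedResult("q2", size=500, compute_ms=900.0))  # evicts q1
print("cached:", list(cache.items))
```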