2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP) | 2019

A Measurable Framework for Run-time Data Sampling in Large-scale Datacenter


Abstract


In large-scale data centers, collecting run-time data is an effective way to analyze and monitor performance. However, because of the sheer size of data centers, limited computing resources, and low-latency requirements, collecting all of the data is impractical. Sampling a subset of the data is therefore the common approach at present. Existing work focuses on designing efficient sampling methods that reduce resource and time overhead in data centers, but it does not provide a unified, measurable framework to quantify the quality and practicality of different sampling methods. In this paper, we propose a measurable framework for general run-time data sampling in large-scale data centers that models the underlying recovering hypothesis explicitly. The proposed framework consists of four main processes: sampling, collecting, recovering, and comparing. It accurately measures the degree of sampling bias. We also design and implement three sampling methods with different recovering hypotheses. The experimental results demonstrate that the proposed framework helps identify a better run-time data sampling method, one with a lower degree of sampling bias at the same sampling rate.
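The four processes named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the abstract does not define them: random sampling at a fixed rate, two hypothetical recovering hypotheses (`recover_hold` and `recover_linear`), a hypothetical `bias_degree` computed as range-normalized mean absolute error, and a synthetic sine-wave metric. It is not the paper's implementation.

```python
import math
import random

def sample(series, rate, rng):
    """Sampling: keep each point with probability `rate` (endpoints always kept)."""
    last = len(series) - 1
    return [i for i in range(len(series)) if i in (0, last) or rng.random() < rate]

def collect(series, indices):
    """Collecting: gather the (index, value) pairs that were sampled."""
    return [(i, series[i]) for i in indices]

def recover_hold(samples, n):
    """Assumed hypothesis A: a value stays constant until the next sample."""
    out, pos = [], 0
    for i in range(n):
        if pos + 1 < len(samples) and samples[pos + 1][0] <= i:
            pos += 1
        out.append(samples[pos][1])
    return out

def recover_linear(samples, n):
    """Assumed hypothesis B: values change linearly between consecutive samples."""
    out = [0.0] * n
    for (i0, v0), (i1, v1) in zip(samples, samples[1:]):
        for i in range(i0, i1 + 1):
            out[i] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
    return out

def bias_degree(original, recovered):
    """Comparing: mean absolute error, normalized by the original value range
    (a hypothetical metric, not the paper's definition)."""
    span = (max(original) - min(original)) or 1.0
    total = sum(abs(a - b) for a, b in zip(original, recovered))
    return total / (len(original) * span)

# Synthetic run-time metric (assumed data): a slowly varying utilization curve.
rng = random.Random(0)
series = [50 + 30 * math.sin(i / 20) for i in range(1000)]
picked = collect(series, sample(series, 0.1, rng))
for name, recover in [("hold", recover_hold), ("linear", recover_linear)]:
    print(name, round(bias_degree(series, recover(picked, len(series))), 4))
```

Running both recovering hypotheses over the same collected samples keeps the sampling rate fixed, so any difference in the printed values comes from the hypothesis alone. That is the kind of like-for-like comparison the abstract describes.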

Pages 1-6
DOI 10.1109/ICSIDP47821.2019.9173399
Language English
Journal 2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP)

Full Text