Haidong Zhang | Researchain

Publication

Featured research published by Haidong Zhang.

International Conference on Machine Learning | 2011

Software analytics as a learning case in practice: approaches and experiences

Dongmei Zhang; Yingnong Dang; Jian-Guang Lou; Shi Han; Haidong Zhang; Tao Xie

Software analytics aims to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. In this position paper, we advocate that when applying analytic technologies in the practice of software analytics, one should (1) incorporate a broad spectrum of domain knowledge and expertise, e.g., management, machine learning, large-scale data processing and computing, and information visualization; and (2) investigate how practitioners take actions on the produced information, and provide effective support for such information-based action taking. Our position is based on our experiences of successful technology transfer on software analytics at Microsoft Research Asia.

Very Large Data Bases | 2015

YADING: fast clustering of large-scale time series data

Rui Ding; Qiang Wang; Yingnong Dang; Qiang Fu; Haidong Zhang; Dongmei Zhang

Fast and scalable analysis techniques are becoming increasingly important in the era of big data, because they are the enabling techniques to create real-time and interactive experiences in data analysis. Time series are widely available in diverse application areas. Due to the large number of time series instances (e.g., millions) and the high dimensionality of each time series instance (e.g., thousands), it is challenging to conduct clustering on large-scale time series, and it is even more challenging to do so in real time to support interactive exploration. In this paper, we propose a novel end-to-end time series clustering algorithm, YADING, which automatically clusters large-scale time series with fast performance and quality results. Specifically, YADING consists of three steps: sampling the input dataset, conducting clustering on the sampled dataset, and assigning the rest of the input data to the clusters generated on the sampled dataset. In particular, we provide theoretical proof on the lower and upper bounds of the sample size, which not only guarantees YADING's high performance, but also ensures the distribution consistency between the input dataset and the sampled dataset. We also select the L1 norm as the similarity measure and the multi-density approach as the clustering method. With a theoretical bound, this selection ensures YADING's robustness to time series variations due to phase perturbation and random noise. Evaluation results have demonstrated that on typical-scale (100,000 time series each with 1,000 dimensions) datasets, YADING is about 40 times faster than the state-of-the-art, sampling-based clustering algorithm DENCLUE 2.0, and about 1,000 times faster than DBSCAN and CLARANS. YADING has also been used by product teams at Microsoft to analyze service performance. Two such use cases are shared in this paper.
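The three steps named in the abstract (sample, cluster the sample, assign the remaining series) can be illustrated with a minimal sketch. This is not the YADING implementation: the function names, the `radius` parameter, and the greedy leader clustering used here are assumptions made for illustration, standing in for the paper's multi-density method, and the sample size is simply passed in by the caller rather than derived from the paper's bounds. Only the L1 distance is taken from the abstract.

```python
import random


def l1(a, b):
    """L1 (Manhattan) distance between two equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_sample(sample, radius):
    """Greedy leader clustering under L1 distance.

    A simple stand-in for illustration only; the paper uses a
    multi-density approach, which is not reproduced here.
    """
    leaders = []
    for s in sample:
        # Start a new cluster when s is far from every existing leader.
        if all(l1(s, c) > radius for c in leaders):
            leaders.append(s)
    return leaders


def sample_cluster_assign(series, sample_size, radius, seed=0):
    """Step 1: sample. Step 2: cluster the sample. Step 3: assign all series."""
    rng = random.Random(seed)
    k = min(sample_size, len(series))
    sample_idx = rng.sample(range(len(series)), k)
    leaders = cluster_sample([series[i] for i in sample_idx], radius)
    # Each series gets the label of its nearest leader under L1 distance.
    labels = [
        min(range(len(leaders)), key=lambda j: l1(s, leaders[j]))
        for s in series
    ]
    return labels, leaders


if __name__ == "__main__":
    series = [
        [0, 0, 0, 0], [0.1, 0, 0, 0.1], [10, 10, 10, 10],
        [10, 10.1, 10, 9.9], [0, 0.1, 0, 0], [10, 10, 9.9, 10],
    ]
    labels, leaders = sample_cluster_assign(series, sample_size=4, radius=1.0)
    print(labels, len(leaders))
```

Clustering only the sample keeps the expensive step small, while the assignment step is a single pass over the full dataset; that division of work is the source of the speedup the abstract describes.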

Foundations of Software Engineering | 2014

Querying sequential software engineering data

Chengnian Sun; Haidong Zhang; Jian-Guang Lou; Hongyu Zhang; Qiang Wang; Dongmei Zhang; Siau-Cheng Khoo

We propose a pattern-based approach to effectively and efficiently analyzing sequential software engineering (SE) data. Different from other types of SE data, sequential SE data preserves unique temporal properties, which cannot be easily analyzed without much programming effort. In order to facilitate the analysis of sequential SE data, we design a sequential pattern query language (SPQL), which specifies the temporal properties based on regular expressions, and is enhanced with variables and statements to store and manipulate matching states. We also propose a query engine to effectively process the SPQL queries. We have applied our approach to analyze two types of SE data, namely bug report history and source code change history. We experiment with 181,213 Eclipse bug reports and 323,989 code revisions of Android. SPQL enables us to explore interesting temporal properties underneath these sequential data with a few lines of query code and low matching overhead. The analysis results can help better understand a software process and identify process violations.
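The idea of expressing a temporal property as a regular expression over an event sequence can be sketched with Python's standard `re` module. This is not SPQL syntax, which the abstract does not show; the event names, the one-letter encoding, and the "reopened after resolved" pattern are all hypothetical choices made for illustration.

```python
import re

# Hypothetical one-letter codes for bug report status events.
EVENT_CODES = {
    "NEW": "N",
    "ASSIGNED": "A",
    "RESOLVED": "R",
    "REOPENED": "O",
    "CLOSED": "C",
}

# Temporal property: a RESOLVED event is later followed by REOPENED,
# with no CLOSED event in between.
REOPEN_AFTER_RESOLVE = re.compile(r"R[^C]*O")


def encode(history):
    """Turn a list of event names into a string of one-letter codes."""
    return "".join(EVENT_CODES[event] for event in history)


def matches(history, pattern=REOPEN_AFTER_RESOLVE):
    """Return True if the encoded event history contains the pattern."""
    return pattern.search(encode(history)) is not None


if __name__ == "__main__":
    reopened = ["NEW", "ASSIGNED", "RESOLVED", "REOPENED", "RESOLVED", "CLOSED"]
    clean = ["NEW", "ASSIGNED", "RESOLVED", "CLOSED"]
    print(matches(reopened), matches(clean))
```

A plain regular expression like this cannot carry state between matches; the variables and statements that the abstract says SPQL adds are what would cover that gap.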

Perspectives on Data Science for Software Engineering | 2016

Visual analytics for software engineering data

Zhitao Hou; Hongyu Zhang; Haidong Zhang; Dongmei Zhang

Many data analysis techniques require substantial knowledge and skills and are typically performed by “data scientists”. Ordinary users may find it difficult to apply these techniques to quickly explore the data by themselves. We propose MetroEyes, a visual analytics tool for interactive data exploration. We have successfully transferred the main concepts and experiences of MetroEyes to Microsoft Power BI.

Archive | 2008