
Publication


Featured research published by Weiyi Shang.


working conference on reverse engineering | 2009

An Empirical Study on Inconsistent Changes to Code Clones at Release Level

Nicolas Bettenburg; Weiyi Shang; Walid M. Ibrahim; Bram Adams; Ying Zou; Ahmed E. Hassan

Current research on code clones tries to address the question of whether code clones are harmful to the quality of software. As most of these studies are based on the fine-grained analysis of inconsistent changes at the revision level, they capture much of the chaotic and experimental nature inherent in any ongoing software development process. Conclusions drawn from the inspection of highly fluctuating and short-lived clones are likely to exaggerate the ill effects of inconsistent changes. To gain a broader perspective, we perform an empirical study on the effect of inconsistent changes on software quality at the release level. Based on a case study on two open source software systems, we observe that only 1% to 3% of inconsistent changes to clones introduce software defects, as opposed to substantially higher percentages reported by other studies. Our findings suggest that developers are able to effectively manage and control the evolution of cloned code at the release level.


international conference on software engineering | 2013

Assisting developers of big data analytics applications when deploying on hadoop clouds

Weiyi Shang; Zhen Ming Jiang; Hadi Hemmati; Bram Adams; Ahmed E. Hassan; Patrick Martin

Big data analytics is the process of examining large amounts of data (big data) in an effort to uncover hidden patterns or unknown correlations. Big Data Analytics Applications (BDA Apps) are a new class of software applications that analyze big data using massive parallel processing frameworks (e.g., Hadoop). Developers of such applications typically develop them using a small sample of data in a pseudo-cloud environment. Afterwards, they deploy the applications in a large-scale cloud environment with considerably more processing power and larger input data (reminiscent of the mainframe days). Working with BDA App developers in industry over the past three years, we noticed that the runtime analysis and debugging of such applications in the deployment phase cannot be easily addressed by traditional monitoring and debugging approaches. In this paper, as a first step in assisting developers of BDA Apps for cloud deployments, we propose a lightweight approach for uncovering differences between pseudo and large-scale cloud deployments. Our approach makes use of the readily-available yet rarely used execution logs from these platforms. Our approach abstracts the execution logs, recovers the execution sequences, and compares the sequences between the pseudo and cloud deployments. Through a case study on three representative Hadoop-based BDA Apps, we show that our approach can rapidly direct the attention of BDA App developers to the major differences between the two deployments. Knowledge of such differences is essential in verifying BDA Apps when analyzing big data in the cloud. Using injected deployment faults, we show that our approach not only significantly reduces the deployment verification effort, but also produces very few false positives when identifying deployment failures.
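The core pipeline described above (abstract the execution logs, recover the event sequences, compare the two deployments) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the masking regexes, log lines, and function names are all assumptions.

```python
import re
from collections import Counter

def abstract_log_line(line):
    """Abstract a raw log line into an event template by masking
    dynamic values (hex ids, paths, numbers)."""
    line = re.sub(r"0x[0-9a-f]+", "<ID>", line)
    line = re.sub(r"/[\w/.-]+", "<PATH>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line.strip()

def event_sequence(log_lines):
    """Recover the sequence of abstracted events from raw log lines."""
    return [abstract_log_line(l) for l in log_lines]

def deployment_differences(pseudo_logs, cloud_logs):
    """Compare abstracted-event frequencies between a pseudo-cloud run
    and a large-scale cloud run; report events whose counts differ."""
    pseudo = Counter(event_sequence(pseudo_logs))
    cloud = Counter(event_sequence(cloud_logs))
    return {e: (pseudo[e], cloud[e])
            for e in pseudo.keys() | cloud.keys()
            if pseudo[e] != cloud[e]}

pseudo = ["task 1 started", "task 1 done in 42 ms"]
cloud = ["task 7 started", "task 7 failed: retry 3"]
diff = deployment_differences(pseudo, cloud)
```

Events that occur with the same frequency in both runs (here, the "started" event) drop out, directing attention to the behavior that only appears in one deployment.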


mining software repositories | 2009

MapReduce as a general framework to support research in Mining Software Repositories (MSR)

Weiyi Shang; Zhen Ming Jiang; Bram Adams; Ahmed E. Hassan

Researchers continue to demonstrate the benefits of Mining Software Repositories (MSR) for supporting software development and research activities. However, as the mining process is time and resource intensive, they often create their own distributed platforms and use various optimizations to speed up and scale up their analysis. These platforms are project-specific, hard to reuse, and offer minimal debugging and deployment support. In this paper, we propose the use of MapReduce, a distributed computing platform, to support research in MSR. As a proof-of-concept, we migrate J-REX, an optimized evolutionary code extractor, to run on Hadoop, an open source implementation of MapReduce. Through a case study on the source control repositories of the Eclipse, BIRT and Datatools projects, we demonstrate that the migration effort to MapReduce is minimal and that the benefits are significant, as the running time of the migrated J-REX is only 30% to 50% of the original J-REX's. This paper documents our experience with the migration, and highlights the benefits and challenges of the MapReduce framework in the MSR community.
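To illustrate the MapReduce decomposition that such a migration relies on (not J-REX itself; the record layout and job are hypothetical), a repository-history analysis can be phrased as a map function that emits key-value pairs per commit and a reduce function that aggregates per key, with a shuffle step in between:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(commit):
    """Map: emit (file, 1) for every file touched by a commit."""
    return [(path, 1) for path in commit["files"]]

def reduce_phase(key, values):
    """Reduce: sum the change counts for one file."""
    return (key, sum(values))

def run_job(commits):
    """Minimal in-process emulation of a MapReduce job:
    map, shuffle (sort/group by key), then reduce."""
    pairs = [kv for c in commits for kv in map_phase(c)]
    pairs.sort(key=itemgetter(0))  # the "shuffle" step
    return dict(reduce_phase(k, [v for _, v in grp])
                for k, grp in groupby(pairs, key=itemgetter(0)))

commits = [{"files": ["a.java", "b.java"]}, {"files": ["a.java"]}]
counts = run_job(commits)
```

On a real Hadoop cluster the map and reduce functions stay essentially the same; the framework takes over the shuffle and distributes the work, which is why the migration effort reported above can be small.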


international conference on software engineering | 2014

Detecting performance anti-patterns for applications developed using object-relational mapping

Tse-Hsun Chen; Weiyi Shang; Zhen Ming Jiang; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora

Object-Relational Mapping (ORM) provides developers with a conceptual abstraction for mapping the application code to the underlying databases. ORM is widely used in industry due to its convenience; it permits developers to focus on developing the business logic without worrying too much about the database access details. However, developers often write ORM code without considering the impact of such code on database performance, leading to transactions with timeouts or hangs in large-scale systems. Unfortunately, there is little support to help developers automatically detect suboptimal database accesses. In this paper, we propose an automated framework to detect ORM performance anti-patterns. Our framework automatically flags performance anti-patterns in the source code. Furthermore, as there could be hundreds or even thousands of instances of anti-patterns, our framework provides support to prioritize performance bug fixes based on a statistically rigorous performance assessment. We have successfully evaluated our framework on two open-source systems and one large-scale industrial system. Our case studies show that our framework can detect new and known real-world performance bugs and that fixing the detected performance anti-patterns can improve the system response time by up to 98%.
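As a toy illustration of static anti-pattern flagging (not the paper's framework), the classic "query inside a loop" ORM anti-pattern can be detected by walking an abstract syntax tree. The set of query-method names below is an illustrative assumption:

```python
import ast

# Heuristic: flag calls to database-access methods (any method whose
# name suggests a query -- an assumed, illustrative list) that occur
# inside a loop: the classic "N+1 query" ORM anti-pattern.
QUERY_NAMES = {"get", "filter", "query", "find"}

def find_queries_in_loops(source):
    """Return line numbers of suspected query calls nested in loops."""
    tree = ast.parse(source)
    hits = []
    for loop in ast.walk(tree):
        if isinstance(loop, (ast.For, ast.While)):
            for node in ast.walk(loop):
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Attribute)
                        and node.func.attr in QUERY_NAMES):
                    hits.append(node.lineno)
    return hits

code = (
    "for order in orders:\n"
    "    customer = Customer.get(order.customer_id)\n"
    "    print(customer)\n"
)
flagged = find_queries_in_loops(code)
```

A real detector must also rank the flagged instances, since, as the abstract notes, there can be thousands of them; the paper uses a statistically rigorous performance assessment for that prioritization.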


Journal of Software: Evolution and Process | 2014

An exploratory study of the evolution of communicated information about the execution of large software systems

Weiyi Shang; Zhen Ming Jiang; Bram Adams; Ahmed E. Hassan; Michael W. Godfrey; Mohamed N. Nasser; Parminder Flora

A great deal of research in software engineering focuses on understanding the dynamic nature of software systems. Such research makes use of automated instrumentation and profiling techniques after the fact, i.e., without considering domain knowledge. In this paper, we turn our attention to another source of dynamic information, i.e., the Communicated Information (CI) about the execution of a software system. Major examples of CI are execution logs and system events. They are generated from statements that are inserted intentionally by domain experts (e.g., developers or administrators) to convey crucial points of interest. The accessibility and domain-driven nature of the CI make it a valuable source for studying the evolution of a software system. In a case study on one large open source and one industrial software system, we explore the concept of CI and its evolution by mining the execution logs of these systems. Our study illustrates the need for better traceability techniques between CI and the Log Processing Apps that analyze the CI. In particular, we find that the CI changes at a rather high rate across versions, leading to fragile Log Processing Apps. 40% to 60% of these changes can be avoided and the impact of 15% to 50% of the changes can be controlled through the use of robust analysis techniques by Log Processing Apps. We also find that Log Processing Apps that track implementation-level CI (e.g., performance analysis) are more fragile than Log Processing Apps that track domain-level CI (e.g., workload modeling), because the implementation-level CI is often short-lived.
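One of the robust analysis techniques alluded to above is tolerating small changes in log lines instead of matching them exactly. A minimal sketch, under the assumption that "robust" means nearest-template matching by token overlap (the templates and threshold are illustrative, not from the paper):

```python
def tokenize(line):
    return line.split()

def similarity(a, b):
    """Fraction of aligned token positions on which two lines agree."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def match_template(line, templates, threshold=0.6):
    """Robustly map a (possibly changed) log line to the closest known
    template, rather than requiring an exact string match."""
    best = max(templates,
               key=lambda t: similarity(tokenize(line), tokenize(t)))
    if similarity(tokenize(line), tokenize(best)) >= threshold:
        return best
    return None

templates = ["Starting worker <NUM>", "Worker <NUM> finished job <NUM>"]
# A log line that drifted slightly between versions still matches:
m = match_template("Starting worker <NUM> now", templates)
miss = match_template("Completely different line here", templates)
```

An exact-match Log Processing App would break on the drifted line; the fuzzy matcher absorbs the change, which is the kind of robustness that the study finds can control the impact of 15% to 50% of CI changes.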


automated software engineering | 2010

An experience report on scaling tools for mining software repositories using MapReduce

Weiyi Shang; Bram Adams; Ahmed E. Hassan

The need for automated software engineering tools and techniques continues to grow as the size and complexity of studied systems and analysis techniques increase. Software engineering researchers often scale their analysis techniques using specialized one-off solutions, expensive infrastructures, or heuristic techniques (e.g., search-based approaches). However, such efforts are not reusable and are often costly to maintain. The need for scalable analysis is very prominent in the Mining Software Repositories (MSR) field, which specializes in the automated recovery and analysis of large data stored in software repositories. In this paper, we explore the scaling of automated software engineering analysis techniques by reusing scalable analysis platforms from the web field. We use three representative case studies from the MSR field to analyze the potential of the MapReduce platform to scale MSR tools with minimal effort. We document our experience such that other researchers can benefit from it. We find that many of the web field's guidelines for using the MapReduce platform need to be modified to better fit the characteristics of software engineering problems.


Science of Computer Programming | 2012

An empirical study on inconsistent changes to code clones at the release level

Nicolas Bettenburg; Weiyi Shang; Walid M. Ibrahim; Bram Adams; Ying Zou; Ahmed E. Hassan

To study the impact of code clones on software quality, researchers typically carry out their studies based on fine-grained analysis of inconsistent changes at the revision level. As a result, they capture much of the chaotic and experimental nature inherent in any on-going software development process. Analyzing highly fluctuating and short-lived clones is likely to exaggerate the ill effects of inconsistent changes on the quality of the released software product, as perceived by the end user. To gain a broader perspective, we perform an empirical study on the effect of inconsistent changes on software quality at the release level. Based on a case study on three open source software systems, we observe that only 1.02%-4.00% of all clone genealogies introduce software defects at the release level, as opposed to the substantially higher percentages reported by previous studies at the revision level. Our findings suggest that clones do not have a significant impact on the post-release quality of the studied systems, and that the developers are able to effectively manage the evolution of cloned code.


Journal of Systems and Software | 2012

Using Pig as a data preparation language for large-scale mining software repositories studies

Weiyi Shang; Bram Adams; Ahmed E. Hassan

Highlights:
- We evaluate Pig's ability to prepare data in a modular way by performing three large-scale MSR studies in detail. Our implementation can be reused by other MSR researchers.
- We compare the use of Pig and Hadoop for preparing data for MSR studies.
- We report the lessons learnt with Pig in order to assist other researchers who want to use Pig as a data preparation language in their MSR studies.

The Mining Software Repositories (MSR) field analyzes software repository data to uncover knowledge and assist development of ever-growing, complex systems. However, existing approaches and platforms for MSR analysis face many challenges when performing large-scale MSR studies. Such approaches and platforms rarely scale easily out of the box. Instead, they often require custom scaling tricks and designs that are costly to maintain and that are not reusable for other types of analysis. We believe that the web community has faced many of these software engineering scaling challenges before, as web analyses have to cope with the enormous growth of web data. In this paper, we report on our experience in using a web-scale platform (i.e., Pig) as a data preparation language to aid large-scale MSR studies. Through three case studies, we carefully validate the use of this web platform to prepare (i.e., Extract, Transform, and Load, ETL) data for further analysis. Despite several limitations, we still encourage MSR researchers to leverage Pig in their large-scale studies because of Pig's scalability and flexibility. Our experience report will help other researchers who want to scale their analyses.
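Pig expresses ETL as a declarative pipeline of LOAD, FILTER, GROUP, and aggregation steps. As a rough in-memory Python analogue of such a pipeline (the record layout and field names are illustrative assumptions, not from the paper):

```python
from collections import defaultdict

# LOAD: commit-change records, as they might come out of a repository
# extractor (hypothetical layout).
records = [
    {"author": "alice", "file": "A.java"},
    {"author": "alice", "file": "README.md"},
    {"author": "bob", "file": "B.java"},
]

# FILTER: keep only Java source changes.
java_changes = [r for r in records if r["file"].endswith(".java")]

# GROUP BY author, then COUNT per group.
per_author = defaultdict(int)
for r in java_changes:
    per_author[r["author"]] += 1
```

In Pig each step is one relational statement that the platform parallelizes across a Hadoop cluster; the appeal for MSR studies is that the same pipeline scales from a sample to full repository history without scaling tricks.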


working conference on reverse engineering | 2011

An Exploratory Study of the Evolution of Communicated Information about the Execution of Large Software Systems

Weiyi Shang; Zhen Ming Jiang; Bram Adams; Ahmed E. Hassan; Michael W. Godfrey; Mohamed N. Nasser; Parminder Flora

A great deal of research in software engineering focuses on understanding the dynamic nature of software systems. Such research makes use of automated instrumentation and profiling techniques after the fact, i.e., without considering domain knowledge. In this paper, we turn our attention to another source of dynamic information, i.e., the Communicated Information (CI) about the execution of a software system. Major examples of CI are execution logs and system events. They are generated from statements that are inserted intentionally by domain experts (e.g., developers or administrators) to convey crucial points of interest. The accessibility and domain-driven nature of the CI make it a valuable source for studying the evolution of a software system. In a case study on one large open source and one industrial software system, we explore the concept of CI and its evolution by mining the execution logs of these systems. Our study illustrates the need for better traceability techniques between CI and the Log Processing Apps that analyze the CI. In particular, we find that the CI changes at a rather high rate across versions, leading to fragile Log Processing Apps. 40% to 60% of these changes can be avoided and the impact of 15% to 50% of the changes can be controlled through the use of robust analysis techniques by Log Processing Apps. We also find that Log Processing Apps that track implementation-level CI (e.g., performance analysis) are more fragile than Log Processing Apps that track domain-level CI (e.g., workload modeling), because the implementation-level CI is often short-lived.


international conference on performance engineering | 2015

Automated Detection of Performance Regressions Using Regression Models on Clustered Performance Counters

Weiyi Shang; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora

Performance testing is conducted before deploying system updates in order to ensure that the performance of large software systems did not degrade (i.e., no performance regressions). During such testing, thousands of performance counters are collected. However, comparing thousands of performance counters across versions of a software system is very time consuming and error-prone. In an effort to automate such analysis, model-based performance regression detection approaches build a limited number (i.e., one or two) of models for a limited number of target performance counters (e.g., CPU or memory) and leverage the models to detect performance regressions. Such model-based approaches still have their limitations since selecting the target performance counters is often based on experience or gut feeling. In this paper, we propose an automated approach to detect performance regressions by analyzing all collected counters instead of focusing on a limited number of target counters. We first group performance counters into clusters to determine the number of performance counters needed to truly represent the performance of a system. We then perform statistical tests to select the target performance counters, for which we build regression models. We apply the regression models to the new version of the system to detect performance regressions. We perform two case studies on two large systems: one open-source system and one enterprise system. The results of our case studies show that our approach can group a large number of performance counters into a small number of clusters. Our approach can successfully detect both injected and real-life performance regressions in the case studies. In addition, our case studies show that our approach outperforms traditional approaches for analyzing performance counters. Our approach has been adopted in industrial settings to detect performance regressions on a daily basis.
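The clustering step above rests on the observation that many counters move together and can be represented by one of their number. A minimal sketch of that idea, assuming a simple greedy correlation-based clustering (the counter names, data, and threshold are illustrative, not the paper's method):

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def cluster_counters(counters, threshold=0.9):
    """Greedy clustering: a counter highly correlated with an existing
    cluster's representative joins that cluster; otherwise it founds
    a new cluster. Returns counter-name -> representative-name."""
    clusters = []  # list of (representative_name, series)
    assignment = {}
    for name, series in counters.items():
        for rep_name, rep_series in clusters:
            if abs(pearson(series, rep_series)) >= threshold:
                assignment[name] = rep_name
                break
        else:
            clusters.append((name, series))
            assignment[name] = name
    return assignment

counters = {
    "cpu": [10, 20, 30, 40],
    "cpu_user": [11, 21, 29, 41],  # tracks cpu closely
    "mem": [5, 5, 6, 100],         # behaves differently
}
assignment = cluster_counters(counters)
```

Only one representative per cluster then needs a regression model, which is how analyzing "all collected counters" stays tractable.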

Collaboration


Dive into Weiyi Shang's collaboration.

Top Co-Authors

Bram Adams

École Polytechnique de Montréal