Tse-Hsun Chen | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Tse-Hsun Chen is active.

Explore More

Publication

Featured researches published by Tse-Hsun Chen.

international conference on software engineering | 2014

Detecting performance anti-patterns for applications developed using object-relational mapping

Tse-Hsun Chen; Weiyi Shang; Zhen Ming Jiang; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora

Object-Relational Mapping (ORM) provides developers a conceptual abstraction for mapping the application code to the underlying databases. ORM is widely used in industry due to its convenience; permitting developers to focus on developing the business logic without worrying too much about the database access details. However, developers often write ORM code without considering the impact of such code on database performance, leading to cause transactions with timeouts or hangs in large-scale systems. Unfortunately, there is little support to help developers automatically detect suboptimal database accesses. In this paper, we propose an automated framework to detect ORM performance anti-patterns. Our framework automatically flags performance anti-patterns in the source code. Furthermore, as there could be hundreds or even thousands of instances of anti-patterns, our framework provides sup- port to prioritize performance bug fixes based on a statistically rigorous performance assessment. We have successfully evaluated our framework on two open source and one large-scale industrial systems. Our case studies show that our framework can detect new and known real-world performance bugs and that fixing the detected performance anti- patterns can improve the system response time by up to 98%.

mining software repositories | 2012

Explaining software defects using topic models

Tse-Hsun Chen; Stephen W. Thomas; Meiyappan Nagappan; Ahmed E. Hassan

Researchers have proposed various metrics based on measurable aspects of the source code entities (e.g., methods, classes, files, or modules) and the social structure of a software project in an effort to explain the relationships between software development and software defects. However, these metrics largely ignore the actual functionality, i.e., the conceptual concerns, of a software system, which are the main technical concepts that reflect the business logic or domain of the system. For instance, while lines of code may be a good general measure for defects, a large entity responsible for simple I/O tasks is likely to have fewer defects than a small entity responsible for complicated compiler implementation details. In this paper, we study the effect of conceptual concerns on code quality. We use a statistical topic modeling technique to approximate software concerns as topics; we then propose various metrics on these topics to help explain the defect-proneness (i.e., quality) of the entities. Paramount to our proposed metrics is that they take into account the defect history of each topic. Case studies on multiple versions of Mozilla Firefox, Eclipse, and Mylyn show that (i) some topics are much more defect-prone than others, (ii) defect-prone topics tend to remain so over time, and (iii) defect-prone topics provide additional explanatory power for code quality over existing structural and historical metrics.

Empirical Software Engineering | 2016

A survey on the use of topic models when mining software repositories

Tse-Hsun Chen; Stephen W. Thomas; Ahmed E. Hassan

Researchers in software engineering have attempted to improve software development by mining and analyzing software repositories. Since the majority of the software engineering data is unstructured, researchers have applied Information Retrieval (IR) techniques to help software development. The recent advances of IR, especially statistical topic models, have helped make sense of unstructured data in software repositories even more. However, even though there are hundreds of studies on applying topic models to software repositories, there is no study that shows how the models are used in the software engineering research community, and which software engineering tasks are being supported through topic models. Moreover, since the performance of these topic models is directly related to the model parameters and usage, knowing how researchers use the topic models may also help future studies make optimal use of such models. Thus, we surveyed 167 articles from the software engineering literature that make use of topic models. We find that i) most studies centre around a limited number of software engineering tasks; ii) most studies use only basic topic models; iii) and researchers usually treat topic models as black boxes without fully exploring their underlying assumptions and parameter values. Our paper provides a starting point for new researchers who are interested in using topic models, and may help new researchers and practitioners determine how to best apply topic models to a particular software engineering task.

mining software repositories | 2014

An empirical study of dormant bugs

Tse-Hsun Chen; Meiyappan Nagappan; Emad Shihab; Ahmed E. Hassan

Over the past decade, several research efforts have studied the quality of software systems by looking at post-release bugs. However, these studies do not account for bugs that remain dormant (i.e., introduced in a version of the software system, but are not found until much later) for years and across many versions. Such dormant bugs skew our under- standing of the software quality. In this paper we study dormant bugs against non-dormant bugs using data from 20 different open-source Apache foundation software systems. We find that 33% of the bugs introduced in a version are not reported till much later (i.e., they are reported in future versions as dormant bugs). Moreover, we find that 18.9% of the reported bugs in a version are not even introduced in that version (i.e., they are dormant bugs from prior versions). In short, the use of reported bugs to judge the quality of a specific version might be misleading. Exploring the fix process for dormant bugs, we find that they are fixed faster (median fix time of 5 days) than non- dormant bugs (median fix time of 8 days), and are fixed by more experienced developers (median commit counts of developers who fix dormant bug is 169% higher). Our results highlight that dormant bugs are different from non-dormant bugs in many perspectives and that future research in software quality should carefully study and consider dormant bugs.

foundations of software engineering | 2016

CacheOptimizer: helping developers configure caching frameworks for hibernate-based database-centric web applications

Tse-Hsun Chen; Weiyi Shang; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora

To help improve the performance of database-centric cloud-based web applications, developers usually use caching frameworks to speed up database accesses. Such caching frameworks require extensive knowledge of the application to operate effectively. However, all too often developers have limited knowledge about the intricate details of their own application. Hence, most developers find configuring caching frameworks a challenging and time-consuming task that requires extensive and scattered code changes. Furthermore, developers may also need to frequently change such configurations to accommodate the ever changing workload. In this paper, we propose CacheOptimizer, a lightweight approach that helps developers optimize the configuration of caching frameworks for web applications that are implemented using Hibernate. CacheOptimizer leverages readily-available web logs to create mappings between a workload and database accesses. Given the mappings, CacheOptimizer discovers the optimal cache configuration using coloured Petri nets, and automatically adds the appropriate cache configurations to the application. We evaluate CacheOptimizer on three open-source web applications. We find that i) CacheOptimizer improves the throughput by 27--138%; and ii) after considering both the memory cost and throughput improvement, CacheOptimizer still brings statistically significant gains (with mostly large effect sizes) in comparison to the applications default cache configuration and to blindly enabling all possible caches.

IEEE Transactions on Software Engineering | 2016

Finding and Evaluating the Performance Impact of Redundant Data Access for Applications that are Developed Using Object-Relational Mapping Frameworks

Tse-Hsun Chen; Weiyi Shang; Zhen Ming Jiang; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora

Developers usually leverage Object-Relational Mapping (ORM) to abstract complex database accesses for large-scale systems. However, since ORM frameworks operate at a lower-level (i.e., data access), ORM frameworks do not know how the data will be used when returned from database management systems (DBMSs). Therefore, ORM cannot provide an optimal data retrieval approach for all applications, which may result in accessing redundant data and significantly affect system performance. Although ORM frameworks provide ways to resolve redundant data problems, due to the complexity of modern systems, developers may not be able to locate such problems in the code; hence, may not proactively resolve the problems. In this paper, we propose an automated approach, which we implement as a Java framework, to locate redundant data problems. We apply our framework on one enterprise and two open source systems. We find that redundant data problems exist in 87 percent of the exercised transactions. Due to the large number of detected redundant data problems, we propose an automated approach to assess the impact and prioritize the resolution efforts. Our performance assessment result shows that by resolving the redundant data problems, the system response time for the studied systems can be improved by an average of 17 percent.

mining software repositories | 2016

Studying the effectiveness of application performance management (APM) tools for detecting performance regressions for web applications: an experience report

Tarek M. Ahmed; Cor-Paul Bezemer; Tse-Hsun Chen; Ahmed E. Hassan; Weiyi Shang

Performance regressions, such as a higher CPU utilization than in the previous version of an application, are caused by software application updates that negatively affect the performance of an application.Although a plethora of mining software repository research has been done to detect such regressions, research tools are generally not readily available to practitioners. Application Performance Management (APM) tools are commonly used in practice for detecting performance issues in the field by mining operational data.In contrast to performance regression detection tools that assume a changing code base and a stable workload, APM tools mine operational data to detect performance anomalies caused by a changing workload in an otherwise stable code base.Although APM tools are widely used in practice, no research has been done to understand 1) whether APM tools can identify performance regressions caused by code changes and 2) how well these APM tools support diagnosing the root-cause of these regressions.In this paper, we explore if the readily accessible APM tools can help practitioners detect performance regressions. We perform a case study using three commercial (AppDynamics, New Relic and Dynatrace) and one open source (Pinpoint) APM tools. In particular, we examine the effectiveness of leveraging these APM tools in detecting and diagnosing injected performance regressions (excessive memory usage, high CPU utilization and inefficient database queries) in three open source applications. We find that APM tools can detect most of the injected performance regressions, making them good candidates to detect performance regressions in practice. However, there is a gap between mining approaches that are proposed in state-of-the-art performance regression detection research and the ones used by APM tools. In addition, APM tools lack the ability to be extended, which makes it hard to enhance them when exploring novel mining approaches for detecting performance regressions.

Journal of Systems and Software | 2017

Topic-based software defect explanation

Tse-Hsun Chen; Weiyi Shang; Meiyappan Nagappan; Ahmed E. Hassan; Stephen W. Thomas

Some topics are more defect-prone than others.Defect-prone topics are likely to remain so over time.Our topic-based metrics provide additional defect explanatory to baseline metrics.Our metrics outperform state-of-the-art topic-based cohesion and coupling metrics. Researchers continue to propose metrics using measurable aspects of software systems to understand software quality. However, these metrics largely ignore the functionality, i.e., the conceptual concerns, of software systems. Such concerns are the technical concepts that reflect the systems business logic. For instance, while lines of code may be a good general measure for defects, a large file responsible for simple I/O tasks is likely to have fewer defects than a small file responsible for complicated compiler implementation details. In this paper, we study the effect of concerns on software quality. We use a statistical topic modeling approach to approximate software concerns as topics (related words in source code). We propose various metrics using these topics to help explain the file defect-proneness. Case studies on multiple versions of Firefox, Eclipse, Mylyn, and NetBeans show that (i) some topics are more defect-prone than others; (ii) defect-prone topics tend to remain so over time; (iii) our topic-based metrics provide additional explanatory power for software quality over existing structural and historical metrics; and (iv) our topic-based cohesion metric outperforms state-of-the-art topic-based cohesion and coupling metrics in terms of defect explanatory power, while being simpler to implement and more intuitive to interpret.

international conference on software engineering | 2016

Detecting problems in the database access code of large scale systems: an industrial experience report

Tse-Hsun Chen; Weiyi Shang; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora

Database management systems (DBMSs) are one of the most important components in modern large-scale systems. Thus, it is important for developers to write code that can access DBMS correctly and efficiently. Since the behaviour of database access code can sometimes be a blackbox for developers, writing good test cases to capture problems in database access code can be very difficult. In addition to testing, static bug detection tools are often used to detect problems in the code. However, existing bug detection tools usually fail to detect functional and performance problems in the database access code. In this paper, we document our industrial experience over the past few years on finding bug patterns of database access code, implementing a bug detection tool, and integrating the tool into daily practice. We discuss the challenges that we encountered and the day-to-day lessons that we learned during integrating our tool into the development processes. Since most systems nowadays are leveraging frameworks, we also provide a detailed discussion of five framework-specific database access bug patterns that we found. We hope to encourage further research efforts on framework-specific detectors, instead of the current research focus on general programming language bug patterns and associated detectors.

Empirical Software Engineering | 2018

Understanding the factors for fast answers in technical Q&A websites: An empirical study of four stack exchange websites

Shaowei Wang; Tse-Hsun Chen; Ahmed E. Hassan

Technical questions and answers (Q&A) websites accumulate a significant amount of knowledge from users. Developers are especially active on these Q&A websites, since developers are constantly facing new development challenges that require help from other experts. Over the years, Q&A website designers have derived several incentive systems (e.g., gamification) to encourage users to answer questions that are posted by others. However, the current incentive systems primarily focus on the quantity and quality of the answers instead of encouraging the rapid answering of questions. Improving the speed of getting an answer can significantly improve the user experience and increase user engagement on such Q&A websites. In this paper, we explore how one may improve the current incentive systems to motivate fast answering of questions. We use a logistic regression model to analyze 46 factors along four dimensions (i.e., question, asker, answer, and answerer dimension) in order to understand the relationship between the studied factors and the needed time to get an accepted answer. We conduct our study on the four most popular (i.e., with the most questions) Q&A Stack Exchange websites: Stack Overflow, Mathematics, Ask Ubuntu, and Superuser. We find that i) factors in the answerer dimension have the strongest effect on the needed time to get an accepted answer, after controlling for other factors; ii) the current incentive system does not recognize non-frequent answerers who often answer questions which frequent answerers are not able to answer. Such questions that are answered by non-frequent answerers are as important (i.e., have similar range of scores) as those that are answered by frequent answerers; iii) the current incentive system motivates frequent answerers well, but such frequent answerers tend to answer short questions. Our findings suggest that Q&A website designers should improve their incentive systems to motivate non-frequent answerers to be more active and to answer questions fast, in order to shorten the waiting time to receive an answer (especially for questions that require specific knowledge that frequent answerers might not possess). In addition, the question answering incentive system needs to factor in the value and difficulty of answering the questions (e.g., providing more rewards to harder questions or questions that remain unanswered for a long period of time).

Explore More