Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Rahul Gopinath is active.

Publication


Featured research published by Rahul Gopinath.


international conference on software engineering | 2014

Code coverage for suite evaluation by developers

Rahul Gopinath; Carlos Jensen; Alex Groce

One of the key challenges for developers testing code is determining a test suite's quality: its ability to find faults. The most common approach is to use code coverage as a measure of test suite quality, and diminishing returns in coverage or high absolute coverage as a stopping rule. In testing research, suite quality is often evaluated by a suite's ability to kill mutants (artificially seeded potential faults). Determining which criteria best predict mutation kills is critical to practical estimation of test suite quality. Previous work has only used small sets of programs, and usually compares multiple suites for a single program. Practitioners, however, seldom compare suites; they evaluate one suite. Using suites (both manual and automatically generated) from a large set of real-world open-source projects shows that evaluation results differ from those for suite comparison: statement (not block, branch, or path) coverage predicts mutation kills best.
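
As a toy illustration of the kind of comparison involved (not the paper's data or statistical setup), the sketch below rank-correlates several coverage measures against mutation scores across hypothetical projects. The numbers and the use of Kendall's tau are assumptions for the example, and the snippet depends on SciPy.

    # Minimal sketch: given, for one suite per project, several coverage
    # measures and the mutation score, see which measure best predicts
    # kills via Kendall's tau rank correlation. All data is illustrative.
    from scipy.stats import kendalltau

    projects = [
        # stmt, branch, path coverage, and fraction of mutants killed
        (0.81, 0.70, 0.42, 0.63),
        (0.55, 0.48, 0.21, 0.40),
        (0.92, 0.85, 0.60, 0.77),
        (0.33, 0.25, 0.10, 0.22),
        (0.71, 0.66, 0.38, 0.58),
    ]

    kills = [p[3] for p in projects]
    for i, name in enumerate(["statement", "branch", "path"]):
        tau, _ = kendalltau([p[i] for p in projects], kills)
        print(f"{name:9s} coverage vs. mutation score: tau = {tau:.2f}")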


international symposium on software reliability engineering | 2014

Mutations: How Close are they to Real Faults?

Rahul Gopinath; Carlos Jensen; Alex Groce

Mutation analysis is often used to compare the effectiveness of different test suites or testing techniques. One of the main assumptions underlying this technique is the Competent Programmer Hypothesis, which proposes that programs are very close to a correct version, or that the difference between current and correct code for each fault is very small. On the basis of the Competent Programmer Hypothesis, researchers have assumed that the faults produced by mutation analysis are similar to real faults. While some studies provide evidence supporting this assumption, they are based on analysis of a limited and potentially non-representative set of programs and are hence not conclusive. In this paper, we separately investigate the characteristics of bug fixes and other changes in a very large set of randomly selected projects across four different programming languages. Our analysis suggests that a typical fault involves about three to four tokens and is seldom equivalent to any traditional mutation operator. We also identify the most frequently occurring syntactic patterns and the factors that affect the distribution of real bug-fix changes. Our analysis suggests that different languages have different distributions, which in turn suggests that operators optimal in one language may not be optimal for others. Moreover, our results suggest that mutation analysis stands in need of better empirical support for the connection between mutant detection and the detection of actual faults in a larger body of real programs.
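
To make the token counting concrete, here is a minimal sketch (not the study's multi-language pipeline) that measures a fix's size as the number of lexical tokens changed between a buggy and a fixed version, using Python's own tokenizer on an invented one-token bug.

    # Measure fault size as the number of lexical tokens that differ
    # between a buggy and a fixed snippet. Snippets are invented.
    import difflib
    import io
    import tokenize

    def tokens(src):
        """Lexical token strings of a snippet (layout tokens dropped)."""
        skip = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER)
        return [t.string
                for t in tokenize.generate_tokens(io.StringIO(src).readline)
                if t.type not in skip]

    buggy = "def clamp(x, lo, hi):\n    return max(lo, min(hi, x)) if lo < hi else lo\n"
    fixed = "def clamp(x, lo, hi):\n    return max(lo, min(hi, x)) if lo <= hi else lo\n"

    sm = difflib.SequenceMatcher(a=tokens(buggy), b=tokens(fixed))
    changed = sum(max(i2 - i1, j2 - j1)
                  for op, i1, i2, j1, j2 in sm.get_opcodes() if op != "equal")
    print("tokens touched by the fix:", changed)  # here: 1 (`<` -> `<=`)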


international conference on software engineering | 2016

On the limits of mutation reduction strategies

Rahul Gopinath; Mohammad Amin Alipour; Iftekhar Ahmed; Carlos Jensen; Alex Groce

Although mutation analysis is considered the best way to evaluate the effectiveness of a test suite, its hefty computational cost often limits its use. To address this problem, various mutation reduction strategies have been proposed, all seeking to reduce the number of mutants while maintaining the representativeness of an exhaustive mutation analysis. While research has focused on the reduction achieved, the effectiveness of these strategies in selecting representative mutants, and the limits of doing so, have not been investigated, either theoretically or empirically. We investigate the practical limits to the effectiveness of mutation reduction strategies and provide a simple theoretical framework for thinking about the absolute limits. Our results show that, for real-world open-source programs, the limit on improvement in effectiveness over random sampling is a mean of only 13.078%. Interestingly, there is no such limit on the improvement achievable by adding new mutation operators. Given that this bound assumes perfect advance knowledge of mutation kills, what can be achieved in practice may be much worse. We conclude that more effort should be focused on enhancing mutation operators than on removing them in the name of selective mutation, for questionable benefit.
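
A toy sketch of the framing, under invented data: score a mutant subset by the distinct kill profiles it retains, and compare a random sample against a pick made with perfect knowledge of kills. The kill matrix and the scoring function are assumptions for illustration; the paper's utility measure differs.

    # Compare random sampling of mutants to a "perfect knowledge" greedy
    # pick of the same size, on a toy kill matrix. Data is invented.
    import random

    # mutant -> tests that kill it (empty set = equivalent mutant)
    kills = {
        "m1": frozenset({"t1"}),        "m2": frozenset({"t1", "t2"}),
        "m3": frozenset({"t2"}),        "m4": frozenset({"t3"}),
        "m5": frozenset({"t1"}),        "m6": frozenset(),
        "m7": frozenset({"t2", "t3"}),  "m8": frozenset({"t3"}),
    }

    def profiles(sample):
        """Distinct non-empty kill profiles represented in the sample."""
        return {kills[m] for m in sample if kills[m]}

    k = 3
    random.seed(0)
    rand = random.sample(sorted(kills), k)

    # Greedy pick with perfect knowledge: maximize new profiles each step.
    best, seen = [], set()
    for _ in range(k):
        m = max((m for m in kills if m not in best),
                key=lambda m: len(profiles([m]) - seen))
        best.append(m)
        seen |= profiles([m])

    total = len(profiles(kills))
    print("random sample keeps", len(profiles(rand)), "of", total)
    print("perfect pick keeps", len(profiles(best)), "of", total)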


international symposium on software testing and analysis | 2014

MuCheck: an extensible tool for mutation testing of Haskell programs

Duc Le; Mohammad Amin Alipour; Rahul Gopinath; Alex Groce

This paper presents MuCheck, a mutation testing tool for Haskell programs. MuCheck is a counterpart to the widely used QuickCheck random testing tool for functional programs, and can be used to evaluate the efficacy of QuickCheck property definitions. The tool implements mutation operators that are specifically designed for functional programs, and makes use of the type system of Haskell to achieve a more relevant set of mutants than otherwise possible. Mutation coverage is particularly valuable for functional programs due to highly compact code, referential transparency, and clean semantics; these make augmenting a test suite or specification based on surviving mutants a practical method for improved testing.
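
MuCheck itself mutates Haskell source, so the following is only a language-neutral sketch of what a mutation operator does, written in Python for consistency with the other examples here: it rewrites one comparison operator and runs the resulting mutant. The operator choice and the min2 example are invented, not MuCheck's.

    # Apply one mutation operator (`<` -> `>=`) at a chosen site in a
    # Python function and execute the resulting mutant. Illustrative only.
    import ast

    SRC = "def min2(a, b):\n    return a if a < b else b\n"

    class SwapLt(ast.NodeTransformer):
        def __init__(self, target_index):
            self.index = -1
            self.target = target_index

        def visit_Compare(self, node):
            self.generic_visit(node)
            for i, op in enumerate(node.ops):
                if isinstance(op, ast.Lt):
                    self.index += 1
                    if self.index == self.target:
                        node.ops[i] = ast.GtE()  # the mutation
            return node

    tree = SwapLt(0).visit(ast.parse(SRC))  # mutate the single `<` site
    env = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), env)
    print(env["min2"](1, 2))  # mutant returns 2; the original returns 1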


international symposium on software reliability engineering | 2015

How hard does mutation analysis have to be, anyway?

Rahul Gopinath; Amin Alipour; Iftekhar Ahmed; Carlos Jensen; Alex Groce

Mutation analysis is considered the best method for measuring the adequacy of test suites. However, the number of test runs required for a full mutation analysis grows faster than project size, making it infeasible for real-world software projects, which often exceed a million lines of code. Yet it is for projects of this size that developers most need a method for evaluating the efficacy of a test suite. Various strategies have been proposed to deal with the explosion of mutants; at best, however, they reduce the number of mutants required to a fraction of the total, which still grows with program size. Running, e.g., 5% of all mutants of a 2 MLOC program usually requires analyzing over 100,000 mutants. Similarly, while various approaches have been proposed to tackle equivalent mutants, none eliminates the problem completely, and the fraction of equivalent mutants remaining is hard to estimate, often requiring manual analysis of equivalence. In this paper, we provide both theoretical analysis and empirical evidence that a small constant sample of mutants yields statistically similar results to a full mutation analysis, regardless of the size of the program or the similarity between mutants. We show that a similar approach, using a constant sample of inputs, can estimate the degree of stubbornness of the remaining mutants to a high degree of statistical confidence, and we provide a mutation analysis framework for Python that incorporates the analysis of mutant stubbornness.
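
The statistical core of the claim can be sketched in a few lines: with a fixed-size random sample, the confidence interval of the estimated mutation score depends on the sample size alone, not on how many mutants the program has. The simulated kill outcomes below are assumptions for illustration, not measurements.

    # Estimate a mutation score from a constant-size random sample, with a
    # normal-approximation 95% confidence interval. Ground truth simulated.
    import math
    import random

    random.seed(1)
    N_MUTANTS = 200_000            # e.g., a very large project
    TRUE_SCORE = 0.62              # fraction of mutants the suite kills
    mutants = [random.random() < TRUE_SCORE for _ in range(N_MUTANTS)]

    n = 1_000                      # constant sample, whatever N_MUTANTS is
    sample = random.sample(mutants, n)
    p = sum(sample) / n
    half = 1.96 * math.sqrt(p * (1 - p) / n)   # CI half-width: depends on n only
    print(f"estimated score: {p:.3f} +/- {half:.3f} (true: {TRUE_SCORE})")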


empirical software engineering and measurement | 2015

An Empirical Study of Design Degradation: How Software Projects Get Worse over Time

Iftekhar Ahmed; Umme Ayda Mannan; Rahul Gopinath; Carlos Jensen

Context: Software decay is a key concern for large, long-lived software projects. Systems degrade over time as design and implementation compromises and exceptions pile up. Goal: Quantify design decay and understand how software projects deal with this issue. Method: We conducted an empirical study on the presence and evolution of code smells, used as an indicator of design degradation in 220 open source projects. Results: The best approach to maintain the quality of a project is to spend time reducing both software defects (bugs) and design issues (refactoring). We found that design issues are frequently ignored in favor of fixing defects. We also found that design issues have a higher chance of being fixed in the early stages of a project, and that efforts to correct these stall as projects mature and the code base grows, leading to a build-up of problems. Conclusions: From studying a large set of open source projects, our research suggests that while core contributors tend to fix design issues more often than non-core contributors, there is no difference once the relative quantity of commits is accounted for. We also show that design issues tend to build up over time.
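
As a toy stand-in for the measurement method (the study used dedicated smell detectors, not this heuristic), the sketch below counts one simple smell, overly long functions, in a Python source tree; running such a detector at every revision of a repository yields the kind of trend the paper analyzes. The 50-line threshold is an arbitrary assumption.

    # Count one crude code smell (long functions) across a source tree.
    import ast
    import pathlib

    LONG = 50  # lines; arbitrary threshold for the sketch

    def long_functions(root):
        count = 0
        for path in pathlib.Path(root).rglob("*.py"):
            try:
                tree = ast.parse(path.read_text(encoding="utf-8"))
            except (SyntaxError, UnicodeDecodeError):
                continue  # skip files this Python version cannot parse
            for node in ast.walk(tree):
                if isinstance(node, ast.FunctionDef):
                    if node.end_lineno - node.lineno + 1 > LONG:
                        count += 1
        return count

    print("long functions:", long_functions("."))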


sigplan symposium on new ideas new paradigms and reflections on programming and software | 2014

Coverage and Its Discontents

Alex Groce; Mohammad Amin Alipour; Rahul Gopinath

Everyone wants to know one thing about a test suite: will it detect enough bugs? Unfortunately, in most settings that matter, answering this question directly is impractical or impossible. Software engineers and researchers therefore tend to rely on various measures of code coverage (where mutation testing is considered a form of syntactic coverage). A long line of academic research efforts has attempted to determine whether relying on coverage as a substitute for fault detection is a reasonable solution to the problems of test suite evaluation. This essay argues that the profusion of coverage-related literature is in part a sign of an underlying uncertainty as to what exactly it is that measuring coverage should achieve, as well as how we would know if it can, in fact, achieve it. We propose some solutions and mitigations, but the primary focus of this essay is to clarify the current state of confusion regarding this key problem for effective software testing.


Software Quality Journal | 2017

Does choice of mutation tool matter?

Rahul Gopinath; Iftekhar Ahmed; Mohammad Amin Alipour; Carlos Jensen; Alex Groce

Though mutation analysis is the primary means of evaluating the quality of test suites, it suffers from inadequate standardization. Mutation analysis tools vary based on language, on when mutants are generated (the phase of compilation), and on target audience. Mutation tools rarely implement the complete set of operators proposed in the literature, and most implement at least a few domain-specific mutation operators. Thus different tools may not always agree on the mutant kills of a test suite. Few criteria exist to guide a practitioner in choosing the right tool, either for evaluating the effectiveness of a test suite or for comparing different testing techniques. We investigate an ensemble of measures for evaluating the efficacy of mutants produced by different tools: the traditional difficulty of detection, the strength of minimal sets, the diversity of mutants, and the information carried by the mutants produced. We find that mutation tools rarely agree. The disagreement between scores can be large, and the variation due to characteristics of the project, even after accounting for differences due to test suites, is a significant factor. However, the mean difference between tools is very small, indicating that no single tool consistently skews mutation scores high or low for all projects. These results suggest that experiments yielding small differences in mutation score, especially those using a single tool or a small number of projects, may not be reliable. There is a clear need for greater standardization of mutation analysis; we propose one approach for such a standardization.
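
A minimal sketch of the comparison, with invented scores: per-project disagreement between two hypothetical tools can be large even while the mean signed difference stays near zero, which is the pattern the paper reports at scale.

    # Per-project disagreement vs. mean signed difference between two
    # hypothetical mutation tools. Scores are invented for illustration.
    from statistics import mean

    scores = {            # project: (tool A score, tool B score)
        "proj1": (0.71, 0.58),
        "proj2": (0.40, 0.52),
        "proj3": (0.85, 0.80),
        "proj4": (0.33, 0.47),
    }

    diffs = [a - b for a, b in scores.values()]
    print("per-project |difference|:", [round(abs(d), 2) for d in diffs])
    print("mean signed difference:  ", round(mean(diffs), 3))  # near zero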


international conference on software testing verification and validation workshops | 2016

Measuring Effectiveness of Mutant Sets

Rahul Gopinath; Amin Alipour; Iftekhar Ahmed; Carlos Jensen; Alex Groce

Redundancy in mutants, where multiple mutants end up producing the same semantic variant of a program, is a major problem in mutation analysis. Hence, a measure of effectiveness that accounts for redundancy is an essential tool for evaluating mutation tools, new operators, and reduction techniques. Previous research suggests using the size of the disjoint mutant set as an effectiveness measure. We start from a simple premise: test suites need to be judged both on the number of unique variations in specifications they detect (a measure of variation) and on how good they are at detecting hard-to-find faults (a measure of thoroughness). Hence, any set of mutants should be judged by how well it supports these measurements. We show that the disjoint mutant set has two major inadequacies when used as a measure of effectiveness in variation: the single variant assumption and the large test suite assumption. These stem from its reliance on minimal test suites. We show that when used to emulate hard-to-find bugs (as a measure of thoroughness), the disjoint mutant set discards useful mutants. We propose two alternatives: one measures variation and is vulnerable to neither the single variant assumption nor the large test suite assumption; the other measures thoroughness. We provide a benchmark of these measures using diverse tools.
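
A sketch of how a disjoint mutant set can be computed from a kill matrix by subsumption: keep one representative per distinct kill profile and drop mutants whose profile strictly contains another's (any test killing the easier mutant also kills them). The matrix is a toy, and published definitions vary in detail.

    # Compute a disjoint mutant set from a toy kill matrix.
    kills = {                         # mutant -> tests that kill it
        "m1": frozenset({"t1"}),
        "m2": frozenset({"t1", "t2"}),   # killed whenever m1 is killed
        "m3": frozenset({"t2"}),
        "m4": frozenset({"t2", "t3"}),   # killed whenever m3 is killed
        "m5": frozenset({"t3"}),
        "m6": frozenset({"t1"}),         # same profile as m1
    }

    def disjoint(kills):
        # One representative per distinct non-empty kill profile (this also
        # drops equivalent mutants), then keep only profiles that are not
        # strict supersets of some other profile, i.e., not subsumed.
        profiles = {ks: m for m, ks in kills.items() if ks}
        minimal = [ks for ks in profiles
                   if not any(other < ks for other in profiles)]
        return sorted(profiles[ks] for ks in minimal)

    print("disjoint representatives:", disjoint(kills))  # ['m3', 'm5', 'm6']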


automated software engineering | 2016

Evaluating non-adequate test-case reduction

Mohammad Amin Alipour; August Shi; Rahul Gopinath; Darko Marinov; Alex Groce

Given two test cases, one larger and one smaller, the smaller test case is preferred for many purposes. A smaller test case usually runs faster, is easier to understand, and is more convenient for debugging. However, smaller test cases also tend to cover less code and detect fewer faults than larger test cases. Whereas traditional research focused on reducing test suites while preserving code coverage, recent work has introduced the idea of reducing individual test cases, rather than test suites, while still preserving code coverage. Other recent work has proposed non-adequately reducing test suites by not even preserving all the code coverage. This paper empirically evaluates a new combination of these two ideas, non-adequate reduction of test cases, which allows for a wide range of trade-offs between test case size and fault detection. Our study introduces and evaluates C%-coverage reduction (where a test case is reduced to retain at least C% of its original coverage) and N-mutant reduction (where a test case is reduced to kill at least N of the mutants it originally killed). We evaluate the reduction trade-offs with varying values of C% and N for four real-world C projects: Mozilla's SpiderMonkey JavaScript engine, the YAFFS2 flash file system, Grep, and Gzip. The results show that it is possible to greatly reduce the size of many test cases while still preserving much of their fault-detection capability.
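
The acceptance criterion of C%-coverage reduction can be sketched on a toy model: treat a test case as a list of steps, each exercising some coverage blocks, and greedily drop steps as long as at least C% of the original coverage survives. Real reduction runs the instrumented program in the loop; the COVERS map here is an invented stand-in for that.

    # Greedy C%-coverage reduction of a toy "test case". Data is invented.
    COVERS = {            # step -> coverage blocks it exercises
        "a": {1, 2}, "b": {2}, "c": {3}, "d": {2, 3}, "e": {4},
    }

    def coverage(steps):
        return set().union(*(COVERS[s] for s in steps)) if steps else set()

    def reduce_case(steps, c=0.75):
        target = c * len(coverage(steps))   # blocks we must keep covering
        kept = list(steps)
        for s in list(kept):
            trial = [x for x in kept if x != s]
            if len(coverage(trial)) >= target:
                kept = trial                # dropping s is safe; keep the drop
        return kept

    original = ["a", "b", "c", "d", "e"]
    reduced = reduce_case(original, c=0.75)
    print(reduced, coverage(reduced))   # smaller case, >= 75% of blocks kept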

Collaboration


Dive into Rahul Gopinath's collaborations.

Top Co-Authors

Alex Groce, Oregon State University
Amin Alipour, Oregon State University
Duc Le, Oregon State University
Martin Erwig, Oregon State University