Mohammad Amin Alipour

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Mohammad Amin Alipour is active.

Explore More

Publication

Featured researches published by Mohammad Amin Alipour.

international symposium on software testing and analysis | 2013

Comparing non-adequate test suites using coverage criteria

Milos Gligoric; Alex Groce; Chaoqiang Zhang; Rohan Sharma; Mohammad Amin Alipour; Darko Marinov

A fundamental question in software testing research is how to compare test suites, often as a means for comparing test-generation techniques. Researchers frequently compare test suites by measuring their coverage. A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the (feasible) requirements is C-adequate. Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given criteria C and C′, are C-adequate suites (on average) more effective than C′-adequate suites? However, in many realistic cases producing adequate suites is impractical or even impossible. We present the first extensive study that evaluates coverage criteria for the common case of non-adequate test suites: given criteria C and C′, which one is better to use to compare test suites? Namely, if suites T1, T2 . . . Tn have coverage values c1, c2 . . . cn for C and c′1, c′2 . . . c′n for C′, is it better to compare suites based on c1, c2 . . . cn or based on c′1, c′ 2 . . . c′n? We evaluate a large set of plausible criteria, including statement and branch coverage, as well as stronger criteria used in recent studies. Two criteria perform best: branch coverage and an intra-procedural acyclic path coverage.

Software Testing, Verification & Reliability | 2016

Cause reduction: delta debugging, even without bugs

Alex Groce; Mohammad Amin Alipour; Chaoqiang Zhang; Yang Chen; John Regehr

What is a test case for? Sometimes, to expose a fault. Tests can also exercise code, use memory or time, or produce desired output. Given a desired effect, a test case can be seen as a cause, and its components divided into essential (required for effect) and accidental. Delta debugging is used for removing accidents from failing test cases, producing smaller test cases that are easier to understand. This paper extends delta debugging by simplifying test cases with respect to arbitrary effects, a generalization called cause reduction. Suites produced by cause reduction provide effective quick tests for real‐world programs. For Mozillas JavaScript engine, the reduced suite is possibly more effective for finding faults. The effectiveness of reduction‐based suites persists through changes to the software, improving coverage by over 500 branches for versions up to 4 months later. Cause reduction has other applications, including improving seeded symbolic execution, where using reduced tests can often double the number of additional branches explored. Copyright

international conference on software engineering | 2016

On the limits of mutation reduction strategies

Rahul Gopinath; Mohammad Amin Alipour; Iftekhar Ahmed; Carlos Jensen; Alex Groce

Although mutation analysis is considered the best way to evaluate the effectiveness of a test suite, hefty computational cost often limits its use. To address this problem, various mutation reduction strategies have been proposed, all seeking to reduce the number of mutants while maintaining the representativeness of an exhaustive mutation analysis. While research has focused on the reduction achieved, the effectiveness of these strategies in selecting representative mutants, and the limits in doing so have not been investigated, either theoretically or empirically. We investigate the practical limits to the effectiveness of mutation reduction strategies, and provide a simple theoretical framework for thinking about the absolute limits. Our results show that the limit in improvement of effectiveness over random sampling for real-world open source programs is a mean of only 13.078%. Interestingly, there is no limit to the improvement that can be made by addition of new mutation operators. Given that this is the maximum that can be achieved with perfect advance knowledge of mutation kills, what can be practically achieved may be much worse. We conclude that more effort should be focused on enhancing mutations than removing operators in the name of selective mutation for questionable benefit.

ACM Transactions on Software Engineering and Methodology | 2015

Guidelines for Coverage-Based Comparisons of Non-Adequate Test Suites

Milos Gligoric; Alex Groce; Chaoqiang Zhang; Rohan Sharma; Mohammad Amin Alipour; Darko Marinov

A fundamental question in software testing research is how to compare test suites, often as a means for comparing test-generation techniques that produce those test suites. Researchers frequently compare test suites by measuring their coverage. A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the feasible requirements is called C-adequate. Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given two criteria C and C′, are C-adequate suites on average more effective than C′-adequate suites? However, in many realistic cases, producing adequate suites is impractical or even impossible. This article presents the first extensive study that evaluates coverage criteria for the common case of non-adequate test suites: given two criteria C and C′, which one is better to use to compare test suites? Namely, if suites T1, T2,…,Tn have coverage values c1, c2,…,cn for C and c1′, c2′,…,cn′ for C′, is it better to compare suites based on c1, c2,…,cn or based on c1′, c2′,…,cn′? We evaluate a large set of plausible criteria, including basic criteria such as statement and branch coverage, as well as stronger criteria used in recent studies, including criteria based on program paths, equivalence classes of covered statements, and predicate states. The criteria are evaluated on a set of Java and C programs with both manually written and automatically generated test suites. The evaluation uses three correlation measures. Based on these experiments, two criteria perform best: branch coverage and an intraprocedural acyclic path coverage. We provide guidelines for testing researchers aiming to evaluate test suites using coverage criteria as well as for other researchers evaluating coverage criteria for research use.

international symposium on software testing and analysis | 2014

Using test case reduction and prioritization to improve symbolic execution

Chaoqiang Zhang; Alex Groce; Mohammad Amin Alipour

Scaling symbolic execution to large programs or programs with complex inputs remains difficult due to path explosion and complex constraints, as well as external method calls. Additionally, creating an effective test structure with symbolic inputs can be difficult. A popular symbolic execution strategy in practice is to perform symbolic execution not “from scratch” but based on existing test cases. This paper proposes that the effectiveness of this approach to symbolic execution can be enhanced by (1) reducing the size of seed test cases and (2) prioritizing seed test cases to maximize exploration efficiency. The proposed test case reduction strategy is based on a recently introduced generalization of delta debugging, and our prioritization techniques include novel methods that, for this purpose, can outperform some traditional regression testing algorithms. We show that applying these methods can significantly improve the effectiveness of symbolic execution based on existing test cases.

international symposium on software testing and analysis | 2014

MuCheck: an extensible tool for mutation testing of haskell programs

Duc Le; Mohammad Amin Alipour; Rahul Gopinath; Alex Groce

This paper presents MuCheck, a mutation testing tool for Haskell programs. MuCheck is a counterpart to the widely used QuickCheck random testing tool for functional programs, and can be used to evaluate the efficacy of QuickCheck property definitions. The tool implements mutation operators that are specifically designed for functional programs, and makes use of the type system of Haskell to achieve a more relevant set of mutants than otherwise possible. Mutation coverage is particularly valuable for functional programs due to highly compact code, referential transparency, and clean semantics; these make augmenting a test suite or specification based on surviving mutants a practical method for improved testing.

sigplan symposium on new ideas new paradigms and reflections on programming and software | 2014

Coverage and Its Discontents

Alex Groce; Mohammad Amin Alipour; Rahul Gopinath

Everyone wants to know one thing about a test suite: will it detect enough bugs? Unfortunately, in most settings that matter, answering this question directly is impractical or impossible. Software engineers and researchers therefore tend to rely on various measures of code coverage (where mutation testing is considered a form of syntactic coverage). A long line of academic research efforts have attempted to determine whether relying on coverage as a substitute for fault detection is a reasonable solution to the problems of test suite evaluation. This essay argues that the profusion of coverage-related literature is in part a sign of an underlying uncertainty as to what exactly it is that measuring coverage should achieve, as well as how we would know if it can, in fact, achieve it. We propose some solutions and mitigations, but the primary focus of this essay is to clarify the state of current confusions regarding this key problem for effective software testing.

Software Quality Journal | 2017

Does choice of mutation tool matter

Rahul Gopinath; Iftekhar Ahmed; Mohammad Amin Alipour; Carlos Jensen; Alex Groce

Though mutation analysis is the primary means of evaluating the quality of test suites, it suffers from inadequate standardization. Mutation analysis tools vary based on language, when mutants are generated (phase of compilation), and target audience. Mutation tools rarely implement the complete set of operators proposed in the literature and mostly implement at least a few domain-specific mutation operators. Thus different tools may not always agree on the mutant kills of a test suite. Few criteria exist to guide a practitioner in choosing the right tool for either evaluating effectiveness of a test suite or for comparing different testing techniques. We investigate an ensemble of measures for evaluating efficacy of mutants produced by different tools. These include the traditional difficulty of detection, strength of minimal sets, and the diversity of mutants, as well as the information carried by the mutants produced. We find that mutation tools rarely agree. The disagreement between scores can be large, and the variation due to characteristics of the project—even after accounting for difference due to test suites—is a significant factor. However, the mean difference between tools is very small, indicating that no single tool consistently skews mutation scores high or low for all projects. These results suggest that experiments yielding small differences in mutation score, especially using a single tool, or a small number of projects may not be reliable. There is a clear need for greater standardization of mutation analysis. We propose one approach for such a standardization.

international workshop on dynamic analysis | 2012

Extended program invariants: applications in testing and fault localization

Mohammad Amin Alipour; Alex Groce

Invariants are powerful tools for program analysis and reasoning.Several tools and techniques have been developed to infer invariants of a program. Given a test suite for a program, an invariant detection tool (IDT) extracts (potential) invariants from the program execution on test cases of the test suite. The resultant invariants contain relations only over variables and constants that are visible to the IDT. IDTs are usually unable to extract invariants about execution features like taken branches, since programs usually do not have state variables for such features. Thus, the IDT has no information about such features in order to infer relations between them. We speculate that invariants about execution features are useful for understanding test suites; we call these invariants, extended invariants. In this paper, we discuss potential applications of extended invariants in understanding of test suites, and fault localization. We illustrate the usefulness of extended invariants with some small examples that use basic block count as the execution feature in extended invariants. We believe extended invariants provide useful information about execution of programs that can be utilized in program analysis and testing.

international symposium on software reliability engineering | 2013

Help, help, i'm being suppressed! The significance of suppressors in software testing

Alex Groce; Chaoqiang Zhang; Mohammad Amin Alipour; Eric Eide; Yang Chen; John Regehr

Test features are basic compositional units used to describe what a test does (and does not) involve. For example, in API-based testing, the most obvious features are function calls; in grammar-based testing, the obvious features are the elements of the grammar. The relationship between features as abstractions of tests and produced behaviors of the tested program is surprisingly poorly understood. This paper shows how large-scale random testing modified to use diverse feature sets can uncover causal relationships between what a test contains and what the program being tested does. We introduce a general notion of observable behaviors as targets, where a target can be a detected fault, an executed branch or statement, or a complex coverage entity such as a state, predicate-valuation, or program path. While it is obvious that targets have triggers - features without which they cannot be hit by a test - the notion of suppressors - features which make a test less likely to hit a target - has received little attention despite having important implications for automated test generation and program understanding. For a set of subjects including C compilers, a flash file system, and JavaScript engines, we show that suppression is both common and important.

Explore More