Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Iftekhar Ahmed is active.

Publication


Featured research published by Iftekhar Ahmed.


International Conference on Software Engineering | 2016

On the limits of mutation reduction strategies

Rahul Gopinath; Mohammad Amin Alipour; Iftekhar Ahmed; Carlos Jensen; Alex Groce

Although mutation analysis is considered the best way to evaluate the effectiveness of a test suite, hefty computational cost often limits its use. To address this problem, various mutation reduction strategies have been proposed, all seeking to reduce the number of mutants while maintaining the representativeness of an exhaustive mutation analysis. While research has focused on the reduction achieved, the effectiveness of these strategies in selecting representative mutants, and the limits in doing so, have not been investigated, either theoretically or empirically. We investigate the practical limits to the effectiveness of mutation reduction strategies, and provide a simple theoretical framework for thinking about the absolute limits. Our results show that the limit in improvement of effectiveness over random sampling for real-world open source programs is a mean of only 13.078%. Interestingly, there is no limit to the improvement that can be made by the addition of new mutation operators. Given that this is the maximum that can be achieved with perfect advance knowledge of mutation kills, what can be practically achieved may be much worse. We conclude that more effort should be focused on enhancing mutation operators than on removing operators in the name of selective mutation, a practice of questionable benefit.
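
To make the comparison concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of how a reduction strategy can be scored against full mutation analysis: pick a subset of mutants, greedily build a test suite that kills that subset, and measure how much of the full mutant set the suite kills, compared with a random sample of the same size. The kill matrix, mutant names, and function names are invented for illustration.

```python
import random

def greedy_suite(selected, kills_by_test):
    """Greedily pick tests until every killable mutant in `selected` is killed."""
    remaining = {m for m in selected if any(m in kills for kills in kills_by_test.values())}
    suite = set()
    while remaining:
        best = max(kills_by_test, key=lambda t: len(kills_by_test[t] & remaining))
        suite.add(best)
        remaining -= kills_by_test[best]
    return suite

def utility(suite, all_mutants, kills_by_test):
    """Fraction of ALL mutants killed by the tests chosen for the subset."""
    killed = set().union(*(kills_by_test[t] for t in suite)) if suite else set()
    return len(killed & all_mutants) / len(all_mutants)

# Toy kill matrix (hypothetical): test -> mutants it kills.
kills_by_test = {
    "t1": {"m1", "m2"},
    "t2": {"m2", "m3", "m4"},
    "t3": {"m4"},
    "t4": {"m5"},
}
all_mutants = {"m1", "m2", "m3", "m4", "m5", "m6"}   # m6 stands in for an equivalent mutant

operator_selection = {"m2", "m4"}                     # e.g. chosen by a reduction strategy
random_selection = set(random.sample(sorted(all_mutants), 2))

for name, subset in [("strategy", operator_selection), ("random", random_selection)]:
    suite = greedy_suite(subset, kills_by_test)
    print(f"{name}: suite={sorted(suite)}, "
          f"kills {utility(suite, all_mutants, kills_by_test):.0%} of all mutants")
```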


International Symposium on Software Reliability Engineering | 2015

How hard does mutation analysis have to be, anyway?

Rahul Gopinath; Amin Alipour; Iftekhar Ahmed; Carlos Jensen; Alex Groce

Mutation analysis is considered the best method for measuring the adequacy of test suites. However, the number of test runs required for a full mutation analysis grows faster than project size, which makes it infeasible for real-world software projects that often have more than a million lines of code. It is for projects of this size, however, that developers most need a method for evaluating the efficacy of a test suite. Various strategies have been proposed to deal with the explosion of mutants. However, these strategies at best reduce the number of mutants required to a fraction of overall mutants, which still grows with program size. Running, e.g., 5% of all mutants of a 2 MLOC program usually requires analyzing over 100,000 mutants. Similarly, while various approaches have been proposed to tackle equivalent mutants, none completely eliminate the problem, and the fraction of equivalent mutants remaining is hard to estimate, often requiring manual analysis of equivalence. In this paper, we provide both theoretical analysis and empirical evidence that a small constant sample of mutants yields statistically similar results to running a full mutation analysis, regardless of the size of the program or the similarity between mutants. We show that a similar approach, using a constant sample of inputs, can estimate the degree of stubbornness of the remaining mutants to a high degree of statistical confidence, and we provide a mutation analysis framework for Python that incorporates the analysis of mutant stubbornness.
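
As an illustration of the sampling argument, the sketch below (a hypothetical helper, not the authors' Python framework) estimates the mutation score from a fixed-size random sample and attaches a Wilson confidence interval, so the number of mutants actually executed stays constant regardless of program size. All names and numbers are assumptions for the example.

```python
import math
import random

def estimate_mutation_score(mutants, is_killed, sample_size=1000, z=1.96, seed=0):
    """Estimate the full mutation score from a fixed-size random sample.

    `is_killed(m)` stands in for the expensive step of running the test
    suite against mutant m; only `sample_size` mutants are ever executed.
    Returns the point estimate and a Wilson 95% confidence interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(mutants, min(sample_size, len(mutants)))
    n = len(sample)
    killed = sum(1 for m in sample if is_killed(m))
    p = killed / n
    # Wilson score interval for a binomial proportion.
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, (centre - half, centre + half)

# Hypothetical usage: 100,000 mutants, but only 1,000 are actually analysed.
mutants = list(range(100_000))
truly_killed = set(random.Random(1).sample(mutants, 70_000))   # pretend 70% are killable
score, (low, high) = estimate_mutation_score(mutants, truly_killed.__contains__)
print(f"estimated score {score:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```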


Empirical Software Engineering and Measurement | 2015

An Empirical Study of Design Degradation: How Software Projects Get Worse over Time

Iftekhar Ahmed; Umme Ayda Mannan; Rahul Gopinath; Carlos Jensen

Context: Software decay is a key concern for large, long-lived software projects. Systems degrade over time as design and implementation compromises and exceptions pile up. Goal: Quantify design decay and understand how software projects deal with this issue. Method: We conducted an empirical study on the presence and evolution of code smells, used as an indicator of design degradation in 220 open source projects. Results: The best approach to maintain the quality of a project is to spend time reducing both software defects (bugs) and design issues (refactoring). We found that design issues are frequently ignored in favor of fixing defects. We also found that design issues have a higher chance of being fixed in the early stages of a project, and that efforts to correct these stall as projects mature and the code base grows, leading to a build-up of problems. Conclusions: From studying a large set of open source projects, our research suggests that while core contributors tend to fix design issues more often than non-core contributors, there is no difference once the relative quantity of commits is accounted for. We also show that design issues tend to build up over time.
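
The notion of design degradation studied here can be pictured as code smell density tracked over releases; the following hypothetical sketch (not the authors' analysis pipeline) computes per-release density and its trend, where a positive slope means smells accumulate faster than the code base grows. The data and names are illustrative only.

```python
def smell_density_trend(history):
    """history: list of (release_index, smell_count, kloc) tuples.

    Returns per-release smell density and the least-squares slope of
    density over time; a positive slope indicates design degradation.
    """
    densities = [(i, smells / kloc) for i, smells, kloc in history]
    n = len(densities)
    mean_x = sum(i for i, _ in densities) / n
    mean_y = sum(d for _, d in densities) / n
    slope = (sum((i - mean_x) * (d - mean_y) for i, d in densities)
             / sum((i - mean_x) ** 2 for i, _ in densities))
    return densities, slope

# Hypothetical project history: smell density rises even as the code base grows.
history = [(0, 120, 40.0), (1, 180, 48.0), (2, 260, 55.0), (3, 330, 60.0)]
densities, slope = smell_density_trend(history)
print(densities, f"slope={slope:.3f} smells/KLOC per release")
```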


Software Quality Journal | 2017

Does choice of mutation tool matter?

Rahul Gopinath; Iftekhar Ahmed; Mohammad Amin Alipour; Carlos Jensen; Alex Groce

Though mutation analysis is the primary means of evaluating the quality of test suites, it suffers from inadequate standardization. Mutation analysis tools vary based on language, when mutants are generated (phase of compilation), and target audience. Mutation tools rarely implement the complete set of operators proposed in the literature, and most implement at least a few domain-specific mutation operators. Thus different tools may not always agree on the mutant kills of a test suite. Few criteria exist to guide a practitioner in choosing the right tool, either for evaluating the effectiveness of a test suite or for comparing different testing techniques. We investigate an ensemble of measures for evaluating the efficacy of mutants produced by different tools. These include the traditional difficulty of detection, strength of minimal sets, and the diversity of mutants, as well as the information carried by the mutants produced. We find that mutation tools rarely agree. The disagreement between scores can be large, and the variation due to characteristics of the project, even after accounting for differences due to test suites, is a significant factor. However, the mean difference between tools is very small, indicating that no single tool consistently skews mutation scores high or low for all projects. These results suggest that experiments yielding small differences in mutation score, especially those using a single tool or a small number of projects, may not be reliable. There is a clear need for greater standardization of mutation analysis. We propose one approach for such standardization.
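
A rough sketch of the kind of comparison described above: given mutation scores for the same projects and test suites measured by two tools, report the per-project differences, their mean (systematic bias of one tool against the other), and their spread (project-driven variation). The tool labels, project names, and scores below are hypothetical.

```python
from statistics import mean, stdev

def tool_disagreement(scores_a, scores_b):
    """scores_a/scores_b: {project: mutation score} from two different tools.

    Returns per-project score differences, their mean (systematic bias)
    and standard deviation (project-driven variation).
    """
    projects = sorted(set(scores_a) & set(scores_b))
    diffs = [scores_a[p] - scores_b[p] for p in projects]
    return dict(zip(projects, diffs)), mean(diffs), stdev(diffs)

# Hypothetical scores for the same test suites measured by two tools.
tool_a = {"projA": 0.72, "projB": 0.55, "projC": 0.81}
tool_b = {"projA": 0.64, "projB": 0.61, "projC": 0.70}
diffs, bias, spread = tool_disagreement(tool_a, tool_b)
print(diffs, f"mean diff={bias:+.3f}, sd={spread:.3f}")
```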


International Conference on Software Testing, Verification and Validation Workshops | 2016

Measuring Effectiveness of Mutant Sets

Rahul Gopinath; Amin Alipour; Iftekhar Ahmed; Carlos Jensen; Alex Groce

Redundancy in mutants, where multiple mutants end up producing the same semantic variant of a program, is a major problem in mutation analysis. Hence, a measure of effectiveness that accounts for redundancy is an essential tool for evaluating mutation tools, new operators, and reduction techniques. Previous research suggests using the size of the disjoint mutant set as an effectiveness measure. We start from a simple premise: test suites need to be judged both on the number of unique variations in specifications they detect (a measure of variation) and on how good they are at detecting hard-to-find faults (a measure of thoroughness). Hence, any set of mutants should be judged by how well it supports these measurements. We show that the disjoint mutant set has two major inadequacies, the single variant assumption and the large test suite assumption, when used as a measure of effectiveness in variation. These stem from its reliance on minimal test suites. We show that when used to emulate hard-to-find bugs (as a measure of thoroughness), the disjoint mutant set discards useful mutants. We propose two alternatives: one measures variation and is not vulnerable to either the single variant assumption or the large test suite assumption; the other measures thoroughness. We provide a benchmark of these measures using diverse tools.
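
For reference, the disjoint (dominator) mutant set discussed above can be computed from a kill matrix by keeping only the mutants that are not subsumed by any other killed mutant, i.e. no other mutant's kill set is a proper subset of theirs. A minimal sketch, with an invented kill matrix:

```python
def disjoint_mutant_set(kills):
    """kills: {mutant: set of tests that kill it} (killed mutants only).

    Keeps one representative per 'dominator' kill set: a mutant is kept
    only if no other mutant's kill set is a proper subset of its own.
    Killing these mutants guarantees killing every other mutant listed.
    """
    reps = {}
    for m, ts in kills.items():
        reps.setdefault(frozenset(ts), m)   # one representative per distinct kill set
    dominators = []
    for ts, m in reps.items():
        if not any(other < ts for other in reps if other is not ts):
            dominators.append(m)
    return dominators

# Hypothetical kill matrix.
kills = {
    "m1": {"t1", "t2", "t3"},   # easy: killed by many tests
    "m2": {"t1"},               # hard: only t1 kills it; subsumes m1
    "m3": {"t2", "t3"},
    "m4": {"t1", "t2", "t3"},   # duplicate of m1's kill set
}
print(disjoint_mutant_set(kills))   # -> ['m2', 'm3']
```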


2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems (MOBILESoft) | 2016

Understanding code smells in Android applications

Umme Ayda Mannan; Iftekhar Ahmed; Rana Abdullah M. Almurshed; Danny Dig; Carlos Jensen

Code smells are associated with poor coding practices that cause long-term maintainability problems and mask bugs. Despite mobile being a fast-growing software sector, code smells in mobile applications have been understudied. We do not know how code smells in mobile applications compare to those in desktop applications, or how code smells affect the design of mobile applications. Without such knowledge, application developers, tool builders, and researchers cannot improve the practice and state of the art of mobile development. We first reviewed the literature on code smells in Android applications and found that there is a significant gap between the most studied code smells in the literature and the most frequently occurring code smells in real-world applications. Inspired by this finding, we conducted a large-scale empirical study to compare the type, density, and distribution of code smells in mobile vs. desktop applications. We analyze an open-source corpus of 500 Android applications (a total of 6.7M LOC) and 750 desktop Java applications (a total of 16M LOC), and compare 14,553 instances of code smells in Android applications to 117,557 instances of code smells in desktop applications. We find that, despite mobile applications having a different structure and workflow than desktop applications, the variety and density of code smells are similar. However, the distribution of code smells is different: some code smells occur more frequently in mobile applications. We also found that different categories of Android applications have different code smell distributions. We highlight several implications of our study for application developers, tool builders, and researchers.
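
A toy sketch of the density and distribution comparison used in this kind of study (the corpora, smell names, and sizes below are invented): compute smells per KLOC and the relative frequency of each smell type for each corpus, then compare the two profiles.

```python
from collections import Counter

def smell_profile(instances, kloc):
    """instances: list of smell-type labels found in a corpus; kloc: size in KLOC.

    Returns the overall density (smells per KLOC) and the relative
    distribution of each smell type, so two corpora can be compared
    on both axes.
    """
    counts = Counter(instances)
    total = sum(counts.values())
    density = total / kloc
    distribution = {smell: n / total for smell, n in counts.most_common()}
    return density, distribution

# Hypothetical miniature corpora standing in for the Android and desktop sets.
android = ["LongMethod", "GodClass", "LongMethod", "FeatureEnvy"] * 50
desktop = ["LongMethod", "GodClass", "DataClass", "GodClass"] * 120
print("android:", smell_profile(android, kloc=67.0))
print("desktop:", smell_profile(desktop, kloc=160.0))
```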


Open Source Systems | 2014

An Exploration of Code Quality in FOSS Projects

Iftekhar Ahmed; Soroush Ghorashi; Carlos Jensen

It is a widely held belief that Free/Open Source Software (FOSS) development leads to the creation of software of the same, if not higher, quality than software created using proprietary development models. However, there is little research on evaluating the quality of FOSS code, and on the impact of project characteristics such as age, number of core developers, and code-base size. In this exploratory study, we examined 110 FOSS projects, measuring the quality of the code and architectural design using code smells. We found that, contrary to our expectations, the overall quality of the code is not affected by the size of the code base, but is negatively impacted by growth in the number of code contributors. Our results also show that projects with more core developers don’t necessarily have better code quality.
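
The kind of relationship examined here, between a project characteristic such as contributor count and code quality measured by smell density, can be probed with a simple correlation. A hypothetical sketch with invented per-project data (not the study's actual analysis):

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-project measurements: (contributors, smells per KLOC).
projects = [(3, 2.1), (8, 2.9), (15, 3.4), (25, 4.2), (40, 4.0)]
contributors = [c for c, _ in projects]
densities = [d for _, d in projects]
print(f"r = {pearson(contributors, densities):.2f}")   # positive r: more contributors, more smells
```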


Proceedings of the International Symposium on Open Collaboration | 2014

The Impact of Automatic Crash Reports on Bug Triaging and Development in Mozilla

Iftekhar Ahmed; Nitin Mohan; Carlos Jensen

Free/Open Source Software projects often rely on users submitting bug reports. However, reports submitted by novice users may lack information critical to developers, and the process may be intimidating and difficult. To gather more and better data, projects deploy automatic crash reporting tools, which capture stack traces and memory dumps when a crash occurs. These systems potentially generate large volumes of data, which may overwhelm developers, and their presence may discourage users from submitting traditional bug reports. In this paper, we examine Mozilla's automatic crash reporting system and how it affects their bug triaging process. We find that fewer than 0.00009% of crash reports end up in a bug report, but as many as 2.33% of bug reports have data from crash reports added. Feedback from developers shows that despite some problems, these systems are valuable. We conclude with a discussion of the pros and cons of automatic crash reporting systems.


International Conference on Software Testing, Verification and Validation Workshops | 2017

Applying Mutation Analysis on Kernel Test Suites: An Experience Report

Iftekhar Ahmed; Carlos Jensen; Alex Groce; Paul E. McKenney

Mutation analysis is an established technique for measuring the completeness and quality of a test suite. Despite four decades of research on this technique, its use in large systems is still rare, in part due to computational requirements and high numbers of false positives. We present our experiences using mutation analysis on the Linux kernel's RCU (Read-Copy Update) module, where we adapt existing techniques to constrain the complexity and computation requirements. We show that mutation analysis can be a useful tool, uncovering gaps in even well-tested modules like RCU. This experiment has so far led to the identification of 3 gaps in the RCU test harness and 2 bugs in the RCU module masked by those gaps. We argue that mutation testing can and should be more extensively used in practice.
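
In outline, a mutation run against a module with an existing test harness looks like the loop sketched below: apply a mutant, rebuild, run the tests, classify the mutant, and revert. This is a hypothetical driver, not the RCU workflow itself; the patch paths and the build/test commands are placeholders for whatever pipeline the module under test actually uses.

```python
import subprocess

def run(cmd):
    """Run a shell command; return True on exit status 0."""
    return subprocess.run(cmd, shell=True).returncode == 0

def mutation_run(mutant_patches, build_cmd, test_cmd):
    """Apply each mutant patch in turn, rebuild, run the test harness,
    classify the mutant, and revert the working tree afterwards."""
    results = {}
    for patch in mutant_patches:
        try:
            if not run(f"git apply {patch}"):
                results[patch] = "not-applied"        # patch does not apply
            elif not run(build_cmd):
                results[patch] = "build-failure"      # trivially detected
            elif run(test_cmd):
                results[patch] = "survived"           # possible gap in the test suite
            else:
                results[patch] = "killed"
        finally:
            run(f"git apply -R {patch} || git checkout -- .")   # restore original sources
    return results

# Hypothetical usage; paths and commands are illustrative only.
print(mutation_run(["mutants/0001.patch"], build_cmd="make -j8", test_cmd="./run_tests.sh"))
```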


Automated Software Engineering | 2018

How verified (or tested) is my code? Falsification-driven verification and testing

Alex Groce; Iftekhar Ahmed; Carlos Jensen; Paul E. McKenney; Josie Holmes

Formal verification has advanced to the point that developers can verify the correctness of small, critical modules. Unfortunately, despite considerable efforts, determining if a “verification” verifies what the author intends is still difficult. Previous approaches are difficult to understand and often limited in applicability. Developers need verification coverage in terms of the software they are verifying, not model checking diagnostics. We propose a methodology to allow developers to determine (and correct) what it is that they have verified, and tools to support that methodology. Our basic approach is based on a novel variation of mutation analysis and the idea of verification driven by falsification. We use the CBMC model checker to show that this approach is applicable not only to simple data structures, sorting routines, and the verification of a routine in Mozilla’s JavaScript engine, but also to understanding an ongoing effort to verify the Linux kernel’s read-copy-update mechanism. Moreover, we show that despite the probabilistic nature of random testing and the tendency of testing, as opposed to verification, toward incompleteness, the same techniques, with suitable modifications, apply to automated test generation as well as to formal verification. In essence, it is the number of surviving mutants that drives the scalability of our methods, not the underlying method for detecting faults in a program. From the point of view of a Popperian analysis, where an unkilled mutant is a weakness (in terms of its falsifiability) in a “scientific theory” of program behavior, it is only the number of weaknesses to be examined by a user that is important.
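
The falsification-driven idea can be sketched as: run the verification harness over each mutant and flag the mutants the checker does not reject, since each surviving mutant marks behaviour the properties never constrain. The sketch below is hypothetical, not the paper's tooling; it assumes a harness that is checked by invoking the model checker (for example CBMC) on a single mutated file, and that a zero exit status means every property passed.

```python
import subprocess

def surviving_mutants(mutant_files, checker_cmd="cbmc"):
    """Run the model checker on each mutated source file and keep the
    mutants whose injected fault is NOT reported; these survivors point
    at behaviour the verification harness never constrains."""
    survivors = []
    for path in mutant_files:
        proc = subprocess.run([checker_cmd, path], capture_output=True, text=True)
        if proc.returncode == 0:          # checker found nothing wrong with the mutant
            survivors.append(path)
    return survivors

# Hypothetical usage over pre-generated mutant files.
gaps = surviving_mutants(["mutants/sort_mut_001.c", "mutants/sort_mut_002.c"])
print(f"{len(gaps)} surviving mutant(s) -> potential verification gaps: {gaps}")
```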

Collaboration


Dive into Iftekhar Ahmed's collaborations.

Top Co-Authors

Alex Groce, Oregon State University

Amin Alipour, Oregon State University

Anita Sarma, Oregon State University

Danny Dig, Oregon State University