Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Matthew Staats is active.

Publication


Featured research published by Matthew Staats.


International Conference on Software Engineering | 2011

Programs, tests, and oracles: the foundations of testing revisited

Matthew Staats; Michael W. Whalen; Mats Per Erik Heimdahl

In previous decades, researchers explored the formal foundations of program testing. By exploring the foundations of testing largely separately from any specific method of testing, these researchers provided a general discussion of the testing process, including its goals, underlying problems, and limitations. Unfortunately, a common, rigorous foundation has not been widely adopted in empirical software testing research, making it difficult to generalize and compare empirical research. We continue this foundational work, providing a framework intended to serve as a guide for future discussions and empirical studies concerning software testing. Specifically, we extend Gourlay's functional description of testing with the notion of a test oracle, an aspect of testing largely overlooked in previous foundational work and only lightly explored in general. We argue that additional work exploring the interrelationship between programs, tests, and oracles should be performed, and we use our extension to clarify concepts presented in previous work, present new concepts related to test oracles, and demonstrate that oracle selection must be considered when discussing the efficacy of a testing process.
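
The interplay the abstract describes can be made concrete with a minimal sketch. This is only an illustration of the program/test/oracle relationship, not the paper's formalism; the toy program and both oracles below are invented.

```python
# A test verdict depends jointly on the program under test, the test input,
# and the oracle chosen to judge the execution.
from typing import Callable

Program = Callable[[int], int]
Oracle = Callable[[int, int], bool]  # (input, observed output) -> pass?

def run_test(program: Program, test_input: int, oracle: Oracle) -> bool:
    """A test passes iff the oracle accepts the observed execution."""
    return oracle(test_input, program(test_input))

buggy_abs: Program = lambda x: x                 # faulty abs: wrong for x < 0
precise: Oracle = lambda i, o: o == abs(i)       # expected-value oracle
weak: Oracle = lambda i, o: o * o == i * i       # magnitude only: ignores sign

print(run_test(buggy_abs, -3, precise))  # False -> fault revealed
print(run_test(buggy_abs, -3, weak))     # True  -> same program, same test,
                                         #          weaker oracle: fault missed
```

The same test input yields different verdicts under different oracles, which is exactly why the authors argue oracle selection belongs in any discussion of testing efficacy.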


Fundamental Approaches to Software Engineering | 2012

On the danger of coverage directed test case generation

Matthew Staats; Michael W. Whalen; Mats Per Erik Heimdahl

In the avionics domain, the use of structural coverage criteria is legally required in determining test suite adequacy. With the success of automated test generation tools, it is tempting to use these criteria as the basis for test generation. To more firmly establish the effectiveness of such approaches, we have generated and evaluated test suites satisfying two coverage criteria using counterexample-based test generation and a random generation approach, contrasted against purely random test suites of equal size. Our results yield two key conclusions. First, coverage criteria satisfaction alone is a poor indication of test suite effectiveness. Second, the use of structural coverage as a supplement, not a target, for test generation can have a positive impact. These observations point to the dangers inherent in the increasing automation of testing in critical systems and the need for more research into how coverage criteria, generation approach, and system structure jointly influence test effectiveness.
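
A toy illustration of the first conclusion, with an invented function and seeded fault: a suite can satisfy a structural criterion (here, branch coverage) and still miss a fault that equally sized random suites sometimes catch.

```python
import random

def scale(x: int) -> int:
    if x > 0:
        return x * 2 + (x == 7)  # seeded off-by-one fault when x == 7
    return 0

def detects_fault(suite) -> bool:
    # Compare against the intended behavior (x * 2 for positive x, else 0).
    return any(scale(x) != (x * 2 if x > 0 else 0) for x in suite)

branch_adequate = [1, -1]              # covers both branches, misses the fault
print(detects_fault(branch_adequate))  # False

random.seed(0)
hits = sum(detects_fault(random.sample(range(-50, 50), 2)) for _ in range(1000))
print(f"random 2-test suites detecting the fault: {hits}/1000")
```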


International Conference on Software Engineering | 2012

Automated oracle creation support, or: how I learned to stop worrying about fault propagation and love mutation testing

Matthew Staats; Mats Per Erik Heimdahl

In testing, the test oracle is the artifact that determines whether an application under test executes correctly. The choice of test oracle can significantly impact the effectiveness of the testing process. However, despite the prevalence of tools that support the selection of test inputs, little work exists to support oracle creation. In this work, we propose a method of supporting test oracle creation. This method automatically selects the oracle data (the set of variables monitored during testing) for expected-value test oracles. The approach uses mutation analysis to rank variables in terms of fault-finding effectiveness, thus automating the selection of the oracle data. Experiments over four industrial examples demonstrate that our method may be a cost-effective approach for producing small, effective oracle data, with observed improvements in fault finding of up to 145.8% over current industrial best practice.
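
A minimal sketch of the ranking idea as the abstract describes it, with hand-written mutants standing in for systematically generated ones; the function, variables, and tests are invented.

```python
def original(x):
    a = x + 1
    b = a * 2
    c = b - x
    return {"a": a, "b": b, "c": c}

# Three "mutants", each perturbing one operator of the original.
mutants = [
    lambda x: {"a": x - 1, "b": (x - 1) * 2, "c": (x - 1) * 2 - x},  # a: + -> -
    lambda x: {"a": x + 1, "b": (x + 1) * 3, "c": (x + 1) * 3 - x},  # b: *2 -> *3
    lambda x: {"a": x + 1, "b": (x + 1) * 2, "c": (x + 1) * 2 + x},  # c: - -> +
]

tests = [0, 5, -3]
scores = {v: 0 for v in ("a", "b", "c")}
for mutant in mutants:
    for v in scores:
        # A variable scores a point for each mutant its observed value exposes.
        if any(original(t)[v] != mutant(t)[v] for t in tests):
            scores[v] += 1

# The oracle data is the k highest-scoring variables to monitor.
print(sorted(scores, key=scores.get, reverse=True))  # ['c', 'b', 'a']
```

Monitoring `c` alone distinguishes all three mutants here, so a small oracle data set can retain most of the fault-finding power.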


International Conference on Software Engineering | 2013

Observable modified condition/decision coverage

Michael W. Whalen; Dongjiang You; Mats Per Erik Heimdahl; Matthew Staats

In many critical systems domains, test suite adequacy is currently measured using structural coverage metrics over the source code. Of particular interest is the modified condition/decision coverage (MC/DC) criterion required for, e.g., critical avionics systems. In previous investigations we have found that the efficacy of such test suites is highly dependent on the structure of the program under test and the choice of variables monitored by the oracle. MC/DC-adequate tests would frequently exercise faulty code, but the effects of the faults would not propagate to the monitored oracle variables. In this report, we combine the MC/DC coverage metric with a notion of observability that helps ensure that the result of a fault encountered when covering a structural obligation propagates to a monitored variable; we term this new coverage criterion Observable MC/DC (OMC/DC). We hypothesize that this path requirement will make structural coverage metrics (1) more effective at revealing faults, (2) more robust to changes in program structure, and (3) more robust to the choice of variables monitored. We assess the efficacy and sensitivity to program structure of OMC/DC as compared to masking MC/DC using four subsystems from the civil avionics domain and the control logic of a microwave. We found that test suites satisfying OMC/DC are significantly more effective than test suites satisfying MC/DC, revealing up to 88% more faults, and are less sensitive to program structure and the choice of monitored variables.
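
The masking problem that motivates OMC/DC can be shown in a few lines; this is an illustrative sketch, not the paper's definition, and the function is invented.

```python
def step(a: bool, b: bool, sel: bool) -> bool:
    gate = a and b                  # faulty decision: should be `a or b`
    out = gate if sel else False    # when sel is False, `gate` is masked
    return out                      # `out` is the only variable the oracle sees

# The test below evaluates the faulty decision (it is "covered"), but with
# sel == False its effect never propagates to the monitored output.
print(step(True, False, sel=False))  # False with or without the fault

# An OMC/DC-style obligation additionally demands a path on which the
# decision's outcome reaches a monitored variable:
print(step(True, False, sel=True))   # False, but correct code would give True
```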


International Symposium on Software Testing and Analysis | 2012

Understanding user understanding: determining correctness of generated program invariants

Matthew Staats; Shin Hong; Moonzoo Kim; Gregg Rothermel

Recently, work has begun on automating the generation of test oracles, which are necessary to fully automate the testing process. One approach to such automation involves dynamic invariant generation, which extracts invariants from program executions. To use such invariants as test oracles, however, it is necessary to distinguish correct from incorrect invariants, a process that currently requires human intervention. In this work we examine this process. In particular, we examine the ability of 30 users, across two empirical studies, to classify invariants generated from three Java programs. Our results indicate that users struggle to classify generated invariants: on average, they misclassify 9.1% to 31.7% of correct invariants and 26.1% to 58.6% of incorrect invariants. These results contradict prior studies suggesting that classification by users is easy, and indicate that further work is needed to bridge the gap between the effectiveness of dynamic invariant generation in theory and the ability of users to apply it in practice. Along these lines, we suggest several areas for future work.
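
A small example of the judgment the study's participants had to make, in a Daikon-style setting; the program, runs, and candidate invariants below are invented.

```python
def mid(a, b, c):
    """Return the median of three numbers."""
    return sorted([a, b, c])[1]

runs = [(1, 2, 3), (2, 2, 2), (0, 5, 3)]  # executions the tool observed

# Two invariants a dynamic generator might report over these runs:
inv_correct = lambda a, b, c: min(a, b, c) <= mid(a, b, c) <= max(a, b, c)
inv_overfit = lambda a, b, c: mid(a, b, c) >= 0  # an accident of the test data

print(all(inv_correct(*r) for r in runs))  # True, and true of mid in general
print(all(inv_overfit(*r) for r in runs))  # True on the observed runs...
print(inv_overfit(-4, -2, -1))             # ...but False in general: incorrect
```

Both candidates hold on every observed execution, so telling them apart requires exactly the kind of human judgment the study measures.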


Formal Methods | 2008

Partial Translation Verification for Untrusted Code-Generators

Matthew Staats; Mats Per Erik Heimdahl

Within the context of model-based development, the correctness of code generators for modeling notations such as Simulink and Stateflow is of obvious importance. If correctness of code generation can be shown, the extensive and often costly verification and validation activities conducted in the modeling domain could be effectively leveraged in the code domain. Unfortunately, most code generators in use today give no guarantees of correctness. In this paper, we investigate a method of leveraging existing model checking tools to verify the partial correctness of code generated by code generators that offer no guarantees of correctness. We explore the feasibility of this approach through a prototype tool that allows us to verify that Linear Temporal Logic (LTL) safety properties are preserved by C code generators for Simulink models. We find that the approach scales well, allowing us to verify that 55 LTL properties are maintained when generating 12,000+ lines of C code from a large Simulink model.
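
A drastically simplified sketch of the verification idea: check that a safety property holding under the model's semantics also holds under the generated code's semantics. Real Simulink models and LTL checking are far richer; both step functions and the property here are invented.

```python
def model_step(state: int, inp: bool) -> int:
    # "Model" semantics: a saturating counter reset by inp == False.
    return min(state + 1, 3) if inp else 0

def generated_step(state: int, inp: bool) -> int:
    # "Generated code" semantics, written differently by the code generator.
    return state + 1 if inp and state < 3 else (3 if inp else 0)

SAFE = lambda s: 0 <= s <= 3  # the safety property G(0 <= state <= 3)

def verify(step) -> bool:
    """Exhaustively explore reachable states from 0; report if all are safe."""
    seen, frontier = set(), {0}
    while frontier:
        s = frontier.pop()
        if not SAFE(s):
            return False
        seen.add(s)
        frontier |= {step(s, i) for i in (True, False)} - seen
    return True

print(verify(model_step), verify(generated_step))  # True True: property preserved
```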


International Symposium on Software Testing and Analysis | 2014

Dodona: automated oracle data set selection

Pablo Loyola; Matthew Staats; In-Young Ko; Gregg Rothermel

Software complexity has increased the need for automated software testing. Most research on automating testing, however, has focused on creating test input data. While careful selection of input data is necessary to reach faulty states in a system under test, test oracles are needed to actually detect failures. In this work, we describe Dodona, a system that supports the generation of test oracles. Dodona ranks program variables based on the interactions and dependencies observed between them during program execution. Using this ranking, Dodona proposes a set of variables to be monitored that engineers can use to construct assertion-based oracles. Our empirical study of Dodona reveals that it is more effective and efficient than the current state-of-the-art approach for generating oracle data sets, and it can often yield oracles almost as effective as oracles hand-crafted by engineers without support.
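
A rough sketch of the ranking step as the abstract describes it: score variables by a PageRank-style centrality over an observed interaction graph and propose the top-ranked ones. The graph, damping constant, and variable names are invented.

```python
edges = [("total", "price"), ("total", "tax"), ("tax", "price"), ("msg", "total")]
nodes = sorted({n for e in edges for n in e})
out = {n: [b for a, b in edges if a == n] for n in nodes}

rank = {n: 1.0 / len(nodes) for n in nodes}
for _ in range(50):  # damped PageRank iteration (damping factor 0.85)
    new = {n: 0.15 / len(nodes) for n in nodes}
    for n in nodes:
        if out[n]:
            for m in out[n]:
                new[m] += 0.85 * rank[n] / len(out[n])
        else:  # dangling node: distribute its rank evenly
            for m in nodes:
                new[m] += 0.85 * rank[n] / len(nodes)
    rank = new

# The highest-ranked variables are proposed as oracle data.
print(sorted(rank, key=rank.get, reverse=True))  # 'price' ranks first here
```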


International Conference on Software Testing, Verification and Validation | 2013

The Impact of Concurrent Coverage Metrics on Testing Effectiveness

Shin Hong; Matthew Staats; Jaemin Ahn; Moonzoo Kim; Gregg Rothermel

When testing multithreaded programs, the number of possible thread interactions makes exploring all interactions infeasible in practice. In response, researchers have developed concurrent coverage metrics for multithreaded programs. These metrics allow engineers to estimate how well they have exercised concurrent program behavior, just as branch and statement coverage metrics do for sequential program testing. However, unlike sequential coverage metrics, the effectiveness of concurrent coverage metrics in testing remains largely unexamined. In this paper, we explore the relationship between concurrent coverage and fault detection effectiveness by studying the application of eight concurrent coverage metrics in testing nine concurrent programs. Our results show that existing concurrent coverage metrics are often moderate to strong predictors of concurrent testing effectiveness, and are generally reasonable targets for test suite generation. Nevertheless, their relative effectiveness as predictors and test generation targets varies across programs, and thus additional work is needed in this area.
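
One family of metrics the study examines can be sketched as counting which ordered pairs of synchronization sites are exercised across observed interleavings; the traces and site names below are invented, and real metric definitions differ in detail.

```python
from itertools import permutations

# Each trace: the sequence of (thread, lock-site) acquisitions we observed.
traces = [
    [("t1", "L10"), ("t2", "L20")],
    [("t2", "L20"), ("t1", "L10")],
    [("t1", "L10"), ("t2", "L30")],
]

sites = {site for trace in traces for _, site in trace}
requirements = set(permutations(sites, 2))  # all ordered pairs of distinct sites

covered = set()
for trace in traces:
    for (th1, s1), (th2, s2) in zip(trace, trace[1:]):
        if th1 != th2:  # adjacent acquisitions by different threads
            covered.add((s1, s2))

print(f"pair coverage: {len(covered)}/{len(requirements)}")  # 3/6 here
```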


IEEE Transactions on Software Engineering | 2015

The Impact of View Histories on Edit Recommendations

Seonah Lee; Sungwon Kang; Sunghun Kim; Matthew Staats

Recommendation systems are intended to increase developer productivity by recommending files to edit. These systems mine association rules from software revision histories. However, mining coarse-grained rules using only edit histories produces recommendations with low accuracy, and such systems can only produce recommendations after a developer edits a file. In this work, we explore the use of finer-grained association rules, based on the insight that view histories help characterize the contexts of files to edit. To leverage this additional context and these fine-grained association rules, we developed MI, a recommendation system extending ROSE, an existing edit-based recommendation system. We then conducted a comparative simulation of ROSE and MI using the interaction histories stored in the Eclipse Bugzilla system. The simulation demonstrates that MI predicts the files to edit with significantly higher recommendation accuracy than ROSE (about 63 percent versus 35 percent) and makes recommendations earlier, often before developers begin editing. Our results clearly demonstrate the value of considering both views and edits in systems that recommend files to edit, yielding more accurate, earlier, and more flexible recommendations.
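
A schematic sketch of the paper's core move: rules conditioned on views can fire before any edit occurs, whereas an edit-only miner such as ROSE must wait for the first edit. The sessions, file names, and confidence threshold below are invented.

```python
from collections import Counter

# Past interaction sessions: files the developer viewed and ultimately edited.
sessions = [
    {"viewed": {"Parser.java", "Lexer.java"}, "edited": {"Parser.java", "Ast.java"}},
    {"viewed": {"Parser.java"},               "edited": {"Ast.java"}},
    {"viewed": {"Ui.java"},                   "edited": {"Ui.java"}},
]

# Mine view -> edit association rules with their confidence.
pair_counts, view_counts = Counter(), Counter()
for s in sessions:
    for v in s["viewed"]:
        view_counts[v] += 1
        for e in s["edited"]:
            pair_counts[(v, e)] += 1

def recommend(viewed_file: str, min_conf: float = 0.6) -> list:
    return [e for (v, e), n in pair_counts.items()
            if v == viewed_file and n / view_counts[v] >= min_conf]

# Fires as soon as Parser.java is opened, before any edit:
print(recommend("Parser.java"))  # ['Ast.java']
```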


Software Testing, Verification & Reliability | 2015

Are concurrency coverage metrics effective for testing? A comprehensive empirical investigation

Shin Hong; Matthew Staats; Jaemin Ahn; Moonzoo Kim; Gregg Rothermel

Testing multithreaded programs is inherently challenging, as programs can exhibit numerous thread interactions. To help engineers test these programs cost-effectively, researchers have proposed concurrency coverage metrics. These metrics are intended to be used as predictors of testing effectiveness and to provide targets for test generation. The effectiveness of these metrics, however, remains largely unexamined. In this work, we explore the impact of concurrency coverage metrics on testing effectiveness and examine the relationship between coverage, fault detection, and test suite size. We study eight existing concurrency coverage metrics and six new metrics formed by combining complementary metrics. Our results indicate that the metrics are moderate to strong predictors of testing effectiveness and effective at providing test generation targets. Nevertheless, metric effectiveness varies across programs, and even combinations of complementary metrics do not consistently provide effective testing. These results highlight the need for additional work on concurrency coverage metrics.
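
The "combined metrics" construction can be sketched as taking the union of two metrics' requirement sets, so a suite is adequate only when it satisfies both; the requirement encodings here are invented placeholders.

```python
# Metric A: blocked-pair style requirements; metric B: follows-style requirements.
blocked_pairs = {("L10", "L20"), ("L20", "L10")}
follows_pairs = {("unlock@L10", "lock@L20"), ("unlock@L20", "lock@L10")}

# Tag each requirement with its source metric and take the union.
combined = {("A", r) for r in blocked_pairs} | {("B", r) for r in follows_pairs}

def coverage(satisfied: set) -> float:
    return len(satisfied & combined) / len(combined)

suite_hits = {("A", ("L10", "L20")), ("B", ("unlock@L10", "lock@L20"))}
print(f"combined coverage: {coverage(suite_hits):.0%}")  # 50%
```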

Collaboration


Dive into Matthew Staats's collaborations.

Top Co-Authors

Gregg Rothermel

University of Nebraska–Lincoln
