Shin Yoo
KAIST
Publication
Featured research published by Shin Yoo.
Software Testing, Verification & Reliability | 2012
Shin Yoo; Mark Harman
Regression testing is a testing activity that is performed to provide confidence that changes do not harm the existing behaviour of the software. Test suites tend to grow in size as software evolves, often making it too costly to execute entire test suites. A number of different approaches have been studied to maximize the value of the accrued test suite: minimization, selection and prioritization. Test suite minimization seeks to eliminate redundant test cases in order to reduce the number of tests to run. Test case selection seeks to identify the test cases that are relevant to some set of recent changes. Test case prioritization seeks to order test cases in such a way that early fault detection is maximized. This paper surveys minimization, selection and prioritization techniques and discusses open problems and potential directions for future research.
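As a concrete illustration of the minimization idea surveyed above, the following is a minimal sketch of the classic greedy heuristic often used for test suite minimization: repeatedly pick the test that covers the most not-yet-covered requirements. The test names and coverage data are invented for illustration.

```python
def greedy_minimise(coverage):
    """coverage: dict mapping test name -> set of covered requirements."""
    remaining = set().union(*coverage.values())  # everything still to be covered
    selected = []
    while remaining:
        # choose the test that adds the most new coverage
        best = max(coverage, key=lambda t: len(coverage[t] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break  # no remaining test adds coverage
        selected.append(best)
        remaining -= gained
    return selected

if __name__ == "__main__":
    suite = {
        "t1": {"b1", "b2"},
        "t2": {"b2", "b3", "b4"},
        "t3": {"b4"},
    }
    print(greedy_minimise(suite))  # ['t2', 't1']
```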
International Symposium on Software Testing and Analysis | 2007
Shin Yoo; Mark Harman
Previous work has treated test case selection as a single objective optimisation problem. This paper introduces the concept of Pareto efficiency to test case selection. The Pareto efficient approach takes multiple objectives such as code coverage, past fault-detection history and execution cost, and constructs a group of non-dominated, equivalently optimal test case subsets. The paper describes the potential benefits of Pareto efficient multi-objective test case selection, illustrating with empirical studies of two and three objective formulations.
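To make the notion of Pareto efficiency concrete, here is a small sketch of non-dominated filtering over two objectives (coverage to maximise, execution cost to minimise). The candidate subsets and their objective values are invented.

```python
def dominates(a, b):
    """a dominates b if it is no worse on both objectives and strictly better on one.
    Each point is (coverage, cost); coverage is maximised, cost is minimised."""
    cov_a, cost_a = a
    cov_b, cost_b = b
    return (cov_a >= cov_b and cost_a <= cost_b) and (cov_a > cov_b or cost_a < cost_b)

def pareto_front(points):
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

if __name__ == "__main__":
    candidates = [(0.9, 120), (0.8, 60), (0.9, 150), (0.5, 10)]
    print(pareto_front(candidates))  # the non-dominated (coverage, cost) trade-offs
```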
IEEE Transactions on Software Engineering | 2015
Earl T. Barr; Mark Harman; Phil McMinn; Muzammil Shahbaz; Shin Yoo
Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behaviour is called the “test oracle problem”. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain specific information that provide informal oracle guidance. All forms of test oracles, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice.
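Since the abstract mentions metamorphic testing as one route to oracle automation, here is a minimal illustration of a metamorphic relation acting as a partial oracle: instead of knowing the expected output of sin(x), we check a relation that must hold between related executions. The function under test (math.sin) and the tolerance are assumptions made for the example.

```python
import math
import random

def metamorphic_check(f, x, tol=1e-9):
    # Relation for the sine function: f(x) should equal f(pi - x).
    return abs(f(x) - f(math.pi - x)) <= tol

if __name__ == "__main__":
    violations = [x for x in (random.uniform(-10, 10) for _ in range(1000))
                  if not metamorphic_check(math.sin, x)]
    print(f"{len(violations)} violations of the relation sin(x) = sin(pi - x)")
```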
International Symposium on Software Testing and Analysis | 2009
Shin Yoo; Mark Harman; Paolo Tonella; Angelo Susi
Pair-wise comparison has been successfully utilised in order to prioritise test cases by exploiting the rich, valuable and unique knowledge of the tester. However, the prohibitively large cost of the pair-wise comparison method prevents it from being applied to large test suites. In this paper, we introduce a cluster-based test case prioritisation technique. By clustering test cases, based on their dynamic runtime behaviour, we can reduce the required number of pair-wise comparisons significantly. The approach is evaluated on seven test suites ranging in size from 154 to 1,061 test cases. We present an empirical study that shows that the resulting prioritisation is more effective than existing coverage-based prioritisation techniques in terms of rate of fault detection. Perhaps surprisingly, the paper also demonstrates that clustering (even without human input) can outperform unclustered coverage-based techniques, and discusses an automated process that can be used to determine whether the application of the proposed approach would yield improvement.
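A rough sketch of the clustering idea: group tests by the similarity of their coverage profiles, then prioritise by interleaving clusters so that dissimilar behaviour is exercised early. The simple leader clustering, distance threshold and coverage data below are illustrative stand-ins for the paper's actual clustering and spectra.

```python
from itertools import zip_longest

def jaccard_distance(a, b):
    union = a | b
    return 1.0 if not union else 1.0 - len(a & b) / len(union)

def cluster(tests, threshold=0.5):
    """tests: dict name -> set of covered statements. Greedy leader clustering."""
    clusters = []  # list of (representative coverage, [test names])
    for name, cov in tests.items():
        for rep, members in clusters:
            if jaccard_distance(rep, cov) <= threshold:
                members.append(name)
                break
        else:
            clusters.append((cov, [name]))
    return [members for _, members in clusters]

def interleave(clusters):
    order = []
    for batch in zip_longest(*clusters):
        order.extend(t for t in batch if t is not None)
    return order

if __name__ == "__main__":
    tests = {"t1": {"s1", "s2"}, "t2": {"s1", "s2", "s3"},
             "t3": {"s7", "s8"}, "t4": {"s8"}}
    print(interleave(cluster(tests)))  # ['t1', 't3', 't2', 't4']
```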
Journal of Systems and Software | 2010
Shin Yoo; Mark Harman
Test suite minimisation techniques seek to reduce the effort required for regression testing by selecting a subset of the test suite. In previous work, the problem has been considered as a single-objective optimisation problem. However, real world regression testing can be a complex process in which multiple testing criteria and constraints are involved. This paper presents the concept of Pareto efficiency for the test suite minimisation problem. The Pareto-efficient approach is inherently capable of dealing with multiple objectives, providing the decision maker with a group of solutions that are not dominated by each other. The paper illustrates the benefits of Pareto efficient multi-objective test suite minimisation with empirical studies of two and three objective formulations, in which multiple objectives such as coverage and past fault-detection history are considered. The paper utilises a hybrid, multi-objective genetic algorithm that combines the efficient approximation of the greedy approach with the capability of a population-based genetic algorithm to produce higher-quality Pareto fronts.
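A loose sketch of the hybrid idea: seed the initial population of a multi-objective GA with the subset found by a greedy heuristic, plus mutated and random variants, and let the evolutionary search refine the Pareto front. The representation (one boolean per test) and all parameters are assumptions; selection, crossover and non-dominated sorting are omitted here.

```python
import random

def seed_population(tests, greedy_subset, size=20, flip_prob=0.1):
    """tests: ordered list of test names; greedy_subset: set chosen by a greedy pass."""
    population = [[t in greedy_subset for t in tests]]  # the exact greedy solution
    while len(population) < size:
        if len(population) % 2:
            # a lightly mutated copy of the greedy solution
            individual = [(t in greedy_subset) != (random.random() < flip_prob)
                          for t in tests]
        else:
            # a purely random subset to keep diversity in the population
            individual = [random.random() < 0.5 for _ in tests]
        population.append(individual)
    return population

if __name__ == "__main__":
    tests = ["t1", "t2", "t3", "t4"]
    for individual in seed_population(tests, greedy_subset={"t2", "t4"}, size=6):
        print(individual)
```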
Formal Methods | 2013
Shin Yoo; Mark Harman; David Clark
Test case prioritization techniques seek to maximize early fault detection. Fault localization seeks to use test cases already executed to help find the fault location. There is a natural interplay between the two techniques; once a fault is detected, we often switch focus to fault fixing, for which localization may be a first step. In this article we introduce the Fault Localization Prioritization (FLP) problem, which combines prioritization and localization. We evaluate three techniques: a novel FLP technique based on information theory, FLINT (Fault Localization using INformation Theory), that we introduce in this article, a standard Test Case Prioritization (TCP) technique, and a “test similarity technique” used in previous work. Our evaluation uses five different releases of four software systems. The results indicate that FLP and TCP can statistically significantly reduce fault localization costs for 73% and 76% of cases, respectively, and that FLINT significantly outperforms similarity-based localization techniques in 52% of the cases considered in the study.
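A minimal sketch of the information-theoretic view behind FLINT: treat normalised suspiciousness scores as an approximate probability distribution over which statement contains the fault, and measure localisation uncertainty as Shannon entropy; tests expected to lower this entropy are preferred. The suspiciousness values below are invented.

```python
import math

def locality_entropy(suspiciousness):
    """suspiciousness: dict statement -> non-negative suspiciousness score."""
    total = sum(suspiciousness.values())
    if total == 0:
        return 0.0
    probs = [s / total for s in suspiciousness.values() if s > 0]
    return -sum(p * math.log2(p) for p in probs)

if __name__ == "__main__":
    before = {"s1": 0.2, "s2": 0.2, "s3": 0.2, "s4": 0.2}      # highly uncertain
    after = {"s1": 0.05, "s2": 0.85, "s3": 0.05, "s4": 0.05}   # much more certain
    print(locality_entropy(before), ">", locality_entropy(after))
```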
International Conference on Software Testing, Verification and Validation | 2014
Seokhyeon Moon; Yunho Kim; Moonzoo Kim; Shin Yoo
We present MUSE (MUtation-baSEd fault localization technique), a new fault localization technique based on mutation analysis. A key idea of MUSE is to identify a faulty statement by utilizing the different characteristics of two groups of mutants: one that mutates a faulty statement and the other that mutates a correct statement. We also propose a new evaluation metric for fault localization techniques based on information theory, called Locality Information Loss (LIL): it can measure the aptitude of a localization technique for automated fault repair systems as well as for human debuggers. The empirical evaluation using 14 faulty versions of five real-world programs shows that MUSE localizes a fault after reviewing 7.4 statements on average, which is about 25 times more precise than the state-of-the-art SBFL technique Op2.
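A rough sketch of MUSE's core intuition: mutants of the faulty statement tend to turn failing tests into passing ones, whereas mutants of correct statements mostly turn passing tests into failing ones. The scoring below is a simplified stand-in for the paper's exact metric, and the mutant results are invented.

```python
def muse_like_score(mutant_results, n_fail, n_pass, alpha=1.0):
    """mutant_results: list of (fail_to_pass, pass_to_fail) counts,
    one pair per mutant of the statement being scored."""
    if not mutant_results or n_fail == 0 or n_pass == 0:
        return 0.0
    score = 0.0
    for f2p, p2f in mutant_results:
        score += f2p / n_fail - alpha * (p2f / n_pass)
    return score / len(mutant_results)

if __name__ == "__main__":
    # Statement A (likely faulty): its mutants make some failing tests pass.
    # Statement B (likely correct): its mutants only break passing tests.
    print(muse_like_score([(3, 1), (2, 0)], n_fail=5, n_pass=20))
    print(muse_like_score([(0, 6), (0, 2)], n_fail=5, n_pass=20))
```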
Symposium on Search Based Software Engineering | 2012
Shin Yoo
Spectra-Based Fault Localisation (SBFL) aims to assist debugging by applying risk evaluation formulae (sometimes called suspiciousness metrics) to program spectra and ranking statements according to the predicted risk. Designing a risk evaluation formula is often an intuitive process performed by a human software engineer. This paper presents a Genetic Programming (GP) approach for evolving risk assessment formulae. The empirical evaluation using 92 faults from four Unix utilities produces promising results. Equations evolved by Genetic Programming can consistently outperform many of the human-designed formulae, such as Tarantula, Ochiai, Jaccard, Ample, and Wong1/2, by up to six times. More importantly, they can perform equally as well as Op2, which was recently proved to be optimal against the If-Then-Else-2 (ITE2) structure, or even outperform it against other program structures.
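For reference, here are three of the risk evaluation formulae named above, computed from the usual program-spectrum counts: ef and ep are the failing and passing tests that execute a statement, nf and np those that do not. GP in the paper evolves formulae over these same four counts; the example numbers below are made up.

```python
import math

def tarantula(ef, ep, nf, np):
    fail_ratio = ef / (ef + nf) if ef + nf else 0.0
    pass_ratio = ep / (ep + np) if ep + np else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0

def ochiai(ef, ep, nf, np):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def op2(ef, ep, nf, np):
    return ef - ep / (ep + np + 1)

if __name__ == "__main__":
    # A statement executed by 4 of 5 failing tests and 2 of 20 passing tests.
    print(tarantula(4, 2, 1, 18), ochiai(4, 2, 1, 18), op2(4, 2, 1, 18))
```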
Genetic and Evolutionary Computation Conference | 2009
Mark Harman; Jens Krinke; Jian Ren; Shin Yoo
Software engineering is plagued by problems associated with unreliable cost estimates. This paper introduces an approach to sensitivity analysis for requirements engineering. It uses Search-Based Software Engineering to aid the decision maker in exploring the sensitivity of the cost estimates of requirements for the Next Release Problem (NRP). The paper presents both single- and multi-objective formulations of NRP with empirical sensitivity analysis on synthetic and real-world data. The results show a strong correlation between the level of inaccuracy and the impact on the selection of requirements, as well as between the cost of requirements and the impact, as intuitively expected. However, there are also a few sensitive exceptions to these trends; the paper uses a heat-map style visualisation to reveal these exceptions, which require careful consideration. The paper also shows that such unusual sensitivity patterns occur in real-world data and how the proposed approach clearly identifies them.
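A toy illustration of the sensitivity question studied above: perturb requirement cost estimates and observe how often the chosen requirement set changes under a simple budget-constrained, value-per-cost greedy selection. The requirements, costs, values, budget and noise level are all invented, and the greedy selection merely stands in for the NRP solvers used in the paper.

```python
import random

def select(costs, values, budget):
    # Greedy selection by value-per-cost ratio under a cost budget.
    order = sorted(costs, key=lambda r: values[r] / costs[r], reverse=True)
    chosen, spent = set(), 0.0
    for r in order:
        if spent + costs[r] <= budget:
            chosen.add(r)
            spent += costs[r]
    return frozenset(chosen)

def sensitivity(costs, values, budget, noise=0.2, trials=1000):
    baseline = select(costs, values, budget)
    changed = 0
    for _ in range(trials):
        noisy = {r: c * random.uniform(1 - noise, 1 + noise) for r, c in costs.items()}
        if select(noisy, values, budget) != baseline:
            changed += 1
    return changed / trials  # fraction of perturbations that alter the selection

if __name__ == "__main__":
    costs = {"r1": 5, "r2": 8, "r3": 3, "r4": 6}
    values = {"r1": 9, "r2": 10, "r3": 4, "r4": 7}
    print(sensitivity(costs, values, budget=15))
```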
Software Testing, Verification & Reliability | 2012
Shin Yoo; Mark Harman
Existing automated test data generation techniques tend to start from scratch, implicitly assuming that no pre‐existing test data are available. However, this assumption may not always hold, and where it does not, there may be a missed opportunity; perhaps the pre‐existing test cases could be used to assist the automated generation of additional test cases. This paper introduces search‐based test data regeneration, a technique that can generate additional test data from existing test data using a meta‐heuristic search algorithm. The proposed technique is compared to a widely studied test data generation approach in terms of both efficiency and effectiveness. The empirical evaluation shows that test data regeneration can be up to 2 orders of magnitude more efficient than existing test data generation techniques, while achieving comparable effectiveness in terms of structural coverage and mutation score.
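A small sketch of the regeneration idea: rather than searching from scratch, start from an existing test input and apply simple modification operators, keeping variants that reach not-yet-covered branches. The program under test, its branch instrumentation and the neighbourhood operators are all illustrative assumptions.

```python
import random

def program_under_test(x, y):
    # A stand-in instrumented program that reports which branches it reached.
    branches = set()
    if x > 10:
        branches.add("b1")
        if y < 0:
            branches.add("b2")
    else:
        branches.add("b3")
    return branches

def neighbours(inputs, step=5):
    # Simple modification operators: small random shifts of each input value.
    x, y = inputs
    return [(x + random.randint(-step, step), y + random.randint(-step, step))
            for _ in range(10)]

def regenerate(seed, budget=200):
    covered = program_under_test(*seed)
    tests = [seed]
    for _ in range(budget):
        base = random.choice(tests)
        for cand in neighbours(base):
            gained = program_under_test(*cand) - covered
            if gained:  # keep only variants that reach new branches
                tests.append(cand)
                covered |= gained
    return tests, covered

if __name__ == "__main__":
    print(regenerate((12, 3)))  # the existing test input (12, 3) is the seed
```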