James McCaffrey
Microsoft
Publications
Featured research published by James McCaffrey.
International Conference on Information Technology: New Generations | 2010
James McCaffrey
Pairwise test set generation is the process of producing a subset of all possible test case inputs to a system in situations where exhaustive testing is not possible or is prohibitively expensive. For a given system under test with a set of input parameters, where each parameter can take on one of a discrete set of values, a pairwise test set consists of a collection of vectors which capture all possible combinations of pairs of parameter values. Generating pairwise test sets with a minimal size has been shown to be an NP-complete problem, and several deterministic generation algorithms have been published. This paper describes the results of an investigation of pairwise test set generation using a genetic algorithm. The genetic algorithm approach produced pairwise test sets of comparable or smaller (better) size than published results for deterministic algorithms on 39 of 40 benchmark problems. However, the genetic algorithm test set generation technique required significantly longer processing time in all cases. The results illustrate that generation of pairwise test sets using a genetic algorithm is possible, and suggest that the technique may be both practical and useful in certain software testing situations.
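To make the combinatorial structure behind this abstract concrete, the following sketch enumerates the parameter-value pairs that any pairwise test set must cover and checks a candidate set of test vectors against them. It illustrates the technique only; it is not the paper's implementation, and the parameter domains are hypothetical.

```python
from itertools import combinations, product

# Hypothetical system under test: each parameter has a discrete domain.
domains = [["IE", "Firefox", "Chrome"],   # browser
           ["XP", "Vista", "7"],          # operating system
           ["EN", "DE"]]                  # locale

def required_pairs(domains):
    """All (param_i, value_i, param_j, value_j) tuples a pairwise set must cover."""
    pairs = set()
    for i, j in combinations(range(len(domains)), 2):
        for vi, vj in product(domains[i], domains[j]):
            pairs.add((i, vi, j, vj))
    return pairs

def covered_pairs(test_set):
    """Pairs captured by a list of test vectors (one value per parameter)."""
    covered = set()
    for vector in test_set:
        for i, j in combinations(range(len(vector)), 2):
            covered.add((i, vector[i], j, vector[j]))
    return covered

tests = [("IE", "XP", "EN"), ("Firefox", "Vista", "DE"),
         ("Chrome", "7", "EN")]
missing = required_pairs(domains) - covered_pairs(tests)
print(len(missing), "pairs still uncovered")  # a complete pairwise set leaves 0
```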
Computer Software and Applications Conference | 2009
James McCaffrey
Pairwise testing is a combinatorial technique used to reduce the number of test case inputs to a system in situations where exhaustive testing with all possible inputs is not possible or is prohibitively expensive. Given a set of input parameters, where each parameter can take on one of a discrete set of values, a pairwise test set consists of a collection of vectors which capture all possible combinations of pairs of parameter values. The generation of minimal pairwise test sets has been shown to be an NP-complete problem, and several deterministic algorithms have been published. This paper presents the results of an investigation of generating pairwise test sets using a genetic algorithm. Compared with published results for deterministic pairwise test set generation algorithms, the genetic algorithm approach produced test sets which were comparable or better in terms of size in 39 out of 40 cases. However, the genetic algorithm approach required longer processing time than the deterministic approaches in all cases. The results demonstrate that the generation of pairwise test sets using a genetic algorithm is possible, and suggest that the approach may be practical and useful in certain testing scenarios.
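A genetic algorithm for this problem also needs a fitness measure and variation operators over candidate test sets. One natural encoding, sketched below under the assumption of a fixed test set size and reusing required_pairs and covered_pairs from the previous sketch, scores a chromosome by the number of required pairs it covers; the paper's actual encoding and operators are not specified in the abstract.

```python
import random

# Illustrative GA machinery only; chromosome = list of test vectors.
def random_vector(domains):
    return tuple(random.choice(d) for d in domains)

def random_chromosome(domains, n_tests):
    return [random_vector(domains) for _ in range(n_tests)]

def fitness(chromosome, pairs):
    """Count of required parameter-value pairs covered by the chromosome."""
    return len(covered_pairs(chromosome) & pairs)

def mutate(chromosome, domains, rate=0.1):
    """Re-draw one random parameter value in some of the test vectors."""
    out = []
    for vec in chromosome:
        if random.random() < rate:
            i = random.randrange(len(domains))
            vec = vec[:i] + (random.choice(domains[i]),) + vec[i + 1:]
        out.append(vec)
    return out
```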
Information Reuse and Integration | 2009
James McCaffrey
Pairwise testing is a combinatorial technique used to reduce the number of test case inputs to a system in situations where exhaustive testing with all possible inputs is not feasible. The generation of pairwise test sets with a minimal size is an NP-complete problem, and several deterministic algorithms have been published. This paper presents the results of generating pairwise test sets using a simulated bee colony algorithm. Compared to published results for seven benchmark problems, the simulated bee colony approach produced test sets which were comparable or better in terms of size for all seven problems. However, the simulated bee colony approach required significantly longer generation time than deterministic approaches in all cases. The results demonstrate that the generation of pairwise test sets using a simulated bee colony algorithm is possible, and suggest that the approach may be useful in testing scenarios where pairwise test set data will be reused.
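The bee-colony metaphor maps candidate solutions to food sources that employed, onlooker, and scout bees refine. The skeleton below is a schematic of this family of algorithms, not a reproduction of the paper's specific operators; callers supply problem-specific random_solution, neighbor, and quality functions (for pairwise testing, quality could be pair coverage as in the earlier sketches).

```python
def simulated_bee_colony(random_solution, neighbor, quality,
                         n_sources=20, n_scouts=5, iterations=1000):
    """Generic simulated-bee-colony search loop (illustrative skeleton)."""
    sources = [random_solution() for _ in range(n_sources)]
    best = max(sources, key=quality)
    for _ in range(iterations):
        # Employed bees: one local move per food source, keep improvements.
        sources = [max(s, neighbor(s), key=quality) for s in sources]
        # Onlooker bees: extra local moves biased toward the better sources.
        sources.sort(key=quality, reverse=True)
        for i in range(len(sources) // 2):
            cand = neighbor(sources[i])
            if quality(cand) > quality(sources[i]):
                sources[i] = cand
        # Scout bees: abandon the worst sources for fresh random ones.
        for i in range(1, n_scouts + 1):
            sources[-i] = random_solution()
        best = max(best, max(sources, key=quality), key=quality)
    return best
```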
International Conference on Information Technology: New Generations | 2009
James McCaffrey
The Multi-Attribute Global Inference of Quality (MAGIQ) technique is a simple way to assign a single measure of overall quality to each of a set of similar software systems. Software testing activities can produce a wide range of useful information such as bug counts, performance metrics, and mean time to failure data. However, techniques to aggregate quality and testing metrics into a single quality meta-value are not widely known or used. The MAGIQ technique uses rank order centroids to convert ranked system comparison attributes into normalized numeric weights, and then computes an overall measure of quality as a sum of per-attribute system ratings weighted by those attribute weights. MAGIQ was originally developed to validate the results of analytic hierarchy process (AHP) analyses. Although MAGIQ has not been subjected to extensive research, the technique has proven highly useful in practice.
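The rank order centroid weights MAGIQ relies on have a simple closed form: with N attributes ranked from most to least important, the attribute at rank k receives weight w_k = (1/N) * (1/k + 1/(k+1) + ... + 1/N). A minimal sketch, with hypothetical attribute rankings and ratings:

```python
def roc_weights(n):
    """Rank order centroid weights for n ranked attributes.
    w_k = (1/n) * sum(1/i for i in k..n); the weights sum to 1."""
    return [sum(1.0 / i for i in range(k, n + 1)) / n
            for k in range(1, n + 1)]

def magiq_score(ratings, weights):
    """Overall quality: per-attribute ratings weighted by ROC weights."""
    return sum(r * w for r, w in zip(ratings, weights))

# Hypothetical example: three attributes ranked bug count > performance
# > mean time to failure, ratings already normalized to [0, 1].
w = roc_weights(3)  # approximately [0.6111, 0.2778, 0.1111]
print(magiq_score([0.8, 0.5, 0.9], w))
```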
Information Reuse and Integration | 2011
James McCaffrey
Graph partitioning is a problem of great practical importance. Because graph partitioning is an NP-complete problem, much research attention has focused on developing heuristics which find reasonably good approximations to optimal solutions. This study presents a Simulated Bee Colony (SBC) graph partitioning algorithm which is based on the foraging behavior of honey bees. A computer program which implemented the SBC algorithm was executed against 12 benchmark graph partitioning problems. The SBC algorithm produced partitions with better quality than the best published results for 10 of the 12 benchmark problems. The results suggest that Simulated Bee Colony algorithms are a highly effective technique for partitioning graphs in situations where partition quality is more important than real-time performance, and that SBC graph partitioning may be particularly useful in problem scenarios where the partition result is intended for reuse, such as analyses of large communication graphs.
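Partition quality in this setting is commonly measured by cut size, the number of edges whose endpoints land in different parts; a small sketch of that measure (assumed here for illustration, not necessarily the paper's exact objective):

```python
def cut_size(edges, part_of):
    """Number of edges whose endpoints fall in different parts.
    edges: iterable of (u, v); part_of: dict mapping node -> part id."""
    return sum(1 for u, v in edges if part_of[u] != part_of[v])

# Toy graph split into two equal-size parts.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
part_of = {0: 0, 1: 0, 2: 1, 3: 1}
print(cut_size(edges, part_of))  # (1,2), (3,0) and (0,2) cross -> 3
```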
Rules and Rule Markup Languages for the Semantic Web | 2009
James McCaffrey; Howard Lee Dierking
This study investigates the use of a biologically inspired meta-heuristic algorithm to extract rule sets from clustered categorical data. A computer program which implemented the algorithm was executed against six benchmark data sets and successfully discovered the underlying generation rules in all cases. Compared to existing approaches, the simulated bee colony (SBC) algorithm used in this study has the advantage of allowing full customization of the characteristics of the extracted rule set, and allowing arbitrarily large data sets to be analyzed. The primary disadvantages of the SBC algorithm for rule set extraction are that the approach requires a relatively large number of input parameters, and that the approach does not guarantee convergence to an optimal solution. The results demonstrate that an SBC algorithm for rule set extraction of clustered categorical data is feasible, and suggest that the approach may have the ability to outperform existing algorithms in certain scenarios.
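One plausible encoding for the extracted rules, shown below purely for illustration since the paper's representation is not given in the abstract, is a set of attribute-value conditions implying a cluster label, scored by support and accuracy against the data:

```python
# Hypothetical categorical records, each paired with a known cluster id.
records = [
    ({"color": "red", "size": "small"}, 0),
    ({"color": "red", "size": "large"}, 0),
    ({"color": "blue", "size": "small"}, 1),
]

def rule_stats(conditions, cluster, records):
    """Support: number of records matching the conditions; accuracy:
    fraction of matching records that are in the predicted cluster."""
    matches = [c for rec, c in records
               if all(rec.get(a) == v for a, v in conditions.items())]
    if not matches:
        return 0, 0.0
    return len(matches), matches.count(cluster) / len(matches)

print(rule_stats({"color": "red"}, 0, records))  # (2, 1.0)
```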
Information Reuse and Integration | 2012
James McCaffrey
This paper introduces simulated protozoa optimization (SPO). SPO is a multi-agent heuristic technique that models the foraging and reproductive behavior of unicellular organisms such as Paramecium caudatum. In one set of experiments, SPO-based algorithms were used to solve a set of five standard benchmark numeric minimization problems including the Rastrigin function and the Schwefel function. Compared to the related techniques particle swarm optimization (PSO), bacterial foraging optimization (BFO), and genetic algorithm optimization (GAO), SPO produced better results in terms of both solution accuracy and performance. In a second set of experiments, when used as the weight and bias estimation mechanism for neural network classification, SPO produced better accuracy than PSO, BFO and GAO. An analysis of SPO algorithms indicates that the two most important factors contributing to SPO effectiveness are those that model protozoan fission and conjugation. The results suggest that SPO is a promising new optimization technique that may be particularly applicable to the analysis of very large data sets.
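The Rastrigin benchmark mentioned above has a standard closed form, f(x) = 10n + sum(x_i^2 - 10 cos(2 pi x_i)), with global minimum 0 at the origin. The sketch below implements it together with a hypothetical fission-style operator, loosely modeled on the abstract's description of protozoan fission rather than on the paper's actual algorithm:

```python
import math, random

def rastrigin(x):
    """Standard Rastrigin benchmark: minimum 0 at x = (0, ..., 0)."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi)
                             for xi in x)

def fission(solution, sigma=0.1):
    """Hypothetical fission operator: a fit 'protozoan' splits into two
    slightly perturbed daughter solutions (illustrative only)."""
    return tuple(tuple(xi + random.gauss(0, sigma) for xi in solution)
                 for _ in range(2))

parent = tuple(random.uniform(-5.12, 5.12) for _ in range(5))
d1, d2 = fission(parent)
print(min(rastrigin(s) for s in (parent, d1, d2)))
```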
International Conference on Information Technology: New Generations | 2012
James McCaffrey
This paper presents a case study of the design of a hybrid SQL data storage combined with procedural programming language processing (HSPPL) system for the analysis of large graphs. The HSPPL system was evaluated against a system with SQL data storage combined with SQL language processing (SQL), and against a system with internal memory storage combined with procedural programming language processing (PPL). In one experiment, the three systems were used to perform a shortest path analysis on six test graphs which varied in size and density. The HSPPL system was significantly faster than the SQL system and was able to handle graphs larger than those that could be handled by the PPL system, but the HSPPL system was significantly slower than the PPL system. In a second experiment, the three systems were used to perform graph partitioning on four benchmark problems. The results of the partitioning produced by the three systems were not statistically different. The results suggest that an HSPPL system for analyzing large graphs is feasible and may be particularly useful in situations where a graph under analysis is too large to fit into host machine main memory.
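A minimal sketch of the hybrid idea, assuming a hypothetical edges(src, dst, weight) table in SQLite: the graph stays in SQL storage while Dijkstra's shortest-path algorithm runs in procedural code, pulling only one node's adjacency rows into memory at a time.

```python
import heapq, sqlite3

# Hypothetical schema; a real HSPPL system would use a production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, weight REAL)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                 [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)])

def neighbors(node):
    """Fetch only the adjacency rows for one node from SQL storage."""
    return conn.execute(
        "SELECT dst, weight FROM edges WHERE src = ?", (node,)).fetchall()

def shortest_path_cost(start, goal):
    """Dijkstra over SQL-resident edges; memory holds only the frontier."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nxt, w in neighbors(node):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return float("inf")

print(shortest_path_cost(0, 2))  # 3.0 via node 1, cheaper than direct 5.0
```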
International Conference on Information Technology: New Generations | 2010
James McCaffrey; Adrian Bonar
Functional programming languages, which emphasize a paradigm in which code modules return a value, have traditionally been used primarily in research and academia rather than in commercial software development and testing. This paper presents the results of a case study which investigated the use of a functional programming language for writing software test automation. Several software test engineers with significant procedural programming experience, but minimal functional programming experience, were given a short training class which focused on writing test automation using the F# functional programming language. Survey results suggested that the technical factors (such as immutability and pipelining) associated with test automation written using a functional programming language were less important than the subjective factors (such as the similarities between programming paradigms), and that the use of a functional programming language may provide indirect value to a software testing effort.
International Symposium on Visual Computing | 2009
James McCaffrey
This study investigates the use of a biologically inspired meta-heuristic algorithm to cluster categorical datasets so that the data can be presented in a useful visual form. A computer program which implemented the algorithm was executed against a benchmark dataset of voting records and produced better results, in terms of cluster accuracy, than all known published studies. Compared to alternative clustering and visualization approaches, the categorical dataset clustering with a simulated bee colony (CDC-SBC) algorithm has the advantage of allowing arbitrarily large datasets to be analyzed. The primary disadvantages of the CDC-SBC algorithm for dataset clustering and visualization are that the approach requires a relatively large number of input parameters, and that the approach does not guarantee convergence to an optimal solution. The results of this study suggest that using the CDC-SBC approach for categorical data visualization may be both practical and useful in certain scenarios.
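The abstract does not state the clustering objective, but a common fitness measure for categorical clustering is category utility, which rewards clusters whose attribute-value distributions differ from the overall distribution; a sketch, assumed here for illustration:

```python
from collections import Counter

def category_utility(data, clusters):
    """Category utility for a clustering of categorical records.
    data: list of tuples of attribute values; clusters: parallel list
    of cluster ids. Higher is better. (A common categorical-clustering
    objective, used here for illustration only.)"""
    n = len(data)
    overall = [Counter(rec[i] for rec in data) for i in range(len(data[0]))]
    uncond = sum((c / n) ** 2 for col in overall for c in col.values())
    total = 0.0
    ids = set(clusters)
    for k in ids:
        members = [rec for rec, c in zip(data, clusters) if c == k]
        within = [Counter(rec[i] for rec in members)
                  for i in range(len(data[0]))]
        cond = sum((c / len(members)) ** 2
                   for col in within for c in col.values())
        total += (len(members) / n) * (cond - uncond)
    return total / len(ids)

data = [("y", "n"), ("y", "n"), ("n", "y"), ("n", "y")]
print(category_utility(data, [0, 0, 1, 1]))  # well-separated: 0.5
print(category_utility(data, [0, 1, 0, 1]))  # mixed clusters: 0.0
```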