Phyllis G. Frankl
New York University
Publications
Featured research published by Phyllis G. Frankl.
IEEE Transactions on Software Engineering | 1988
Phyllis G. Frankl; Elaine J. Weyuker
The authors extend the definitions of the previously introduced family of data flow testing criteria to apply to programs written in a large subset of Pascal. They then define a family of adequacy criteria called feasible data flow testing criteria, which are derived from the data flow testing criteria. The feasible data flow testing criteria circumvent the problem of nonapplicability of the data flow testing criteria by requiring the test data to exercise only those definition-use associations which are executable. It is shown that there are significant differences between the relationships among the data flow testing criteria and the relationships among the feasible data flow testing criteria. The authors discuss a generalized notion of the executability of a path through a program unit. A script of a testing session using their data flow testing tool, ASSET, is included.
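To make the notion of a definition-use association concrete, here is a minimal Python sketch (the paper itself treats Pascal programs; the function and test inputs below are hypothetical):

    def scaled_max(a, b, factor):
        result = a          # first definition of 'result'
        if b > a:
            result = b      # second definition, on the true branch
        # The use of 'result' below closes two def-use associations,
        # one from each definition above. All-uses testing requires
        # inputs exercising every executable association, e.g.
        # (a=2, b=1) and (a=1, b=2); feasible data flow testing drops
        # any association that no input can execute.
        return result * factor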
ACM Transactions on Software Engineering and Methodology | 1994
Roong Ko Doong; Phyllis G. Frankl
This article describes a new approach to the unit testing of object-oriented programs, a set of tools based on this approach, and two case studies. In this approach, each test case consists of a tuple of sequences of messages, along with tags indicating whether these sequences should put objects of the class under test into equivalent states and/or return objects that are in equivalent states. Tests are executed by sending the sequences to objects of the class under test, then invoking a user-supplied equivalence-checking mechanism. This approach allows for substantial automation of many aspects of testing, including test case generation, test driver generation, test execution, and test checking. Experimental prototypes of tools for test generation and test execution are described. The test generation tool requires the availability of an algebraic specification of the abstract data type being tested, but the test execution tool can be used when no formal specification is available. Using the test execution tools, case studies involving the execution of tens of thousands of test cases, with various sequence lengths, parameters, and combinations of operations, were performed. The relationships among likelihood of detecting an error and sequence length, range of parameters, and relative frequency of various operations were investigated for priority queue and sorted-list implementations having subtle errors. In each case, long sequences tended to be more likely to detect the error, provided that the range of parameters was sufficiently large, and the likelihood of detecting an error tended to increase up to a threshold value as the parameter range increased.
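A minimal sketch of the approach in Python (the tools themselves targeted object-oriented languages with algebraic specifications; the class and equivalence check below are hypothetical stand-ins for the user-supplied mechanism):

    class PriorityQueue:
        def __init__(self):
            self._items = []
        def insert(self, x):
            self._items.append(x)
            self._items.sort()
        def delete_min(self):
            return self._items.pop(0)
        def equivalent(self, other):
            # user-supplied equivalence-checking mechanism
            return self._items == other._items

    # One test case: two message sequences tagged as state-equivalent.
    p = PriorityQueue(); p.insert(3); p.insert(1); p.delete_min()
    q = PriorityQueue(); q.insert(1); q.insert(3); q.delete_min()
    assert p.equivalent(q)   # both sequences should leave the state [3]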
IEEE Transactions on Software Engineering | 1993
Phyllis G. Frankl; Stewart N. Weiss
An experiment comparing the effectiveness of the all-uses and all-edges test data adequacy criteria is discussed. The experiment was designed to overcome some of the deficiencies of previous software testing experiments. A large number of test sets was randomly generated for each of nine subject programs with subtle errors. For each test set, the percentages of executable edges and definition-use associations covered were measured, and it was determined whether the test set exposed an error. Hypothesis testing was used to investigate whether all-uses adequate test sets are more likely to expose errors than are all-edges adequate test sets. Logistic regression analysis was used to investigate whether the probability that a test set exposes an error increases as the percentage of definition-use associations or edges covered by it increases. Error-exposing ability was shown to be strongly positively correlated to percentage of covered definition-use associations in only four of the nine subjects. Error-exposing ability was also shown to be positively correlated to the percentage of covered edges in four different subjects, but the relationship was weaker.
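The shape of the underlying measurement can be sketched as follows, with a toy stand-in for the instrumented subject programs (run_test_set and its coverage model are hypothetical):

    import random

    def run_test_set(ts):
        # Stand-in for executing a test set against an instrumented
        # subject: returns (fraction of def-use associations covered,
        # whether the error was exposed).
        cov = min(1.0, len(set(ts)) / 50)
        exposed = 13 in ts
        return cov, exposed

    # Generate many random test sets; bucket error-exposure rate by
    # coverage level -- the raw data behind the hypothesis tests and
    # logistic regression described above.
    buckets = {}
    for _ in range(10000):
        ts = [random.randrange(100) for _ in range(random.randrange(1, 40))]
        cov, exposed = run_test_set(ts)
        key = round(cov, 1)
        hits, total = buckets.get(key, (0, 0))
        buckets[key] = (hits + exposed, total + 1)

    for key in sorted(buckets):
        hits, total = buckets[key]
        print(f"coverage ~{key:.1f}: observed P(expose) = {hits/total:.3f}")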
Journal of Systems and Software | 1997
Phyllis G. Frankl; Stewart N. Weiss; Cang Hu
The effectiveness of a test data adequacy criterion for a given program and specification is the probability that a test set satisfying the criterion will expose a fault. Experiments were performed to compare the effectiveness of the mutation testing and all-uses test data adequacy criteria at various coverage levels, for randomly generated test sets. Large numbers of test sets were generated and executed, and for each, the proportion of mutants killed or def-use associations covered was measured. This data was used to estimate and compare the effectiveness of the criteria. The results were mixed: at the highest coverage levels considered, mutation was more effective than all-uses for five of the nine subjects, all-uses was more effective than mutation for two subjects, and there was no clear winner for two subjects. However, mutation testing was much more expensive than all-uses. The relationship between coverage and effectiveness for fixed-sized test sets was also explored and was found to be nonlinear and, in many cases, nonmonotonic.
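For readers unfamiliar with mutation adequacy, a minimal Python sketch (the unit, the mutant, and the tests are hypothetical; mutation tools generate many such mutants automatically):

    def max2(a, b):             # original unit under test
        return a if a >= b else b

    def max2_mutant(a, b):      # mutant: '>=' replaced by '<='
        return a if a <= b else b

    # A test set "kills" a mutant if some test distinguishes it from
    # the original; the mutation score is the proportion of
    # non-equivalent mutants killed.
    tests = [(1, 2), (5, 4)]
    killed = any(max2(a, b) != max2_mutant(a, b) for a, b in tests)
    print("mutant killed:", killed)   # True: (1, 2) tells them apart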
IEEE Transactions on Software Engineering | 1993
Phyllis G. Frankl; Elaine J. Weyuker
Several relationships between software testing criteria, each induced by a relation between the corresponding multisets of subdomains, are examined. The authors discuss whether for each relation R and each pair of criteria, C1 and C2, R(C1, C2) guarantees that C1 is better at detecting faults than C2 according to various probabilistic measures of fault-detecting ability. It is shown that the fact that C1 subsumes C2 does not guarantee that C1 is better at detecting faults. Relations that strengthen the subsumption relation and that have more bearing on fault-detecting ability are introduced.
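One simple way the negative result can arise, with hypothetical numbers: let the domain hold 100 inputs, 8 of which trigger the fault. Suppose C2 requires one test drawn from the whole domain, while C1 requires one test drawn from a 50-input subdomain D1 that contains only one failure-causing input. Every C1-adequate test set is then C2-adequate, so C1 subsumes C2, yet:

    p_c2 = 8 / 100    # one test drawn uniformly from the whole domain: 0.08
    p_c1 = 1 / 50     # one test drawn uniformly from D1: 0.02
    # C1 subsumes C2 but is less likely to detect the fault.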
International Conference on Software Maintenance | 1998
Filippos I. Vokolos; Phyllis G. Frankl
Regression testing is a commonly used activity whose purpose is to determine whether the modifications made to a software system have introduced new faults. Textual differencing is a new, safe, and fairly precise selective regression testing technique that works by comparing source files from the old and new versions of the program. We have implemented textual differencing in a tool called Pythia. Pythia has been developed primarily through the integration of standard, well-known UNIX programs and is capable of analyzing large software systems written in C. We present results from a case study involving a software system of approximately 11,000 lines of source code written for the European Space Agency. The results provide empirical evidence that textual differencing is very fast and capable of achieving substantial reductions in the size of the regression test suite.
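The flavor of the technique, sketched with Python's difflib standing in for the UNIX diff-based pipeline the paper describes (file contents and coverage traces are hypothetical):

    import difflib

    old = ["int add(int a, int b) {", "  return a + b;", "}"]
    new = ["int add(int a, int b) {", "  return a + b + 1;", "}"]

    # 1-based line numbers of the old version touched by the change.
    changed = set()
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if tag != "equal":
            changed.update(range(i1 + 1, i2 + 1))

    # Safely select only tests whose execution traces reach a changed line.
    traces = {"t1": {1, 2, 3}, "t2": {1, 3}}
    selected = [t for t, lines in traces.items() if lines & changed]
    print(selected)   # ['t1']: only t1 executes the modified line 2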
Software Testing, Verification & Reliability | 2004
David Chays; Yuetang Deng; Phyllis G. Frankl; Saikat Dan; Filippos I. Vokolos; Elaine J. Weyuker
Database systems play an important role in nearly every modern organization, yet relatively little research effort has focused on how to test them. This paper discusses issues arising in testing database systems, presents an approach to testing database applications, and describes AGENDA, a set of tools to facilitate the use of this approach. In testing such applications, the state of the database before and after the user's operation plays an important role, along with the user's input and the system output. A framework for testing database applications is introduced. A complete tool set, based on this framework, has been prototyped. The components of this system are a parsing tool that gathers relevant information from the database schema and application, a tool that populates the database with meaningful data that satisfy database constraints, a tool that generates test cases for the application, a tool that checks the resulting database state after operations are performed by a database application, and a tool that assists the tester in checking the database application's output. The design and implementation of each component of the system are discussed. The prototype described here is limited to applications consisting of a single SQL query.
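A toy end-to-end run of the single-query case, using sqlite3 (the schema, data, and expected post-state are hypothetical; AGENDA derives such inputs and checks from the schema and application):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
               "balance INTEGER NOT NULL CHECK (balance >= 0))")

    # Populate the database with meaningful data satisfying the constraints.
    db.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])

    # The application under test: a single SQL query (the user's operation).
    db.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")

    # Check the resulting database state against the expected post-state.
    (balance,) = db.execute("SELECT balance FROM account WHERE id = 1").fetchone()
    assert balance == 70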
Foundations of Software Engineering | 1998
Phyllis G. Frankl; Oleg Iakounenko
This paper reports on an empirical evaluation of the fault-detecting ability of two white-box software testing techniques: decision coverage (branch testing) and the all-uses data flow testing criterion. Each subject program was tested using a very large number of randomly generated test sets. For each test set, the extent to which it satisfied the given testing criterion was measured and it was determined whether or not the test set detected a program fault. These data were used to explore the relationship between the coverage achieved by test sets and the likelihood that they will detect a fault. Previous experiments of this nature have used relatively small subject programs and/or have used programs with seeded faults. In contrast, the subjects used here were eight versions of an antenna configuration program written for the European Space Agency, each consisting of over 10,000 lines of C code. For each of the subject programs studied, the likelihood of detecting a fault increased sharply as very high coverage levels were reached. Thus, these data support the belief that these testing techniques can be more effective than random testing. However, the magnitudes of the increases were rather inconsistent and it was difficult to achieve high coverage levels.
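Decision coverage, the simpler of the two criteria studied, can be measured with lightweight probes; a minimal hand-instrumented Python sketch (the subject function and test set are hypothetical):

    covered = set()

    def classify(x):            # subject with two decisions, four branches
        if x < 0:
            covered.add("b1_true")
            return "negative"
        covered.add("b1_false")
        if x == 0:
            covered.add("b2_true")
            return "zero"
        covered.add("b2_false")
        return "positive"

    for t in (-1, 5):           # this test set never takes the x == 0 branch
        classify(t)
    print(f"decision coverage: {len(covered) / 4:.0%}")   # 75%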
International Symposium on Software Testing and Analysis | 2000
David Chays; Saikat Dan; Phyllis G. Frankl; Filippos I. Vokolos; Elaine J. Weyuker
Database systems play an important role in nearly every modern organization, yet relatively little research effort has focused on how to test them. This paper discusses issues arising in testing database systems and presents an approach to testing database applications. In testing such applications, the state of the database before and after the user's operation plays an important role, along with the user's input and the system output. A tool for populating the database with meaningful data that satisfy database constraints has been prototyped. Its design and its role in a larger database application testing tool set are discussed.
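A minimal sketch of the population step, assuming generators hand-derived from NOT NULL, CHECK, and PRIMARY KEY constraints (the schema and value pools are hypothetical; the prototype gathers this information by parsing the schema):

    import random
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, "
               "age INTEGER CHECK (age BETWEEN 0 AND 119), "
               "name TEXT NOT NULL)")

    generators = {
        "id":   lambda: random.randrange(1, 10**6),      # unique ints for the key
        "age":  lambda: random.randrange(0, 120),        # respects the CHECK range
        "name": lambda: random.choice(["Ada", "Lin"]),   # NOT NULL: draw from a pool
    }

    seen_ids = set()
    for _ in range(100):
        row = {col: gen() for col, gen in generators.items()}
        if row["id"] in seen_ids:                        # enforce PRIMARY KEY
            continue
        seen_ids.add(row["id"])
        db.execute("INSERT INTO person VALUES (:id, :age, :name)", row)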
IEEE Transactions on Software Engineering | 1993
Phyllis G. Frankl; Elaine J. Weyuker
This paper compares the fault-detecting ability of several software test data adequacy criteria. It has previously been shown that if C1 properly covers C2, then C1 is guaranteed to be better at detecting faults than C2, in the following sense: a test suite selected by independent random selection of one test case from each subdomain induced by C1 is at least as likely to detect a fault as a test suite similarly selected using C2. In contrast, if C1 subsumes but does not properly cover C2, this is not necessarily the case. These results are used to compare a number of criteria, including several that have been proposed as stronger alternatives to branch testing. We compare the relative fault-detecting ability of data flow testing, mutation testing, and the condition-coverage techniques, to branch testing, showing that most of the criteria examined are guaranteed to be better than branch testing according to two probabilistic measures. We also show that there are criteria that can sometimes be poorer at detecting faults than substantially less expensive criteria.
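The measure behind these comparisons is easy to compute for small hypothetical cases: draw one test uniformly at random from each subdomain, independently, and take the probability that at least one drawn test is failure-causing:

    def p_detect(subdomains):
        # subdomains: list of (size, failure_count) pairs
        p_miss = 1.0
        for size, failures in subdomains:
            p_miss *= (size - failures) / size
        return 1 - p_miss

    # Hypothetical criteria over the same 50-input domain:
    c1 = [(10, 2), (40, 2)]     # finer partition isolating some failures
    c2 = [(50, 4)]              # a single coarse subdomain
    print(p_detect(c1))         # ~0.24
    print(p_detect(c2))         # ~0.08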