Publications


Featured research published by Lutz Prechelt.


Journal of Systems and Software | 1995

Experimental evaluation in computer science: a quantitative study

Walter F. Tichy; Paul Lukowicz; Lutz Prechelt; Ernst A. Heinz

A survey of 400 recent research articles suggests that computer scientists publish relatively few papers with experimentally validated results. The survey includes complete volumes of several refereed computer science journals, a conference, and 50 titles drawn at random from all articles published by ACM in 1993. The journals Optical Engineering (OE) and Neural Computation (NC) were used for comparison. Of the papers in the random sample that would require experimental validation, 40% have none at all. In journals related to software engineering, this fraction is 50%. In comparison, the fraction of papers lacking quantitative evaluation in OE and NC is only 15% and 12%, respectively. Conversely, the fraction of papers that devote one fifth or more of their space to experimental validation is almost 70% for OE and NC, while it is a mere 30% for the computer science (CS) random sample and 20% for software engineering. The low ratio of validated results appears to be a serious weakness in computer science research. This weakness should be rectified for the long-term health of the field.

"The fundamental principle of science, the definition almost, is this: the sole test of the validity of any idea is experiment." —Richard P. Feynman

"Beware of bugs in the above code; I have only proved it correct, not tried it." —Donald E. Knuth


Neural Information Processing Systems | 1998

Early Stopping - But When?

Lutz Prechelt

Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting ("early stopping"). The exact criterion used for validation-based early stopping, however, is usually chosen in an ad hoc fashion, or training is stopped interactively. This trick describes how to select a stopping criterion in a systematic fashion; it is a trick for either speeding learning procedures or improving generalization, whichever is more important in the particular situation. An empirical investigation on multi-layer perceptrons shows that there exists a tradeoff between training time and generalization: from the given mix of 1296 training runs using 12 different problems and 24 different network architectures, I conclude that slower stopping criteria allow for small improvements in generalization (here: about 4% on average), but cost much more training time (here: about a factor of 4 longer on average).
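The criteria studied in the paper are defined on the validation error curve; one of them, the generalization-loss criterion GL_alpha, stops at the first epoch whose validation error exceeds the best value seen so far by more than alpha percent. A minimal sketch in Python, assuming hypothetical train_one_epoch() and validation_error() hooks supplied by the caller:

def train_with_early_stopping(train_one_epoch, validation_error,
                              alpha=5.0, max_epochs=1000):
    # GL_alpha criterion from the paper: stop at the first epoch t where
    # GL(t) = 100 * (E_va(t) / E_opt(t) - 1) > alpha,
    # with E_opt(t) the lowest validation error observed up to epoch t.
    e_opt = float("inf")   # best validation error so far
    best_epoch = 0
    for t in range(1, max_epochs + 1):
        train_one_epoch()              # hypothetical hook: one training pass
        e_va = validation_error()      # hypothetical hook: current E_va
        if e_va < e_opt:
            e_opt, best_epoch = e_va, t
        if 100.0 * (e_va / e_opt - 1.0) > alpha:  # generalization loss in %
            break                       # overfitting signal: stop early
    return best_epoch                   # weights from this epoch are restored

Larger alpha values give the "slower" criteria the abstract describes: they tolerate more generalization loss before stopping and so trade extra training time for a chance at slightly better networks.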


IEEE Transactions on Software Engineering | 2001

A controlled experiment in maintenance: comparing design patterns to simpler solutions

Lutz Prechelt; Barbara Unger; Walter F. Tichy; Peter Brössler; Lawrence G. Votta

Software design patterns package proven solutions to recurring design problems in a form that simplifies reuse. We are seeking empirical evidence on whether using design patterns is beneficial. In particular, one may prefer using a design pattern even if the actual design problem is simpler than that solved by the pattern, i.e., if not all of the functionality offered by the pattern is actually required. Our experiment investigates software maintenance scenarios that employ various design patterns and compares designs with patterns to simpler alternatives. The subjects were professional software engineers. In most of our nine maintenance tasks, we found positive effects from using a design pattern: either its inherent additional flexibility was achieved without requiring more maintenance time, or maintenance time was reduced compared to the simpler alternative. In a few cases, we found negative effects: the alternative solution was less error-prone or required less maintenance time. Overall, we conclude that, unless there is a clear reason to prefer the simpler solution, it is probably wise to choose the flexibility provided by the design pattern because unexpected new requirements often appear. We identify several questions for future empirical research.
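To illustrate the tradeoff the experiment probes (a hypothetical Python example, not one of the actual experiment programs): an Observer-style design costs extra machinery now but absorbs new requirements later, while the simpler hard-wired design must be edited for each new consumer.

# Simpler solution: the sensor knows its one consumer directly.
class SimpleDisplay:
    def show(self, value):
        print(f"reading: {value}")

class SimpleSensor:
    def __init__(self, display):
        self.display = display

    def update(self, value):
        # hard-wired call; a second consumer means editing this class
        self.display.show(value)

# Observer pattern: more code up front, but new consumers plug in later.
class ObservableSensor:
    def __init__(self):
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)

    def update(self, value):
        for notify in self.listeners:  # any number of observers, unknown here
            notify(value)

sensor = ObservableSensor()
sensor.subscribe(lambda v: print(f"display: {v}"))
sensor.subscribe(lambda v: print(f"log: {v}"))  # new requirement, no Sensor change
sensor.update(42)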


IEEE Transactions on Software Engineering | 2002

Two controlled experiments assessing the usefulness of design pattern documentation in program maintenance

Lutz Prechelt; Barbara Unger-Lamprecht; Michael Philippsen; Walter F. Tichy

Using design patterns is claimed to improve programmer productivity and software quality. Such improvements may manifest both at construction time (in faster and better program design) and at maintenance time (in faster and more accurate program comprehension). The paper focuses on the maintenance context and reports on experimental tests of the following question: does it help the maintainer if the design patterns in the program code are documented explicitly (using source code comments) compared to a well-commented program without explicit reference to design patterns? Subjects performed maintenance tasks on two programs ranging from 360 to 560 LOC including comments. The experiments tested whether pattern comment lines (PCL) help during maintenance if patterns are relevant and sufficient program comments are already present. This question is a challenge for the experimental methodology: a setup leading to relevant results is quite difficult to find. We discuss these issues in detail and suggest a general approach to such situations. A conservative analysis of the results supports the hypothesis that pattern-relevant maintenance tasks were completed faster or with fewer errors if redundant design pattern information was provided. The article provides the first controlled experiment results on design pattern usage, and it presents a solution approach to an important class of experiment design problems for experiments regarding documentation.
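A sketch of what a pattern comment line might look like next to ordinary documentation (the Python code and the PCL wording are illustrative, not taken from the experiment materials):

# Design Pattern: Composite. Role: Composite.
# Participants: Graphic (Component), GraphicGroup (Composite).
# The two lines above are "pattern comment lines" (PCL); the docstring
# below is ordinary documentation that would be present either way.
class GraphicGroup:
    """A drawable that holds an ordered list of child drawables."""

    def __init__(self):
        self.children = []

    def add(self, graphic):
        self.children.append(graphic)

    def draw(self):
        for child in self.children:
            child.draw()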


Neural Networks | 1996

A quantitative study of experimental evaluations of neural network learning algorithms: current research practice

Lutz Prechelt

In all, 190 articles about neural network learning algorithms published in 1993 and 1994 are examined for the amount of experimental evaluation they contain. Some 29% of them do not employ even a single realistic or real learning problem. Only 8% of the articles present results for more than one problem using real world data. Furthermore, one third of all articles do not present any quantitative comparison with a previously known algorithm. These results suggest that we should strive for better assessment practices in neural network learning algorithm research. For the long-term benefit of the field, the publication standards should be raised in this respect, and easily accessible collections of benchmark problems should be built.


ACM Transactions on Computer-Human Interaction | 2001

An interface for melody input

Lutz Prechelt; Rainer Typke

We present a software system, called Tuneserver, which recognizes a musical tune whistled by the user, finds it in a database, and returns its name, composer, and other information. Such a service is useful for track retrieval at radio stations, music stores, etc., and is also a step toward the long-term goal of communicating with a computer much like one would with a human being. Tuneserver is implemented as a public Java-based WWW service with a database of approximately 10,000 motifs. Tune recognition is based on a highly error-resistant encoding, proposed by Parsons, that uses only the direction of the melody, ignoring the size of intervals as well as rhythm. We present the design and implementation of the tune recognition core, outline the design of the Web service, and describe the results obtained in an empirical evaluation of the new interface, including the derivation of suitable system parameters, resulting performance figures, and an error analysis.
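The Parsons encoding the abstract refers to keeps only the melodic contour: each note after the first becomes U (up), D (down), or R (repeat). A minimal Python sketch, with plain edit distance standing in for the error-tolerant matching (the actual Tuneserver matching procedure is not shown here, and the one-entry database is hypothetical):

def parsons_code(pitches):
    """Encode a pitch sequence as its contour: U(p), D(own), R(epeat)."""
    code = "*"  # conventional placeholder for the first note
    for prev, cur in zip(pitches, pitches[1:]):
        code += "U" if cur > prev else "D" if cur < prev else "R"
    return code

def edit_distance(a, b):
    """Levenshtein distance, a simple stand-in for error-tolerant matching."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

# "Ode to Joy" opening E E F G G F E D (as MIDI numbers) -> *RUURDDD
query = parsons_code([64, 64, 65, 67, 67, 65, 64, 62])
database = {"*RUURDDD": "Beethoven, Symphony No. 9, Ode to Joy theme"}
best = min(database, key=lambda k: edit_distance(query, k))
print(database[best])

Discarding interval sizes and rhythm is what makes the encoding robust: a whistler who is off-pitch or off-tempo still usually gets the up/down/repeat contour right.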


Neural Networks | 1997

Investigation of the CasCor family of learning algorithms

Lutz Prechelt

Six learning algorithms are investigated and compared empirically. All of them are based on variants of the candidate training idea of the Cascade Correlation method. The comparison was performed using 42 different datasets from the PROBEN1 benchmark collection. The results indicate: (1) for these problems it is slightly better not to cascade the hidden units; (2) error minimization candidate training is better than covariance maximization for regression problems but may be a little worse for classification problems; (3) for most learning tasks, considering validation set errors during the selection of the best candidate will not lead to improved networks, but for a few tasks it will.
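For reference, the two candidate-scoring objectives the abstract compares can be written compactly. A sketch assuming NumPy arrays of candidate-unit outputs and residual network errors over the training patterns (variable names are mine, and the error-minimization form is one plausible reading, not the paper's exact formulation):

import numpy as np

def covariance_score(v, e):
    # Classic Cascade-Correlation candidate objective (Fahlman & Lebiere):
    # S = sum over outputs o of | sum over patterns p of
    #     (v_p - mean(v)) * (e_po - mean_o(e)) |, to be maximized.
    # v: (patterns,) candidate outputs; e: (patterns, outputs) residual errors.
    dv = v - v.mean()
    de = e - e.mean(axis=0)
    return np.abs(dv @ de).sum()

def error_minimization_score(v, e, w):
    # Error-minimization variant: fit the candidate (with tentative
    # candidate-to-output weights w) to cancel the residual error directly,
    # i.e. minimize ||e - outer(v, w)||^2. Negated so that, like the
    # covariance score, bigger is better.
    return -np.sum((e - np.outer(v, w)) ** 2)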


IEEE Transactions on Software Engineering | 2001

An experiment measuring the effects of personal software process (PSP) training

Lutz Prechelt; Barbara Unger

The personal software process is a process improvement methodology aimed at individual software engineers. It claims to improve software quality (in particular defect content), effort estimation capability, and process adaptation and improvement capabilities. We have tested some of these claims in an experiment comparing the performance of participants who had just previously received a PSP course to a different group of participants who had received other technical training instead. Each participant of both groups performed the same task. We found the following positive effects: the PSP group estimated their productivity (though not their effort) more accurately, made fewer trivial mistakes, and their programs performed more careful error-checking; further, the performance variability was smaller in the PSP group in various respects. However, the improvements are smaller than the PSP proponents usually assume, possibly due to the low actual usage of PSP techniques in the PSP group. We conjecture that PSP training alone does not automatically realize the PSP's potential benefits (as seen in some industrial PSP success stories) when programmers are left alone to motivate themselves to actually use the PSP techniques.


Journal of Systems and Software | 2003

A controlled experiment on inheritance depth as a cost factor for code maintenance

Lutz Prechelt; Barbara Unger; Michael Philippsen; Walter F. Tichy

In two controlled experiments we compare the performance on code maintenance tasks for three equivalent programs with 0, 3, and 5 levels of inheritance. For the given tasks, which focus on understanding effort more than change effort, programs with less inheritance were faster to maintain. Daly et al. previously reported similar experiments on the same question with quite different results. They found that the 5-level program tended to be harder to maintain than the 0-level program, while the 3-level program was significantly easier to maintain than the 0-level program. We describe the design and setup of our experiment, the differences from the previous ones, and the results obtained. Our experiments and the previous ones differ in several ways: we used a longer and more complex program, made an inheritance diagram available to the subjects, and added a second kind of maintenance task. When taken together, the previous results plus ours suggest that there is no such thing as an inherent usefulness or harmfulness of a certain inheritance depth. Code maintenance effort is hardly correlated with inheritance depth, but rather depends on other factors (partly related to inheritance depth). Using statistical modeling, we identify the number of relevant methods to be one such factor. We use it to build an explanation model of average code maintenance effort that is much more powerful than a model relying on inheritance depth.
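As an illustration of what "inheritance depth" and "number of relevant methods" mean for a maintainer (a hypothetical Python example, not from the experiment programs): with deeper hierarchies the code for one behavior is spread over several classes, and it is the number of methods one must read, not the depth itself, that drives effort.

# Depth 0: all behavior a maintainer must read sits in one class.
class FlatAccount:
    def __init__(self):
        self.entries = []

    def deposit(self, amount):
        self.entries.append(amount)

    def balance(self):
        return sum(self.entries)

    def statement(self):
        return f"balance: {self.balance()}"

# Depth 2: the same three relevant methods, spread over a hierarchy.
# Understanding statement() now means walking up two superclasses, but the
# reading effort is still driven by those three methods, not by depth alone.
class Ledger:
    def __init__(self):
        self.entries = []

    def balance(self):
        return sum(self.entries)

class Account(Ledger):
    def deposit(self, amount):
        self.entries.append(amount)

class SavingsAccount(Account):
    def statement(self):
        return f"balance: {self.balance()}"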


IEEE Transactions on Software Engineering | 1998

A controlled experiment to assess the benefits of procedure argument type checking

Lutz Prechelt; Walter F. Tichy

Type checking is considered an important mechanism for detecting programming errors, especially interface errors. This report describes an experiment to assess the defect-detection capabilities of static, intermodule type checking. The experiment uses ANSI C and Kernighan & Ritchie (K&R) C. The relevant difference is that the ANSI C compiler checks module interfaces (i.e., the parameter lists of calls to external functions), whereas K&R C does not. The experiment employs a counterbalanced design in which each of the 40 subjects, most of them CS PhD students, writes two nontrivial programs that interface with a complex library (Motif). Each subject writes one program in ANSI C and one in K&R C. The input to each compiler run is saved and manually analyzed for defects. Results indicate that delivered ANSI C programs contain significantly fewer interface defects than delivered K&R C programs. Furthermore, after subjects have gained some familiarity with the interface they are using, ANSI C programmers remove defects faster and are more productive (measured in both delivery time and functionality implemented).
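The experiment concerned C compilers, but the mechanism it isolates, static checking of calls across module boundaries, can be shown by analogy in Python with type hints, where a static checker such as mypy plays the role of the ANSI C compiler and plain unchecked execution plays the role of K&R C (the analogy and the toy function are mine, not from the paper):

# Library side: a typed interface, analogous to an ANSI C function prototype.
def set_label(widget_id: int, text: str) -> None:
    """Toy stand-in for a GUI library call such as those in Motif."""
    print(f"widget {widget_id}: {text}")

# Caller with the arguments swapped: an interface defect.
# Run `mypy` on this file and the call is rejected before execution,
# as with ANSI C; run it as plain Python (the K&R C analogue) and the
# mistake slips through and only shows up, if at all, at runtime.
set_label("close_button", 3)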

Collaboration


Dive into Lutz Prechelt's collaborations.

Top Co-Authors

Barbara Unger (Karlsruhe Institute of Technology)
Walter F. Tichy (Karlsruhe Institute of Technology)
Michael Philippsen (University of Erlangen-Nuremberg)
Franz Zieris (Free University of Berlin)
Ralf H. Reussner (Karlsruhe Institute of Technology)
Ulrich Stärk (Free University of Berlin)