Deborah G. Mayo
Virginia Tech
Publications
Featured research published by Deborah G. Mayo.
The British Journal for the Philosophy of Science | 2006
Deborah G. Mayo; Aris Spanos
Despite the widespread use of key concepts of the Neyman–Pearson (N–P) statistical paradigm—type I and II errors, significance levels, power, confidence levels—they have been the subject of philosophical controversy and debate for over 60 years. Both current and long-standing problems of N–P tests stem from unclarity and confusion, even among N–P adherents, as to how a test's (pre-data) error probabilities are to be used for (post-data) inductive inference as opposed to inductive behavior. We argue that the relevance of error probabilities is to ensure that only statistical hypotheses that have passed severe or probative tests are inferred from the data. The severity criterion supplies a meta-statistical principle for evaluating proposed statistical inferences, avoiding classic fallacies from tests that are overly sensitive, as well as those not sensitive enough to particular errors and discrepancies.

Contents:
1. Introduction and overview
  1.1 Behavioristic and inferential rationales for Neyman–Pearson (N–P) tests
  1.2 Severity rationale: induction as severe testing
  1.3 Severity as a meta-statistical concept: three required restrictions on the N–P paradigm
2. Error statistical tests from the severity perspective
  2.1 N–P test T(α): type I, II error probabilities and power
  2.2 Specifying test T(α) using p-values
3. Neyman's post-data use of power
  3.1 Neyman: does failure to reject H warrant confirming H?
4. Severe testing as a basic concept for an adequate post-data inference
  4.1 The severity interpretation of acceptance (SIA) for test T(α)
  4.2 The fallacy of acceptance (i.e., an insignificant difference): Ms Rosy
  4.3 Severity and power
5. Fallacy of rejection: statistical vs. substantive significance
  5.1 Taking a rejection of H0 as evidence for a substantive claim or theory
  5.2 A statistically significant difference from H0 may fail to indicate a substantively important magnitude
  5.3 Principle for the severity interpretation of a rejection (SIR)
  5.4 Comparing significant results with different sample sizes in T(α): the large-n problem
  5.5 General testing rules for T(α), using the severe testing concept
6. The severe testing concept and confidence intervals
  6.1 Dualities between one- and two-sided intervals and tests
  6.2 Avoiding shortcomings of confidence intervals
7. Beyond the N–P paradigm: pure significance and misspecification tests
8. Concluding comments: have we shown severity to be a basic concept in an N–P philosophy of induction?
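To make the severity idea concrete, here is a minimal sketch (not the authors' code) of the post-data severity calculation for the one-sided Normal test T(α) named in the contents: H0: μ ≤ μ0 vs H1: μ > μ0 with σ treated as known. The function names, the observed mean 0.21, and the other numbers are illustrative assumptions, not values from the paper.

from scipy.stats import norm

def severity_reject(xbar, mu1, sigma, n):
    """SEV(mu > mu1) after rejecting H0: the probability of a result
    less extreme than the one observed, were mu only as large as mu1."""
    se = sigma / n ** 0.5
    return norm.cdf((xbar - mu1) / se)

def severity_accept(xbar, mu1, sigma, n):
    """SEV(mu <= mu1) after a non-rejection: the probability of a result
    more extreme than the one observed, were mu as large as mu1."""
    se = sigma / n ** 0.5
    return 1 - norm.cdf((xbar - mu1) / se)

# Illustrative rejection: mu0 = 0, sigma = 1, n = 100, observed mean 0.21
# (z = 2.1, so H0 is rejected at alpha = 0.025).
for mu1 in (0.0, 0.1, 0.2):
    print("SEV(mu > %.1f) = %.3f" % (mu1, severity_reject(0.21, mu1, 1.0, 100)))

# Illustrative non-rejection: observed mean 0.1 (z = 1.0, H0 not rejected).
print("SEV(mu <= 0.2) = %.3f" % severity_accept(0.1, 0.2, 1.0, 100))

On these made-up numbers the rejection passes "μ > 0" with high severity but "μ > 0.2" with only middling severity, which is the sort of post-data discrimination the severity interpretation is meant to supply.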
Philosophy of Science | 2004
Deborah G. Mayo; Aris Spanos
The growing availability of computer power and statistical software has greatly increased the ease with which practitioners apply statistical methods, but this has not been accompanied by comparable attention to checking the assumptions on which these methods depend. At the same time, disagreements about inferences based on statistical research frequently revolve around whether the assumptions are actually met in the studies available, e.g., in psychology, ecology, biology, and risk assessment. Philosophical scrutiny can help disentangle 'practical' problems of model validation, and conversely, a methodology of statistical model validation can shed light on a number of issues of interest to philosophers of science.
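The kind of assumption checking at issue can be illustrated with a small sketch (not drawn from the paper): fit a linear regression and probe the normality, independence, and functional-form assumptions that underwrite the usual inferences. The simulated data and the particular diagnostic checks are assumptions chosen for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data standing in for a real study (illustrative only).
x = rng.uniform(0, 10, 200)
y = 1.5 + 0.8 * x + rng.normal(0, 1, 200)

# Fit y = b0 + b1*x by least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
fitted = X @ beta

# (1) Normality of the errors: Shapiro-Wilk test on the residuals.
print("normality p-value:", stats.shapiro(resid).pvalue)

# (2) Independence: lag-1 autocorrelation of residuals vs a rough 2/sqrt(n) band.
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print("lag-1 autocorrelation:", r1, "rough band:", 2 / np.sqrt(len(resid)))

# (3) Functional form: do squared fitted values help explain the residuals
#     (a crude RESET-style check)?
print("RESET-style p-value:", stats.linregress(fitted ** 2, resid).pvalue)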
Philosophy of Science | 1987
Deborah G. Mayo; Norman L. Gilinsky
The key problem in the controversy over group selection is that of defining a criterion of group selection that identifies a distinct causal process that is irreducible to the causal process of individual selection. We aim to clarify this problem and to formulate an adequate model of irreducible group selection. We distinguish two types of group selection models, labeling them type I and type II models. Type I models are invoked to explain differences among groups in their respective rates of production of contained individuals. Type II models are invoked to explain differences among groups in their respective rates of production of distinct new groups. Taking Elliott Sober's model as an exemplar, we argue that although type I models have some biological importance--they force biologists to consider the role of group properties in influencing the fitness of organisms--they fail to identify a distinct group-level causal selection process. Type II models, however, if properly framed, do identify a group-level causal selection process that is not reducible to individual selection. We propose such a type II model and apply it to some of the major candidates for group selection.
Archive | 2009
Deborah G. Mayo; Aris Spanos
Part I. Introduction and Background:
1. Philosophy of methodological practice (Deborah Mayo)
2. Error statistical philosophy (Deborah Mayo and Aris Spanos)
Part II:
3. Severe testing, error statistics, and the growth of theoretical knowledge (Deborah Mayo)
Part III:
4. Can scientific theories be warranted? (Alan Chalmers)
5. Can scientific theories be warranted with severity? Exchanges with Alan Chalmers (Deborah Mayo)
Part IV:
6. Critical rationalism, explanation and severe tests (Alan Musgrave)
7. Towards progressive critical rationalism: exchanges with Alan Musgrave (Deborah Mayo)
Part V:
8. Error, tests and theory-confirmation (John Worrall)
9. Has Worrall saved his theory (on ad hoc saves) in a non-ad hoc manner? Exchanges with Worrall (Deborah Mayo)
Part VI:
10. Mill's sins, or Mayo's errors? (Peter Achinstein)
11. Sins of the Bayesian epistemologist: exchanges with Achinstein (Deborah Mayo)
Part VII:
12. Theory testing in economics and the error statistical perspective (Aris Spanos)
Part VIII:
13. Frequentist statistics as a theory of inductive inference (Deborah Mayo and David Cox)
14. Objectivity and conditionality in Frequentist inference (David Cox and Deborah Mayo)
15. An error in the argument from WCP and S to the SLP (Deborah Mayo)
16. On a new philosophy of Frequentist inference: exchanges with Cox and Mayo (Aris Spanos)
Part IX:
17. Explanation and truth (Clark Glymour)
18. Explanation and testing: exchanges with Glymour (Deborah Mayo)
19. Graphical causal modeling and error statistics: exchanges with Glymour (Aris Spanos)
Part X:
20. Legal epistemology: the anomaly of affirmative defenses (Larry Laudan)
21. Error and the law: exchanges with Laudan (Deborah Mayo)
Archive | 2001
Deborah G. Mayo; Michael Kruse
What do data tell us about hypotheses or claims? When do data provide good evidence for or a good test of a hypothesis? These are key questions for a philosophical account of evidence and inference, and in answering them, philosophers of science have often appealed to formal accounts of probabilistic and statistical inference. Once they do so, it is clear that the answer depends on the principles of inference embodied in one or another statistical account. If inference is by way of Bayes' theorem, then two data sets license different inferences only by registering differently in the Bayesian algorithm. If inference is by way of error statistical methods (e.g., Neyman and Pearson methods), as are commonly used in applications of statistics in science, then two data sets license different inferences or hypotheses if they register differences in the error-probabilistic properties of the methods.
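A small numerical sketch of this contrast, using a standard textbook-style example rather than anything from the paper: with the same observed counts, Bayes' theorem yields the same posterior regardless of the stopping rule, whereas error-statistical quantities such as p-values register the two designs' different error-probabilistic properties. The Beta(1, 1) prior and the 9-heads/3-tails data are illustrative assumptions.

from scipy.stats import binom, nbinom, beta

# Same data: 9 heads, 3 tails; assessing theta = P(heads) = 0.5 against theta > 0.5.
heads, tails = 9, 3

# Bayesian route: with a Beta(1, 1) prior the posterior depends on the data only
# through the likelihood, which is proportional to theta^9 (1-theta)^3 under
# either design, so both designs give the same Beta(10, 4) posterior.
posterior = beta(1 + heads, 1 + tails)
print("P(theta > 0.5 | data), either design:", 1 - posterior.cdf(0.5))

# Error-statistical route: the p-value depends on the sampling design.
# Design A: n = 12 tosses fixed in advance (binomial sampling).
p_fixed_n = 1 - binom.cdf(heads - 1, heads + tails, 0.5)
# Design B: toss until the 3rd tail (negative binomial; count heads before it).
p_stop_rule = 1 - nbinom.cdf(heads - 1, tails, 0.5)
print("p-value, fixed-n design:", p_fixed_n)          # roughly 0.073
print("p-value, stop-at-3rd-tail design:", p_stop_rule)  # roughly 0.033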
Philosophy of Science | 2000
Deborah G. Mayo
In seeking general accounts of evidence, confirmation, or inference, philosophers have looked to logical relationships between evidence and hypotheses. Such logics of evidential relationship, whether hypothetico-deductive, Bayesian, or instantiationist, fail to capture or be relevant to scientific practice. They require information that scientists do not generally have (e.g., an exhaustive set of hypotheses), while lacking slots within which to include considerations to which scientists regularly appeal (e.g., error probabilities). Building on my co-symposiasts' contributions, I suggest some directions in which a new and more adequate philosophy of evidence can move.
Philosophy of Science | 1985
Deborah G. Mayo
While orthodox (Neyman-Pearson) statistical tests enjoy widespread use in science, the philosophical controversy over their appropriateness for obtaining scientific knowledge remains unresolved. I shall suggest an explanation and a resolution of this controversy. The source of the controversy, I argue, is that orthodox tests are typically interpreted as rules for making optimal decisions as to how to behave--where optimality is measured by the frequency of errors the test would commit in a long series of trials. Most philosophers of statistics, however, view the task of statistical methods as providing appropriate measures of the evidential strength that data afford hypotheses. Since tests appropriate for the behavioral-decision task fail to provide measures of evidential strength, philosophers of statistics claim the use of orthodox tests in science is misleading and unjustified. What critics of orthodox tests overlook, I argue, is that the primary function of statistical tests in science is neither to decide how to behave nor to assign measures of evidential strength to hypotheses. Rather, tests provide a tool for using incomplete data to learn about the process that generated them. This they do, I show, by providing a standard for distinguishing differences (between observed and hypothesized results) due to accidental or trivial errors from those due to systematic or substantively important discrepancies. I propose a reinterpretation of a commonly used orthodox test to make this learning model of tests explicit.
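The "standard for distinguishing differences" can be pictured with a short simulation sketch, which is an illustration under assumed numbers rather than the paper's own example: simulate what differences between observed and hypothesized means the hypothesized process would generate by accident, and ask how often chance alone produces a difference as large as the one observed.

import numpy as np

rng = np.random.default_rng(1)

# Hypothesized process: a measurement procedure with mean mu0 = 10 and sd 2
# (illustrative values), sampled n = 25 times.
mu0, sigma, n = 10.0, 2.0, 25
observed_mean = 10.9  # illustrative observed result

# Simulate the sample means the hypothesized process would produce by chance.
sims = rng.normal(mu0, sigma, size=(100_000, n)).mean(axis=1)

# How often would accidental error alone yield a difference this large?
diff = observed_mean - mu0
frequency = np.mean(np.abs(sims - mu0) >= abs(diff))
print("simulated frequency of a difference >= observed:", frequency)
# A small frequency indicates the difference is not readily attributable to
# accidental error and points toward a systematic discrepancy from mu0.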
Synthese | 1983
Deborah G. Mayo
Theories of statistical testing may be seen as attempts to provide systematic means for evaluating scientific conjectures on the basis of incomplete or inaccurate observational data. The Neyman-Pearson Theory of Testing (NPT) has purported to provide an objective means for testing statistical hypotheses corresponding to scientific claims. Despite their widespread use in science, methods of NPT have themselves been accused of failing to be objective, and the purported objectivity of scientific claims based upon NPT has been called into question. The purpose of this paper is first to clarify this question by examining the conceptions of (I) the function served by NPT in science, and (II) the requirements of an objective theory of statistics, upon which attacks on NPT's objectivity are based. Our grounds for rejecting these conceptions suggest altered conceptions of (I) and (II) that might avoid such attacks. Second, we propose a reformulation of NPT, denoted by NPT*, based on these altered conceptions, and argue that it provides an objective theory of statistics. The crux of our argument is that by being able to objectively control error frequencies, NPT* is able to objectively evaluate what has or has not been learned from the result of a statistical test.
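The claim that error frequencies can be objectively controlled can be checked directly by simulation. The sketch below, with assumed parameter values and not drawn from the paper, verifies that a standard one-sided test run at nominal level 0.05 commits a type I error in roughly 5% of repetitions when the null hypothesis is exactly true.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

mu0, sigma, n, alpha = 0.0, 1.0, 50, 0.05
cutoff = norm.ppf(1 - alpha)  # one-sided rejection threshold

# Repeatedly draw samples from a process in which H0 is exactly true
# and record how often the test (wrongly) rejects.
reps = 100_000
samples = rng.normal(mu0, sigma, size=(reps, n))
z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
print("empirical type I error rate:", np.mean(z > cutoff))  # close to 0.05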
Philosophy of Science | 1997
Deborah G. Mayo
The error statistical account of testing uses statistical considerations, not to provide a measure of probability of hypotheses, but to model patterns of irregularity that are useful for controlling, distinguishing, and learning from errors. The aim of this paper is (1) to explain the main points of contrast between the error statistical and the subjective Bayesian approach and (2) to elucidate the key errors that underlie the central objection raised by Colin Howson at our PSA 96 Symposium.
Archive | 1981
Deborah G. Mayo
At a recent conference on problems in economics the following problem was raised: "Econometricians like to think of themselves as scientists, and their methods as scientific. Students of the philosophy of science, on the other hand, have not had any notable success in relating the formal concepts of scientific method or the logic of scientific explanation and theory construction to either the method or the theory of econometrics" [3, p. 238]. This should not be taken to mean that the theory and practice of economics and econometrics are not scientific. It rather points to the need for a greater effort among philosophers to tie their analyses to actual scientific practice. More specifically, it indicates a need for philosophers of science to examine statistical theorizing in science, since inference and explanation in economics are often statistical in nature.