Jose D. Perezgonzalez
Massey University
Publications
Featured research published by Jose D. Perezgonzalez.
Frontiers in Psychology | 2015
Jose D. Perezgonzalez
Despite frequent calls for the overhaul of null hypothesis significance testing (NHST), this controversial procedure remains ubiquitous in behavioral, social and biomedical teaching and research. Little change seems possible once the procedure becomes well ingrained in the minds and current practice of researchers; thus, the optimal opportunity for such change is at the time the procedure is taught, be this at undergraduate or at postgraduate levels. This paper presents a tutorial for the teaching of data testing procedures, often referred to as hypothesis testing theories. The first procedure introduced is Fisher's approach to data testing—tests of significance; the second is Neyman–Pearson's approach—tests of acceptance; the final procedure is the incongruent combination of the previous two theories into the current approach—NHST. For those researchers sticking with the latter, two compromise solutions on how to improve NHST conclude the tutorial.
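The contrast between the two procedures can be sketched in a few lines of code. The following Python snippet is an illustrative reconstruction, not part of the tutorial itself: it reports Fisher's graded p-value alongside a Neyman–Pearson accept/reject decision for the same hypothetical z-score, with the observed value, alpha, and two-tailed setup chosen purely for demonstration.

```python
# Illustrative sketch only (not from the tutorial): the same hypothetical
# z-score handled under Fisher's test of significance and under a
# Neyman-Pearson test of acceptance. The z value, alpha, and two-tailed
# setup are assumptions made for demonstration.
from scipy import stats

def fisher_significance(z):
    """Fisher: report the exact p-value as graded evidence against H0."""
    return stats.norm.sf(abs(z)) * 2        # two-tailed p-value; no fixed cut-off required

def neyman_pearson_acceptance(z, alpha=0.05):
    """Neyman-Pearson: a behavioural decision rule fixed before seeing the data.

    alpha (and, via sample-size planning, beta and power) is set in advance;
    the outcome is a decision about the main hypothesis, not a measure of evidence.
    """
    z_crit = stats.norm.ppf(1 - alpha / 2)  # two-tailed critical value (1.96 for alpha = 0.05)
    return "reject H_M" if abs(z) > z_crit else "accept H_M"

z_obs = 1.80                                # hypothetical observed z-score
print(fisher_significance(z_obs))           # ~0.07: evidence to be weighed, not a verdict
print(neyman_pearson_acceptance(z_obs))     # 'accept H_M' at alpha = 0.05
```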
Frontiers in Psychology | 2015
Jose D. Perezgonzalez
Schneider's (2015) article is contemporary work addressing the shortcomings of null hypothesis significance testing (NHST). It summarizes previous work on the topic and provides original examples illustrating NHST-induced confusions in scientometrics. Among the confusions cited are those associated with the interpretation of p-values, old misinterpretations already investigated by Oakes (1986), Falk and Greenbaum (1995), Haller and Krauss (2000), and Perezgonzalez (2014a), and discussed in, for example, Carver (1978), Nickerson (2000), Hubbard and Bayarri (2003), Kline (2004), and Goodman (2008). That they are still relevant in recent times testifies to the fact that the lessons of the past have not been learnt. As the title anticipates, there is a twist to this saga, a pedagogical one: p-values are typically taught and presented as probabilities, and this may be the cause behind the confusions. A change in the heuristic we use for teaching and interpreting the meaning of p-values may be all we need to start on the path toward clarification and understanding. In this article I will illustrate the differences in interpretation that a percentile heuristic and a probability heuristic make. As a guiding example, I will use a one-tailed p-value in a normal distribution (z = −1.75, p = 0.04; Figure 1). The default testing approach will be Fisher's tests of significance, but Neyman–Pearson's tests of acceptance approach will be assumed when discussing Type I errors and alternative hypotheses (for more information about those approaches see Perezgonzalez, 2014b, 2015). The scenario is the scoring of a sample of suspected schizophrenics on a validated psychological normality scale. The hypothesis tested (Fisher's H0, Neyman–Pearson's HM) is that the mean score of the sample on the normality scale does not differ from that of the normal population (no H0 = the sample does not score as normal; HA = the sample scores as schizophrenic, assuming previous knowledge that schizophrenics score low on the scale, by a given effect size). Neither a level of significance nor a rejection region is needed for the discussion.
Figure 1: Location of an observed z-score and its corresponding p-value in the frequency distribution of the hypothesis under test. The accompanying scales are for the theoretical z-scores and percentiles, respectively.
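The guiding example lends itself to a short numerical check. The Python sketch below takes only the one-tailed z = −1.75 from the commentary; the wording of the two readings is my paraphrase of the probability and percentile heuristics.

```python
# A minimal sketch of the guiding example: a one-tailed result, z = -1.75,
# read both as a probability and as a percentile of the H0 distribution.
# Only the z-score comes from the article; the rest is illustrative.
from scipy import stats

z_obs = -1.75
p_one_tailed = stats.norm.cdf(z_obs)        # P(Z <= -1.75 | H0) ~= 0.04

# Probability reading: if H0 were true, data this low (or lower) would occur
# in about 4% of repeated samples.
print(f"one-tailed p = {p_one_tailed:.2f}")

# Percentile reading: the observed score falls at roughly the 4th percentile
# of the frequency distribution specified by the hypothesis under test.
percentile = 100 * stats.norm.cdf(z_obs)
print(f"observed score sits at about the {percentile:.0f}th percentile of H0")
```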
Frontiers in Psychology | 2015
Jose D. Perezgonzalez
Cumming's (2014) article was commissioned by Psychological Science to reinforce the journal's 2014 publication guidelines. It exhorts substituting confidence intervals (CIs) for Null Hypothesis Significance Testing (NHST) as a way of increasing the scientific value of psychological research. Cumming's article is somewhat biased, hence the aims of my commentary: to balance out the presentation of statistical tests and to defend CIs against misinterpretation. Researchers with an interest in the correct philosophical application of tests and CIs are my target audience.
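As a hedged illustration of the substitution Cumming advocates, the following Python sketch reports a mean with its 95% confidence interval instead of a bare significance verdict; the sample values are invented for demonstration.

```python
# A minimal sketch, with made-up data, of reporting an estimate and its 95% CI
# rather than an accept/reject verdict from NHST. Sample values are hypothetical.
import numpy as np
from scipy import stats

sample = np.array([4.1, 5.3, 3.8, 4.9, 5.6, 4.4, 5.0, 4.7])  # hypothetical scores
mean = sample.mean()
sem = stats.sem(sample)                      # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)

print(f"mean = {mean:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
# The interval conveys both the size of the effect and its estimation precision,
# which a bare 'p < 0.05' statement does not.
```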
Theory & Psychology | 2014
Jose D. Perezgonzalez
Significance testing has been controversial since Neyman and Pearson published their procedure for testing statistical hypotheses. Fisher, who popularized tests of significance, first noticed the emerging confusion between that procedure and his own, yet he could not stop their hybridization into what is nowadays known as Null Hypothesis Significance Testing (NHST). Here I hypothesize why similar attempts to clarify matters have also failed; namely because both procedures are designed to be confused: their names may not match purpose, both use null hypotheses and levels of significance yet for different goals, and p-values, errors, alternative hypotheses, and significance only apply to one procedure yet are commonly used with both. I also propose a reconceptualization of the procedures to prevent further confusion.
Frontiers in Psychology | 2015
Jose D. Perezgonzalez
Braver et al.'s (2014) article was published by Perspectives on Psychological Science as part of a special issue on advancing psychology toward a cumulative science. The article contributes to such an advance by proposing the cumulative use of meta-analysis, rather than waiting for a large number of replications before running such a meta-analysis. Braver et al.'s article sits well alongside a recent call for reforming psychological methods, under the umbrella of "the new statistics" (Cumming, 2012). As it happens with the latter, the method referred to is not new, only the call to use it is. Indeed, the idea behind a continuously cumulating meta-analysis (CCMA) was already put forward by Rosenthal as far back as 1978 and repeated since (e.g., Rosenthal, 1984, 1991). Yet, the reminder is as relevant today as it has been in the past, more so if we want to get psychology, and our own research within it, to the frontier of science. I will, however, take this opportunity to comment on an issue which I find contentious: the meaning of the replication used to prove the point. Braver et al. define the criterion for a successful replication as achieving conventional levels of significance. They also identify the typical low power of psychological research as a main culprit for failing to replicate studies. Indeed, they went ahead and simulated two normal populations with a medium effect size mean difference between them, from which they randomly drew 10,000 pairs of underpowered samples. The results they obtained fulfilled power expectations: about 42% of the initial studies, about 41% of the replications, and about 70% of the combined study-replication pairs turned out statistically significant—the latter supposedly supporting the benefits of CCMA over the uncombined studies. What the authors fail to notice, however, is that the meaning of replication differs depending on the data testing approach used: Fisher's approach is not the same as Neyman–Pearson's (Neyman, 1942, 1955; Fisher, 1955, 1973; MacDonald, 2002; Gigerenzer, 2004; Hubbard, 2004; Louca, 2008; Perezgonzalez, 2015a). Neyman and Pearson's approach (1933) is based on repeated sampling from the same population while keeping an eye on power, which is Braver et al.'s simulation setting. However, under this approach a successful replication reduces to a count of significant results in the long run, which translates to about 80% of significant replications when power is 0.8, or to about 41% when power is 0.41. Albeit not intentionally pursued, this is what Braver et al.'s Table 1 shows (power lines 1 and 2, and criteria 1, 2, and 4—combining studies is not expected under Neyman–Pearson's approach but, given the nature of the simulation, such combination can be taken as a third set of studies that uses larger sample sizes and, thus, more power; criteria 5–10 can be considered punctilious studies under criterion 4). That is, Braver et al.'s power results effectively replicate the population effect size the authors chose for their simulation. On the other hand, the 10,000 runs of study-replication pairs address replication under a different testing approach, that of Fisher, arguably the default one in today's research (Spielman, 1978; Johnstone, 1986; Cortina and Dunlap, 1997; Hubbard, 2004; Perezgonzalez, 2015b). Under Fisher's approach (1954), power has no inherent meaning—a larger sample size is more sensitive to a departure from the null hypothesis and, thus, preferable, but the power of the test is of no relevance.
There is no knowing (or guessing) the true population effect size beforehand, either, in which case meta-analysis helps to better approximate the unknown effect size, which is exactly what Braver et al.'s Table 2 illustrates. It is under this approach that accumulating studies works, as a way of increasing our knowledge further—something that Fisher (1954) had already suggested. This is also the approach under which Rosenthal presented his techniques for meta-analysis—indeed, he did not contemplate power in 1978 or 1984, and his mentioning it in 1991 seems to be rather marginal to the techniques themselves. There are other ways of carrying out replications, though, ways more attuned to the "new statistics"—which is to say, ways already discussed by Rosenthal (1978). One of these is to attend to the effect sizes of studies and replications, to better know what we want to know (Cohen, 1994) instead of merely making dichotomous decisions based on significance (Rosenthal, 1991). Another way is to attend to the confidence intervals of studies and replications, as Cumming (2012) suggests. In summary, Braver et al.'s call for CCMA is a worthy one, even if their simulation confused the meaning of replication under different testing approaches. One thing left to do for this call to have better chances of succeeding is to make CCMA easier to implement. For such a purpose, the interested researcher has a suite of readily available meta-analysis computer applications for Microsoft's Excel, such as ESCI (http://www.latrobe.edu.au/psy/research/cognitive-and-developmental-psychology/esci) and MIX (http://www.meta-analysis-made-easy.com), and standalone computer programs such as RevMan (http://tech.cochrane.org/revman) and CMA (http://www.meta-analysis.com)—for more resources see also https://www.researchgate.net/post/Which_meta-analysis_software_is_easy_to_use/1.
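For readers who want to see the power argument concretely, the Python sketch below roughly re-creates the kind of simulation described above. Only the medium population effect size and the approximate 41–42% and 70% significance rates come from the abstract; the per-group sample size (n = 25), the use of Student t-tests, and the pooling of raw data as a stand-in for CCMA are my assumptions.

```python
# Rough re-creation, under stated assumptions, of the simulation described:
# two normal populations separated by a medium effect (d = 0.5), 10,000 pairs
# of underpowered studies (n = 25 per group is an assumption chosen to give
# roughly 41% power), testing each study, its replication, and the pooled pair.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d, n, runs, alpha = 0.5, 25, 10_000, 0.05
sig_study = sig_rep = sig_pooled = 0

for _ in range(runs):
    # study and replication: independent two-group samples from the same populations
    s_ctrl, s_trt = rng.normal(0, 1, n), rng.normal(d, 1, n)
    r_ctrl, r_trt = rng.normal(0, 1, n), rng.normal(d, 1, n)

    sig_study += stats.ttest_ind(s_trt, s_ctrl).pvalue < alpha
    sig_rep += stats.ttest_ind(r_trt, r_ctrl).pvalue < alpha
    # CCMA-style combination, approximated here by simply pooling the raw data
    sig_pooled += stats.ttest_ind(np.r_[s_trt, r_trt], np.r_[s_ctrl, r_ctrl]).pvalue < alpha

print(f"significant studies:      {sig_study / runs:.0%}")   # roughly 40%
print(f"significant replications: {sig_rep / runs:.0%}")     # roughly 40%
print(f"significant when pooled:  {sig_pooled / runs:.0%}")  # roughly 70%
```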
Frontiers in Psychology | 2018
David Trafimow; Valentin Amrhein; Corson N. Areshenkoff; Carlos Barrera-Causil; Eric J. Beh; Yusuf K. Bilgic; Roser Bono; Michael T. Bradley; William M. Briggs; Héctor A. Cepeda-Freyre; Sergio E. Chaigneau; Daniel R. Ciocca; Juan Carlos Correa; Denis Cousineau; Michiel R. de Boer; Subhra Sankar Dhar; Igor Dolgov; Juana Gómez-Benito; Marian Grendar; James W. Grice; Martin E. Guerrero-Gimenez; Andrés Gutiérrez; Tania B. Huedo-Medina; Klaus Jaffe; Armina Janyan; Ali Karimnezhad; Fränzi Korner-Nievergelt; Koji Kosugi; Martin Lachmair; Rubén Ledesma
We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p = 0.05 to p = 0.005, is deleterious for the finding of new discoveries and the progress of science. Given that blanket and variable alpha levels both are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample size much more directly than significance testing does; but none of the statistical tools should be taken as the new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, and implications for applications. To boil all this down to a binary decision based on a p-value threshold of 0.05, 0.01, 0.005, or anything else, is not acceptable.
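A minimal, hypothetical Python illustration of the threshold problem follows: the same test statistic yields different binary verdicts at alpha = 0.05, 0.01, and 0.005, even though the evidence itself has not changed. The statistic and degrees of freedom are invented.

```python
# Hypothetical illustration: identical evidence, different verdicts depending
# solely on the alpha threshold chosen. The test statistic and df are made up.
from scipy import stats

t_stat, df = 2.4, 60                      # hypothetical test statistic
p = 2 * stats.t.sf(abs(t_stat), df)       # two-tailed p-value, ~0.02

for alpha in (0.05, 0.01, 0.005):
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"p = {p:.3f} vs alpha = {alpha}: {verdict}")
# Identical data, three different 'conclusions'. The argument is that such
# dichotomies should give way to cumulative evidence across studies.
```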
The New Zealand Medical Journal | 2012
Andrew Gilbey; Jose D. Perezgonzalez
Frontiers in Psychology | 2015
Jose D. Perezgonzalez
Archive | 2010
Jose D. Perezgonzalez; Bo Lin
Archive | 2011
Jose D. Perezgonzalez; Bo Lin