Donald A. Jackson
University of Toronto
Publication
Featured research published by Donald A. Jackson.
Ecology | 1993
Donald A. Jackson
Approaches to determining the number of components to interpret from principal components analysis were compared. Heuristic procedures included: retaining components with eigenvalues (λs) > 1 (i.e., Kaiser-Guttman criterion); components with bootstrapped λs > 1 (bootstrapped Kaiser-Guttman); the scree plot; the broken-stick model; and components with λs totalling a fixed amount of the total variance. Statistical approaches included: Bartlett's test of sphericity; Bartlett's test of homogeneity of the correlation matrix; Lawley's test of the second λ; bootstrapped confidence limits on successive λs (i.e., significant differences between λs); and bootstrapped confidence limits on eigenvector coefficients (i.e., coefficients that differ significantly from zero). All methods were compared using simulated data matrices of uniform correlation structure, patterned matrices of varying correlation structure, and data sets of lake morphometry, water chemistry, and benthic invertebrate abundance. The most consistent results were obtained from the broken-stick model and a combined measure using bootstrapped λs and associated eigenvector coefficients. The traditional and bootstrapped Kaiser-Guttman approaches overestimated the number of nontrivial dimensions, as did the fixed-amount-of-variance model. The scree plot consistently estimated one dimension more than the number of simulated dimensions. Bartlett's test of sphericity showed inconsistent results. Both Bartlett's test of homogeneity of the correlation matrix and Lawley's test are limited to testing for only one and two dimensions, respectively.
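The broken-stick model the abstract recommends is simple to state: component k is retained only if its share of the variance exceeds the expected size of the k-th largest piece of a unit stick broken at random into p parts. A minimal sketch, with function names of my own (not from the paper):

```python
import numpy as np

def broken_stick(p):
    """Expected proportions of variance under the broken-stick model:
    b_k = (1/p) * sum_{i=k}^{p} 1/i, for k = 1..p."""
    return np.array([sum(1.0 / i for i in range(k, p + 1)) / p
                     for k in range(1, p + 1)])

def n_components_broken_stick(eigenvalues):
    """Retain leading components whose observed proportion of variance
    exceeds the broken-stick expectation; stop at the first failure."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    props = lam / lam.sum()
    n = 0
    for obs, exp in zip(props, broken_stick(len(lam))):
        if obs > exp:
            n += 1
        else:
            break
    return n
```

For eigenvalues (4, 2, 1, 0.5, 0.5), the first component carries 50% of the variance versus a broken-stick expectation of about 45.7%, while the second (25%) falls just below its expectation (25.7%), so only one component is retained.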
Oecologia | 2001
Pedro R. Peres-Neto; Donald A. Jackson
The Mantel test provides a means to test the association between distance matrices and has been widely used in ecological and evolutionary studies. Recently, another permutation test based on a Procrustes statistic (PROTEST) was developed to compare multivariate data sets. Our study contrasts the effectiveness, in terms of power and type I error rates, of the Mantel test and PROTEST. We illustrate the application of Procrustes superimposition to visually examine the concordance of observations for each dimension separately and how to conduct hypothesis testing in which the association between two data sets is tested while controlling for the variation related to other sources of data. Our simulation results show that PROTEST is as powerful or more powerful than the Mantel test for detecting matrix association under a variety of possible scenarios. As a result of the increased power of PROTEST and the ability to assess the match for individual observations (not available with the Mantel test), biologists now have an additional and powerful analytical tool to study ecological and evolutionary relationships.
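The Mantel test contrasted here correlates the off-diagonal entries of two distance matrices and builds its null distribution by jointly permuting the rows and columns of one matrix. A minimal permutation sketch, assuming symmetric distance matrices (not code from the paper):

```python
import numpy as np

def mantel_test(D1, D2, n_perm=999, seed=0):
    """Permutation Mantel test: Pearson correlation between the upper
    triangles of two distance matrices; rows and columns of D2 are
    permuted together to generate the null distribution."""
    rng = np.random.default_rng(seed)
    n = D1.shape[0]
    iu = np.triu_indices(n, 1)
    r_obs = np.corrcoef(D1[iu], D2[iu])[0, 1]
    count = 1  # include the observed statistic in the null set
    for _ in range(n_perm):
        p = rng.permutation(n)
        if np.corrcoef(D1[iu], D2[np.ix_(p, p)][iu])[0, 1] >= r_obs:
            count += 1
    return r_obs, count / (n_perm + 1)
```

Note the one-tailed p-value: only permutations matching or exceeding the observed correlation count as extreme, which is the usual convention when testing for positive matrix association.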
Ecological Modelling | 2002
Julian D. Olden; Donald A. Jackson
Abstract With the growth of statistical modeling in the ecological sciences, researchers are using more complex methods, such as artificial neural networks (ANNs), to address problems associated with pattern recognition and prediction. Although in many studies ANNs have been shown to exhibit superior predictive power compared to traditional approaches, they have also been labeled a “black box” because they provide little explanatory insight into the relative influence of the independent variables in the prediction process. This lack of explanatory power is a major concern to ecologists since the interpretation of statistical models is desirable for gaining knowledge of the causal relationships driving ecological phenomena. In this study, we describe a number of methods for understanding the mechanics of ANNs (e.g. Neural Interpretation Diagram, Garson's algorithm, sensitivity analysis). Next, we propose and demonstrate a randomization approach for statistically assessing the importance of axon connection weights and the contribution of input variables in the neural network. This approach provides researchers with the ability to eliminate null-connections between neurons whose weights do not significantly influence the network output (i.e. predicted response variable), thus facilitating the interpretation of individual and interacting contributions of the input variables in the network. Furthermore, the randomization approach can identify variables that significantly contribute to network predictions, thereby providing a variable selection method for ANNs. We show that by extending randomization approaches to ANNs, the “black box” mechanics of ANNs can be greatly illuminated. Thus, by coupling this new explanatory power of neural networks with their strong predictive abilities, ANNs promise to be a valuable quantitative tool to evaluate, understand, and predict ecological phenomena.
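Garson's algorithm, one of the interpretation methods mentioned above, partitions a trained network's connection weights to score the relative importance of each input. A sketch for a single-hidden-layer network, under the usual formulation (each input-to-hidden weight scaled by the corresponding hidden-to-output weight, normalized within each hidden neuron, then summed per input); the function name and array layout are my own:

```python
import numpy as np

def garson(W, v):
    """Garson's algorithm for input-variable importance.
    W: (n_inputs, n_hidden) input-to-hidden weights
    v: (n_hidden,) hidden-to-output weights
    Returns importances that sum to 1."""
    c = np.abs(W) * np.abs(v)             # contribution of input i via hidden h
    r = c / c.sum(axis=0, keepdims=True)  # share of each hidden neuron's signal
    imp = r.sum(axis=1)                   # total share attributed to each input
    return imp / imp.sum()
```

Because the method uses absolute weight products, it ignores weight sign and can be distorted by cancelling pathways; this is part of the motivation for the randomization test of connection weights the abstract proposes.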
Computational Statistics & Data Analysis | 2005
Pedro R. Peres-Neto; Donald A. Jackson; Keith M. Somers
Principal component analysis is one of the most widely applied tools for summarizing common patterns of variation among variables. Several studies have investigated the ability of individual methods, or compared the performance of a number of methods, in determining the number of components describing common variance of simulated data sets. We identify a number of shortcomings related to these studies and conduct an extensive simulation study where we compare a larger number of the available rules and develop some new methods. In total we compare 20 stopping rules and propose a two-step approach that appears to be highly effective. First, a Bartlett's test is used to test the significance of the first principal component, indicating whether or not at least two variables share common variation in the entire data set. If significant, a number of different rules can be applied to estimate the number of non-trivial components to be retained. However, the relative merits of these methods depend on whether data contain strongly correlated or uncorrelated variables. We also estimate the number of non-trivial components for a number of field data sets so that we can evaluate the applicability of our conclusions based on simulated data.
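The first step of the two-step approach, a Bartlett-type test of whether the variables share any common variation at all, can be illustrated with the classical Bartlett test of sphericity (H0: the correlation matrix is the identity). This is a sketch of that standard test, not the exact variant used in the paper:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test of sphericity on an (n, p) data matrix.
    Statistic: -(n - 1 - (2p + 5)/6) * ln det(R), chi-square with
    p(p - 1)/2 degrees of freedom under H0 (identity correlation)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)
```

A non-significant result here means no stopping rule is needed at all: there is no evidence of even one non-trivial component.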
Ecoscience | 1995
Donald A. Jackson
Abstract:A multivariate measure of the concordance or association between matrices of species abundances and environmental variables was generally lacking in ecology until recently. Traditional statistical procedures comparing such relationships are often unsuitable because of non-linearity among species and/or environmental data. To address these problems, I propose a randomization test based on Procrustes analysis. One matrix is subject to reflection, rigid rotation, translation, and dilation to minimize the sum of the squared residual deviations between points for each observation and the identical observation in the target matrix. This is a classical Procrustes approach to matrix analysis. To assess the significance of this measure of matrix concordance, I use a randomization test to determine whether the sum of residual deviations is less than that expected by chance. The PROcrustean randomization TEST (PROTEST) may be used with either raw data matrices or with multivariate summaries of the original ...
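The procedure described above, a Procrustes fit (translation, scaling, rotation/reflection) followed by a randomization of one matrix's observations, can be sketched as follows. The residual statistic m² here is the standard symmetric Procrustes sum of squares; names and defaults are illustrative:

```python
import numpy as np

def procrustes_m2(X, Y):
    """Residual m^2 after centering, scaling to unit sum of squares,
    and optimally rotating/reflecting Y onto X (via SVD)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xc /= np.sqrt((Xc ** 2).sum())
    Yc /= np.sqrt((Yc ** 2).sum())
    s = np.linalg.svd(Xc.T @ Yc, compute_uv=False)
    return 1 - s.sum() ** 2

def protest(X, Y, n_perm=999, seed=0):
    """PROTEST: permute rows of Y to test whether the observed fit
    (smaller m^2 = better concordance) beats chance."""
    rng = np.random.default_rng(seed)
    m2_obs = procrustes_m2(X, Y)
    count = 1  # include the observed statistic
    for _ in range(n_perm):
        if procrustes_m2(X, Y[rng.permutation(len(Y))]) <= m2_obs:
            count += 1
    return m2_obs, count / (n_perm + 1)
```

Permuting rows, rather than entries, preserves the multivariate structure within each observation while breaking the pairing between the two matrices, which is exactly the null hypothesis of no concordance.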
Ecology | 2003
Pedro R. Peres-Neto; Donald A. Jackson; Keith M. Somers
Principal component analysis (PCA) is one of the most commonly used tools in the analysis of ecological data. This method reduces the effective dimensionality of a multivariate data set by producing linear combinations of the original variables (i.e., components) that summarize the predominant patterns in the data. In order to provide meaningful interpretations for principal components, it is important to determine which variables are associated with particular components. Some data analysts incorrectly test the statistical significance of the correlation between original variables and multivariate scores using standard statistical tables. Others interpret eigenvector coefficients larger than an arbitrary absolute value (e.g., 0.50). Resampling, randomization techniques, and parallel analysis have been applied in a few cases. In this study, we compared the performance of a variety of approaches for assessing the significance of eigenvector coefficients in terms of type I error rates and power. Two novel approaches based on the broken-stick model were also evaluated. We used a variety of simulated scenarios to examine the influence of the number of real dimensions in the data; unique versus complex variables; the magnitude of eigenvector coefficients; and the number of variables associated with a particular dimension. Our results revealed that bootstrap confidence intervals and a modified bootstrap confidence interval for the broken-stick model proved to be the most reliable techniques.
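A bootstrap confidence interval for an eigenvector coefficient, the approach the abstract found most reliable, can be sketched as below. One practical detail: eigenvector signs are arbitrary, so each bootstrap eigenvector must be sign-aligned to the full-sample eigenvector before percentiles are taken. Function names and defaults are my own:

```python
import numpy as np

def bootstrap_eigvec_ci(X, comp=0, n_boot=999, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for the loadings of one principal
    component of the correlation matrix. A coefficient whose interval
    excludes zero is treated as significant."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # full-sample eigenvector (columns reordered to descending eigenvalue)
    ref = np.linalg.eigh(np.corrcoef(X, rowvar=False))[1][:, ::-1][:, comp]
    coefs = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        Xb = X[rng.integers(0, n, n)]  # resample rows with replacement
        v = np.linalg.eigh(np.corrcoef(Xb, rowvar=False))[1][:, ::-1][:, comp]
        coefs[b] = v if v @ ref > 0 else -v  # fix arbitrary sign flips
    lo, hi = np.percentile(coefs, [100 * alpha / 2,
                                   100 * (1 - alpha / 2)], axis=0)
    return lo, hi
```

Without the sign alignment, flipped bootstrap replicates would straddle zero and every interval would appear non-significant.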
Ecology | 1989
Donald A. Jackson; Harold H. Harvey
Six geographic regions along the Laurentian Great Lakes in Ontario, represented by 286 lakes, were examined to identify the existence of regional similarities of fish species composition and their association with geographic location and regional patterns of lake morphology and pH. Lakes differed significantly among regions with respect to surface area, maximum depth, and pH. Species presence/absence data were summarized using correspondence analysis, and the resultant scores were used in multivariate analysis of variance and canonical variates analysis. These results indicated that the fish faunas of the six geographical areas were distinct. Interregional distances based on fish community scores, lake morphology–chemistry data, and geographical distances were contrasted using Mantel's test. Regional faunal similarities were correlated significantly with geographical proximity, but not with lake morphology. We propose that post-glacial dispersal and lake thermal regimes are important determinants in structuring regional patterns of fish assemblages, whereas environmental conditions such as lake depth and pH assume greater importance in determining species compositions of individual lakes.
Transactions of The American Fisheries Society | 2002
Julian D. Olden; Donald A. Jackson; Pedro R. Peres-Neto
Abstract The prediction of species distributions is a primary goal in the study, conservation, and management of fisheries resources. Statistical models relating patterns of species presence or absence to multiscale habitat variables play an important role in this regard. Researchers, however, have paid little attention to how improper model validation and chance predictions can result in unfounded confidence in the performance and utility of such models. Using simulated and empirical data for 40 lake and stream fish species, we demonstrate that the commonly employed resubstitution approach to model validation (in which the same data are used for both model construction and prediction) produces highly biased estimates of correct classification rates and consequently an inaccurate perception of true model performance. In contrast, a jackknife approach to validation resulted in relatively unbiased estimates of model performance. The estimated rates of model correct classification are also shown to be substa...
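The contrast drawn above, resubstitution (predicting the same data used to build the model) versus jackknife (leave-one-out) validation, is easy to make concrete. A minimal sketch using a simple nearest-centroid classifier as a stand-in for the presence/absence models in the paper; the classifier choice is illustrative only:

```python
import numpy as np

def nearest_centroid_predict(Xtr, ytr, Xte):
    """Assign each test row to the class with the nearest centroid."""
    classes = np.unique(ytr)
    cents = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
    return np.array([classes[np.argmin(((cents - x) ** 2).sum(axis=1))]
                     for x in Xte])

def resub_rate(X, y):
    """Resubstitution: train and predict on the same data (biased high)."""
    return (nearest_centroid_predict(X, y, X) == y).mean()

def jackknife_rate(X, y):
    """Leave-one-out: each observation predicted from a model fitted
    without it, giving a nearly unbiased classification rate."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        hits += nearest_centroid_predict(X[mask], y[mask],
                                         X[i:i + 1])[0] == y[i]
    return hits / len(y)
```

On overlapping classes the resubstitution rate will typically exceed the jackknife rate, which is exactly the optimistic bias the abstract warns against.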
Transactions of The American Fisheries Society | 2001
Julian D. Olden; Donald A. Jackson
Abstract Understanding and predicting the impacts of habitat modification and loss on fish populations are among the main challenges confronting fisheries biologists in the new millennium. Statistical models play an important role in this regard, providing a means to quantify how environmental conditions shape contemporary patterns in fish populations and communities and formulating this knowledge in a framework where future patterns can be predicted. Developing fish–habitat models by traditional statistical approaches is problematic because species often exhibit complex, nonlinear responses to environmental conditions and biotic interactions. We demonstrate the value of a robust statistical technique, artificial neural networks, relative to more traditional regression techniques for modeling such complexities in fish–habitat relationships. Using artificial neural networks, we provide both explanatory and predictive insight into the whole-lake and within-lake habitat factors shaping species occurrence and...
Ecoscience | 2000
Julian D. Olden; Donald A. Jackson
Abstract Multiple regression analysis continues to be a quantitative tool used extensively in the ecological literature. Consequently, methods for model selection and validation are important considerations, yet ecologists appear to pay little attention to how the choice of method can potentially influence the outcome and interpretation of their results. In this study we review commonly employed model selection and validation methods and use a Monte Carlo simulation approach to evaluate their ability to accurately estimate variable inclusion in the final regression model and model prediction error. We found that all methods of model selection erroneously excluded or included variables in the final model and the error rate depended on sample size and the number of predictor variables. In general, forward selection, backward elimination and stepwise selection showed better performance with small sample sizes, whereas a modified bootstrap approach outperformed other methods with larger sample sizes. Model selection using all-subsets or exhaustive search was highly biased, at times never selecting the correct predictor variables. Methods for model validation were also highly biased, with resubstitution and data-splitting (i.e., dividing the data into training and test samples) techniques producing biased and variable estimates of model prediction error. In contrast, jackknife validation was generally unbiased. Using an empirical example we show that the interpretation of the ecological relationships between fish species richness and lake habitat is highly dependent on the type of model selection and validation method employed. The fact that model selection is frequently unsuited to determine correct ecological relationships, and that traditional approaches for model validation over-estimate the strength and value of our empirical models, is a major concern.
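Forward selection, one of the procedures evaluated above, adds at each step the predictor that most improves a fit criterion and stops when no addition helps. A minimal sketch using AIC as the criterion (the paper compares several criteria and methods; this specific implementation is illustrative):

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r

def forward_select_aic(X, y):
    """Greedy forward selection minimizing AIC = n*ln(RSS/n) + 2k,
    where k counts the intercept plus included predictors."""
    n, p = X.shape
    chosen = []
    rss0 = ((y - y.mean()) ** 2).sum()
    best_aic = n * np.log(rss0 / n) + 2  # intercept-only model
    while True:
        best_j = None
        for j in range(p):
            if j in chosen:
                continue
            aic = (n * np.log(ols_rss(X[:, chosen + [j]], y) / n)
                   + 2 * (len(chosen) + 2))
            if aic < best_aic:
                best_aic, best_j = aic, j
        if best_j is None:
            return chosen
        chosen.append(best_j)
```

Running such a procedure repeatedly on simulated data with known true predictors, as the Monte Carlo study above does, reveals how often noise variables slip into the final model and true ones are missed.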