Giles Hooker
Cornell University
Publication
Featured research published by Giles Hooker.
BioScience | 2009
Steve Kelling; Wesley M. Hochachka; Daniel Fink; Mirek Riedewald; Rich Caruana; Grant Ballard; Giles Hooker
The increasing availability of massive volumes of scientific data requires new synthetic analysis techniques to explore and identify interesting patterns that are otherwise not apparent. For biodiversity studies, a “data-driven” approach is necessary because of the complexity of ecological systems, particularly when viewed at large spatial and temporal scales. Data-intensive science organizes large volumes of data from multiple sources and fields and then analyzes them using techniques tailored to the discovery of complex patterns in high-dimensional data through visualizations, simulations, and various types of model building. Through interpreting and analyzing these models, truly novel and surprising patterns that are “born from the data” can be discovered. These patterns provide valuable insight for concrete hypotheses about the underlying ecological processes that created the observed data. Data-intensive science allows scientists to analyze bigger and more complex systems efficiently, and complements more traditional scientific processes of hypothesis generation and experimental testing to refine our understanding of the natural world.
Canadian Psychology | 2007
James O. Ramsay; Giles Hooker; Spencer Graves
The main characteristics of functional data and of functional models are introduced. Data on the growth of girls illustrate samples of functional observations, and data on the US nondurable goods manufacturing index are an example of a single long multilayered functional observation. Data on the gait of children and handwriting are multivariate functional observations. Functional data analysis also involves estimating functional parameters describing data that are not themselves functional, and estimating a probability density function for rainfall data is an example. A theme in functional data analysis is the use of information in derivatives, and examples are drawn from growth and weather data. The chapter also introduces the important problem of registration: aligning functional features.
Journal of Computational and Graphical Statistics | 2007
Giles Hooker
This article studies the problem of providing diagnostics for high-dimensional functions when the input variables are known to be dependent. In such situations, commonly used diagnostics can place an unduly large emphasis on functional behavior that occurs in regions of very low probability. Instead, a generalized functional ANOVA decomposition provides a natural representation of the function in terms of low-order components. This article details a weighted functional ANOVA that controls for the effect of dependence between input variables. The construction involves high-dimensional functions as nuisance parameters and suggests a novel estimation scheme for it. The methodology is demonstrated in the context of machine learning in which the possibility of poor extrapolation makes it important to restrict attention to regions of high data density.
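For orientation, the kind of decomposition this abstract refers to can be written as below. This is only a sketch in assumed notation, not a transcription of the article's definitions: $F$ is the high-dimensional function, $x_u$ is the subset of inputs indexed by $u$, and $w$ is a weight function (for example, an estimate of the density of the inputs).

```latex
F(x) = f_0 + \sum_{i} f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j) + \cdots
     = \sum_{u \subseteq \{1,\dots,d\}} f_u(x_u),
\qquad
\{f_u\} = \arg\min_{\{g_u\}} \int \Big( F(x) - \sum_{u} g_u(x_u) \Big)^{2} w(x)\, dx ,
```

with the components additionally constrained to be orthogonal, with respect to $w$, to lower-order terms so that the decomposition is identifiable. With a uniform $w$ on a product domain this reduces to the standard functional ANOVA; a non-uniform $w$ down-weights regions of low probability.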
Journal of Computational and Graphical Statistics | 2014
Matthew W. McLean; Giles Hooker; Ana-Maria Staicu; Fabian Scheipl; David Ruppert
We introduce the functional generalized additive model (FGAM), a novel regression model for association studies between a scalar response and a functional predictor. We model the link-transformed mean response as the integral with respect to t of F{X(t), t}, where F(·, ·) is an unknown regression function and X(t) is a functional covariate. Rather than having an additive model in a finite number of principal components as in Müller and Yao (2008), our model incorporates the functional predictor directly and thus can be viewed as the natural functional extension of generalized additive models. We estimate F(·, ·) using tensor-product B-splines with roughness penalties. A pointwise quantile transformation of the functional predictor is also considered to ensure each tensor-product B-spline has observed data on its support. The methods are evaluated using simulated data and their predictive performance is compared with competing scalar-on-function regression alternatives. We illustrate the usefulness of our approach through an application to brain tractography, where X(t) is a signal from diffusion tensor imaging at position t along a tract in the brain. In one example, the response is disease status (case or control), and in a second example it is the score on a cognitive test. The FGAM is implemented in R in the refund package. Additional supplementary materials are available online.
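The model's linear predictor, the integral over t of F{X(t), t}, can be illustrated numerically. The following is a minimal sketch, not the refund implementation: it assumes the surface F is already known (the paper estimates it with penalized tensor-product B-splines), and the function names, the example curve, and the two example surfaces are all hypothetical.

```python
import math


def trapezoid(values, grid):
    """Trapezoid-rule approximation of the integral of `values` over `grid`."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def fgam_linear_predictor(curve, grid, surface, intercept=0.0):
    """Approximate intercept + integral of surface(X(t), t) dt.

    `curve` holds X(t) evaluated on `grid`; `surface` plays the role of F.
    """
    integrand = [surface(x, t) for x, t in zip(curve, grid)]
    return intercept + trapezoid(integrand, grid)


# Hypothetical functional covariate: X(t) = sin(2*pi*t) observed on [0, 1].
grid = [i / 200 for i in range(201)]
curve = [math.sin(2 * math.pi * t) for t in grid]


def linear_surface(x, t):
    # F(x, t) = beta(t) * x with beta(t) = 2t: the functional linear model
    # appears as a special case of the FGAM form.
    return 2.0 * t * x


def nonlinear_surface(x, t):
    # An assumed surface that is nonlinear in x, which a linear model cannot express.
    return x ** 2


eta_linear = fgam_linear_predictor(curve, grid, linear_surface)
eta_nonlinear = fgam_linear_predictor(curve, grid, nonlinear_surface)
print(eta_linear, eta_nonlinear)
```

Under these assumptions the two values approximate the exact integrals -1/π and 1/2; applying the inverse link to them would give the modeled mean response.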
Wilderness & Environmental Medicine | 2005
Robert L. Norris; Jessica Ngo; Karen Nolan; Giles Hooker
Objective: To determine whether volunteers (with or without prior medical training) can correctly apply pressure immobilization (PI) in a simulated snakebite scenario after receiving standard instructions describing the technique. Methods: Twenty emergency medicine physicians (residents and attendings) and 20 lay volunteers without prior formal medical training were given standard printed instructions describing the application of PI for field management of snakebite. They were then supplied with appropriate materials and asked to apply the technique five separate times (twice to another individual [one upper and one lower extremity] and three times to themselves [nondominant upper extremity, dominant upper extremity, and one lower extremity]). Successful application was defined a priori by four criteria previously published in the literature: wrap begins at the bite site, entire extremity is wrapped, splint or sling is applied, and pressures under the dressing are between 40 and 70 mm Hg in upper-extremity application and between 55 and 70 mm Hg in lower-extremity use. Pressures were determined using a specially designed skin interface pressure-measuring device placed at the simulated bite site. Results: The technique was correctly applied as judged by the preset criteria in only 13 out of 100 applications by emergency medicine physicians and in only 5 out of 100 applications by lay people. There was no significant difference in success rates between physicians and lay volunteers. Likewise, there was no significant difference in success based on which extremity was being wrapped. More detailed analysis revealed that the major contributor to failure was inability to achieve recommended target pressures. Conclusions: Volunteers in a simulated snakebite scenario have difficulty applying PI correctly, as defined in the literature. The major source of failure is an inability to achieve recommended pressure levels under the dressing. New methods of instructing people in the proper use of PI or new technologies to guide or automate application are needed if this technique is to be used consistently in an effective manner for field management of bites by venomous snakes not known to cause significant local wound necrosis.
knowledge discovery and data mining | 2013
Yin Lou; Rich Caruana; Johannes Gehrke; Giles Hooker
Standard generalized additive models (GAMs) usually model the dependent variable as a sum of univariate models. Although previous studies have shown that standard GAMs can be interpreted by users, their accuracy is significantly lower than that of more complex models that permit interactions. In this paper, we suggest adding selected terms of interacting pairs of features to standard GAMs. The resulting models, which we call GA2M-models, for Generalized Additive Models plus Interactions, consist of univariate terms and a small number of pairwise interaction terms. Since these models only include one- and two-dimensional components, the components of GA2M-models can be visualized and interpreted by users. To explore the huge (quadratic) number of pairs of features, we develop a novel, computationally efficient method called FAST for ranking all possible pairs of features as candidates for inclusion into the model. In a large-scale empirical study, we show the effectiveness of FAST in ranking candidate pairs of features. In addition, we show the surprising result that GA2M-models have almost the same performance as the best full-complexity models on a number of real datasets. Thus this paper postulates that for many problems, GA2M-models can yield models that are both intelligible and accurate.
Journal of the Royal Society Interface | 2011
Giles Hooker; Stephen P. Ellner; Laura De Vargas Roditi; David J. D. Earn
Parameter estimation for infectious disease models is important for basic understanding (e.g. to identify major transmission pathways), for forecasting emerging epidemics, and for designing control measures. Differential equation models are often used, but statistical inference for differential equations suffers from numerical challenges and poor agreement between observational data and deterministic models. Accounting for these departures via stochastic model terms requires full specification of the probabilistic dynamics, and computationally demanding estimation methods. Here, we demonstrate the utility of an alternative approach, generalized profiling, which provides robustness to violations of a deterministic model without needing to specify a complete probabilistic model. We introduce novel means for estimating the robustness parameters and for statistical inference in this framework. The methods are applied to a model for pre-vaccination measles incidence in Ontario, and we demonstrate the statistical validity of our inference through extensive simulation. The results confirm that school term versus summer drives seasonality of transmission, but we find no effects of short school breaks and the estimated basic reproductive ratio ℛ0 greatly exceeds previous estimates. The approach applies naturally to any system for which candidate differential equations are available, and avoids many challenges that have limited Monte Carlo inference for state–space models.
Journal of Hepatology | 2011
Marija Zeremski; Giles Hooker; Marla A. Shu; Emily Winkelstein; Queenie Brown; Don C. Des Jarlais; Leslie H. Tobler; Barbara Rehermann; Michael P. Busch; Brian R. Edlin; Andrew H. Talal
BACKGROUND & AIMS: Characterization of inflammatory mediators, such as chemokines, during acute hepatitis C virus (HCV) infection might shed some light on viral clearance mechanisms. METHODS: Plasma levels of CXCR3 (CXCL9-11)- and CCR5 (CCL3-4)-associated chemokines, ALT, and HCV RNA were measured in nine injection drug users (median 26 samples/patient) before and during 10 acute (eight primary and two secondary) HCV infections. Using functional data analysis, we estimated smooth long-term trends in chemokine expression levels to obtain the magnitude and timing of overall changes. Residuals were analyzed to characterize short-term fluctuations. RESULTS: CXCL9-11 induction began 38-53 days and peaked 72-83 days after virus acquisition. Increases in ALT levels followed a similar pattern. Substantial negative auto-correlations of chemokine levels at 1-week lags suggested substantial week-to-week oscillations. Significant correlations were observed between CXCL10 and HCV RNA as well as ALT and CXCR3-associated chemokines measured in the preceding week. CCL3-4 expression levels did not change appreciably during acute HCV infection. CONCLUSIONS: Elevation of CXCR3-associated chemokines late during acute HCV infection suggests a role for cellular immune responses in chemokine induction. Week-to-week oscillations of HCV RNA, chemokines, and ALT suggest frequent, repeated cycles of gain and loss of immune control during acute hepatitis C.
Methods in Ecology and Evolution | 2016
Brittany J. Teller; Peter B. Adler; Collin B. Edwards; Giles Hooker; Stephen P. Ellner
knowledge discovery and data mining | 2004
Giles Hooker
Many automated learning procedures lack interpretability, operating effectively as a black box: providing a prediction tool but no explanation of the underlying dynamics that drive it. A common approach to interpretation is to plot the dependence of a learned function on one or two predictors. We present a method that seeks not to display the behavior of a function, but to evaluate the importance of non-additive interactions within any set of variables. Should the function be close to a sum of low-dimensional components, these components can be viewed and even modeled parametrically. Alternatively, the work here provides an indication of where intrinsically high-dimensional behavior takes place. The calculations used in this paper correspond closely with the functional ANOVA decomposition, a well-developed construction in statistics. In particular, the proposed score of interaction importance measures the loss associated with the projection of the prediction function onto a space of additive models. The algorithm runs in linear time and we present displays of the output as a graphical model of the function for interpretation purposes.
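The idea of scoring an interaction by the loss incurred when projecting a function onto additive models can be illustrated for two variables. This is a minimal sketch under simplifying assumptions, not the paper's algorithm: the function is evaluated on a full product grid with uniform weights, so the additive projection is simply row means plus column means minus the grand mean. All function names here are hypothetical.

```python
def evaluate_on_grid(func, xs, ys):
    """Tabulate func(x, y) on the product grid xs x ys."""
    return [[func(x, y) for y in ys] for x in xs]


def additive_projection_loss(table):
    """Mean squared residual after projecting a two-way table onto additive fits.

    With uniform weights on a full grid, the closest additive function is
    row_mean[i] + col_mean[j] - grand_mean; the residual is the interaction.
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [
        sum(table[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]
    squared_error = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            fitted = row_means[i] + col_means[j] - grand
            squared_error += (table[i][j] - fitted) ** 2
    return squared_error / (n_rows * n_cols)


grid = [i / 10 for i in range(11)]

# A purely additive function: the projection loses nothing.
additive_score = additive_projection_loss(
    evaluate_on_grid(lambda x, y: x ** 2 + 3 * y, grid, grid)
)

# A product term: the projection cannot capture it, so the loss is positive.
interaction_score = additive_projection_loss(
    evaluate_on_grid(lambda x, y: x * y, grid, grid)
)
print(additive_score, interaction_score)
```

A score near zero suggests the pair can be modeled additively, while a clearly positive score flags a genuine interaction. For a learned black-box function the grid evaluation would be replaced by evaluations of the fitted predictor.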