Featured Research

Applications

A recipe for accurate estimation of lifespan brain trajectories, distinguishing longitudinal and cohort effects

We address the problem of estimating how different parts of the brain develop and change throughout the lifespan, and how these trajectories are affected by genetic and environmental factors. Estimation of these lifespan trajectories is statistically challenging, since their shapes are typically highly nonlinear, and although true change can only be quantified by longitudinal examinations, follow-up intervals in neuroimaging studies typically cover less than 10% of the lifespan, making use of cross-sectional information necessary. Linear mixed models (LMMs) and structural equation models (SEMs) commonly used in longitudinal analysis rely on assumptions which are typically not met with lifespan data, in particular when the data consist of observations combined from multiple studies. While LMMs require a priori specification of a polynomial functional form, SEMs do not easily handle data with unstructured time intervals between measurements. Generalized additive mixed models (GAMMs) offer an attractive alternative, and in this paper we propose various ways of formulating GAMMs for estimation of lifespan trajectories of 12 brain regions, using a large longitudinal dataset and realistic simulation experiments. We show that GAMMs are able to more accurately fit lifespan trajectories, distinguish longitudinal and cross-sectional effects, and estimate effects of genetic and environmental exposures. Finally, we discuss and contrast questions related to lifespan research which strictly require repeated-measures data and questions which can be answered with a single measurement per participant, and, in the latter case, which simplifying assumptions need to be made. The examples are accompanied by R code, providing a tutorial for researchers interested in using GAMMs.
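
One way such a GAMM can be formulated, shown here only as a generic illustration (the split into a baseline-age term and a time-since-baseline term, and the random intercept, are our assumptions rather than the authors' exact specification), is

\[
y_{ij} = \beta_0 + s_1(a_i) + s_2(t_{ij}) + b_i + \varepsilon_{ij}, \qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2),
\]

where y_ij is the volume of a brain region for participant i at occasion j, a_i is age at baseline (capturing cross-sectional, or cohort, variation), t_ij is time since baseline (capturing longitudinal change), s_1 and s_2 are smooth functions estimated with penalized splines, and b_i is a participant-level random intercept.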

Applications

A review of Bayesian perspectives on sample size derivation for confirmatory trials

Sample size derivation is a crucial element of the planning phase of any confirmatory trial. A sample size is typically derived based on constraints on the maximal acceptable type I error rate and a minimal desired power. Here, power depends on the unknown true effect size. In practice, power is typically calculated either for the smallest relevant effect size or for a likely point alternative. The former might be problematic if the minimal relevant effect is close to the null, thus requiring an excessively large sample size. The latter is dubious since it does not account for the a priori uncertainty about the likely alternative effect size. A Bayesian perspective on sample size derivation for a frequentist trial naturally emerges as a way of reconciling arguments about the relative a priori plausibility of alternative effect sizes with ideas based on the relevance of effect sizes. Many suggestions as to how such 'hybrid' approaches could be implemented in practice have been put forward in the literature. However, key quantities such as assurance, probability of success, or expected power are often defined in subtly different ways. Starting from the traditional, entirely frequentist approach to sample size derivation, we derive consistent definitions for the most commonly used 'hybrid' quantities and highlight connections, before discussing and demonstrating their use in the context of sample size derivation for clinical trials.
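
As a minimal sketch of how one such 'hybrid' quantity can be computed (this is our own illustration, not the paper's notation: it assumes a two-arm trial analysed with a one-sided two-sample z-test and a normal prior on the standardized effect size), expected power, often called assurance or probability of success, is simply the frequentist power curve averaged over the prior:

# Minimal sketch (not from the paper): assurance / expected power for a
# two-arm trial with a one-sided z-test, obtained by averaging the usual
# frequentist power curve over a normal prior on the standardized effect size.
# All numbers below are illustrative assumptions.
import numpy as np
from scipy import stats

alpha = 0.025                      # one-sided type I error rate
n_per_arm = 100                    # candidate sample size per arm
prior_mean, prior_sd = 0.3, 0.1    # assumed prior on the standardized effect

def power(theta, n):
    """Frequentist power of a one-sided two-sample z-test at effect size theta."""
    ncp = theta * np.sqrt(n / 2.0)                      # non-centrality parameter
    return stats.norm.sf(stats.norm.ppf(1 - alpha) - ncp)

# Assurance / expected power: integrate power over the prior by quadrature.
grid = np.linspace(prior_mean - 6 * prior_sd, prior_mean + 6 * prior_sd, 2001)
weights = stats.norm.pdf(grid, prior_mean, prior_sd)
assurance = np.trapz(power(grid, n_per_arm) * weights, grid)

print(f"power at prior mean:        {power(prior_mean, n_per_arm):.3f}")
print(f"assurance (expected power): {assurance:.3f}")

Because the prior places some mass on effect sizes near the null, the assurance is typically lower than the power evaluated at the point alternative, which is one reason the two quantities should not be conflated.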

Applications

A robust and non-parametric model for prediction of dengue incidence

Disease surveillance is essential not only for the prior detection of outbreaks but also for monitoring trends of the disease in the long run. In this paper, we aim to build a tactical model for the surveillance of dengue, in particular. Most existing models for dengue prediction exploit its known relationships between climate and socio-demographic factors with the incidence counts, however they are not flexible enough to capture the steep and sudden rise and fall of the incidence counts. This has been the motivation for the methodology used in our paper. We build a non-parametric, flexible, Gaussian Process (GP) regression model that relies on past dengue incidence counts and climate covariates, and show that the GP model performs accurately, in comparison with the other existing methodologies, thus proving to be a good tactical and robust model for health authorities to plan their course of action.
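
For readers unfamiliar with GP regression in this setting, the sketch below shows the general idea of predicting incidence from lagged counts and climate covariates. The synthetic data, the lag structure, and the kernel are our own assumptions and are not taken from the paper:

# Minimal sketch (our own illustration, not the paper's exact model): a GP
# regression that predicts monthly dengue counts from lagged incidence and
# climate covariates. Data, lags, and kernel choice here are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

rng = np.random.default_rng(0)
T = 120                                  # 10 years of monthly data (synthetic)
rain = rng.gamma(2.0, 50.0, size=T)      # hypothetical rainfall covariate
temp = 25 + 3 * np.sin(2 * np.pi * np.arange(T) / 12) + rng.normal(0, 1, T)
cases = rng.poisson(50 + 0.3 * rain + 2 * np.maximum(temp - 26, 0))

# Features: incidence at lags 1-3 plus current climate covariates.
X = np.column_stack([cases[2:-1], cases[1:-2], cases[:-3], rain[3:], temp[3:]])
y = np.log1p(cases[3:])                  # log counts stabilise the variance

kernel = ConstantKernel() * RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X[:-12], y[:-12])

mean, sd = gp.predict(X[-12:], return_std=True)   # hold out the last year
print("predicted counts:", np.expm1(mean).round(1))
print("predictive sd (log scale):", sd.round(2))

The posterior standard deviation returned alongside the mean is what makes a GP attractive for surveillance: it gives health authorities an uncertainty band around the forecast rather than a single number.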

Applications

A robust nonlinear mixed-effects model for COVID-19 deaths data

The analysis of complex longitudinal data such as COVID-19 deaths is challenging due to several inherent features: (i) similarly-shaped profiles with different decay patterns; (ii) unexplained variation among repeated measurements within each country, where these repeated measurements may be viewed as clustered data since they are taken on the same country at roughly the same time; (iii) skewness, outliers, or skew-heavy-tailed noise possibly embodied within the response variables. This article formulates a robust nonlinear mixed-effects model based on the class of scale mixtures of skew-normal distributions for modeling COVID-19 deaths, which allows analysts to model such data in the presence of all the above-described features simultaneously. An efficient EM-type algorithm is proposed to carry out maximum likelihood estimation of the model parameters. The bootstrap method is used to determine inherent characteristics of the nonlinear individual profiles, such as confidence intervals for the predicted deaths and fitted curves. The target is to model COVID-19 death curves from some Latin American countries, since this region is the new epicenter of the disease. Moreover, since a mixed-effects framework borrows information from the population-average effects, in our analysis we include some countries from Europe and North America that are in a more advanced stage of their COVID-19 death curves.
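
In generic notation, a nonlinear mixed-effects model of this kind can be written, for example with a three-parameter logistic mean curve (the specific nonlinear function is our illustrative assumption, not necessarily the one used in the article), as

\[
y_{ij} = \frac{\alpha_1 + b_{i1}}{1 + \exp\{-(t_{ij} - (\alpha_2 + b_{i2}))/(\alpha_3 + b_{i3})\}} + \varepsilon_{ij}, \qquad \mathbf{b}_i \sim N(\mathbf{0}, \mathbf{D}), \quad \varepsilon_{ij} \sim \mathrm{SMSN}(0, \sigma^2, \lambda; H),
\]

where y_ij is the (cumulative) death count for country i at time t_ij, the fixed effects alpha_1, alpha_2, alpha_3 describe the population-average asymptote, inflection point, and growth rate, b_i collects country-specific random effects with covariance matrix D, and the scale-mixture-of-skew-normal (SMSN) error term, with skewness parameter lambda and mixing distribution H, accommodates skewness and heavy tails; skew-t and skew-slash errors arise as special cases.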

Applications

A safety factor approach to designing urban infrastructure for dynamic conditions

Current approaches to designing flood-sensitive infrastructure typically assume a stationary rainfall distribution and neglect many uncertainties. These assumptions are inconsistent with observations that suggest intensifying extreme precipitation events and with the uncertainties surrounding projections of the coupled natural-human systems. Here we show that assuming climate stationarity and neglecting deep uncertainties can drastically underestimate flood risks and lead to poor infrastructure design choices. We find that climate uncertainty dominates the socioeconomic and engineering uncertainties that impact the hydraulic reliability of stormwater drainage systems. We quantify the upfront costs needed to achieve higher hydraulic reliability and robustness against the deep uncertainties surrounding projections of rainfall, surface runoff characteristics, and infrastructure lifetime. Depending on the location, we find that adding safety factors of 1.4 to 1.7 to the standard stormwater pipe design guidance produces performance that is robust to the considered deep uncertainties.
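
To make the role of the safety factor concrete, the toy calculation below sizes a circular storm sewer with the rational method and Manning's equation and then inflates the design flow by the factors reported above. The input values, and the use of these particular textbook formulas, are our own illustrative assumptions rather than the paper's analysis:

# Toy illustration (not the paper's analysis): size a circular storm sewer with
# the rational method and Manning's equation, then apply a safety factor to the
# design flow. All input values are hypothetical.
import math

C = 0.7                 # runoff coefficient (impervious urban catchment)
i_mm_per_hr = 80.0      # design rainfall intensity from a stationary IDF curve
A_ha = 10.0             # drainage area in hectares
n = 0.013               # Manning roughness for concrete pipe
S = 0.005               # pipe slope (m/m)

# Rational method: Q [m^3/s] = C * i * A, with i in m/s and A in m^2.
Q_design = C * (i_mm_per_hr / 1000 / 3600) * (A_ha * 1e4)

def full_pipe_capacity(D):
    """Manning full-flow capacity of a circular pipe of diameter D (m)."""
    area = math.pi * D**2 / 4
    hydraulic_radius = D / 4
    return (1 / n) * area * hydraulic_radius ** (2 / 3) * math.sqrt(S)

def required_diameter(Q):
    """Smallest diameter (75 mm increments) whose full-flow capacity carries Q."""
    D = 0.3
    while full_pipe_capacity(D) < Q:
        D += 0.075
    return D

for sf in (1.0, 1.4, 1.7):
    D = required_diameter(sf * Q_design)
    print(f"safety factor {sf:.1f}: design flow {sf * Q_design:.3f} m^3/s -> D = {D * 1000:.0f} mm")

In this toy setting the safety factor translates directly into a larger required pipe diameter, which is the kind of upfront-cost trade-off the study quantifies against improved hydraulic reliability.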

Applications

A sampling algorithm to compute the set of feasible solutions for non-negative matrix factorization with an arbitrary rank

Non-negative Matrix Factorization (NMF) is a useful method to extract features from multivariate data, but an important and sometimes neglected concern is that NMF can result in non-unique solutions. Often, there exists a set of feasible solutions (SFS), which makes it more difficult to interpret the factorization. This problem is largely ignored in cancer genomics, where NMF is used to infer information about the mutational processes present in the evolution of cancer. In this paper the extent of non-uniqueness is investigated for two mutational count datasets, and a new sampling algorithm that can find the SFS is introduced. Our sampling algorithm is easy to implement and applies to an arbitrary rank of NMF. This is in contrast to the state of the art, where the NMF rank must be smaller than or equal to four. For lower ranks we show that our algorithm performs similarly to the polygon inflation algorithm developed in relation to chemometrics. Furthermore, we show how the size of the SFS can strongly influence the apparent variability of a solution. Our sampling algorithm is implemented in the R package SFS (this https URL).
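
The non-uniqueness at the heart of the paper can be demonstrated in a few lines of code. The snippet below is our own minimal illustration of the ambiguity, not the proposed sampling algorithm or the SFS package:

# Minimal numpy sketch of why NMF solutions can be non-unique: given V = W H
# with W, H >= 0, any invertible T with W T^{-1} >= 0 and T H >= 0 yields an
# equally valid factor pair that reconstructs V exactly.
import numpy as np

rng = np.random.default_rng(1)
W = rng.uniform(0.5, 2.0, size=(6, 2))   # strictly positive factors (rank 2)
H = rng.uniform(0.5, 2.0, size=(2, 8))
V = W @ H                                # the observed non-negative data matrix

# A shear transformation that keeps both factors non-negative: the admissible
# epsilon is limited by the elementwise ratios of W's columns.
eps = 0.9 * np.min(W[:, 1] / W[:, 0])
T = np.array([[1.0, eps], [0.0, 1.0]])
W2, H2 = W @ np.linalg.inv(T), T @ H

assert np.all(W2 >= 0) and np.all(H2 >= 0)   # still a feasible solution
assert np.allclose(W2 @ H2, V)               # identical reconstruction
print("max difference between the two W factors:", np.abs(W - W2).max())

Because infinitely many such transformations can exist, characterising the full set of feasible solutions, rather than reporting a single factorization, is what the proposed sampling algorithm aims to do.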

Applications

A Scala library for spatial sensitivity analysis

The sensitivity analysis and validation of simulation models require specific approaches in the case of spatial models. We describe the spatialdata Scala library, which provides such tools, including synthetic generators for urban configurations at different scales, spatial networks, and spatial point processes. These can be used to parametrize geosimulation models on synthetic configurations and to evaluate the sensitivity of model outcomes to the spatial configuration. The library also includes methods to perturb real data, as well as spatial statistics, urban form, and network indicators. It is embedded in the OpenMOLE platform for model exploration, fostering the application of such methods without technical constraints.
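
Purely as a conceptual illustration of spatial sensitivity analysis (written in Python and not using the spatialdata API, whose functions we do not reproduce here), the sketch below generates synthetic point configurations with varying degrees of clustering and compares a toy model outcome across them:

# Conceptual illustration only: run a toy geosimulation model on synthetic
# point configurations with varying clustering and compare an output indicator
# across configurations. This is not the spatialdata Scala API.
import numpy as np

rng = np.random.default_rng(2)

def synthetic_city(n_points, n_clusters, spread):
    """Generate a synthetic facility configuration as a clustered point process."""
    centres = rng.uniform(0, 10, size=(n_clusters, 2))
    offsets = rng.normal(0, spread, size=(n_points, 2))
    return centres[rng.integers(0, n_clusters, n_points)] + offsets

def mean_access(points, residents):
    """Toy model outcome: mean distance from residents to their nearest facility."""
    d = np.linalg.norm(residents[:, None, :] - points[None, :, :], axis=2)
    return d.min(axis=1).mean()

residents = rng.uniform(0, 10, size=(500, 2))
for spread in (0.1, 0.5, 2.0):           # vary the spatial configuration
    runs = [mean_access(synthetic_city(100, 5, spread), residents) for _ in range(20)]
    print(f"cluster spread {spread}: mean access {np.mean(runs):.2f} +/- {np.std(runs):.2f}")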

Applications

A spatial Poisson hurdle model with application to wildfires

Modelling wildfire occurrences is important for disaster management, including prevention, detection, and suppression of large catastrophic events. We present a spatial Poisson hurdle model for exploring geographical variation in monthly counts of wildfire occurrences and apply it to Indonesia and Australia. The model includes two a priori independent, spatially structured latent effects that account for residual spatial variation in the probability of wildfire occurrence and in the positive count rate given an occurrence. Inference is provided by empirical Bayes using the Laplace approximation to the marginal posterior, which provides fast inference for latent Gaussian models with sparse structures. In both cases, our model matched several empirically known facts about wildfires. We find elevation, percentage tree cover, relative humidity, surface temperature, and the interaction between humidity and temperature to be important predictors of monthly counts of wildfire occurrences. Further, our findings show opposing effects for surface temperature and its interaction with relative humidity.
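
In generic notation, a spatial Poisson hurdle model of this kind combines a Bernoulli occurrence component with a zero-truncated Poisson count component (the exact parameterization and link functions in the paper may differ):

\[
P(Y_s = 0) = 1 - p_s, \qquad
P(Y_s = y) = p_s \, \frac{\lambda_s^{y} e^{-\lambda_s}}{y!\,\left(1 - e^{-\lambda_s}\right)}, \quad y = 1, 2, \ldots,
\]

\[
\operatorname{logit}(p_s) = \mathbf{x}_s^{\top}\boldsymbol{\beta} + u_s, \qquad
\log(\lambda_s) = \mathbf{z}_s^{\top}\boldsymbol{\gamma} + v_s,
\]

where Y_s is the monthly wildfire count at location s, the second component is a zero-truncated Poisson, x_s and z_s collect covariates such as elevation, tree cover, humidity, and temperature, and u_s and v_s are the two a priori independent spatially structured latent effects.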

Applications

A statistical method for estimating the no-observed-adverse-event-level

In toxicological risk assessment the benchmark dose (BMD) is recommended instead of the no-observed-adverse-effect level (NOAEL). Still, a simple test procedure to estimate the NOAEL is proposed here, and its advantages and disadvantages are explained. Its versatile applicability is illustrated using four different data examples from selected in vivo toxicity bioassays.
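
The abstract does not spell out the proposed procedure, so the sketch below only illustrates a generic NOAEL-style decision rule: many-to-one comparisons against the control with a Bonferroni adjustment, with the NOAEL taken as the highest dose below the lowest significantly affected dose. All data and analysis choices here are hypothetical:

# Illustrative sketch only (not the paper's test procedure): compare each dose
# group to the control with Welch t-tests and a Bonferroni adjustment, then
# report the highest dose below the lowest significant dose as the NOAEL.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
doses = [0, 10, 50, 250]                       # hypothetical dose groups
data = {0:   rng.normal(1.00, 0.2, 10),        # control
        10:  rng.normal(1.02, 0.2, 10),
        50:  rng.normal(1.10, 0.2, 10),
        250: rng.normal(1.60, 0.2, 10)}        # clear adverse effect

alpha = 0.05
m = len(doses) - 1                             # number of comparisons (Bonferroni)
loael = None                                   # lowest dose with a significant effect
for dose in doses[1:]:
    p = stats.ttest_ind(data[dose], data[0], equal_var=False,
                        alternative="greater").pvalue
    print(f"dose {dose}: p = {p:.4f}, significant at {alpha/m:.4f}: {p < alpha / m}")
    if loael is None and p < alpha / m:
        loael = dose

noael = doses[doses.index(loael) - 1] if loael is not None else doses[-1]
print("LOAEL:", loael, " NOAEL:", noael)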

Applications

A study on the information behavior of scholars in article keyword selection

This project takes the factors underlying keyword selection behavior as its research object. Qualitative analysis methods such as interviews and grounded theory were used to construct a causal influence path model. Combined with computer simulation techniques, a multi-agent simulation experiment was used to study the factors of keyword selection across two dimensions, from the individual to the group. The research was carried out along the path of factor analysis at the individual level, macro-level situation simulation, and optimization of scientific research data management. Based on the aforementioned review of existing research and explanations of keyword selection, this study adopts a qualitative research design to expand the explanation, followed by a macro-level simulation based on the results of the qualitative research. There are two steps in this study: first, interviews with authors; second, the design of a macro simulation according to the deductive and qualitative content analysis results.

