
Publication


Featured research published by Dario Gasbarra.


Journal of the American Statistical Association | 1996

Bayesian Inference of Survival Probabilities, under Stochastic Ordering Constraints

Elja Arjas; Dario Gasbarra

In the statistical analysis of survival data arising from two populations, it often happens that the analyst knows, a priori, that the life lengths in one population are stochastically shorter than those in the other. Nevertheless, survival probability estimates, if determined separately from the corresponding samples, may not be consistent with this prior assumption, because of inherent statistical variability in the observations. This problem has been considered in a number of papers during the past decade, by adopting a (generalized) maximum likelihood approach. Our approach is Bayesian and, in essence, nonparametric. The a priori assumption regarding stochastic ordering is formulated naturally in terms of a joint prior distribution defined for pairs of survival functions. Nonparametric specification of the model, based on hazard rates and using a few hyperparameters, allows for sufficient flexibility in practical applications. The numerical computations are based on a coupled version of the M...
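One way to see how a hazard-based joint prior can encode the ordering: if the hazard rate of the first population is at least that of the second at every time, the cumulative hazards are ordered, so S1(t) ≤ S2(t) for all t and lifetimes in the first population are stochastically shorter. The sketch below assumes piecewise-constant hazards and a hypothetical prior in which h1 = h2 + d, with independent Gamma draws for h2 and the increment d. It is an illustration only, not the prior used in the paper; hazard-rate ordering is stronger than stochastic ordering, so such a prior covers only part of the ordered pairs.

```python
import math
import random

def survival_from_hazards(hazards, widths):
    """Survival at the right end of each interval for a piecewise-constant
    hazard: S(t_k) = exp(-sum_{i<=k} h_i * w_i)."""
    surv, cum = [], 0.0
    for h, w in zip(hazards, widths):
        cum += h * w
        surv.append(math.exp(-cum))
    return surv

def draw_ordered_pair(n_intervals, rng, shape=2.0, scale=0.25):
    """Hypothetical joint prior: baseline hazard h2 and a positive increment d
    are independent Gamma draws, and h1 = h2 + d. Because h1 >= h2 on every
    interval, the implied survival functions satisfy S1(t) <= S2(t)."""
    h2 = [rng.gammavariate(shape, scale) for _ in range(n_intervals)]
    d = [rng.gammavariate(shape, scale) for _ in range(n_intervals)]
    h1 = [a + b for a, b in zip(h2, d)]
    return h1, h2

rng = random.Random(1)
widths = [1.0] * 10
h1, h2 = draw_ordered_pair(len(widths), rng)
s1 = survival_from_hazards(h1, widths)
s2 = survival_from_hazards(h2, widths)
print(all(a <= b for a, b in zip(s1, s2)))  # True
```

Every prior draw produced this way respects the ordering by construction, so no rejection step is needed when sampling from the prior.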


Lifetime Data Analysis | 2002

Testing equality of cause-specific hazard rates corresponding to m competing risks among K groups

Sangita Kulathinal; Dario Gasbarra

In this paper, a class of tests is developed for comparing the cause-specific hazard rates of m competing risks simultaneously in K (≥ 2) groups. The data available for a unit are the failure time of the unit along with the identifier of the risk claiming the failure. In practice, the failure time data are generally right censored. The tests are based on the difference between the weighted averages of the cause-specific hazard rates corresponding to each risk. No assumption regarding the dependence of the competing risks is made. It is shown that the proposed test statistic has an asymptotically chi-squared distribution. The proposed test is shown to be optimal for a specific type of local alternatives. The choice of weight function is also discussed. A simulation study is carried out using a multivariate Gumbel distribution to compare the optimal weight function with a proposed weight function that is intended for use in practice. Also, the proposed test is applied to real data on the termination of intrauterine device use.
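The tests compare cause-specific hazard rates estimated from right-censored data. A standard nonparametric building block for such comparisons is the Nelson–Aalen estimator of each cumulative cause-specific hazard, sketched below. The function name and the convention that cause 0 marks a censored unit are assumptions for illustration; the sketch does not reproduce the paper's weighted test statistic.

```python
from collections import Counter

def cause_specific_nelson_aalen(times, causes, cause):
    """Nelson-Aalen estimate of the cumulative cause-specific hazard of
    `cause` from right-censored competing-risks data.

    times  : observed times (failure or censoring)
    causes : risk identifier per unit; 0 means right censored
    Returns (t, A_hat(t)) at each distinct failure time of `cause`.
    """
    events = Counter(t for t, c in zip(times, causes) if c == cause)
    out, cum = [], 0.0
    for t in sorted(set(times)):
        at_risk = sum(1 for s in times if s >= t)  # units still observed at t
        if events[t]:
            cum += events[t] / at_risk            # jump d_j(t) / Y(t)
            out.append((t, cum))
    return out

times = [2, 3, 3, 5, 7, 8]
causes = [1, 2, 1, 0, 1, 2]
print(cause_specific_nelson_aalen(times, causes, cause=1))
```

Failures from the other risks only shrink the risk set, which is why no assumption about dependence between risks is needed to estimate each cause-specific hazard.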


Computational Statistics & Data Analysis | 2009

Optimal designs to select individuals for genotyping conditional on observed binary or survival outcomes and non-genetic covariates

Juha Karvanen; Sangita Kulathinal; Dario Gasbarra

In gene-disease association studies, the cost of genotyping makes it economical to use a two-stage design in which only a subset of the cohort is genotyped. At the first stage, follow-up data along with some risk factors or non-genetic covariates are collected for the cohort, and a subset of the cohort is then selected for genotyping at the second stage. Intuitively, the selection of the subset for the second stage could be carried out efficiently if the data collected at the first stage are utilized. For efficient selection of the subset, the information contained in the conditional probability of the genotype, given the first-stage data and initial estimates of the parameters of interest, is maximized. The proposed selection method is illustrated using logistic regression and Cox's proportional hazards model, and algorithms that can find optimal or nearly optimal designs in a discrete design space are presented. Simulation comparisons between D-optimal design, extreme selection and case-cohort design suggest that D-optimal design is the most efficient in terms of the variance of the estimated parameters, but extreme selection may be a good alternative for practical study design.
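To make the D-optimality criterion concrete, here is a deliberately simplified sketch: it treats each candidate's covariate vector as known and greedily picks the rows that maximize the log-determinant of the logistic-regression Fisher information, with weights p(1 − p) evaluated at an initial estimate β. The paper's criterion instead works with the conditional genotype distribution given the first-stage data; the function names, the greedy search and the small ridge term are assumptions made for this illustration.

```python
import math

def log_det(m):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return float("-inf")  # not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))

def logistic_weight(x, beta):
    """Fisher-information weight p(1 - p) of one row under a logistic model."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    p = 1.0 / (1.0 + math.exp(-eta))
    return p * (1.0 - p)

def greedy_d_optimal(X, beta, k, ridge=1e-6):
    """Greedily choose k row indices of X maximizing log det of
    sum_i w_i x_i x_i^T (local D-optimality at the initial estimate beta).
    A tiny ridge keeps the determinant defined before p rows are chosen."""
    p = len(X[0])
    w = [logistic_weight(x, beta) for x in X]
    info = [[ridge if a == b else 0.0 for b in range(p)] for a in range(p)]
    chosen = []
    for _ in range(k):
        best, best_val, best_info = None, float("-inf"), None
        for i, x in enumerate(X):
            if i in chosen:
                continue
            cand = [[info[a][b] + w[i] * x[a] * x[b] for b in range(p)]
                    for a in range(p)]
            val = log_det(cand)
            if val > best_val:
                best, best_val, best_info = i, val, cand
        chosen.append(best)
        info = best_info
    return chosen

# Intercept plus one first-stage covariate; beta = 0 gives equal weights.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
print(greedy_d_optimal(X, beta=[0.0, 0.0], k=2))  # [0, 4]
```

In this toy case the criterion picks the two most extreme covariate values, which echoes why extreme selection can be a reasonable practical alternative.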


Archive | 2006

Enlargement of Filtration and Additional Information in Pricing Models: Bayesian Approach

Dario Gasbarra; Esko Valkeila; Lioudmila Vostrikova

We show how the dynamical Bayesian approach can be used in the theory of initial enlargement of filtrations. We use this approach to obtain new proofs and results for Lévy processes. We apply the Bayesian approach to some problems concerning asymmetric information in pricing models, including the so-called weak information approach introduced by Baudoin, as well as some other approaches. We also give a Bayesian interpretation of the utility gain related to asymmetric information.


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2011

Estimating Haplotype Frequencies by Combining Data from Large DNA Pools with Database Information

Dario Gasbarra; Sangita Kulathinal; Matti Pirinen; Mikko J. Sillanpää

We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material of up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the performance of the proposed method is similar to that of an EM algorithm, which uses a multinormal approximation for the pooled allele frequencies but does not utilize prior information about the haplotypes. The method has been implemented in Matlab, and the code is available upon request from the authors.
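A continuous approximation of pooled allele frequencies needs their mean and covariance. If a pool of n diploid individuals holds m = 2n haplotypes drawn as Multinomial(m, f) from haplotype frequencies f, the pooled frequency of allele 1 at locus j has mean p_j = Σ_h f_h a_hj and covariance (Σ_h f_h a_hj a_hk − p_j p_k) / m between loci j and k. The sketch below computes these moments; independent sampling of haplotypes and the function name are assumptions, and this is not the paper's full model.

```python
def pooled_allele_moments(haplotypes, freqs, n_individuals):
    """Mean and covariance of pooled allele frequencies.

    haplotypes    : list of 0/1 tuples, 1 = the counted allele at each locus
    freqs         : haplotype frequencies (summing to 1)
    n_individuals : diploid pool size, so the pool holds m = 2 * n haplotypes

    Haplotype counts in the pool are assumed Multinomial(m, freqs); the pooled
    allele frequency at locus j is the share of the m haplotypes carrying 1.
    """
    m = 2 * n_individuals
    n_loci = len(haplotypes[0])
    mean = [sum(f * h[j] for h, f in zip(haplotypes, freqs))
            for j in range(n_loci)]
    cov = [[(sum(f * h[j] * h[k] for h, f in zip(haplotypes, freqs))
             - mean[j] * mean[k]) / m
            for k in range(n_loci)] for j in range(n_loci)]
    return mean, cov

haps = [(0, 0), (1, 0), (1, 1)]
mean, cov = pooled_allele_moments(haps, [0.5, 0.3, 0.2], n_individuals=10)
```

A Gaussian with these moments is one way to replace the discrete multinomial likelihood with a continuous one that stays tractable for large pools.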


Scandinavian Journal of Statistics | 2000

Analysis of Competing Risks by Using Bayesian Smoothing

Dario Gasbarra; S. R. Karia

We consider the competing risks set-up. In many practical situations, the conditional probability of the cause of failure given the failure time is of direct interest. We propose to model the competing risks by the overall hazard rate and the conditional probabilities rather than the cause-specific hazards. We adopt a Bayesian smoothing approach for both quantities of interest. Illustrations are given at the end.
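The two parametrizations are linked by h_j(t) = h(t) π_j(t), where h is the overall hazard and π_j(t) the conditional probability of cause j given failure at t. The sketch below assumes both are piecewise constant (an assumption for illustration; the paper uses Bayesian smoothing) and accumulates each cause's cumulative incidence: on an interval of length w that starts with survival S, cause j gains π_j · S · (1 − exp(−h w)).

```python
import math

def cumulative_incidence(overall_hazard, cause_probs, widths):
    """Cumulative incidence of each cause for piecewise-constant inputs.

    overall_hazard : overall hazard h(t) on each interval
    cause_probs    : per interval, P(cause j | failure at t); rows sum to 1
    widths         : interval lengths
    Returns (incidence per cause, overall survival at the end).
    """
    n_causes = len(cause_probs[0])
    surv, cif = 1.0, [0.0] * n_causes
    for h, probs, w in zip(overall_hazard, cause_probs, widths):
        died = surv * (1.0 - math.exp(-h * w))  # failures within the interval
        for j, pj in enumerate(probs):
            cif[j] += pj * died                 # cause-specific hazard h * pj
        surv -= died
    return cif, surv

cif, surv = cumulative_incidence([0.2, 0.5], [[0.3, 0.7], [0.6, 0.4]], [1.0, 2.0])
```

Because the conditional probabilities sum to one, the incidences and the remaining survival always add up to one, which is a convenient check on any smoothed estimate.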


BMC Bioinformatics | 2007

Estimating genealogies from linked marker data: a Bayesian approach

Dario Gasbarra; Matti Pirinen; Mikko J. Sillanpää; Elja Arjas

Background: Answers to several fundamental questions in statistical genetics would ideally require knowledge of the ancestral pedigree and of the gene flow therein. A few examples of such questions are haplotype estimation, relatedness and relationship estimation, gene mapping by combining pedigree and linkage disequilibrium information, and estimation of population structure. Results: We present a probabilistic method for genealogy reconstruction. Starting with a group of genotyped individuals from some population isolate, we explore the state space of their possible ancestral histories under our Bayesian model by using Markov chain Monte Carlo (MCMC) sampling techniques. The main contribution of our work is the development of sampling algorithms in the resulting vast state space with highly dependent variables. The main drawback is the computational complexity that limits the time horizon within which explicit reconstructions can be carried out in practice. Conclusion: The estimates for IBD (identity-by-descent) and haplotype distributions are tested in several settings using simulated data. The results appear to be promising for a further development of the method.


Genetics Research | 2008

Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm.

Matti Pirinen; Sangita Kulathinal; Dario Gasbarra; Mikko J. Sillanpää

Recent studies show that the PHASE algorithm is a state-of-the-art method for population-based haplotyping from individually genotyped data. We present a modified version of PHASE for estimating population haplotype frequencies from pooled DNA data. The algorithm is compared with (i) maximum likelihood estimation under the multinomial model and (ii) a deterministic greedy algorithm, on both simulated and real data sets (HapMap data). Our results suggest that the PHASE algorithm is a method of choice for pooled DNA data as well. The main reason for the improvement over the other approaches is assumed to be the same as with individually genotyped data: the biologically motivated model of PHASE takes into account the correlated genealogical histories of the haplotypes by modelling mutations and recombinations. The important questions of the efficiency of DNA pooling and the influence of pool size on the accuracy of the estimates are also considered. Our results are in line with earlier findings in that the pool size should be relatively small, only 2-5 individuals in our examples, in order to provide reliable estimates of population haplotype frequencies.


Genetics | 2005

Constructing the Parental Linkage Phase and the Genetic Map Over Distances <1 cM Using Pooled Haploid DNA

Dario Gasbarra; Mikko J. Sillanpää

A new statistical approach for construction of the genetic linkage map and estimation of the parental linkage phase based on allele frequency data from pooled gametic (sperm or egg) samples is introduced. This method can be applied for estimation of recombination fractions (over distances <1 cM) and ordering of large numbers (even hundreds) of closely linked markers. This method should be extremely useful in species with a long generation interval and a large genome size, such as dairy cattle or forest trees; conifers have haploid tissue available in megagametophytes. According to Mendelian expectation, the two parental alleles should occur in gametes in 1:1 proportions if segregation distortion does not occur. However, due to mere sampling variation, the observed proportions may deviate from their expected value in practice. These deviations and their dependence along the chromosome can provide information on the parental linkage phase and on the genetic linkage map. The usefulness of the method is illustrated with simulations. The role of segregation distortion as a source of these deviations is also discussed. The software implementing this method is freely available for research purposes from the authors.
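Why the sampling deviations carry linkage information: for one gamete from a doubly heterozygous parent, let X1 and X2 indicate the allele inherited from the same parental chromosome at two loci with recombination fraction r. Then P(X1 = X2) = 1 − r and corr(X1, X2) = 1 − 2r, and pooling averages independent gametes without changing that correlation. Its sign reveals the phase (coupling or repulsion) and its size reveals r. The simulation and method-of-moments estimator below are a hypothetical illustration under no segregation distortion and error-free allele frequencies, not the authors' estimator.

```python
import random

def simulate_pools(r, coupling, n_pools, pool_size, rng):
    """Proportions of alleles A and B in pools of haploid gametes from one
    parent. In coupling, A and B sit on the same parental chromosome."""
    pa, pb = [], []
    for _ in range(n_pools):
        count_a = count_b = 0
        for _ in range(pool_size):
            first = rng.random() < 0.5     # chromosome 1 at locus 1 (carries A)
            recomb = rng.random() < r
            second = first != recomb       # chromosome 1 at locus 2?
            count_a += first
            count_b += second if coupling else (not second)
        pa.append(count_a / pool_size)
        pb.append(count_b / pool_size)
    return pa, pb

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def estimate_r_and_phase(pa, pb):
    """Moment estimator from corr = +/-(1 - 2r) of the pool deviations."""
    rho = correlation(pa, pb)
    return (1.0 - abs(rho)) / 2.0, ("coupling" if rho > 0 else "repulsion")

rng = random.Random(0)
pa, pb = simulate_pools(0.1, True, n_pools=2000, pool_size=50, rng=rng)
r_hat, phase = estimate_r_and_phase(pa, pb)
```

With a single pool the same information sits in how deviations co-vary along many ordered markers, which is the setting the abstract targets.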


Genetics | 2009

Bayesian Quantitative Trait Locus Mapping Based on Reconstruction of Recent Genetic Histories

Dario Gasbarra; Matti Pirinen; Mikko J. Sillanpää; Elja Arjas

We assume that quantitative measurements on a considered trait and unphased genotype data at certain marker loci are available on a sample of individuals from a background population. Our goal is to map quantitative trait loci by using a Bayesian model that performs, and makes use of, probabilistic reconstructions of the recent unobserved genealogical history (a pedigree and a gene flow at the marker loci) of the sampled individuals. This work extends variance component-based linkage analysis to settings where the unobserved pedigrees are considered as latent variables. In addition to the measured trait values and unphased genotype data at the marker loci, the method requires as an input estimates of the population allele frequencies and of a marker map, as well as some parameters related to the population size and the mating behavior. Given such data, the posterior distribution of the trait parameters (the number, the locations, and the relative variance contributions of the trait loci) is studied by using the reversible-jump Markov chain Monte Carlo methodology. We also introduce two shortcuts related to the trait parameters that allow us to do analytic integration, instead of stochastic sampling, in some parts of the algorithm. The method is tested on two simulated data sets. Comparisons with traditional variance component linkage analysis and association analysis demonstrate the benefits of our approach in a gene mapping context.
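The variance-component linkage model this work extends can be written as y ~ N(μ1, Σ) with Σ_ik = σ_q² π_ik + σ_e² [i = k], where π_ik is the proportion of alleles individuals i and k share IBD at the trait locus (π_ii = 1). In the abstract's approach such IBD sharing comes from sampled genealogies; the sketch below simply takes the matrix as input, omits any polygenic term, and uses hypothetical function names. It assumes Σ is positive definite.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T (a assumed positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def vc_loglik(y, mu, ibd, sigma2_q, sigma2_e):
    """Gaussian log-likelihood under
    Cov(y_i, y_k) = sigma2_q * ibd[i][k] + sigma2_e * [i == k]."""
    n = len(y)
    cov = [[sigma2_q * ibd[i][k] + (sigma2_e if i == k else 0.0)
            for k in range(n)] for i in range(n)]
    L = cholesky(cov)
    z = []  # forward substitution: L z = y - mu, so z.z is the quadratic form
    for i in range(n):
        s = (y[i] - mu) - sum(L[i][k] * z[k] for k in range(i))
        z.append(s / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + sum(v * v for v in z))

# A sibling pair sharing half their alleles IBD at the putative locus.
ll = vc_loglik([1.0, 0.0], 0.0, [[1.0, 0.5], [0.5, 1.0]], 1.0, 1.0)
```

Averaging such a likelihood over reconstructed genealogies, rather than plugging in a single IBD estimate, is the kind of latent-pedigree extension the abstract describes.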

Collaboration


Dive into Dario Gasbarra's collaborations.

Top Co-Authors

Elja Arjas

University of Helsinki

Jia Liu

University of Jyväskylä

Aki Vehtari

Helsinki Institute for Information Technology