Samiran Sinha
Texas A&M University
Publications
Featured research published by Samiran Sinha.
Journal of the American Statistical Association | 2005
Samiran Sinha; Bhramar Mukherjee; Malay Ghosh; Bani K. Mallick; Raymond J. Carroll
This article considers Bayesian analysis of matched case-control problems when one of the covariates is partially missing. Within the likelihood context, the standard approach to this problem is to posit a fully parametric model among the controls for the partially missing covariate as a function of the covariates in the model and the variables making up the strata. Sometimes the strata effects are ignored at this stage. Our approach differs not only in that it is Bayesian, but, far more importantly, in the manner in which it treats the strata effects. We assume a Dirichlet process prior with a normal base measure for the stratum effects and estimate all of the parameters in a Bayesian framework. Three matched case-control examples and a simulation study are considered to illustrate our methods and the computing scheme.
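To make the Dirichlet process prior on stratum effects concrete, here is a minimal Python sketch. It takes a truncated stick-breaking draw of G ~ DP(alpha, N(mu, sd^2)) and then samples one effect per stratum from the resulting discrete G. The function names, the truncation level, and closing the stick at the last atom are illustrative assumptions, not details from the article. The sketch covers only the prior, not the full posterior computing scheme.

```python
import random


def stick_breaking_dp(alpha, base_mean, base_sd, n_atoms, rng):
    """Truncated stick-breaking draw of G ~ DP(alpha, N(base_mean, base_sd**2)).

    Returns (weights, atoms). The last stick fraction is set to 1 so the
    weights sum to one (a common truncation convention, assumed here).
    """
    weights, atoms = [], []
    remaining = 1.0  # length of the stick not yet allocated
    for k in range(n_atoms):
        v = 1.0 if k == n_atoms - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
        atoms.append(rng.gauss(base_mean, base_sd))  # atom from the normal base measure
    return weights, atoms


def draw_stratum_effects(n_strata, weights, atoms, rng):
    """Draw one effect per stratum from the discrete measure G."""
    return rng.choices(atoms, weights=weights, k=n_strata)


rng = random.Random(2005)
w, a = stick_breaking_dp(alpha=1.0, base_mean=0.0, base_sd=1.0, n_atoms=50, rng=rng)
effects = draw_stratum_effects(20, w, a, rng)
```

Because G is discrete, several strata usually share the same effect. This lets a DP prior cluster strata rather than forcing all stratum effects into a single normal distribution.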
Biometrics | 2010
Samiran Sinha; Bani K. Mallick; Victor Kipnis; Raymond J. Carroll
We propose a semiparametric Bayesian method for handling measurement error in nutritional epidemiological data. Our goal is to estimate nonparametrically the form of association between a disease and an exposure variable when the true values of the exposure are never observed. Motivated by nutritional epidemiological data, we consider the setting where a surrogate covariate is recorded in the primary data, and a calibration data set contains information on the surrogate variable and repeated measurements of an unbiased instrumental variable of the true exposure. We develop a flexible Bayesian method in which not only is the relationship between the disease and exposure variable treated semiparametrically, but the relationship between the surrogate and the true exposure is also modeled semiparametrically. The two nonparametric functions are modeled simultaneously via B-splines. In addition, we model the distribution of the exposure variable as a Dirichlet process mixture of normal distributions, thus making its modeling essentially nonparametric and placing this work into the context of functional measurement error modeling. We apply our method to the NIH-AARP Diet and Health Study and examine its performance in a simulation study.
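Each nonparametric function above is represented through B-splines, that is, as f(x) = sum_j c_j B_j(x) for a fixed basis. The minimal Python sketch below evaluates such a basis with the Cox-de Boor recursion. The clamped cubic knot vector and the coefficients are hypothetical. In the Bayesian method, the coefficients would be estimated rather than fixed.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of `degree` at x (Cox-de Boor).

    `knots` is a non-decreasing list; the result has len(knots) - degree - 1 entries.
    """
    n_basis = len(knots) - degree - 1
    # Degree 0: indicator of the knot interval [t_i, t_{i+1}) containing x
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    # Let the right endpoint belong to the last non-empty interval
    if x == knots[-1]:
        for i in range(len(knots) - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                b[i] = 1.0
                break
    # Raise the degree one step at a time; 0/0 terms are treated as 0
    for d in range(1, degree + 1):
        nb = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            denom = knots[i + d] - knots[i]
            if denom > 0:
                left = (x - knots[i]) / denom * b[i]
            denom = knots[i + d + 1] - knots[i + 1]
            if denom > 0:
                right = (knots[i + d + 1] - x) / denom * b[i + 1]
            nb.append(left + right)
        b = nb
    return b[:n_basis]


knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]  # clamped cubic knots (illustrative)
coef = [0.0, 0.4, 1.0, 0.7, 0.2]  # hypothetical spline coefficients
f_at_03 = sum(c * bj for c, bj in zip(coef, bspline_basis(0.3, knots, 3)))
```

On the interval spanned by a clamped knot vector, the basis values are nonnegative and sum to one. Fitting f therefore reduces to estimating a short coefficient vector.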
Statistics in Medicine | 2011
Jenny X. Sun; Samiran Sinha; Suojin Wang; Tapabrata Maiti
We employ a general bias-preventive approach developed by Firth (Biometrika 1993; 80:27-38) to reduce the bias of an estimator of the log-odds ratio parameter in a matched case-control study by solving a modified score equation. We also propose a method to calculate the standard error of the resulting estimator. A closed-form expression for the estimator of the log-odds ratio parameter is derived in the case of a dichotomous exposure variable. Finite-sample properties of the estimator are investigated via a simulation study. Finally, we apply the method to analyze matched case-control data from a low-birthweight study.
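The simplest special case is 1:1 matching with a dichotomous exposure. There the conditional likelihood depends only on the discordant pairs: n10 (case exposed, control unexposed) and n01 (case unexposed, control exposed). It is a one-parameter binomial logistic likelihood in beta. Firth's penalty, 0.5 log I(beta), then amounts to adding 1/2 to each discordant count, which gives beta_F = log((n10 + 1/2) / (n01 + 1/2)). The Python sketch below assumes 1:1 pairs only. The function name is hypothetical, and the article's closed form may cover more general matching ratios.

```python
import math


def matched_pair_log_or(n10, n01, firth=True):
    """Log-odds ratio for 1:1 matched pairs with a binary exposure.

    n10: discordant pairs with exposed case and unexposed control.
    n01: discordant pairs with unexposed case and exposed control.
    """
    if firth:
        # Firth-penalized estimate: add 1/2 to each discordant count
        return math.log((n10 + 0.5) / (n01 + 0.5))
    # Ordinary conditional MLE; it is infinite when either count is zero
    if n10 == 0 or n01 == 0:
        raise ValueError("conditional MLE is infinite when a discordant count is zero")
    return math.log(n10 / n01)


ordinary = matched_pair_log_or(10, 5, firth=False)  # log(2)
penalized = matched_pair_log_or(7, 0)  # finite, unlike the ordinary MLE
```

The second call shows the practical benefit. When all discordant pairs point one way, the ordinary estimate diverges, but the penalized estimate stays finite.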
Political Analysis | 2017
Scott J. Cook; Betsabe Blas; Raymond J. Carroll; Samiran Sinha
Media-based event data (i.e., data compiled from reporting by media outlets) are widely used in political science research. However, events of interest (e.g., strikes, protests, conflict) are often underreported by these primary and secondary sources, producing incomplete data that risks inconsistency and bias in subsequent analysis. While general strategies exist to help ameliorate this bias, these methods do not make full use of the information often available to researchers. Specifically, much of the event data used in the social sciences is drawn from multiple, overlapping news sources (e.g., Agence France-Presse, Reuters). Therefore, we propose a novel maximum likelihood estimator that corrects for misclassification in data arising from multiple sources. In the most general formulation of our estimator, researchers can specify separate sets of predictors for the true-event model and each of the misclassification models characterizing whether a source fails to report on an event. As such, researchers are able to accurately test theories on both the causes of and reporting on an event of interest. Simulations show that our technique regularly outperforms current strategies that neglect misclassification, the unique features of the data-generating process, or both. We also illustrate the utility of this method with a model of repression using the Social Conflict in Africa Database.
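The Python sketch below shows the kind of likelihood such an estimator maximizes, under simplifying assumptions. The event probability pi and each source's detection probability are held constant instead of depending on predictors. Sources are taken as conditionally independent given the event, and no source reports an event that did not happen. The function names are hypothetical.

```python
import math


def unit_likelihood(reports, pi, detect):
    """P(observed report pattern) for one unit.

    reports: sequence of 0/1 indicators, one per source.
    pi: probability that the event truly occurred.
    detect: per-source probability of reporting a true event.
    """
    p_pattern_given_event = 1.0
    for r, q in zip(reports, detect):
        p_pattern_given_event *= q if r else (1.0 - q)
    if any(reports):
        # Any report implies the event occurred (no false positives assumed)
        return pi * p_pattern_given_event
    # No reports: either no event, or an event that every source missed
    return (1.0 - pi) + pi * p_pattern_given_event


def log_likelihood(data, pi, detect):
    """Sum of per-unit log-likelihoods over a list of report patterns."""
    return sum(math.log(unit_likelihood(r, pi, detect)) for r in data)


data = [(1, 0), (0, 0), (1, 1), (0, 1)]  # hypothetical two-source reports
ll = log_likelihood(data, pi=0.4, detect=[0.7, 0.5])
```

The all-zero pattern mixes true non-events with events that every source missed. Separating those two cases is what the overlap between multiple sources makes possible.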
Biometrics | 2011
Jaeil Ahn; Bhramar Mukherjee; Stephen B. Gruber; Samiran Sinha
With advances in modern medicine and clinical diagnosis, case-control data with characterization of finer subtypes of cases are often available. In matched case-control studies, missingness in exposure values often leads to deletion of the entire stratum, and thus entails a significant loss of information. When subtypes of cases are treated as categorical outcomes, the data are further stratified and deletion of observations becomes even more expensive in terms of precision of the category-specific odds-ratio parameters, especially under the multinomial logit model. The stereotype regression model for categorical responses lies intermediate between the proportional odds and the multinomial or baseline-category logit model. The use of this class of models has been limited because the structure of the model implies certain inferential challenges with nonidentifiability and nonlinearity in the parameters. We illustrate how to handle missing data in matched case-control studies with finer disease subclassification within the cases under a stereotype regression model. We present both a Monte Carlo-based fully Bayesian approach and an expectation/conditional maximization algorithm for the estimation of model parameters in the presence of a completely general missingness mechanism. We illustrate our methods by using data from an ongoing matched case-control study of colorectal cancer. Simulation results are presented under various missing data mechanisms and departures from modeling assumptions.
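In one common form of the stereotype model, category 0 is the baseline and log{P(Y=k|x)/P(Y=0|x)} = alpha_k + phi_k * beta'x. The covariate effect for category k is phi_k * beta. With p covariates and K non-baseline categories, this uses p + K slope-related parameters, one of which an identifiability constraint fixes (phi_1 = 1 is assumed below). The baseline-category logit needs K * p slopes. The nonlinearity comes from the product phi_k * beta: rescaling beta by c and every phi_k by 1/c leaves the model unchanged. The minimal Python sketch below is unconditional and uses hypothetical names and values. In a matched analysis, the intercepts would be absorbed by the strata.

```python
import math


def stereotype_probs(x, alpha, phi, beta):
    """Category probabilities under a stereotype logit with baseline category 0.

    log P(Y=k|x)/P(Y=0|x) = alpha[k-1] + phi[k-1] * (beta . x),  k = 1..K
    """
    eta = sum(b * xi for b, xi in zip(beta, x))  # shared linear predictor beta'x
    scores = [0.0] + [a + f * eta for a, f in zip(alpha, phi)]
    m = max(scores)  # subtract the max for numerical stability
    expd = [math.exp(s - m) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]


# Hypothetical values: two case subtypes, one covariate, phi_1 fixed at 1
probs = stereotype_probs(x=[1.0], alpha=[0.0, 0.0], phi=[1.0, 0.5], beta=[2.0])
```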
Biometrics | 2014
Samiran Sinha; Yanyuan Ma
We take a semiparametric approach to fitting a linear transformation model to right-censored data when predictor variables are subject to measurement error. We construct consistent estimating equations when repeated measurements of a surrogate of the unobserved true predictor are available. The proposed approach applies under minimal assumptions on the distributions of the true covariate and the measurement errors. We derive the asymptotic properties of the estimator and examine its finite-sample performance via simulation studies. We apply the method to analyze data from the AIDS clinical trial that motivated the work.
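Replicated surrogates are useful because they identify the measurement error variance. The Python sketch below shows one standard ingredient under the classical additive error model W_ij = X_i + U_ij: a pooled within-subject estimate of Var(U). It is not the authors' estimating equations. The additive model and the function names are assumptions made for illustration.

```python
def within_subject_error_variance(replicates):
    """Pooled estimate of Var(U) under W_ij = X_i + U_ij.

    replicates: list of per-subject lists of surrogate measurements.
    Subjects with a single measurement contribute nothing.
    """
    ss, dof = 0.0, 0
    for w in replicates:
        m = len(w)
        if m < 2:
            continue
        mean = sum(w) / m
        ss += sum((wij - mean) ** 2 for wij in w)  # within-subject sum of squares
        dof += m - 1
    if dof == 0:
        raise ValueError("need at least one subject with two or more replicates")
    return ss / dof


def replicate_means(replicates):
    """Per-subject averages; averaging m_i replicates shrinks the error variance to Var(U)/m_i."""
    return [sum(w) / len(w) for w in replicates]


reps = [[1.0, 3.0], [2.0, 2.0, 5.0], [4.0]]  # hypothetical surrogate replicates
sigma_u2 = within_subject_error_variance(reps)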
Biometrics | 2014
Samiran Sinha; Krishna K. Saha; Suojin Wang
Missing covariate data often arise in biomedical studies, and analysis of such data that ignores subjects with incomplete information may lead to inefficient and possibly biased estimates. A great deal of attention has been paid to handling a single missing covariate or a monotone pattern of missing data when the missingness mechanism is missing at random. In this article, we propose a semiparametric method for handling non-monotone patterns of missing data. The proposed method relies on the assumption that the missingness mechanism of a variable does not depend on the missing variable itself but may depend on the other missing variables. This mechanism is somewhat less general than the completely non-ignorable mechanism but is sometimes more flexible than the missing at random mechanism, under which missingness is allowed to depend only on the completely observed variables. The proposed approach is robust to misspecification of the distribution of the missing covariates, and the proposed mechanism helps to nullify (or reduce) the problems due to non-identifiability that result from the non-ignorable missingness mechanism. The asymptotic properties of the proposed estimator are derived. Finite-sample performance is assessed through simulation studies. Finally, for the purpose of illustration we analyze an endometrial cancer dataset and a hip fracture dataset.
Journal of Nonparametric Statistics | 2009
Samiran Sinha; Suojin Wang
In this paper, we propose an easy-to-use semiparametric method for analysing matched case-control data when one of the covariates of interest is partially missing. Missing covariate information in matched case-control studies may create bias and reduce the efficiency of the parameter estimates. To cope with this situation, we consider a robust approach that estimates certain functionals of the distribution of the partially missing covariate using a kernel regression technique within a conditional likelihood framework. The large-sample theory of the proposed estimator is investigated and its asymptotic normality is established. A simulation study is conducted to assess the performance of the proposed method in terms of robustness and efficiency. The proposed method is also applied to the real dataset that motivated this work.
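The abstract does not say which functionals are smoothed. As an assumption for illustration, take a conditional expectation such as E[g(X) | Z = z], where X is the partially missing covariate and Z the fully observed ones, estimated from complete cases by Nadaraya-Watson kernel regression. The Gaussian kernel, the bandwidth, and the function name in the Python sketch below are hypothetical choices, not the paper's.

```python
import math


def nadaraya_watson(z0, z, y, bandwidth):
    """Kernel estimate of E[Y | Z = z0] using a Gaussian kernel.

    z, y: paired observations (e.g., complete cases, with y = g(x)).
    """
    weights = [math.exp(-0.5 * ((zi - z0) / bandwidth) ** 2) for zi in z]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    # Locally weighted average of the responses
    return sum(w * yi for w, yi in zip(weights, y)) / total


z_obs = [0.1, 0.4, 0.5, 0.9]  # hypothetical observed covariate values
gx_obs = [1.2, 1.5, 1.4, 2.0]  # hypothetical g(x) values for the same complete cases
estimate = nadaraya_watson(0.45, z_obs, gx_obs, bandwidth=0.2)
```

Because the estimate is a weighted average of the observed responses, it always lies between their minimum and maximum. The bandwidth sets the tradeoff between bias and variance.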
Statistical Methods in Medical Research | 2018
Zhen Zhang; Samiran Sinha; Tapabrata Maiti; Eva Shipp
The accelerated failure time model is a popular model for analyzing censored time-to-event data. Analysis of this model without assuming any parametric distribution for the model error is challenging, and the model complexity increases in the presence of a large number of covariates. We develop a nonparametric Bayesian method for regularized estimation of the regression parameters in a flexible accelerated failure time model. The novelties of our method lie in modeling the error distribution of the accelerated failure time model nonparametrically, modeling the variance as a function of the mean, and adopting a variable selection technique in modeling the mean. The proposed method allows for identifying a set of important regression parameters, estimating survival probabilities, and constructing credible intervals for the survival probabilities. We evaluate operating characteristics of the proposed method via simulation studies. Finally, we apply our new comprehensive method to analyze the motivating breast cancer data from the Surveillance, Epidemiology, and End Results Program, and estimate the five-year survival probabilities for women included in the Surveillance, Epidemiology, and End Results database who were diagnosed with breast cancer between 1990 and 2000.
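In an accelerated failure time model, log T = x'beta + e. The survival probability is therefore S(t | x) = P(e > log t - x'beta). When the error distribution is a mixture of normals, this is a weighted sum of normal tail probabilities. The Python sketch below uses a fixed finite mixture as a stand-in for a nonparametric (e.g., Dirichlet process mixture) error distribution. It ignores the paper's mean-dependent variance and variable selection. All parameter values and names are hypothetical, and time is assumed to be measured in years.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aft_survival(t, x, beta, mix_weights, mix_means, mix_sds):
    """S(t | x) = P(T > t) under log T = x'beta + e, e ~ finite normal mixture."""
    lin = sum(b * xi for b, xi in zip(beta, x))
    r = math.log(t) - lin  # event-time threshold on the error scale
    return sum(w * (1.0 - normal_cdf((r - m) / s))
               for w, m, s in zip(mix_weights, mix_means, mix_sds))


# Hypothetical five-year survival for one covariate profile
s5 = aft_survival(5.0, x=[1.0, 0.0], beta=[2.0, -0.3],
                  mix_weights=[0.6, 0.4], mix_means=[0.0, -1.0], mix_sds=[0.5, 1.0])
```

In a Bayesian fit, evaluating this function over posterior draws of beta and of the mixture parameters yields credible intervals for S(t | x).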
Journal of Biometrics & Biostatistics | 2014
Jingang Miao; Samiran Sinha; Suojin Wang; W. Ryan Diver; Susan M. Gapstur
In modern cancer epidemiology, diseases are classified based on pathologic and molecular traits, and different combinations of these traits give rise to many disease subtypes. The effect of predictor variables can be measured by fitting a polytomous logistic model to such data. The differences (heterogeneity) among the relative risk parameters associated with subtypes are of great interest to better understand disease etiology. Due to the heterogeneity of the relative risk parameters, when a risk factor is changed, the prevalence of one subtype may change more than that of another subtype. Estimation of the heterogeneity parameters is difficult when disease trait information is only partially observed and the number of disease subtypes is large. We consider a robust semiparametric approach based on the pseudo-conditional likelihood for estimating these heterogeneity parameters. Through simulation studies, we compare the robustness and efficiency of our approach with those of the maximum likelihood approach. The method is then applied to analyze the associations of weight gain with risk of breast cancer subtypes using data from the American Cancer Society Cancer Prevention Study II Nutrition Cohort.
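One common way to link subtype-specific log odds ratios to trait-level heterogeneity parameters is a first-order decomposition, beta_s = theta_0 + sum_k theta_k * s_k, where s is the vector of binary traits defining subtype s. Whether the article uses exactly this parameterization is an assumption. The minimal Python sketch below builds these log odds ratios and plugs them into a polytomous logistic model with controls as the baseline. All names and values are hypothetical.

```python
import math
from itertools import product


def subtype_log_ors(theta0, theta):
    """Log-ORs for every subtype defined by binary traits (first-order model).

    beta_s = theta0 + sum_k theta[k] * s[k]; theta[k] measures heterogeneity by trait k.
    """
    return {s: theta0 + sum(th * sk for th, sk in zip(theta, s))
            for s in product((0, 1), repeat=len(theta))}


def polytomous_probs(x, intercepts, log_ors):
    """P(control) and P(subtype s) for a scalar risk factor x; controls are the baseline."""
    scores = {s: intercepts[s] + log_ors[s] * x for s in log_ors}
    denom = 1.0 + sum(math.exp(v) for v in scores.values())
    probs = {s: math.exp(v) / denom for s, v in scores.items()}
    probs["control"] = 1.0 / denom
    return probs


log_ors = subtype_log_ors(theta0=0.2, theta=[0.3, -0.1])  # two binary traits, four subtypes
probs = polytomous_probs(1.0, {s: -2.0 for s in log_ors}, log_ors)
```

Setting every theta_k to zero gives all subtypes the same odds ratio. Nonzero theta_k let a change in the risk factor shift some subtypes more than others.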