Publication


Featured research published by Sanjoy K. Sinha.


Journal of the American Statistical Association | 2004

Robust Analysis of Generalized Linear Mixed Models

Sanjoy K. Sinha

The method of maximum likelihood (ML) is widely used for analyzing generalized linear mixed models (GLMMs). A full maximum likelihood analysis requires numerical integration techniques for calculation of the log-likelihood, and to avoid the computational problems involving irreducibly high-dimensional integrals, several maximum likelihood algorithms have been proposed in the literature to estimate the model parameters by approximating the log-likelihood function. Although these likelihood algorithms are useful for fitting the GLMMs efficiently under strict model assumptions, they can be highly influenced by the presence of unusual data points. In this article, the author develops a technique for finding robust maximum likelihood (RML) estimates of the model parameters in GLMMs, which appears to be useful in downweighting the influential data points when estimating the parameters. The asymptotic properties of the robust estimators are investigated under some regularity conditions. Small simulations are carried out to study the behavior of the robust estimates in the presence of outliers, and these estimates are also compared to the ordinary classical estimates. To avoid the computational problems involving high-dimensional integrals, the author proposes a robust Monte Carlo Newton–Raphson (RMCNR) algorithm for fitting GLMMs. The proposed robust method is illustrated in an analysis of data from a clinical experiment described in a biometrical journal.


Statistics & Probability Letters | 2003

Robust estimation of nonlinear regression with autoregressive errors

Sanjoy K. Sinha; Chris Field; Bruce Smith

Generalized M (or GM) estimation has been extended to the case of a nonlinear regression model with autoregressive and heteroscedastic errors. The robustness properties of the GM estimators have been investigated based on the time-series analog of Hampel's influence function. The asymptotic properties of these estimators have been studied in some detail.


Journal of biometrics & biostatistics | 2013

A Parametric Survival Model When a Covariate is Subject to Left-Censoring.

Abdus Sattar; Sanjoy K. Sinha; Nathan J. Morris

PROBLEM STATEMENT: Modeling survival data with a set of covariates usually assumes that the values of the covariates are fully observed. However, in a variety of applications, some values of a covariate may be left-censored due to inadequate instrument sensitivity to quantify the biospecimen. When data are left-censored, the true values are missing but are known to be smaller than the detection limit. The most commonly used ad-hoc method to deal with nondetect values is to substitute the detection limit for them. Such ad-hoc analysis of survival data with an explanatory variable subject to left-censoring may provide biased and inefficient estimators of hazard ratios and survivor functions. METHOD: We consider a parametric proportional hazards model to analyze time-to-event data. We propose a likelihood method for the estimation and inference of model parameters. In this likelihood approach, instead of replacing the nondetect values by the detection limit, we adopt a numerical integration technique to evaluate the observed data likelihood in the presence of a left-censored covariate. Monte Carlo simulations were used to demonstrate various properties of the proposed regression estimators, including consistency and efficiency. RESULTS: The simulation study shows that the proposed likelihood approach provides approximately unbiased estimators of the model parameters. The proposed method also provides estimators that are more efficient than those obtained under the ad-hoc method. Also, unlike the ad-hoc estimators, the coverage probabilities of the proposed estimators are at their nominal level. Analysis of a large cohort study, the Genetic and Inflammatory Markers of Sepsis study, shows discernibly different results based on the proposed method. CONCLUSION: Naive use of the detection limit in a parametric survival model may provide biased and inefficient estimators of hazard ratios and survivor functions. The proposed likelihood approach provides approximately unbiased and efficient estimators of hazard ratios and survivor functions.
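
As a rough illustration of the idea in this abstract, the sketch below evaluates the likelihood contribution of one subject whose covariate fell below the detection limit by integrating over the unobserved value instead of substituting the limit. Everything specific here is an assumption made for the example and is not taken from the paper: an exponential proportional hazards model, a normally distributed covariate, composite Simpson's rule as the integration technique, and the hypothetical names `survival_density`, `simpson`, and `censored_contribution`.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def survival_density(t, event, x, lam, beta):
    """Assumed exponential PH model: hazard = lam * exp(beta * x).

    Returns hazard**event * S(t), the usual contribution of a subject
    with follow-up time t and event indicator event (1 = event, 0 = censored).
    """
    hazard = lam * math.exp(beta * x)
    return (hazard ** event) * math.exp(-hazard * t)


def simpson(f, a, b, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def censored_contribution(t, event, limit, lam, beta, mu, sigma):
    """Integrate the survival contribution over covariate values below `limit`.

    The lower bound mu - 8*sigma truncates a negligible normal tail and is
    only sensible when `limit` lies above it.
    """
    lower = mu - 8.0 * sigma
    integrand = lambda x: survival_density(t, event, x, lam, beta) * normal_pdf(x, mu, sigma)
    return simpson(integrand, lower, limit)
```

When `beta` is zero the covariate has no effect, so the integral should reduce to the survival contribution multiplied by the probability of falling below the detection limit, which gives a simple sanity check on the numerical integration.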


Communications in Statistics-theory and Methods | 2002

ON PSEUDO-LIKELIHOOD INFERENCE IN THE BINARY LONGITUDINAL MIXED MODEL

Brajendra C. Sutradhar; Sanjoy K. Sinha

ABSTRACT Binary logistic and Poisson mixed models are used to analyse over/under-dispersed proportion and count data, respectively. It is, however, well known that a full likelihood analysis for such mixed models is hampered by the need for numerical integrations. To overcome such integration problems, Sutradhar and Qu (On Approximate Likelihood Inference in Poisson Mixed Model. The Canadian Journal of Statistics 1998, 26, 169–186) recently introduced a small variance component (for random effects) based likelihood approximation (LA) approach to estimate the parameters of the Poisson mixed models and showed that their LA approach performs better than other leading approaches. More recently, Sutradhar and Das (A Higher-Order Approximation to the Likelihood Inference in the Poisson Mixed Model. Statistics and Probability Letters 2001, 52, 59–67) further improved the LA approach of Sutradhar and Qu to accommodate larger values of the variance component. These likelihood approximation techniques developed for Poisson mixed models are, however, not applicable to binary mixed models. In this paper, we propose a multivariate binary distribution based pseudo-likelihood approach for the estimation of the parameters of the binary mixed models. We do this in a wider binary longitudinal mixed model setup, with the binary mixed model as a special case. More specifically, two types of binary longitudinal mixed models are considered. Under the first model, conditional on certain independent random effects, repeated binary responses are assumed to follow a Bahadur type multivariate binary distribution, so that, unconditionally, the responses in the cluster follow a longitudinal binary mixed model. Under the second model, however, the binary responses in the cluster are assumed to be conditionally independent, conditional on certain correlated random effects, so that, unconditionally, responses in the cluster also follow a binary longitudinal mixed model. It is of primary interest to estimate the regression and the variance component parameters of the binary longitudinal mixed model, with the longitudinal correlation parameters treated as nuisance parameters. The performance of the proposed pseudo-likelihood based estimators is examined through a simulation study. A comparison is also made with a highly competitive generalized estimating equation (GEE) approach, especially for the estimation of the variance component of the random effects.


Biometrics | 2011

A bivariate pseudolikelihood for incomplete longitudinal binary data with nonignorable nonmonotone missingness.

Sanjoy K. Sinha; Andrea B. Troxel; Stuart R. Lipsitz; Debajyoti Sinha; Garrett M. Fitzmaurice; Geert Molenberghs; Joseph G. Ibrahim

For analyzing longitudinal binary data with nonignorable and nonmonotone missing responses, a full likelihood method is complicated algebraically, and often requires intensive computation, especially when there are many follow-up times. As an alternative, a pseudolikelihood approach has been proposed in the literature under minimal parametric assumptions. This formulation only requires specification of the marginal distributions of the responses and missing data mechanism, and uses an independence working assumption. However, this estimator can be inefficient for estimating both time-varying and time-stationary effects under moderate to strong within-subject associations among repeated responses. In this article, we propose an alternative estimator, based on a bivariate pseudolikelihood, and demonstrate in simulations that the proposed method can be much more efficient than the previous pseudolikelihood obtained under the assumption of independence. We illustrate the method using longitudinal data on CD4 counts from two clinical trials of HIV-infected patients.


Journal of Multivariate Analysis | 2010

Multivariate logistic regression with incomplete covariate and auxiliary information

Sanjoy K. Sinha; Nan M. Laird; Garrett M. Fitzmaurice

In this article, we propose and explore a multivariate logistic regression model for analyzing multiple binary outcomes with incomplete covariate data where auxiliary information is available. The auxiliary data are extraneous to the regression model of interest but predictive of the covariate with missing data. Horton and Laird (2001) describe how the auxiliary information can be incorporated into a regression model for a single binary outcome with missing covariates, and hence how the efficiency of the regression estimators can be improved. We consider extending the method of Horton and Laird (2001) to the case of a multivariate logistic regression model for multiple correlated outcomes, with missing covariates and completely observed auxiliary information. We demonstrate that in the case of moderate to strong associations among the multiple outcomes, one can achieve considerable gains in efficiency from estimators in a multivariate model as compared to the marginal estimators of the same parameters.


Statistics in Medicine | 2015

Frailty models for pneumonia to death with a left‐censored covariate

Abdus Sattar; Sanjoy K. Sinha; Xiaofeng Wang; Yehua Li

Frailty models are multiplicative hazard models for studying the association between survival time and important clinical covariates. When some values of a clinical covariate are unobserved but known to be below a threshold called the limit of detection (LOD), naive approaches ignoring this problem, such as replacing the undetected value by the LOD or half of the LOD, often produce biased parameter estimates with larger mean squared error. To address the LOD problem in a frailty model, we propose a flexible smooth nonparametric density estimator along with Simpson's numerical integration technique. This is an extension of an existing method in the likelihood framework for the estimation and inference of the model parameters. The proposed new method yields estimators that are asymptotically unbiased and have smaller mean squared error. Compared with the existing method, the proposed new method does not require distributional assumptions for the underlying covariates. Simulation studies were conducted to evaluate the performance of the new method in realistic scenarios. We illustrate the use of the proposed method with a data set from the Genetic and Inflammatory Markers of Sepsis study in which interleukin-10 was subject to LOD.


Canadian Journal of Statistics-revue Canadienne De Statistique | 2002

Minimax weights for generalised M-estimation in biased regression models

Sanjoy K. Sinha; Douglas P. Wiens

The authors consider the construction of weights for Generalised M-estimation. Such weights, when combined with appropriate score functions, afford protection from biases arising through incorrectly specified response functions, as well as from natural variation. The authors obtain minimax fixed weights of the Mallows type under the assumption that the density of the independent variables is correctly specified, and they obtain adaptive weights when this assumption is relaxed. A simulation study indicates that one can expect appreciable gains in precision when the latter weights are used and the various sources of model uncertainty are present.
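
To make the structure of a Mallows-type GM-estimator concrete, the sketch below fits a single slope through the origin by iteratively reweighted least squares, multiplying a Huber residual weight by a weight that depends only on the covariate. It is a minimal illustration under stated assumptions and does not reproduce the minimax or adaptive weights derived in the paper: the covariate weight `min(1, b/|x|)`, the tuning constants, the fixed residual scale, the toy data, and the function names are all hypothetical.

```python
def huber_weight(residual, k=1.345):
    """psi(r)/r for the Huber score function: 1 inside [-k, k], k/|r| outside."""
    a = abs(residual)
    return 1.0 if a <= k else k / a


def mallows_weight(x, b=3.0):
    """Hypothetical covariate weight that downweights high-leverage points."""
    a = abs(x)
    return 1.0 if a <= b else b / a


def ols_slope(xs, ys):
    """Least squares slope for a regression through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def gm_slope(xs, ys, scale=1.0, iters=100):
    """Solve sum_i w(x_i) * psi((y_i - beta*x_i)/scale) * x_i = 0 by IRLS.

    The residual scale is held fixed here for simplicity; in practice it
    would be estimated robustly.
    """
    beta = ols_slope(xs, ys)
    for _ in range(iters):
        num = den = 0.0
        for x, y in zip(xs, ys):
            w = mallows_weight(x) * huber_weight((y - beta * x) / scale)
            num += w * x * y
            den += w * x * x
        beta = num / den
    return beta


# Toy data: five points on the line y = 2x plus one high-leverage outlier.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 0.0]
ols = ols_slope(xs, ys)
gm = gm_slope(xs, ys)
```

On these made-up data the least squares slope is pulled far below 2 by the single leverage point, while the GM slope stays close to 2 because that point receives a small combined weight.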


Environment International | 2017

Identification of chemical mixtures to which Canadian pregnant women are exposed: The MIREC Study

Wan-Chen Lee; Mandy Fisher; Karelyn Davis; Tye E. Arbuckle; Sanjoy K. Sinha

Depending on the chemical and the outcome, prenatal exposures to environmental chemicals can lead to adverse effects on the pregnancy and child development, especially if exposure occurs during early gestation. Instead of focusing on prenatal exposure to individual chemicals, more studies have taken into account that humans are exposed to multiple environmental chemicals on a daily basis. The objectives of this analysis were to identify the pattern of chemical mixtures to which women are exposed and to characterize women with elevated exposures to various mixtures. Statistical techniques were applied to 28 chemicals measured simultaneously in the first trimester and socio-demographic factors of 1744 participants from the Maternal-Infant Research on Environmental Chemicals (MIREC) Study. Cluster analysis was implemented to categorize participants based on their socio-demographic characteristics, while principal component analysis (PCA) was used to extract the chemicals with similar patterns and to reduce the dimension of the dataset. Next, hypothesis testing determined if the mean converted concentrations of chemical substances differed significantly among women with different socio-demographic backgrounds as well as among clusters. Cluster analysis identified six main socio-demographic clusters. Eleven components, which explained approximately 70% of the variance in the data, were retained in the PCA. Persistent organic pollutants (PCB118, PCB138, PCB153, PCB180, OXYCHLOR and TRANSNONA) and phthalates (MEOHP, MEHHP and MEHP) dominated the first and second components, respectively, and the first two components explained 25.8% of the source variation. In our study population, prenatal exposure to persistent organic pollutants (first component) was positively associated with having lower education or higher income, being born in Canada, having BMI ≥25, or expecting a first child. MEOHP, MEHHP and MEHP, dominating the second component, were detected in at least 98% of 1744 participants in our cohort study; however, no particular group of pregnant women was identified to be highly exposed to phthalates. While widely recognized as important to studying potential health effects, identifying the mixture of chemicals to which various segments of the population are exposed has been problematic. We present an approach using factor analysis through principal component method and cluster analysis as an attempt to determine the pregnancy exposome. Future studies should focus on how to include these matrices in examining the health effects of prenatal exposure to chemical mixtures in pregnant women and their children.
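
The "share of variance explained" quoted in this abstract can be illustrated with a small sketch. The code below extracts the first principal component of a sample covariance matrix by power iteration and reports the fraction of total variance it accounts for. The three-variable toy data, the use of the covariance rather than the correlation matrix, and the function names are assumptions made for illustration; none of this is the MIREC analysis itself.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iters=500):
    """Leading eigenvalue and unit eigenvector of cov via power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


def explained_share(rows):
    """Fraction of total variance captured by the first principal component."""
    cov = covariance_matrix(rows)
    eigenvalue, loadings = first_component(cov)
    total = sum(cov[i][i] for i in range(len(cov)))
    return eigenvalue / total, loadings


# Toy data: the first two columns move together, the third is unrelated.
rows = [(1, 2, 5), (2, 4, 2), (3, 6, 6), (4, 8, 2), (5, 10, 5)]
share, loadings = explained_share(rows)
```

In this toy example the first component loads only on the two columns that move together, which mirrors how a component in the abstract is "dominated" by a group of related chemicals.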


Computational Statistics & Data Analysis | 2014

Inference for longitudinal data with nonignorable nonmonotone missing responses

Sanjoy K. Sinha; Amit Kaushal; Wenzhong Xiao

For the analysis of longitudinal data with nonignorable and nonmonotone missing responses, a full likelihood method often requires intensive computation, especially when there are many follow-up times. The authors propose and explore a Monte Carlo method, based on importance sampling, for approximating the maximum likelihood estimators. The finite-sample properties of the proposed estimators are studied using simulations. An application of the proposed method is also provided using longitudinal data on peptide intensities obtained from a proteomics experiment of trauma patients.
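
The role of importance sampling in approximating a likelihood can be sketched on a deliberately simple case. Below, a marginal likelihood of the form ∫ p(y | b) p(b) db is estimated by drawing b from a proposal density and averaging the importance weights. The normal model, the proposal, the seed, and the function names are assumptions chosen so that the exact answer is known; the sketch is not the estimator proposed in the paper.

```python
import math
import random


def normal_density(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def marginal_likelihood_is(y, n_draws=20000, seed=1):
    """Importance sampling estimate of the marginal density of y.

    Assumed toy model: y | b ~ N(b, 1) and b ~ N(0, 1), so that exactly
    y ~ N(0, 2). The proposal N(y/2, 1) is centred at the conditional mean
    of b given y and is wider than that conditional distribution, which
    keeps the importance weights bounded.
    """
    rng = random.Random(seed)
    center = y / 2.0
    total = 0.0
    for _ in range(n_draws):
        b = rng.gauss(center, 1.0)
        # weight = integrand / proposal density
        weight = (normal_density(y, b, 1.0) * normal_density(b, 0.0, 1.0)
                  / normal_density(b, center, 1.0))
        total += weight
    return total / n_draws


estimate = marginal_likelihood_is(1.2)
exact = normal_density(1.2, 0.0, 2.0)
```

Comparing `estimate` with `exact` gives a quick check of the Monte Carlo error; in a real missing-data problem no closed form is available, and the same weighted average would be maximized over the model parameters.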

Collaboration


Top Co-Authors

Abdus Sattar

Case Western Reserve University
