Krishna K. Saha
Central Connecticut State University
Publications
Featured research published by Krishna K. Saha.
Journal of Statistical Computation and Simulation | 2003
S. R. Paul; Krishna K. Saha; Uditha Balasooriya
This paper is concerned with the properties (bias, standard deviation, mean squared error and efficiency) of twenty-six estimators of the intraclass correlation in the analysis of binary data. Our main interest is to study these properties when data are generated from different distributions. For data generation we considered three over-dispersed binomial distributions, namely the beta-binomial distribution, the probit normal binomial distribution and a mixture of two binomial distributions. The findings regarding the bias, standard deviation and mean squared error of all these estimators are that (a) in general, the distributions of the biases of most of the estimators are negatively skewed, and the biases are smallest when data are generated from the beta-binomial distribution and largest when data are generated from the mixture distribution; (b) the standard deviations are smallest when data are generated from the beta-binomial distribution; and (c) the mean squared errors are smallest when data are generated from the beta-binomial distribution and largest when data are generated from the mixture distribution. Of the 26 estimators, nine, including the maximum likelihood estimator, an estimator based on the optimal quadratic estimating equations of Crowder (1987) and an analysis of variance type estimator, are found to have the least bias, standard deviation and mean squared error. Also, the distributions of the bias, standard deviation and mean squared error for each of these estimators are, in general, more symmetric than those of the other estimators. Our findings regarding efficiency are that the estimator based on the optimal quadratic estimating equations has consistently high efficiency and the least variability in the efficiency results. In the important range in which the intraclass correlation is small (≤ 0.5), this estimator shows, on average, the best efficiency performance.
The analysis of variance type estimator seems to do well for larger values of the intraclass correlation. In general, the estimator based on the optimal quadratic estimating equations seems to show the best efficiency performance for data from the beta-binomial and probit normal binomial distributions, and the analysis of variance type estimator seems to do well for data from the mixture distribution.
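As a rough illustration of the analysis of variance type estimator mentioned above, the Python sketch below computes the classical one-way ANOVA estimator of the intraclass correlation from cluster-level counts. It assumes the standard form rho = (MSB - MSW) / (MSB + (n0 - 1) MSW), with n0 the usual adjusted mean cluster size for unbalanced designs; the function name `anova_icc` is hypothetical, and the paper's exact variant may differ.

```python
from typing import Sequence


def anova_icc(successes: Sequence[int], sizes: Sequence[int]) -> float:
    """One-way ANOVA estimator of the intraclass correlation for clustered binary data.

    successes[i] is the number of positive responses in cluster i and
    sizes[i] is the number of individuals in that cluster.
    """
    k = len(sizes)
    if k < 2 or len(successes) != k:
        raise ValueError("need at least two clusters and matching inputs")
    n_total = sum(sizes)
    y_total = sum(successes)
    # Sum of y_i^2 / n_i, shared by the between- and within-cluster sums of squares
    # (for 0/1 responses, y_ij^2 = y_ij, so the within SS is y_total minus this term).
    sum_y2_over_n = sum(y * y / n for y, n in zip(successes, sizes))
    msb = (sum_y2_over_n - y_total ** 2 / n_total) / (k - 1)  # between-cluster mean square
    msw = (y_total - sum_y2_over_n) / (n_total - k)  # within-cluster mean square
    # Adjusted average cluster size for unbalanced designs.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)
```

For example, two clusters of four with 0 and 4 positive responses give `anova_icc([0, 4], [4, 4]) == 1.0` (perfect within-cluster agreement). The estimate can be negative, and it is undefined when every response in the data is identical.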
Computational Statistics & Data Analysis | 2009
Krishna K. Saha; Roger Bilisoly
Extra-dispersion (overdispersion or underdispersion) is common in practice; it occurs when the variance of count data differs from that implied by a Poisson model. This can arise when the data come from different subpopulations or when the assumption of independence is violated. This paper develops a procedure for testing the equality of the means of several groups of counts when the extra-dispersions of the treatment groups are unequal; the procedure is based on counts adjusted using the concept of design and size effects employed by Rao and Scott [Rao, J.N.K., Scott, A.J., 1999. A simple method for analyzing overdispersion in clustered Poisson data. Statist. Med. 18, 1373-1385]. We also obtain score-type test statistics based on quasi-likelihoods using the mean-variance structure of the negative binomial model, and study their properties and performance characteristics. The simulation results indicate that the statistic based on the adjusted count data, which has a very simple form and does not require estimates of the extra-dispersion parameters, performs best among all the statistics considered in this paper. Finally, the proposed test statistic and the score-type statistic based on double-extended quasi-likelihood are illustrated with an analysis of a set of fetal implant data in mice from a developmental toxicity study.
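The adjusted-count idea can be sketched as follows, assuming each group is a list of counts from units of size one. Following the Rao-Scott approach as we understand it, each group's total count and sample size are divided by an estimated design effect d_i = s_i^2 / ybar_i (sample variance over sample mean), and an ordinary Poisson Pearson chi-square for equal means is computed on the adjusted data. This is a hypothetical illustration (`rao_scott_adjusted_chisq` is our own name), not the paper's exact statistic.

```python
import statistics
from typing import Sequence


def rao_scott_adjusted_chisq(groups: Sequence[Sequence[int]]) -> tuple[float, int]:
    """Pearson chi-square for equal Poisson means, computed on design-effect-adjusted counts.

    Each group is a list of counts, one per experimental unit (unit size 1 assumed).
    Returns the statistic and its nominal degrees of freedom, k - 1.
    """
    adj_counts, adj_sizes = [], []
    for counts in groups:
        mean = statistics.fmean(counts)
        var = statistics.variance(counts)  # sample variance with divisor m - 1
        if mean <= 0 or var <= 0:
            raise ValueError("each group needs a positive mean and positive variance")
        d = var / mean  # design effect: estimated variance relative to Poisson variance
        adj_counts.append(sum(counts) / d)
        adj_sizes.append(len(counts) / d)
    rate = sum(adj_counts) / sum(adj_sizes)  # pooled rate under the null hypothesis
    stat = sum((y - n * rate) ** 2 / (n * rate) for y, n in zip(adj_counts, adj_sizes))
    return stat, len(groups) - 1
```

The statistic would be referred to a chi-square distribution with k - 1 degrees of freedom. The adjustment leaves each group mean unchanged and only reweights groups by an effective size n_i ybar_i / s_i^2, so more dispersed groups carry less weight.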
Statistics in Medicine | 2012
Krishna K. Saha
The intraclass correlation in binary outcome data sampled from clusters is an important and versatile measure in many biological and biomedical investigations. Properties of the different estimators of the intraclass correlation based on parametric, semi-parametric, and nonparametric approaches have been studied extensively, mainly in terms of bias and efficiency [see, for example, Ridout et al., Biometrics 1999, 55:137-148; Paul et al., Journal of Statistical Computation and Simulation 2003, 73:507-523; and Lee, Statistical Modelling 2004, 4:113-126], but little attention has been paid to extending these results to confidence intervals. In this article, we generalize the results for four point estimators by obtaining closed-form asymptotic and sandwich variance expressions for them and constructing the corresponding asymptotic confidence intervals. Simulation results suggest that the asymptotic confidence intervals based on these four estimators have serious under-coverage. To remedy this, we introduce Fisher's z-transformation approach for the intraclass correlation coefficient, the profile likelihood approach based on the beta-binomial model, and the hybrid profile variance approach based on the quadratic estimating equation for constructing confidence intervals for the intraclass correlation of binary outcome data. As assessed by Monte Carlo simulations, these confidence interval approaches show significant improvement in coverage probabilities. Moreover, the profile likelihood approach performs quite well, providing coverage levels close to nominal over a wide range of parameter combinations. We provide applications to biological data to illustrate the methods.
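The Fisher z-transformation approach can be sketched in a few lines of Python: transform the point estimate with z = atanh(rho_hat), build a Wald interval on the z scale using a delta-method standard error, and back-transform with tanh so the limits stay inside (-1, 1). The function name is hypothetical, and the standard error `se_rho` of rho_hat is assumed to be supplied, for example from one of the asymptotic or sandwich variance expressions the article derives (not reproduced here).

```python
import math
from statistics import NormalDist


def fisher_z_ci(rho_hat: float, se_rho: float, level: float = 0.95) -> tuple[float, float]:
    """Wald interval on Fisher's z scale, back-transformed to the correlation scale."""
    if not -1.0 < rho_hat < 1.0:
        raise ValueError("rho_hat must lie strictly between -1 and 1")
    z = math.atanh(rho_hat)  # equals 0.5 * log((1 + rho) / (1 - rho))
    se_z = se_rho / (1.0 - rho_hat ** 2)  # delta method: dz/drho = 1 / (1 - rho**2)
    q = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    return math.tanh(z - q * se_z), math.tanh(z + q * se_z)
```

Because tanh is nonlinear, the resulting interval is asymmetric around rho_hat, which is the intended effect near the boundaries.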
Statistics in Medicine | 2011
Krishna K. Saha
The over-dispersion parameter is an important and versatile measure in the analysis of one-way layouts of count data in biological studies. For example, it is commonly used as an inverse measure of aggregation in biological count data. Its estimation from finite data sets is a recognized challenge. Many simulation studies have examined the bias and efficiency of different estimators of the over-dispersion parameter for finite data sets (see, for example, Clark and Perry, Biometrics 1989; 45:309-316 and Piegorsch, Biometrics 1990; 46:863-867), but little attention has been paid to the accuracy of confidence intervals (CIs) for it. In this paper, we first derive asymptotic procedures for constructing confidence limits for the over-dispersion parameter using four estimators that are specified only by the first two moments of the counts. We also obtain closed-form asymptotic variance formulae for these four estimators. In addition, we consider the asymptotic CI based on the maximum likelihood (ML) estimator under the negative binomial model. The simulation results suggest that the asymptotic CIs based on these five estimators have coverage below the nominal level. To remedy this, we also study the properties of asymptotic CIs based on the restricted estimates from ML, extended quasi-likelihood, and double extended quasi-likelihood, which eliminate the nuisance parameter effect through their adjusted profile likelihood and quasi-likelihoods. These CIs are shown to outperform the competitors, providing coverage levels close to nominal over a wide range of parameter combinations. Two examples involving biological count data are presented.
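To make the moment-based setting concrete, here is one simple pooled method-of-moments estimator of the over-dispersion parameter c under the negative binomial mean-variance relation Var(Y) = mu + c mu^2 for a one-way layout. It equates each group's within-group sum of squares with its expectation (n_i - 1)(mu_i + c mu_i^2), plugging in the group means. This is an illustrative sketch with a hypothetical function name, not necessarily one of the four estimators studied in the paper.

```python
import statistics
from typing import Sequence


def pooled_moment_c(groups: Sequence[Sequence[int]]) -> float:
    """Pooled moment estimate of c, assuming Var(Y) = mu + c * mu**2 in every group.

    Solves sum_i (n_i - 1) * s_i^2 = sum_i (n_i - 1) * (ybar_i + c * ybar_i^2) for c.
    """
    num = 0.0
    den = 0.0
    for counts in groups:
        n = len(counts)
        mean = statistics.fmean(counts)
        var = statistics.variance(counts)  # unbiased sample variance, divisor n - 1
        num += (n - 1) * (var - mean)  # excess of observed over Poisson variance
        den += (n - 1) * mean ** 2
    if den == 0:
        raise ValueError("all group means are zero; c is undefined")
    return num / den
```

A negative value indicates that the data look under-dispersed relative to the Poisson; the reciprocal 1/c corresponds to the aggregation parameter often reported in ecology.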
Journal of Applied Statistics | 2008
Krishna K. Saha
This paper investigates several semiparametric estimators of the dispersion parameter in the analysis of over- or underdispersed count data when no likelihood is available. For estimating the dispersion parameter, we consider the double-extended quasi-likelihood (DEQL), the pseudo-likelihood and the optimal quadratic estimating (OQE) equations methods, and compare them through a simulation study with the maximum likelihood method, the method of moments and the extended quasi-likelihood. The simulation study shows that the estimator based on the DEQL has superior bias and efficiency properties for moderate and large sample sizes, while for small sample sizes the estimator based on the OQE equations outperforms the other estimators. Three real-life data sets arising in biostatistical practice are analyzed, and the findings from these analyses are quite similar to those of the simulation study.
Statistics in Medicine | 2014
Vivek Pradhan; Krishna K. Saha; Tathagata Banerjee; John C. Evans
Inference on the difference between two binomial proportions in the paired binomial setting is an important problem in many biomedical investigations. Tang et al. (2010, Statistics in Medicine) discussed six methods to construct confidence intervals (henceforth abbreviated as CIs) for the difference between two proportions in the paired binomial setting using the method of variance estimates recovery. In this article, we propose weighted profile likelihood-based CIs for the difference between proportions of a paired binomial distribution. Instead of the usual likelihood, we use a weighted likelihood that essentially adjusts the cell frequencies of a 2 × 2 table in the spirit of Agresti and Min (2005, Statistics in Medicine). We then conduct numerical studies to compare the performance of the proposed CIs with that of the intervals of Tang et al. and of Agresti and Min in terms of coverage probabilities and expected lengths. Our numerical study clearly indicates that the weighted profile likelihood-based intervals and the Jeffreys interval (cf. Tang et al.) are superior in terms of achieving the nominal level and are competitive in terms of expected lengths. Finally, we illustrate the use of the proposed CIs with real-life examples.
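For comparison with the weighted profile likelihood intervals, the simpler Agresti and Min style adjustment mentioned above can be sketched as a Wald interval computed after adding 0.5 to each cell of the paired 2 × 2 table. The cell labels, the function name and the truncation to [-1, 1] are our own conventions; this is not the paper's proposed method.

```python
import math
from statistics import NormalDist


def adjusted_wald_paired_diff(n11: int, n12: int, n21: int, n22: int,
                              level: float = 0.95, add: float = 0.5) -> tuple[float, float]:
    """Wald CI for p1 - p2 in a paired 2 x 2 table after adding `add` to every cell.

    n12 counts pairs positive on the first response only, n21 pairs positive on
    the second only; the concordant cells n11 and n22 enter only through n.
    """
    c11, c12, c21, c22 = (x + add for x in (n11, n12, n21, n22))
    n = c11 + c12 + c21 + c22
    p12, p21 = c12 / n, c21 / n
    diff = p12 - p21  # equals p1 - p2, since the concordant cell cancels
    se = math.sqrt((p12 + p21 - diff ** 2) / n)  # multinomial variance of the difference
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Truncation to [-1, 1] is a convenience added in this sketch.
    return max(-1.0, diff - q * se), min(1.0, diff + q * se)
```

Only the discordant cells drive the point estimate, which is why the adjustment matters most when n12 and n21 are small.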
Biometrics | 2014
Samiran Sinha; Krishna K. Saha; Suojin Wang
Missing covariate data often arise in biomedical studies, and analysis of such data that ignores subjects with incomplete information may lead to inefficient and possibly biased estimates. A great deal of attention has been paid to handling a single missing covariate or a monotone pattern of missing data when the missingness mechanism is missing at random. In this article, we propose a semiparametric method for handling non-monotone patterns of missing data. The proposed method relies on the assumption that the missingness mechanism of a variable does not depend on the missing variable itself but may depend on the other missing variables. This mechanism is somewhat less general than the completely non-ignorable mechanism but is sometimes more flexible than the missing at random mechanism, in which the missingness is allowed to depend only on the completely observed variables. The proposed approach is robust to misspecification of the distribution of the missing covariates, and the proposed mechanism helps to nullify (or reduce) the problems of non-identifiability that result from a non-ignorable missingness mechanism. The asymptotic properties of the proposed estimator are derived. Finite sample performance is assessed through simulation studies. Finally, for the purpose of illustration, we analyze an endometrial cancer dataset and a hip fracture dataset.
Journal of Statistical Computation and Simulation | 2011
Krishna K. Saha; S. R. Paul
Inference concerning the negative binomial dispersion parameter, denoted by c, is important in many biological and biomedical investigations. Properties of the maximum-likelihood estimator of c and its bias-corrected version have been studied extensively, mainly in terms of bias and efficiency [W.W. Piegorsch, Maximum likelihood estimation for the negative binomial dispersion parameter, Biometrics 46 (1990), pp. 863–867; S.J. Clark and J.N. Perry, Estimation of the negative binomial parameter κ by maximum quasi-likelihood, Biometrics 45 (1989), pp. 309–316; K.K. Saha and S.R. Paul, Bias corrected maximum likelihood estimator of the negative binomial dispersion parameter, Biometrics 61 (2005), pp. 179–185]. However, not much work has been done on the construction of confidence intervals (C.I.s) for c. The purpose of this paper is to study the behaviour of some C.I. procedures for c. We study, by simulations, three Wald type C.I. procedures based on the asymptotic distributions of the method of moments estimate (mme), the maximum-likelihood estimate (mle) and the bias-corrected mle (bcmle) [K.K. Saha and S.R. Paul, Bias corrected maximum likelihood estimator of the negative binomial dispersion parameter, Biometrics 61 (2005), pp. 179–185] of c. All three methods show serious under-coverage. We further study parametric bootstrap procedures based on these estimates of c, which significantly improve the coverage probabilities. The bootstrap C.I.s based on the mle (Boot-MLE method) and the bcmle (Boot-BCM method) have coverages that are significantly better (empirical coverage close to the nominal coverage) than the corresponding bootstrap C.I. based on the mme, especially for small sample sizes and highly over-dispersed data. However, simulation results on the lengths of the C.I.s show evidence that all three bootstrap procedures have larger average lengths. Therefore, for practical data analysis, the bootstrap C.I. Boot-MLE or Boot-BCM should be used, although the Boot-MLE method seems preferable to the Boot-BCM method in terms of both coverage and length. Furthermore, Boot-MLE needs less computation than Boot-BCM.
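A parametric bootstrap percentile interval for c can be sketched with the Python standard library alone. Assuming Var(Y) = mu + c mu^2, negative binomial samples are drawn as a gamma-Poisson mixture, c is re-estimated on each sample, and empirical percentiles of the replicates form the interval. For brevity the sketch bootstraps the method of moments estimate (the Boot-MME analogue); the abstract favours Boot-MLE, which would replace `mme_c` with a maximum-likelihood fit. All function names are hypothetical, and the simple Poisson sampler is only suitable for moderate means.

```python
import math
import random
import statistics
from typing import Optional, Sequence


def mme_c(y: Sequence[int]) -> float:
    """Method-of-moments estimate of c under Var(Y) = mu + c * mu**2."""
    mean = statistics.fmean(y)
    if mean == 0:
        raise ValueError("mean count is zero; c is undefined")
    return (statistics.variance(y) - mean) / mean ** 2


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication sampler; slow and inaccurate for very large lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def neg_binomial_draw(mu: float, c: float, rng: random.Random) -> int:
    # Gamma(shape=1/c, scale=c*mu) has mean mu and variance c*mu**2, so the
    # Poisson mixture has mean mu and variance mu + c*mu**2.
    return poisson_draw(rng.gammavariate(1.0 / c, c * mu), rng)


def bootstrap_ci_c(y: Sequence[int], n_boot: int = 2000, level: float = 0.95,
                   seed: Optional[int] = None) -> tuple[float, float, float]:
    """Return (c_hat, lower, upper) from a parametric bootstrap percentile interval."""
    rng = random.Random(seed)
    mu_hat, c_hat = statistics.fmean(y), mme_c(y)
    if c_hat <= 0:
        raise ValueError("the negative binomial bootstrap needs c_hat > 0")
    reps = []
    while len(reps) < n_boot:
        sample = [neg_binomial_draw(mu_hat, c_hat, rng) for _ in y]
        if any(sample):  # skip all-zero samples, where mme_c is undefined
            reps.append(mme_c(sample))
    reps.sort()
    tail = (1.0 - level) / 2.0
    lower = reps[math.floor(tail * (n_boot - 1))]
    upper = reps[math.ceil((1.0 - tail) * (n_boot - 1))]
    return c_hat, lower, upper
```

Passing a fixed `seed` makes the interval reproducible; in practice one would also check that the interval is stable as `n_boot` grows.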
Journal of Applied Statistics | 2014
Krishna K. Saha; Roger Bilisoly
In many clinical trials and epidemiological studies, comparing the mean count response of an exposed group to that of a control group is often of interest. This type of data is often over-dispersed with respect to Poisson variation, and previous studies usually compared groups using confidence intervals (CIs) of the difference between the two means. However, in some situations, especially when the means are small, interval estimation of the mean ratio (MR) is preferable. Moreover, Cox and Lewis [4] pointed out many other situations where the MR is more relevant than the difference of means. In this paper, we consider CI construction for the ratio of means between two treatments for over-dispersed Poisson data. We develop several CIs for this setting by hybridizing two separate CIs for the two individual means. Extensive simulations show that all hybrid-based CIs perform reasonably well in terms of coverage. However, the CIs based on the delta method using the logarithmic transformation perform better than the other intervals in the sense that they have slightly shorter interval lengths and show better balance of tail errors. These proposed CIs are illustrated with three real data examples.
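A basic delta-method interval for the mean ratio on the log scale can be sketched as follows. It uses Var(log ybar) ≈ s^2 / (n ybar^2), with the sample variance s^2 so that over-dispersion widens the interval, and then exponentiates the limits. This is an illustration under those assumptions with a hypothetical function name; the paper's hybrid intervals, which combine separate CIs for the two means, are not reproduced here.

```python
import math
import statistics
from statistics import NormalDist
from typing import Sequence


def log_ratio_ci(y1: Sequence[int], y2: Sequence[int], level: float = 0.95) -> tuple[float, float]:
    """Delta-method CI for mu1 / mu2 built on the log scale."""
    m1, m2 = statistics.fmean(y1), statistics.fmean(y2)
    if m1 <= 0 or m2 <= 0:
        raise ValueError("both sample means must be positive")
    # Var(log ybar) is approximately Var(ybar) / mu^2 = s^2 / (n * mu^2); the
    # sample variance s^2 absorbs any over-dispersion relative to the Poisson.
    v = (statistics.variance(y1) / (len(y1) * m1 ** 2)
         + statistics.variance(y2) / (len(y2) * m2 ** 2))
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    log_mr = math.log(m1 / m2)
    half_width = q * math.sqrt(v)
    return math.exp(log_mr - half_width), math.exp(log_mr + half_width)
```

Working on the log scale keeps both limits positive and makes the interval symmetric in log(MR), so swapping the two groups simply inverts the limits.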
Biometrical Journal | 2014
Krishna K. Saha
Over- and underdispersed count data arise in many biostatistical applications in which several treatment groups are compared in an experiment. In the analysis of several treatment groups of such count data, a very common inference problem is to test whether the data come from the same population. The usual practice for testing the homogeneity of several treatment groups in terms of means and dispersions is first to test the equality of the dispersions and then to test the equality of the means based on the result of that test. Previous studies reported procedures for testing the homogeneity of the means of several treatment groups under an assumption of equal or unequal dispersions. This article develops procedures for testing the validity of the equal or unequal dispersions assumption for several treatment groups in the analysis of over/underdispersed count data. We consider the C(α) test based on the maximum likelihood (ML) method using the negative binomial model, as well as three other C(α) tests based on the method of moments, extended quasi-likelihood, and double extended quasi-likelihood, using models specified by the first two moments of the counts. Monte Carlo simulations are then used to study the comparative behavior of these C(α) tests, along with the likelihood ratio test, in terms of size and power. The simulation results demonstrate that all four statistics hold the nominal level reasonably well in most of the data situations studied here, and that the C(α) test based on ML shows some edge in power over the other three C(α) tests. Finally, the methods are applied to data from biostatistical practice.