Tathagata Banerjee

Indian Institute of Management Ahmedabad

Publication

Featured research published by Tathagata Banerjee.

Journal of the American Statistical Association | 1998

Analysis of Two-Way Layout of Count Data Involving Multiple Counts in Each Cell

S. R. Paul; Tathagata Banerjee

Multiple counts may occur in each cell of an a × b two-way layout (balanced or unbalanced) of two fixed factors A and B. Standard log-linear model analysis based on a Poisson distribution assumption of the cell counts is not applicable here, because of the unbalanced nature of the table or because the Poisson distribution assumption is not valid. We develop C(α) tests for interaction and main effects assuming data to be Poisson distributed and also assuming that data within the cells have extra (over/under) dispersion beyond that explained by a Poisson distribution. For this we consider an extended negative binomial distribution and a semiparametric model using the quasi-likelihood. We show that in all situations the C(α) tests for interaction are of very simple forms. For C(α) tests for the main effect in presence of no interaction, such simplification is possible only under certain conditions. A score test for detecting extra dispersion in presence of interaction is also obtained and is of sim...

Computational Statistics & Data Analysis | 2001

Testing the equality of distributions of random vectors with categorical components

Dan Nettleton; Tathagata Banerjee

We develop a method for testing the equality of two or more distributions of random vectors with categorical components. We define a function that gives a distance between any two data vectors. Each observed data vector is linked with its nearest neighbor(s). The test statistic is the number of edges linking observations from different distributions. Inference is conditional on the number of observations from each distribution and the number of times each of the data vectors is observed in the pooled sample. Permutation testing and asymptotics are used to estimate the observed significance level.
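The procedure described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the Hamming distance, the rule of keeping all tied nearest neighbours, the choice to treat small cross-sample edge counts as evidence against equality, and the function names (`hamming`, `nn_edges`, `cross_edge_count`, `permutation_pvalue`) are all choices made here for the example. Shuffling the sample labels over the fixed pooled data keeps the sample sizes and the multiplicity of each data vector unchanged, which mirrors the conditioning mentioned in the abstract.

```python
import random


def hamming(u, v):
    """Number of components in which two categorical vectors differ (assumed distance)."""
    return sum(a != b for a, b in zip(u, v))


def nn_edges(data):
    """Undirected edges linking each observation to all of its nearest neighbours (ties kept)."""
    edges = set()
    n = len(data)
    for i in range(n):
        dists = [(hamming(data[i], data[j]), j) for j in range(n) if j != i]
        dmin = min(d for d, _ in dists)
        for d, j in dists:
            if d == dmin:
                edges.add((min(i, j), max(i, j)))
    return edges


def cross_edge_count(edges, labels):
    """Test statistic: number of edges joining observations from different samples."""
    return sum(labels[i] != labels[j] for i, j in edges)


def permutation_pvalue(data, labels, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value; few cross-sample edges count as extreme (assumption)."""
    rng = random.Random(seed)
    edges = nn_edges(data)  # the graph depends only on the pooled data, so build it once
    observed = cross_edge_count(edges, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign sample labels, keeping group sizes fixed
        if cross_edge_count(edges, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: two samples of three-component categorical vectors.
    data = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 1), (1, 1, 0)]
    labels = [0, 0, 0, 1, 1, 1]
    print(permutation_pvalue(data, labels))
```

With samples this small the permutation distribution is coarse, so the sketch is meant only to show the mechanics of the edge-count statistic.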

The Annals of Applied Statistics | 2008

Optimal factorial designs for cDNA microarray experiments

Tathagata Banerjee; Rahul Mukerjee

We consider cDNA microarray experiments when the cell populations have a factorial structure, and investigate the problem of their optimal designing under a baseline parametrization where the objects of interest differ from those under the more common orthogonal parametrization. First, analytical results are given for the 2 × 2 factorial. Since practical applications often involve a more complex factorial structure, we next explore general factorials and obtain a collection of optimal designs in the saturated, that is, most economic, case. This, in turn, is seen to yield an approach for finding optimal or efficient designs in the practically more important nearly saturated cases. Thereafter, the findings are extended to the more intricate situation where the underlying model incorporates dye-coloring effects, and the role of dye-swapping is critically examined.

Biometrical Journal | 2009

Analysis of Misclassified Correlated Binary Data Using a Multivariate Probit Model when Covariates are Subject to Measurement Error

Surupa Roy; Tathagata Banerjee

A multivariate probit model for correlated binary responses given the predictors of interest has been considered. Some of the responses are subject to classification errors and hence are not directly observable. Also measurements on some of the predictors are not available; instead the measurements on its surrogate are available. However, the conditional distribution of the unobservable predictors given the surrogate is completely specified. Models are proposed taking into account either or both of these sources of errors. Likelihood-based methodologies are proposed to fit these models. To ascertain the effect of ignoring classification errors and/or measurement error on the estimates of the regression and correlation parameters, a sensitivity study is carried out through simulation. Finally, the proposed methodology is illustrated through an example.

Statistical Methods in Medical Research | 2016

Binomial confidence intervals for testing non-inferiority or superiority: a practitioner’s dilemma

Vivek Pradhan; John C. Evans; Tathagata Banerjee

In testing for non-inferiority or superiority in a single arm study, the confidence interval of a single binomial proportion is frequently used. A number of such intervals are proposed in the literature and implemented in standard software packages. Unfortunately, use of different intervals leads to conflicting conclusions. Practitioners thus face a serious dilemma in deciding which one to depend on. Is there a way to resolve this dilemma? We address this question by investigating the performances of ten commonly used intervals of a single binomial proportion, in the light of two criteria, viz., coverage and expected length of the interval.
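The two criteria named in this abstract, coverage and expected length, can be computed exactly for a given sample size and true proportion by summing over all possible binomial outcomes. The sketch below is an illustration only: it uses the Wald and Wilson score intervals as two familiar examples, which is an assumption made here, since the abstract does not list the ten intervals studied. The function names and the fixed 95% normal quantile `Z95` are likewise choices made for this example.

```python
from math import comb, sqrt

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wald(x, n, z=Z95):
    """Wald interval: sample proportion plus or minus z standard errors, clipped to [0, 1]."""
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson(x, n, z=Z95):
    """Wilson score interval, clipped to [0, 1]."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def coverage_and_length(interval, n, p):
    """Exact coverage probability and expected length of an interval rule at true proportion p."""
    cover = 0.0
    length = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * p**x * (1 - p) ** (n - x)  # binomial probability of observing x
        lo, hi = interval(x, n)
        if lo <= p <= hi:
            cover += prob
        length += prob * (hi - lo)
    return cover, length


if __name__ == "__main__":
    # Hypothetical setting: 20 subjects in a single arm, true proportion 0.1.
    for name, rule in (("Wald", wald), ("Wilson", wilson)):
        print(name, coverage_and_length(rule, n=20, p=0.1))
```

Repeating the calculation over a grid of true proportions shows how the coverage of each rule oscillates around the nominal level, which is one way conflicting conclusions between intervals can arise.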

Computational Statistics & Data Analysis | 2006

Classification of pathological stage of prostate cancer patients using penalized splines

Tathagata Banerjee; Tapabrata Maiti; Pushpal Mukhopadhyay

We propose a penalized splines (P-splines) based method to predict the pathological stage of localized prostate cancer. A combination of prostate-specific antigen, Gleason histological score, and clinical stage from a cohort study of 834 prostate cancer patients is used to build the P-splines model. It turns out that the proposed methodology results in improved prediction of pathological stage compared to the usual logistic regression after removing a few outliers. The improvement is shown to be statistically significant. A receiver-operating characteristic (ROC) curve is drawn, and we show that the increase in area under the ROC curve over the commonly used logistic regression based classification method is also statistically significant.

Statistics in Medicine | 2014

Weighted profile likelihood‐based confidence interval for the difference between two proportions with paired binomial data

Vivek Pradhan; Krishna K. Saha; Tathagata Banerjee; John C. Evans

Inference on the difference between two binomial proportions in the paired binomial setting is often an important problem in many biomedical investigations. Tang et al. (2010, Statistics in Medicine) discussed six methods to construct confidence intervals (henceforth abbreviated as CIs) for the difference between two proportions in the paired binomial setting using the method of variance estimates recovery. In this article, we propose weighted profile likelihood-based CIs for the difference between proportions of a paired binomial distribution. However, instead of the usual likelihood, we use a weighted likelihood that essentially makes adjustments to the cell frequencies of a 2 × 2 table in the spirit of Agresti and Min (2005, Statistics in Medicine). We then conduct numerical studies to compare the performances of the proposed CIs with those of Tang et al. and Agresti and Min in terms of coverage probabilities and expected lengths. Our numerical study clearly indicates that the weighted profile likelihood-based intervals and the Jeffreys interval (cf. Tang et al.) are superior in terms of achieving the nominal level, and in terms of expected lengths, they are competitive. Finally, we illustrate the use of the proposed CIs with real-life examples.

Journal of Statistical Computation and Simulation | 2010

Analysis of mixed outcomes: misclassified binary responses and measurement error in covariates

Surupa Roy; Tathagata Banerjee

This paper considers regression models for mixed binary and continuous outcomes, when the true predictor is measured with error and the binary responses are subject to classification errors. The focus of the paper is to study the effects of these errors on the estimates of the model parameters and also to propose a model that incorporates both these errors. The proposed model results in a substantial improvement in the estimates as shown by extensive simulation studies.

Twin Research and Human Genetics | 2006

Hypotheses on the Effect of Cadmium on Glutathione Content of Red Blood Corpuscles

Sri N. Shekar; Tathagata Banerjee; Atanu Biswas

Previous studies have shown that glutathione (GSH), a tripeptide found in blood, is involved in protecting against toxins. Glutathione levels are known to drop in response to cadmium. Using 15 twin pairs, we modeled the effect of cadmium on glutathione levels. The heritability of glutathione content was 91%. The application of cadmium significantly reduced the mean level of GSH. However, this reduction in GSH was not due to additive genetic influences in our sample.

Journal of Statistical Planning and Inference | 2003

A new formulation of stress–strength reliability in a regression setup

Tathagata Banerjee; Atanu Biswas

Bayesian inference on stress–strength reliability is considered when the observations are binary in nature and the covariates affecting the stress and the strength of a component are observable. The posterior is evaluated using Gibbs sampling. The method is illustrated with a data set.