Publication


Featured research published by James E. Johndrow.


Annals of Statistics | 2017

Tensor decompositions and sparse log-linear models

James E. Johndrow; Anirban Bhattacharya; David B. Dunson

Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
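The reduced-rank tensor factorization the abstract refers to can be illustrated with a minimal sketch: a rank-k PARAFAC (nonnegative) factorization of a three-way probability tensor is exactly a latent-class model with k mixture components. The names `w`, `a`, `b`, `c` and the dimensions are ours for illustration, not the paper's notation.

```python
import numpy as np

# Rank-k PARAFAC factorization of a 3-way pmf:
#   P[i,j,l] = sum_h w[h] * a[h,i] * b[h,j] * c[h,l],
# i.e., a latent-class model with k mixture components.
rng = np.random.default_rng(1)
k, d = 2, (3, 4, 2)                  # 2 latent classes; category counts per variable

w = rng.dirichlet(np.ones(k))        # latent class weights
a = rng.dirichlet(np.ones(d[0]), k)  # per-class marginals, variable 1
b = rng.dirichlet(np.ones(d[1]), k)  # per-class marginals, variable 2
c = rng.dirichlet(np.ones(d[2]), k)  # per-class marginals, variable 3

P = np.einsum('h,hi,hj,hl->ijl', w, a, b, c)
print(P.sum())  # a valid pmf: entries are nonnegative and sum to 1
```

Because each factor row is a probability vector, the resulting tensor is automatically a valid joint pmf; the nonnegative rank of `P` is at most `k`.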


Journal of the American Statistical Association | 2018

MCMC for Imbalanced Categorical Data

James E. Johndrow; Aaron Smith; Natesh S. Pillai; David B. Dunson

Many modern applications collect highly imbalanced categorical data, with some categories relatively rare. Bayesian hierarchical models combat data sparsity by borrowing information, while also quantifying uncertainty. However, posterior computation presents a fundamental barrier to routine use; a single class of algorithms does not work well in all settings and practitioners waste time trying different types of Markov chain Monte Carlo (MCMC) approaches. This article was motivated by an application to quantitative advertising in which we encountered extremely poor computational performance for data augmentation MCMC algorithms but obtained excellent performance for adaptive Metropolis. To obtain a deeper understanding of this behavior, we derive theoretical results on the computational complexity of commonly used data augmentation algorithms and the Random Walk Metropolis algorithm for highly imbalanced binary data. In this regime, our results show computational complexity of Metropolis is logarithmic in sample size, while data augmentation is polynomial in sample size. The root cause of this poor performance of data augmentation is a discrepancy between the rates at which the target density and MCMC step sizes concentrate. Our methods also show that MCMC algorithms that exhibit a similar discrepancy will fail in large samples—a result with substantial practical impact. Supplementary materials for this article are available online.
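As a toy illustration of the regime the abstract studies, here is a minimal random walk Metropolis sampler for an intercept-only logistic model with rare events. All settings and variable names are ours, not the paper's; in particular the step size is simply tuned by hand to the posterior's scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated highly imbalanced binary data: intercept-only logistic model.
n = 10_000
beta_true = -6.0                                   # rare-event regime
y = rng.binomial(1, 1 / (1 + np.exp(-beta_true)), size=n)
s = y.sum()                                        # number of rare successes

def log_post(beta):
    # Log-likelihood of the intercept-only logistic model, flat prior.
    p = 1 / (1 + np.exp(-beta))
    return s * np.log(p) + (n - s) * np.log(1 - p)

# Random walk Metropolis with a step size on the order of the posterior's
# width; matching these scales is the issue the paper's analysis turns on.
beta, step, draws = -5.0, 0.25, []
for _ in range(5_000):
    prop = beta + step * rng.normal()
    if np.log(rng.uniform()) < log_post(prop) - log_post(beta):
        beta = prop
    draws.append(beta)

print(np.mean(draws[1000:]))  # posterior mean, close to beta_true
```

A data augmentation sampler for the same model would update latent variables for all n observations each iteration; the abstract's complexity results explain why that route degrades as n grows while this one does not.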


Biometrika | 2018

Theoretical limits of microclustering for record linkage

James E. Johndrow; Kristian Lum; David B. Dunson

There has been substantial recent interest in record linkage, where one attempts to group the records pertaining to the same entities from one or more large databases that lack unique identifiers. This can be viewed as a type of microclustering, with few observations per cluster and a very large number of clusters. We show that the problem is fundamentally hard from a theoretical perspective and, even in idealized cases, accurate entity resolution is effectively impossible unless the number of entities is small relative to the number of records and/or the separation between records from different entities is extremely large. These results suggest conservatism in interpretation of the results of record linkage, support collection of additional data to more accurately disambiguate the entities, and motivate a focus on coarser inference. For example, results from a simulation study suggest that sometimes one may obtain accurate results for population size estimation even when fine-scale entity resolution is inaccurate.
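The separation condition in the abstract can be seen in a small simulation of our own devising (it is not the paper's simulation study): entities sit on a one-dimensional grid, each emits a few noisy records, and each record is linked to the nearest entity. Linkage accuracy collapses once entity spacing shrinks relative to the record noise, even though the number of entities is unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)

def linkage_accuracy(n_entities, sep, records_per=2, noise=1.0):
    # Entities on a 1-D grid with spacing `sep`; each emits `records_per`
    # noisy records. Link each record to the nearest entity centroid and
    # report the fraction of records linked to their true entity.
    centers = sep * np.arange(n_entities)
    labels = np.repeat(np.arange(n_entities), records_per)
    records = centers[labels] + noise * rng.normal(size=labels.size)
    assigned = np.abs(records[:, None] - centers[None, :]).argmin(axis=1)
    return (assigned == labels).mean()

# Accurate resolution requires separation large relative to the noise.
print(linkage_accuracy(1000, sep=10.0))   # well separated: near-perfect
print(linkage_accuracy(1000, sep=0.5))    # crowded: resolution degrades badly
```

Even in the crowded regime, aggregate quantities such as the number of entities can remain estimable, which is the coarser inference the abstract advocates.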


Bayesian Analysis | 2018

Optimal Gaussian Approximations to the Posterior for Log-Linear Models with Diaconis–Ylvisaker Priors

James E. Johndrow; Anirban Bhattacharya

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. Here we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.


arXiv: Computation | 2015

Approximations of Markov Chains and Bayesian Inference

James E. Johndrow; Jonathan C. Mattingly; Sayan Mukherjee; David B. Dunson


International Conference on Artificial Intelligence and Statistics | 2013

Diagonal Orthant Multinomial Probit Models

James E. Johndrow; David B. Dunson; Kristian Lum


arXiv: Statistics Theory | 2016

Inefficiency of Data Augmentation for Large Sample Imbalanced Data

James E. Johndrow; Aaron Smith; Natesh S. Pillai; David B. Dunson


arXiv: Machine Learning | 2016

A statistical framework for fair predictive algorithms

Kristian Lum; James E. Johndrow


Archive | 2017

An algorithm for removing sensitive information: application to race-independent recidivism prediction

James E. Johndrow; Kristian Lum


Journal of Evolutionary Biology | 2015

Genetic diversity does not explain variation in extra-pair paternity in multiple populations of a songbird

Irene A. Liu; James E. Johndrow; James Abe; Stefan Lüpold; Ken Yasukawa; David F. Westneat; Steve Nowicki

Collaboration


Dive into James E. Johndrow's collaborations.

Top Co-Authors
Leo L. Duan (University of Cincinnati)