Publication


Featured research published by Jiantao Jiao.


IEEE Transactions on Information Theory | 2013

Universal Estimation of Directed Information

Jiantao Jiao; Haim H. Permuter; Lei Zhao; Young-Han Kim; Tsachy Weissman

Four estimators of the directed information rate between a pair of jointly stationary ergodic finite-alphabet processes are proposed, based on universal probability assignments. The first one is a Shannon-McMillan-Breiman-type estimator, similar to those used by Verdú in 2005 and Cai in 2006 for estimation of other information measures. We show the almost sure and L1 convergence properties of the estimator for any underlying universal probability assignment. The other three estimators map universal probability assignments to different functionals, each exhibiting relative merits such as smoothness, nonnegativity, and boundedness. We establish the consistency of these estimators in almost sure and L1 senses, and derive near-optimal rates of convergence in the minimax sense under mild conditions. These estimators carry over directly to estimating other information measures of stationary ergodic finite-alphabet processes, such as entropy rate and mutual information rate, with near-optimal performance and provide alternatives to classical approaches in the existing literature. Guided by these theoretical results, the proposed estimators are implemented using the context-tree weighting algorithm as the universal probability assignment. Experiments on synthetic and real data are presented, demonstrating the potential of the proposed schemes in practice and the utility of directed information estimation in detecting and measuring causal influence and delay.
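
As a rough illustration of what a plug-in directed-information estimate looks like (not the paper's CTW-based universal estimators), the sketch below assumes a hypothetical one-symbol memory and estimates the rate of I(X → Y) as H(Y_i | Y_{i-1}) − H(Y_i | Y_{i-1}, X_{i-1}, X_i) from empirical counts; the function names and toy sequences are made up for illustration.

```python
from collections import Counter
from math import log2
import random

def cond_entropy(pairs):
    """Empirical H(target | context) in bits from (context, target) pairs."""
    joint = Counter(pairs)
    ctx = Counter(c for c, _ in pairs)
    n = len(pairs)
    return -sum(cnt / n * log2(cnt / ctx[c]) for (c, _), cnt in joint.items())

def plugin_directed_info_rate(x, y):
    """Crude plug-in: H(Y_i | Y_{i-1}) - H(Y_i | Y_{i-1}, X_{i-1}, X_i)."""
    past_y = [((y[i - 1],), y[i]) for i in range(1, len(y))]
    past_y_and_x = [((y[i - 1], x[i - 1], x[i]), y[i]) for i in range(1, len(y))]
    return max(cond_entropy(past_y) - cond_entropy(past_y_and_x), 0.0)

random.seed(0)
x = [random.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]                                  # Y copies the previous X symbol
print(plugin_directed_info_rate(x, y))            # close to 1 bit: X causally drives Y
print(plugin_directed_info_rate(y, x))            # close to 0 bits: no reverse influence
```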


IEEE Transactions on Information Theory | 2017

Maximum Likelihood Estimation of Functionals of Discrete Distributions

Jiantao Jiao; Kartik Venkat; Yanjun Han; Tsachy Weissman



IEEE Transactions on Information Theory | 2015

Justification of Logarithmic Loss via the Benefit of Side Information

Jiantao Jiao; Thomas A. Courtade; Kartik Venkat; Tsachy Weissman

We consider a natural measure of relevance: the reduction in optimal prediction risk in the presence of side information. For any given loss function, this relevance measure captures the benefit of side information for performing inference on a random variable under this loss function. When such a measure satisfies a natural data processing property, and the random variable of interest has alphabet size greater than two, we show that it is uniquely characterized by the mutual information, and the corresponding loss function coincides with logarithmic loss. In doing so, our work provides a new characterization of mutual information, and justifies its use as a measure of relevance. When the alphabet is binary, we characterize the only admissible forms the measure of relevance can assume while obeying the specified data processing property. Our results naturally extend to measuring the causal influence between stochastic processes, where we unify different causality measures in the literature as instantiations of directed information.
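
A quick numerical check of the identity underlying this characterization: under logarithmic loss, the reduction in optimal prediction risk afforded by side information Y equals H(X) − H(X|Y) = I(X; Y). The joint distribution below is an arbitrary toy example, not taken from the paper.

```python
import numpy as np

p_xy = np.array([[0.30, 0.10, 0.05],
                 [0.05, 0.20, 0.30]])   # joint distribution of (X, Y); rows index X

def H(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

risk_no_side_info = H(p_x)                                   # best log-loss risk for X alone
risk_with_side_info = sum(p_y[j] * H(p_xy[:, j] / p_y[j])    # H(X | Y)
                          for j in range(p_xy.shape[1]))
mutual_information = H(p_x) + H(p_y) - H(p_xy.ravel())

print(risk_no_side_info - risk_with_side_info)   # reduction in optimal risk
print(mutual_information)                        # the same number, I(X; Y) in bits
```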


International Symposium on Information Theory | 2016

Minimax estimation of the L1 distance

Jiantao Jiao; Yanjun Han; Tsachy Weissman

We consider the problem of estimating the L1 distance between two discrete probability measures P and Q from empirical data in a nonasymptotic and large alphabet setting. We construct minimax rate-optimal estimators for L1(P,Q) when Q is either known or unknown, and show that the performance of the optimal estimators with n samples is essentially that of the Maximum Likelihood Estimators (MLE) with n ln n samples. Hence, we demonstrate that the effective sample size enlargement phenomenon, discovered and discussed in Jiao et al. (2015), holds for this problem as well. However, the construction of optimal estimators for L1(P,Q) requires new techniques and insights outside the scope of the Approximation methodology of functional estimation in Jiao et al. (2015).
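
For concreteness, here is a minimal sketch of the MLE baseline the abstract compares against: plug the two empirical distributions into the L1 distance. The minimax rate-optimal estimators of the paper additionally use best polynomial approximation on small probabilities, which is not reproduced here; the sample sizes and distributions below are arbitrary.

```python
import random
from collections import Counter

def empirical_l1(samples_p, samples_q):
    """Plug-in estimate of L1(P, Q) from two sample sets."""
    cp, cq = Counter(samples_p), Counter(samples_q)
    n_p, n_q = len(samples_p), len(samples_q)
    support = set(cp) | set(cq)
    return sum(abs(cp[s] / n_p - cq[s] / n_q) for s in support)

random.seed(1)
p_samples = random.choices([0, 1, 2], weights=[0.6, 0.3, 0.1], k=2000)
q_samples = random.choices([0, 1, 2], weights=[0.2, 0.3, 0.5], k=2000)
print(empirical_l1(p_samples, q_samples))   # true L1(P, Q) = 0.8 for these weights
```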


IEEE Transactions on Information Theory | 2015

Minimax Estimation of Discrete Distributions Under ℓ1 Loss

Yanjun Han; Jiantao Jiao; Tsachy Weissman

We consider the problem of discrete distribution estimation under ℓ1 loss. We provide tight upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator), and the minimax risk, in regimes where the support size S may grow with the number of observations n. We show that among distributions with bounded entropy H, the asymptotic maximum risk for the empirical distribution is 2H/ln n, while the asymptotic minimax risk is H/ln n. Moreover, we show that a hard-thresholding estimator, oblivious to the unknown upper bound H, is essentially minimax. However, if we constrain the estimates to lie in the simplex of probability distributions, then the asymptotic minimax risk is again 2H/ln n. We draw connections between our work and the literature on density estimation, entropy estimation, total variation distance (ℓ1 divergence) estimation, joint distribution estimation in stochastic processes, normal mean estimation, and adaptive estimation.
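
A small simulation sketch of the quantity being bounded, the ℓ1 risk E‖P̂ − P‖1 of the empirical distribution; the uniform P, support size, and sample size are arbitrary choices, and the hard-thresholding estimator analyzed in the paper is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
S, n, trials = 1000, 5000, 200          # support size, sample size, Monte Carlo runs
P = np.full(S, 1.0 / S)                 # hypothetical uniform distribution

risks = []
for _ in range(trials):
    counts = rng.multinomial(n, P)
    risks.append(np.abs(counts / n - P).sum())   # l1 loss of the empirical distribution

print(np.mean(risks))                   # Monte Carlo estimate of the l1 risk
```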


IEEE Transactions on Information Theory | 2014

Information Measures: The Curious Case of the Binary Alphabet

Jiantao Jiao; Thomas A. Courtade; Albert No; Kartik Venkat; Tsachy Weissman

Four problems related to information divergence measures defined on finite alphabets are considered. In three of the cases we consider, we illustrate a contrast that arises between the binary-alphabet and larger-alphabet settings. This is surprising in some instances, since characterizations for the larger-alphabet settings do not generalize their binary-alphabet counterparts. In particular, we show that f-divergences are not the unique decomposable divergences on binary alphabets that satisfy the data processing inequality, thereby clarifying claims that have previously appeared in the literature. We also show that Kullback-Leibler (KL) divergence is the unique Bregman divergence that is also an f-divergence for any alphabet size. We show that KL divergence is the unique Bregman divergence that is invariant to statistically sufficient transformations of the data, even when nondecomposable divergences are considered. Like some of the problems we consider, this result holds only when the alphabet size is at least three.
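
A numerical illustration of the object at the center of these results: KL divergence is simultaneously an f-divergence (with f(t) = t log t) and a Bregman divergence (with the negative Shannon entropy as generator). The probability vectors below are arbitrary.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

kl = np.sum(p * np.log(p / q))                       # KL divergence directly

f = lambda t: t * np.log(t)                          # f-divergence with f(t) = t log t
kl_as_f_divergence = np.sum(q * f(p / q))

phi = lambda v: np.sum(v * np.log(v))                # Bregman generator: negative entropy
grad_phi = lambda v: np.log(v) + 1.0
kl_as_bregman = phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

print(kl, kl_as_f_divergence, kl_as_bregman)         # all three coincide
```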


International Symposium on Information Theory | 2015

Adaptive estimation of Shannon entropy

Yanjun Han; Jiantao Jiao; Tsachy Weissman

We consider estimating the Shannon entropy of a discrete distribution P from n i.i.d. samples. Recently, Jiao, Venkat, Han, and Weissman (JVHW), and Wu and Yang constructed approximation-theoretic estimators that achieve the minimax L2 rates in estimating entropy. Their estimators are consistent given n ≫ S/ln S samples, where S is the support size, which is the best possible sample complexity. In contrast, the Maximum Likelihood Estimator (MLE), which is the empirical entropy, requires n ≫ S samples. In the present paper we significantly refine the minimax results of existing work. To alleviate the pessimism of minimaxity, we adopt the adaptive estimation framework, and show that the JVHW estimator is an adaptive estimator, i.e., it achieves the minimax rates simultaneously over a nested sequence of subsets of distributions P, without knowing the support size S or which subset P lies in. We also characterize the maximum risk of the MLE over this nested sequence, and show, for every subset in the sequence, that the performance of the minimax rate-optimal estimator with n samples is essentially that of the MLE with n ln n samples, thereby further substantiating the generality of the “effective sample size enlargement” phenomenon discovered by Jiao, Venkat, Han, and Weissman. We provide a “pointwise” explanation of the sample size enlargement phenomenon, which states that for sufficiently small probabilities, the bias function of the JVHW estimator with n samples is nearly that of the MLE with n ln n samples.
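
A minimal sketch of the MLE (plug-in) entropy estimator the abstract compares against, together with the classical Miller-Madow bias correction for contrast; the JVHW estimator's polynomial-approximation component is not reproduced here, and the toy distribution is an arbitrary choice.

```python
import random
from collections import Counter
from math import log

def plugin_entropy(samples):
    """MLE (plug-in) entropy estimate in nats."""
    n = len(samples)
    return -sum(c / n * log(c / n) for c in Counter(samples).values())

def miller_madow_entropy(samples):
    """Plug-in estimate plus the classical first-order bias correction."""
    n, k = len(samples), len(set(samples))
    return plugin_entropy(samples) + (k - 1) / (2 * n)

random.seed(0)
data = random.choices(range(100), k=500)          # support size comparable to n
print(plugin_entropy(data), miller_madow_entropy(data), log(100))  # truth is log(100)
```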


International Symposium on Information Theory | 2015

Does Dirichlet prior smoothing solve the Shannon entropy estimation problem?

Yanjun Han; Jiantao Jiao; Tsachy Weissman

The Dirichlet prior is widely used in estimating discrete distributions and functionals of discrete distributions. In terms of Shannon entropy estimation, one approach is to plug the Dirichlet prior smoothed distribution into the entropy functional, while the other one is to calculate the Bayes estimator for entropy under the Dirichlet prior for squared error, which is the conditional expectation. We show that in general they do not improve over the maximum likelihood estimator, which plugs the empirical distribution into the entropy functional. No matter how we tune the parameters in the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation, as recently characterized by Jiao, Venkat, Han, and Weissman [1], and Wu and Yang [2]. The performance of the minimax rate-optimal estimator with n samples is essentially at least as good as that of the Dirichlet smoothed entropy estimators with n ln n samples. We harness the theory of approximation using positive linear operators for analyzing the bias of plug-in estimators for general functionals under arbitrary statistical models, thereby further consolidating the interplay between these two fields, which was thoroughly exploited by Jiao, Venkat, Han, and Weissman [3] in estimating various functionals of discrete distributions. We establish new results in approximation theory, and apply them to analyze the bias of the Dirichlet prior smoothed plug-in entropy estimator. This interplay between bias analysis and approximation theory is of relevance and consequence far beyond the specific problem setting in this paper.
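
A minimal sketch of the first approach described above: smooth the empirical counts with a symmetric Dirichlet(a) prior and plug the smoothed distribution into the entropy functional, compared against the raw MLE plug-in. The support size, sample size, and prior weight a are arbitrary illustrative choices.

```python
import numpy as np

def mle_plugin_entropy(counts):
    counts = np.asarray(counts, dtype=float)
    p_hat = counts[counts > 0] / counts.sum()
    return -np.sum(p_hat * np.log(p_hat))                   # nats

def dirichlet_smoothed_plugin_entropy(counts, a=0.5):
    counts = np.asarray(counts, dtype=float)
    n, S = counts.sum(), counts.size
    p_smooth = (counts + a) / (n + S * a)                   # Dirichlet(a) posterior mean
    return -np.sum(p_smooth * np.log(p_smooth))

rng = np.random.default_rng(0)
S, n = 2000, 1000                                           # large alphabet, few samples
P = rng.dirichlet(np.ones(S))
counts = rng.multinomial(n, P)
true_entropy = -np.sum(P[P > 0] * np.log(P[P > 0]))
# Compare the two plug-in estimates with the true entropy in this regime.
print(true_entropy, mle_plugin_entropy(counts), dirichlet_smoothed_plugin_entropy(counts))
```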


International Symposium on Information Theory | 2014

Relations between information and estimation in scalar Lévy channels

Jiantao Jiao; Kartik Venkat; Tsachy Weissman

Fundamental relations between information and estimation have been established in the literature for the scalar Gaussian and Poisson channels. In this work, we demonstrate that such relations hold for a much larger class of observation models. We introduce the natural family of scalar Lévy channels where the distribution of the output conditioned on the input is infinitely divisible. For Lévy channels, we establish new representations relating the mutual information between the channel input and output to an optimal estimation loss, thereby unifying and considerably extending results from the Gaussian and Poissonian settings. We demonstrate the richness of our results by working out two examples of Lévy channels, namely the Gamma channel and the Negative Binomial channel, with corresponding relations between information and estimation. Extensions to the setting of mismatched estimation are also presented.
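
For orientation, here is a quick numerical check of the classical scalar Gaussian information-estimation relation that this line of work generalizes: with X ~ N(0, 1) and Y = sqrt(snr)·X + N, the derivative of I(X; Y) with respect to snr equals mmse(snr)/2. The closed forms below are the standard ones for Gaussian input; nothing Lévy-specific from the paper is implemented.

```python
import numpy as np

def mutual_information(snr):
    return 0.5 * np.log(1.0 + snr)      # I(X; Y) in nats for Gaussian input

def mmse(snr):
    return 1.0 / (1.0 + snr)            # minimum mean-square error of estimating X from Y

snr, eps = 2.0, 1e-6
numerical_derivative = (mutual_information(snr + eps)
                        - mutual_information(snr - eps)) / (2 * eps)
print(numerical_derivative, 0.5 * mmse(snr))   # both approximately 1/6
```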


Allerton Conference on Communication, Control, and Computing | 2014

An extremal inequality for long Markov chains

Thomas A. Courtade; Jiantao Jiao

Let X, Y be jointly Gaussian vectors, and consider random variables U, V that satisfy the Markov constraint U - X - Y - V. We prove an extremal inequality relating the mutual informations between all six pairs of random variables from the set (U, X, Y, V). As a first application, we show that the rate region for the two-encoder quadratic Gaussian source coding problem follows as an immediate corollary of the extremal inequality. In a second application, we establish the rate region for a vector-Gaussian source coding problem where Löwner-John ellipsoids are approximated based on rate-constrained descriptions of the data.
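
A small worked example of the Gaussian quantities this result manipulates: mutual information between jointly Gaussian scalars, and a data-processing sanity check along a Markov chain U - X - Y - V obtained by adding independent noise. The correlation and noise variances are arbitrary, and this only illustrates the setting, not the extremal inequality itself.

```python
import numpy as np

def gaussian_mi(rho):
    """I(A; B) in nats for jointly Gaussian scalars with correlation rho."""
    return -0.5 * np.log(1.0 - rho ** 2)

rho = 0.8                                  # correlation of the pair (X, Y)
var_u_noise, var_v_noise = 0.5, 1.0        # U = X + noise, V = Y + noise

rho_uv = rho / np.sqrt((1 + var_u_noise) * (1 + var_v_noise))
print(gaussian_mi(rho))                    # I(X; Y)
print(gaussian_mi(rho_uv))                 # I(U; V) <= I(X; Y), as data processing requires
```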

Collaboration


Dive into Jiantao Jiao's collaborations.

Top Co-Authors

Robert D. Nowak (University of Wisconsin-Madison)

Young-Han Kim (University of California)