Shao-Lun Huang
Massachusetts Institute of Technology
Publication
Featured research published by Shao-Lun Huang.
International Symposium on Information Theory | 2012
Shao-Lun Huang; Lizhong Zheng
Many network information theory problems face a common difficulty: single-letterization. We argue that this is due to the lack of a geometric structure on the space of probability distributions. In this paper, we develop such a structure by assuming that the distributions of interest are close to each other. Under this assumption, the K-L divergence reduces to the squared Euclidean metric in a Euclidean space. Moreover, we construct notions of coordinates and inner products, which facilitate solving communication problems. We also present applications of this approach to the point-to-point channel and the general broadcast channel, demonstrating how our technique simplifies information theory problems.
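To make the local reduction concrete, here is the standard second-order expansion this geometry rests on (a sketch in our own notation, not taken verbatim from the paper): for a perturbation P = Q + \epsilon J of a reference distribution Q, with \sum_x J(x) = 0,

    D(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)} = \frac{\epsilon^2}{2} \sum_x \frac{J(x)^2}{Q(x)} + o(\epsilon^2).

Writing L(x) = J(x)/\sqrt{Q(x)}, this equals (\epsilon^2/2)\,\|L\|^2, so K-L divergence between nearby distributions behaves like a squared Euclidean distance in the coordinates L.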
IEEE Transactions on Information Theory | 2013
Emmanuel Abbe; Shao-Lun Huang; Emre Telatar
It is conjectured that the covariance matrices minimizing the outage probability under a power constraint for multiple-input multiple-output channels with Gaussian fading are diagonal with either zeros or constant values on the diagonal. In the multiple-input single-output (MISO) setting, this is equivalent to the conjecture that the Gaussian quadratic forms having largest tail probability correspond to such diagonal matrices. This paper provides a proof of the conjecture in this MISO setting.
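A hedged formalization of the MISO statement (the notation and normalization here are ours, not the paper's): let h ~ N(0, I_n) and let Q range over positive semidefinite matrices with tr(Q) = 1. The claim is that the outage probability P(h^T Q h < t) is minimized by a matrix of the form Q = (1/k) diag(1, ..., 1, 0, ..., 0) with k equal nonzero entries, so the extremal Gaussian quadratic forms are the normalized partial sums (1/k) \sum_{i=1}^{k} h_i^2.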
Allerton Conference on Communication, Control, and Computing | 2015
Anuran Makur; Fabian Kozynski; Shao-Lun Huang; Lizhong Zheng
The Hirschfeld-Gebelein-Rényi maximal correlation is a well-known measure of statistical dependence between two (possibly categorical) random variables. In inference problems, the maximal correlation functions can be viewed as so-called features of observed data that carry the largest amount of information about some latent variables. These features are in general non-linear functions, and are particularly useful in processing high-dimensional observed data. The alternating conditional expectations (ACE) algorithm is an efficient way to compute these maximal correlation functions. In this paper, we use an information theoretic approach to interpret the ACE algorithm as computing the singular value decomposition of a linear map between spaces of probability distributions. With this approach, we demonstrate the information theoretic optimality of the ACE algorithm, analyze its convergence rate and sample complexity, and finally generalize it to compute multiple pairs of correlation functions from samples.
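A minimal sketch of the ACE iteration for paired samples of two categorical variables (our own implementation of the textbook procedure, not the paper's code; the function name and interface are ours):

    import numpy as np

    def ace_maximal_correlation(x, y, n_iter=100, seed=0):
        # Estimate the HGR maximal correlation from paired categorical
        # samples x, y (1-D arrays) by alternating conditional expectations.
        rng = np.random.default_rng(seed)
        xs, x_idx = np.unique(x, return_inverse=True)
        ys, y_idx = np.unique(y, return_inverse=True)
        g = rng.standard_normal(len(ys))        # random initial feature of Y
        for _ in range(n_iter):
            # f(x) = E[g(Y) | X = x], standardized under the empirical P_X
            f = np.array([g[y_idx[x_idx == i]].mean() for i in range(len(xs))])
            f = (f - f[x_idx].mean()) / f[x_idx].std()
            # g(y) = E[f(X) | Y = y], standardized under the empirical P_Y
            g = np.array([f[x_idx[y_idx == j]].mean() for j in range(len(ys))])
            g = (g - g[y_idx].mean()) / g[y_idx].std()
        return float(np.mean(f[x_idx] * g[y_idx])), f, g

Each iteration is a power-iteration step on an underlying linear map, which is exactly what the SVD interpretation in the paper makes precise.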
IEEE Journal on Selected Areas in Communications | 2015
Kwang-Cheng Chen; Shao-Lun Huang; Lizhong Zheng; H. Vincent Poor
Widespread use of the Internet and social networks has driven the generation of big data, which is proving to be useful in a number of applications. To deal with explosively growing amounts of data, data analytics has emerged as a critical technology related to computing, signal processing, and information networking. In this paper, a formalism is considered in which data are modeled as a generalized social network, and communication theory and information theory are thereby extended to data analytics. First, the creation of an equalizer to optimize information transfer between two data variables is considered, and financial data are used to demonstrate the advantages of this approach. Then, an information coupling approach based on information geometry is applied for dimensionality reduction, with a pattern recognition example to illustrate the effectiveness of this formalism. These initial trials suggest the potential of communication theoretic data analytics for a wide range of applications.
Allerton Conference on Communication, Control, and Computing | 2014
Shao-Lun Huang; Anuran Makur; Fabian Kozynski; Lizhong Zheng
In this paper, we study how information can be conveyed through a noisy channel and extracted efficiently in scenarios and applications where the order in which symbols are observed does not carry any useful information. In such cases, the information-carrying objects are the empirical distributions of the transmitted and received symbol sequences. We develop a local geometric structure and a new coordinate system for the space of distributions. With this approach, we can decompose the computation of the posterior distribution of the data into a sequence of score functions with decreasing information volume. Thus, when our goal is not to recover the entire data set but only to detect certain features of it, we only need to compute the first few scores, which greatly simplifies the problem. We demonstrate the use of our technique with image processing examples based on graphical models.
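One concrete way to obtain such score functions from an empirical joint distribution (our construction, following the local-geometry approach of the related papers rather than this paper's exact notation): form the matrix B(x, y) = P(x, y)/\sqrt{P_X(x) P_Y(y)} and read the scores off its singular vectors, ordered by decreasing singular value.

    import numpy as np

    def score_functions(joint):
        # 'joint' is an empirical joint pmf P(x, y) as a 2-D array.
        px = joint.sum(axis=1)
        py = joint.sum(axis=0)
        B = joint / np.sqrt(np.outer(px, py))   # divergence transition matrix
        U, s, Vt = np.linalg.svd(B)
        # The top singular vector is sqrt(P_X) (singular value 1) and carries
        # no information; each remaining U[:, i] / sqrt(P_X) is a zero-mean,
        # unit-variance score function of X with correlation s[i].
        scores = U[:, 1:] / np.sqrt(px)[:, None]
        return s[1:], scores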
International Symposium on Information Theory | 2016
I-Hsiang Wang; Shao-Lun Huang; Kuan-Yun Lee; Kwang-Cheng Chen
The problems of extracting information from a data set via histogram queries or arithmetic mean queries are considered. We first show that the fundamental limit on the number of histogram queries, m, so that the entire data set of size n can be extracted losslessly, is m = Θ(n/log n), sub-linear in the size of the data set. For the lower bound (converse), we use standard arguments based on simple counting. For the upper bound (achievability), we propose two query mechanisms. The first mechanism is random sampling, where in each query the items to be included in the queried subset are selected uniformly at random. With random sampling, it is shown that the entire data set can be extracted with vanishing error probability using O(n/log n) queries. The second is a non-adaptive deterministic algorithm, with which it is shown that the entire data set can be extracted exactly (no error) using O(n/log n) queries. We then extend the results to arithmetic mean queries, and show that for data sets taking values in a real-valued finite arithmetic progression, the fundamental limit on the number of arithmetic mean queries to extract the entire data set is also Θ(n/log n).
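A sketch of the counting argument behind the converse (our reconstruction, treating the alphabet \mathcal{X} as fixed): a histogram query on a subset of at most n items has at most (n+1)^{|\mathcal{X}|-1} possible answers, so m queries can jointly distinguish at most (n+1)^{m(|\mathcal{X}|-1)} data sets, while there are |\mathcal{X}|^n data sets of size n. Lossless extraction therefore requires

    (n+1)^{m(|\mathcal{X}|-1)} \ge |\mathcal{X}|^n, \quad\text{i.e.,}\quad m \ge \frac{n \log |\mathcal{X}|}{(|\mathcal{X}|-1)\log(n+1)} = \Omega\!\left(\frac{n}{\log n}\right).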
International Conference on Communications | 2015
Ali Makhdoumi; Shao-Lun Huang; Muriel Médard; Yury Polyanskiy
With the boom of big data, traditional source coding techniques face a common obstacle: they cannot efficiently decode only a small portion of the information. In this paper, we aim to resolve this difficulty by introducing a specific type of source coding scheme called locally decodable source coding (LDSC). Rigorously, an LDSC is capable of recovering an arbitrary bit of the unencoded message from its encoded version by feeding only a small number of encoded symbols to the decoder, and we call the decoder t-local if only t encoded symbols are required. We consider both almost lossless (block error) and lossy (bit error) cases for LDSC. First, we show that with a linear encoder and a decoder of bounded locality, the reliable compression rate cannot be less than one. More importantly, we show that even with a general encoder and 2-local decoders (t = 2), the rate of LDSC is still one. In contrast, the achievability bounds for almost lossless and lossy compression with excess distortion suggest that the optimal compression rate is achievable when O(log n) encoded symbols are queried by the decoder, where n is the block length. We also show that the rate-distortion bound is achievable when the number of queries scales with n, with a bound on the rate in the finite-length regime. Although the achievability bounds are simply based on the concatenation of code blocks, they outperform the existing bounds in the succinct data structures literature.
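The block-concatenation idea behind the achievability bounds can be sketched as follows (a toy construction of our own; the names are illustrative, and a real scheme would replace the trivial encode_block with an optimal lossless block code): split the message into blocks of length b = Θ(log n) and compress each block independently, so that recovering any single symbol requires reading only the one codeword covering it.

    def ldsc_encode(message, block_len, encode_block):
        # Concatenation: compress each length-b block independently.
        return [encode_block(message[i:i + block_len])
                for i in range(0, len(message), block_len)]

    def ldsc_decode_symbol(codewords, i, block_len, decode_block):
        # Local decoding: touch exactly one codeword to recover message[i].
        return decode_block(codewords[i // block_len])[i % block_len]

    def encode_block(block):
        # Toy (uncompressed) block code packing bits into an integer.
        return int("".join(map(str, block)), 2), len(block)

    def decode_block(cw):
        value, length = cw
        return [int(b) for b in format(value, "0{}b".format(length))]

    msg = [1, 0, 1, 1, 0, 0, 1, 0]
    codewords = ldsc_encode(msg, 4, encode_block)
    assert ldsc_decode_symbol(codewords, 5, 4, decode_block) == msg[5]

Since each codeword covers b = Θ(log n) source symbols, the decoder reads Θ(log n) encoded symbols per recovered bit, matching the scaling discussed above.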
IEEE Transactions on Information Theory | 2015
Shao-Lun Huang; Changho Suh; Lizhong Zheng
In this paper, we extend the information theoretic framework developed in earlier works to multi-hop network settings. For a given network, we construct a novel deterministic model that quantifies the ability of the network to transmit private and common messages across users. Based on this model, we formulate a linear optimization problem that explores the throughput of a multi-layer network, thereby offering the optimal strategy as to what kinds of common messages should be generated in the network to maximize the throughput. With this deterministic model, we also investigate the role of feedback in multi-layer networks, from which we identify a variety of scenarios in which feedback can improve transmission efficiency. Our results provide fundamental guidelines as to how to coordinate cooperation between users to enable efficient information exchange among them.
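As a toy illustration of the kind of linear program such a deterministic model induces (the network, variables, and capacities below are entirely our invention for illustration): maximize the total rate routed through a two-relay layer subject to per-link capacity constraints.

    from scipy.optimize import linprog

    # Toy two-layer network: rates r1, r2 routed through relays 1 and 2.
    # Maximize r1 + r2 (linprog minimizes, hence the negated objective).
    c = [-1.0, -1.0]
    A_ub = [[1.0, 0.0],   # source -> relay 1 capacity
            [0.0, 1.0],   # source -> relay 2 capacity
            [1.0, 1.0]]   # relays -> users shared bottleneck
    b_ub = [2.0, 3.0, 4.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    print(res.x, -res.fun)  # optimal per-relay rates and total throughput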
Information Theory Workshop | 2012
Emmanuel Abbe; Shao-Lun Huang; Emre Telatar
It is conjectured in [6] that the covariance matrices minimizing the outage probability under a power constraint for MIMO channels with Gaussian fading are diagonal with either zeros or constant values on the diagonal. In the MISO setting, this is equivalent to the conjecture that the Gaussian quadratic forms having largest tail probability correspond to such diagonal matrices. This paper provides a proof of the conjecture in this MISO setting.
Allerton Conference on Communication, Control, and Computing | 2016
I-Hsiang Wang; Shao-Lun Huang; Kuan-Yun Lee
We investigate the problem of extracting a sparse data set via histogram queries. A data set is a collection of items, each of which carries a piece of data. A data set is called sparse if only a small number of items carry data of interest. We show that the fundamental limit on the query complexity is [equation], where n is the size of the data set and k < n is the sparsity level. A counting argument is used to establish the converse part, that is, the lower bound on query complexity. For the achievability part, we analyze a randomized querying method, where in each query the items to be included in the queried subset are selected uniformly at random. It is shown that with high probability, the randomly constructed querying method exactly recovers the desired data. Furthermore, we propose an adaptive deterministic algorithm to extract the sparse data set with query complexity [equation], achieving the fundamental limit to within a log log k factor.
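To make the query model concrete (a toy sketch of our own; the paper's recovery procedure is more involved than anything shown here): a histogram query takes a subset of item indices and returns only the multiset of the corresponding values, and the randomized mechanism draws the queried subsets uniformly at random.

    from collections import Counter
    import random

    def histogram_query(data, subset):
        # Return value -> count over the queried items, revealing nothing
        # about which item in the subset holds which value.
        return Counter(data[i] for i in subset)

    def random_queries(n, m, seed=0):
        # m random subsets of {0, ..., n-1}; each item joins each query
        # independently with probability 1/2 (our convention).
        rng = random.Random(seed)
        return [[i for i in range(n) if rng.random() < 0.5] for _ in range(m)]

    # Example: query a sparse data set in which most items carry 0.
    data = [0, 0, 7, 0, 0, 3, 0, 0]
    for subset in random_queries(len(data), 3):
        print(sorted(subset), dict(histogram_query(data, subset)))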