Weicong Ding
Boston University
Publication
Featured research published by Weicong Ding.
Journalism & Mass Communication Quarterly | 2016
Lei Guo; Chris J. Vargo; Zixuan Pan; Weicong Ding; Prakash Ishwar
This article presents an empirical study that investigated and compared two “big data” text analysis methods: dictionary-based analysis, perhaps the most popular automated analysis approach in social science research, and unsupervised topic modeling (i.e., Latent Dirichlet Allocation [LDA] analysis), one of the most widely used algorithms in the field of computer science and engineering. By applying two “big data” methods to make sense of the same dataset—77 million tweets about the 2012 U.S. presidential election—the study provides a starting point for scholars to evaluate the efficacy and validity of different computer-assisted methods for conducting journalism and mass communication research, especially in the area of political communication.
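As a rough illustration of what dictionary-based analysis involves, the sketch below labels each document with every issue whose keyword list it matches and then tallies the labels. The issue names, keywords, and function names are hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical issue dictionaries; the keywords are illustrative only
# and are not the dictionaries used in the study.
ISSUE_DICTIONARY = {
    "economy": {"jobs", "economy", "taxes", "unemployment"},
    "healthcare": {"healthcare", "insurance", "medicare"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def label_document(text, dictionary=ISSUE_DICTIONARY):
    """Return the sorted list of issues whose keywords appear in the text."""
    tokens = set(tokenize(text))
    return sorted(issue for issue, keywords in dictionary.items()
                  if tokens & keywords)


def issue_counts(documents, dictionary=ISSUE_DICTIONARY):
    """Count how many documents mention each issue at least once."""
    counts = Counter()
    for doc in documents:
        counts.update(label_document(doc, dictionary))
    return counts
```

Unlike topic modeling, this approach needs the categories and keywords to be fixed by the researcher in advance, which is the main contrast the abstract draws.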
International Conference on Acoustics, Speech, and Signal Processing | 2013
Weicong Ding; Mohammad H. Rohban; Prakash Ishwar; Venkatesh Saligrama
A new geometrically-motivated algorithm for topic modeling is developed and applied to the discovery of latent “topics” in text and image “document” corpora. The algorithm is based on robustly finding and clustering extreme-points of empirical cross-document word-frequencies that correspond to novel words unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets.
Information Theory and Applications | 2015
Weicong Ding; Prakash Ishwar; Venkatesh Saligrama
Separability has recently been leveraged as a key structural condition in topic models to develop asymptotically consistent algorithms with polynomial statistical and computational efficiency guarantees. Separability corresponds to the presence of at least one novel word for each topic. Empirical estimates of topic matrices for Latent Dirichlet Allocation models have been observed to be approximately separable. Separability may be a convenient structural property, but it appears to be too restrictive a condition. In this paper we explicitly demonstrate that separability is, in fact, an inevitable consequence of high-dimensionality. In particular, we prove that when the columns of the topic matrix are independently sampled from a Dirichlet distribution, the resulting topic matrix will be approximately separable with probability tending to one as the number of rows (vocabulary size) scales to infinity sufficiently faster than the number of columns (topics). This is based on combining concentration of measure results with properties of the Dirichlet distribution and union bounding arguments. Our proof techniques can be extended to other priors for general nonnegative matrices.
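The claim can be explored numerically with a small simulation. The sketch below samples a topic matrix whose columns are independent symmetric Dirichlet draws (via normalized Gamma variates) and reports, for each topic, the largest share any single word gives to that topic after row normalization. This "novelty score" is an illustrative proxy chosen here, not the paper's formal definition of approximate separability, and the function names and parameters are assumptions.

```python
import random


def sample_topic_matrix(num_words, num_topics, alpha, rng):
    """Return a num_words x num_topics matrix (list of rows) whose
    columns are independent symmetric Dirichlet(alpha) draws."""
    columns = []
    for _ in range(num_topics):
        # A Dirichlet draw is a vector of Gamma(alpha, 1) variates
        # divided by their sum.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_words)]
        total = sum(draws)
        columns.append([d / total for d in draws])
    # Transpose so that each row corresponds to one word.
    return [list(row) for row in zip(*columns)]


def novelty_scores(beta):
    """For each topic k, return max over words w of
    beta[w][k] / sum_j beta[w][j]. A score of 1.0 means some word
    occurs only in topic k (an exactly novel word)."""
    best = [0.0] * len(beta[0])
    for row in beta:
        total = sum(row)
        if total == 0.0:
            continue  # a word with zero mass everywhere carries no information
        for k, value in enumerate(row):
            best[k] = max(best[k], value / total)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    for vocabulary_size in (50, 500, 5000):
        beta = sample_topic_matrix(vocabulary_size, 5, 0.1, rng)
        print(vocabulary_size, min(novelty_scores(beta)))
```

Under the result stated in the abstract, one would expect the smallest score to move toward 1 as the vocabulary grows relative to the number of topics; the script only prints the values and does not assert that trend.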
IEEE Transactions on Signal and Information Processing over Networks | 2017
Weicong Ding; Christy Lin; Prakash Ishwar
Neural node embeddings have recently emerged as a powerful representation for supervised learning tasks involving graph-structured data. We leverage this recent advance to develop a novel algorithm for unsupervised community discovery in graphs. Through extensive experimental studies on simulated and real-world data, we demonstrate that the proposed approach consistently improves over the current state-of-the-art. Specifically, our approach empirically attains the information-theoretic limits for community recovery under the benchmark stochastic block models for graph generation and exhibits better stability and accuracy over both spectral clustering and acyclic belief propagation in the community recovery limits.
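For readers unfamiliar with the benchmark, a stochastic block model draws each edge independently, with one probability for node pairs in the same community and another for pairs in different communities. The sketch below is a minimal sampler under that description; the function name and the two-probability parameterization are assumptions made for illustration, and it does not implement the embedding-based community discovery method itself.

```python
import random


def sample_sbm(labels, p_in, p_out, rng):
    """Sample an undirected graph from a stochastic block model.

    labels[i] is the community of node i. Each pair (i, j) with i < j
    becomes an edge independently with probability p_in if the two
    nodes share a community and p_out otherwise.
    """
    edges = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [0] * 10 + [1] * 10
    graph = sample_sbm(labels, p_in=0.6, p_out=0.05, rng=rng)
    print(len(graph), "edges")
```

Community recovery then asks for the labels given only the edge list, which becomes harder as `p_in` and `p_out` approach each other.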
IEEE Journal of Selected Topics in Signal Processing | 2016
Weicong Ding; Prakash Ishwar; Venkatesh Saligrama
We develop necessary and sufficient conditions and a novel provably consistent and efficient algorithm for discovering topics (latent factors) from observations (documents) that are realized from a probabilistic mixture of shared latent factors that have certain properties. Our focus is on the class of topic models in which each shared latent factor contains a novel word that is unique to that factor, a property that has come to be known as separability. Our algorithm is based on the key insight that the novel words correspond to the extreme points of the convex hull formed by the row-vectors of a suitably normalized word co-occurrence matrix. We leverage this geometric insight to establish polynomial computational and sample complexity bounds based on a few isotropic random projections of the rows of the normalized word co-occurrence matrix. Our proposed random-projections-based algorithm is naturally amenable to an efficient distributed implementation and is attractive for modern web-scale distributed data mining applications.
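The geometric insight can be illustrated with a toy example: a linear function over a finite point set is always maximized at an extreme point of its convex hull, so projecting the points onto random isotropic directions and recording the maximizer each time surfaces extreme points only. The sketch below is a minimal version of that idea on generic points; it is not the authors' algorithm, the rows stand in for the normalized co-occurrence rows, and the function name is hypothetical.

```python
import random


def extreme_points_by_projection(rows, num_projections, rng):
    """Return {row_index: hit_count} for rows that maximize at least
    one random projection.

    Each direction has independent standard Gaussian coordinates, so it
    is isotropic. Ties (a probability-zero event for generic directions)
    are broken in favor of the lowest row index.
    """
    dim = len(rows[0])
    hits = {}
    for _ in range(num_projections):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        best = max(
            range(len(rows)),
            key=lambda i: sum(x * d for x, d in zip(rows[i], direction)),
        )
        hits[best] = hits.get(best, 0) + 1
    return hits


if __name__ == "__main__":
    # Three triangle vertices followed by one strictly interior point.
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.25, 0.25)]
    print(extreme_points_by_projection(points, 200, random.Random(0)))
```

Because each projection needs only inner products with the rows, the work splits naturally across machines holding different rows, which is consistent with the distributed implementation the abstract mentions.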
International Conference on Acoustics, Speech, and Signal Processing | 2017
Christy Lin; Prakash Ishwar; Weicong Ding
Neural node embedding has recently been developed as a powerful representation for supervised tasks with graph data. We leverage this recent advance and propose a novel approach for unsupervised community discovery in graphs. Through extensive experimental studies on simulated and real-world data, we demonstrate consistent improvement of the proposed approach over the current state of the art. Specifically, our approach empirically attains the information-theoretic limits under the benchmark Stochastic Block Models and exhibits better stability and accuracy than the best-known algorithms in the community recovery limits.
International Conference on Acoustics, Speech, and Signal Processing | 2015
Weicong Ding; Prakash Ishwar; Venkatesh Saligrama
We propose a novel model for rank aggregation from pairwise comparisons which accounts for a heterogeneous population of inconsistent users whose preferences are different mixtures of multiple shared ranking schemes. By connecting this problem to recent advances in the non-negative matrix factorization (NMF) literature, we develop an algorithm that can learn the underlying shared rankings with provable statistical and computational efficiency guarantees. We validate the approach using semi-synthetic and real world datasets.
International Conference on Acoustics, Speech, and Signal Processing | 2014
Weicong Ding; Prakash Ishwar; Venkatesh Saligrama; W. Clem Karl
We propose a novel approach for designing kernels for support vector machines (SVMs) when the class label is linked to the observation through a latent state and the likelihood function of the observation given the state (the sensing model) is available. We show that the Bayes-optimum decision boundary is a hyperplane under a mapping defined by the likelihood function. Combining this with the maximum margin principle yields kernels for SVMs that leverage knowledge of the sensing model in an optimal way. We derive the optimum kernel for the bag-of-words (BoWs) sensing model and demonstrate its superior performance over other kernels in document and image classification tasks. These results indicate that such optimum sensing-aware kernel SVMs can match the performance of rather sophisticated state-of-the-art approaches.
Asilomar Conference on Signals, Systems and Computers | 2013
Weicong Ding; Prakash Ishwar; Venkatesh Saligrama
We consider a novel problem of endmember detection in hyperspectral imagery where the signals in different frequency bands are probed sequentially. We propose an adaptive strategy for controlling the sensing order to maximize the normalized solid angle, which serves as a robustness measure of the problem geometry. This is based on efficiently identifying pure pixels that are unique to each endmember and exploiting information from a spectral library known in advance through sequential random projections. We present simulations on synthetic datasets to demonstrate the merits of our scheme in reducing the observation cost.
International Conference on Machine Learning | 2013
Weicong Ding; Mohammad H. Rohban; Prakash Ishwar; Venkatesh Saligrama