Publication


Featured research published by Yaliang Li.


International Conference on Management of Data | 2014

Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation

Qi Li; Yaliang Li; Jing Gao; Bo Zhao; Wei Fan; Jiawei Han

In many applications, one can obtain descriptions of the same objects or events from a variety of sources. This inevitably leads to conflicts in the data or information. One important problem is to identify the true information (i.e., the truths) among conflicting sources of data. It is intuitive to trust reliable sources more when deriving the truths, but it is usually not known a priori which sources are more reliable. Moreover, each source possesses a variety of properties with different data types, so an accurate estimate of source reliability requires modeling multiple properties in a unified model. Existing conflict resolution work either does not estimate source reliability or models multiple properties separately. In this paper, we propose to resolve conflicts among multiple sources of heterogeneous data types. We model the problem using an optimization framework in which truths and source reliability are defined as two sets of unknown variables. The objective is to minimize the overall weighted deviation between the truths and the multi-source observations, where each source is weighted by its reliability. Different loss functions can be incorporated into this framework to capture the characteristics of various data types, and efficient computation approaches are developed. Experiments on real-world weather, stock, and flight data, as well as simulated multi-source data, demonstrate the necessity of jointly modeling different data types in the proposed framework.
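The alternating structure of this framework can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: it uses a squared loss (scaled by the spread of each object's claims) for continuous properties, a 0-1 loss for categorical ones, and the weight rule w_s = -log(loss_s / total loss), one common choice for this kind of objective. The function name `resolve_conflicts` and the data layout are hypothetical.

```python
import math
import statistics
from collections import defaultdict

EPS = 1e-12  # guards against log(0) and division by zero


def resolve_conflicts(continuous, categorical, iters=10):
    """Jointly estimate truths and source weights over mixed data types.

    continuous:  {object: {source: float}}
    categorical: {object: {source: label}}
    Returns (truths, weights).
    """
    sources = {s for table in (continuous, categorical)
               for obs in table.values() for s in obs}
    weights = dict.fromkeys(sources, 1.0)  # start with equal trust
    truths = {}
    for _ in range(iters):
        # Truth step: with weights fixed, the weighted squared loss is
        # minimized by the weighted mean, the 0-1 loss by the weighted vote.
        for obj, obs in continuous.items():
            total_w = sum(weights[s] for s in obs)
            truths[obj] = sum(weights[s] * v for s, v in obs.items()) / total_w
        for obj, obs in categorical.items():
            votes = defaultdict(float)
            for s, label in obs.items():
                votes[label] += weights[s]
            truths[obj] = max(votes, key=votes.get)

        # Weight step: sum each source's loss across both data types.
        loss = dict.fromkeys(sources, 0.0)
        for obj, obs in continuous.items():
            # Scale by the spread of the claims so units do not dominate.
            spread = statistics.pstdev(list(obs.values())) or 1.0
            for s, v in obs.items():
                loss[s] += ((v - truths[obj]) / spread) ** 2
        for obj, obs in categorical.items():
            for s, label in obs.items():
                loss[s] += 0.0 if label == truths[obj] else 1.0

        # Sources with a smaller share of the total loss get larger weights.
        total_loss = sum(loss.values()) + EPS * len(sources)
        weights = {s: max(-math.log((loss[s] + EPS) / total_loss), 1e-9)
                   for s in sources}
    return truths, weights
```

With three sources where one reports temperatures several degrees off and the wrong sky conditions, this loop assigns that source a much smaller weight than the two agreeing sources within a couple of iterations, so the continuous truths move toward the reliable reports.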


SIGKDD Explorations | 2016

A Survey on Truth Discovery

Yaliang Li; Jing Gao; Chuishi Meng; Qi Li; Lu Su; Bo Zhao; Wei Fan; Jiawei Han

Thanks to the information explosion, data about objects of interest can be collected from an increasing number of sources. However, the information collected from multiple sources about the same object usually contains conflicts. To tackle this challenge, truth discovery, which integrates noisy multi-source information by estimating the reliability of each source, has emerged as a hot topic. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this survey, we provide a comprehensive overview of truth discovery methods and summarize them from different aspects. We also discuss future directions for truth discovery research. We hope that this survey will promote a better understanding of the current progress in truth discovery and offer guidelines on how to apply these approaches in application domains.


Very Large Data Bases | 2014

A confidence-aware approach for truth discovery on long-tail data

Qi Li; Yaliang Li; Jing Gao; Lu Su; Bo Zhao; Murat Demirbas; Wei Fan; Jiawei Han

In many real-world applications, the same item may be described by multiple sources. As a consequence, conflicts among these sources are inevitable, which leads to an important task: identifying which piece of information is trustworthy, i.e., the truth discovery task. Intuitively, information from a reliable source is more trustworthy, and a source that provides trustworthy information is more reliable. Based on this principle, truth discovery approaches have been proposed to infer source reliability degrees and the most trustworthy information (i.e., the truth) simultaneously. However, existing approaches overlook the ubiquitous long-tail phenomenon in these tasks: most sources provide only a few claims, and only a few sources make plenty of claims, which makes the reliability estimates for small sources unreasonable. To tackle this challenge, we propose a confidence-aware truth discovery (CATD) method to automatically detect truths from conflicting data with a long-tail phenomenon. The proposed method not only estimates source reliability but also considers the confidence interval of the estimate, so that it can effectively reflect real source reliability for sources with various levels of participation. Experiments on four real-world tasks as well as simulated multi-source long-tail datasets demonstrate that the proposed method outperforms existing state-of-the-art truth discovery approaches by successfully discounting the effect of small sources.
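The confidence-interval idea can be illustrated with the sketch below, assuming continuous claims and squared errors. Each source's weight is a lower chi-squared quantile (at alpha/2, with degrees of freedom equal to the source's number of claims) divided by its total squared error, i.e. the inverse of an upper confidence bound on its error variance, so a source with only a few claims is discounted even when its few errors are small. To stay within the standard library, the quantile is found by bisection on the regularized incomplete gamma function; the function names and alpha = 0.05 are illustrative assumptions rather than the paper's code.

```python
import math
import statistics


def _reg_lower_gamma(s, z):
    """Regularized lower incomplete gamma P(s, z) via its power series."""
    if z <= 0:
        return 0.0
    term = math.exp(s * math.log(z) - z - math.lgamma(s + 1.0))
    total = term
    n = 1
    while term > 1e-17 * total:
        term *= z / (s + n)
        total += term
        n += 1
    return total


def chi2_lower_quantile(p, k):
    """x with P(chi^2_k <= x) = p, for 0 < p < 0.5, found by bisection."""
    assert 0.0 < p < 0.5
    lo, hi = 0.0, float(k)  # the median of chi^2_k lies below its mean k
    for _ in range(200):
        mid = (lo + hi) / 2
        if _reg_lower_gamma(k / 2, mid / 2) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def catd_weights(claims, truths, alpha=0.05):
    """Weight = chi^2_{alpha/2, n_s} / (sum of squared errors of source s).

    Few claims -> small quantile -> conservative (upper-bound) variance
    estimate -> lower weight.
    """
    weights = {}
    for s, obs in claims.items():
        sq_err = sum((v - truths[o]) ** 2 for o, v in obs.items())
        weights[s] = chi2_lower_quantile(alpha / 2, len(obs)) / max(sq_err, 1e-12)
    return weights


def catd(claims, alpha=0.05, iters=10):
    """claims: {source: {object: float}}. Returns (truths, weights)."""
    objects = {o for obs in claims.values() for o in obs}
    # Start from the per-object median, which is robust to outliers.
    truths = {o: statistics.median([obs[o] for obs in claims.values() if o in obs])
              for o in objects}
    weights = {}
    for _ in range(iters):
        weights = catd_weights(claims, truths, alpha)
        for o in objects:
            voters = [s for s in claims if o in claims[s]]
            den = sum(weights[s] for s in voters)
            truths[o] = sum(weights[s] * claims[s][o] for s in voters) / den
    return truths, weights
```

For example, a source with one claim off by 0.5 and a source with 50 claims each off by 1.0 are ranked 4 to 1 in favor of the small source by a plain inverse mean-squared-error weight; under the confidence-aware weight the 50-claim source ranks higher (about 0.65 versus 0.004).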


International Conference on Embedded Networked Sensor Systems | 2015

Truth Discovery on Crowd Sensing of Correlated Entities

Chuishi Meng; Wenjun Jiang; Yaliang Li; Jing Gao; Lu Su; Hu Ding; Yun Cheng

With the popularity of mobile devices and smartphones, crowd sensing, in which humans act as sensors and report their observations about entities, has become pervasive in real life. For the same entity, users may report conflicting information, so it is important to identify the true information and the reliable users. This task, referred to as truth discovery, has recently attracted much attention. Existing work typically assumes independence among entities. However, correlations among entities are commonly observed in many applications, and such correlation information is crucial in the truth discovery task. When entities are not observed by enough reliable users, it is impossible to obtain their true information from those observations. In such cases, it is important to propagate trustworthy information from correlated entities that have been observed by reliable users. We formulate the task of truth discovery on correlated entities as an optimization problem in which both truths and user reliability are modeled as variables. The correlation among entities makes this problem harder to solve. To address this challenge, we propose both sequential and parallel solutions. In the sequential solution, we partition entities into disjoint independent sets and derive iterative approaches based on block coordinate descent. In the parallel solution, we adapt the approach to the MapReduce programming model so that it can be executed on Hadoop clusters. Experiments on real-world crowd sensing applications show the advantages of the proposed method in discovering truths from conflicting information reported on correlated entities.
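A sequential block coordinate descent for this kind of formulation might look like the sketch below. It is a simplified version under our own assumptions, not the paper's objective: continuous observations, a squared loss weighted by user reliability, plus alpha times a similarity-weighted squared difference for each pair of correlated entities. Entities are split into independent sets by greedy graph coloring, so every entity in a set can be updated in closed form (and in parallel) while its neighbors stay fixed. The function names, alpha, and the -log weight rule are hypothetical choices.

```python
import math
import statistics
from collections import defaultdict

EPS = 1e-12


def independent_sets(neighbors):
    """Greedy graph coloring: entities with the same color share no edge."""
    color = {}
    for node in sorted(neighbors):
        used = {color[n] for n in neighbors[node] if n in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    groups = defaultdict(list)
    for node, c in color.items():
        groups[c].append(node)
    return [groups[c] for c in sorted(groups)]


def correlated_truth_discovery(obs, neighbors, alpha=1.0, iters=20):
    """obs: {entity: {user: float}} (empty dict for unobserved entities).
    neighbors: {entity: {entity: similarity}}, same keys as obs, symmetric.
    Returns (truths, user_weights).
    """
    users = {u for o in obs.values() for u in o}
    weights = dict.fromkeys(users, 1.0)
    global_mean = statistics.mean([v for o in obs.values() for v in o.values()])
    truths = {e: statistics.mean(list(o.values())) if o else global_mean
              for e, o in obs.items()}
    blocks = independent_sets(neighbors)
    for _ in range(iters):
        # Truth step: no two entities in a block are neighbors, so each
        # closed-form update reads only values that stay fixed in this block.
        for block in blocks:
            for e in block:
                num = sum(weights[u] * v for u, v in obs[e].items())
                den = sum(weights[u] for u in obs[e])
                num += alpha * sum(sim * truths[j] for j, sim in neighbors[e].items())
                den += alpha * sum(neighbors[e].values())
                if den > 0:
                    truths[e] = num / den
        # User-weight step: smaller share of the total loss -> larger weight.
        loss = dict.fromkeys(users, 0.0)
        for e, o in obs.items():
            for u, v in o.items():
                loss[u] += (v - truths[e]) ** 2
        total = sum(loss.values()) + EPS * len(users)
        weights = {u: max(-math.log((loss[u] + EPS) / total), 1e-9) for u in users}
    return truths, weights
```

In this sketch an entity with no reports at all, such as a road segment lying between two observed segments, still receives an estimate propagated from its correlated neighbors.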


Knowledge Discovery and Data Mining | 2015

On the Discovery of Evolving Truth

Yaliang Li; Qi Li; Jing Gao; Lu Su; Bo Zhao; Wei Fan; Jiawei Han

In the era of big data, information about the same objects can be collected from an increasing number of sources. Unfortunately, the information coming from different sources usually contains conflicts. To tackle this challenge, truth discovery, i.e., integrating noisy multi-source information by estimating the reliability of each source, has emerged as a hot topic. In many real-world applications, however, information may arrive sequentially, and as a consequence, both the truths of objects and the reliability of sources may evolve dynamically. Existing truth discovery methods cannot handle such scenarios. To address this problem, we investigate the temporal relations among both object truths and source reliability, and propose an incremental truth discovery framework that dynamically updates object truths and source weights upon the arrival of new data. Theoretical analysis shows that the proposed method is guaranteed to converge at a fast rate. Experiments on three real-world applications and a set of synthetic data demonstrate the advantages of the proposed method over state-of-the-art truth discovery methods.
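The incremental idea can be sketched as a single pass per timestamp: estimate the current truths with the weights learned so far, then fold the new losses into an exponentially decayed running total so that old data never has to be revisited. The decay factor, the default weight for unseen sources, and the -log weight rule are our assumptions; the paper's actual update rules and its convergence analysis are not reproduced here.

```python
import math
from collections import defaultdict

EPS = 1e-12


class IncrementalTruthDiscovery:
    """Processes one batch of claims per timestamp without revisiting history."""

    def __init__(self, decay=0.9, default_weight=1.0):
        self.decay = decay                    # how fast old evidence fades
        self.default_weight = default_weight  # weight for never-seen sources
        self.cum_loss = defaultdict(float)    # decayed loss per source
        self.weights = {}

    def update(self, batch):
        """batch: {object: {source: float}} for the current timestamp."""
        # 1) Truths for this timestamp use the weights learned so far.
        truths = {}
        for obj, obs in batch.items():
            w = {s: self.weights.get(s, self.default_weight) for s in obs}
            truths[obj] = sum(w[s] * v for s, v in obs.items()) / sum(w.values())
        # 2) Decay the accumulated loss, then add this timestamp's loss.
        for s in self.cum_loss:
            self.cum_loss[s] *= self.decay
        for obj, obs in batch.items():
            for s, v in obs.items():
                self.cum_loss[s] += (v - truths[obj]) ** 2
        # 3) Re-derive weights from each source's share of the decayed loss.
        total = sum(self.cum_loss.values()) + EPS * len(self.cum_loss)
        self.weights = {s: max(-math.log((loss + EPS) / total), 1e-9)
                        for s, loss in self.cum_loss.items()}
        return truths
```

Because each call touches only the new batch and one running total per source, the cost per timestamp depends on the batch size rather than on the length of the history.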


Conference on Information and Knowledge Management | 2016

Multi-View Time Series Classification: A Discriminative Bilinear Projection Approach

Sheng Li; Yaliang Li; Yun Fu

Thanks to the growing number and variety of sensors, information about the same object can be collected from multiple views. This mutually enriching information can help many real-world applications, such as daily activity recognition, in which both video cameras and on-body sensors continuously collect information. Such multivariate time series (m.t.s.) data from multiple views can significantly improve classification. However, existing methods for time series classification focus only on single-view data and do not take into account the mutual support among multiple views. To address this challenge, we propose a novel approach, named Multi-view Discriminative Bilinear Projections (MDBP), for extracting discriminative features from multi-view m.t.s. data. First, MDBP keeps the original temporal structure of m.t.s. data and projects m.t.s. from different views onto a shared latent subspace. Second, MDBP incorporates discriminative information by minimizing the within-class scatter and maximizing the between-class separability of m.t.s. in the shared latent subspace. Moreover, a Laplacian regularization term is designed to preserve the temporal smoothness within m.t.s. Extensive experiments on two real-world datasets demonstrate the effectiveness of our approach. Compared with state-of-the-art multi-view learning and m.t.s. classification methods, our approach greatly improves classification accuracy by fully exploiting the multi-view streaming data. With an additional feature fusion strategy, our approach further improves classification accuracy by at least 10%.
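As a rough illustration of why a bilinear projection keeps the temporal structure, the sketch below projects a sensors-by-time matrix X with a left matrix U (mixing sensor channels) and a right matrix V (mixing time steps), giving Y = U^T X V, so the series is never flattened into one long vector. It shows only the projection itself; how MDBP learns U and V from its discriminative and Laplacian terms, and how the views share a latent subspace, is not modeled. The matrices are hypothetical.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def bilinear_project(X, U, V):
    """Y = U^T X V: U (d x p) mixes the d channels, V (T x q) mixes the
    T time steps, so X keeps its channel-by-time matrix form."""
    return matmul(matmul(transpose(U), X), V)


# Hypothetical example: 3 sensor channels observed over 4 time steps.
X = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
U = [[1, 0],   # latent dim 1 keeps channel 1
     [0, 1],   # latent dim 2 sums channels 2 and 3
     [0, 1]]
V = [[1, 0],   # first two time steps pooled
     [1, 0],
     [0, 1],   # last two time steps pooled
     [0, 1]]
Y = bilinear_project(X, U, V)  # [[3, 7], [30, 38]]
```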


IEEE Transactions on Knowledge and Data Engineering | 2016

Conflicts to Harmony: A Framework for Resolving Conflicts in Heterogeneous Data by Truth Discovery

Yaliang Li; Qi Li; Jing Gao; Lu Su; Bo Zhao; Wei Fan; Jiawei Han

In many applications, one can obtain descriptions of the same objects or events from a variety of sources. This inevitably leads to conflicts in the data or information. One important problem is to identify the true information (i.e., the truths) among conflicting sources of data. It is intuitive to trust reliable sources more when deriving the truths, but it is usually not known a priori which sources are more reliable. Moreover, each source possesses a variety of properties with different data types, so an accurate estimate of source reliability requires modeling multiple properties in a unified model. Existing conflict resolution work either does not estimate source reliability or models multiple properties separately. In this paper, we propose to resolve conflicts among multiple sources of heterogeneous data types. We model the problem using an optimization framework in which truths and source reliability are defined as two sets of unknown variables. The objective is to minimize the overall weighted deviation between the truths and the multi-source observations, where each source is weighted by its reliability. Different loss functions can be incorporated into this framework to capture the characteristics of various data types, and efficient computation approaches are developed. The proposed framework is further adapted to handle streaming data incrementally and large-scale data with the MapReduce model. Experiments on real-world weather, stock, and flight data, as well as simulated multi-source data, demonstrate the advantage of jointly modeling different data types in the proposed framework.


International Joint Conference on Artificial Intelligence | 2017

A Correlated Topic Model Using Word Embeddings

Guangxu Xun; Yaliang Li; Wayne Xin Zhao; Jing Gao; Aidong Zhang

Conventional correlated topic models capture the correlation structure among latent topics by replacing the Dirichlet prior with the logistic normal distribution. Word embeddings have been shown to capture semantic regularities in language. Therefore, the semantic relatedness and correlations between words can be computed directly in the word embedding space, for example, via cosine similarity. In this paper, we propose a novel correlated topic model using word embeddings. The proposed model exploits the additional word-level correlation information in word embeddings and directly models topic correlation in the continuous word embedding space. In the model, words in documents are replaced with meaningful word embeddings, topics are modeled as multivariate Gaussian distributions over the word embeddings, and topic correlations are learned among the continuous Gaussian topics. A Gibbs sampling solution with data augmentation is given to perform inference. We evaluate our model qualitatively and quantitatively on the 20 Newsgroups and Reuters-21578 datasets. The experimental results show the effectiveness of the proposed model.
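Measuring word relatedness by cosine similarity in an embedding space, as described above, takes only a few lines. The sketch uses made-up two-dimensional vectors in place of trained embeddings; the vectors and the helper names `cosine` and `most_similar` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(word, embeddings):
    """The other vocabulary word whose embedding has the highest cosine."""
    return max((w for w in embeddings if w != word),
               key=lambda w: cosine(embeddings[word], embeddings[w]))


# Hypothetical 2-d "embeddings", for illustration only.
embeddings = {"king": [0.8, 0.6], "queen": [0.6, 0.8], "apple": [-0.6, 0.8]}
print(most_similar("king", embeddings))  # queen
```

Here cos(king, queen) = 0.96 while cos(king, apple) = 0, so the two related words are identified without any co-occurrence counts.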


Knowledge Discovery and Data Mining | 2017

Collaboratively Improving Topic Discovery and Word Embeddings by Coordinating Global and Local Contexts

Guangxu Xun; Yaliang Li; Jing Gao; Aidong Zhang

A text corpus typically contains two types of context information: global context and local context. Global context carries topical information that topic models can use to discover topic structures in the corpus, while local context can be used to train word embeddings that capture semantic regularities reflected in the corpus. This motivates us to exploit the useful information in both global and local context. In this paper, we propose a unified language model based on matrix factorization techniques that 1) takes the complementary global and local context information into consideration simultaneously, and 2) models topics and learns word embeddings collaboratively. We show empirically that by incorporating both global and local context, this collaborative model not only significantly improves the performance of topic discovery over baseline topic models, but also learns better word embeddings than baseline word embedding models. We also provide a qualitative analysis explaining how the cooperation of global and local context information results in better topic structures and word embeddings.


International Conference on Data Mining | 2016

Topic Discovery for Short Texts Using Word Embeddings

Guangxu Xun; Vishrawas Gopalakrishnan; Fenglong Ma; Yaliang Li; Jing Gao; Aidong Zhang

Discovering topics in short texts, such as news titles and tweets, has become an important task for many content analysis applications. However, because short texts lack rich context information, conventional topic models usually perform poorly on them. In this paper, we propose a novel topic model for short text corpora using word embeddings. Continuous-space word embeddings, which have proven effective at capturing regularities in language, are incorporated into our model to provide additional semantics. We model each short document as a Gaussian topic over word embeddings in the vector space. In addition, because background words in a short text are usually not semantically related, we introduce a discrete background mode over word types to complement the continuous Gaussian topics. We evaluate our model on news titles from data sources such as abcnews, showing that it extracts more coherent topics from short texts than the baseline methods and learns better topic representations for each short document.

Collaboration


Top Co-Authors

Jing Gao

University at Buffalo

Lu Su

University at Buffalo

Qi Li

University at Buffalo

Chenwei Zhang

University of Illinois at Chicago
