Hao Huang
Wuhan University
Publication
Featured research published by Hao Huang.
intelligent information systems | 2016
Qingyang Hu; Qinming He; Hao Huang; Kevin Chiew; Zhenguang Liu
Crowdsourcing services have proven efficient at collecting large amounts of labeled data for supervised learning tasks. However, the low cost of crowd workers leads to unreliable labels, which poses a new problem for learning a reliable classifier. Although various methods have been proposed to infer the ground truth or to learn from crowd data directly, there is no guarantee that these methods work well for highly biased or noisy crowd labels. Motivated by this limitation of crowd data, in this paper we propose a novel framework for improving the performance of crowdsourcing learning tasks with some additional expert labels. Specifically, we treat each labeler as a personal classifier, combine all labelers’ opinions from a model combination perspective, and summarize the evidence from crowds and experts naturally via a Bayesian classifier in the intermediate feature space formed by the personal classifiers. We also introduce active learning into our framework and propose an uncertainty sampling algorithm for actively obtaining expert labels. Experiments show that our method can significantly improve the learning quality compared with methods that use crowd labels alone.
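As an illustration of the uncertainty sampling idea mentioned in this abstract, the following is a minimal generic sketch, not the algorithm from the paper. It assumes each unlabeled instance already has a class posterior from some classifier, and it asks an expert about the instance whose posterior has the highest entropy. The names `entropy`, `pick_most_uncertain`, and the toy `posteriors` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_most_uncertain(posteriors, k=1):
    """Return the indices of the k instances with the highest predictive entropy.

    posteriors: list of class-probability lists, one per unlabeled instance.
    """
    ranked = sorted(
        range(len(posteriors)),
        key=lambda i: entropy(posteriors[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy posteriors (hypothetical): instance 1 is closest to a coin flip,
# so it is the first candidate to send to an expert labeler.
posteriors = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30]]
to_query = pick_most_uncertain(posteriors, k=1)
```

Entropy is only one possible uncertainty measure; margin or least-confidence scores would slot into the same selection loop.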
database systems for advanced applications | 2017
Qian Yan; Hao Huang; Yunjun Gao; Wei Lu; Qinming He
Influence maximization aims at finding a set of seed nodes in a social network that could influence the largest number of nodes. Existing work often focuses on the influence of individual nodes, ignoring that infecting different seeds may require different costs. Nonetheless, in many real-world applications such as advertising, advertisers care more about the influence of groups (e.g., crowds in the same areas or communities) than about specific individuals, and are very concerned about how to maximize the influence with a limited budget. In this paper, we investigate the problem of group-level influence maximization with a budget constraint. To this end, we introduce a statistical method to reveal the influence relationship between the groups, based on which we propose a propagation model that can dynamically calculate the influence spread scope of seed groups. We then present a greedy algorithm called GLIMB that maximizes the influence spread scope under a limited cost budget by optimizing the seed-group portfolio. Theoretical analysis shows that GLIMB can guarantee an approximation ratio of at least \((1-1/\sqrt{e})\). Experimental results on both synthetic and real-world data sets verify the effectiveness and efficiency of our approach.
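To make the budgeted greedy idea concrete, here is a small generic sketch of cost-effectiveness greedy selection. It is not GLIMB and makes no claim about its approximation guarantee: the propagation model is replaced by a fixed, hypothetical `reach` map from each group to the groups it influences, and `cost` holds assumed positive seeding costs.

```python
def greedy_budgeted_seeds(reach, cost, budget):
    """Pick seed groups by marginal gain per unit cost until the budget is spent.

    reach:  dict mapping group -> set of groups it influences (assumed, static)
    cost:   dict mapping group -> positive seeding cost
    budget: total cost allowed
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = set(reach)
    while remaining:
        best, best_ratio = None, 0.0
        for g in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + cost[g] > budget:
                continue  # this group no longer fits in the budget
            gain = len(reach[g] - covered)  # newly influenced groups
            ratio = gain / cost[g]
            if ratio > best_ratio:
                best, best_ratio = g, ratio
        if best is None:
            break  # nothing affordable adds any new influence
        chosen.append(best)
        covered |= reach[best]
        spent += cost[best]
        remaining.discard(best)
    return chosen, covered, spent


# Toy instance (hypothetical data).
reach = {
    "A": {"A", "B", "C"},
    "B": {"B", "D"},
    "C": {"C"},
    "D": {"D", "E", "F", "G"},
}
cost = {"A": 2, "B": 1, "C": 1, "D": 4}
chosen, covered, spent = greedy_budgeted_seeds(reach, cost, budget=4)
```

In this toy run the cheap group "B" is picked first because it has the best gain per unit cost, after which "D" no longer fits in the remaining budget.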
Knowledge and Information Systems | 2017
Hao Huang; Qian Yan; Yao Zhao; Wei Lu; Zhenguang Liu; Zongpeng Li
The smart grid is emerging as an efficient paradigm for electric power generation, transmission, and consumption, based on optimized decision making and control that leverage the measurement data of sensors and meters in the grid. False data injection is a new type of power grid attack that aims to tamper with such important data. For the security and robustness of the grid, it is critical to separate the false data injected by such attacks and recover the original measurement data. Nonetheless, the existing approaches often neglect the true changes in the original measurement data that are caused by real perturbations of grid states, and hence risk removing these true changes as injected false data during data recovery. In this paper, we preserve these true changes by modeling the false data problem as a rank-bounded \(L_1\) norm optimization and propose both offline and online algorithms to filter out the injected false data and recover the original measurement data. Trace-driven simulations verify the efficacy of our solution.
very large data bases | 2018
Wei Lu; Haixiang Li; Xinyi Zhang; Zhiyu Shui; Zhe Peng; Xiao Zhang; Xiaoyong Du; Hao Huang; Xiaoyu Wang; Anqun Pan
database systems for advanced applications | 2018
Tongtong Wang; Hao Huang; Wei Lu; Zhe Peng; Xiaoyong Du
Knowledge and Information Systems | 2018
Zhe Peng; Tongtong Wang; Wei Lu; Hao Huang; Xiaoyong Du; Feng Zhao; Anthony K. H. Tung
Journal of Intelligent Information Systems | 2018
Lianhang Ma; Kevin Chiew; Hao Huang; Qinming He
database and expert systems applications | 2016
Hao Huang; Song Wang; Shuangke Wu; Yunjun Gao; Wei Lu; Qinming He; Shi Ying
Similarity search is a primitive operation in various database applications. Thus far, a large number of access methods have been proposed to accelerate similarity query processing. Nonetheless, these methods mostly focus on developing standalone systems by proposing new indices. Given that existing RDBMSs support only traditional indices, it is of great practical importance to develop an approach that speeds up query processing using the standard built-in indices of an RDBMS. In this demonstration, we introduce MSQL+, a plugin toolkit that enables users to answer similarity queries in metric spaces simply by using standard SQL statements. This toolkit can help existing RDBMSs handle big data effectively and efficiently thanks to the following three advantages. First, MSQL+ enables users to find similar objects by submitting SELECT-FROM-WHERE statements, so that it can be easily integrated into existing RDBMSs. Second, MSQL+ works in a more general data space. Objects of any type can be indexed by B+-trees, and query processing can be boosted by index seeks, as long as the similarity function is a metric. Third, MSQL+ supports the parallelization of both pre-processing and query processing in distributed RDBMSs.
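One generic way an ordered one-dimensional index can serve metric range queries is pivot-based filtering, sketched below. This is an assumption-laden illustration and not a description of MSQL+ internals: a sorted list searched with `bisect` stands in for a B+-tree, edit distance stands in for the metric, and the names `build_index`, `range_query`, and the sample words are hypothetical. By the triangle inequality, any object within `radius` of the query must have a pivot distance in `[d(q, p) - radius, d(q, p) + radius]`, so one index range scan yields the candidates and the exact distance is checked only on those.

```python
import bisect


def edit_distance(a, b):
    """Levenshtein distance, a metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(objects, pivot, dist):
    """Sort objects by distance to the pivot (stand-in for a B+-tree on that key)."""
    return sorted((dist(o, pivot), o) for o in objects)


def range_query(index, pivot, dist, query, radius):
    """Return all indexed objects within `radius` of `query`."""
    dq = dist(query, pivot)
    keys = [k for k, _ in index]
    # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o), so only keys in
    # [dq - radius, dq + radius] can belong to the answer.
    lo = bisect.bisect_left(keys, dq - radius)
    hi = bisect.bisect_right(keys, dq + radius)
    # Verify the surviving candidates with the exact distance.
    return sorted(o for _, o in index[lo:hi] if dist(query, o) <= radius)


words = ["cat", "cart", "card", "dog", "dot", "carton"]
index = build_index(words, "cat", edit_distance)
matches = range_query(index, "cat", edit_distance, "cot", radius=1)
```

In a relational setting the same filter could be expressed as a `BETWEEN` predicate on a precomputed pivot-distance column, which is what allows an ordinary index seek to do the pruning.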
asia-pacific web conference | 2016
Song Wang; Hao Huang; Yunjun Gao; Tieyun Qian; Liang Hong; Zhiyong Peng
Mining frequent subgraphs in large-scale graph data sets helps reveal underlying knowledge. Since mining approaches in centralized systems are often bottlenecked by computing capacity, many parallelized solutions based on the MapReduce framework have been proposed to scale out the mining process, which usually extracts frequent subgraphs in an iterative way. Nonetheless, the efficiency and scalability of these MapReduce-based approaches are still bounded by the communication cost of passing intermediate results and by the unbalanced workload after a few iterations. In this paper, we propose an efficient and scalable framework for frequent subgraph mining using distributed graph processing systems. It adopts a message-passing-free scheme among workers to reduce the communication cost, and utilizes a task scheduler to dynamically balance the workload. Experimental results on both synthetic and real-world data sets verify the efficacy of our proposed framework.
asia-pacific web conference | 2016
Qian Yan; Hao Huang; Yunjun Gao; Chen Ying; Qingyang Hu; Tieyun Qian; Qinming He
Frequent subgraph mining from a tremendous number of small graphs is a primitive operation for many data mining applications. Existing approaches mainly focus on centralized systems and suffer from scalability issues. Considering the increasing volume of graph data and the fact that mining frequent subgraphs is a memory-intensive task, it is difficult to tackle this problem efficiently on a centralized machine. In this paper, we therefore propose an efficient and scalable solution, called MRFSE, using MapReduce. MRFSE adopts the breadth-first search strategy to iteratively extract frequent subgraphs, i.e., all frequent subgraphs with