Hao Huang | Researchain

Publication

Featured research published by Hao Huang.

intelligent information systems | 2016

A formalized framework for incorporating expert labels in crowdsourcing environment

Qingyang Hu; Qinming He; Hao Huang; Kevin Chiew; Zhenguang Liu

Crowdsourcing services have proven efficient at collecting large amounts of labeled data for supervised learning tasks. However, the low cost of crowd workers leads to unreliable labels, which poses a new problem for learning a reliable classifier. Although various methods have been proposed to infer the ground truth or to learn from crowd data directly, there is no guarantee that these methods work well for highly biased or noisy crowd labels. Motivated by this limitation of crowd data, in this paper we propose a novel framework for improving the performance of crowdsourcing learning tasks with some additional expert labels. Specifically, we treat each labeler as a personal classifier and combine all labelers’ opinions from a model combination perspective, and we summarize the evidence from crowds and experts naturally via a Bayesian classifier in the intermediate feature space formed by the personal classifiers. We also introduce active learning into our framework and propose an uncertainty sampling algorithm for actively obtaining expert labels. Experiments show that our method can significantly improve the learning quality compared with methods that use crowd labels only.
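As a rough illustration of the uncertainty sampling idea mentioned in this abstract, the sketch below ranks unlabeled items by the entropy of their predicted class posterior and selects the most uncertain ones to send to an expert. This is a generic textbook form of uncertainty sampling, not the paper's algorithm; the function names and the example posteriors are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pick_most_uncertain(posteriors, k=1):
    """Return the indices of the k items whose posterior has the highest entropy."""
    ranked = sorted(
        range(len(posteriors)),
        key=lambda i: entropy(posteriors[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical class posteriors for four unlabeled items in a binary task.
posteriors = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]

# Items 3 and 1 are closest to a coin flip, so they would be queried first.
query = pick_most_uncertain(posteriors, k=2)
```

In an active learning loop, the expert labels obtained for the queried items would be fed back into the classifier before the next round of selection.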

database systems for advanced applications | 2017

Group-Level Influence Maximization with Budget Constraint

Qian Yan; Hao Huang; Yunjun Gao; Wei Lu; Qinming He

Influence maximization aims at finding a set of seed nodes in a social network that could influence the largest number of nodes. Existing work often focuses on the influence of individual nodes, ignoring that infecting different seeds may require different costs. Nonetheless, in many real-world applications such as advertising, advertisers care more about the influence of groups (e.g., crowds in the same areas or communities) than about specific individuals, and are very concerned about how to maximize the influence with a limited budget. In this paper, we investigate the problem of group-level influence maximization with budget constraint. To this end, we introduce a statistical method to reveal the influence relationship between the groups, based on which we propose a propagation model that can dynamically calculate the influence spread scope of seed groups. We then present a greedy algorithm called GLIMB that maximizes the influence spread scope under a limited cost budget by optimizing the seed-group portfolio. Theoretical analysis shows that GLIMB guarantees an approximation ratio of at least \((1-1/\sqrt{e})\). Experimental results on both synthetic and real-world data sets verify the effectiveness and efficiency of our approach.
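To make the notion of budget-constrained greedy seed selection concrete, here is a minimal sketch of a generic cost-benefit greedy heuristic. It is not GLIMB and makes no claim to GLIMB's approximation guarantee: the influence spread is replaced by a simple set-coverage surrogate, and the group names, covered nodes, and costs are all hypothetical.

```python
def budgeted_greedy(groups, costs, budget):
    """Repeatedly pick the affordable group with the best marginal gain per unit cost.

    groups: dict mapping a group name to the set of nodes it is assumed to influence
    costs:  dict mapping a group name to its positive seeding cost
    budget: total cost that may be spent
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = set(groups)
    while remaining:
        best, best_ratio = None, 0.0
        for g in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + costs[g] > budget:
                continue  # this group no longer fits in the budget
            gain = len(groups[g] - covered)  # newly covered nodes only
            ratio = gain / costs[g]
            if ratio > best_ratio:
                best, best_ratio = g, ratio
        if best is None:
            break  # nothing affordable adds any coverage
        chosen.append(best)
        covered |= groups[best]
        spent += costs[best]
        remaining.remove(best)
    return chosen, covered, spent


# Hypothetical seed groups and costs.
groups = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {6},
    "D": {1, 2, 3, 4, 5, 6, 7},
}
costs = {"A": 2.0, "B": 1.0, "C": 1.0, "D": 5.0}

chosen, covered, spent = budgeted_greedy(groups, costs, budget=4.0)
```

With a budget of 4.0, group D is never affordable, so the heuristic picks B, then A, then C and covers nodes 1 through 6.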

Knowledge and Information Systems | 2017

False data separation for data security in smart grids

Hao Huang; Qian Yan; Yao Zhao; Wei Lu; Zhenguang Liu; Zongpeng Li

The smart grid is emerging as an efficient paradigm for electric power generation, transmission, and consumption, based on optimized decision making and control that leverage the measurement data of sensors and meters in the grid. False data injection is a new type of power grid attack that aims to tamper with such important data. For the security and robustness of the grid, it is critical to separate the false data injected by such attacks and recover the original measurement data. Nonetheless, the existing approaches often neglect the true changes in the original measurement data that are caused by real perturbations of grid states, and hence risk removing these true changes as injected false data during the data recovery. In this paper, we preserve these true changes by modeling the false data problem as a rank-bounded \(L_1\) norm optimization and propose both offline and online algorithms to filter out the injected false data and recover the original measurement data. Trace-driven simulations verify the efficacy of our solution.

very large data bases | 2018

MSQL+: a plugin toolkit for similarity search under metric spaces in distributed relational database systems

Wei Lu; Haixiang Li; Xinyi Zhang; Zhiyu Shui; Zhe Peng; Xiao Zhang; Xiaoyong Du; Hao Huang; Xiaoyu Wang; Anqun Pan

Similarity search is a primitive operation in various database applications. Thus far, a large number of access methods have been proposed to accelerate similarity query processing. Nonetheless, these methods mostly focus on developing standalone systems by proposing new indices. Given that existing RDBMS merely support traditional indices, it is of great necessity and practical importance to develop an approach based on standard RDBMS built-in indices to speed up query processing. In this demonstration, we introduce MSQL+, a plugin toolkit that enables users to answer similarity queries in metric spaces simply using standard SQL statements. This toolkit can help existing RDBMS handle big data effectively and efficiently due to the following three advantages. First, MSQL+ enables users to find similar objects by submitting SELECT-FROM-WHERE statements, so it can be easily integrated into existing RDBMS. Second, MSQL+ works in a more general data space: objects of any type can be indexed by B+-trees, and query processing can be boosted by using index seeks, as long as the similarity function is a metric. Third, MSQL+ supports the parallelization of both pre-processing and query processing in distributed RDBMS.

database systems for advanced applications | 2018

Efficient and Scalable Mining of Frequent Subgraphs Using Distributed Graph Processing Systems

Tongtong Wang; Hao Huang; Wei Lu; Zhe Peng; Xiaoyong Du

Mining frequent subgraphs in large-scale graph data sets helps reveal underlying knowledge. Since the mining approaches in centralized systems are often bottlenecked by computing capacity, many parallelized solutions based on the MapReduce framework have been proposed to scale out the mining process, which usually extracts frequent subgraphs in an iterative way. Nonetheless, the efficiency and scalability of these MapReduce-based approaches are still bounded by the communication cost of passing the intermediate results and by the unbalanced workload after a few iterations. In this paper, we propose an efficient and scalable framework for frequent subgraph mining using distributed graph processing systems. It adopts a message-passing-free scheme among workers to reduce the communication cost, and utilizes a task scheduler to dynamically balance the workload. Experimental results on both synthetic and real-world data sets verify the efficacy of our proposed framework.

Knowledge and Information Systems | 2018

Mining frequent subgraphs from tremendous amount of small graphs using MapReduce

Zhe Peng; Tongtong Wang; Wei Lu; Hao Huang; Xiaoyong Du; Feng Zhao; Anthony K. H. Tung

Frequent subgraph mining from a tremendous amount of small graphs is a primitive operation for many data mining applications. Existing approaches mainly focus on centralized systems and suffer from scalability issues. Considering the increasing volume of graph data and that mining frequent subgraphs is a memory-intensive task, it is difficult to tackle this problem efficiently on a centralized machine. In this paper, we therefore propose an efficient and scalable solution, called MRFSE, using MapReduce. MRFSE adopts the breadth-first search strategy to iteratively extract frequent subgraphs, i.e., all frequent subgraphs with

Journal of Intelligent Information Systems | 2018

Evaluation of local community metrics: from an experimental perspective

Lianhang Ma; Kevin Chiew; Hao Huang; Qinming He

database and expert systems applications | 2016

Mining Arbitrary Shaped Clusters and Outputting a High Quality Dendrogram

Hao Huang; Song Wang; Shuangke Wu; Yunjun Gao; Wei Lu; Qinming He; Shi Ying

asia-pacific web conference | 2016

Fast Rare Category Detection Using Nearest Centroid Neighborhood

Song Wang; Hao Huang; Yunjun Gao; Tieyun Qian; Liang Hong; Zhiyong Peng

asia-pacific web conference | 2016

Modeling for Noisy Labels of Crowd Workers

Qian Yan; Hao Huang; Yunjun Gao; Chen Ying; Qingyang Hu; Tieyun Qian; Qinming He
