Hongxia Yang | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Hongxia Yang is active.

Explore More

Publication

Featured researches published by Hongxia Yang.

knowledge discovery and data mining | 2014

Learning with dual heterogeneity: a nonparametric bayes model

Hongxia Yang; Jingrui He

Traditional data mining techniques are designed to model a single type of heterogeneity, such as multi-task learning for modeling task heterogeneity, multi-view learning for modeling view heterogeneity, etc. Recently, a variety of real applications emerged, which exhibit dual heterogeneity, namely both task heterogeneity and view heterogeneity. Examples include insider threat detection across multiple organizations, web image classification in different domains, etc. Existing methods for addressing such problems typically assume that multiple tasks are equally related and multiple views are equally consistent, which limits their application in complex settings with varying task relatedness and view consistency. In this paper, we advance state-of-the-art techniques by adaptively modeling task relatedness and view consistency via a nonparametric Bayes model: we model task relatedness using normal penalty with sparse covariances, and view consistency using matrix Dirichlet process. Based on this model, we propose the NOBLE algorithm using an efficient Gibbs sampler. Experimental results on multiple real data sets demonstrate the effectiveness of the proposed algorithm.

knowledge discovery and data mining | 2015

Co-Clustering based Dual Prediction for Cargo Pricing Optimization

Yada Zhu; Hongxia Yang; Jingrui He

This paper targets the problem of cargo pricing optimization in the air cargo business. Given the features associated with a pair of origination and destination, how can we simultaneously predict both the optimal price for the bid stage and the outcome of the transaction (win rate) in the decision stage? In addition, it is often the case that the matrix representing pairs of originations and destinations has a block structure, i.e., the originations and destinations can be co-clustered such that the predictive models are similar within the same co-cluster, and exhibit significant variation among different co-clusters. How can we uncover the co-clusters of originations and destinations while constructing the dual predictive models for the two stages? We take the first step at addressing these problems. In particular, we propose a probabilistic framework to simultaneously construct dual predictive models and uncover the co-clusters of originations and destinations. It maximizes the conditional probability of observing the responses from both the quotation stage and the decision stage, given the features and the co-clusters. By introducing an auxiliary distribution based on the co-clustering assumption, such conditional probability can be converted into an objective function. To minimize the objective function, we propose the cocoa algorithm, which will generate both the suite of predictive models for all the pairs of originations and destinations, as well as the co-clusters consisting of similar pairs. Experimental results on both synthetic data and real data from cargo price bidding demonstrate the effectiveness and efficiency of the proposed algorithm.

international conference on data mining | 2014

Learning from Label and Feature Heterogeneity

Pei Yang; Jingrui He; Hongxia Yang; Haoda Fu

Multiple types of heterogeneity, such as label heterogeneity and feature heterogeneity, often co-exist in many real-world data mining applications, such as news article categorization, gene functionality prediction. To effectively leverage such heterogeneity, in this paper, we propose a novel graph-based framework for Learning with both Label and Feature heterogeneities, namely L2F. It models the label correlation by requiring that any two label-specific classifiers behave similarly on the same views if the associated labels are similar, and imposes the view consistency by requiring that view-based classifiers generate similar predictions on the same examples. To solve the resulting optimization problem, we propose an iterative algorithm, which is guaranteed to converge to the global optimum. Furthermore, we analyze its generalization performance based on Rademacher complexity, which sheds light on the benefits of jointly modeling the label and feature heterogeneity. Experimental results on various data sets show the effectiveness of the proposed approach.

knowledge discovery and data mining | 2017

A Hybrid Framework for Text Modeling with Convolutional RNN

Chenglong Wang; Feijun Jiang; Hongxia Yang

In this paper, we introduce a generic inference hybrid framework for Convolutional Recurrent Neural Network (conv-RNN) of semantic modeling of text, seamless integrating the merits on extracting different aspects of linguistic information from both convolutional and recurrent neural network structures and thus strengthening the semantic understanding power of the new framework. Besides, based on conv-RNN, we also propose a novel sentence classification model and an attention based answer selection model with strengthening power for the sentence matching and classification respectively. We validate the proposed models on a very wide variety of data sets, including two challenging tasks of answer selection (AS) and five benchmark datasets for sentence classification (SC). To the best of our knowledge, it is by far the most complete comparison results in both AS and SC. We empirically show superior performances of conv-RNN in these different challenging tasks and benchmark datasets and also summarize insights on the performances of other state-of-the-arts methodologies.

knowledge discovery and data mining | 2018

Adversarial Detection with Model Interpretation

Ninghao Liu; Hongxia Yang; Xia Hu

Machine learning (ML) systems have been increasingly applied in web security applications such as spammer detection, malware detection and fraud detection. These applications have an intrinsic adversarial nature where intelligent attackers can adaptively change their behaviors to avoid being detected by the deployed detectors. Existing efforts against adversaries are usually limited by the type of applied ML models or the specific applications such as image classification. Additionally, the working mechanisms of ML models usually cannot be well understood by users, which in turn impede them from understanding the vulnerabilities of models nor improving their robustness. To bridge the gap, in this paper, we propose to investigate whether model interpretation could potentially help adversarial detection. Specifically, we develop a novel adversary-resistant detection framework by utilizing the interpretation of ML models. The interpretation process explains the mechanism of how the target ML model makes prediction for a given instance, thus providing more insights for crafting adversarial samples. The robustness of detectors is then improved through adversarial training with the adversarial samples. A data-driven method is also developed to empirically estimate costs of adversaries in feature manipulation. Our approach is model-agnostic and can be applied to various types of classification models. Our experimental results on two real-world datasets demonstrate the effectiveness of interpretation-based attacks and how estimated feature manipulation cost would affect the behavior of adversaries.

international world wide web conferences | 2018

Subgraph-augmented Path Embedding for Semantic User Search on Heterogeneous Social Network

Zemin Liu; Vincent Wenchen Zheng; Zhou Zhao; Hongxia Yang; Kevin Chen Chuan Chang; Minghui Wu; Jing Ying

Semantic user search is an important task on heterogeneous social networks. Its core problem is to measure the proximity between two user objects in the network w.r.t. certain semantic user relation. State-of-the-art solutions often take a path-based approach, which uses the sequences of objects connecting a query user and a target user to measure their proximity. Despite their success, we assert that path as a low-order structure is insufficient to capture the rich semantics between two users. Therefore, in this paper we introduce a new concept of subgraph-augmented path for semantic user search. Specifically, we consider sampling a set of object paths from a query user to a target user; then in each object path, we replace the linear object sequence between its every two neighboring users with their shared subgraph instances. Such subgraph-augmented paths are expected to leverage both path»s distance awareness and subgraph»s high-order structure. As it is non-trivial to model such subgraph-augmented paths, we develop a Subgraph-augmented Path Embedding (SPE) framework to accomplish the task. We evaluate our solution on six semantic user relations in three real-world public data sets, and show that it outperforms the baselines.

international joint conference on artificial intelligence | 2018

ANRL: Attributed Network Representation Learning via Deep Neural Networks

Zhen Zhang; Hongxia Yang; Jiajun Bu; Sheng Zhou; Pinggang Yu; Jianwei Zhang; Martin Ester; Can Wang

Network representation learning (RL) aims to transform the nodes in a network into lowdimensional vector spaces while preserving the inherent properties of the network. Though network RL has been intensively studied, most existing works focus on either network structure or node attribute information. In this paper, we propose a novel framework, named ANRL, to incorporate both the network structure and node attribute information in a principled way. Specifically, we propose a neighbor enhancement autoencoder to model the node attribute information, which reconstructs its target neighbors instead of itself. To capture the network structure, attribute-aware skipgram model is designed based on the attribute encoder to formulate the correlations between each node and its direct or indirect neighbors. We conduct extensive experiments on six real-world networks, including two social networks, two citation networks and two user behavior networks. The results empirically show that ANRL can achieve relatively significant gains in node classification and link prediction tasks.

knowledge discovery and data mining | 2017

Local Algorithm for User Action Prediction Towards Display Ads

Hongxia Yang; Yada Zhu; Jingrui He

User behavior modeling is essential in computational advertisement, which builds users profiles by tracking their online behaviors and then delivers the relevant ads according to each users interests and needs. Accurate models will lead to higher targeting accuracy and thus improved advertising performance. Intuitively, similar users tend to have similar behaviors towards the displayed ads (e.g., impression, click, conversion). However, to the best of our knowledge, there is not much previous work that explicitly investigates such similarities of various types of user behaviors, and incorporates them into ad response targeting and prediction, largely due to the prohibitive scale of the problem. To bridge this gap, in this paper, we use bipartite graphs to represent historical user behaviors, which consist of both user nodes and advertiser campaign nodes, as well as edges reflecting various types of user-campaign interactions in the past. Based on this representation, we study random-walk-based local algorithms for user behavior modeling and action prediction, whose computational complexity depends only on the size of the output cluster, rather than the entire graph. Our goal is to improve action prediction by leveraging historical user-user, campaign-campaign, and user-campaign interactions. In particular, we propose the bipartite graphs AdvUserGraph accompanied with the ADNI algorithm. ADNI extends the NIBBLE algorithm to AdvUserGraph, and it is able to find the local cluster consisting of interested users towards a specific advertiser campaign. We also propose two extensions of ADNI with improved efficiencies. The performance of the proposed algorithms is demonstrated on both synthetic data and a world leading Demand Side Platform (DSP), showing that they are able to discriminate extremely rare events in terms of their action propensity.

ACM Transactions on Knowledge Discovery From Data | 2016

Jointly Modeling Label and Feature Heterogeneity in Medical Informatics

Pei Yang; Hongxia Yang; Haoda Fu; Dawei Zhou; Jieping Ye; Theodoros Lappas; Jingrui He

Multiple types of heterogeneity including label heterogeneity and feature heterogeneity often co-exist in many real-world data mining applications, such as diabetes treatment classification, gene functionality prediction, and brain image analysis. To effectively leverage such heterogeneity, in this article, we propose a novel graph-based model for Learning with both Label and Feature heterogeneity, namely L2F. It models the label correlation by requiring that any two label-specific classifiers behave similarly on the same views if the associated labels are similar, and imposes the view consistency by requiring that view-based classifiers generate similar predictions on the same examples. The objective function for L2F is jointly convex. To solve the optimization problem, we propose an iterative algorithm, which is guaranteed to converge to the global optimum. One appealing feature of L2F is that it is capable of handling data with missing views and labels. Furthermore, we analyze its generalization performance based on Rademacher complexity, which sheds light on the benefits of jointly modeling the label and feature heterogeneity. Experimental results on various biomedical datasets show the effectiveness of the proposed approach.

knowledge discovery and data mining | 2018

Mobile Access Record Resolution on Large-Scale Identifier-Linkage Graphs

Shen Xin; Hongxia Yang; Weizhao Xian; Martin Ester; Jiajun Bu; Zhongyao Wang; Can Wang

The e-commerce era is witnessing a rapid increase of mobile Internet users. Major e-commerce companies nowadays see billions of mobile accesses every day. Hidden in these records are valuable user behavioral characteristics such as their shopping preferences and browsing patterns. And, to extract these knowledge from the huge dataset, we need to first link records to the corresponding mobile devices. This Mobile Access Records Resolution (MARR) problem is confronted with two major challenges: (1) device identifiers and other attributes in access records might be missing or unreliable; (2) the dataset contains billions of access records from millions of devices. To the best of our knowledge, as a novel challenge industrial problem of mobile Internet, no existing method has been developed to resolve entities using mobile device identifiers in such a massive scale. To address these issues, we propose a SParse Identifier-linkage Graph (SPI-Graph) accompanied with the abundant mobile device profiling data to accurately match mobile access records to devices. Furthermore, two versions (unsupervised and semi-supervised) of Parallel Graph-based Record Resolution (PGRR) algorithm are developed to effectively exploit the advantages of the large-scale server clusters comprising of more than 1,000 computing nodes. We empirically show superior performances of PGRR algorithms in a very challenging and sparse real data set containing 5.28 million nodes and 31.06 million edges from 2.15 billion access records compared to other state-of-the-arts methodologies.

Explore More