Hongxia Jin
Samsung
Publication
Featured research published by Hongxia Jin.
web search and data mining | 2015
Bin Liu; Deguang Kong; Lei Cen; Neil Zhenqiang Gong; Hongxia Jin; Hui Xiong
Recent years have witnessed a rapid adoption of mobile devices and a dramatic proliferation of mobile applications (Apps for brevity). However, the large number of mobile Apps makes it difficult for users to locate relevant Apps. Therefore, recommending Apps becomes an urgent task. Traditional recommendation approaches focus on learning the interest of a user and the functionality of an item (e.g., an App) from a set of user-item ratings, and they recommend an item to a user if the item's functionality matches the user's interest well. However, Apps may have privileges to access a user's sensitive resources (e.g., contacts, messages, and location). As a result, a user chooses an App not only because of its functionality but also because it respects the user's privacy preference. To the best of our knowledge, this paper presents the first systematic study on incorporating both interest-functionality interactions and users' privacy preferences to perform personalized App recommendations. Specifically, we first construct a new model to capture the trade-off between functionality and user privacy preference. We then crawl a real-world dataset (16,344 users, 6,157 Apps, and 263,054 ratings) from Google Play and use it to comprehensively evaluate our model and previous methods. We find that our method consistently and substantially outperforms the state-of-the-art approaches, which implies the importance of user privacy preference in personalized App recommendations. Moreover, we explore the impact of different levels of privacy information on the performance of our method, which gives us insights into which resources users are more likely to treat as private and how those resources influence users' behavior when selecting Apps.
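The functionality/privacy trade-off can be illustrated with a toy scoring rule. This is a minimal sketch, not the paper's model: it assumes each App already has a latent functionality vector, that each user has a hypothetical per-resource sensitivity weight, and that the two are combined linearly through a hypothetical `trade_off` parameter.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    functionality: list[float]  # latent functionality vector (hypothetical)
    permissions: set[str]       # sensitive resources the App can access


def score(interest, privacy_weights, app, trade_off=1.0):
    """Interest-functionality match minus a weighted privacy penalty."""
    match = sum(u * v for u, v in zip(interest, app.functionality))
    # Each requested resource costs as much as this user cares about it.
    penalty = sum(privacy_weights.get(p, 0.0) for p in app.permissions)
    return match - trade_off * penalty


def recommend(interest, privacy_weights, apps, k=2, trade_off=1.0):
    """Return the names of the top-k Apps under the trade-off score."""
    ranked = sorted(
        apps,
        key=lambda a: score(interest, privacy_weights, a, trade_off),
        reverse=True,
    )
    return [a.name for a in ranked[:k]]


apps = [
    App("MapsA", [0.9, 0.1], {"location"}),
    App("MapsB", [0.8, 0.1], set()),
    App("Game", [0.1, 0.9], {"contacts"}),
]
interest = [1.0, 0.2]                         # user leans towards navigation
privacy = {"location": 0.5, "contacts": 0.3}  # hypothetical sensitivities

print(recommend(interest, privacy, apps))                 # ['MapsB', 'MapsA']
print(recommend(interest, privacy, apps, trade_off=0.0))  # ['MapsA', 'MapsB']
```

With `trade_off=0.0` the ranking reduces to pure interest-functionality matching, which is the behavior the abstract attributes to traditional approaches; a positive value lets a less functional but less intrusive App overtake a more functional one.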
international world wide web conferences | 2015
Liyue Fan; Hongxia Jin
The availability of an increasing amount of user-generated data is transformative to our society. We enjoy the benefits of analyzing big data for public interest, such as disease outbreak detection and traffic control, as well as for commercial interests, such as smart grids and product recommendation. However, large collections of user-generated data contain unique patterns that can be used to re-identify individuals, as exemplified by the AOL search log release incident. In this paper, we propose a practical framework for data analytics that provides differential privacy guarantees to individual data contributors. Our framework generates differentially private aggregates that can be used to perform data mining and recommendation tasks. To alleviate the high perturbation errors introduced by the differential privacy mechanism, we present two methods with different sampling techniques to draw a subset of individual data for analysis. Empirical studies with real-world data sets show that our solutions enable accurate data analytics on a small fraction of the input data, reducing user privacy risk and data storage requirements without compromising the analysis results.
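The abstract does not spell out the two sampling methods, so the following is only a generic sketch of the underlying idea: release differentially private item counts computed on a random subset of users, using the standard Laplace mechanism. The uniform Bernoulli sampling of users, the per-user cap `max_items`, and the function names are assumptions for illustration; the sketch does not reproduce the paper's sampling techniques or their error analysis.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_item_counts(user_items, domain, epsilon, rate, max_items, seed=None):
    """Noisy per-item counts computed on a Bernoulli sample of users (hypothetical helper).

    `domain` must be public (not derived from the data); otherwise the set of
    released keys would itself leak information.
    """
    rng = random.Random(seed)
    counts = Counter()
    for items in user_items:
        if rng.random() >= rate:  # keep each user with probability `rate`
            continue
        # Capping every user at `max_items` distinct items bounds the L1
        # sensitivity of the whole count vector by `max_items`.
        for item in sorted(set(items))[:max_items]:
            counts[item] += 1
    scale = max_items / epsilon
    # Dividing by `rate` turns sample counts into population-level estimates.
    return {item: (counts[item] + laplace_noise(scale, rng)) / rate for item in domain}


users = [["a", "b"], ["a"], ["b", "c"], ["a"]]
print(private_item_counts(users, ["a", "b", "c"], epsilon=1.0, rate=0.5, max_items=2, seed=7))
```

Only the sampled users' data has to be retained, which is where the storage saving mentioned in the abstract would come from; the noisy counts are unbiased estimates of the full-population counts but become less accurate as `rate` shrinks.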
computer and communications security | 2015
Deguang Kong; Lei Cen; Hongxia Jin
With the increasing popularity of mobile devices, mobile apps raise severe security and privacy concerns. On Google Play, user reviews provide a unique understanding of the security/privacy issues of mobile apps from the users' perspective, and they are in fact valuable feedback that reflects users' expectations. To best assist end users, in this paper we automatically learn the security/privacy-related behaviors inferred from analysis of user reviews, which we call review-to-behavior fidelity. We design the system AUTOREB, which automatically assesses the review-to-behavior fidelity of mobile apps. AUTOREB employs state-of-the-art machine learning techniques to infer the relations between users' reviews and four categories of security-related behaviors. Moreover, it uses a crowdsourcing approach to automatically aggregate the security issues from the review level to the app level. To our knowledge, AUTOREB is the first work that explores user review information and utilizes review semantics to predict risky behaviors at both the review level and the app level. We crawled a real-world dataset of 2,614,186 users, 12,783 apps, and 13,129,783 reviews from Google Play and used it to comprehensively evaluate AUTOREB. The experimental results show that our method can predict mobile app behaviors at the user-review level with accuracy as high as 94.05%, and that it can also predict security issues at the app level by aggregating the review-level predictions. Our research offers insight into mobile app security concerns from the users' perspective and helps bridge the gap between security issues and users' perception.
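The abstract mentions a crowdsourcing-style step that lifts review-level predictions to app-level verdicts but does not give its form. As a hedged stand-in, the sketch below uses a reviewer-weighted vote: an app is flagged for a behavior when the weighted share of its reviews predicting that behavior reaches a threshold. The behavior names, the optional reviewer weights, and the threshold are all hypothetical, and this is not AUTOREB's actual aggregation.

```python
from collections import defaultdict


def aggregate_to_app(review_predictions, weights=None, threshold=0.1):
    """Lift review-level behavior predictions to an app-level set of issues.

    review_predictions: list of (reviewer_id, set_of_predicted_behaviors).
    weights: optional reviewer_id -> reliability weight (default 1.0 each).
    """
    weights = weights or {}
    total = 0.0
    support = defaultdict(float)
    for reviewer, behaviors in review_predictions:
        w = weights.get(reviewer, 1.0)
        total += w
        for behavior in behaviors:
            support[behavior] += w
    if total == 0:
        return set()
    # Flag a behavior when its weighted share of reviews reaches the threshold.
    return {b for b, s in support.items() if s / total >= threshold}


reviews = [
    ("u1", {"spam"}),
    ("u2", set()),
    ("u3", {"spam", "data_leak"}),
    ("u4", set()),
]
print(aggregate_to_app(reviews, threshold=0.3))                       # {'spam'}
print(sorted(aggregate_to_app(reviews, {"u3": 3.0}, threshold=0.3)))  # ['data_leak', 'spam']
```

Raising the weight of a reviewer deemed more reliable (here `u3`) is enough to push a rarely mentioned behavior over the threshold, which is the kind of effect a crowdsourcing-based aggregator is meant to capture.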
dependable systems and networks | 2016
Jianping He; Bin Liu; Deguang Kong; Xuan Bao; Na Wang; Hongxia Jin; George Kesidis
Sharing photos through Online Social Networks is increasingly popular. However, it poses a serious threat to end users, as private information in the photos may be inappropriately shared with others without their consent. This paper proposes the design and implementation of a system using a dynamic privacy-preserving partial image sharing technique (namely PUPPIES), which allows data owners to stipulate specific private regions (e.g., faces or SSNs) in an image and correspondingly set different privacy policies for each user. As a generic technique and system, PUPPIES targets threats of over-privileged and unauthorized sharing of photos on the photo service provider side (e.g., Flickr, Facebook). To this end, PUPPIES leverages an image perturbation technique to encrypt the sensitive areas in the original images; therefore, it naturally supports popular image transformations (such as cropping and rotation) and is compatible with most image processing libraries. Extensive experiments on 19,000 images demonstrate that PUPPIES is very effective for privacy protection and incurs only a small computational overhead. In addition, PUPPIES offers high flexibility for different privacy settings and is very robust to different types of privacy attacks.
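PUPPIES' actual perturbation scheme is designed to survive transformations such as cropping and rotation; the sketch below does not attempt that. It only illustrates the basic idea of keyed, reversible perturbation of a private region: pixels inside a rectangle are shifted by key-derived pseudo-random offsets modulo 256, and a viewer holding the key can undo the shift. The grayscale list-of-lists image, the half-open `(top, left, bottom, right)` box, and the seeded `random.Random` keystream are assumptions. Python's `random` module is not a cryptographic generator, so this is a toy, not a secure cipher.

```python
import random


def _offsets(key, height, width):
    # Deterministic pseudo-random offsets derived from the key (NOT cryptographic).
    rng = random.Random(key)
    return [[rng.randrange(256) for _ in range(width)] for _ in range(height)]


def perturb_region(img, box, key, sign=1):
    """Return a copy of `img` with the pixels inside `box` shifted by keyed offsets."""
    top, left, bottom, right = box  # half-open rectangle
    offs = _offsets(key, bottom - top, right - left)
    out = [row[:] for row in img]  # pixels outside the box are left untouched
    for r in range(top, bottom):
        for c in range(left, right):
            out[r][c] = (img[r][c] + sign * offs[r - top][c - left]) % 256
    return out


def restore_region(img, box, key):
    """Undo `perturb_region` for a viewer who holds the same key."""
    return perturb_region(img, box, key, sign=-1)


image = [[(10 * r + c) % 256 for c in range(6)] for r in range(5)]
private_box = (1, 1, 4, 4)  # hypothetical private region, e.g. a face
shared = perturb_region(image, private_box, key="owner-secret")
print(restore_region(shared, private_box, key="owner-secret") == image)  # True
```

Per-user policies could then be expressed by handing different keys (or no key) to different viewers for each region.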
conference on information and knowledge management | 2014
Yilin Shen; Hongxia Jin
People have multiple accounts on Online Social Networks (OSNs) for various purposes. Third parties have great interest in collecting more information about users by linking their accounts across different OSNs. Unfortunately, most users are not aware of the potential risks of such account linkage. Therefore, designing a control methodology that allows users to share their information without the risk of being linked is an urgent need, yet it remains an open problem. In this paper, we first aim to raise users' awareness by presenting an effective User Accounts Linkage Inference (UALI), which is shown to be more powerful than existing methods. To help users control the risks of UALI, we next propose the first Information Control Mechanism (ICM), in which users' information is still visible as intended while the risk of account linkage can be controlled. Using real-world datasets, we validate the performance of ICM and show that it works well against various linkage inference approaches. Because both UALI and ICM are designed to take generic inputs, they can be widely applied to many practical social services.
ubiquitous computing | 2014
Xuan Bao; Neil Zhenqiang Gong; Bing Hu; Yilin Shen; Hongxia Jin
Human lives are composed of a series of events and activities. Considerable research effort has been made to probe, sense, and understand them. In our research, we are interested in exploring the intrinsic thread that connects all these events together, that is, user states and their transitions. Such transitions can be reflected in multiple activity dimensions, ranging from daily mobility trajectories and app usage sequences to communication patterns and motion state switches. In this paper, we aim to determine whether a personalized model can be learned to capture various user states from different sensing dimensions, and whether a unified view can be established to explain the state transitions that drive changes in user context during day-to-day routines. To this end, we explore two types of traces: connected Wi-Fi sequences and cell location trajectories. We first model the states within each of these two dimensions individually. Then, the identified states from both dimensions are linked together to recognize the spatial-temporal relationship between them. Evaluated on the DeviceAnalyzer dataset, our method is able to recognize a range of states, such as at home, working, and commuting, and the transitions between them, all in an unsupervised manner.
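Once raw traces have been grouped into discrete states (the unsupervised state-discovery step on Wi-Fi sequences and cell trajectories is not reproduced here), the transitions between states can be summarized as an empirical transition table. This is a minimal sketch; the state labels below are hypothetical stand-ins for whatever clusters a model would discover.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Empirical probabilities of moving from one state to a different one."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        if current != nxt:  # ignore self-loops; only real state changes count
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


day = ["home", "home", "commute", "work", "work", "commute", "home"]
print(transition_probabilities(day))
# {'home': {'commute': 1.0}, 'commute': {'work': 0.5, 'home': 0.5}, 'work': {'commute': 1.0}}
```

Dropping self-loops keeps the table focused on changes of context (for example, leaving home), which is what the abstract frames as the "transitions" of interest; keeping them would instead model how long a user tends to stay in each state.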
ubiquitous computing | 2015
Xuan Bao; Bin Liu; Bo Tang; Bing Hu; Deguang Kong; Hongxia Jin
Web map services today, such as Google Maps and Bing Maps, have digitized a large portion of the physical world into easily accessible location databases. After the industry invested huge efforts in gathering related information, a user can now search for a physical location on the map and learn what kind of place it is, which is known as reverse geocoding. However, this functionality is mostly limited to public outdoor locations and to building-level granularity. We believe that many services can benefit from knowing the semantic meanings of fine-grained locations, including indoor places. For example, a phone can mute and delay incoming calls when a user enters a meeting room, and cameras can be disabled in bathrooms to protect users' privacy. In this paper, we present PinPlace, an on-device service that can automatically associate semantic meanings with outdoor and indoor locations using activity-, transit-, and time-related features.
conference on information and knowledge management | 2015
Bing Hu; Bin Liu; Neil Zhenqiang Gong; Deguang Kong; Hongxia Jin
Mobile applications (Apps) can expose children and adolescents to mature themes such as sexual content, violence, and drug use, which poses an inappropriate security and privacy risk for them. Therefore, mobile platforms provide rating policies to label the maturity levels of Apps and the reasons why an App has a given maturity level, which enables parents to select maturity-appropriate Apps for their children. However, existing approaches to implementing these maturity rating policies are either costly (because of expensive manual labeling) or inaccurate (because of a lack of centralized control). In this work, we aim to design and build a machine learning framework that automatically predicts maturity levels for mobile Apps, and the associated reasons, with high accuracy and low cost. To this end, we take a multi-label classification approach to predict the mature content in a given App and then label the maturity level according to a rating policy. Specifically, we extract novel features from App descriptions by leveraging deep learning techniques to automatically capture the semantic similarity of pairwise words, and we adapt the Support Vector Machine to capture label correlations via Pearson correlation in a multi-label classification setting. Moreover, we evaluate our approach and various baseline methods using datasets that we collected from both the App Store and Google Play. We demonstrate that, with only App descriptions, our approach already achieves 85% precision for predicting mature content and 79% precision for predicting maturity levels, substantially outperforming baseline methods.
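The two-stage pipeline (multi-label prediction of mature content, then a rating policy that maps the predicted content to a maturity level) can be sketched as follows. The classifier itself (the word-similarity features and the correlation-aware SVM) is replaced by given per-label scores, and the label names, level names, threshold, and policy table are hypothetical; a real store's policy would differ.

```python
LEVELS = ["Everyone", "Low Maturity", "Medium Maturity", "High Maturity"]

# Hypothetical policy: the minimum level index each content label requires.
POLICY = {"mild_violence": 1, "violence": 2, "drug_use": 2, "sexual_content": 3}


def predict_labels(scores, threshold=0.5):
    """Multi-label step: keep every label whose classifier score clears the threshold."""
    return {label for label, s in scores.items() if s >= threshold}


def maturity_level(labels):
    """Policy step: the strictest requirement among the predicted labels wins."""
    rank = max((POLICY[label] for label in labels), default=0)
    return LEVELS[rank]


def rate_app(scores, threshold=0.5):
    labels = predict_labels(scores, threshold)
    # The predicted labels double as the reasons for the assigned level.
    return maturity_level(labels), sorted(labels)


print(rate_app({"violence": 0.8, "drug_use": 0.3, "sexual_content": 0.1}))
# ('Medium Maturity', ['violence'])
print(rate_app({"violence": 0.2}))
# ('Everyone', [])
```

Separating the two stages means the same content predictions can be re-mapped when a platform changes its rating policy, without retraining the classifier.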
european conference on machine learning | 2016
Yilin Shen; Rui Chen; Hongxia Jin
Service providers typically collect user data to profile users in order to provide high-quality services, yet this raises user privacy concerns. On the one hand, service providers often need to analyze multiple user data attributes that usually have different privacy concern levels. On the other hand, users often place different levels of trust in different service providers based on their reputation. However, it is unrealistic to repeatedly ask users to specify privacy levels for each data attribute towards each service provider. To solve this problem, we develop the first lightweight and provably private framework that not only guarantees differential privacy with respect to both service providers and data attributes but also allows configurable utility functions based on service needs. On various large-scale real-world datasets, our solution improves utility by up to 5 times with negligible computational overhead, especially for the numerous low-reputation service providers encountered in practice.
conference on information and knowledge management | 2015
Rui Chen; Yilin Shen; Hongxia Jin
With rapid advances in hardware technology, data streams are being generated daily in large volumes, enabling a wide range of real-time analytical tasks. Yet data streams from many sources are inherently sensitive, and thus there is a growing demand for continuous privacy protection in data streams. In this paper, we consider the problem of private analysis of infinite data streams under differential privacy. We propose a novel data stream sanitization framework that periodically releases histograms summarizing the event distributions over sliding windows to support diverse data analysis tasks. Our framework consists of two modules: a sampling-based change monitoring module and a continuous histogram publication module. The monitoring module features an adaptive Bernoulli sampling process to accurately track the evolution of a data stream. We are the first to conduct an error analysis of sampling under differential privacy, which allows us to select the best sampling rate. The publication module features three different publishing strategies, including a novel technique called retroactive grouping that reduces noise. We provide a theoretical analysis of the utility, privacy, and complexity of our framework. Extensive experiments over real datasets demonstrate that our solution substantially outperforms state-of-the-art competitors.
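A stripped-down illustration of the two modules is a sliding-window histogram built from Bernoulli-sampled events, with Laplace noise added at each release. The fixed sampling rate, the class and parameter names, and the assumption that each individual contributes at most one event per window are illustrative. The adaptive rate selection, the retroactive grouping strategy, and the privacy-budget accounting across successive releases are deliberately omitted, so on its own this sketch does not provide the stream-level guarantee the paper proves.

```python
import random
from collections import Counter, deque


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class SlidingWindowHistogram:
    """Hypothetical sketch: noisy event histogram over the last `window` timestamps."""

    def __init__(self, domain, window, epsilon, rate, seed=None):
        self.domain = list(domain)          # public set of event types
        self.window = deque(maxlen=window)  # one Counter per timestamp
        self.epsilon = epsilon              # budget for a single release only
        self.rate = rate                    # fixed Bernoulli sampling rate
        self.rng = random.Random(seed)

    def update(self, events):
        """Ingest one timestamp, keeping each event with probability `rate`."""
        kept = Counter(e for e in events if self.rng.random() < self.rate)
        self.window.append(kept)  # the oldest timestamp drops out automatically

    def publish(self):
        """Release a noisy estimate of event counts over the current window."""
        total = Counter()
        for counts in self.window:
            total.update(counts)
        scale = 1.0 / self.epsilon  # sensitivity 1: one event per individual per window
        return {x: (total[x] + laplace_noise(scale, self.rng)) / self.rate for x in self.domain}


h = SlidingWindowHistogram(["login", "purchase"], window=3, epsilon=0.5, rate=0.8, seed=1)
for events in (["login", "login"], ["purchase"], ["login", "purchase"]):
    h.update(events)
    print(h.publish())
```

Using `deque(maxlen=...)` makes expiry of old timestamps automatic; a faithful implementation would additionally decide, per timestamp, whether to publish fresh noisy counts or reuse earlier ones, which is where the paper's publishing strategies come in.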