Privacy Adversarial Network: Representation Learning for Mobile Data Privacy
SICONG LIU, Xidian University, China
JUNZHAO DU*, Xidian University, China
ANSHUMALI SHRIVASTAVA, Rice University, USA
LIN ZHONG, Rice University, USA
*Corresponding Author: Junzhao Du
The remarkable success of machine learning has fostered a growing number of cloud-based intelligent services for mobile users. Such a service requires a user to send data, e.g., image, voice, and video, to the provider, which presents a serious challenge to user privacy. To address this, prior works either obfuscate the data, e.g., add noise and remove identity information, or send representations extracted from the data, e.g., anonymized features. They struggle to balance service utility and data privacy because obfuscated data reduces utility and extracted representations may still reveal sensitive information. This work departs from prior works in methodology: we leverage adversarial learning to better balance privacy and utility. We design a representation encoder that generates feature representations to optimize against the privacy disclosure risk of sensitive information (a measure of privacy) by the privacy adversaries, and concurrently optimize with the task inference accuracy (a measure of utility) by the utility discriminator. The result is the privacy adversarial network (PAN), a novel deep model with a new training algorithm, that can automatically learn representations from the raw data. The trained encoder can be deployed on the user side to generate representations that satisfy the task-defined utility requirements and the user-specified/agnostic privacy budgets. Intuitively, PAN adversarially forces the extracted representations to convey only the information required by the target task. Surprisingly, this constitutes an implicit regularization that actually improves task accuracy. As a result, PAN achieves better utility and better privacy at the same time! We report extensive experiments on six popular datasets and demonstrate the superiority of PAN compared with alternative methods reported in prior work.

CCS Concepts: • Human-centered computing → Ubiquitous and mobile computing systems and tools; • Security and privacy → Usability in security and privacy.

ACM Reference Format:
Sicong Liu, Junzhao Du, Anshumali Shrivastava, and Lin Zhong. 2019. Privacy Adversarial Network: Representation Learning for Mobile Data Privacy. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3, 4, Article 144 (December 2019), 18 pages. https://doi.org/10.1145/3369816
Machine learning has benefited numerous mobile services, such as speech-based assistants (e.g., Siri) and reading-log enabled book recommendation (e.g., Youboox). Many such services submit user data, e.g., sound, image, and human activity records, to the service provider, posing well-known privacy risks [1, 2, 9]. Our goal is to avoid disclosing raw data to service providers by creating a device-local intermediate component that encodes the raw data and only sends the encoded data to the service provider. The encoded data must be both useful and private. For inference-based services, utility can be quantified by the inference accuracy achieved by the service provider using a discriminative model, and privacy can be quantified by the disclosure risk of private information.

Existing solutions addressing the privacy concern struggle to balance these two seemingly conflicting objectives: privacy vs. utility. An obvious and widely practiced solution is to transform the raw data into task-specific features and upload the features only, like Google Now [17] and Google Cloud [16]; this not only reduces the data utility but also is vulnerable to reverse models that reconstruct the raw data from extracted features [28]. The authors of [33] additionally apply dimensionality reduction, Siamese fine-tuning, and noise injection to the features before sending them to the service provider. This unfortunately results in further loss of utility.

Unlike previous work, we employ deep models and adversarial training to automatically learn features for a sweet tradeoff between privacy and utility. Our key idea is to judiciously combine discriminative learning, for minimizing the task-specific discriminative error as well as maximizing the user-specified privacy discriminative error, and generative learning, for maximizing the agnostic privacy reconstruction error. Specifically, we present the Privacy Adversarial Network (PAN), an end-to-end deep model, and its training algorithm. PAN controls three types of descent gradients, i.e., utility discriminative error, privacy discriminative error, and privacy reconstruction error, in back-propagation to guide the training of a feature extractor.

As shown in Fig. 2, a PAN consists of four parts: a feature extractor (Encoder E(·)), a utility discriminator (UD), an adversarial privacy reconstructor (PR), and an adversarial privacy discriminator (PD). The output of the Encoder (E) feeds into the utility discriminator (UD), the privacy reconstructor (PR), and the privacy discriminator (PD). We envision the Encoder (E) running on mobile devices to extract features from raw data. The utility discriminator (UD) represents the inference service to ensure the utility of extracted features. PAN emulates two types of adversaries to ensure privacy: the privacy discriminator (PD) emulates a malicious party that seeks to extract private information, e.g., user location; the privacy reconstructor (PR) emulates one that seeks to reconstruct raw data from the features. We present a novel algorithm to explicitly train PAN via an adversarial process that alternates between training the Encoder with the utility discriminator (UD) to improve utility and confronting the Encoder with the adversaries, the privacy discriminator (PD) and the privacy reconstructor (PR), to enhance privacy. All four parts iteratively evolve with each other during the training phase. Understood from the perspective of manifolds, the separate flows of gradients from the utility discriminator (UD), privacy discriminator (PD), and privacy reconstructor (PR) through the Encoder in back-propagation iteratively produce a feature manifold that is both useful and private.

Using digit recognition (MNIST [25]), image classification (CIFAR-10 [23] and ImageNet [6]), sound sensing (UbiSound [37]), human activity recognition (Har [40]), and driver behavior prediction (StateFarm [21]), we show
PAN is effective in training the Encoder to generate deep features that provide a better privacy-utility tradeoff than other privacy-preserving methods. Surprisingly, we observe that the features adversarially learned to remove redundant information, for privacy, even surpass the recognition accuracy of discriminatively learned features. That is, removing task-irrelevant information for privacy actually improves generalization and, as a result, utility.

In the rest of the paper, we formulate the problem of utility-privacy tradeoff in §2, describe PAN's design and its training algorithm in §3, report an evaluation of PAN in §4, attempt a theoretic interpretation of PAN in §5, review the related work in §6, and conclude in §7.

This section mathematically formulates the problem of utility-privacy tradeoff for mobile data. Many appealing cloud-based services exist today that require data from mobile users. For example, as shown in Fig. 1, a user takes
a picture of a product and sends it to a cloud-based service to find out how to purchase it, a service Amazon actually provides. The picture, on the other hand, can accidentally contain sensitive information, e.g., faces and other identifying objects in the background. Therefore, the user faces a tough challenge: how to obtain the service without trusting the service provider with the sensitive information?

Fig. 1. Privacy preservation in cloud-based mobile services. Mobile users leverage a learned Encoder to locally generate deep features from the raw data (i.e., the "tea bag" picture) and give them to the App. The App may send the features to its cloud-based backend.

Toward addressing this challenge, our key insight is that most services actually do not need the raw data. The user can encode the raw data I into a representation E(I) through an Encoder E(·) on the mobile device and send only E(I) to the service provider. The representation E(I) ideally should have the following two properties:

• Utility: it must contain enough task-relevant information to be useful for the intended service, e.g., high accuracy for object recognition;
• Privacy: it must carry little task-irrelevant information, especially information considered sensitive by the user.

In this work, we focus on classification-based services. Therefore, the utility of E(I) is measured by the task inference error C_u (e.g., cross entropy) at the service provider, and we quantify the privacy of E(I) by the privacy leak risk C_p of the raw data under all possible attacking models X. Since the Encoder is distributed to mobile users, we assume it is available to both service providers and potential attackers. That is, both the service provider and the malicious party can train their models using raw data I and the corresponding Encoder output E(I). As such, we can restate the desirable properties for the Encoder output E(I) within a dataset T as:

$$\textit{Utility:}\;\; \min_{E} C_u(E(I_i)),\; i \in T \qquad \textit{Privacy:}\;\; \min_{E}\,\max_{X} C_p(E(I_i)),\; i \in T \tag{1}$$

The first objective (Utility) is well understood for discriminative learning, and achievable via a standard optimization process on the Encoder (E) and the corresponding specialist discriminative model, i.e., minimizing the cross entropy between the predicted task label and the ground truth in a supervised manner [24].
The second objective (Privacy) has two parts. The inner part, max_X C_p(E(I)), is opposite to the outer part, min_E C_p(E(I)). Therefore, the Encoder (E) employed by the mobile user and the specialist attacker (X) used by the malicious party are adversarial to each other in their optimization objectives. Given the information loss in E(I) for privacy, utility loss appears to be certain in theory. One would only hope to find a good, ideally Pareto-optimal, tradeoff between privacy and utility in devising E(·). However, as we will show later, the E(·) discovered via PAN actually improves privacy and utility at the same time, a result that can be explained by the practical limits of deep learning in §5.
Important to the quantification of privacy, one must in theory enumerate the privacy leak risks C_p by all possible attackers X. Moreover, the measurement of the privacy leak risk C_p is an open problem in itself [30]. Therefore, we approximate privacy with two specific attackers, each with its own measurement of privacy, elaborated below; a minimal code sketch of the two measures follows this list.

(1) Specified privacy quantification, in which the user specifies what inference tasks should be forbidden and privacy can be quantified by the accuracy of these tasks. For example, users may want to prevent a malicious party from inferring their identity. In this case, the privacy can be measured by the inaccuracy of identity inference, and the privacy leak risk C_p1 can be defined as the inference accuracy of a discrimination model employed by the attacker.

(2) Intuitive privacy quantification, in which the privacy leak risk C_p2 is agnostic of the inference tasks undertaken by the attacker. In this work, we quantify this agnostic privacy by the difference between the raw data I and I′, the data reconstructed by a malicious party from the Encoder output E(I). We choose this reconstruction error as the agnostic measure for two reasons. First, the raw data in theory contains all information, and the difference between I and I′ is computationally straightforward and intuitive. Second, prior works have already shown that it is possible to reconstruct the raw data from feature representations optimized for accuracy [28, 35, 43].
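To make the two measures concrete, the following sketch (ours, not the paper's released code) computes the specified privacy leak as the attacker's inference accuracy and the intuitive privacy as the per-sample reconstruction error; `pd_model` and `pr_model` are hypothetical stand-ins for the attacker models described above:

```python
import numpy as np

def specified_privacy_leak(pd_model, features, private_labels):
    """C_p1: accuracy of the attacker's privacy discriminator (PD).
    Higher accuracy means more privacy leakage."""
    predictions = np.argmax(pd_model.predict(features), axis=1)
    return np.mean(predictions == private_labels)

def intuitive_privacy_error(pr_model, features, raw_data):
    """C_p2: mean reconstruction error of the attacker's privacy
    reconstructor (PR). Higher error means better privacy."""
    reconstructed = pr_model.predict(features)
    # Euclidean distance between each raw sample and its reconstruction
    per_sample = np.linalg.norm(
        (raw_data - reconstructed).reshape(len(raw_data), -1), axis=1)
    return per_sample.mean()
```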
To find a good, hopefully Pareto-optimal, tradeoff between utility and privacy, we design PAN to learn an Encoder E(·) via a careful combination of discriminative, generative, and adversarial training. As we will show in §4, to our surprise, the resulting Encoder actually improves utility and privacy at the same time.

As shown in Fig. 2, PAN employs two additional neural network modules, the utility discriminator (UD) and the privacy attacker, to quantify utility and privacy, respectively, in training the Encoder E(·). The utility discriminator simulates the intended classification service; when PAN is trained by the service provider, the utility discriminator can be the same discriminative model used by the service. The privacy attacker, i.e., the intuitive privacy reconstructor (PR) and the specified privacy discriminator (PD), simulates a malicious attacker that attempts to obtain sensitive information from the encoded features E(I). These modules are trained end-to-end to learn the Encoder E(·) for users to extract deep features E(I) from raw data I. The training is an iterative process that we will elaborate in §3.2. Below we first introduce PAN's neural network architecture, along with some empirically gained design insights; a code sketch of the four modules follows the list.

• The
Encoder E(·) consists of an input layer, multiple convolutional layers, pooling layers, and batch-normalization layers. A convolutional layer applies a convolution operation to output an activation map with a set of trainable filters. We note that the judicious usage of pooling layers and batch-normalization layers contributes to the deep features' utility and privacy. The batch-normalization layer normalizes the output activation map of a previous layer by subtracting the batch mean and dividing by the batch standard deviation [20]. It helps the features' utility because it normalizes the activations to keep them from being too high or too low, and thus has a regularization effect [20]. It contributes to the features' privacy as well, since it makes it harder for an attacker to recover sensitive information from normalized features. The pooling layer then takes the maximum or average value from a sub-region of the previous layer to form more compact features, which reduces computational cost and avoids over-fitting [12]. It helps privacy because no un-pooling technique can recover fine details from the resulting features by shifting small parts and precisely arranging them into a larger meaningful structure [31].
Fig. 2. Architecture of the privacy adversarial network (PAN). The errors C_u, C_p1, and C_p2, which respectively refer to the utility quantification, the specified privacy quantification, and the intuitive privacy quantification, build the weight-updating criterion during back-propagation.

• The
Utility Discriminator (UD) builds a multi-layer perceptron (MLP) that processes the deep features E(I) and outputs the task classification result y′ with several fully-connected layers [24]. We also note that a service provider can explore any classification architecture for its utility discriminator model, given the Encoder or its binary version. We choose the MLP architecture because some of the most successful CNN architectures, e.g., VGG and AlexNet, can be viewed as the Encoder plus an MLP. The standard cross entropy between the utility discriminator's prediction output y′ and the task ground truth y measures the utility error C_u.

• The
Privacy Attacker employs the two privacy attacking models presented at the end of §2. Specifically, the privacy discriminator (PD) evaluates the recognition accuracy of the private class from encoded features E(I), and the privacy reconstructor (PR) quantifies the intuitive reconstruction error between the mimic data I′ and the raw data I.

– Specified Privacy Discriminator (PD) employs an MLP classifier similar to the utility discriminator (UD) to predict the user-specified privacy class z′, e.g., personal identity, from features E(I). The difference is that the multi-layer PD maps to the corresponding private classes. As noted before, the architecture and training algorithm of PAN can easily incorporate other architectures as the privacy discriminator (PD). The error between the predicted private class z′ and the privacy label z measures the specified privacy leak risk C_p1.

– Intuitive Privacy Reconstructor (PR) is a usual Encoder turned upside down, composed of multiple un-pooling layers and deconvolutional layers. The un-pooling operation is realized by feature resizing or nearest-value padding [28], and the deconvolutional layer then densifies the sparse activations obtained by un-pooling through reverse convolution operations [42]. The PR simulates a malicious party and quantifies the intuitive privacy error C_p2. After obtaining a (binary) version of the Encoder, a malicious party is free to explore any neural architecture to reconstruct the raw data. In this work, we examine multiple reconstructor architectures and select the one with the lowest reconstruction error as the specialist privacy reconstructor. We also include an exactly layer-by-layer reversed architecture that mirrors the Encoder, to emulate a powerful adversarial reconstructor that knows the internals of the Encoder throughout training. The reconstruction error, e.g., Euclidean distance, between I and I′ measures the disclosure risk C_p2 of agnostic privacy information.
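To make the architecture concrete, the following minimal Keras sketch (ours, not the authors' code) builds the four modules; the layer counts and sizes are illustrative assumptions, since the paper specifies only the layer types:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_encoder(input_shape=(32, 32, 3)):
    """Encoder E: convolutional, pooling, and batch-normalization layers."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),        # pooling discards fine spatial detail
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.BatchNormalization(),  # normalization aids utility and privacy
    ], name="encoder")

def build_discriminator(feature_shape, num_classes, name):
    """MLP used for both the utility discriminator (UD) and the
    specified privacy discriminator (PD); only num_classes differs."""
    return models.Sequential([
        layers.Input(shape=feature_shape),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ], name=name)

def build_reconstructor(feature_shape, out_channels=3):
    """Privacy reconstructor (PR): the Encoder turned upside down,
    un-pooling (upsampling) followed by deconvolution."""
    return models.Sequential([
        layers.Input(shape=feature_shape),
        layers.UpSampling2D(),
        layers.Conv2DTranspose(64, 3, padding="same", activation="relu"),
        layers.UpSampling2D(),
        layers.Conv2DTranspose(out_channels, 3, padding="same"),
    ], name="privacy_reconstructor")
```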
Algorithm 1: Mini-batch stochastic training of the privacy adversarial network (PAN)
Input: Dataset T
Output: PAN's weights {θ_E, θ_UD, θ_PD, θ_PR}
  Initialize θ_E, θ_UD, θ_PD, θ_PR;
  for n epochs do
    Sample a mini-batch I of m samples from T;
    for k steps do
      Update θ_E and θ_UD by gradient descent with learning rate l to minimize C_u;
      Update θ_PD by gradient descent with learning rate l to minimize C_p1;
      Update θ_PR by gradient descent with learning rate l to minimize C_p2;
    end
    Update θ_E and θ_UD by gradient descent with learning rate l to minimize C_sum;
  end
*Note: n and k are two hyper-parameters that synchronize the training of the E, UD, PD, and PR parts.

Our goal with
PAN is to train an Encoder that can produce output that is both useful, i.e., leading to high inference accuracy when used for classification tasks, and private, i.e., leading to low privacy inference accuracy and high reconstruction error when maliciously processed and reverse-engineered by the attacker, respectively. As we noted in §2, the utility and privacy objectives can be competing when taken naively. The key idea of PAN's training algorithm is to train the Encoder along with the utility discriminator and the two types of privacy attackers, which specialize in discrimination and reconstruction, respectively. Given a training dataset T of m tuples of I, the raw data, y, the true task label, and z, the privacy label, we train a PAN through an iterative process with the following four stages:

(1) Discriminative training maximizes the task accuracy to train a specialist utility discriminator (UD); mathematically, it minimizes the cross entropy τ between the predicted class UD(E(I_i)) and the true label y_i:

$$\min C_u = \sum_{i=1}^{m} \tau(y_i, UD(E(I_i))) \tag{2}$$

(2) Discriminative training minimizes the cross entropy τ between the predicted private class PD(E(I_i)) and the private ground truth z_i, to train a specialist privacy discriminator (PD):

$$\min C_{p_1} = \sum_{i=1}^{m} \tau(z_i, PD(E(I_i))) \tag{3}$$

(3) Generative training minimizes the reconstruction error to train a specialist privacy reconstructor (PR):

$$\min C_{p_2} = \sum_{i=1}^{m} |I_i - PR(E(I_i))| \tag{4}$$

(4) Adversarial training minimizes the sum error to find a privacy-utility tradeoff. Specifically, it trains the Encoder to suppress the utility error C_u and increase the privacy errors (C_p1, C_p2):

$$\min C_{sum} = \sum_{i=1}^{m} \lambda_1 \tau(y_i, UD(E(I_i))) - \lambda_2 |I_i - PR(E(I_i))| - \lambda_3 \tau(z_i, PD(E(I_i))) \tag{5}$$
Table 1. Summary of the mobile applications and corresponding datasets for evaluating PAN.

No. | Target task (utility label) | Private attribute (privacy label) | Dataset | Description
T1 | Digit (10 classes) | None | MNIST [25] | 70,000 images
T2 | Image (10 classes) | None | CIFAR-10 [23] | 60,000 images
T3 | Image | None | ImageNet [6] | image subset (40,000 for training)
T4 | Acoustic event | None | UbiSound [37] | 7,500 audio clips
T5 | Human activity (6 classes) | Human identity | Har [40] | 10,000 records of accelerometer and gyroscope
T6 | Driver behavior (10 classes) | Driver identity | StateFarm [21] | 22,424 images

C_sum is a Lagrangian function of C_u, C_p1, and C_p2. λ1, λ2, and λ3 are Lagrange multipliers that can be used to control the relative importance of privacy and utility. When we set λ2 = 0 or λ3 = 0, PAN only trains the Encoder to resist the specified privacy discriminator or the intuitive privacy reconstructor, respectively. Algorithm 1 summarizes the training algorithm of
PAN. We leverage mini-batch techniques to split the training data into small batches, over which we calculate the average of the gradient to reduce the variance of gradients, balancing training robustness and efficiency (line 3) [26]. Within each epoch, we first perform the standard discriminative and generative stages (lines 5, 6, 7) to initialize the Encoder's weights θ_E and train the specialist utility discriminator (UD), privacy discriminator (PD), and privacy reconstructor (PR). We then perform the adversarial stage (line 9) to shift the utility-privacy tradeoff by tuning the Encoder weights θ_E. We note that k in line 4 is a hyper-parameter of the first three stages: running k steps of them followed by a single iteration of the fourth stage synchronizes the convergence speeds of the four training stages, borrowing existing techniques from generative adversarial networks [13]. Our implementation uses an empirically optimized value of k = 3, and we leverage the AdamOptimizer [22] with an adaptive learning rate for all four stages (lines 5, 6, 7, and 9). A runnable sketch of one training epoch appears below.
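The following sketch (our reconstruction of Algorithm 1, not the authors' released code) implements one training epoch with TensorFlow's GradientTape, reusing the hypothetical module builders from the previous sketch; the λ values are illustrative:

```python
import tensorflow as tf

ce = tf.keras.losses.SparseCategoricalCrossentropy()
opt = tf.keras.optimizers.Adam()  # AdamOptimizer [22] with adaptive learning rate
lam1, lam2, lam3 = 0.4, 0.3, 0.3  # illustrative Lagrange multipliers

def train_epoch(encoder, ud, pd, pr, dataset, k=3):
    """One epoch of Algorithm 1: k discriminative/generative steps
    (Eqs. 2-4) followed by one adversarial step (Eq. 5) per mini-batch."""
    for x, y, z in dataset:  # raw data, task label, privacy label
        for _ in range(k):   # stages 1-3: train UD, PD, PR (and initialize E)
            with tf.GradientTape(persistent=True) as tape:
                feats = encoder(x, training=True)
                c_u = ce(y, ud(feats, training=True))                        # Eq. (2)
                c_p1 = ce(z, pd(feats, training=True))                       # Eq. (3)
                c_p2 = tf.reduce_mean(tf.abs(x - pr(feats, training=True)))  # Eq. (4)
            for loss, variables in (
                (c_u, encoder.trainable_variables + ud.trainable_variables),
                (c_p1, pd.trainable_variables),
                (c_p2, pr.trainable_variables),
            ):
                opt.apply_gradients(zip(tape.gradient(loss, variables), variables))
            del tape
        # Stage 4: adversarial update of the Encoder (and UD), Eq. (5).
        with tf.GradientTape() as tape:
            feats = encoder(x, training=True)
            c_sum = (lam1 * ce(y, ud(feats, training=True))
                     - lam2 * tf.reduce_mean(tf.abs(x - pr(feats, training=True)))
                     - lam3 * ce(z, pd(feats, training=True)))
        variables = encoder.trainable_variables + ud.trainable_variables
        opt.apply_gradients(zip(tape.gradient(c_sum, variables), variables))
```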
In this section, we evaluate PAN's performance using six classification services for mobile apps, with a focus on the utility-privacy tradeoff. We compare PAN against alternative methods reported in the literature and visualize the results for insight into why PAN excels.
Evaluation applications & datasets. We evaluate PAN, especially the resulting Encoder, with six commonly used mobile applications/services, for which the corresponding benchmark datasets are summarized in Table 1. Specifically, the target task in T1 (MNIST [25]) is handwritten digit recognition. The agnostic private information in the real-world raw image may include individual handwriting style and the background paper. We use 50,000 images for PAN training and 20,000 images for validation and testing. The target tasks in T2 (CIFAR-10 [23]) and T3 (ImageNet [6]) are image classification. The agnostic private information in the real-world raw image may involve background location, color, and brand. We choose 40,000 images for training and the remaining images for testing in both cases. The target task in T4 (UbiSound [37]) is to recognize acoustic events. The agnostic private information covers background voice and environment information. We use 6,000 audio clips for training and 1,500 audio clips for testing. The target task in T5 (Har [40]) is human activity identification based on the records of accelerometer and gyroscope. The specified private attribute we intend to hide is user identity, and the agnostic private information we expect to protect may contain individual habits. We randomly select 8,000 records for training and 2,000 records for testing. The target task in T6 (StateFarm [21]) is to predict driver behavior. The specified private attribute we choose is driver identity, and the agnostic private information within the real-world raw image can be face and gender. We use 18,000 images for training and 4,424 images for testing.
Evaluation models. In PAN, we leverage a utility discriminator (UD), a privacy discriminator (PD), and a privacy reconstructor (PR) model to train and validate the Encoder (E). In the training phase, we refer to successful neural network architectures to build PAN's Encoder (E), Utility Discriminator (UD), and Privacy Discriminator (PD) for the different types of datasets; according to the sample shape in each dataset, LeNet, AlexNet, or the VGG-16 model is chosen as the reference architecture. To evaluate the learned Encoder in the testing phase, we leverage another set of separately trained Utility Discriminators (UD) and Privacy Attackers (PD and PR), given PAN's Encoder output, to simulate the service provider and malicious parties. In particular, we ensemble multiple optional MLP architectures to simulate the service provider's Utility Discriminator (UD) for task recognition, as well as the malicious attacker's Privacy Discriminator (PD) for private attribute prediction. These MLP models have different fully-connected architectures, using varying scales of singular value decomposition, sparse-coding factorization, and global-average computation to replace the initial fully-connected layers. We also employ multiple generative architectures and select the most powerful one as the privacy reconstruction attacker (PR). To emulate a powerful adversary that knows the Encoder for the attackers' training, we include a PR model that exactly mirrors the Encoder for each task.
Prototype implementation. PAN has two phases: an offline phase to train the Encoder, and an online phase where we deploy the learned Encoder as middleware on mobile platforms to encode the raw sensor data into features. In the offline phase, we use the Python library of TensorFlow [38] to train the Encoder, utility discriminator, privacy discriminator, and privacy reconstructor using the datasets summarized in Table 1, and we leverage h5py [5] to separately save the trained models. To speed up the training, we leverage a server with four GeForce GTX 1080 Ti GPUs with CUDA 9.0. In the online phase, we prototype the mobile side on the Android platform, i.e., a Xiaomi Mi6 smartphone, using the TensorFlow Mobile framework [15]. We store the learned Encoder in the smartphone's L2 cache using Android's LruCache API [14], which speeds up the on-device data encoding. The Encoder intercepts the incoming testing data and encodes it into features, which are then fed into the corresponding Android Apps for real-world performance evaluation.
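To illustrate the offline-to-online handoff, the sketch below saves a trained Keras Encoder and converts it for on-device inference; the paper used the older TensorFlow Mobile framework, so the TFLite conversion here is a modern stand-in and the file names are ours:

```python
import tensorflow as tf

def export_encoder(encoder, h5_path="encoder.h5", tflite_path="encoder.tflite"):
    # Save weights and architecture in HDF5 (via h5py), as in the offline phase.
    encoder.save(h5_path)
    # Convert to a compact on-device format for the Android middleware.
    converter = tf.lite.TFLiteConverter.from_keras_model(encoder)
    tflite_model = converter.convert()
    with open(tflite_path, "wb") as f:
        f.write(tflite_model)
```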
We employ four types of state-of-the-art data privacy preserving baselines to evaluate PAN. The DNN method provides a high utility standard, and the DP, FL, and Hybrid DNN methods set a strict utility-privacy tradeoff benchmark for PAN. The detailed settings of the baseline approaches and PAN are as below; a code sketch of the two noisy baselines follows this list.

• Noisy (DP) method perturbs the raw data I by adding Laplace noise with a range of noise factors, and then submits the noisy data Ĩ to the service provider. This is a typical local differential privacy (DP) method [7, 18]. The utility u of the noisy data is tested by the task (e.g., driver behavior in T6) recognition accuracy of an MLP classifier UC with multiple fully-connected layers. The specified privacy p1 is measured by the inference accuracy over the private attribute (e.g., driver identity in T6) in another MLP classifier PC. And the intuitive privacy p2 is evaluated by the average information loss, i.e., p2 = avg(|I_i − Ĩ_i|), I_i ∈ T_test. Here T_test is the corresponding testing set within datasets T1 ∼ T6.

• Noisy (FL) method perturbs the data I by adding Gaussian noise N(0, σ²) with mean 0 and variance σ², where we set σ = 40 according to [34]. The Gaussian noise included in the noisy data Ĩ can provide rigorous guarantees of differential privacy using less local noise. This is widely used in the noisy aggregation scheme of federated learning (FL) [34, 39]. We test the utility u, the specified privacy p1, and the intuitive privacy p2 of this noisy data Ĩ using a methodology similar to the DP baseline.

• DNN method encodes the raw data I into features F using a deep encoder with multiple convolutional and pooling layers, and exposes the features F to the service provider [16, 17]. The utility u of the DNN features is measured by the inference accuracy of a classifier UC with multiple fully-connected layers. The specified privacy p1 is tested by the inference accuracy over the private attribute in another privacy classifier PC with multiple fully-connected layers. And the intuitive privacy p2 is tested by the reconstruction error in a decoder D with multiple deconvolutional and unpooling layers, i.e., p2 = |I_i − D(F_i)|, I_i ∈ T_test.

Fig. 3. Performance comparison of PAN with four baselines (Noisy(DP), Noisy(FL), DNN, Hybrid, PAN) across six applications: (a) MNIST (T1), (b) CIFAR-10 (T2), (c) ImageNet (T3), (d) UbiSound (T4), (e) Har (T5), (f) StateFarm (T6). The X-axis shows the utility u (i.e., task inference accuracy) tested by the simulated service provider, and the Y-axis is the intuitive reconstruction privacy p2 tested by the simulated malicious attacker, normalized by a log operation. The dashed line sets a privacy log(p2) benchmark validated by PAN's privacy reconstructor.

• Hybrid DNN method further perturbs the above DNN features through additional lossy processes, i.e., principal components analysis (PCA) and adding Laplace noise [33] with varying noise factors, before delivering them to the service provider. The utility u, the specified privacy p1, and the intuitive privacy p2 of the perturbed features F′ are respectively tested by a task classifier, a private attribute classifier, and a decoder, using the same methodology as the DNN baseline.

• PAN automatically transforms the raw data I into features, i.e., E(I), using the learned Encoder E(·). In particular, we evaluate the following two variants of PAN, trained to defend against different types of privacy attackers for different benchmark tasks/datasets (Table 1):

– PAN1 is trained with one privacy attacker, i.e., the Privacy Reconstructor (PR), by setting λ3 = 0, on all six datasets (T1 ∼ T6).

– PAN2 is trained with two privacy attackers, i.e., the Privacy Discriminator (PD) and the Privacy Reconstructor (PR), for the application datasets accompanied by both utility labels and private attribute labels (T5 and T6).

The utility u of both PAN1's and PAN2's Encoder output is tested by the task inference accuracy of the service provider's utility discriminator (UD). The specified privacy p1 of PAN2's Encoder output is evaluated by the inference accuracy of the attacker's privacy discriminator (PD). As for the intuitive reconstruction privacy p2 of PAN1 and PAN2, we select the most powerful decoder as the privacy reconstructor (PR) to evaluate it, i.e., p2 = |I_i − PR(E(I_i))|, I_i ∈ T_test.
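As a concrete reference for the two noisy baselines, the sketch below applies Laplace and Gaussian perturbation to a batch of raw samples; `scale=0.5` is an illustrative point in the swept range of noise factors, while σ = 40 follows the FL setting above:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_dp(x, scale=0.5):
    """Noisy (DP) baseline: Laplace perturbation of the raw data."""
    return x + rng.laplace(loc=0.0, scale=scale, size=x.shape)

def noisy_fl(x, sigma=40.0):
    """Noisy (FL) baseline: Gaussian perturbation, sigma = 40 per [34]."""
    return x + rng.normal(loc=0.0, scale=sigma, size=x.shape)

def average_information_loss(x, x_noisy):
    """Intuitive privacy p2 of the noisy baselines: avg |I - I~|."""
    return np.mean(np.abs(x - x_noisy))
```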
Fig. 4. Comparison of PAN with four baselines (Noisy(DP), Noisy(FL), DNN, Hybrid, PAN) on Har (T5) and StateFarm (T6); panels: (a) p1, Har (T5), (b) p2, Har (T5), (c) p1, StateFarm (T6), (d) p2, StateFarm (T6). The X-axis is the utility u (i.e., task inference accuracy). The Y-axis in (a) and (c) represents the specified privacy p1 (i.e., private attribute inference accuracy) normalized as (1 − p1) (%), and the Y-axis in (b) and (d) is the intuitive privacy p2 normalized by a log operation. The dashed lines set the privacy benchmark validated by PAN's privacy discriminator in (a) and (c) and privacy reconstructor in (b) and (d).

This subsection evaluates
PAN in terms of the utility u by the service provider and the privacy p1 and p2 by the malicious attackers, compared with the four privacy-preserving baselines described above. In this set of experiments, we train PAN1 on the six application datasets (T1 ∼ T6) based on utility labels, and train PAN2 on the Har (T5) and StateFarm (T6) datasets accompanied by both utility labels and private attribute labels (see Table 1).

First, PAN's Encoder output achieves the best privacy-utility tradeoff compared to those encoded by the other four baselines. In Figure 3, we see that the performance of PAN1's Encoder output lies in the upper right corner, with maximized utility u and maximized privacy log(p2), on the digit recognition application (T1), and lies around the upper right side, with maximized utility u and competitive privacy log(p2) compared with the other four baselines, on the image classification applications (T2 and T3) and the audio sensing application (T4). In Figure 4, PAN2 is also in the upper right corner, with maximized utility u and maximized privacy (1 − p1) (%) or log(p2), on both the human activity recognition application (T5) and the driver behavior prediction application (T6). Here we transform the expected minimized privacy p1 (%) into the maximized privacy (1 − p1) (%). When we consider a weighted combination of u, (1 − p1), and log(p2), with weights λ1, λ2, and λ3, as a quantifiable metric of the utility-privacy tradeoff, both PAN1's and PAN2's Encoder outputs achieve the best tradeoff value under the default relative importance settings, while the DNN method provides unacceptably low privacy, and the Hybrid DNN, Noisy (DP), and Noisy (FL) methods offer high privacy at the cost of utility degradation.

Second, the utility (i.e., task inference accuracy) of PAN's Encoder output is at least as good as, and sometimes even better than, that of the other four baseline methods across different applications. Specifically, the task inference accuracy of PAN1 stays at a high level: at least 83% on MNIST (T1), 70% on CIFAR-10 (T2), 81% on ImageNet (T3), 83% on UbiSound (T4), 78% on Har (T5), and 77% on StateFarm (T6). And the task inference accuracy of PAN2 is at least 85% on Har (T5) and 87% on StateFarm (T6), even better than the utility of standard DNN features. With the carefully calibrated Gaussian noise distribution in Noisy (FL), we observe better utility than with the Noisy (DP) method. The task inference accuracy of the Noisy (DP), Noisy (FL), and Hybrid DNN baselines is seriously unstable, ranging from 12.8% to 96.7% across different applications, because of the injected noise. Also, we see that the utility of PAN2 on Har and StateFarm is slightly improved over the PAN1 case, which implies that PAN2 with two adversaries learns better features than PAN1 with only one generative-model-based adversary.

Third, PAN's Encoder output in both the PAN1 and PAN2 cases considerably improves privacy over the DNN method and achieves competitive privacy compared with the other three baselines. Moreover, the privacy p1 and p2 quantified by PAN's privacy discriminator (PD) and privacy reconstructor (PR) (the dashed lines in Figures 3 and 4) are comparable with those measured by the third-party attackers (solid black triangles in Figures 3 and 4). We train the third-party attackers using the binary version of PAN's Encoder. This result demonstrates the strong adversary ability of PAN's privacy discriminator and privacy reconstructor.

Summary. First, although PAN1 and PAN2 cannot always outperform the baseline methods in both utility and privacy, they achieve the best Pareto front for the utility-privacy tradeoff across various adversaries and applications. Second, the utility, i.e., inference accuracy, of PAN's Encoder output is even better than that of the standard DNN. We will revisit this surprising result in §5.

Fig. 5. We can tune the utility-privacy tradeoff in both PAN1 and PAN2 by selecting different Lagrange multipliers λ in the adversarial training phase (Eq.(5)). Panels: (a) PAN1 on MNIST (T1), (b) PAN1 on UbiSound (T4), (c) PAN2 on Har (T5), (d) PAN2 on StateFarm (T6).

An important step in PAN's training is to determine the Lagrange multipliers λ1, λ2, and λ3 in the adversarial training stage (see Eq.(5)). We verify that we are able to tune PAN's utility-privacy tradeoff point by setting different λ1, λ2, and λ3 in the adversarial training phase, as shown in Figure 5. We evaluate the influence of the Lagrange multipliers on PAN1's tradeoff performance over two typical applications (T1 and T4) with five discrete choices, and on PAN2 with six discrete choices. We see that the optimal choice of Lagrange multipliers, among the above optional space, differs across digit classification (T1: MNIST), non-speech sound recognition (T4: UbiSound), human activity detection (T5: Har), and driver behavior classification (T6: StateFarm).

Summary. The Lagrange multipliers λ1, λ2, and λ3 bring flexibility to PAN to satisfy different requirements of utility-privacy tradeoffs according to the relative importance between utility and privacy budgets across various tasks/applications. We note that searching for the optimal λ1, λ2, and λ3 is exhausting, since one can always search a finer-grained discrete space. An alternative for future work is to leverage automated search techniques, e.g., the deep deterministic policy gradient algorithm, for efficient searching. We next evaluate
PAN on a commercial off-the-shelf smartphone with six Android applications.
PAN’s
Encoder on Smartphone.
This subsection evaluates the run-time execution cost( e.g. latency, storage, and energy consumption) of the learned
PAN’s
Encoder for encoding different formats ofdata on the Xiaomi Mi6 smartphone. Specifically, we deploy the learned Encoder on the smartphone to interruptand encode the incoming testing sample into features ( i.e. , Encoder output). And then the Encoder output is
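A minimal way to reproduce the latency measurement, assuming the TFLite export from the earlier sketch (the file name is ours; on the phone itself this runs via the Android TFLite runtime, so the Python interpreter here only illustrates the same measurement), is to time one encoding pass directly:

```python
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="encoder.tflite")  # assumed file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

sample = np.random.rand(*inp["shape"]).astype(np.float32)  # stand-in sensor data
start = time.perf_counter()
interpreter.set_tensor(inp["index"], sample)
interpreter.invoke()                       # one encoding pass of raw data
features = interpreter.get_tensor(out["index"])
print("encoding latency: %.1f ms" % ((time.perf_counter() - start) * 1e3))
```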
Table 2. Resource cost of PAN's Encoder for data encoding across five Android applications on the Xiaomi Mi6 smartphone.

Application | Latency (ms) | Storage (KB) | Energy (mJ)
Digit recognition (T1: MNIST) | 26 | 135 | 0.8
Image classification (T2: CIFAR-10) | 31 | 198 | 1.6
Image classification (T3: ImageNet) | 102 | 310 | 3.2
Acoustic event recognition (T4: UbiSound) | 42 | 213 | 1.8
Human activity prediction (T5: Har) | 27 | 269 | 0.9
Table 3. Performance on the driver behavior recognition Android App. Utility u is the driver behavior recognition accuracy, specified privacy p1 is the driver identity classification accuracy, and intuitive privacy p2 is the data reconstruction error. Three inputs to the App are compared: Case A: raw image; Case B: DNN features; Case C: PAN's Encoder output.

In this experiment, the task classifier is embedded in the corresponding Android App, and the privacy validation models (i.e., the private attribute classifier and the privacy reconstructor) are executed in the cloud to attack the Encoder output collected by the Android App. We summarize the on-device execution cost of PAN's Encoder for encoding five formats of data in Table 2. In particular, we load PAN's Encoder (parameter and architecture files) into the smartphone cache to speed up processing, since it occupies only ≤ 310 KB of storage, and the multiply-accumulate (MAC) operations of the Encoder network are run on the smartphone CPU [37]. PAN's Encoder occupies only 135 ∼ 310 KB of memory, takes 26 ∼ 102 ms of encoding latency, and incurs 0.8 ∼ 3.2 mJ of energy cost for each encoding pass of raw data.
PAN’s
Encoder does not incur notable high resource cost. Therefore it is compact to deployon the resource-constrained mobile platforms as a data preprocessing middleware. In particular, it takes lowmemory usage since the Encoder only contains convolutional layers, without storage-exhaustive fully-connectedlayers. The execution delay is only several milliseconds [27]. And the energy cost is less than ≤ . mJ , which isinsignificant compared with Xiaomi Mi6’s battery capacity, i.e. , 3350 mAh . The user inputs data to an Android App to recognizedriver behavior. Meanwhile, he wants to hide the private attributes ( e.g. the driver identity) and other agnosticprivate information ( e.g. the driver race and car model). Therefore, he leverages
PAN to encode the raw image into features and deliver only the Encoder output to the Android App. We artificially play an example trace of driver behavior during the study with 80 driver images from 10 drivers, selected from the 4,424 testing samples of StateFarm (T6). We consider three cases of input data to the driver behavior recognition Android App: Case A: raw data; Case B: features generated by a standard DNN; and Case C: PAN's Encoder output. Table 3 shows the evaluation results on the driver behavior recognition App for the three cases. In Case A with raw image input, the driver behavior classification accuracy (utility) by the App is about 98%, but the inference accuracy over the private attribute, i.e., driver identity, is about 93%. In Case B with standard DNN features, the driver identity inference accuracy p1 achieved by the malicious attacker is still about 65%. In Case C with PAN's Encoder output, the utility slightly improves while the attacker's driver identity inference accuracy p1 drops by about 70% and the intuitive privacy p2 is the largest of the three cases.

Summary. This outcome demonstrates that PAN's Encoder improves utility with quantified privacy guarantees.
Fig. 6. Visualization of features learned by DNN, the Hybrid method, and PAN's Encoder on the StateFarm (T6) dataset ((a) DNN, (b) Hybrid DNN, (c) PAN). Each color stands for one task class.
Fig. 7. Detailed visualization of feature learning by PAN and DNN on two categories of images from ImageNet (T3). The target task is to classify the "sailboat" and "bus" image samples. The background "water" in "sailboat" images and the background "road" in "bus" images is redundant (private) for recognizing the target "sailboat" and "bus".
Fig. 8. Visualization of reconstruction privacy. From left to right: raw "bus" images from ImageNet (Raw), images with Laplace noise (DP), images with carefully-calibrated Gaussian noise (FL), and images reconstructed from DNN's features, Hybrid DNN's features, and PAN's Encoder output.
In this subsection, we further visualize PAN's Encoder output in terms of feature distribution and reconstruction privacy to seek insight into answering the following questions: what is the impact of PAN on the learned features, how does PAN disentangle the feature components relevant to privacy from those relevant to utility, and how well does PAN preserve the reconstruction privacy of the raw data?
Fig. 6 and Fig. 7 visualize how the feature manifold is derived by DNN, Hybrid DNN, and PAN. First, in Fig. 6, PAN's Encoder output is as highly separable in the feature space as the DNN method's, which indicates its utility for task recognition, while the manifold derived by the Hybrid baseline, with its PCA and noise-addition processes on DNN features, is blurry; this is why the resized features of the Hybrid DNN method hurt utility. Moreover, the feature distribution (manifold) formed by PAN is the most constrictive one compared to those from the DNN and Hybrid baselines, which leads to the improved utility. Second, PAN pushes the features away from redundant private information, for privacy, which makes the manifold more constrictive and thus enhances utility. Specifically, Fig. 7 zooms in on two categories of images from ImageNet (T3) for more details about how PAN and DNN form the feature manifold to achieve the utility-privacy tradeoff. The target task is to classify the two categories, "sailboat" and "bus". The private background information in the "sailboat" raw image is "water", and the private information in the "bus" image is "road". We see PAN pushes features towards the constrictive space dominated by the samples without redundant (private) information, i.e., "sailboat without water" and "bus without road", which guarantees privacy, avoids over-fitting, and improves utility as well. The DNN method, in contrast, may capture the background (private) information "water" and "road" and retain it in the feature manifold to help the target task classification of "sailboat" and "bus", therefore hurting privacy. We defer the theoretical interpretation of this result to §5.
PAN's Encoder Output on Reconstruction Privacy. Fig. 8 visualizes the reconstruction privacy of PAN's Encoder output, in comparison to the baseline approaches, using two "bus" image samples from ImageNet. For a fair comparison, we adopt the same architectures of the encoder (i.e., 12 convolutional layers, 5 pooling layers, and 1 batch-normalization layer) and the privacy reconstructor (i.e., the encoder turned upside down) to decode the features generated by DNN, Hybrid DNN, and PAN. We see that the images reconstructed from the DNN features convey both the target object "bus" information and the private background "road" information, indicating a high risk of private background leakage. Adding noise to the images in the DP and FL baselines, or adding noise to the features in the Hybrid DNN baseline, obfuscates both the utility-related "bus" information and the privacy-correlated background "road" information, trading task detection accuracy (utility) for privacy. PAN, instead, muddles only the utility-irrelevant private information "road", making background reconstruction impossible without compromising the utility.
Our evaluation reported above shows that PAN is able to train an Encoder that improves utility and privacy at the same time. This section attempts to provide a theoretical interpretation of this surprising result.

We resort to the manifold perspective of the deep model. It is common in the literature to assume that high-dimensional raw data lies on a lower-dimensional manifold [4]. A DNN can also be viewed as a parametric manifold learner utilizing the nonlinear mapping of multi-layer architectures and connection weights. We decompose the input data into two orthogonal lower-dimensional manifold components: I = I_OD + I_OD⊥. Here, the component I_OD is the manifold component that is both necessary and sufficient for task recognition (e.g., driver behavior). Thus, ideally, we want our training algorithm to rely solely on this information for task recognition. Formally, for the utility discriminator (UD), this implies that prob(y|I) = prob(y|I_OD). The other manifold component I_OD⊥, orthogonal to I_OD, may or may not contain information for the objective class, but it is dispensable for task detection. In practice, real data does have redundant correlations; thus I_OD⊥ may be learned for task recognition, but unnecessarily. However, revealing I_OD⊥ is likely to expose some sensitive information (e.g., driver identity information and background information) and thus hurt privacy. If we assume that there does exist a sweet-spot tradeoff between utility and privacy that we hope to find, then it must be the case that I_OD is not sensitive.
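In symbols, the argument above can be summarized as follows (our paraphrase, not an equation from the paper):

```latex
% Orthogonal decomposition of the data manifold
I = I_{OD} + I_{OD^{\perp}}, \qquad \langle I_{OD},\, I_{OD^{\perp}} \rangle = 0
% The task label depends only on the task-relevant component:
\mathrm{prob}(y \mid I) = \mathrm{prob}(y \mid I_{OD})
% The adversarial objective pushes the learned features F' to be
% orthogonal to the sensitive component, so that
F' \perp I_{OD^{\perp}} \;\Longrightarrow\; F' = f(I_{OD})
```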
Fig. 9. A new manifold pushed by PAN to form the feature extractor, i.e., the Encoder, for better utility and privacy. The utility-specified discriminative learning objective (Eq.(2)) pushes it to contain I_OD and I_OD⊥, and the privacy-imposed adversarial training objective (Eq.(5)) pushes it away from the sensitive component I_OD⊥.

The features F learned by standard discriminative learning to minimize the classification error based on information from I will most likely overlap (have non-zero projection) with both I_OD and I_OD⊥, and the overlap with I_OD⊥ compromises privacy (as evident from our experiments). Meanwhile, the projection of the learned features on I_OD⊥ can be significant, as it might capture extra sensitive features that help task recognition accuracy. Apart from privacy, the redundant correlation in I_OD⊥ is also likely to be merely spurious in the training data; thus, purely minimizing the classification loss can lead to over-fitting.

This is where we can kill two birds with one stone via an adversarial process. In PAN, the Encoder E(·) is trained by the utility-specified discriminative learning objective (Eq.(2)) and the privacy-imposed adversarial learning objective (Eq.(5)) to remove extra sensitive information from the features F′, as shown in Fig. 9. The transformed manifold formulated by the Encoder E(·) is forced by the discriminative learning objective (Eq.(2)), just like the traditional approach, to contain information from both I_OD and I_OD⊥. However, the adversarial training objective (Eq.(5)) pushes the features F′ away from (or orthogonal to) I_OD⊥. In this way, we get privacy as well: since F′ is a function of I, which has two manifold components, being orthogonal to I_OD⊥ forces it to depend only on I_OD.

Meanwhile, from a generalization perspective, the spurious information from I_OD⊥ that might over-fit the training data is iteratively removed by the adversarial training objective (Eq.(5)), leading to enhanced generalization. For example, as shown in Fig. 7, if we want to discriminate between "bus" and "sailboat", the background information "road" in the image can help in most cases but can also mislead when the test image contains a "sailboat" being transported on the "road". Therefore, by considering the background information, a standard DNN may not generalize well. In contrast, because the background may contain sensitive information and contribute to reconstruction error, PAN is likely to train the Encoder to remove information about the background and, as a result, improve task accuracy.

The above interpretation highlights the possibility that utility and privacy are not completely competing objectives in practice. We believe that a rigorous formalism and thorough investigation of this phenomenon is necessary to shed more insight and derive better designs.
Our work is inspired by and closely related to the following works.
Data Privacy Protection in Machine Learning based Services: Randomized noise addition [18] and differential privacy [1, 8] techniques are widely used by service providers to remove personal identities in released datasets. They provide strong privacy guarantees but often lead to a significant reduction in utility (as shown in §4). The authors of [11] design a privacy mapping scheme for continuously released time-series of user data to protect the correlated private information in the dataset. Federated learning [34, 39] techniques prevent inference over the sensitive data exchanged between distributed parties during training by noisy aggregation of the multiple parties' resulting models. However, all of the above techniques are tailored to datasets or the statistics of datasets, which makes them unsuitable for our problem setting, i.e., run-time data privacy protection in the online inference phase. Meanwhile, applying the statistical information to a new context-aware case is still an open problem.
PAN is a very different approach toward preserving privacy at run-time: as the raw data is generated, it is intercepted by a trained Encoder, and the encoded features are then fed into the untrusted service.
Data Utility-Privacy Tradeoff using Adversarial Networks: Adversarial networks have been explored for data privacy protection, in which two or more players defend against each other with conflicting utility/privacy goals. Seong et al. introduce an adversarial game to learn an image obfuscation strategy, in which the user and the recognizer (attacker) strive for antagonistic goals: dis-/enabling recognition [32]. Wu et al. [41] propose an adversarial framework to learn a degradation transformation (e.g., anonymized video) of video inputs; this framework optimizes the tradeoff between task performance and privacy budgets. However, both practices only consider protecting privacy against attackers that perform "vision" recognition on a specific data format (e.g., image), which is insufficient across the diverse data modalities in ubiquitous computing. On the contrary, PAN allows users to specify the utility and privacy quantification for different data formats according to application requirements, and we have evaluated PAN's usability across image, audio, and motion data formats in §4. Another line of work uses a generative model (i.e., an AutoEncoder) to jointly minimize privacy and utility loss, where the privacy and utility requirements are modeled as adversarial networks.

Although the above works share the idea of adversarial learning with ours, they use the generative model to obfuscate the raw data in a homomorphic way. In contrast, PAN uses the Encoder to learn a downsampling transformation, e.g., features, from the raw data, and sends the features, rather than any form of obfuscated/synthetic data, to service providers. A byproduct of this encoding is that PAN has more efficient data communication from the mobile device to the service provider.
Deep Feature Learning for Utility or Privacy : Edwards et al. propose the adversarial learned representationsthat are both fair (independent of sensitive attributes) and discriminative for the prediction task [10]. However,they target at fair decision by quantifying the dependence between representation and sensitive variables thusprovide no privacy guarantees. Our work is closely related to [3] that employs a variational GAN to learn therepresentations that hide the personal identity and preserve the facial expression. However, it employs a generativemodel to minimize the reconstruction error for realistic image synthesis, which is vulnerable to agnostic privacyhacking by reverse engineering. In contrast, we maximize the reconstruction error for intuitive privacy-preserving.Also, discriminative and generative models are widely studied for latent feature learning, improving task inferencebut facilitating data reconstruction [35, 43]. They would make intuitive privacy protection even harder. Osia etal. [33] employ a combination of dimensionality reduction, noise addition, and Siamese fine-tuning to preserveprivacy. Importantly both its dimensionality reduction and Siamese fine-tuning are based on discriminativetraining. Specifically its Siamese fine-tuning seeks to reduce the intra-class variation in features amongst trainingsamples for the intended classification service. While the authors show these methods improve privacy, there isno systematic way to make tradeoffs between privacy and utility. In contrast,
PAN presents a rigorous mechanism to discover good tradeoffs via a combination of discriminative, generative, and adversarial training. Malekzadeh et al. [29] present a privacy-preserving transformation called Replacement AutoEncoder (RAE). Like the Encoder in PAN, RAE intends to eliminate sensitive information from the features while keeping the task-relevant information. Importantly, it assumes that features/data relevant to an intended task do not overlap with those revealing sensitive information. As a result, it transforms the data by simply replacing the latter with features/data that are irrelevant to the intended task and do not reveal sensitive information. With that assumption, RAE eschews the hard problem of making a good tradeoff between utility and privacy and is also based solely on discriminative training. Furthermore, RAE does not reduce the amount of data that has to be sent to the service provider, and it would use significantly more resources in transforming the data, since RAE needs to detect the
sensitive features/data, replace them, and then reconstruct the (modified) raw data. In contrast,
PAN only needs to run the dimensionality-reducing Encoder on the raw data and send the features to the service provider, although PAN may require significantly more computational resources in training the Encoder, which is done offline in the cloud.
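To make the phrase "a combination of discriminative, generative, and adversarial training" concrete, here is a minimal sketch of one plausible composite encoder objective, assuming cross-entropy and mean-squared-error losses; the weights lam1 and lam2 and the sign convention are illustrative assumptions, not the paper's exact equations.

```python
# Sketch of a composite objective combining discriminative, generative,
# and adversarial terms (illustrative, not PAN's exact formulation).
import torch
import torch.nn.functional as F

def encoder_objective(encoder, utility_head, privacy_head, decoder,
                      x, task_label, private_label, lam1=1.0, lam2=1.0):
    z = encoder(x)
    loss_task = F.cross_entropy(utility_head(z), task_label)     # discriminative: keep task info
    loss_priv = F.cross_entropy(privacy_head(z), private_label)  # adversary: guess sensitive label
    loss_rec = F.mse_loss(decoder(z), x)                         # adversary: reconstruct raw data
    # The Encoder descends on the utility loss while ascending on both
    # adversaries' losses, trading utility against privacy via lam1/lam2.
    return loss_task - lam1 * loss_priv - lam2 * loss_rec
```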
CONCLUSION
This paper addresses the privacy concern that arises when mobile users send their data to an untrusted service provider for classification services. We present
PAN, an adversarial framework to automatically generate deep features from the raw data with quantified guarantees in privacy and utility. We report a prototype of PAN on Android platforms and cloud servers. Evaluation using Android applications and benchmark datasets shows that
PAN’s
Encoder output attains a notably better privacy-utility tradeoff than known methods. To our surprise, it achieves even better utility than standard DNNs that are completely ignorant of privacy. We surmise that this surprising result can be understood from the perspective of manifold learning.

We also see three directions in which the work reported in this paper can be extended. First, the PAN framework can accommodate other choices of context-aware utility, such as sequence prediction, and of privacy quantification, such as information-theoretic privacy, according to application requirements. Second, it can integrate multiple utility discriminators and privacy attackers to train the Encoder, given appropriate datasets accompanied by utility and privacy labels. Third, our experience shows that the training of the multiple adversarial models in PAN must be carefully synchronized to avoid model degradation caused by differences in their objectives and convergence speeds. Therefore, more heuristics and insights for guaranteeing and accelerating training convergence are much needed.
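As an illustration of the synchronization issue, one plausible alternating schedule is sketched below, reusing encoder_objective from the earlier sketch; the adversary-to-encoder step ratio k_adv, the separate optimizers, and the data loader are assumptions, not the authors' prescribed recipe.

```python
# One plausible alternating schedule (assumed, not the paper's exact one):
# the adversaries are refreshed k_adv times per Encoder update so that
# neither side collapses due to mismatched convergence speeds.
import torch.nn.functional as F

k_adv = 5  # hypothetical adversary-to-encoder step ratio

for x, task_label, private_label in loader:
    # 1. Train the adversaries against the current, frozen Encoder output.
    for _ in range(k_adv):
        z = encoder(x).detach()  # block gradients into the Encoder
        opt_priv.zero_grad()
        F.cross_entropy(privacy_head(z), private_label).backward()
        opt_priv.step()
        opt_dec.zero_grad()
        F.mse_loss(decoder(z), x).backward()
        opt_dec.step()
    # 2. Train the Encoder (and utility head) against the refreshed adversaries.
    opt_enc.zero_grad()
    encoder_objective(encoder, utility_head, privacy_head, decoder,
                      x, task_label, private_label).backward()
    opt_enc.step()
```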
ACKNOWLEDGMENTS
This work is supported in part by the National Key R&D Program of China. PAN was conceived during Sicong Liu's yearlong visit to Rice University, with support from the China Scholarship Council, to which the authors are grateful. The authors also thank the anonymous reviewers for their constructive feedback that has made the work stronger.
REFERENCES
[1] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of SIGSAC. 308–318.
[2] Jaspreet Bhatia, Travis D Breaux, Liora Friedberg, Hanan Hibshi, and Daniel Smullen. 2016. Privacy risk in cybersecurity data sharing. In Proceedings of ACM Workshop on ISCS. ACM, 57–64.
[3] Jiawei Chen, Janusz Konrad, and Prakash Ishwar. 2018. VGAN-based image representation learning for privacy-preserving facial expression recognition. In Proceedings of CVPR Workshops. 1570–1579.
[4] Jen-Tzung Chien and Ching-Huai Chen. 2016. Deep discriminative manifold learning. In Proceedings of ICASSP.
[6] … In Proceedings of CVPR.
[7] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N Rothblum. 2010. Differential privacy under continual observation. In Proceedings of STOC. ACM, 715–724.
[8] Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Journal of Foundations and Trends in Theoretical Computer Science (2014), 211–407.
[9] Cynthia Dwork, Adam Smith, Thomas Steinke, and Jonathan Ullman. 2017. Exposed! A survey of attacks on private data. Annual Review of Statistics and Its Application.
[10] Harrison Edwards and Amos Storkey. 2015. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897 (2015).
[11] Murat A Erdogdu, Nadia Fawaz, and Andrea Montanari. 2015. Privacy-utility trade-off for time-series with application to smart-meter data. In Proceedings of Workshops at AAAI.
[12] Alessandro Giusti, Dan C Ciresan, Jonathan Masci, Luca M Gambardella, and Jurgen Schmidhuber. 2013. Fast image scanning with deep max-pooling convolutional neural networks. In Proceedings of ICIP. 4034–4038.
[13] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems.
[18] … In American Control Conference (ACC), 2017. IEEE, 1673–1678.
[19] Chong Huang, Peter Kairouz, Xiao Chen, Lalitha Sankar, and Ram Rajagopal. 2017. Context-aware generative adversarial privacy. Entropy (2017).
[20] Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015).
[22] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[23] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. 2014. The CIFAR-10 dataset. https://goo.gl/hXmru5.
[24] Rudolf Kruse, Christian Borgelt, Frank Klawonn, Christian Moewes, Matthias Steinbrecher, and Pascal Held. 2013. Multi-layer perceptrons. Springer, 47–81.
[25] Yann LeCun. 1998. The MNIST database of handwritten digits. https://goo.gl/t6gTEy.
[26] Mu Li, Tong Zhang, Yuqiang Chen, and Alexander J Smola. 2014. Efficient mini-batch training for stochastic optimization. In Proceedings of SIGKDD. ACM, 661–670.
[27] Sicong Liu, Yingyan Lin, Zimu Zhou, Kaiming Nan, Hui Liu, and Junzhao Du. 2018. On-demand deep model compression for mobile devices: A usage-driven model selection framework. In Proceedings of ACM MobiSys.
[28] Aravindh Mahendran and Andrea Vedaldi. 2015. Understanding deep image representations by inverting them. In Proceedings of CVPR. 5188–5196.
[29] Mohammad Malekzadeh, Richard G Clegg, and Hamed Haddadi. 2018. Replacement autoencoder: A privacy-preserving algorithm for sensory data analysis. In Proceedings of IEEE IoTDI. 165–176.
[30] Ricardo Mendes and João P Vilela. 2017. Privacy-preserving data mining: Methods, metrics, and applications. IEEE Access.
[31] … In Proceedings of 3DV. 565–571.
[32] Seong Joon Oh, Mario Fritz, and Bernt Schiele. 2017. Adversarial image perturbation for privacy protection: A game theory perspective. In Proceedings of ICCV. 1491–1500.
[33] Seyed Ali Ossia, Ali Shahin Shamsabadi, Ali Taheri, Hamid R Rabiee, Nic Lane, and Hamed Haddadi. 2017. A hybrid deep learning architecture for privacy-preserving mobile analytics. arXiv preprint arXiv:1703.02952 (2017).
[34] Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Úlfar Erlingsson. 2018. Scalable private learning with PATE. In Proceedings of ICLR.
[35] Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015).
[36] Nisarg Raval, Ashwin Machanavajjhala, and Jerry Pan. 2019. Olympus: Sensor privacy through utility aware obfuscation. Proceedings of PET (2019).
[37] Sicong Liu, Zimu Zhou, Junzhao Du, Longfei Shangguan, Jun Han, and Xin Wang. 2017. UbiEar: Bringing location-independent sound awareness to the hard-of-hearing people with smartphones. Journal of IMWUT.
[39] … arXiv preprint arXiv:1812.03224 (2018).
[40] UCI. 2017. HAR: Dataset for Human Activity Recognition. https://goo.gl/m5bRo1.
[41] Zhenyu Wu, Zhangyang Wang, Zhaowen Wang, and Hailin Jin. 2018. Towards privacy-preserving visual recognition via adversarial training: A pilot study. In Proceedings of ECCV.
[42] Matthew D Zeiler, Dilip Krishnan, Graham W Taylor, and Rob Fergus. 2010. Deconvolutional networks. In Proceedings of CVPR.
[43] Guoqiang Zhong, Li-Na Wang, Xiao Ling, and Junyu Dong. 2016. An overview on data representation learning: From traditional feature learning to recent deep learning.