Science Driven Innovations Powering Mobile Product: Cloud AI vs. Device AI Solutions on Smart Device
Deguang Kong
Yahoo Research, 701 1st Ave, Sunnyvale, California, [email protected]
ABSTRACT
Recent years have witnessed the increasing popularity of mobile devices (such as the iPhone) due to the convenience they bring to human lives. Rich user profiling and behavior data (including per-app-level, app-interaction-level and system-interaction-level data) from heterogeneous information sources make it possible to provide much better services (such as recommendation and advertisement targeting) to customers, which further drives revenue through understanding users' behaviors and improving user engagement. To delight customers, intelligent personal assistants (such as Amazon Alexa, Google Home and Google Now) are highly desirable, providing real-time audio, video and image recognition, natural language understanding, a comfortable user interaction interface, satisfactory recommendation and effective advertisement targeting.

This paper presents the research efforts we have conducted on mobile devices, which aim to provide much smarter and more convenient services by leveraging statistics and big data science, machine learning and deep learning, user modeling and marketing techniques to bring significant user growth, user engagement and user satisfaction (and happiness) on mobile devices. The newly developed features are built at either the cloud side or the device side, working together harmonically to enhance the current service with the purpose of increasing users' happiness. We illustrate how we design these new features from the system and algorithm perspectives using different case studies, through which one can easily understand how science-driven innovations help to provide much better service in technology and bring more revenue lift in business. In the meantime, these research efforts have made clear scientific contributions that are published in top venues and are playing more and more important roles in mobile AI products.
KEYWORDS
engagement, growth, big data science, recommender, targeting, forecasting, effectiveness, deep, convolution, GoogLeNet, optimization, embedding, LSTM, natural language, malicious, adversarial, privacy
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
ACM'18, Dec 2017, USA. © 2016 Copyright held by the owner/author(s). ACM ISBN 123-4567-24-567/08/06...$15.00. https://doi.org/10.475/123_4
ACM Reference format:
Deguang Kong. 2018. Science Driven Innovations Powering Mobile Product: Cloud AI vs. Device AI Solutions on Smart Device. In Proceedings of ACM Mobile AI conference, USA, Dec 2017 (ACM'18), 13 pages. https://doi.org/10.475/123_4
INTRODUCTION

Artificial intelligence (AI) is the technology driving the new revolution. AI on cloud runs AI algorithms at the cloud side, and it has demonstrated overwhelming performance in image recognition, speech recognition, video understanding, etc. AI on device runs powerful AI algorithms on the device itself. Due to the limited resources at the device side, AI algorithms must be more economic and efficient to satisfy real-time serving requirements.

New challenges come with the exponentially growing markets of mobile device applications. We need to address many new problems, for example, diversified app markets, heterogeneous users' behaviors and limited computational resources, in order to provide better service and improve user engagement on mobile devices. AI is playing more and more important roles in mobile products, which are highly demanded by customers to provide more intelligent, smart and convenient services. In the rest of the paper, we will show our research efforts towards building smarter and more convenient mobile systems in the following aspects:

• Cloud-AI solutions: The majority of computing and learning tasks are performed at the cloud side, which provides rich resources and computational power. The cloud is necessary and very helpful for pooling big data and training on huge amounts of historical observations using sophisticated machine/deep learning and parameter tuning strategies.
  – Mobile App Understanding: risk assessment and malicious app detection
  – Marketing for Mobile App User Growth: attracting new users, retention, campaign effectiveness analysis
  – Mobile App User Engagement: user modeling, profiling and recommendation
  – Mobile App Monetization: native advertisement serving, advertisement targeting, bid optimization and pricing
• Device-AI solutions: The major learning and recognition tasks are performed at the device side.
Running AI on device brings many advantages, such as immediate response, enhanced reliability, increased privacy, and efficient use of network bandwidth [22]. AI inference algorithms can mainly run on mobile devices.
  – Image Recognition and Image Privacy on Mobile Devices
  – Deep learning model compression on mobile devices
Figure 1: Overview of the cloud AI solution: (a) left panel: from the application perspective, understanding apps and detection of malicious apps; (b) right panel: user growth, user engagement and monetization.

• Cloud AI and Device AI interactions: Some learning and recognition tasks are performed at the device side while others are put at the cloud side. One particular example is the personal assistant, such as Amazon Alexa and Google Home. In particular, AI inference running entirely in the cloud will have issues for real-time serving tasks that are usually latency-sensitive and mission-critical (e.g., autonomous driving). A real-world system is expected to benefit from sophisticated cloud-side training and high-performance device-side processing and inference [23], leading to the best overall system performance.
  – Personal Assistant Engine on Mobile Device
    ∗ Dialog system
    ∗ Speech recognition
    ∗ Speech synthesis
    ∗ NLP understanding
    ∗ Chatbot
    ∗ Recommender system

We provide scientific leadership and practical solutions to these challenging problems because we strongly believe that pure engineering is not enough in practice. These techniques will be very helpful for addressing the challenging problems of mobile big data science and AI in the real world if (a big "if") the technology can be accurate, robust and scalable enough. We need research breakthroughs to improve the state of the art, and we have diligently worked on them. Fortunately, some of them have graduated into products.
CLOUD AI SOLUTIONS

In this section, we provide cloud-AI solutions from the following four perspectives:
• Mobile App Understanding: risk assessment and malicious app detection
• Marketing for Mobile App User Growth: attracting new users, retention, campaign effectiveness analysis
• Mobile App User Engagement: user modeling, profiling and recommendation
Figure 2: Ranking the risk of mobile apps using multi-modal features such as descriptions, user reviews, permission access and ad libraries.

• Mobile App Monetization: advertisement serving, advertisement targeting, bid optimization and pricing

In particular, Fig. 1 shows an overview of the framework.
MOBILE APP UNDERSTANDING

A mobile app is a software application developed specifically for use on smart devices. Mobile apps provide convenience to users, helping them achieve their purposes and satisfy their interests, solving almost everything one can imagine. For example, one can get news from a news app, one can purchase a Swiss watch while sitting in any office, and one can even talk with a loved one who is actually thousands of miles away. In this section, we will provide mobile app understanding from several perspectives:
• App risk assessment
• Malicious app detection
• App maturity rating
App Risk Assessment

Compared with traditional software markets, markets like Google Play and the Apple App Store have a lower entry threshold for developers and faster financial payback, hence greatly encouraging more and more developers to invest in this thriving business. Therefore, controlling the quality of apps, especially their security risk across whole markets, becomes an important issue to all involved. On the other hand, public concerns about privacy issues with online activity and mobile phones are also elevating, demanding a mobile environment with more respect for users' privacy. In mobile apps, permissions indicate the resources that the apps can access, and thus can be viewed as a privacy indicator. From the users' perspective, meta data such as user reviews and developer descriptions reflect users' perceptions and developer expectations for the apps, and thus are also correlated with the risks of apps.

Our idea is to explore heterogeneous privacy indicators [15] [12] for app risk ranking, which, we believe, is very important for an internet company to improve user engagement on mobile platforms (as shown in Fig. 2). The risk ranking problem is formulated as a multi-view feature learning problem by exploring group LASSO and exclusive group LASSO techniques [16], [14], [17], which can automatically select the most discriminant features by considering both inter-view feature competitions and intra-view feature correlations. In particular, we solve the following optimization problem: given feature x_i ∈ ℝ^p for each app i, and Y_{ki} for the label of app i with

Figure 3:
Flowchart of risk assessment from user comments. Given an app, the user comments are used to evaluate the risk of the app in two steps: (a) "crowdsourcing" is used to accumulate user comments into app-level features (shown as "feature extraction", "auto annotation" and "crowdsourcing"); (b) a "learning to rank" model is used to predict risk scores by utilizing these latent features, where pairwise constraints are enforced between pairs of apps (shown as relative scores of two apps).
Figure 4: Demonstration of the app risk assessment app.

category label k (1 ≤ k ≤ K), we aim to find the feature weight w_k^v for class k regarding the v-th (1 ≤ v ≤ V) view feature, i.e.,

  min_{W ∈ ℝ^{p×K}} Σ_{i=1}^{n} Σ_{k=1}^{K} [ Y_{ki} log Σ_{k′=1}^{K} exp(w_{k′}^⊤ x_i) − Y_{ki} w_k^⊤ x_i ] + α Σ_{k=1}^{K} Σ_{v=1}^{V} ‖w_k^v‖_2 + β Σ_{k=1}^{K} Σ_{v=1}^{V} ‖w_k^v‖_1²   (1)

Correspondingly, we derive an efficient iteratively re-weighted algorithm to tackle the resulting optimization problem, which can handle any group structure, regardless of coherent or exclusive group structures. It demonstrates very good performance on real-world datasets (in total 13,174 apps, 34,514 descriptions, 9,986,568 user reviews and 100 ad libraries).

We also derive a crowdsourcing ranking approach [13] [1] (see Fig. 3) to rank the risk of apps from user comments by combining feature learning and ranking SVM methods, which also provides good solutions in practice. The problem we solve is formalized as:

  min_{w ∈ ℝ^p, θ, Y_ℓ} − log Pr(D_n | θ, Y_ℓ) − log Pr(θ) + λ‖w‖² + C‖(e − B Y_ℓ w)_+‖²,   (2)

Figure 5: Enhanced malicious app detection process: (i) for benign apps, we generate adversarial examples; (ii) for malicious apps, we generate mutated and evolved apps.

based on the maximum a posteriori probability (MAP) estimation, where w is the feature weight, Y_ℓ contains the labels learned from the feature pre-processing step, θ is the prior distribution of parameters in the crowdsourcing process that aggregates different user reviews, log Pr(D_n | θ, Y_ℓ) gives the log-likelihood of the objective function given the current parameters, and ‖(e − B Y_ℓ w)_+‖² is the hinge-loss term of SVM ranking.

Lessons Learned
We do need multiple heterogeneous models to find the most discriminant features. Besides permissions, user reviews and ad libraries play important roles in understanding the risk of apps. A demo system is shown in Fig. 4.
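To make the objective concrete, the sketch below evaluates Eq. (1) on synthetic data. The data, the view split, and the squared-ℓ1 form of the exclusive-LASSO term are our own illustrative assumptions, not the production implementation.

```python
import numpy as np

def multiview_objective(W, X, Y, view_slices, alpha, beta):
    """Eq. (1): softmax cross-entropy plus a group-LASSO term
    (l2 norm of each view's weights) and an exclusive-LASSO term
    (squared l1 norm per view). `view_slices` marks which rows of W
    belong to which feature view; all names here are illustrative."""
    scores = X @ W                                   # n x K
    log_norm = np.log(np.exp(scores).sum(axis=1))    # log sum_k' exp(w_k'^T x_i)
    loss = np.sum(Y * (log_norm[:, None] - scores))  # sum_i sum_k Y_ki (...)
    group = sum(np.linalg.norm(W[sl, k])             # ||w_k^v||_2
                for sl in view_slices for k in range(W.shape[1]))
    exclusive = sum(np.abs(W[sl, k]).sum() ** 2      # (||w_k^v||_1)^2
                    for sl in view_slices for k in range(W.shape[1]))
    return loss + alpha * group + beta * exclusive

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 6))             # 8 apps, 6 features (two views of 3)
Y = np.eye(3)[rng.integers(0, 3, 8)]    # one-hot labels, K = 3 classes
W = rng.normal(size=(6, 3))
views = [slice(0, 3), slice(3, 6)]
obj = multiview_objective(W, X, Y, views, alpha=0.1, beta=0.1)
```

With W = 0 the penalties vanish and the loss reduces to n·log K, a quick sanity check on the softmax term.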
Malicious App Detection

Existing techniques on adversarial malware generation employ feature mutations based on feature vectors extracted from malware. However, most (if not all) of these techniques suffer from a common limitation: the feasibility of these attacks is unknown. The synthesized mutations may break the inherent constraints posed by the code structures of the malware, causing either crashes or malfunctioning of malicious payloads. To address this limitation, Yang et al. [40] present Malware Recomposition Variation (MRV), an approach that conducts semantic analysis of existing malware to systematically construct new malware variants for malware detectors to test and strengthen their detection signatures/models. In particular, we use
Figure 6: Phylogenetic tree generated for the DroidKungFu family. Each leaf in the graph denotes a malware sample in the DroidKungFu family, where leaf nodes (1–4) belong to droidkungfu.ab, (5–9) belong to droidkungfu.aw, (10) belongs to droidkungfu.bb, (11–12) belong to droidkungfu.bl, (13–15) belong to droidkungfu.c, (16–22) belong to droidkungfu.g, and (23–28) belong to droidkungfu.m.

two variation strategies (i.e., the malware evolution attack and the malware confusion attack) that follow the structures of existing malware to enhance the feasibility of the attacks. Given the malware, we conduct semantic-feature mutation analysis and phylogenetic analysis to synthesize mutation strategies. Based on these strategies, we perform program transplantation to automatically mutate malware byte-code and generate new malware variants. We evaluate our MRV approach on actual malware variants; our empirical evaluation on 1,935 benign Android apps and 1,917 malware samples shows that MRV produces malware variants that have a high likelihood of evading detection while still retaining their malicious behaviors. We also propose and evaluate three defense mechanisms to counter MRV. Fig. 5 gives an overview of the technique we used for malicious app detection. The major differences between our work and existing works are:
Mutated samples
We mutate the malware features from the original feature values to ones that are less differentiable for malware detection [18]. The newly generated samples are fed into the classification model again for building the discriminant classifier.
Evolved samples
We mimic and automate the evolution of malware based on the insight that the evolution process of malware reflects the strategies employed by malware authors to achieve a malicious purpose while evading detection. Fig. 6 gives an example of a phylogenetic tree generated for the DroidKungFu family. The newly generated samples are fed into the classification model again for building the discriminant classifier.
Adversarial samples
We explore the malware features to identify more blind spots of existing detection, and generate adversarial samples from benign apps that are actually labeled as malicious ones. The newly generated adversarial samples are fed into the classification model again for building the discriminant classifier.

Our work also has strong connections with adversarial learning [21]. To make the discriminant classifier more robust, we generate synthetic apps that fool the discriminator into accepting them as true apps. Similar to a generative adversarial network (GAN) [6], this adversarial and mutated/evolved sample generation process is like a generative process, heavily relying on the feature model used in malware detection instead of the "random noise" used in a standard GAN. The generative process effectively enforces "data augmentation" operations, a key strategy used in deep learning.

Semi-supervised learning is applied for the automatic generation of Android security policies in [38]. However, it suffers from the inherent limitation of semi-supervised learning [19]: the algorithm performance degrades significantly when mislabeled samples propagate errors, instead of "belief", along the similarity-measurement path computed from domain knowledge or the underlying data manifold, which is well known in the machine learning community.

Figure 7: App maturity rating: from word embedding to app embedding.
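As a toy illustration of the mutate-and-retrain loop described above, the sketch below nudges malware feature vectors toward the benign centroid and adds the relabeled variants back into the training pool. The interpolation scheme and all names are our own simplification; MRV itself mutates code semantically under program constraints, not just feature vectors.

```python
import numpy as np

def mutate_towards_benign(x_mal, benign_mean, step=0.3):
    """Hypothetical feature-space mutation: move a malware feature
    vector part-way toward the benign centroid so it is harder to
    separate (a toy stand-in for the mutation strategies above)."""
    return x_mal + step * (benign_mean - x_mal)

rng = np.random.default_rng(1)
benign = rng.normal(0.0, 1.0, size=(50, 4))   # benign app features (toy)
malware = rng.normal(3.0, 1.0, size=(50, 4))  # malware features (toy)
variants = np.array([mutate_towards_benign(x, benign.mean(axis=0))
                     for x in malware])

# Augment the training set with the variants, relabeled as malware,
# so the retrained discriminant classifier covers the blind spot.
X_train = np.vstack([benign, malware, variants])
y_train = np.array([0] * 50 + [1] * 100)
```

Each retraining round repeats this generate-then-fit cycle, which is the "data augmentation" view of the process noted above.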
App Maturity Rating

App maturity rating concerns how to protect children from inappropriate content in mobile apps. Apps may contain sexual, violence and drug-usage content. Therefore, mobile platforms provide rating policies to label the maturity levels of apps and the reasons why an app has a given maturity level, which enables parents to select maturity-appropriate apps for their children. However, existing approaches to implementing these maturity rating policies are either costly (because of expensive manual labeling) or inaccurate (because of no centralized controls). In this work [8], we aim to design and build a machine learning framework that automatically predicts maturity levels for mobile apps and the associated reasons with high accuracy and low cost, using machine learning (and deep learning) techniques.

Specifically, we extract novel features from app descriptions by leveraging the semantic embedding of words (a.k.a. word embedding) to automatically capture the semantic similarity between words, and we adapt the Support Vector Machine to capture label correlations with Pearson correlation in a multi-label classification setting. In particular, in the embedding step, given a sequence of training words w_1, w_2, w_3, · · ·, w_T, the skip-gram model [32] is used to maximize the average log likelihood, i.e.,

  max (1/T) Σ_{t=1}^{T} Σ_{−c ≤ j ≤ c, j ≠ 0} log Pr(w_{t+j} | w_t),   (3)

where c is the size of the training context.
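A minimal sketch of the skip-gram objective in Eq. (3), using the softmax conditional probability of Eq. (4), on a toy corpus. The random embeddings, vocabulary size, and window size are illustrative stand-ins for trained parameters.

```python
import numpy as np

def softmax_prob(v_in, V_out, o):
    """Pr(w_o | w_in): softmax over output embeddings, as in Eq. (4)."""
    scores = V_out @ v_in
    e = np.exp(scores - scores.max())  # shift for numerical stability
    return e[o] / e.sum()

def skipgram_avg_loglik(tokens, V_in, V_out, c=2):
    """Average log likelihood of Eq. (3): for each position t, sum the
    log-probability of every context word within +/- c positions."""
    total, T = 0.0, len(tokens)
    for t, w in enumerate(tokens):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < T:
                total += np.log(softmax_prob(V_in[w], V_out, tokens[t + j]))
    return total / T

rng = np.random.default_rng(0)
V_in = rng.normal(size=(10, 5))    # "input" embeddings, vocab of 10, dim 5
V_out = rng.normal(size=(10, 5))   # "output" embeddings
tokens = [0, 3, 7, 3, 2]           # toy word-id sequence
ll = skipgram_avg_loglik(tokens, V_in, V_out)
```

Maximizing this quantity over V_in and V_out (by gradient ascent in practice) yields the word embeddings that the app-level aggregation step then builds on.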
Here Pr(w_{t+j} | w_t) is the probability of occurrence of w_{t+j} given w_t, which is usually defined using a softmax function, i.e.,

  Pr(w_O | w_I) = exp(v′_{w_O}^⊤ v_{w_I}) / Σ_{o=1}^{W} exp(v′_{w_o}^⊤ v_{w_I}),   (4)

where v_{w_I} and v′_{w_O} are the "input" and "output" word embeddings, and W is the number of words in the vocabulary.

In the app maturity rating application, all app descriptions and textual comments are pre-processed into word embeddings. They can then be fed into the embedding model to learn the app embedding. The key idea is to aggregate the semantics from the word level to the app level, which actually lifts the semantics from word embedding to app embedding. Fig. 7 shows the flowchart of the design. Essentially, the framework infers app maturity using the following logic:

App description and user comments → word embedding → app-level embedding → predictive model → maturity content labeling → maturity level

In experiments, we evaluate our approach and various baseline methods using datasets that we collected from both the Apple App Store and Google Play. We demonstrate that, with only app descriptions, our approach already achieves 85% precision for predicting mature contents and 79% precision for predicting maturity levels, which substantially outperforms baseline methods.

MARKETING FOR MOBILE APP USER GROWTH

Marketing the app is one of the most common ways to drive user growth and improve user engagement. In this section, we will discuss the business intelligence techniques used for attracting new users, promoting the retention of users and improving ad campaign effectiveness.
Attracting New Users

There are several typical ways to attract new users in business intelligence:

• Concentrating on user experience: The best way to build customer relationships is to delight the customers. In simple words, leave everyone who uses the app feeling good.
• Cross promotion:
One needs to target people in different segments. Segment the potential customers and figure out how likely one can turn a customer into a daily active user (DAU) or monthly active user (MAU). Targeting models are widely adopted in Facebook, Google and Yahoo businesses to target look-alike users.
• Develop a content/functionality strategy:
The app should provide the interesting contents and functionalities that will hit the target audience in the right places. The content and functionality features should cater to the targeting requirements. This process is usually optimized using A/B tests to generate even more virality. The "viral factor" can be used to measure how effectively a new feature attracts new customers, which is widely adopted in Facebook, Instagram, Snapchat, etc.
• Media exposure:
Tell a great story to the media and spend money on different publishers in ad campaigns. The return on investment can be measured using campaign effectiveness analysis.
• Purchasing more traffic:
At the user level, different coupons, rewards and discounts are exciting ways to attract new customers. At the business side, jointly working with major internet service providers in a bundled-sales approach will promote user growth. This is also known as paid acquisition.
• Search engine optimization:
Promote the product via search engines by creating contents that search engines favor. The content includes Q&A, articles, long-form reviews, etc. The goal is to increase the page views by attracting new users after searching.

In practice, all these different scenarios can be combined to promote user growth within the budget limit. The mathematical optimization is easy to formulate before reaching the saturation ceiling, i.e.,

  max_{θ, Θ} ΔUsers(Θ) / ΔMoney_Spent(θ),  s.t. Users(Θ) ≤ Market_capacity,   (5)

where Users(Θ) and Money_Spent(θ) denote the number of users and the current money spent in the marketplace, and Θ, θ are parameters, respectively. In the real world, the optimization is performed either empirically from historical experience or based on forecasting results using machine learning and statistical models. Although the model itself can be highly biased due to censored and noisy data observations, we believe robust statistical modeling is still an effective and automated way for internet marketing compared to empirical analysis (or simple cohort analysis) from prior knowledge, i.e.,

  Θ = argmin_Θ L(User number, f̂(user feature set, app feature set, Θ))
  θ = argmin_θ L(Money spent, ĝ(user feature set, app feature set, θ)),

where L(x, y) is the loss function (e.g., least-squares loss or cross-entropy loss) that captures the difference between x and y, and the functions f̂(·) and ĝ(·) can be learned using machine learning/deep learning models in a distributed big data science environment.

Retention Analysis

A retention analysis allows one to see numbers like these:
• What percentage of users are coming back after a week?
• What percentage of users are paying after a month?
• How long do users stick around for a week or for a month?
• Did the new feature released last month increase retention or degrade it?
• Why do some users churn while others do not?
• How to form growth hypotheses based on quantitative data?

Essentially, retention analysis tells a compelling story about "users doing A in a period" more than "the number of users doing A". In cohort analysis, users are grouped into different segments (e.g., based on sign-up date) for behavioral analytics. For example, one can easily observe the percentage of the accounts that still use the

https://hbr.org/2016/02/every-company-needs-a-growth-manager
Figure 8: User segmentation (cohort) based on time-varying features.

service in the following weeks after they signed up at different times. Also, stickiness is a measure of engagement, which looks at how many times a user performs a particular action in a weekly or monthly interval. In practice, after setting the analytic goals (such as the number of weeks and the cohort of users one wants to track), one can easily write funnel queries or use Google Analytics (or other tools) to achieve the goal.

Retention analysis can help people figure out how many people we have lost and drive insight from such analysis. The numbers shown in retention analysis are not the final goal; the insights and solutions that improve user retention are what we really want. Therefore, it is necessary to figure out the relationship between user actions and retention, so that one can identify the main behaviors that are correlated with long-term use. The most widely used correlation measurement between retention and an action is the Pearson correlation, i.e.,

  r = Σ_i (x_i − x̄)(y_i − ȳ) / [ √(Σ_i (x_i − x̄)²) √(Σ_i (y_i − ȳ)²) ],   (6)

where x_i is a sample of retention observations and y_i is a sample of user action observations. For example, Facebook users who added 7 friends within the first 10 days after joining are more likely to continue to use Facebook long-term, while users who added fewer than 7 friends are more likely to churn. In this situation, the user action of "adding 7 friends in a certain timeframe" is highly correlated with retention. Please keep in mind that "correlation" does not mean "causality". The user action may not be responsible for the retention. Further causality analysis is needed to connect the cause (i.e., the user action) with the effect (i.e., user retention).
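Eq. (6) is straightforward to compute. The sketch below correlates a made-up action count with a retention signal; the numbers are invented for illustration and, as stressed above, a high r is not evidence of causality.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation of Eq. (6) between an action count x
    and a retention outcome y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (np.sqrt((xc ** 2).sum()) * np.sqrt((yc ** 2).sum()))

friends_added = [0, 2, 5, 7, 8, 10]   # e.g., friends added in first 10 days
weeks_retained = [1, 2, 6, 9, 10, 12] # toy retention outcome per user
r = pearson_r(friends_added, weeks_retained)
```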
Fig. 8 shows the user segmentation result based on user behaviors.

Furthermore, we clarify the relations among the retention rate, the survival function [33], and the hazard function (or hazard rate) widely used in statistical data science.

Hazard Rate: the probability λ_i that an object does not survive the i-th time interval (t_{i−1}, t_i].
Retention Rate: the probability (= 1 − λ_i) that an object survives the i-th time interval (t_{i−1}, t_i].
Survival Function: S(t_i) is the probability that an object survives up to t_i.

Essentially we have:

Figure 9:
Marketing funnel (from a standard marketing science book). The advertising campaigns are started to target different customer cohorts. App-level campaigns are generally initiated for brand-level advertising, while user-level advertising is started for lifting the direct response from users.
Campaign Name | Targeting audience | Rewards
Appolo | First-time users | New York travel
Light burn | MAU | iPhone 8
Fantastic day | Infrequent users | Dogfood

Table 1: Campaign examples (only for demonstration purposes).
Lemma 2.1. Retention rate = 1 − hazard rate.

The survival function S(t_i) is given by:

  Kaplan–Meier method: S(t_i) = ∏_{j=1}^{i} (1 − λ_j)   (7)
  Nelson–Aalen method: S(t_i) = exp(− Σ_{j=1}^{i} λ_j)   (8)

Campaign Effectiveness Analysis

The marketing funnel is widely adopted in marketing research for understanding the chance of turning leads into customers from the marketing and sales perspective. The general idea is similar to a funnel, i.e., marketers first cast a very broad net in order to capture as many potential customers as possible, and then slowly nurture prospective customers through the purchasing (or conversion) decision by narrowing down the candidate pool at each stage of the funnel. Fig. 9 shows the different stages in the marketing funnel, i.e., awareness → interest → consideration → intent → evaluation → purchase.

In practice, at different stages of the marketing funnel, the promotion campaign will differ based on the targeted customers. For example, at the "awareness" stage, the campaign goal is to widen the customer net by establishing trust and mindshare through events, advertising, trade shows, blog posts, infographics, etc. Therefore, the campaign effectiveness measurement should be defined to measure the reach of customers at different information sources. At the "evaluation" stage, one should convince the buyers to make a final decision, so the purchase rate will be an important factor in evaluating campaign effectiveness.
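Stepping back to the survival formulas, Eqs. (7) and (8) and Lemma 2.1 can be checked numerically; the hazard values below are invented for illustration, and for small hazards the two estimators nearly coincide.

```python
import numpy as np

def survival_km(hazards):
    """Kaplan-Meier, Eq. (7): S(t_i) = prod_{j<=i} (1 - lambda_j)."""
    return np.cumprod(1.0 - np.asarray(hazards))

def survival_na(hazards):
    """Nelson-Aalen, Eq. (8): S(t_i) = exp(-sum_{j<=i} lambda_j)."""
    return np.exp(-np.cumsum(hazards))

hazards = [0.05, 0.10, 0.02]          # per-interval hazard rates (toy)
retention = [1 - h for h in hazards]  # Lemma 2.1: retention = 1 - hazard
```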
In fact, marketing campaigns can be performed at different stages of the funnel for B2C or B2B business. As illustrated before, the campaign effectiveness analysis actually depends on the targeted customers at different stages of the funnel, which can be viewed as external variables that personalize the campaign effectiveness analysis. We show the generalized version of campaign effectiveness analysis; a personalized version can be obtained correspondingly by incorporating the external variables. Several examples of targeting campaigns are given in Table 1.

Let Y_{1i} and Y_{0i} be the potential benefits (e.g., bonus, rewards, credit, discount, etc.) for individual i when i receives the treatment or does not receive the treatment, respectively. The fundamental problem of making a causal inference is how to reconstruct the results that are not observed: for each individual i, what if individual i did not receive the treatment? Basically, we do the analysis at the aggregation level using average treatment effect (ATE) analysis and average treatment effect on the treated group (ATT) analysis. ATE is defined as:

  ATE = E(Y_{1i}) − E(Y_{0i}),   (9)

where E(·) represents the expectation at the aggregation level, and T_i denotes the treatment, with value 1 for the treated group and value 0 for the control group; i.e., ATE is the average effect that would be observed if everyone in the treated and control groups received treatment, compared with if no one in either group received treatment. Correspondingly,

  ATT = E(Y_{1i} | T_i = 1) − E(Y_{0i} | T_i = 1),   (10)

which refers to the average difference if the treated group received treatment compared with if none of those in the treated group received treatment. The propensity score [31] is then defined as the conditional probability of receiving the treatment given pre-treatment characteristics:

  Pr(X) = Pr(T = 1 | X) = E(T | X),   (11)

where T = {0, 1} is the indicator of exposure to treatment and X is the multi-dimensional representation of pre-treatment characteristics. The key idea is that treated and control units should be on average observationally identical (a.k.a. the balancing hypothesis).
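A difference-in-means sketch of Eqs. (9)–(11) on toy numbers. All data are invented; the logistic link below is one common substitute for the probit φ(h(·)) used in the text's propensity model.

```python
import numpy as np

def ate(y, t):
    """Difference-in-means estimate of the ATE in Eq. (9), valid under
    random assignment; a toy estimator that ignores confounding."""
    y, t = np.asarray(y, float), np.asarray(t)
    return y[t == 1].mean() - y[t == 0].mean()

def propensity(x, w):
    """Propensity score Pr(T=1|X) with a logistic link, one standard
    choice of probability model (the text uses a probit phi)."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(x) @ w)))

y = [5, 6, 7, 2, 3, 4]   # observed benefit per individual (toy rewards)
t = [1, 1, 1, 0, 0, 0]   # treatment indicator T_i
effect = ate(y, t)
score = propensity([1.0, 2.0], np.array([0.5, -0.25]))
```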
In other words, for a given propensity score, exposure to treatment is random, and therefore any standard probability model can be used to estimate the propensity score, e.g.,

  Pr(T_i = 1 | X_i) = φ(h(X_i)),   (12)

where φ denotes the normal cumulative distribution function and h(X_i) is a function of covariates with linear and higher-order terms. Please keep in mind that the choice of the function h(·) in the propensity score estimate must satisfy the balancing hypothesis.

MOBILE APP USER ENGAGEMENT

In this section, we discuss how to improve user engagement using user modeling, profiling and recommendation techniques. As of the beginning of June 2014, the App Store had 1.2 million apps and a cumulative 75 billion downloads. Therefore it is urgent to develop effective personalized app recommendation systems. In particular, we discuss how to do privacy-aware app recommendation
Figure 10: Upper panel: privacy-aware app recommendation; lower panel: context-aware app recommendation.

and context-aware app recommendation for mobile users. Recommendation is useful and helpful since it can capture users' preferences and interests, and it strongly connects with targeting in technology although it differs in business logic. Fig. 10 gives an overview of the app recommendation approaches introduced in this section.
Privacy-Aware App Recommendation

Recent years have witnessed a rapid adoption of mobile devices and a dramatic proliferation of mobile applications (apps for brevity). However, the large number of mobile apps makes it difficult for users to locate relevant apps. Therefore, recommending apps becomes an urgent task. Traditional recommendation approaches focus on learning the interest of a user and the functionality of an item (e.g., an app) from a set of user-item ratings, and they recommend an item to a user if the item's functionality well matches the user's interest. However, apps may have privileges to access a user's sensitive resources (e.g., contacts, messages, and location). As a result, a user chooses an app not only because of its functionality, but also because it respects the user's privacy preference. To the best of our knowledge, this work presents the first systematic study on incorporating both interest-functionality interactions and user privacy preferences to perform personalized app recommendations [29]. Specifically, we first construct a new model to capture the trade-off between functionality and user privacy preference. In particular, this work leverages the state-of-the-art Poisson factorization technique and optimizes the objective

  max_{u,v} Pr(y_{ij} | u_i, v_j, p_s) = Poisson(y_{ij}; u_i^⊤ (v_j + λ Σ_{s ∈ Σ_j} p_s)),   (13)

where y_{ij} is the rating score of a particular user i for app j, u_i is the latent factor of user i, v_j is the latent factor of app j, and p_s is the privacy latent factor with respect to app j. We then crawled a real-world dataset (16,344 users, 6,157 apps, and 263,054 ratings) from Google Play and used it to comprehensively evaluate our model and previous methods. We find that our method consistently and
User     Location Semantics     Recommended Service
John     Safeway                Apple Pay or Chase Pay
Damao    Bank                   Get a free coffee
Amy      Mall                   Use Banana Coupon
Table 2: Use cases of context-aware recommendation.

substantially outperforms the state-of-the-art approaches, which implies the importance of user privacy preferences for personalized App recommendation. Moreover, we explore the impact of different levels of privacy information on the performance of our method, which gives us insights into which resources users are more likely to treat as private and which influence user behavior when selecting Apps.
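As a concrete illustration, the scoring rule inside Eq. (13) can be sketched in a few lines of NumPy. The factor values and per-app permission sets below are synthetic toy data, not from the crawled Google Play dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_apps, n_perms, k = 4, 3, 5, 2

# Nonnegative latent factors, as in Poisson factorization
U = rng.gamma(1.0, 1.0, size=(n_users, k))   # user factors u_i
V = rng.gamma(1.0, 1.0, size=(n_apps, k))    # app functionality factors v_j
P = rng.gamma(1.0, 1.0, size=(n_perms, k))   # privacy (permission) factors p_s
perms = [[0, 2], [1], [3, 4]]                # Sigma_j: permissions requested by app j
lam = 0.5                                    # trade-off weight lambda

def expected_rating(i, j):
    """Poisson rate u_i . (v_j + lambda * sum_{s in Sigma_j} p_s) from Eq. (13)."""
    return float(U[i] @ (V[j] + lam * P[perms[j]].sum(axis=0)))

# Recommend the app with the highest expected rating for user 0
scores = [expected_rating(0, j) for j in range(n_apps)]
best = int(np.argmax(scores))
```

A larger λ penalizes or rewards apps more strongly according to their requested permissions, which is how the privacy preference enters the score.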
In many practical applications, the recommendation depends on context. Here "context" is a generic concept that can denote location, gender, age, interest, or other segmentations. In other words, recommendation is performed on different buckets based on one attribute or a combination of attributes. Similar to app recommendation, we solve this problem using a tensor bilinear factorization technique [11]. In particular, we solve the following problem:

\max_{U,V,P} \Pr(X_{ijk} \mid U_{i:}, V_{j:}, P_{k:}) = \mathrm{Poisson}\Big(X_{ijk},\; \sum_r U_{ir}V_{jr} + \sum_t U_{it}P_{kt} + \sum_s V_{js}P_{ks}\Big),   (14)

where X_{ijk} is the rating score of user i for app j in context k, U_{i:} is the latent factor of user i, V_{j:} is the latent factor of app j, and P_{k:} is the latent factor of context k. This framework can be easily extended to generate context-aware service recommendations. The use cases are shown in Table 2.

Lessons Learned.
In this section, we used "app" recommendation as the demonstrating example. Our approach can be easily adapted to recommendations for purchases and other items.
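The pairwise bilinear scoring inside Eq. (14) can be sketched the same way; again, the factor values below are synthetic toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_apps, n_ctx, r = 4, 3, 2, 2
U = rng.gamma(1.0, 1.0, size=(n_users, r))   # user factors U_i:
V = rng.gamma(1.0, 1.0, size=(n_apps, r))    # app factors V_j:
P = rng.gamma(1.0, 1.0, size=(n_ctx, r))     # context factors P_k:

def expected_rating(i, j, k):
    """Pairwise bilinear Poisson rate U_i.V_j + U_i.P_k + V_j.P_k from Eq. (14)."""
    return float(U[i] @ V[j] + U[i] @ P[k] + V[j] @ P[k])

# The same user and app can receive different scores in different contexts
s_ctx0 = expected_rating(0, 1, 0)
s_ctx1 = expected_rating(0, 1, 1)
```

The context factor P_{k:} shifts the score per bucket, which is what makes the recommendation context-aware.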
The basic question here is how to monetize mobile apps. Therefore, this section covers advertisement serving, advertisement targeting, bid optimization, and pricing.
Although the technology has graduated into products, this section is intentionally left blank until our research work is published.
In this section, we provide device-AI solutions that aim to protect image privacy on mobile devices.
Every second, nearly 4,000 photos are uploaded to Facebook and around 4,600 photos are exchanged through Snapchat. Photos are uploaded, saved, and shared in the cloud, e.g., on centralized photo-sharing platforms (PSPs), where sensitive regions may be exposed to the public. What is the security and privacy risk? Photo owners worry about
Figure 11: Private photo privacy protection. The sensitive regions are marked using bounding boxes.
Figure 12: Private photo privacy protection pipeline: the original image I_X is encrypted to E(I_X); after transformation it becomes T(E(I_X)); and finally, after the decryption operation, it is in the form D[T(E(I_X))].

privacy leakage on the cloud/PSPs; moreover, the cloud/PSPs may access and process user photos without explicitly asking for user agreement and may share the unprotected photos. In this work we propose an image perturbation technique to protect image privacy. Ideally, given an encryption function E(.) and a transformation function T(.), the goal is to find a decryption function D(.) such that, for an image I_X,

D\big[T(E(I_X))\big] = T(I_X).   (15)

Our system design is guided by the following theorem. Let P(.) be the image perturbation technique; we have:

Theorem 3.1. Using the image perturbation technique E = P(.) for "encryption" of photos, one can exactly "decrypt" the photos and recover the original one, i.e.,

D\big[T(E(I_X))\big] = T(I_X),   (16)

where D = f(T, E) can be easily calculated given the functions E(.) = P(.) and T(.).

Note that "crypto"-based techniques (including symmetric and public-key encryption) may not work, since they are not compatible with the transformation T(.): D cannot be computed given E(.) and T(.), and therefore I_X cannot be recovered (see Fig. 12). For exactly the same reason, differential privacy [27], which adds Laplacian noise to the image, is in fact irreversible although privacy preserving. Finally, one cannot recover anything given only the image transformation T(.).

Our approach [7] supports different linear transformations (e.g., rotation, cropping, scaling) as well as non-linear transformations such as compression. The key idea is to perturb the DC and AC components discriminatively in the FFT domain, which achieves the same purpose as crypto but is compatible with different image transformations.
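The commutativity property of Eqs. (15)-(16) can be illustrated with a toy additive perturbation and a linear transformation. The real system perturbs DC and AC components in the frequency domain, but the argument below holds for any linear T(.), since T(x + p) = T(x) + T(p):

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.random((8, 8))     # original image I_X (owner side)
pert = rng.random((8, 8))    # secret perturbation (the "virtual image")

def E(x):
    # "Encryption": add the perturbation (toy stand-in for FFT-domain DC/AC perturbation)
    return x + pert

def T(x):
    # Any linear transformation applied by the cloud, e.g. a 90-degree rotation
    return np.rot90(x)

def D(y):
    # Decryption D = f(T, E): subtract the transformed perturbation
    return y - T(pert)

# Commutativity of Eqs. (15)-(16): D[T(E(I_X))] == T(I_X)
recovered = D(T(E(img)))
assert np.allclose(recovered, T(img))
```

Only the owner knows the perturbation, so only the owner can compute D, while the cloud can still apply T to the stored (perturbed) image.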
Our approach also has advantages due to its simple, fast, and effective implementation. In our solution, applying a transformation to the perturbed image is equal to applying the transformation to the original image plus applying the transformation to a virtual image (generated from the perturbation). An example is shown in Fig. 13.

Figure 13:
Image perturbation process: the perturbed image is equal to the original image plus the perturbation parameters. The perturbed image is stored in the cloud, while the original image is collected from the user.
This section presents a deep learning based technique to protect image privacy. Several examples are shown in Fig. 11. Photo privacy is a very important problem in the digital age, where photos are commonly shared on social networking sites and mobile devices. The main challenge in photo privacy detection is how to generate discriminative features to accurately detect privacy-at-risk photos. Existing photo privacy detection works, which rely on low-level vision features, are non-informative to users regarding what privacy information is leaked from their photos. In this section, we propose a new framework called PrivacyCNH [37] that utilizes hierarchical features, including both object and convolutional features, in a deep learning model to detect privacy-at-risk photos. In particular, given the joint deep learning structure (i.e., AlexNet [20]) with parameters V = {V^1, V^2, ..., V^k} and W = {W^1, W^2, ..., W^ℓ}, the posterior probability of privacy risk for an image i is given by the sigmoid function over the learned features, i.e.,

\Pr(y_i = 1 \mid X_i; V, W) = \frac{1}{1 + \exp(-z)},   (17)

z = (V^{k})^{\top} h(X_i) + (W^{\ell})^{\top} \ell(X_i) + \beta,   (18)

where V^i and W^j are the CNN network structure parameters with i and j indicating the layer number, k indexes the hidden unit in layer i, ℓ indexes the hidden unit in layer j, h and ℓ are the activation functions of the object CNN and the low-level CNN respectively, and β is the scalar bias term.

The generation of object features enables our model to better inform users about why a photo has privacy risk. The combination of convolutional and object features provides a richer model that understands photo privacy from different aspects, thus improving photo privacy detection accuracy. Experimental results demonstrate that the proposed model outperforms the state-of-the-art work and a standard convolutional neural network (CNN) with convolutional features on photo privacy detection tasks. Fig. 14 demonstrates the pipeline of our method.
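A minimal sketch of the fusion step in Eqs. (17)-(18); the activations and weights below are random stand-ins for the learned quantities, not the actual PrivacyCNH parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
d_obj, d_conv = 6, 10
h_x = rng.random(d_obj)           # object-feature activations h(X_i)
l_x = rng.random(d_conv)          # convolutional-feature activations l(X_i)
v = rng.standard_normal(d_obj)    # weights (V^k) on the object branch
w = rng.standard_normal(d_conv)   # weights (W^l) on the convolutional branch
beta = 0.1                        # scalar bias term

# Eqs. (17)-(18): privacy-risk probability via a sigmoid over the fused features
z = v @ h_x + w @ l_x + beta
p_private = 1.0 / (1.0 + np.exp(-z))
```

The point of the fusion is that both branches contribute additively to the logit z, so either object evidence or low-level evidence alone can push a photo toward "private".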
CNN models have been widely and successfully used in many computer vision tasks, such as object detection, fine-grained image classification, and age estimation [39]. The popularity of mobile phones brings great convenience to people's lives thanks to many practical and excellent apps. However, running CNN models (even in the testing phase) for a typical vision task is a luxury for most devices due to the high computational cost and the limited memory space and power resources. Accelerating CNN models is thus highly desirable to facilitate mobile vision applications that depend heavily on CNN performance.

Our investigation of AlexNet indicates that not only do the fully connected layers and convolution layers consume a lot of time, but some non-tensor layers (such as the pooling and LRN layers), which do not contain any high-order tensor-type weight parameters, are also time-consuming. However, current research focuses on rank approximation or parameter compression in the fully connected and convolution layers. Although helpful, the acceleration and compression of non-tensor layers are totally ignored.

To address this limitation, this paper [24] proposes a unified framework to compress CNN models by dismembering non-tensor layers, simultaneously accelerating CNN model testing with negligible performance degradation. With re-trained new network parameters in the "re-birth" layers, the functionality of the non-tensor layers is equivalently implemented in the new merged layers with a significant efficiency improvement. The standard least-square error is minimized in the re-training process, where the new parameters are essentially the "quantized" old parameters (in some sense). The framework includes both streaming merge and branch merge, which enable fast computation and are easily adapted to current mainstream CNN models and potential new CNN pipelines.
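The least-square re-training idea can be sketched as follows. The `nontensor_block` function below is a synthetic stand-in for a sequence of non-tensor layers, and the fit uses a plain linear layer rather than a convolution, so this is a toy of the objective in the paper, not the actual DeepRebirth procedure:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 16               # calibration inputs and feature dimension
X = rng.random((n, d))

def nontensor_block(x):
    """Synthetic stand-in for a sequence of non-tensor layers (e.g. LRN + pooling)."""
    return x / (1.0 + 0.5 * (x ** 2).sum(axis=1, keepdims=True))

Y = nontensor_block(X)

# "Re-birth" layer: fit a single linear layer Y ~ X @ W + b by least squares
Xb = np.hstack([X, np.ones((n, 1))])             # append a bias column
theta, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
W_hat, b_hat = theta[:-1], theta[-1]

approx = X @ W_hat + b_hat                       # merged-layer output
err = np.abs(approx - Y).mean()                  # residual on the calibration data
```

In the paper the merged layer is a convolution trained by SGD over the whole network's feature maps; the closed-form least-squares fit above just shows how the replacement parameters are chosen to mimic the original block's outputs.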
In the meantime, in order to run deep learning on mobile devices, we provide an elastic approach that runs deep learning in a distributed fashion (shown in Fig. 15).
Theoretical Analysis
The convolution layer transforms the input feature map X ∈ ℝ^{M×N×K} into Y ∈ ℝ^{M'×N'×K'}, i.e., f_conv: X ↦ Y,

Y_{i'j'k'} = \sum_{i=1}^{d_k} \sum_{j=1}^{d_k} \sum_{k=1}^{K} W_{ijkk'}\, X_{i+i',\, j+j',\, k} \quad (1 \le k' \le K'),

where K, K' are the numbers of feature map channels and M, N; M', N' are the sizes of the images. This is a regular linear convolution by a filter bank: d_k × d_k is the kernel size, and the feature map Y is essentially the sum of inner products obtained by traversing different locations with the d_k × d_k kernel (e.g., d_k =
3) and the output response Y is obtained by enforcing a linear transformation W on the feature map X.

The local response normalization (LRN) layer performs "lateral inhibition" based on the fact that activated neurons have an impact on the neurons in their local input regions. Therefore, it normalizes over local input regions, mapping ℝ^{M×N×K} → ℝ^{M×N×K}, i.e., f_LRN: X ↦ Y,

Y_{ijk'} = \frac{X_{ijk'}}{\Big(\kappa + \alpha \sum_{k \in G(k')} X_{ijk}^2\Big)^{\beta}},   (19)

where G(k') = [k' − ⌊ρ/2⌋, k' + ⌈ρ/2⌉] ∩ {1, 2, ..., K} is a group of ρ consecutive feature channels in the input map. Clearly, if κ = 0, α = 1, β =
1/2, this gives ℓ2 normalization. A batch normalization operation [10] is usually applied to change the distributions of activations to avoid "internal covariate shift". During SGD training, each activation in the mini-batch is centered to zero mean and unit variance
Figure 14: CNN pipeline for privacy detection, which consists of two pipelines: (a) the object feature learning pipeline (upper panel); (b) the convolutional feature learning pipeline (lower panel). h(x), ℓ(x) are activation functions.

Figure 15:
Distributed deep learning execution framework.

where the mean and variance are measured over the whole mini-batch. A learned offset β and multiplicative factor γ are then applied; i.e., given the values of X over a mini-batch {X_1, X_2, ..., X_m}, batch normalization projects them into {Y_1, Y_2, ..., Y_m} using the following steps:

\mu_B \leftarrow \frac{1}{m}\sum_{i=1}^{m} X_i; \quad \sigma_B^2 \leftarrow \frac{1}{m}\sum_{i=1}^{m}(X_i - \mu_B)^2; \quad \hat{X}_i \leftarrow \frac{X_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}; \quad Y_i \leftarrow \gamma \hat{X}_i + \beta;   (20)

where μ_B and σ_B^2 are the mean and variance of the data in the mini-batch.

A pooling operator then operates on individual feature channels, coalescing nearby feature values into one by applying a suitable operator such as max pooling, mapping ℝ^{M×N×K} → ℝ^{M'×N'×K'}, i.e., f_Pooling: X ↦ Y,

Y_{ijk} = \max\{Y_{i'j'k} : i \le i' < i + p,\; j \le j' < j + p\},

where p denotes the size of the nearby region (sum pooling can be handled similarly).

To achieve the desired functionality with acceleration, the idea is to find a mapping function F: X ∈ ℝ^{M×N×K} → Y ∈ ℝ^{M''×N''×K'} such that it produces the same feature map value Y_i given the same input feature map X_i for any image i. Recall that the convolution operation can be viewed as enforcing a linear transformation W on the input feature maps, as in the fully connected layers; we therefore aim to build a single convolution operation (∗) that replaces several non-tensor layers by setting a new optimization goal, i.e.,

\forall i: \quad Y_i = Y_i^{COM}; \quad Y_i^{COM} \simeq \hat{W} \ast X_i + \hat{b}.   (21)

While the type and sequence of functions is usually handcrafted, the parameters Ŵ and bias b̂ can be learned by solving a least-square problem with SGD, i.e.,

(\hat{W}^{*}, \hat{b}^{*}) = \arg\min_{\hat{W}, \hat{b}} \sum_i \big\| Y_i^{COM} - (\hat{W} \ast X_i + \hat{b}) \big\|^2.   (22)

Note that Ŵ* ∈ ℝ^{d_s×d_s×K×K'}, whose size is quite similar to that of the original convolution step except that the new kernel size changes to d_s × d_s to account for the pooling operations.

A typical example is the "intelligent personal assistant".
The ideal personal assistant can talk to you in a speech conversation, understand your words, get information for you, and even do things for you (such as writing a letter). The major products on the market
Figure 16: Key components of the "intelligent personal assistant".

include Amazon Echo, Google Home, Apple's Siri, and Microsoft Cortana. For example, more than 25 million Amazon Alexa units have been sold since 2015. The personal assistant can work in the following scenarios:
• Online chat (such as in instant-messaging apps)
• Speech and voice recognition
• Taking and uploading images
The personal assistant can provide a wide variety of services, such as providing weather information, playing music from Spotify, playing videos, buying items from Amazon, completing customer-service tasks, etc. Technically, a personal assistant needs six major components to support all its functionalities, i.e.,
• Speech Recognition
• Speech Synthesis
• Natural Language Understanding (NLU)
• Chatbot
• Dialog system
• Recommender System
which will be illustrated in detail in the following subsections.
Essentially, the speech recognition component learns a function ĝ that automatically maps the speech signal to its corresponding labels (such as a transcript) in a structured-input, structured-output way, i.e.,

\hat{g}: \text{speech signal} \rightarrow \text{label of signal}.   (23)

The goal is to learn the function ĝ with high accuracy; ĝ can be modeled using HMMs, LSTMs, and other deep learning models.

The traditional phonetic-based approach (e.g., the HMM-based method) [5] required feature engineering (e.g., n-grams) and separate training components for the pronunciation, acoustic, and language models. The benefit of the current end-to-end deep learning pipeline is that it essentially

Figure 17:
The triggered functionalities provided by the most popular personal assistant, Amazon Alexa. The Alexa and app pictures are obtained from the internet.

jointly learns the different components of the speech recognizer, which facilitates deep learning training and deployment on mobile devices. Attention-based automatic speech recognition models (a.k.a. "Listen, Attend and Spell") [3], based on deep neural networks, can literally listen to the acoustic signal, pay attention to different parts of the signal, and spell out the transcript one character at a time.

From a product perspective, the automatic speech recognition (ASR) module needs the support of far-field technology, speaker adaptation, and noise filtering techniques [25] to make the ASR system work in practice. For example, in the far field the E (electric) and B (magnetic) field strengths decrease inversely with distance from the source, resulting in an inverse-square law for the radiated power intensity of electromagnetic radiation.
Speech synthesis [36] aims to synthesize human speech from textual descriptions or from symbolic linguistic representations. A typical text-to-speech (TTS) system (or "engine") [4] has several components, and the learned function ĥ automatically generates accurate speech given the input text, i.e.,

\hat{h}: \text{text characters} \rightarrow \text{speech}.   (24)

The front end first performs tokenization and normalization, which converts raw text (including numbers and abbreviations) into written-out words. Each word is then assigned its phonetic transcription, and the text is divided into prosodic units such as clauses, phrases, and sentences. This finishes the text-to-phoneme conversion. After this process, the output is the symbolic linguistic representation consisting of phonetic transcriptions and prosody information.

The next step concerns how to convert the symbolic linguistic representation into sound. The general way is to compute the target prosody (pitch contour, phoneme durations) for the output speech. The process may seem complicated; fortunately, many APIs from big players (e.g., Apple Siri, AT&T) are available to accelerate the development process.

The natural language understanding module processes the natural language input using disassembling and parsing techniques [9]. Given
the utterance, the system needs to identify proper names, part-of-speech (POS) tags, and named entities, and finally parse it into object and predicate sets. Natural language understanding is performed from the syntax level through the semantic level to the pragmatic level of linguistic analysis, which lays the foundation for understanding sentiment and emotions [28] and for uncovering insights from structured and unstructured data. The three most important features desired for NLU are:
• Proper name identification: identify the "proper names" in the utterance.
• Part-of-speech tagging: label parts of speech as categories of words that have similar grammatical properties. For example, in English, the labels are noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, etc.
• Syntactic/semantic parsing: take the input data and build a data structure (in the form of a parse tree, such as an abstract syntax tree or another hierarchical structure) to represent the input and check the correctness of its syntax or semantics.
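The three NLU features above can be illustrated with a toy rule-based analyzer. The mini-lexicon and the flat (predicate, object) pairing are hypothetical stand-ins; real systems use statistical taggers and full syntactic parsers:

```python
# Hypothetical mini-lexicon mapping words to POS tags (not a real tagger)
LEXICON = {"play": "verb", "some": "adjective", "music": "noun",
           "on": "preposition", "spotify": "proper-noun"}

def analyze(utterance):
    tokens = utterance.lower().split()
    # Part-of-speech tagging: label each token from the lexicon
    pos = [(t, LEXICON.get(t, "unknown")) for t in tokens]
    # Proper name identification: pick out the proper nouns
    names = [t for t, tag in pos if tag == "proper-noun"]
    # A flat stand-in for a syntactic parse: (predicate, object) pairs
    verbs = [t for t, tag in pos if tag == "verb"]
    nouns = [t for t, tag in pos if tag == "noun"]
    return pos, names, list(zip(verbs, nouns))

pos, names, parse = analyze("Play some music on Spotify")
# names -> ["spotify"]; parse -> [("play", "music")]
```

Even this toy version yields the (predicate, object) structure a downstream dialog manager would consume, e.g. play(music) on the service "spotify".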
A chatbot is a computer program that can conduct a conversation with human beings via text or audio. The chatbot simulates how a human would behave in a conversation and is widely used in dialogue systems and customer service. A simple chatbot uses keyword-based matching on the input and then retrieves answers from a prior-knowledge database via keyword matching. Nowadays, more sophisticated natural language processing techniques (e.g., recurrent neural networks/LSTMs) are used in chatbot systems [34].
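A minimal keyword-matching chatbot of the kind described above can be sketched as follows; the keyword table and canned answers are illustrative only, not from any real product:

```python
import re

# Prior-knowledge database: keyword -> canned answer (illustrative entries)
KNOWLEDGE_BASE = {
    "weather": "It is sunny today.",
    "music": "Playing your favorite playlist.",
    "order": "Your order has shipped.",
}
FALLBACK = "Sorry, I did not understand that."

def reply(message):
    # Tokenize the input, dropping punctuation and case
    words = re.findall(r"[a-z]+", message.lower())
    for keyword, answer in KNOWLEDGE_BASE.items():
        if keyword in words:
            return answer        # first keyword hit wins
    return FALLBACK              # no keyword matched
```

This is exactly the "keyword matching against a prior knowledge database" baseline; the LSTM-based systems cited above replace both the matcher and the canned answers with learned models.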
The dialog system [30] is intended to hold conversations with humans, acting as a conversational agent that employs text, speech, graphics, haptics, gestures, and other modes of communication.

A typical dialog system using reinforcement learning first triggers the language understanding component to understand the user input, and then the dialog manager queries the dialog policy, e.g.,

a = \pi(s),   (25)

where a is the action given the current state s under policy π. It then collects the experience

L = (s, a, r, s'),   (26)

where r is the reward and s' is the state after the transition. The Q-function can be induced by applying Q-learning updates over mini-batches, so that the Q-function is optimized via

\max_a Q(s, a),   (27)

before making a decision on the dialog policy. After the dialog policy is determined, the dialog system manages the general flow of the conversation based on the history and state of the dialog, and produces output using the output generator, including the natural language generator, triggering the text-to-speech (TTS) engine as well. When the Q-function is learned with multilayer deep networks (e.g., GoogLeNet [35]), we in fact adopt deep reinforcement learning [26] to build a more robust and accurate system.

A typical way to build the recommender component is to incorporate more information into both the item-feature and user-feature modeling process, such as the content information of the items (including title, artist, genre, year, etc.), user demographics (including income, age, gender), geo-location, social network profiles, and other relevant features. Given millions of features and data samples, a natural way is to use a distributed big machine learning framework to derive the corresponding solutions using "learning to rank" functions, i.e.,

\hat{f}: (\text{visit, item title, user income}, \dots) \rightarrow \text{likelihood of the user's like or dislike}.

Google implemented this type of recommendation system using wide and deep learning [2].

Summary
Imagine that anyone can easily build his or her own personal assistant, given the on-device far-field recognition technique, Amazon cloud services (AWS), state-of-the-art speech recognition and natural language understanding techniques, chatbot and dialog systems, and recommender techniques. It is a great opportunity for everyone.
This paper presents our research efforts on mobile data science, which provide a scientific approach to driving innovation in different mobile AI applications on both the cloud side and the device side. The paper presents detailed case studies on how to apply machine learning and optimization techniques to solve challenging real-world problems on mobile devices. We are delivering more intelligent innovations for mobile AI applications; stay tuned for our future work.
Acknowledgement

The majority of the paper's contents are based on the author's published papers. Thanks to all co-authors who have contributed to this work, as listed in the references; I really appreciate their strong support and help. Any opinions, findings, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of any company. Any plagiarism of this work is forbidden.
REFERENCES
[1] Lei Cen, Deguang Kong, Luo Si, et al. 2015. Mobile App Security Risk Assessment: A Crowdsourcing Ranking Approach from User Comments. In Proceedings of the 2015 SIAM International Conference on Data Mining, Vancouver, BC, Canada, April 30 - May 2, 2015. 658-666. https://doi.org/10.1137/1.9781611974010.74
[2] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems. CoRR abs/1606.07792 (2016). arXiv:1606.07792 http://arxiv.org/abs/1606.07792
[3] Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based Models for Speech Recognition. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1 (NIPS'15). MIT Press, Cambridge, MA, USA, 577-585. http://dl.acm.org/citation.cfm?id=2969239.2969304
[4] Thierry Dutoit. 1997. An Introduction to Text-to-speech Synthesis. Kluwer Academic Publishers, Norwell, MA, USA.
[5] Mark Gales and Steve Young. 2007. The Application of Hidden Markov Models in Speech Recognition. Found. Trends Signal Process. 1, 3 (Jan. 2007), 195-304. https://doi.org/10.1561/2000000004
[6] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2672-2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
[7] Jianping He, Bin Liu, Deguang Kong, Xuan Bao, et al. 2016. PUPPIES: Transformation-Supported Personalized Privacy Preserving Partial Image Sharing. 359-370. https://doi.org/10.1109/DSN.2016.40
[8] Bing Hu, Bin Liu, Neil Zhenqiang Gong, Deguang Kong, et al. 2015. Protecting Your Children from Inappropriate Content in Mobile Apps: An Automatic Maturity Rating Framework. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM 2015, Melbourne, VIC, Australia, October 19 - 23, 2015. 1111-1120. https://doi.org/10.1145/2806416.2806579
[9] Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon. 2001. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development (1st ed.). Prentice Hall PTR, Upper Saddle River, NJ, USA.
[10] Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), David Blei and Francis Bach (Eds.). JMLR Workshop and Conference Proceedings, 448-456. http://jmlr.org/proceedings/papers/v37/ioffe15.pdf
[11] Deguang Kong. 2017. Context aware recommendation via bilinear tensor factorization. Technical Report.
[12] Deguang Kong and Lei Cen. 2017. Mobile App Risk Assessment using Multi-modal Learning. Technical Report.
[13] Deguang Kong, Lei Cen, et al. 2015. AUTOREB: Automatically Understanding the Review-to-Behavior Fidelity in Android Applications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, October 12-16, 2015. 530-541. https://doi.org/10.1145/2810103.2813689
[14] Deguang Kong and Chris H. Q. Ding. 2013. Efficient Algorithms for Selecting Features with Arbitrary Group Constraints via Group Lasso. 379-388. https://doi.org/10.1109/ICDM.2013.168
[15] Deguang Kong et al. 2015. Towards Permission Request Prediction on Mobile Apps via Structure Feature Learning. In Proceedings of the 2015 SIAM International Conference on Data Mining, Vancouver, BC, Canada, April 30 - May 2, 2015. 604-612. https://doi.org/10.1137/1.9781611974010.68
[16] Deguang Kong, Ryohei Fujimaki, Ji Liu, Feiping Nie, and Chris H. Q. Ding. 2014. Exclusive Feature Learning on Arbitrary Structures via ℓ1,2-norm. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada. 1655-1663. http://papers.nips.cc/paper/5631-exclusive-feature-learning-on-arbitrary-structures-via-ell_12-norm
[17] Deguang Kong, Ji Liu, Bo Liu, and Xuan Bao. 2016. Uncorrelated Group LASSO. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA.
[18] In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11-14, 2013. 1357-1365. https://doi.org/10.1145/2487575.2488219
[19] Deguang Kong and Guanhua Yan. 2014. Transductive malware label propagation: Find your lineage from your neighbors. 1411-1419. https://doi.org/10.1109/INFOCOM.2014.6848075
[20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 1097-1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[21] Bo Li, Yining Wang, Aarti Singh, and Yevgeniy Vorobeychik. 2016. Data Poisoning Attacks on Factorization-Based Collaborative Filtering. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 1885-1893. http://papers.nips.cc/paper/6142-data-poisoning-attacks-on-factorization-based-collaborative-filtering.pdf
[22] Dawei Li and Mooi Choo Chuah. 2015. EMOD: an efficient on-device mobile visual search system. In Proceedings of the 6th ACM Multimedia Systems Conference. ACM, 25-36.
[23] Dawei Li, Theodoros Salonidis, Nirmit V. Desai, and Mooi Choo Chuah. 2016. DeepCham: Collaborative Edge-Mediated Adaptive Deep Learning for Mobile Object Recognition. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 64-76.
[24] Dawei Li, Xiaolong Wang, and Deguang Kong. 2018. DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices. In AAAI.
[25] Jinyu Li, Li Deng, Yifan Gong, and Reinhold Haeb-Umbach. 2014. An Overview of Noise-robust Automatic Speech Recognition. IEEE/ACM Trans. Audio, Speech and Lang. Proc. 22, 4 (April 2014), 745-777. https://doi.org/10.1109/TASLP.2014.2304637
[26] Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, and Dan Jurafsky. 2016. Deep Reinforcement Learning for Dialogue Generation. CoRR abs/1606.01541 (2016). arXiv:1606.01541 http://arxiv.org/abs/1606.01541
[27] Ninghui Li, Min Lyu, Dong Su, and Weining Yang. 2016. Differential Privacy: From Theory to Practice. Morgan & Claypool Publishers. https://doi.org/10.2200/S00735ED1V01Y201609SPT018
[28] Percy Liang, Alexandre Bouchard-Côté, Dan Klein, and Ben Taskar. 2006. An End-to-end Discriminative Approach to Machine Translation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (ACL-44). Association for Computational Linguistics, Stroudsburg, PA, USA, 761-768. https://doi.org/10.3115/1220175.1220271
[29] Bin Liu, Deguang Kong, Lei Cen, Neil Zhenqiang Gong, et al. 2015. Personalized Mobile App Recommendation: Reconciling App Functionality and User Privacy Preference. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM 2015, Shanghai, China, February 2-6, 2015. 315-324. https://doi.org/10.1145/2684822.2685322
[30] Ryan Lowe, Nissan Pow, Iulian Serban, and Joelle Pineau. 2015. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems. CoRR abs/1506.08909 (2015). arXiv:1506.08909 http://arxiv.org/abs/1506.08909
[31] Daniel F. McCaffrey, Greg Ridgeway, and Andrew R. Morral. 2004. Propensity Score Estimation with Boosted Regression for Evaluating Causal Effects in Observational Studies. Psychological Methods.
In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States.
[34] Iulian Vlad Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Mudumba, Alexandre de Brébisson, Jose Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, and Yoshua Bengio. 2017. A Deep Reinforcement Learning Chatbot. CoRR abs/1709.02349 (2017). arXiv:1709.02349 http://arxiv.org/abs/1709.02349
[35] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going Deeper with Convolutions. In Computer Vision and Pattern Recognition (CVPR). http://arxiv.org/abs/1409.4842
[36] Paul Taylor. 2009. Text-to-Speech Synthesis (1st ed.). Cambridge University Press, New York, NY, USA.
[37] Lam Tran, Deguang Kong, and Ji Liu. 2016. Privacy-CNH: A Framework to Detect Photo Privacy with Convolutional Neural Network using Hierarchical Features. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA.
596-603. https://doi.org/10.1109/FG.2017.75
[40] Wei Yang, Deguang Kong, Tao Xie, and Carl A. Gunter. 2017. Malware Detection in Adversarial Settings: Exploiting Feature Evolutions and Confusions in Android Apps. In