Thesis Proposal: Algorithmic Social Intervention
Bryan Wilder
Department of Computer Science and Center for Artificial Intelligence in Society
University of Southern California
[email protected]
Abstract
Social and behavioral interventions are a critical tool for governments and communities to tackle deep-rooted societal challenges such as homelessness, disease, and poverty. However, real-world interventions are almost always plagued by limited resources and limited data, which creates a computational challenge: how can we use algorithmic techniques to enhance the targeting and delivery of social and behavioral interventions? The goal of my thesis is to provide a unified study of such questions, collectively considered under the name "algorithmic social intervention". This proposal introduces algorithmic social intervention as a distinct area with characteristic technical challenges, presents my published research in the context of these challenges, and outlines open problems for future work. A common technical theme is decision making under uncertainty: how can we find actions which will impact a social system in desirable ways under limitations of knowledge and resources? The primary application area for my work thus far is public health, e.g., HIV or tuberculosis prevention. For instance, I have developed a series of algorithms which optimize social network interventions for HIV prevention. Two of these algorithms have been pilot-tested in collaboration with LA-area service providers for homeless youth, with preliminary results showing substantial improvement over status-quo approaches. My work also spans other topics in infectious disease prevention and underlying algorithmic questions in robust and risk-aware submodular optimization.
Introduction

My research examines how techniques in artificial intelligence (including optimization, machine learning, game theory, and social network analysis) can be used to intervene in social and behavioral systems. Societies around the world face challenges of enormous scale: preventing and treating disease, confronting poverty, and a range of other issues impacting billions of people. In response, governments and communities deploy interventions addressing these problems (e.g., outreach campaigns to enroll patients in treatment or educational programs to raise awareness about preventative strategies). However, these interventions are always subject to limited resources and are deployed under considerable uncertainty about properties of the system; deciding manually on the best way to deploy an intervention is extremely difficult.

Motivated by such challenges, the goal of this thesis is to establish a set of algorithmic techniques which confront underlying challenges in the delivery of social and behavioral interventions (across both public health and other areas) and to field-test these techniques in socially impactful problem settings. We refer to this domain as algorithmic social intervention. Social intervention domains motivate a range of common technical challenges (see Figure 1 and Section 2 for more details). My published work spans all of these areas, though many interesting open problems remain. Specifically, I have studied information gathering [51, 52], optimization under uncertainty [55, 49, 50, 54, 53], and adaptive sequential decision making [55, 52]. Together with partners in social work and nonprofit
agencies, I have empirically evaluated two of the resulting algorithms, with pilot tests showing substantial improvements over status-quo techniques [58, 52].

Figure 1: Technical components of algorithmic social intervention and related publications.

Specifically, my research thus far has focused on algorithmic approaches to target and enhance interventions in public health settings. One line of work focuses on HIV prevention among homeless youth, where information about HIV is spread through the youths' social network. The challenge is selecting influential peer leaders who will be able to maximize the spread of the resulting diffusion. I have developed a set of algorithms for selecting peer leaders under uncertainty about the structure of the social network and how information propagates [56, 51, 52]. Two of these algorithms have been pilot-tested with LA-area drop-in centers serving homeless youth. These studies show substantial improvement over the status-quo heuristic used to select peer leaders [52, 58]. Another area is infectious disease prevention, where the challenge is to target limited intervention resources (e.g., outreach campaigns to improve treatment uptake) to the population groups which will have the largest impact on overall disease rates. I developed an algorithm to near-optimally target such interventions, with a particular focus on the problem of reducing tuberculosis spread in India [54]. In simulation, this algorithm averts over 8,000 cases of tuberculosis per year compared to the status-quo policy.

Underlying these applications are a number of fundamental technical challenges related to decision making under uncertainty. Endemic to public health interventions is a lack of information about the system: where the problems lie, how agents interact, and ultimately what outcome an intervention will have. Similar challenges arise in social and behavioral interventions across numerous contexts.
Much of my work formalizes underlying challenges in decision making under uncertainty which are motivated by such applications, develops algorithmic solutions, and proves theoretical guarantees on their performance. The ultimate objective is algorithms that come with both rigorous theoretical analysis and field-tested practical performance. Towards this end, I have developed algorithms for robust [49] and risk-averse [50] submodular optimization. Submodularity formalizes a natural diminishing returns property which occurs across many settings (including the HIV and tuberculosis prevention settings discussed above), making submodular optimization under uncertainty an important and natural algorithmic challenge.

The remainder of this proposal is organized as follows. Section 2 defines the area of algorithmic social intervention in greater detail and expands on the technical challenges common in such domains. Sections 3, 4, and 5 survey completed research, divided by focus area: Section 3 covers influence maximization in the field, Section 4 covers submodular optimization under uncertainty, and Section 5 covers infectious disease prevention. Lastly, Section 6 discusses proposed future work.

Algorithmic social intervention

The goal for this thesis is to establish a unified study of algorithmic social intervention: computational approaches for optimally targeting and enhancing social and behavioral interventions to achieve policy or community-level goals. The aim is to bridge algorithm design, optimization, and machine learning with practice, field deployments, and social impact. Relevant domains are often characterized by the following goals and challenges (though not all may be present in a single domain):

• Interventions are delivered in a preexisting social context composed of many agents with their own goals and behaviors. The interactions of these agents collectively produce the system's behavior.
• Agents' behaviors are not totally determined by the intervention: particular incentives, services, or rules may be introduced, but then agents make their own decisions in response.

• Agents are not perfectly rational, requiring the use of models and techniques from the social and behavioral sciences to describe behavior.

• There are many unknowns: the dynamics and interactions between agents are complex and are not fully specified by the available data.

• Applications often focus on vulnerable or underserved populations.

Figure 1 divides the underlying technical challenges of such domains into several stages; each stage also lists associated publications. The first stage is information gathering. Here, the challenge is to acquire the data needed to optimize the intervention in an efficient manner. For instance, in a social network intervention, it may be necessary to minimize the number of nodes who are surveyed to obtain edges. The second stage is optimization under uncertainty. Since the available data is rarely enough to fully specify the objective function, methods such as robust, stochastic, or risk-aware optimization are necessary. The third stage is adaptive sequential decision making. Once an intervention is in progress, the algorithmic system has the opportunity to interact with the world, observe the consequences of its decisions, and adjust accordingly. Lastly, a critical part of algorithmic social intervention is to evaluate field impact. While typical means of assessment (theoretical analysis, simulation experiments) are important tools, it is critical to validate the algorithm in a field experiment, ideally in comparison to alternate heuristics and algorithms.
Influence maximization in the field
Influence maximization is a crucial technique used in preventative health interventions, such as HIV prevention among homeless youth. Drop-in centers for homeless youth train a subset of youth as peer leaders who will disseminate information about HIV through their social networks. The challenge is to find a small set of peer leaders who will have the greatest possible influence. While many previous algorithms have been proposed for influence maximization [23, 13, 21, 44], none fully address the challenges of influence maximization in a field setting. Across public health (and other) settings, agencies will be uncertain about the structure of the social network and how information propagates. Accordingly, it is necessary to develop algorithms which gather only the most parsimonious amount of information required to locate influential seeds and incorporate remaining unknowns into the optimization process. Moreover, practical algorithms must also handle the contingencies of real-world deployments; for instance, youth invited to attend an intervention may simply fail to show up. This line of work develops a series of influence maximization algorithms which address such challenges.
The DOSIM algorithm, developed in [56], performs robust optimization under uncertainty about how influence spreads through the social network. We first formalize the influence maximization problem as follows. The youth have a social network represented as a graph G = (V, E). Each youth is initially inactive, meaning that they have not received information about HIV prevention. Once nodes are activated by the intervention, they have a chance to influence their peers. We model this process through a variant of the classical independent cascade model (ICM) which has been used by previous work on HIV prevention and better reflects realistic time dynamics [57, 55, 58]. The process unfolds over discrete time steps t = 1…T, where T is a time horizon. There is a propagation probability p_e for each edge e. When a node becomes active, it attempts to activate each of its neighbors. Each attempt succeeds independently with probability p_e. Activation attempts are made at each time step until either the neighbor is influenced or the time horizon is reached. The objective is to select a set of K seed nodes at each time step t so that the expected total influence spread is maximized.

The key challenge is that the propagation probabilities p_e are not known. We model this as a zero-sum game between the influencer, who selects the seed nodes, and an adversary (nature) who selects the true p_e. The goal of the influencer is to find a strategy which performs near-optimally regardless of the unknown parameters; that is, their payoff in the game is the ratio of the expected influence spread resulting from their chosen seed set to the optimal influence spread achievable if the true parameters were known in advance.

The algorithmic challenge is to compute equilibria in this game. However, it is not apparent how to do so, since both players have extremely large strategy spaces.
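As a concrete illustration, the repeated-attempt cascade variant described above can be simulated directly. The following is a minimal sketch (function and variable names are my own, not from the paper); a node that becomes active re-attempts to influence each inactive neighbor at every remaining time step:

```python
import random

def simulate_spread(neighbors, edge_prob, seeds, T, rng=None):
    """One run of the repeated-attempt ICM variant: an active node attempts to
    activate each inactive neighbor at every time step until the horizon T."""
    rng = rng or random.Random(0)
    active = set(seeds)
    for _ in range(T):
        newly = set()
        for u in list(active):
            for v in neighbors[u]:
                if v not in active and v not in newly:
                    # each attempt succeeds independently with probability p_e
                    if rng.random() < edge_prob[(u, v)]:
                        newly.add(v)
        active |= newly
    return active

def expected_spread(neighbors, edge_prob, seeds, T, trials=1000, seed=0):
    """Monte Carlo estimate of the expected total influence spread."""
    rng = random.Random(seed)
    return sum(len(simulate_spread(neighbors, edge_prob, seeds, T, rng))
               for _ in range(trials)) / trials
```

For instance, with all propagation probabilities equal to 1, a seed at one end of a three-node path reaches the whole path within two time steps.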
In particular, nature has an infinite strategy space consisting of all (continuous) choices for the unknown parameters subject to interval uncertainty, while the influencer can choose from all possible subsets of seed nodes. In order to resolve this dilemma, two key technical approaches are used in [55]. First, to handle nature's infinite strategy space, it is proved that the strategy space may be discretized to a polynomial number of points with only arbitrarily small loss. Second, to handle the influencer's exponentially large number of actions, a double oracle approach is employed. Double oracle is an approach for solving large zero-sum games which incrementally builds an equilibrium starting from a small number of strategies. The algorithm proceeds over a series of iterations. At the first iteration, each player is restricted to a small number of pure strategies. We compute a minimax equilibrium in this restricted game (e.g., via linear programming) and then find each player's (approximate) best response to the current mixed strategy of their opponent. This best response is added to the player's current strategy set, and the algorithm continues to iterate. Convergence to an equilibrium is guaranteed when the best response of each player is already contained in their current strategy set. In [55], we show in simulation that DOSIM results in substantially more robust solutions than simply planning based on a set of nominal parameters. In Section 3.2, we also give field results from [58] showing that DOSIM is empirically successful at finding high-quality seed sets in a real-world pilot study.

Previous algorithms for influence maximization assume that the social network is given explicitly as input. However, in many real-world domains, the network is not initially known and must be gathered via laborious field observations.
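The double oracle loop just described can be sketched for a small zero-sum matrix game. This is a hedged illustration rather than the paper's implementation: it approximates the restricted-game equilibrium with fictitious play instead of the linear program mentioned above, and the best-response oracles are supplied by the caller:

```python
def fictitious_play(payoff, rounds=2000):
    """Approximate a minimax equilibrium of a zero-sum matrix game.
    payoff[i][j] is the payoff to the row (maximizing) player.
    Returns empirical mixed strategies (row, col) as probability lists."""
    m, n = len(payoff), len(payoff[0])
    row_counts, col_counts = [1] + [0] * (m - 1), [1] + [0] * (n - 1)
    for _ in range(rounds):
        # each player best-responds to the opponent's empirical mixture
        br_row = max(range(m), key=lambda i: sum(payoff[i][j] * col_counts[j] for j in range(n)))
        br_col = min(range(n), key=lambda j: sum(payoff[i][j] * row_counts[i] for i in range(m)))
        row_counts[br_row] += 1
        col_counts[br_col] += 1
    return ([c / sum(row_counts) for c in row_counts],
            [c / sum(col_counts) for c in col_counts])

def double_oracle(row_init, col_init, payoff_fn, row_br, col_br, max_iter=50):
    """Incrementally grow each player's strategy set: solve the restricted game,
    add each player's best response, and stop when nothing new is added."""
    R, C = [row_init], [col_init]
    for _ in range(max_iter):
        M = [[payoff_fn(r, c) for c in C] for r in R]
        p, q = fictitious_play(M)
        new_r, new_c = row_br(q, C), col_br(p, R)
        converged = True
        if new_r not in R:
            R.append(new_r); converged = False
        if new_c not in C:
            C.append(new_c); converged = False
        if converged:
            break
    return R, C, p, q
```

On matching pennies, the loop starts from a single strategy per player, discovers both pure strategies, and converges to the near-uniform equilibrium.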
For example, collecting network data from vulnerable populations such as homeless youth, while crucial for health interventions, requires significant time spent gathering field observations [36]. Social media data is often unavailable when access to technology is limited, for instance in developing countries or with vulnerable populations. Even when such data is available, it often includes many weak links which are not effective at spreading influence [6]. For instance, a person may have hundreds of Facebook friends whom they barely know. In principle, the entire network could be reconstructed via surveys, and then existing influence maximization algorithms applied. However, exhaustive surveys are very labor-intensive and often considered impractical [45]. For influence maximization to be relevant to many real-world problems, it must contend with limited information about the network, not just limited computation.

The major informational restriction is the number of nodes which may be surveyed to explore the network. Thus, a key question is: how can we find influential nodes with a small number of queries?
We formalize this problem as exploratory influence maximization and seek a principled algorithmic solution, i.e., an algorithm which makes a small number of queries and returns a set of seed nodes which are approximately as influential as the globally optimal seed set. Each query targets a given node in the graph and reveals all of that node's edges. At each step, the algorithm may either query the neighbor of a previously queried node, or query a uniformly random node in the graph. Existing field work uses heuristics, such as sampling some percentage of the nodes and asking them to nominate influencers [45]. To our knowledge, no previous work directly addresses this question from an algorithmic perspective.

We show that for general graphs, any algorithm for exploratory influence maximization may perform arbitrarily badly unless it examines almost the entire network. However, real-world networks often have strong community structure, where nodes form tightly connected subgroups which are only weakly connected to the rest of the network [30]. Consequently, influence mostly propagates locally. Community structure has been used to develop computationally efficient influence maximization algorithms [47, 14]. Here, we use it to design a highly information-efficient algorithm. We make four contributions.
First, we introduce exploratory influence maximization and show that it is intractable for general graphs. Second, we present the ARISEN algorithm, which exploits community structure to find influential nodes. Third, we show that ARISEN has strong empirical performance on an array of real-world social networks.
Fourth, we formally analyze ARISEN on graphs drawn from the Stochastic Block Model (SBM) [16], a widely studied model of community structure. We prove that it approximates the optimal influence achievable if the entire network were known, while querying only a polylogarithmic number of nodes in the network size.

We give the main idea behind the algorithm here and defer a formal description to the main paper [51]. In a graph with community structure, a reasonable strategy is to try to select one seed node from each community (or the K largest communities if we have a budget of K seed nodes). The rationale is that we expect influence to propagate widely within a given community but only to a limited extent between communities. So, multiple seed nodes within a given community would be redundant compared to seeding another community entirely. The goal of the ARISEN algorithm is to use a small number of queries to choose seed nodes which are likely to lie in the K largest communities. The underlying approach is as follows. First, we sample a set of prospective seed nodes uniformly at random. Then, we use queries to simulate a random walk around each prospective seed; it can be shown that this random walk will stay (with high probability) within the starting community. The nodes encountered on this random walk are used to estimate the average degree of the community, which is in turn used to estimate the community's size. Using these estimates, ARISEN constructs a probability distribution over the prospective seeds and samples each actual seed independently at random from this distribution. The main challenge is that we cannot in general tell whether two prospective seed nodes lie in the same community, and so we must construct a distribution which implicitly leads to seed nodes in different communities being selected. The number of times a given community is sampled as a prospective seed is proportional to that community's size.
Hence, ARISEN's probability distribution assigns each prospective seed node weight inversely proportional to its community's estimated size. This evens out the sampling bias towards large communities and ensures that (in expectation) each of the largest K communities is seeded exactly once.

We analyze ARISEN theoretically on graphs which are drawn from the Stochastic Block Model (SBM), a common model of community structure. The SBM originated in sociology [16] and lately has been intensively studied in computer science and statistics (see e.g. [1, 27, 33]). In the SBM, the network is partitioned into disjoint communities C_1, …, C_L. Each within-community edge is present independently with probability p_w and each between-community edge is present independently with probability p_b. Recall that the Erdős–Rényi random graph G(n, p) is the graph on n nodes where every edge is independently present with probability p. In the SBM, community C_i is internally drawn as G(|C_i|, p_w), with additional random edges to other communities. While the SBM is a simplified model, our experimental results show that ARISEN also performs well on real-world graphs. ARISEN takes as input the parameters n, p_w, and p_b, but is not given any prior information about the realized draw of the network. It is reasonable to assume that the model parameters are known since they can be estimated using existing network data from a similar population (in our experiments, we show that this approach works well). For instance, in HIV prevention, homeless youth social networks have been shown to exhibit community structure and several studies have gathered networks from which to infer p_w and p_b [57, 36].

We state here a simplified version of our main theoretical result which captures the intuition. Suppose that the top K communities each have equal size µ and occupy a linear portion of the network, i.e., µK = Ω(n). We have

Theorem 1 (Simplified case).
Under the above conditions, ARISEN can be implemented with approximation ratio (1 − 1/e − O(1/K) − o(1)) β(µ) using O(log n) queries.

Here, β(µ) is a constant which depends on p_w and q, where q is the edge propagation probability. We defer a detailed explanation to the paper [51] and just note here that β(µ) is the fraction of nodes contained in the giant connected component of an Erdős–Rényi random graph G(µ, p_w · q). The query cost is chosen so that the random walk based estimates of each community's size are accurate with high probability. We emphasize that only a polylogarithmic number of nodes need be queried, an exponential improvement over exhaustive surveys. The first term in the approximation ratio is nearly 1 − 1/e, up to error terms which decrease as n and K become large. We show that each of the top K communities is seeded with probability close to 1 − 1/e. The second term, β(µ), is the fraction of each of the top K communities which can be influenced by a seed node.

In the full paper, simulation results bear out the main conclusion: ARISEN is able to find influential seed nodes with a small number of queries. Experimentally, ARISEN outperforms a range of heuristics and is often able to closely approximate the optimal influence spread while querying 15-20% of the network.

Thus far, algorithms for influence maximization in the field have a high barrier to entry: they require a great deal of time to gather the complete social network of the youth, expertise to select appropriate parameters, and computational power to run the algorithms. None of these are likely available to resource-strained service providers who will ultimately be the ones to deploy influence maximization. This paper [52] presents CHANGE, a novel system for influence maximization designed to ameliorate these difficulties. CHANGE draws on the insights used to develop DOSIM and ARISEN, but is tailored to the constraints of a field deployment in public health settings.
Specifically, CHANGE is designed to avoid DOSIM's high computational cost and circumvent some practical difficulties in deploying ARISEN. In particular, it is difficult to use ARISEN's random walk based procedure with homeless youth, because it is often infeasible to locate a sequence of youth at the agency (youth may not be at the agency that day or may be otherwise unreachable). Hence, CHANGE should be thought of as a streamlined, field-ready system which draws on a series of insights into influence maximization among homeless youth but which lacks theoretical guarantees for some components of the system.

CHANGE is easy to deploy, but this simplicity is crucially enabled by a series of insights into the social structure of homeless youth (which may be useful for other vulnerable populations). We conducted a pilot test of CHANGE's performance in a real deployment by a drop-in center serving homeless youth in a major U.S. city. CHANGE was used to plan a series of interventions designed to spread HIV awareness among the youth.
CHANGE obtained comparable influence spread to state-of-the-art algorithms while surveying only 18% of nodes for network data, a finding which is backed by additional simulation results. Overall, CHANGE offers a practical, field-tested vehicle for deployed influence maximization which drastically lowers the barrier to entry.
To our knowledge, this is the first real-world pilot study of a network sampling algorithm for influence maximization and only the second-ever field test of any influence maximization algorithm.
Overview of algorithmic contributions:
We now summarize how CHANGE handles the challenges above. A diagram of the agent can be found in Figure 2.

First, to address the data gathering challenge, we present an easily deployable sampling protocol which randomly selects a small set of youth to interview. For each of these youth, a randomly chosen neighbor is also interviewed. We show that this procedure gathers enough of the network to enable high-quality influence maximization even though it surveys only a small number of nodes directly.

Second, to address the computational power challenge (which in turn stems from unknown parameters), we present a heuristic for selecting influence maximization solutions which are robust to uncertainty in the probability p that influence will spread. We show that this heuristic finds solutions which obtain approximately 90% of the maximum possible influence spread under any value of p. Importantly, this heuristic runs in minutes on a laptop, while DOSIM (the previously proposed algorithm for this problem) requires hours or even days of time on a high-performance computing cluster.
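The sampling protocol from the first contribution (interview a random subset of youth plus one randomly chosen neighbor of each) can be sketched as follows; the function name and interface are hypothetical:

```python
import random

def sample_network(nodes, neighbors, budget, rng=None):
    """Interview a random subset of youth plus one random neighbor of each,
    up to a fixed interview budget; each interview reveals that node's edges."""
    rng = rng or random.Random(0)
    interviewed = set()
    order = list(nodes)
    rng.shuffle(order)
    for u in order:
        if len(interviewed) >= budget:
            break
        if u in interviewed:
            continue
        interviewed.add(u)
        if neighbors[u] and len(interviewed) < budget:
            interviewed.add(rng.choice(neighbors[u]))
    # the observed network: every edge incident to an interviewed node
    edges = {(u, v) for u in interviewed for v in neighbors[u]}
    return interviewed, edges
```

The pairing with a random neighbor is what lets the procedure reach beyond the uniformly sampled nodes while keeping the number of interviews small.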
Figure 2: The CHANGE agent.

Third, we integrate these components with an adaptive greedy algorithm for planning interventions and prove the first theoretical guarantee for influence maximization under execution errors. The challenge is that some youth selected as peer leaders may not attend the intervention [55, 58]. Our algorithm selects its action with such uncertainties in mind, observes which youth do attend, and then plans the next round using this observation. We prove that it obtains a constant-factor approximation to the optimal adaptive policy.

A detailed presentation of the CHANGE agent can be found in the full paper [52]. The next section presents field results for both DOSIM and CHANGE from pilot studies carried out in collaboration with LA-area drop-in centers serving homeless youth.
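A minimal sketch of the adaptive loop just described, under the simplifying assumptions that each invited youth attends independently with a fixed probability and that no-shows are not re-invited (the actual CHANGE agent is more sophisticated; all names here are illustrative):

```python
import random

def adaptive_invite(candidates, f, rounds, invites_per_round, attend_prob, rng=None):
    """Each round, greedily invite the nodes with the largest marginal gain to a
    monotone objective f (e.g., estimated influence spread of the attendee set),
    observe who actually attends, and replan the next round accordingly."""
    rng = rng or random.Random(0)
    attended, remaining = set(), set(candidates)
    for _ in range(rounds):
        invited = set()
        for _ in range(invites_per_round):
            pool = remaining - invited
            if not pool:
                break
            base = f(attended | invited)
            invited.add(max(pool, key=lambda v: f(attended | invited | {v}) - base))
        # execution errors: each invited youth shows up only with some probability
        attended |= {v for v in invited if rng.random() < attend_prob}
        remaining -= invited  # simplification: no-shows are not re-invited
    return attended
```

The point of the adaptive structure is that the second round's invitations depend on the realized attendance in the first round, rather than being fixed in advance.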
In the pilot tests, trained social workers delivered the Have You Heard intervention, previously published in the public health literature [36]. The social workers conducted a day-long class with the selected youth, covering HIV awareness and prevention, and training the youth as peer leaders to communicate with other youth at the agency. Four pilot tests have been conducted so far, each using a different algorithm to select the peer leaders. Each pilot test used a distinct population with its own social network. The four algorithms were DOSIM and CHANGE, introduced above, along with HEALER [57] (a previously developed algorithm for the problem) and degree centrality (DC). DC is the status-quo heuristic used by agencies, and simply picks the highest-degree nodes to seed.

CHANGE first queried a subset of 18% of nodes for network data, while DOSIM, HEALER, and DC received the full network in advance. Three sets of peer leaders were selected by each algorithm, with approximately 4 peer leaders in each set. Peer leaders were paid $60. One month after the start of the study, we conducted a follow-up survey with all of the youth who initially enrolled. We asked the youth whether they had received information about HIV prevention from a peer who was part of the study. Youth were paid $20 to respond to the follow-up survey. The field results are reported across two separate papers ([58] for HEALER, DOSIM, and DC, and [52] for CHANGE), but both studies used identical protocols. 60-70 participants were recruited for each study.

Figure 3 presents the main result: the amount of influence spread generated by each algorithm. Specifically, we used the follow-up survey to examine the percentage of youth who were not peer leaders who reported that they received information about HIV prevention.
[Figure 3 bar values: CHANGE 80, HEALER 70, DOSIM 71, DC 27 (% of non-peer leaders reached).]
Figure 3: Percentage of non-peer leaders who reported receiving information about HIV in the pilot study corresponding to each algorithm.

We see that the AI-based algorithms (CHANGE, HEALER, DOSIM) do fairly well, reaching 70-80% of non-peer leaders. However, DC performs poorly, reaching only 27% of non-peer leaders. While these results are preliminary, they show that there is promise in using algorithmic techniques to enhance influence maximization interventions. We also note that CHANGE performs just as well as HEALER and DOSIM despite querying only 18% of the network for links. We do not claim that CHANGE actually outperforms the other two algorithms (the difference could be caused by small sample sizes or other external factors); however, the close results indicate that it may be possible to find influential nodes using only a limited amount of network data. More detailed analysis of these results can be found in [58, 52], where we examine the robustness of these results through simulation, give more detailed explanations for the differences between algorithms, and formulate general insights and lessons learned about influence maximization in a field setting.
Submodular optimization under uncertainty

Inspired by the challenges of influence maximization with limited data, this section considers the general problem of optimizing a monotone submodular function under uncertainty about the true objective. We study the problem from two perspectives (presented in [49] and [50], respectively). First, robust optimization, where the goal is to maximize the worst case over a set of possible objectives. Second, risk-averse optimization, where we aim to avoid disastrous outcomes instead of simply maximizing expected utility. In both cases, we substantially improve the existing state of the art, claims that are borne out both by theoretical guarantees and experimental results.
Let X be a set of items with |X| = n. A function f : 2^X → R is submodular if for any A ⊆ B and i ∈ X \ B, f(A ∪ {i}) − f(A) ≥ f(B ∪ {i}) − f(B). We restrict our attention to functions that are monotone, i.e., f(A ∪ {i}) − f(A) ≥ 0 for all i ∈ X, A ⊆ X. Without loss of generality, we assume that f(∅) = 0 and hence f(S) ≥ 0 for all S. Let I be a collection of subsets of X. For instance, we could have I = {S ⊆ X : |S| ≤ k}. In general, we will allow I to be any matroid. The objective is to find a utility-maximizing element of I.

We consider the robust optimization setting where the true objective to be optimized is not known exactly. Instead, it belongs to an uncertainty set which gives the set of possibilities consistent with prior knowledge. Let F = {f_1, …, f_m} be a finite set of submodular functions on the ground set X. We are promised that the true objective belongs to F but do not know which element of F it is. Accordingly, we aim to maximize the minimum value, max_{S ∈ I} min_{f_i ∈ F} f_i(S). The total number of objective functions m may be very large, potentially exponentially large in the size of the ground set n.

Since the robust submodular optimization problem is in general inapproximable [25], we consider a common relaxation of it to a zero-sum game [26, 12]. We would like to find a minimax equilibrium of the game where the maximizing player's pure strategies are the subsets in I, and the minimizing player's pure strategies are the functions in F. The payoff to the strategies S ∈ I and f_i ∈ F is f_i(S). We call a game in this form a submodular best response (SBR) game. For the maximizing player, computing the minimax equilibrium is equivalent to solving

max_{p ∈ ∆(I)} min_{f ∈ F} E_{S ∼ p}[f(S)]     (1)

where ∆(I) is the set of all distributions over the elements of I. Oftentimes, we will work with independent distributions over X, which can be fully specified by a vector x ∈ R^n_+, where x_i gives the marginal probability that item i is chosen.
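To make the notation concrete, here is a minimal sketch of the standard greedy algorithm for the cardinality-constrained case, together with evaluation of the robust objective min_{f_i ∈ F} f_i(S) for a fixed set. Names are illustrative; recall that directly optimizing the robust objective is inapproximable in general, so this only illustrates the ingredients:

```python
def greedy_max(ground, f, k):
    """Classic greedy for max f(S) s.t. |S| <= k; achieves a (1 - 1/e)
    approximation when f is monotone submodular (Nemhauser et al. 1978)."""
    S = set()
    for _ in range(k):
        candidates = [v for v in ground if v not in S]
        if not candidates:
            break
        best = max(candidates, key=lambda v: f(S | {v}) - f(S))
        if f(S | {best}) - f(S) <= 0:
            break
        S.add(best)
    return S

def worst_case(S, funcs):
    """The robust objective value min_{f_i in F} f_i(S) of a fixed set S."""
    return min(f(S) for f in funcs)
```

A coverage function (the number of distinct elements covered by the chosen items) is a standard example of a monotone submodular objective for exercising this code.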
Denote by p_x^I the independent distribution with marginals x.

The equilibrium computation problem has been studied by Krause et al. [26] and Chen et al. [12] using very similar techniques: both iterate dynamics where the adversary plays a no-regret learning algorithm and the decision maker plays a greedy best response. This algorithm maintains a variable for every function in F and so is only computationally tractable when F is small. By contrast, we deal with the setting where F is exponentially large, with the objective function arising from an underlying combinatorial structure. In [49], we explore two applications falling into this setting: a robust budget allocation problem, and security games played on networks. In both cases, our framework leads to the first sub-exponential time algorithm for the problem. Here, we just state the main algorithmic result.

We solve Problem 1 under the assumption that there is a best response oracle available for the adversary, which computes the minimizing function for a given distribution of the maximizing player. In fact, we require only a weaker oracle, which we call a best response to independent distributions (BRI) oracle. A BRI oracle is only required to compute a best response to mixed strategies which are independent distributions, represented by the marginal probability that each item in X appears. Given a vector x ∈ R^n_+, where x_i is the probability that element i ∈ X is chosen, a BRI oracle computes argmin_{f_i ∈ F} E_{S ∼ p_x^I}[f_i(S)]. We use S ∼ x to denote that S is drawn from the independent distribution with marginals x. In some domains (e.g., network security games), a BRI oracle is readily available even when the full best response is NP-hard.

Our main technical contribution is the EQUATOR algorithm, which computes a (1 − 1/e)-approximation to Problem 1, modulo an additive loss of ε. Crucially, EQUATOR makes only polynomially many calls to the BRI oracle, with no direct runtime dependence on |F|. Specifically, EQUATOR takes time polynomial in n, 1/ε, and M, where M is an upper bound on the value of any single item (M ≥ max_{f_i ∈ F, j ∈ X} f_i({j})). In general, this results in a pseudopolynomial time algorithm (since there is polynomial dependence on M), though M is constant in many cases of interest.
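The quantities above can be illustrated with Monte Carlo estimation: E_{S ∼ p_x^I}[f(S)] is estimated by sampling sets from the independent distribution, and a brute-force BRI oracle simply minimizes this estimate over F. This sketch enumerates F explicitly, which is exactly what a domain-specific BRI oracle avoids; names are mine:

```python
import random

def sample_set(x, rng):
    """Draw S from the independent distribution p_x^I with marginals x."""
    return {i for i, prob in x.items() if rng.random() < prob}

def expected_value(f, x, trials=2000, seed=0):
    """Monte Carlo estimate of E_{S ~ p_x^I}[f(S)]."""
    rng = random.Random(seed)
    return sum(f(sample_set(x, rng)) for _ in range(trials)) / trials

def bri_brute_force(funcs, x, trials=2000):
    """A brute-force BRI oracle: argmin_{f_i in F} E_{S ~ p_x^I}[f_i(S)].
    Only meaningful for small F; real BRI oracles exploit domain structure."""
    vals = [expected_value(f, x, trials, seed=i) for i, f in enumerate(funcs)]
    return min(range(len(funcs)), key=vals.__getitem__)
```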
Specifically, EQUATOR takes time polynomial in n, 1/ε, and M, where M is an upper bound on the value of any single item (M ≥ max_{f_i ∈ F, j ∈ X} f_i({j})). In general, this results in a pseudopolynomial time algorithm (since there is polynomial dependence on M), though M is constant in many cases of interest.

[Figure 4: Experimental results for budget allocation. Panels (a) and (b) plot worst-case profit and runtime as n grows; panels (c) and (d) plot worst-case profit as |L| grows and the corresponding runtime, comparing EQUATOR, DO, and Greedy.]

Since the pure strategy sets can be exponentially large, it is unclear what it even means to compute an equilibrium: representing a mixed strategy may require exponential space. Our solution to this dilemma is to show how to efficiently sample pure strategies from an approximate equilibrium mixed strategy. This suffices for the maximizing player to implement their strategy. Alternatively, we can build an approximate mixed strategy with sparse support by drawing a polynomial number of samples and outputting the uniform distribution over the samples. In order to generate these samples, EQUATOR first solves a continuous optimization problem. This continuous relaxation uses the multilinear extensions of the functions in F (we refer the reader to [9] for more details on the multilinear extension). Essentially, the multilinear extension of a submodular function f defines a continuous function over the hypercube [0, 1]^n which agrees with f at the vertices. EQUATOR optimizes the pointwise minimum of the multilinear extensions of the functions in F and then uses known techniques (see [11]) to round the resulting fractional point to a distribution over integral sets. This continuous optimization problem is non-convex and nonsmooth. We design a novel stochastic Frank-Wolfe algorithm which obtains a (1 − 1/e)-approximation to the continuous problem. After the rounding step, we have the following guarantee:

Theorem 2.
EQUATOR outputs a set S ∈ I such that min_i E[f_i(S)] ≥ (1 − 1/e)OPT − ε with probability at least 1 − δ. Its runtime is Õ(poly(n, k, M, 1/ε, log(1/δ)) · (T_1 + T_2)), where T_1 is the time to perform linear optimization over the convex hull of I and T_2 is the time to compute a gradient; the Õ notation hides logarithmic terms, and the exact polynomial is given in [49].

We remark that T_1 is small (O(n log n)) in cases of interest such as the k-cardinality constraint, while T_2 is typically dominated by the runtime of the BRI. This theoretical result substantially improves over the current state of the art; no-regret learning based algorithms proposed for this problem [26, 12] work only when F is small, while the "double oracle" algorithms often used in practice [18, 19] may take exponential runtime in the worst case.

Figure 4 shows experimental results for a robust budget allocation problem. Budget allocation models an advertiser's choice of how to divide a finite budget B between a set of advertising channels [41, 42, 3]. Each channel is a vertex on the left hand side L of a bipartite graph. The right hand side R consists of customers. Each customer v ∈ R has a value w_v, the advertiser's expected profit from reaching v. In the robust problem, the profits w are not known exactly, instead belonging to an uncertainty set (e.g., based on historical data).

We compare EQUATOR to the state-of-the-art double oracle algorithm [8, 18] (DO), which computes a (1 − 1/e)-approximate solution but takes exponential time in the worst case. We also compare to a greedy baseline. Figure 4(a) plots the worst-case profit when w is chosen as the worst case in the uncertainty set, with n increasing on the x axis. Figure 4(b) plots the average runtime for each n. We see that double oracle produces highly robust solutions. However, even for n = 500, its execution was halted after 10 hours. Greedy is highly scalable, but produces solutions that are approximately 40% less robust than double oracle.
EQUATOR produces solution quality within 7% of double oracle and runs in less than 30 seconds with n = 1000. In Figure 4(c), we see that both double oracle and EQUATOR find highly robust solutions, with EQUATOR's solution value within 8% of that of double oracle. By contrast, greedy obtains no profit in the worst case for |L| > 20, validating the importance of robust solutions on real problems. In Figure 4(d), we observe that double oracle was terminated after 10 hours for n = 500, while EQUATOR scales to n = 1000 in under 40 seconds. We conclude that EQUATOR is empirically successful at finding highly robust solutions in an efficient manner, complementing its theoretical guarantees.

Decision-making under uncertainty is a ubiquitous problem. Suppose we want to maximize a function F(x, y), where x is a vector of decision variables and y a random variable drawn from a distribution D. A natural approach is to maximize E_y[F(x, y)], i.e., to maximize the expected value of the chosen decision. However, decision makers are often risk-averse: they would rather minimize the chance of having a very low reward than focus purely on the average. This is rational behavior when failure can have large consequences. For instance, if a corporation suffers a disastrous loss, it may simply go out of business. In many other cases, low performance entails safety issues. For instance, if a sensor network for water contamination detects problems instantly in 80% of cases, but fails entirely in the other 20%, the population will inevitably be exposed to an unacceptable health risk. It is much better to have a sensor network which always detects contaminants, even if it requires somewhat more time on average.

Hence, it is natural to move beyond average-case analysis and optimize a risk-aware objective function. One widespread choice is the conditional value at risk (CVaR). CVaR takes a tunable parameter α. Roughly, it measures the performance of a decision in the worst α fraction of scenarios. It is known that when the objective F is concave, CVaR can be optimized via a concave program as well. However, many natural objective functions are not concave, and no general algorithms are known for nonconcave functions.
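To make the "worst α fraction" idea concrete, the empirical VaR and CVaR of a finite sample can be computed directly. A small sketch with made-up values; lower values are worse, matching the maximization setting here:

```python
# Empirical VaR and CVaR at level alpha: VaR is the alpha-quantile of the
# outcomes and CVaR averages the worst alpha-fraction of them.

def var_cvar(values, alpha):
    values = sorted(values)                 # ascending: worst outcomes first
    k = max(1, int(round(alpha * len(values))))
    tail = values[:k]                       # the worst alpha-fraction
    return tail[-1], sum(tail) / len(tail)  # (VaR_alpha, CVaR_alpha)

values = [10, 0, 8, 9, 7, 1, 6, 5, 4, 3]   # hypothetical rewards
var, cvar = var_cvar(values, alpha=0.2)
```

With α = 0.2 the two worst outcomes (0 and 1) determine the risk measures, even though the average reward is much higher.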
We focus on submodular functions. Submodularity captures diminishing returns and appears in application domains ranging from viral marketing [23] to machine learning [28] to auction theory [46]. We analyze submodular functions in two settings.

Continuous:
Continuous submodularity, which has lately received increasing attention [4, 5, 42], generalizes the notion of a submodular set function to continuous domains. Many well-known discrete problems (e.g., sensor placement, influence maximization, or facility location) admit natural extensions where resources are divided in a continuous manner. Continuous submodular functions have also been extensively studied in economics as a model of diminishing returns or strategic substitutes [24, 40]. Our main result is a (1 − 1/e)-approximation algorithm for maximizing the CVaR of any monotone, continuous submodular function. No algorithm was previously known for this problem.

Portfolio of discrete sets:
Our results for continuous submodular functions also transfer to set functions. We study a setting where the algorithm can select a distribution over feasible sets, which is of interest when the aim is to select a portfolio of sets to hedge against risk [35]. This is similar to the relaxation used in the robust setting studied above. We give a black-box reduction from the discrete portfolio problem to CVaR optimization of continuous submodular functions, allowing us to apply our algorithm for the continuous problem. The state of the art for the discrete portfolio setting is an algorithm by Ohsaka and Yoshida [35] for CVaR influence maximization. Our results are stronger in two ways: (i) they apply to any submodular function and (ii) they give a stronger approximation guarantee. Allowing the algorithm to select a convex combination of sets is provably necessary: Maehara [32] proved that, restricted to single sets, it is NP-hard to compute any multiplicative approximation to the CVaR of a submodular set function.

In this overview, we focus on the continuous setting; details on the reduction from the discrete portfolio problem to continuous submodular optimization can be found in the full paper [50]. Our main contribution is the RASCAL algorithm, which computes a (1 − 1/e)-approximation to optimizing the CVaR of a smooth, continuous submodular function (up to an additive loss of ε). RASCAL jointly exploits properties of both submodularity and the CVaR to provably approximate the non-concave maximization problem. We start out by formalizing the problem.

Continuous submodularity:
Let X = ∏_{i=1}^n X_i be a subset of R^n, where each X_i is a compact subset of R. A twice-differentiable function F : X → R is diminishing returns submodular (DR-submodular) if for all x ∈ X and all i, j = 1...n, ∂²F(x)/∂x_i∂x_j ≤ 0. Intuitively, the gradient of F only shrinks as x grows, just as the marginal gains of a submodular set function only decrease as items are added. Continuous submodular functions need not be convex or concave (concavity requires that the Hessian is negative semi-definite, not that the individual entries are nonpositive). We consider monotone functions, where F(x) ≤ F(y) for all x ⪯ y (⪯ denotes element-wise inequality). We assume that F lies in [0, M] for some constant M. Without loss of generality, we assume F(0) = 0 (normalization).

In our setting, F is a function of both the decision variables x and a random parameter y. Specifically, we consider functions F(x, y) where F(·, y) is continuous submodular in x for each fixed y. We allow any DR-submodular F which satisfies some standard smoothness conditions. First, we assume that F is L_1-Lipschitz for some constant L_1 (for concreteness, with respect to the ℓ_2 norm; our arguments easily generalize to any ℓ_p norm). Second, we assume that F is twice differentiable with L_2-Lipschitz gradient. Third, we assume that F has bounded gradients, ||∇F|| ≤ G. Only the last condition is strictly necessary; our approach can be extended to any F with bounded gradients via known techniques [15].

Conditional value at risk:
Intuitively, the CVaR measures performance in the worst α fraction of cases. First, we define the value at risk at level α ∈ (0, 1]:

VaR_α(x) = inf{τ ∈ R : Pr_y[F(x, y) ≤ τ] ≥ α}.

That is, VaR_α(x) is the α-quantile of the random variable F(x, y). CVaR is the expectation of F(x, y), conditioned on it falling into this set of α-worst cases:

CVaR_α(x) = E_y[F(x, y) | F(x, y) ≤ VaR_α(x)].

CVaR is a more popular risk measure than VaR both because it counts the impact of the entire α-tail of the distribution and because it has better mathematical properties [38].

Optimization problem:
We consider the problem of maximizing CVaR_α(x) over x belonging to some feasible set P. We allow P to be any downward closed polytope. A polytope P is downward closed if there is a point ℓ such that x ⪰ ℓ for all x ∈ P, and for any y ∈ P, ℓ ⪯ x ⪯ y implies that x ∈ P. Without loss of generality, we assume that P is entirely nonnegative with ℓ = 0. Otherwise, we can define the translated set P′ = {x − ℓ : x ∈ P} and corresponding function F′(x, y) = F(x + ℓ, y). Let d = max_{x,y ∈ P} ||x − y|| be the diameter of P.

We want to solve the problem max_{x ∈ P} CVaR_α(x). It is important to note that CVaR_α(x) need not be a smooth DR-submodular function of x. However, we would like to leverage the nice properties of the underlying F. Towards this end, we note that the above problem can be rewritten in a more useful form [38]. Let [t]^+ = max(t, 0). Then maximizing CVaR_α(x) is equivalent to solving

max_{x ∈ P, τ ∈ [0,M]} H(x, τ) = τ − (1/α) E_y[[τ − F(x, y)]^+]   (2)

where τ is an auxiliary parameter. For any fixed x, the optimal value of τ is VaR_α(x) [38]. It is known that when F(·, y) is concave in x, this is a concave optimization problem. However, little is known when F may be nonconcave.

We now introduce the RASCAL (Risk Averse Submodular optimization via Conditional vALue at risk) algorithm for continuous submodular CVaR optimization. RASCAL solves Problem 2, which is a function of both the decision variables x and the auxiliary parameter τ. Roughly, τ should be understood as a threshold maintained by the algorithm for what constitutes a "bad" scenario: at each iteration, RASCAL tries to increase F(x, y) for those scenarios y such that F(x, y) ≤ τ.

More formally, RASCAL is a coordinate ascent style algorithm. Each iteration first makes a Frank-Wolfe style update to x.
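Given samples of F(x, y) for a fixed x, the objective H from Problem 2 and the corresponding optimal τ can be evaluated directly; a small sketch with hypothetical sample values (this is only the sample-based evaluation, not the RASCAL algorithm itself):

```python
# Evaluate H(x, tau) = tau - (1/alpha) * E[[tau - F(x, y)]^+] on sampled
# values of F(x, y), and pick tau for fixed x by scanning the samples (the
# maximizer lies at a sampled value; for fixed x it is the empirical VaR).

def H(values, tau, alpha):
    hinge = sum(max(tau - v, 0.0) for v in values) / len(values)
    return tau - hinge / alpha

def best_tau(values, alpha):
    return max(values, key=lambda tau: H(values, tau, alpha))

values = [0.0, 1.0, 2.0, 3.0]   # hypothetical sampled F(x, y) values
tau_star = best_tau(values, alpha=0.5)
```

Here H(x, τ*) recovers the empirical CVaR: the average of the worst half of the samples.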
Recall that Frank-Wolfe is a gradient-based algorithm originally developed for concave maximization. However, it can be modified to maximize continuous submodular functions [5]. RASCAL then sets τ to its optimal value given the current x. This approach is motivated by the unique properties of the CVaR objective H. It can be shown that H is jointly up-concave in the variables (x, τ). However, H is not monotone in τ. Indeed, H is decreasing in τ for τ > VaR_α(x). The Frank-Wolfe algorithm relies crucially on monotonicity; nonmonotonicity is much more difficult to handle.

Instead, we exploit a unique form of structure in H. Specifically, H is monotone in x, but only up-concave (not fully concave). Conversely, while H is nonmonotone in τ, we can easily solve the one-dimensional problem max_{τ ∈ [0,M]} H(x, τ) for any fixed x (see the full paper for details). Our approach makes use of both properties: the Frank-Wolfe update leverages monotone up-concavity in x, while the update to τ leverages easy solvability of the one-dimensional subproblem.

In order to make this approach work, two ingredients are necessary. First, we need access to the gradient of H in order to implement the Frank-Wolfe update for x. Unfortunately, H is not even differentiable everywhere. We instead present a smoothed estimator, SmoothGrad, which restores differentiability at the cost of introducing a controlled amount of bias. Second, we need to solve the one-dimensional problem of finding the optimal value of τ. In fact, we introduce a subroutine, SmoothTau, which solves a smoothed version of the optimal-τ problem. In the end, we obtain the following theoretical guarantee:

Theorem 3.
For any ε > 0, RASCAL outputs a solution x ∈ P satisfying CVaR_α(x) ≥ (1 − 1/e)OPT − ε with probability at least 1 − δ. The number of iterations K is polynomial in the Lipschitz constants, G, d, 1/α, and 1/ε (the exact bound is given in [50]). The algorithm requires O(sK) total evaluations of F, O(sK) evaluations of ∇F, and K calls to a linear optimization oracle for P, where the number of samples s drawn from the underlying distribution is polynomial in n, M, 1/ε, and log(1/δ).

Infectious disease prevention
Treatable infectious diseases cause hundreds of thousands of cases of disability and death worldwide. Often, this burden is caused by long-term diseases which are continuously present in the population, as opposed to short-term epidemics like influenza. For instance, tuberculosis (TB) deaths in India numbered over 480,000 in 2014 [48], and even developed nations like the U.S. observed over 395,000 cases of gonorrhea in 2015 [10]. In both cases, many individuals remain undiagnosed although treatment is available. Outreach efforts to increase screening can lower disease burden; e.g., the Indian government conducts advertising campaigns for TB awareness. Limited resources require these campaigns to be carefully targeted at the groups where they will be most effective in reducing disease. Targeting is complicated by changing population dynamics, as individuals age and migrate over time, as well as by uncertainty around disease transmission rates. Officials currently make such decisions by hand, as no algorithmic assistance is available.

To remedy this situation, we design an algorithm to divide a limited outreach budget between demographic groups in order to minimize long-term disease prevalence under uncertain population dynamics. Our approach contrasts with existing algorithms for disease control, which often consider disease spread between nodes on a static graph [39, 7]. This is a sensible model of short-term disease spread but is less suitable for long-term planning in diseases such as TB or gonorrhea, where people are born, die, age, and move [31]. Accounting for changes in the underlying agents is particularly salient for a policymaker who must divide resources between demographic groups over many years to maximize long-term societal health. For instance, India produces 5-year plans to combat TB [37].
Our approach also contrasts with previous work on agent-based disease models [20, 29]. Such models may include realistic behaviors, but their complexity usually precludes algorithmically optimizing a policy over an entire feasible set.

An additional challenge, largely unexplored in previous algorithmic work, is that of uncertainty. Data is always limited; policymakers are never sure of exactly how many people are infected in each group, or of the contact patterns between them. In order to impact real-world policy, algorithms for resource allocation must account for such uncertainties.

We introduce a model which both captures underlying agent dynamics and can be solved using an algorithmic approach in a stochastic setting. We make four main contributions, which are explored in detail in the full paper [54].
First, we present the MCF-SIS model (Multiagent Continuous Flow-SIS), in which disease spreads in a multiagent system with birth, death, and movement. The system evolves according to SIS (susceptible-infected-susceptible) dynamics and is stratified across age groups. This introduces a new problem in multiagent systems: computing the optimal resource allocation under MCF-SIS, as when an outreach campaign must decide how to divide limited advertising dollars (or rupees) between the groups.

MCF-SIS induces a continuous, nonconvex, highly nonlinear optimization problem which cannot be solved by existing methods. Many factors must be accounted for. E.g., between-group disease transmission makes focusing on the groups with the most infected agents suboptimal. Moreover, agents in a targeted group are not cured instantaneously, so, e.g., to reduce prevalence in age group 30, we may need to start targeting resources at age 27. Lastly, we consider a stochastic setting where parts of the model (contact patterns between agents, the number of infected agents in each group, etc.) are not known exactly but are drawn from a distribution.

Our second contribution shows that optimal allocation in MCF-SIS is a continuous submodular problem. This opens up a novel set of optimization techniques which have not previously been used in disease prevention. Continuous submodularity generalizes submodular set functions to continuous domains. Intuitively, infections averted by spending one unit of treatment resources can no longer be averted by additional spending, creating diminishing returns.
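As a toy illustration of the kind of dynamics involved (this is not the MCF-SIS model: there is no birth, death, or aging, and every parameter value is hypothetical), consider two interacting groups under discrete-time SIS dynamics, where outreach spending on a group raises that group's cure rate:

```python
# Two-group SIS sketch: beta[i][j] is the transmission rate from group j to
# group i, and the allocation adds to each group's baseline recovery rate.

def simulate_sis(allocation, steps=200, recovery=0.1):
    beta = [[0.15, 0.05], [0.05, 0.15]]
    infected = [0.3, 0.3]                # initial infected fraction per group
    for _ in range(steps):
        new = []
        for i in range(2):
            force = sum(beta[i][j] * infected[j] for j in range(2))
            cure = recovery + allocation[i]
            x = infected[i]
            x = x + (1 - x) * force - cure * x
            new.append(min(max(x, 0.0), 1.0))
        infected = new
    return sum(infected) / 2             # average long-run prevalence

even = simulate_sis([0.05, 0.05])    # split the budget evenly
skewed = simulate_sis([0.10, 0.00])  # concentrate it on one group
```

In this toy instance the even split reaches a lower steady-state prevalence than concentrating the same budget on one group, reflecting the diminishing returns just described.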
Continuous submodularity is deliberately enabled by our modeling choices, in particular our shift from the discrete, graph-based setting common in previous work [39, 7] to a continuous, population-based model.

Our third contribution is a new algorithm called DOMO (Disease Outreach via Multiagent Optimization), which obtains an efficient (1 − 1/e)-approximation to the optimal allocation. Our algorithm builds on a recent theoretical framework for submodular optimization [5]. DOMO's generalization of this framework to the stochastic setting may be of independent interest.

Our fourth contribution is to instantiate MCF-SIS in two domains using empirical data which accounts for behavioral, demographic, and epidemic trends: first, TB spread in India, and second, gonorrhea in the United States. DOMO averts 8,000 annual person-years of TB and 20,000 person-years of gonorrhea compared to current policy.

There are many promising future questions related to algorithmic social intervention. Here, I detail two directions in progress.
Oftentimes, finding the best intervention amounts to performing optimization in a complex model. For instance, epidemiologists have built enormously complicated models of disease spread, which simulate the (stochastic) interactions of millions of agents and account for a range of factors. Such models are very faithful to what are believed to be the real-world processes of disease spread, but suffer from very high computational cost and are poorly understood from an optimization perspective. Hence, researchers interested in optimization (including my work [54]) seek simpler and more tractable models. It is hoped that these simpler models are sufficiently faithful to reality to yield useful insights, but they will clearly not be as accurate as more detailed simulations.

A natural direction, then, is to pursue better methods for multi-fidelity optimization: using a simpler model as a guide, or surrogate, to optimize a more complex one. Such methods have recently attracted interest in machine learning for use in hyperparameter optimization [22], and have previously been studied in several engineering disciplines [43, 17, 2]. However, previous methods suffer from a variety of shortcomings in how they treat both the high-fidelity and surrogate models. For instance, most do not incorporate stochasticity in the high-fidelity model, where only noisy observations of the ground truth are available. This can easily become problematic because randomness is ubiquitous in modeling, especially in noisy domains like human interaction. With respect to the low-fidelity model, previous work usually assumes black-box access. However, this neglects the potential advantage that can be gained by exploiting known structure in the surrogate.
For instance, if we were to use the MCF-SIS model introduced in [54] as a surrogate for a complex disease model, the DOMO algorithm could be used to find provably good approximate solutions.

Accordingly, the purpose of this project is to remedy such shortcomings by proposing multi-fidelity optimization methods which naturally incorporate stochasticity and leverage known structure in the surrogate model. The immediate application for such techniques is optimizing policies for preventing disease spread, but many other application areas are possible.
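A minimal sketch of the intended pattern (every model, noise level, and name here is a hypothetical stand-in, not a method from the thesis): shortlist candidates with the cheap surrogate, then spend repeated noisy high-fidelity evaluations only on the shortlist.

```python
import random

def high_fidelity(x, rng):
    # Expensive, noisy "ground truth"; noise is bounded in this toy example.
    return -(x - 0.7) ** 2 + rng.uniform(-0.004, 0.004)

def surrogate(x):
    return -(x - 0.65) ** 2  # cheap but slightly biased approximation

def multi_fidelity_search(candidates, shortlist_size=3, reps=50, seed=0):
    rng = random.Random(seed)
    # Rank all candidates by the cheap surrogate, keep only the best few.
    shortlist = sorted(candidates, key=surrogate, reverse=True)[:shortlist_size]
    # Average repeated noisy high-fidelity evaluations on the shortlist.
    scores = {x: sum(high_fidelity(x, rng) for _ in range(reps)) / reps
              for x in shortlist}
    return max(scores, key=scores.get)

candidates = [i / 10 for i in range(11)]
best = multi_fidelity_search(candidates)
```

The surrogate's bias (its optimum sits at 0.65 rather than 0.7) is corrected by the high-fidelity evaluations, while only 3 of the 11 candidates ever touch the expensive model.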
In this project, we consider a more nuanced treatment of uncertainty in submodular optimization, which yields improved properties when learning and optimizing from limited data. Suppose that we wish to maximize a submodular function which is not known exactly. For instance, we may have a finite collection of samples from an unknown distribution and wish to maximize expected performance over that distribution. Or, we may have a probabilistic model (e.g., a model of influence spread) but do not believe that this model is exactly correct. We can draw samples from this model and optimize empirical performance over the samples (the de facto approach in influence maximization), but such a process will not incorporate our uncertainty about the true distribution that the objective is drawn from.

In both settings (limited data and model uncertainty), is there a better approach than maximizing empirical performance on the samples? One attractive alternative is distributionally robust optimization. Let the empirical distribution on sampled objective functions f_1...f_n be denoted by p̂_n. Let D(p || p̂_n) be a divergence measure between another distribution p and the empirical distribution p̂_n (e.g., the χ² divergence). The distributionally robust optimization problem is to solve

max_S min_{p : D(p || p̂_n) ≤ ρ} E_{f∼p}[f(S)].

That is, we aim to maximize our worst-case expected performance over all distributions that are "close" to the observed distribution p̂_n. One advantage of this formulation is that it can be seen as maximizing a high-probability bound on expected performance. Let D be the unknown distribution generating the objective. Given n samples from D, classical arguments (e.g., the Bernstein bound) show that

E_{f∼D}[f(S)] ≥ E_{f∼p̂_n}[f(S)] − C √(Var_D[f(S)] / n)

where C is a constant (e.g., depending on the probability with which we want the bound to hold).
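The variance-regularized surrogate suggested by this bound can be optimized greedily as a heuristic; a sketch with toy coverage objectives (the constant C and all data are made up, and since the variance penalty breaks submodularity, the greedy step here carries no approximation guarantee):

```python
import math

def objective(fs, S, C):
    # Empirical mean minus C * sqrt(empirical variance / m): a sample-based
    # surrogate for the variance-regularized objective above.
    vals = [f(S) for f in fs]
    m = len(vals)
    mean = sum(vals) / m
    var = sum((v - mean) ** 2 for v in vals) / m
    return mean - C * math.sqrt(var / m)

def greedy(fs, ground, k, C=1.0):
    S = set()
    for _ in range(k):
        best = max((i for i in ground if i not in S),
                   key=lambda i: objective(fs, S | {i}, C))
        S.add(best)
    return S

# Two sampled objectives that agree on item 1 but disagree elsewhere; the
# variance penalty steers the choice toward the low-variance item.
fs = [lambda S: len(S & {0, 1}), lambda S: len(S & {1, 2})]
chosen = greedy(fs, range(3), k=1)
```

Item 1 is picked because it performs well under both sampled objectives, whereas items 0 and 2 perform well under only one of them.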
Hence, when the variance is large, we can do better by optimizing the entire term on the right-hand side instead of just empirical performance on the samples (the first term). It has recently been shown [34] that (under some conditions) the distributionally robust problem corresponds exactly to such a variance-regularized objective. This has led to improved generalization for convex loss functions, where distributionally robust optimization remains a convex optimization problem. The purpose of this project is to extend distributionally robust techniques to submodular optimization. This will entail the development of new algorithmic tools to deal with the (natively combinatorial) nonconvex problem. Such development is highly relevant to algorithmic social intervention, since the objectives in many problems are inferred from limited data or uncertain models.

References

[1] Emmanuel Abbe and Colin Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In
FOCS, pages 670–688. IEEE, 2015.

[2] Natalia M Alexandrov, Robert Michael Lewis, Clyde R Gumbert, Lawrence L Green, and Perry A Newman. Approximation and model management in aerodynamic optimization with variable-fidelity models. Journal of Aircraft, 38(6):1093–1101, 2001.

[3] Noga Alon, Iftah Gamzu, and Moshe Tennenholtz. Optimizing budget allocation among channels and influencers. In WWW, pages 381–388, 2012.

[4] Francis Bach. Submodular functions: from discrete to continuous domains. arXiv preprint arXiv:1511.00394, 2015.

[5] Andrew An Bian, Baharan Mirzasoleiman, Joachim M. Buhmann, and Andreas Krause. Guaranteed non-convex optimization: Submodular maximization over continuous domains. In AISTATS, 2017.

[6] Robert M Bond, Christopher J Fariss, Jason J Jones, Adam DI Kramer, Cameron Marlow, Jaime E Settle, and James H Fowler. A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415):295–298, 2012.

[7] Christian Borgs, Jennifer Chayes, Ayalvadi Ganesh, and Amin Saberi. How to distribute antidote to control epidemics. Random Structures & Algorithms, 37(2):204–222, 2010.

[8] Branislav Bosansky, Christopher Kiekintveld, Viliam Lisy, and Michal Pechoucek. An exact double-oracle algorithm for zero-sum extensive-form games with imperfect information. Journal of Artificial Intelligence Research, 51:829–866, 2014.

[9] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.

[10] CDC. Reported STDs in the United States, 2015.

[11] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In FOCS, 2010.

[12] Robert Chen, Brendan Lucier, Yaron Singer, and Vasilis Syrgkanis. Robust optimization for non-convex objectives. In NIPS, 2017.

[13] Wei Chen, Chi Wang, and Yajun Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In KDD, pages 1029–1038. ACM, 2010.

[14] Yi-Cheng Chen, Wen-Yuan Zhu, Wen-Chih Peng, Wang-Chien Lee, and Suh-Yin Lee. CIM: community-based influence maximization in social networks. ACM Transactions on Intelligent Systems and Technology (TIST), 5(2):25, 2014.

[15] John C Duchi, Peter L Bartlett, and Martin J Wainwright. Randomized smoothing for stochastic optimization. SIAM Journal on Optimization, 22(2):674–701, 2012.

[16] Stephen E Fienberg and Stanley S Wasserman. Categorical data analysis of single sociometric relations. Sociological Methodology, 12:156–192, 1981.

[17] Alexander IJ Forrester, András Sóbester, and Andy J Keane. Multi-fidelity optimization via surrogate modelling. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, volume 463, pages 3251–3269. The Royal Society, 2007.

[18] Manish Jain, Vincent Conitzer, and Milind Tambe. Security scheduling for real-world networks. In AAMAS, 2013.

[19] Manish Jain, Dmytro Korzhyk, Ondřej Vaněk, Vincent Conitzer, Michal Pěchouček, and Milind Tambe. A double oracle algorithm for zero-sum security games on graphs. In AAMAS, 2011.

[20] Akshay Jindal and Shrisha Rao. Agent-based modeling and simulation of mosquito-borne disease transmission. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pages 426–435. International Foundation for Autonomous Agents and Multiagent Systems, 2017.

[21] Kyomin Jung, Wooram Heo, and Wei Chen. IRIE: Scalable and robust influence maximization in social networks. In ICDM, pages 918–923. IEEE, 2012.

[22] Kirthevasan Kandasamy, Gautam Dasarathy, Junier B Oliva, Jeff Schneider, and Barnabás Póczos. Gaussian process bandit optimisation with multi-fidelity evaluations. In Advances in Neural Information Processing Systems, pages 992–1000, 2016.

[23] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In KDD, 2003.

[24] Levent Koçkesen, Efe A Ok, and Rajiv Sethi. The strategic advantage of negatively interdependent preferences. Journal of Economic Theory, 92(2):274–299, 2000.

[25] Andreas Krause, H Brendan McMahan, Carlos Guestrin, and Anupam Gupta. Robust submodular observation selection. Journal of Machine Learning Research, 9(Dec):2761–2801, 2008.

[26] Andreas Krause, Alex Roper, and Daniel Golovin. Randomized sensing in adversarial environments. In IJCAI, 2011.

[27] Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, and Pan Zhang. Spectral redemption in clustering sparse networks. PNAS, 110(52):20935–20940, 2013.

[28] Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2–3):123–286, 2012.

[29] Bruce Y Lee, Shawn T Brown, George W Korch, Philip C Cooley, Richard K Zimmerman, William D Wheaton, Shanta M Zimmer, John J Grefenstette, Rachel R Bailey, Tina-Marie Assi, et al. A computer simulation of vaccine prioritization, allocation, and rationing during the 2009 H1N1 influenza pandemic. Vaccine, 28(31):4875–4879, 2010.

[30] Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29–123, 2009.

[31] Douglas A Luke and Katherine A Stamatakis. Systems science methods in public health: dynamics, networks, and agents. Annual Review of Public Health, 33:357–376, 2012.

[32] Takanori Maehara. Risk averse submodular utility maximization. Operations Research Letters, 43(5):526–529, 2015.

[33] Elchanan Mossel, Joe Neeman, and Allan Sly. Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162(3-4):431–461, 2015.

[34] Hongseok Namkoong and John C Duchi. Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems, pages 2975–2984, 2017.

[35] Naoto Ohsaka and Yuichi Yoshida. Portfolio optimization for influence spread. In WWW, pages 977–985, 2017.

[36] Eric Rice, Eve Tulbert, Julie Cederbaum, Anamika Barman Adhikari, and Norweeta G Milburn. Mobilizing homeless youth for HIV prevention. Health Education Research, 27(2):226–236, 2012.

[37] RNTCP. Revised national tuberculosis control programme annual status report. New Delhi, India: Ministry of Health and Family Welfare. http://tbcindia.nic.in/showfile.php?lid=3180, 2016.

[38] R Tyrrell Rockafellar and Stanislav Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.

[39] Sudip Saha, Abhijin Adiga, B Aditya Prakash, and Anil Kumar S Vullikanti. Approximation algorithms for reducing the spectral radius to control epidemic spread. In Proceedings of the SIAM International Conference on Data Mining, pages 568–576. SIAM, 2015.

[40] Thomas Sampson. Assignment reversals: Trade, skill allocation and wage inequality. Journal of Economic Theory, 163:365–409, 2016.

[41] Tasuku Soma, Naonori Kakimura, Kazuhiro Inaba, and Ken-ichi Kawarabayashi. Optimal budget allocation: Theoretical guarantee and efficient algorithm. In ICML, pages 351–359, 2014.

[42] Matthew Staib and Stefanie Jegelka. Robust budget allocation via continuous submodular functions. In ICML, 2017.

[43] Guangyong Sun, Guangyao Li, Michael Stone, and Qing Li. A two-stage multi-fidelity optimization procedure for honeycomb-type cellular materials. Computational Materials Science, 49(3):500–511, 2010.

[44] Youze Tang, Xiaokui Xiao, and Yanchen Shi. Influence maximization: Near-optimal time complexity meets practical efficiency. In KDD. ACM, 2014.

[45] Thomas W Valente and Patchareeya Pumpuang. Identifying opinion leaders to promote behavior change. Health Education & Behavior, 2007.

[46] Jan Vondrák. Optimal approximation for the submodular welfare problem in the value oracle model. In STOC, pages 67–74, 2008.

[47] Yu Wang, Gao Cong, Guojie Song, and Kunqing Xie. Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. In KDD, pages 1039–1048. ACM, 2010.

[48] WHO. World Health Organization. Tuberculosis country profiles, 2015.

[49] Bryan Wilder. Equilibrium computation and robust optimization in zero sum games with submodular structure. In AAAI, 2018.

[50] Bryan Wilder. Risk-sensitive submodular optimization. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.

[51] Bryan Wilder, Nicole Immorlica, Eric Rice, and Milind Tambe. Maximizing influence in an unknown social network. In AAAI, 2018.

[52] Bryan Wilder, Laura Onasch-Vera, Juliana Hudson, Jose Luna, Nicole Wilson, Robin Petering, Darlene Woo, Milind Tambe, and Eric Rice. End-to-end influence maximization in the field, 2018.

[53] Bryan Wilder, Han Ching Ou, Kayla de la Haye, and Milind Tambe. Optimizing network structure for preventative health. In
AAMAS , 2018.[54] Bryan Wilder, Sze-Chuan Suen, and Milind Tambe. Preventing infectious disease in dynamicpopulations under uncertainty. In
AAAI , 2018.[55] Bryan Wilder, Amulya Yadav, Nicole Immorlica, Eric Rice, and Milind Tambe. Unchartedbut not uninfluenced: Influence maximization with an uncertain network. In
AAMAS , 2017.[56] Bryan Wilder, Amulya Yadav, Nicole Immorlica, Eric Rice, and Milind Tambe. Unchartedbut not uninfluenced: Influence maximization with an uncertain network. In
AAMAS , pages740–748, 2017.[57] Amulya Yadav, Hau Chan, Albert Xin Jiang, Haifeng Xu, Eric Rice, and Milind Tambe. Usingsocial networks to aid homeless shelters: Dynamic influence maximization under uncertainty.In
AAMAS , pages 740–748, 2016.[58] Amulya Yadav, Bryan Wilder, Eric Rice, Robin Petering, Jaih Craddock, Amanda Yoshioka-Maxwell, Mary Hemler, Laura Onasch-Vera, Milind Tambe, and Darlene Woo. Influencemaximization in the field: The arduous journey from emerging to deployed application. In