Algorithms for Hiring and Outsourcing in the Online Labor Market

Abstract

Although freelancing work has grown substantially in recent years, in part facilitated by a number of online labor marketplaces (e.g., Guru, Freelancer, Amazon Mechanical Turk), traditional forms of "in-sourcing" work continue to be the dominant form of employment. This means that, at least for the time being, freelancing and salaried employment will continue to co-exist. In this paper, we provide algorithms for outsourcing and hiring workers in a general setting, where workers form a team and contribute different skills to perform a task. We call this model team formation with outsourcing. In our model, tasks arrive in an online fashion: neither the number nor the composition of the tasks is known a priori. At any point in time, there is a team of hired workers who receive a fixed salary independently of the work they perform. This team is dynamic: new members can be hired and existing members can be fired, at some cost. Additionally, some parts of the arriving tasks can be outsourced and thus completed by non-team members, at a premium. Our contribution is an efficient online cost-minimizing algorithm for hiring and firing team members and outsourcing tasks. We present theoretical bounds obtained using a primal–dual scheme proving that our algorithms have a logarithmic competitive approximation ratio. We complement these results with experiments using semi-synthetic datasets based on actual task requirements and worker skills from three large online labor marketplaces.


Aris Anagnostopoulos

Sapienza University of Rome

Carlos Castillo

Universitat Pompeu Fabra

Adriano Fazzone

Sapienza University of Rome

Stefano Leonardi

Sapienza University of Rome

Evimaria Terzi

Boston University


ACM Reference Format:

Aris Anagnostopoulos, Carlos Castillo, Adriano Fazzone, Stefano Leonardi, and Evimaria Terzi. 2018. Algorithms for Hiring and Outsourcing in the Online Labor Market. In KDD 2018: 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 19–23, 2018, London, United Kingdom. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3219819.3220056

Because of space limitations, some details and proofs have been omitted; they will appear in the full version of this work.

The research for this work has been supported by the Google Focused Research Award "Algorithms for Large-Scale Data Analysis," the EU FET project MULTIPLEX 317532, the ERC Advanced Grant 788893 "Algorithmic and Mechanism Design Research for Online MArkets (AMDROMA)," NSF grants CAREER 1253393 and IIS 1421759, and La Caixa project LCF/PR/PR16/11110009.

Self-employment is an increasing trend; for instance, between 10% and 20% of workers in developed countries are self-employed [24]. This phenomenon can be partially attributed to business downsizing and employee dissatisfaction, as well as to the existence of online labor markets (e.g., Guru.com, Freelancer.com). This trend has enabled freelancers to work remotely on specialized tasks, and prompted researchers and practitioners to explore the benefits of outsourcing and crowdsourcing [14, 15, 17, 22, 25, 28].

Although crowdsourcing adoption was driven, at least in part, by the assumption that problems can be decomposed into parts that can be addressed separately by independent workers, crowdsourcing results can be improved by allowing some degree of collaboration among them [20, 26]. The idea of combining collaboration with crowdsourcing has led to research on team formation [2–4, 10–12, 16, 18, 19, 21, 27], in which a common thread is the need for complementary skills, and problem settings differ in aspects such as objectives (e.g., load balancing and/or compatibility), constraints (e.g., worker capacity), and algorithmic setup (online or offline).

Overview of problem setting and assumptions.

We consider tasks that arrive in an online fashion and must be completed by assigning them to one or more workers, who jointly cover the skills required for each task. At any point in time, there is a team of hired workers who are paid a salary, independently of the work they perform. This team is dynamic: new members can be hired and existing members can be fired. Hiring and firing workers is expensive, which is why companies routinely keep skilled workers on the payroll even if they are temporarily idle; however, they also seek to keep "benching" to a minimum [29]. Outsourcing provides additional flexibility, as some parts of the incoming tasks can be completed by non-team members who are outsourced. In practice, outsourcing involves additional costs such as searching, contracting, communicating with, and managing an expert or specialist external to a company [6].

Deciding when to hire, fire, and outsource workers is a difficult online problem with parameters that depend on job market conditions and employment regulations. Intuitively: (1) if the cost of hiring or firing workers is too high, outsourcing becomes preferable to hiring; (2) if the cost of outsourcing work relative to salaries of hired workers is too high, hiring becomes preferable to outsourcing; and (3) if the workload consists of many repetitions of similar tasks, hiring becomes preferable to outsourcing.

In this paper, we formulate this as an online cost minimization problem, which we call Team Formation with Outsourcing (TFO). We formally define this problem in Section 2 and solve it in Sections 3 and 4. Although this is a model and hence does not capture every aspect of employment decisions in a company, we show how it brings formalism to the intuitions we have outlined, helps understand under which circumstances a combination of hiring and outsourcing can be cost effective, and motivates experimentation on semi-synthetic data allowing us to cover a broad range of cases, as we show in Section 5.

Algorithmic techniques.

To the best of our knowledge, we are the first to consider this problem and study some of its variants. Our problem turns out to be an original generalization of online set cover and online ski rental, two of the most paradigmatic online problems. In fact, TFO has elements that make it more complex; to solve it, an algorithm has to address its various characteristics: (1) it is also online, so decisions should be taken with limited information on the input, but at each step, an entirely new instance of the set-cover problem needs to be solved by using hired and outsourced workers; (2) hired and outsourced workers collaborate with each other, and this needs to be taken into account; and (3) workers can be hired, fired, hired again, and so on, so one has to keep track of their status at every point in time.

Several natural approaches inspired by online algorithms for the problems we mentioned previously fail to provide solutions with theoretical guarantees. Therefore, we consider an approach introduced in recent years for studying complex online problems, the online primal–dual scheme [8]. The idea is to create a sequence of integer programs to model the online problem by incrementally introducing variables and constraints. We then consider their linear relaxations and their duals to design an online algorithm, and we analyze it by comparing the costs of the primal and the dual programs as they evolve over time with the arrival of new tasks. This is a powerful approach, which has so far been applied with success to several classical online problems: packing and covering problems, ski rental, weighted caching, and k-server, among others [5]. We refer to [5, 8] for a survey of the applications of the online primal–dual method.

Our analysis results in polynomial-time algorithms that have logarithmic competitive approximation ratios. This means that, despite the fact that our algorithms work in an online fashion and do not have any knowledge of the number and the composition of future tasks, we can guarantee that the cost they will incur will be, at every time instance, only a logarithmic factor worse than the cost incurred by an optimal algorithm that knows the set of requests a priori.

Contributions.

The key contributions of our work are:
• We formalize TFO: the problem of designing an online cost-minimizing algorithm for hiring, firing, and outsourcing.
• We design efficient and effective approximation algorithms for TFO using an online primal–dual scheme, and provide approximation guarantees on their performance.
• We experiment on semi-synthetic data based on actual task requirements and worker skills from three large online labor marketplaces, testing algorithms under a broad range of conditions.
• We provide experimental evidence of the quality of the performance of online primal–dual algorithms for a complex real-world problem. Prior work has performed theoretical analysis mostly for classical or practically motivated online problems [7, 9]. To the best of our knowledge, the empirical validation was previously addressed only for the Adwords matching problem [13]. We demonstrate that such approaches, even though they are based on heavy theoretical machinery, can be easily implemented and are efficient in practice.

In this section, we formally describe our setting and problem, and provide some necessary background.

Skills.

We consider a set $S$ of skills with $|S| = m$. Skills can be any kind of qualification a worker can have or a task may require, such as video editing, technical writing, or project management.

Tasks.

We consider a set of $T^*$ tasks (or jobs), $\mathcal{J} = \{J^t;\ t = 1, 2, \ldots, T^*\}$, which arrive one by one in a streaming fashion; $J^t$ is the $t$-th task that arrives. Each task $J \in \mathcal{J}$ requires a set of skills from $S$; therefore, $J \subseteq S$. We use $J^t$ to refer to both the task and the skills that it requires.

Workers.

Throughout we assume that we have a set $\mathcal{W}$ of $n$ workers: $\mathcal{W} = \{W_r;\ r = 1, \ldots, n\}$. Every worker $r$ possesses a set of skills ($W_r \subseteq S$), and $P_\ell$ denotes the subset of workers possessing a given skill $\ell$: $P_\ell = \{r;\ \ell \in W_r\}$. Similarly to the tasks, we use $W_r$ to denote both the worker and his/her skills.

We partition the set of available workers $\mathcal{W}$ into the set of workers who are hired at time $t$, denoted by $\mathcal{H}^t$, and the set of workers who are not hired, denoted by $\mathcal{F}^t$ (we sometimes refer to these workers as freelancers, and they can be outsourced for $J^t$), so that $\mathcal{H}^t \cap \mathcal{F}^t = \emptyset$ and $\mathcal{W} = \mathcal{H}^t \cup \mathcal{F}^t$.

Coverage of tasks.

Whenever task $J^t \subseteq S$ arrives, an algorithm has to assign one or more workers to it, i.e., a team. We say that $J^t$ can be completed or covered by a team $\mathcal{Q} \subseteq \mathcal{W}$ if for every skill required by $J^t$, there exists at least one worker in $\mathcal{Q}$ who possesses this skill: $J^t \subseteq \bigcup_{W \in \mathcal{Q}} W$. We assume that for every skill in the incoming task there is at least one worker possessing that skill, so all tasks can be covered.

Costs.

Every worker $W_r$ can charge the following non-negative, worker-specific fees: (1) an outsourcing fee $\lambda_r$, (2) a hiring fee $C_r$, and (3) a salary $\sigma_r$. Outsourcing fees $\lambda_r$ denote the payment required by a (non-hired) worker when a task is outsourced to him/her. Note that $\lambda_r$ depends on the worker but does not depend on the task. Hiring fees $C_r$ reflect all expenses associated with hiring and firing a worker, such as signup bonuses and severance payments. Given that any algorithm commits to pay the firing costs the moment in which it hires a worker, we follow a standard methodology used in online algorithms for caching [8] and account for both hiring and firing costs when the worker is hired. Once a worker $r$ is hired, s/he is paid a recurring salary $\sigma_r$, which recurs for every step $t$ that the worker is hired. The above notation is summarized in Table 1.

Table 1: Notation

| Symbol | Meaning |
|---|---|
| $S$ | Set of skills, size $m$ |
| $\mathcal{J}$ | Set of tasks, size $T^*$ |
| $T$ | Number of tasks till current time |
| $J^t$ | The $t$-th task arriving |
| $J^t_\ell$ | 1 if task $t$ requires skill $\ell$, 0 otherwise |
| $\mathcal{W}$ | Set of workers, size $n$ |
| $W_r^\ell$ | 1 if worker $r$ possesses skill $\ell$, 0 otherwise |
| $P_\ell$ | Subset of workers possessing skill $\ell$ |
| $C_r$ | Hiring fee, paid when worker $r$ is hired |
| $\lambda_r$ | Outsourcing fee, paid every time $r$ performs a task |
| $\sigma_r$ | Salary paid to a hired worker $r$ |

Assumptions.

To avoid making the model overly complicated, we assume that the salary periods are defined by the arriving tasks, that is, there is one task per salary period, and task completion takes one salary period. A further assumption will be that $\sigma_r < \lambda_r$, as in practice requesting a single task from an external worker involves extra costs [6], which are reduced when the worker is hired (or when an outsourcing arrangement for an external group of workers to perform a specific recurring task is done, which is different from the individual outsourcing we discuss here). Finally, we assume $\lambda_r < C_r + \sigma_r$, because otherwise workers would be hired and fired for every task.

We now define the problem that we study:

Problem 1 (Team Formation with Outsourcing – TFO).

There exists a set of skills $S$. We have a pool of workers $\mathcal{W}$, where each worker $W_r \in \mathcal{W}$ is characterized by a subset of skills $W_r \subseteq S$, an outsourcing cost $\lambda_r \in \mathbb{R}_{\ge 0}$, a hiring cost $C_r \in \mathbb{R}_{\ge 0}$, and a salary cost $\sigma_r \in \mathbb{R}_{\ge 0}$. Given a set of tasks $\mathcal{J} = \{J^1, J^2, \ldots, J^{T^*}\}$, with $J^t \subseteq S$, which arrive in a streaming fashion, the goal is to design an algorithm that, when task $J^t$ arrives, decides which workers to hire (paying cost $C_r + \sigma_r$), keep hired (paying cost $\sigma_r$), and outsource (paying cost $\lambda_r$), such that all the tasks are covered by the workers who are hired or outsourced and the total cost paid over all the tasks is minimized.

TFO is an online problem: $\mathcal{J}$ is revealed one task at a time. Our goal is to guarantee that for any input stream $\mathcal{J}$ the total cost of our online algorithm, $\mathrm{ALG}(\mathcal{J})$, is at most a small factor greater than the total cost of the optimal (offline) algorithm that knows $\mathcal{J}$ in advance, $\mathrm{OPT}(\mathcal{J})$. This factor, $\max_{\mathcal{J}} \mathrm{ALG}(\mathcal{J}) / \mathrm{OPT}(\mathcal{J})$, is called the competitive ratio of the algorithm.

We solve the TFO problem in Section 4. Because neither the algorithm nor its analysis is trivial, we introduce them gradually by first solving a simplified version of TFO, which we describe and solve in Section 3.
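To make the cost accounting of Problem 1 concrete, the following is a minimal Python sketch that prices a fixed schedule of hiring and outsourcing decisions. The `Worker` class, the function names, and the representation of a schedule as per-step sets of worker identifiers are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    skills: frozenset      # W_r, the skills the worker possesses
    outsource_fee: float   # lambda_r
    hiring_fee: float      # C_r (accounts for hiring and firing)
    salary: float          # sigma_r


def covers(team, task, workers):
    """True if the union of the team's skills contains every skill of the task."""
    covered = set()
    for r in team:
        covered |= workers[r].skills
    return set(task) <= covered


def total_cost(workers, tasks, hired_per_step, outsourced_per_step):
    """Cost of a given schedule: C_r + sigma_r when a worker is newly hired,
    sigma_r while s/he stays hired, lambda_r each time s/he is outsourced."""
    cost, previously_hired = 0.0, set()
    for task, hired, outsourced in zip(tasks, hired_per_step, outsourced_per_step):
        if not covers(hired | outsourced, task, workers):
            raise ValueError("task is not covered by the chosen team")
        for r in hired:
            if r not in previously_hired:
                cost += workers[r].hiring_fee   # paid once per hiring period
            cost += workers[r].salary           # paid every step while hired
        cost += sum(workers[r].outsource_fee for r in outsourced)
        previously_hired = set(hired)
    return cost
```

An online algorithm has to produce `hired_per_step` and `outsourced_per_step` one step at a time, whereas the offline optimum in the competitive ratio may choose them with the whole task sequence in view.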

Two special cases of TFO are SetCover and SkiRental.

SetCover: the single-task, multiple-skill case. The set cover problem is an instance of our problem when there is a single task $J \subseteq S$ and for each worker $W_r$, $C_r = \infty$. Then, as soon as the task $J$ arrives, the algorithm needs to cover all skills in $J$ by selecting a set of workers $\mathcal{Q} \subseteq \mathcal{W}$ such that $\mathcal{Q}$ covers $J$ and $\sum_{r \in \mathcal{Q}} \lambda_r$ is minimized. In this case, our problem can be solved using the greedy algorithm for the set-cover problem (see [30, Chapter 2]).

SkiRental: the single-skill, single-worker case. The ski rental problem is an instance of our problem when the sequence of tasks $\mathcal{J}$ consists of a repetition of the same single-skill task $J$ and the workforce $\mathcal{W}$ consists of a single worker $W_r$ who possesses the same one skill, and has $\sigma_r = 0$, hiring fee $C_r$, and outsourcing fee $\lambda_r$. In this ski-rental version of our problem [23], the question is the following: without knowledge of the total number of tasks that will arrive, when should worker $W_r$ be hired so that the total cost paid to him/her in outsourcing plus hiring fees is minimized?

A well-known algorithm for this problem is the following: for every instance of $J^t$ that arrives, outsource $J^t$ to worker $W_r$ as long as $\sum_{t'=1}^{t} \lambda_r < C_r$. Then, hire the worker when $\sum_{t'=1}^{t} \lambda_r \ge C_r$. The above algorithm achieves a competitive ratio of 2.

LUMPSUM PROBLEM

First, we solve a simplified version of the TFO problem, where for every worker $W_r$ the salary is equal to 0 ($\sigma_r = 0$). In LumpSum, a hired worker $W_r$ is paid a lump sum of $C_r$ the moment s/he is hired and this amount is assumed to cover all future work done by the worker. Instead, when a worker $W_r$ is outsourced, a payment of $\lambda_r$ is made every time s/he performs a task.

LumpSum-Heuristic Algorithm

A natural algorithm for solving the LumpSum problem combines ideas from SetCover and SkiRental as follows: first, it starts with no worker being hired and each worker $W_r$ is associated with a variable $\delta_r$ initially set to 0.

For any $T \in \{1, \ldots, T^*\}$, when task $J^T$ arrives, the algorithm proceeds as follows: first, it identifies $J^T_F$ to be the set of skills of $J^T$ that cannot be covered by already-hired workers. Then, it covers the skills in $J^T_F$ using the greedy algorithm for SetCover. This way it finds $\mathcal{Q}^T \subseteq \mathcal{W}$ such that $\sum_{W_r \in \mathcal{Q}^T} \lambda_r$ is minimized. Finally, for each worker $W_r \in \mathcal{Q}^T$, it updates $\delta_r \leftarrow \delta_r + \lambda_r$. Worker $W_r$ is hired when $\delta_r \ge C_r$. Clearly, since there are no salaries there is no motivation to fire a worker once s/he is hired.

LumpSum-Heuristic has arbitrarily bad competitive ratio.

Although our experiments (Section 5) demonstrate that the above algorithm, which we call LumpSum-Heuristic, performs quite well in many practical cases, we can show that its competitive ratio can be arbitrarily bad. For this, consider an example where $\mathcal{W} = \{W_1, W_2\}$ and both workers have the same skill: $W_1 = W_2 = \{\ell\}$. Further assume that $\lambda_1 = 1$, $\lambda_2 = 1 + \epsilon$, $C_1 = M$, and $C_2 = 1$, where $M$ is a large value and $\epsilon$ a small one. For a sequence of tasks $J^1 = J^2 = \ldots = J^{T^*} = \{\ell\}$, it is clear that LumpSum-Heuristic will always outsource to $W_1$ until hiring him/her and will incur worst-case cost $2M$, whereas the optimal algorithm pays just $C_2 = 1$.

The above discussion illustrates that to obtain an algorithm with bounded competitive ratio, we need to take into account both the outsourcing and hiring costs of all workers. To do so, we deploy an online primal–dual scheme, which drives our algorithm design.
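The heuristic and its bad instance can be reproduced with a short Python sketch. The function and parameter names are hypothetical, the greedy rule (lowest fee per newly covered skill) is the textbook weighted set-cover heuristic and therefore only approximates the cheapest cover, and the hiring fee is assumed to be paid at the moment $\delta_r$ reaches $C_r$.

```python
def greedy_cover(skills, candidates, worker_skills, fee):
    """Greedy weighted set cover: repeatedly pick the candidate with the
    lowest fee per newly covered skill. Assumes every skill is coverable."""
    uncovered, team = set(skills), []
    while uncovered:
        best = min(
            (r for r in candidates if worker_skills[r] & uncovered),
            key=lambda r: fee[r] / len(worker_skills[r] & uncovered),
        )
        team.append(best)
        uncovered -= worker_skills[best]
    return team


def lumpsum_heuristic(tasks, worker_skills, fee, hire_cost):
    """Outsource greedily and hire worker r once the outsourcing fees paid
    to r so far (delta_r) reach the hiring cost C_r. Returns (cost, hired)."""
    hired, total = set(), 0.0
    delta = {r: 0.0 for r in worker_skills}
    for task in tasks:
        covered = set().union(*(worker_skills[r] for r in hired))
        missing = set(task) - covered                      # J^T_F
        free = [r for r in worker_skills if r not in hired]
        for r in greedy_cover(missing, free, worker_skills, fee):
            total += fee[r]                                # outsourcing fee lambda_r
            delta[r] += fee[r]
            if delta[r] >= hire_cost[r]:                   # ski-rental style threshold
                hired.add(r)
                total += hire_cost[r]                      # lump sum C_r
    return total, hired
```

With `worker_skills = {1: {"l"}, 2: {"l"}}`, fees `{1: 1.0, 2: 1.01}`, hiring costs `{1: 100.0, 2: 1.0}`, and 200 copies of the task `{"l"}`, the sketch pays 100 in outsourcing fees to worker 1 and then 100 to hire him/her, for a total of 200, while hiring worker 2 at the start would cost 1.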

The integer and linear programs.

The first step of the primal–dual approach is to define an integer formulation for the problem, for each step $T \in \{1, \ldots, T^*\}$. We assume that the current task is the $T$-th task and we use the following variables:
• $x_r = 1$ if worker $W_r$ is hired when task $J^T$ arrives; otherwise $x_r = 0$.
• $f_{rt} = 1$ if worker $W_r$ is outsourced for performing task $J^t$; otherwise $f_{rt} = 0$.

LumpSum can be formulated as follows:

Linear program for LumpSum:

\[ \min \sum_{r=1}^{n} \left( C_r x_r + \lambda_r \sum_{t=1}^{T} f_{rt} \right) \]

subject to:

\[ \forall t = 1, \ldots, T,\ \ell \in J^t: \quad \sum_{W_r \in P_\ell} \left( x_r + f_{rt} \right) \ge 1 \tag{1} \]

\[ \forall t = 1, \ldots, T,\ r = 1, \ldots, n: \quad x_r, f_{rt} \ge 0 \]

These constraints, together with the integrality constraints $x_r, f_{rt} \in \mathbb{N}$, form the integer program for LumpSum. In this formulation, the objective function sums over all workers the hiring costs (paid if the corresponding worker has been hired by time $t$) and the outsourcing cost for the tasks for which the worker has been outsourced. This is the total cost of the solution until the current task $J^T$. Note that in this formulation of the problem there is no motivation for a worker who is hired to be fired. Therefore, once $x_r$ is set to 1, it does not change its value to become 0 again.

The first constraint (1) in the above program is the covering constraint: it simply enforces that for every skill required for each task, there exists a hired or outsourced worker who has this skill. This guarantees that the team selected for each task $J^t$ covers all the required skills. The nonnegativity and the integrality constraints ensure that the solutions that we obtain from the integer-program formulation can be transformed to a solution to our problem: eventually, every variable will take the value 0 or 1.

To apply the online primal–dual approach, we first consider the linear relaxation of the integer program, which simply drops the integrality constraints $x_r, f_{rt} \in \mathbb{N}$. In a solution to this linear program (LP) each variable takes values in $[0, 1]$. (A solution in which some variables take values greater than 1 can be transformed to another feasible solution with lower cost by setting these variables to 1.) Given this LP, we can write its dual as follows:

The dual of the linear program for LumpSum:

\[ \max \sum_{t=1}^{T} \sum_{\ell \in J^t} u_{\ell t} \]

subject to:

\[ \forall r = 1, \ldots, n: \quad \sum_{t=1}^{T} \sum_{\ell \in J^t \cap W_r} u_{\ell t} \le C_r \tag{2} \]

\[ \forall t = 1, \ldots, T,\ r = 1, \ldots, n: \quad \sum_{\ell \in J^t \cap W_r} u_{\ell t} \le \lambda_r \tag{3} \]

\[ \forall t = 1, \ldots, T,\ \ell \in J^t: \quad u_{\ell t} \ge 0 \]
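As a sanity check on the pair of programs above, the following Python sketch tests primal feasibility (constraint (1)), dual feasibility (constraints (2) and (3)), and lets one compare the two objective values. The dictionary-based encoding (`x[r]`, `f[(r, t)]`, `u[(skill, t)]`, tasks numbered from 1) is an assumption made for illustration.

```python
def primal_cost(C, lam, x, f):
    """Objective of the LumpSum LP: hiring costs plus outsourcing fees."""
    return sum(C[r] * x[r] for r in x) + sum(lam[r] * v for (r, _t), v in f.items())


def primal_feasible(tasks, skills_of, x, f, tol=1e-9):
    """Constraint (1): every required skill of every task is (fractionally) covered."""
    return all(
        sum(x[r] + f.get((r, t), 0.0) for r in skills_of if skill in skills_of[r]) >= 1 - tol
        for t, task in enumerate(tasks, start=1)
        for skill in task
    )


def dual_feasible(tasks, skills_of, C, lam, u, tol=1e-9):
    """Constraints (2) and (3) plus nonnegativity of the dual variables."""
    for r, skills in skills_of.items():
        per_task = [
            sum(u.get((skill, t), 0.0) for skill in task & skills)
            for t, task in enumerate(tasks, start=1)
        ]
        if sum(per_task) > C[r] + tol or any(v > lam[r] + tol for v in per_task):
            return False
    return all(v >= 0 for v in u.values())
```

By weak duality, the value of any feasible dual solution is a lower bound on the cost of any feasible primal solution, which is what allows the analysis to compare the online algorithm against the optimum without knowing it. For instance, with tasks `[{"a"}, {"a", "b"}]`, workers `{1: {"a"}, 2: {"a", "b"}}`, `C = {1: 2.0, 2: 3.0}`, and `lam = {1: 1.0, 2: 2.0}`, hiring worker 2 costs 3 and setting every $u_{\ell t}$ to 1 is dual feasible with value 3, so that primal solution is optimal for the LP.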

Note that at every time $t \in \{1, \ldots, T\}$ we have such a pair of primal–dual formulations. We are now going to use these two formulations for designing and analyzing our algorithm.

The LumpSum algorithm:

Next, we present the LumpSum algorithm, which is designed and analyzed using the primal and the dual linear programs. We assume that task $J^T$, $T \in \{1, \ldots, T^*\}$, has just arrived and the algorithm must act before task $J^{T+1}$ arrives (or the stream finishes if $T = T^*$).

All the variables used in our algorithm are initialized to 0 before the arrival of the first task. When task $J^T$ arrives the following steps are done:

1. Let $\mathcal{F}^T$ and $\mathcal{H}^T$ represent the workers who are not hired and hired, respectively, at the time that $J^T$ arrives. Clearly, when the first task arrives ($T = 1$), $\mathcal{F}^T = \mathcal{W}$ and $\mathcal{H}^T = \emptyset$. For $T > 1$, the values of $\mathcal{H}^T$ and $\mathcal{F}^T$ are updated in the last step (step 10) of the previous round.
2. Let $J^T_H = J^T \cap \bigcup_{W_r \in \mathcal{H}^T} W_r$ be the skills from $J^T$ that are covered by already-hired workers and $J^T_F = J^T \setminus J^T_H$.
3. For every skill $\ell \in J^T_F$ let $P^F_\ell = P_\ell \cap \mathcal{F}^T$ be the set of workers in $\mathcal{F}^T$ such that every worker in $P^F_\ell$ has skill $\ell$. Also let $P^T_F = \bigcup_{\ell \in J^T_F} P^F_\ell$ be the set of unhired workers who possess at least one skill that is required and not covered by already-hired workers.
4. for each $W_r \in P^T_F$: set $\tilde{x}'_r \leftarrow \tilde{x}_r$.
5. for each skill $\ell \in J^T_F$:
     while $\sum_{W_r \in P_\ell} \left( \tilde{x}_r + \tilde{f}_{rT} \right) < 1$:
       for each $W_r \in P_\ell$: $\tilde{x}_r \leftarrow \tilde{x}_r \left( 1 + \frac{1}{C_r} \right) + \frac{1}{n C_r}$
       for each $W_r \in P_\ell$: $\tilde{f}_{rT} \leftarrow \tilde{f}_{rT} \left( 1 + \frac{1}{\lambda_r} \right) + \frac{1}{n \lambda_r}$
       (the dual variable $u_{\ell T}$ is also increased in this loop; it is used only in the analysis)
6. for each $W_r \in P^T_F$: set $\Delta \tilde{x}_r \leftarrow \tilde{x}_r - \tilde{x}'_r$.
7. Set $\mathcal{H}' \leftarrow \emptyset$.
8. repeat $\rho$ times:
     for each $W_r \in P^T_F$:
       with probability $\Delta \tilde{x}_r$: hire worker $W_r$ (set $x_r \leftarrow 1$, $\mathcal{H}' \leftarrow \mathcal{H}' \cup \{r\}$)
       with probability $\tilde{f}_{rT}$: outsource worker $W_r$ (set $f_{rT} \leftarrow 1$)
9. for each skill $\ell \in J^T_F$:
     if skill $\ell$ is not covered: hire worker $W_r \in P^F_\ell$ with minimum cost $C_r$ (set $x_r \leftarrow 1$, $\mathcal{H}' \leftarrow \mathcal{H}' \cup \{r\}$)
10. $\mathcal{H}^{T+1} \leftarrow \mathcal{H}^T \cup \mathcal{H}'$, $\mathcal{F}^{T+1} \leftarrow \mathcal{W} \setminus \mathcal{H}^{T+1}$.

For $T = 1$, LumpSum starts with no worker being hired. Intuitively, as tasks arrive, the algorithm tries to gauge two quantities: (1) the usefulness of every worker for the task at hand $J^T$ and (2) the overall usefulness of every worker for tasks $J^1, \ldots, J^T$. This is done in step 5, via variables $\tilde{f}_{rT}$ (for (1)) and $\tilde{x}_r$ (for (2)). In particular, the more useful $W_r$ proves over time, the larger the value $\tilde{x}_r$. Subsequently, in step 8 every worker is outsourced or hired based on the increase in the values of $\tilde{f}_{rT}$ and $\tilde{x}_r$ observed in step 5. Finally, for every skill that remains uncovered after step 8 (which is randomized), LumpSum hires the worker $W_r$ with the minimum $C_r$ that covers the skill. Note that the increase of the variables $u_{\ell T}$ in step 5 is not required for solving LumpSum, but it is used in our analysis and thus we leave it in the description above.

Our analysis requires setting the value of $\rho$ in step 8 to $\rho = \ln m + \ln C^*$, where $C^* = \max_{W_r \in \mathcal{W}} C_r$.

Although an additive update of the variables in step 5 may seem more natural, such an update would introduce a $\Theta(m)$ factor in the competitive ratio. On the other hand, the multiplicative update that we adopt has the property that the more a worker $W_r$ is required over time, the higher the increase of the corresponding variable $\tilde{x}_r$. This fact leads us to Theorem 3.1 below.

Analysis.

We have the following result for LumpSum.

Theorem 3.1. LumpSum is an $O(\log n (\log m + \log C^*))$-competitive algorithm for the LumpSum problem, where $C^* = \max_{W_r \in \mathcal{W}} C_r$.

Running time.

The running time of LumpSum per task is dominated by the execution of steps 5 and 8. For step 5, using binary search, the algorithm can determine in $O(\log C^*)$ steps the minimum increase of $\tilde{x}_r$ and $\tilde{f}_{rT}$ that makes the condition of the while loop false for at least one uncovered skill $\ell$. Therefore, the running time of step 5 is $O\left( |J^T| \, n \log C^* \right)$. Step 8, using a hash table to store hired workers, can be executed in expected time $O(\rho n) = O(n (\log m + \log C^*))$. Therefore, the expected time required for processing task $J^T$ is $O\left( n \left( \log C^* \, |J^T| + \log m \right) \right)$.

TFO PROBLEM

In this section, we provide an algorithm for the general version of TFO (Problem 1). In contrast with LumpSum, now after hiring a worker we must pay a salary $\sigma_r \ge 0$, complicating the problem significantly as it may now be cost-effective to fire workers.
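As a reference point for the changes that salaries introduce, the per-task LumpSum procedure of Section 3 (steps 1 to 10) might be sketched in Python as follows. This is a rough illustration under several assumptions: all names are hypothetical, fees are strictly positive, the while loop of step 5 is run naively instead of with the binary search mentioned in the running-time discussion, the dual variables (needed only for the analysis) are omitted, and $\rho$ is rounded up to an integer of at least 1.

```python
import math
import random


def lumpsum_step(task, hired, skills_of, C, lam, x_frac, rho, rng):
    """Process one task. Returns (new_hires, outsourced); mutates x_frac,
    which holds the fractional values x~_r kept across tasks."""
    n = len(skills_of)
    covered = set().union(*(skills_of[r] for r in hired))
    missing = set(task) - covered                            # J^T_F (step 2)
    cand = [r for r in skills_of
            if r not in hired and skills_of[r] & missing]    # P^T_F (step 3)
    x_before = {r: x_frac[r] for r in cand}                  # step 4
    f_frac = {r: 0.0 for r in cand}                          # f~_rT starts at 0
    for skill in missing:                                    # step 5
        owners = [r for r in cand if skill in skills_of[r]]
        if not owners:
            raise ValueError("skill cannot be covered by any worker")
        while sum(x_frac[r] + f_frac[r] for r in owners) < 1:
            for r in owners:                                 # multiplicative updates
                x_frac[r] = x_frac[r] * (1 + 1 / C[r]) + 1 / (n * C[r])
                f_frac[r] = f_frac[r] * (1 + 1 / lam[r]) + 1 / (n * lam[r])
    new_hires, outsourced = set(), set()
    for _ in range(rho):                                     # step 8: randomized rounding
        for r in cand:
            if rng.random() < x_frac[r] - x_before[r]:       # probability Delta x~_r
                new_hires.add(r)
            if rng.random() < f_frac[r]:                     # probability f~_rT
                outsourced.add(r)
    for skill in missing:                                    # step 9: cheapest-hire fallback
        if not any(skill in skills_of[r] for r in new_hires | outsourced):
            owners = [r for r in cand if skill in skills_of[r]]
            new_hires.add(min(owners, key=lambda r: C[r]))
    return new_hires, outsourced


def run_lumpsum(tasks, skills_of, C, lam, seed=0):
    """Run the sketch over a task stream and return (total cost, hired set)."""
    rng = random.Random(seed)
    m = len(set().union(*skills_of.values()))
    rho = max(1, math.ceil(math.log(m) + math.log(max(C.values()))))
    hired, cost = set(), 0.0
    x_frac = {r: 0.0 for r in skills_of}
    for task in tasks:
        new_hires, outsourced = lumpsum_step(task, hired, skills_of, C, lam, x_frac, rho, rng)
        cost += sum(C[r] for r in new_hires) + sum(lam[r] for r in outsourced)
        hired |= new_hires                                   # step 10
    return cost, hired
```

Because step 8 is randomized, the returned cost depends on the seed; the fallback in step 9 is what guarantees that every task ends up covered regardless of the random draws.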

The integer and linear programs for TFO.

Given that workers can be hired, then fired and potentially hired again, and so on, we introduce in this new LP the notion of intervals. These intervals are used to model periods in which workers are hired: $\mathcal{I} = \{\{t_a, t_b\} \mid t_a, t_b \in \mathbb{N},\ t_a \le t_b\}$. Intuitively, an interval is a subset of time steps during which an algorithm decides to hire a given worker. The new LP (omitted) uses the following variables:
• $x(r, I)$ with $I \in \mathcal{I}$: $x(r, I) = 1$ if worker $W_r$ is hired during the entire interval $I$; otherwise $x(r, I) = 0$.
• $f_{rt}$: $f_{rt} = 1$ if worker $W_r$ is outsourced for performing $J^t$.

It turns out that it is hard to design an approximation algorithm with proven guarantees using this program, mostly because it is hard to keep track of the costs being paid for every worker when the intervals of him/her being hired, outsourced, or idle are of variable length. Therefore, we resort to a different overall strategy: First, we define the Alt-TFO problem, in which the solutions are restricted such that every worker is hired for fixed-length (worker-specific) intervals (Section 4.1). Then, we design an algorithm for Alt-TFO with good competitive ratio (Section 4.2). Finally, we prove that a solution to Alt-TFO can be transformed to a solution for TFO, and that any solution of TFO can be transformed to a feasible solution of Alt-TFO whose cost is at most a factor of 3 higher (Section 4.3), obtaining an approximation algorithm for TFO.

Alt-TFO Problem

The difference between Alt-TFO and TFO is that we restrict the solutions of the former to have a specific structure: whenever worker $W_r$ is hired, s/he is then fired after $\eta_r \triangleq \lceil C_r / \sigma_r \rceil$ time units, independently of whether s/he is used or not in tasks within these $\eta_r$ time units.

In this case, every worker $W_r$ is associated with a new hiring cost $\hat{C}_r$, which is the sum of his/her original hiring cost $C_r$ and the salaries paid to him/her for the $\eta_r$ time units s/he is hired. Thus, the total hiring cost and salary for an entire interval is

\[ C_r + \eta_r \cdot \sigma_r \le C_r + \left( \frac{C_r}{\sigma_r} + 1 \right) \cdot \sigma_r \le 3 C_r . \]

We will use $\hat{C}_r \triangleq 3 \cdot C_r$.

We can now write the LP for Alt-TFO. In addition to the notation we discussed in the previous paragraph, we use $I_t \in \mathcal{I}$ to denote the interval that starts at time $t$. Worker $W_r$ has $x(r, I) = 1$ if s/he is hired during the entire interval $I$. All intervals $I$ for which $x(r, I) = 1$ have length $\eta_r$.

Linear program for Alt-TFO:

\[ \min \sum_{r=1}^{n} \left[ \sum_{I \in \mathcal{I}} \hat{C}_r \, x(r, I) + \sum_{t=1}^{T} \lambda_r f_{rt} \right] \]

subject to:

\[ \forall t = 1 \ldots T,\ \ell \in J^t: \quad \sum_{W_r \in P_\ell} \left( f_{rt} + \sum_{I \in \mathcal{I}:\, t \in I} x(r, I) \right) \ge 1 \]

\[ \forall t = 1 \ldots T,\ r = 1 \ldots n,\ I \in \mathcal{I}: \quad x(r, I), f_{rt} \ge 0 \]

Alt-TFO Problem

In this section, we design and analyze an algorithm for the Alt-TFO problem. The similarity between the LPs for Alt-TFO and LumpSum (Section 3) translates into a similarity in the algorithms (and their analysis) of the two problems. The key difference now is that we need to take care of the firings.

Our algorithm for Alt-TFO differs from the algorithm for LumpSum in steps 1, 5, 8, and 9, which are changed as follows:

1'. Let $\mathcal{F}^T$ and $\mathcal{H}^T$ represent the workers who are not hired and hired, respectively, at the time that $J^T$ arrives. Clearly, when the first task arrives ($T = 1$), $\mathcal{F}^T = \mathcal{W}$ and $\mathcal{H}^T = \emptyset$. For $T > 1$, the values of $\mathcal{H}^T$ and $\mathcal{F}^T$ are updated in the last step (step 10) of the previous round and then we remove workers whose hiring interval finished in the previous step:
     $\mathcal{F}' \leftarrow \{ r \in \mathcal{W};\ x(r, I_{T - \eta_r}) = 1 \}$
     $\mathcal{H}^T \leftarrow \mathcal{H}^T \setminus \mathcal{F}'$, $\mathcal{F}^T \leftarrow \mathcal{W} \setminus \mathcal{H}^T$
     for each $W_r \in \mathcal{F}'$: set $\tilde{x}_r \leftarrow 0$
5'. for each skill $\ell \in J^T_F$:
     while $\sum_{r \in P_\ell} \left( \tilde{x}_r + \tilde{f}_{rT} \right) < 1$:
       for each $r \in P_\ell$: $\tilde{x}_r \leftarrow \tilde{x}_r \left( 1 + \frac{1}{\hat{C}_r} \right) + \frac{1}{n \hat{C}_r}$
       for each $r \in P_\ell$: $\tilde{f}_{rT} \leftarrow \tilde{f}_{rT} \left( 1 + \frac{1}{\lambda_r} \right) + \frac{1}{n \lambda_r}$
8'. repeat $\rho(T)$ times:
     for each $r \in P^T_F$:
       with probability $\Delta \tilde{x}_r$: hire worker $W_r$ (set $x(r, I_T) \leftarrow 1$, $\mathcal{H}' \leftarrow \mathcal{H}' \cup \{r\}$)
       with probability $\tilde{f}_{rT}$: outsource worker $W_r$ (set $f_{rT} \leftarrow 1$)
9'. for each skill $\ell \in J^T_F$:
     if skill $\ell$ is not covered: outsource worker $W_r$, $r \in P^F_\ell$, with minimum cost $\lambda_r$ (set $f_{rT} \leftarrow 1$)

We set $\rho(T) = \ln m + \ln \lambda^* + \ln T$, where $\lambda^* = \max_{W_r \in \mathcal{W}} \lambda_r$.

Analysis of Alt-TFO.

Algorithm Alt-TFO gives a solution with proven theoretical guarantees for Alt-TFO. As before, the multiplicative update is needed to obtain this competitive ratio. We have the following theorem (proof omitted due to space constraints).

Theorem 4.1. Alt-TFO is an $O(\log n (\log m + \log \lambda^* + \log T^*))$-competitive algorithm for the Alt-TFO problem.
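The fixed-length hiring intervals of Alt-TFO can be illustrated with a few helper functions in Python. The names and the bookkeeping through a `hired_since` dictionary are hypothetical; the sketch only shows how $\eta_r$, the cost of one full interval, and the removal of expired hires in step 1' relate to each other. Note that the bound of $3 C_r$ on the interval cost holds when $\sigma_r \le C_r$, which the derivation of $\hat{C}_r$ appears to rely on.

```python
import math


def interval_length(C_r, sigma_r):
    """eta_r = ceil(C_r / sigma_r): the number of time units a worker stays
    hired in Alt-TFO. Assumes sigma_r > 0."""
    return math.ceil(C_r / sigma_r)


def interval_cost(C_r, sigma_r):
    """Hiring fee plus the salaries paid over one full hiring interval."""
    return C_r + interval_length(C_r, sigma_r) * sigma_r


def expire_hires(T, hired_since, eta):
    """Sketch of step 1': workers whose interval started at time T - eta_r
    finished it in the previous step and are removed from the hired set.
    `hired_since[r]` is the start time of r's current interval."""
    fired = {r for r, start in hired_since.items() if start == T - eta[r]}
    for r in fired:
        del hired_since[r]
    return fired
```

For example, with $C_r = 10$ and $\sigma_r = 3$ the worker is kept for $\eta_r = 4$ time units at a total cost of 22, below $3 C_r = 30$; if s/he was hired at time 1, s/he serves tasks 1 to 4 and is removed when task 5 arrives.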

TFO Using Alt-TFO

Note that any solution output by Alt-TFO can be transformed into a feasible solution to the original TFO problem by setting $g_{rt} \leftarrow 1$ for every $r$ and $t \in I$ for which $x(r, I) = 1$, and $g_{rt} \leftarrow 0$ otherwise. We call the algorithm that runs Alt-TFO and subsequently performs this transformation as its final step the TFO algorithm.

The question is whether TFO provides a solution with bounded competitive ratio for the TFO problem. We answer this question affirmatively by showing (1) that the solution of TFO for the TFO problem is feasible and has a cost bounded by the cost of Alt-TFO for the Alt-TFO problem, and (2) that any solution for the TFO problem can be turned into a feasible solution to the Alt-TFO problem at the expense of a small loss in the approximation factor. These two facts suffice to prove that the solution produced by TFO is a good solution for the TFO problem. We have the following result (proof omitted due to space constraints):

Theorem 4.2. TFO is an $O(\log n (\log m + \log \lambda^* + \log T^*))$-competitive algorithm for the TFO problem.

Running time. Similarly to Section 3, the expected time required to process task $J^T$ is $O\left(n\left(\log C^* \left|J^T\right| + \log m + \log T\right)\right)$.

Lower bound. Note that there is little hope for significant improvement of our theoretical results. In particular, Alon et al. [1] have proven a lower bound of $\Omega\left(\frac{\log n \log m}{\log\log n + \log\log m}\right)$ on the competitiveness of any deterministic algorithm for the unweighted online set cover problem. The unweighted online set cover problem is a special case of TFO (and of LumpSum) where for each worker $W_r$ we have $C_r = \lambda_r = \sigma_r = 0$, and for each task $J^T$ we have $J^T = J^{T-1} \cup \{\ell\}$, for some skill $\ell \in S \setminus J^{T-1}$ (with $J^0 = \emptyset$).

TFO-Heuristic

Similarly to LumpSum, we also consider the heuristic TFO-Heuristic, which is a generalization of LumpSum-Heuristic for general values of $\sigma_r$. Specifically, the difference is that worker $W_r$ is hired when $\delta_r \ge C_r + \eta_r \cdot \sigma_r$, and is fired after $\eta_r$ tasks (see Sections 3.1 and 4.1 for definitions of $\delta_r$ and $\eta_r$). Note that theoretically TFO-Heuristic may perform arbitrarily badly: the example of Section 3.1 holds for TFO-Heuristic for small $\sigma_r$. Yet, in Section 5 we observe that even though it does not offer the theoretical guarantees of TFO, it performs well in practice.
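The hire-and-fire rule of TFO-Heuristic can be sketched for a single worker. The definitions of $\delta_r$ and $\eta_r$ live in Sections 3.1 and 4.1, which are not reproduced here, so this sketch rests on two labeled assumptions: $\delta_r$ is taken to be the outsourcing cost paid for the worker since they were last fired, and $\eta_r$ is taken as a given number of tasks. The class and function names are hypothetical, and the choice to hire right after the outsourcing payment that crosses the threshold is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    hire_cost: float        # C_r
    salary: float           # sigma_r, paid per task while hired
    outsource_cost: float   # lambda_r, paid each time the worker is outsourced
    eta: int                # eta_r, tasks a hired worker stays on the team (assumed given)
    delta: float = 0.0      # delta_r (assumed: outsourcing spend since last firing)
    remaining: int = 0      # tasks left in the current hiring period (0 = not hired)


def process_task(w, needed):
    """Return the cost paid for worker w on one task; `needed` says whether w is used."""
    if w.remaining > 0:
        # Hired: the salary is due whether or not the worker is used.
        w.remaining -= 1
        if w.remaining == 0:
            w.delta = 0.0   # fired after eta tasks; the counter starts over
        return w.salary
    if not needed:
        return 0.0
    # Not hired but needed: outsource and record the spend.
    w.delta += w.outsource_cost
    cost = w.outsource_cost
    if w.delta >= w.hire_cost + w.eta * w.salary:
        w.remaining = w.eta  # threshold reached: hire for the next eta tasks
        cost += w.hire_cost
    return cost


w = WorkerState(hire_cost=4.0, salary=1.0, outsource_cost=3.0, eta=2)
costs = [process_task(w, needed=True) for _ in range(5)]
# costs == [3.0, 7.0, 1.0, 1.0, 3.0]
```

The run shows the ski-rental flavor of the rule: two outsourcing payments reach the threshold $C_r + \eta_r \sigma_r = 6$, the worker is hired for two tasks at salary cost, and the cycle restarts.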

Table 2: Characteristics of the three source datasets used to generate workloads for our experiments. Numbers in italics correspond to tasks generated for the UpWork dataset, as explained in Section 5.1.

Dataset                          UpWork    Freelancer    Guru
Skills (m)                       2,335     175           1,639
Workers (n)                      18,000    1,211         6,119
Tasks (T)                                  992           3,194
... distinct                               600           2,939
... avg. similarity (Jaccard)

TFO-Adaptive algorithm

As we will see in Section 5, although TFO gives theoretical guarantees for the worst-case performance, in practice some of our other algorithms for the TFO problem may perform better under some input parameters. Given the low running time of all our solution approaches to TFO, we implemented the TFO-Adaptive algorithm. This algorithm runs in parallel all the presented methods for solving the TFO problem (TFO, TFO-Heuristic, Always-Outsource, and Always-Hire), and selects at each time the current minimum-cost algorithm to apply to solve the current task, switching between algorithms when it is advantageous. The asymptotic worst-case results hold for the TFO-Adaptive algorithm as well. Furthermore, our experiments (see Section 5) show that it is beneficial to change the hiring policy even if we pay switching costs.
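The selection idea behind TFO-Adaptive can be sketched under strong simplifications. The text does not specify the exact selection rule or how switching costs are computed, so everything below is an assumption: each policy is represented only by the per-task costs it would pay when run on its own, the policy followed for a task is the one with the lowest cumulative cost on earlier tasks, and a hypothetical flat `switch_cost` is charged on every change. A real switch would also have to reconcile the differing sets of hired workers, which this sketch ignores.

```python
def adaptive_cost(per_task_costs, switch_cost):
    """Follow, for each task, the policy with the lowest cumulative cost so far.

    per_task_costs -- dict: policy name -> list of costs, one per task
    switch_cost    -- hypothetical flat charge paid when the followed policy changes
    Returns (total cost paid, list of policies followed per task).
    """
    cumulative = {name: 0.0 for name in per_task_costs}
    num_tasks = len(next(iter(per_task_costs.values())))
    total, followed, current = 0.0, [], None
    for t in range(num_tasks):
        # Decide using only tasks before t; ties go to the first policy listed.
        best = min(cumulative, key=cumulative.get)
        if current is not None and best != current:
            total += switch_cost
        current = best
        total += per_task_costs[current][t]
        followed.append(current)
        for name in cumulative:
            cumulative[name] += per_task_costs[name][t]
    return total, followed


policies = {
    "outsource": [3, 3, 3, 3, 3, 3],   # steady per-task premium
    "hire":      [6, 1, 1, 1, 1, 1],   # upfront hiring cost, then salary
}
total, followed = adaptive_cost(policies, switch_cost=1)
```

In this toy input the sketch follows the outsourcing policy for three tasks, then switches once to hiring; it pays 13 in total, between the 11 of always hiring and the 18 of always outsourcing.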

Our experiments seek to compare the total cost that would be incurred by companies using different algorithms to assign workers to a stream of incoming tasks. We use synthetic datasets representing possible workloads, built using actual task requirements and worker skills from three large online marketplaces. Synthetic data, while having the limitation of not reflecting the particular conditions of a specific company, allow us to evaluate the effectiveness of our algorithms under a broad range of conditions. Section 5.1 introduces our datasets, Section 5.2 presents results on the LumpSum problem, and Section 5.3 on the TFO problem.

We start by introducing our datasets and discussing our choice of cost parameters for experimentation.

Source datasets. To create a large pool of tasks from which to sample workloads, we use datasets obtained from three large online marketplaces for outsourcing: UpWork, Freelancer, and Guru (the authors are not associated with any of these services). All three are in the top 30 by traffic in their category (“consulting marketplaces”) according to data from Alexa (Feb. 2018); indeed, Freelancer and Guru are number 1 and number 3, respectively. General statistics of these datasets are shown in Table 2.

Worker skills. The input data that we obtained contain anonymized profiles for people registered as freelancers in these marketplaces. These include their self-declared sets of skills, as well as the average rate that they charge for their services. There is a large variation in the number of skills per worker among datasets, as can be seen in Table 2. Data have been cleaned to remove skills that were not possessed by any worker and skills that were never required by any task. The numbers in Table 2 refer to the clean datasets.

[Figure 1: six plots of Cost versus Tasks. Panels: (a) LumpSum, UpWork; (b) TFO, UpWork; (c) LumpSum, Freelancer; (d) TFO, Freelancer; (e) LumpSum, Guru; (f) TFO, Guru. Curves in the LumpSum panels: Always Outsource, LumpSum heuristic, LumpSum, Always Hire. Curves in the TFO panels: Always Hire, Always Outsource, TFO, TFO adaptive, TFO heuristic.]

Figure 1: Experimental comparison of algorithms showing total cost due to outsourcing, hiring, and paying salaries as a function of the number of tasks in the input, averaged over 100 workloads generated with p = . Left: Algorithms for problem LumpSum. As expected, Always-Hire has the smallest cost if the number of tasks is large; however, an online algorithm does not know the number of tasks. Our online algorithm and its heuristic version (LumpSum-Heuristic) show a cost that does not exceed twice that of Always-Hire. In contrast, Always-Outsource has cost proportional to the number of tasks. Parameters C_r = λ_r and T = . Right: Algorithms for problem TFO. Our online algorithm, its heuristic version (TFO-Heuristic), and TFO-Adaptive have smaller cost than Always-Outsource and Always-Hire. The latter diverges rapidly due to salary costs. Parameters C_r = λ_r, σ_r = . λ_r and T = .

Tasks. For both Freelancer and Guru we have access to a large sample of tasks commissioned by buyers in the marketplace; they are included as tasks in Table 2. They correspond to actual tasks brought to these marketplaces by actual users. These samples are anonymized: we do not know the name of the company commissioning them, and there are no timestamps in this data. In the case of UpWork, we generate synthetic tasks following a data-generation procedure used in previous work [4]: we remove a small number of workers (10%), who are excluded from the pool of workers in the dataset, and then repeatedly sample subsets of them to create tasks, by interpreting the union of their skills as task requirements.

Workloads. Marketplaces for online work cover a broad range of tasks, from graphic design and web development to accounting, administrative assistance, and legal consulting. Except for huge conglomerates, most firms will not outsource work across all categories at the same time. The workload-generation process that we use has a single parameter p, which we call the coherence parameter of the workload, and works as follows. First, we start with a random task, which we select as pivot. To select the next task, with probability $1/p$ we select a random task from the pool of distinct tasks in the dataset and make this task the new pivot, and with probability $1 - 1/p$ we select another task with Jaccard similarity at least 0 . p . Each workload stream that we create has 10K tasks. We also experimented with streams of up to 100K tasks, but we observed that 10K tasks suffice to expose the trends of the algorithms that we compare. We believe that in general a large value of p is realistic for a company, as customers would probably procure from it services exhibiting a certain coherence; we also evaluate our algorithms for a broad range of values for p.

For each dataset and for each coherence parameter that we use, we generated 100 workload streams; the costs that we report in our experiments are averages over these 100 workloads.

Cost parameters. We have data about the rates charged by workers in each marketplace, which we directly interpret as their outsourcing costs $\lambda_r$. However, we do not have their hiring or salary costs, so we experiment with different values for these costs.

For hiring costs, which are characterized by $C_r > \lambda_r$, we assume they are a multiplicative factor larger than the outsourcing cost, $C_r = \alpha_r \lambda_r$. We performed extensive experiments in which $C_r$ varied between $1\lambda_r$ and $30\lambda_r$, either as a fixed value, or setting $\alpha_r$ to be a random variable distributed uniformly in a small range.

For salary costs, we assume that they are a fraction of outsourcing costs, experimenting with values from $\sigma_r = \lambda_r / 100$ to σ_r = λ_r / . Salary costs $\sigma_r$ are smaller than outsourcing costs $\lambda_r$ because the latter include many costs that a company incurs when outsourcing [6], including: (i) outside-hired consultants are usually more highly paid per hour/day than regular employees of a company, (ii) there are transaction costs involved in locating and contracting an outsourced worker that do not exist for regular employees, and (iii) there are communication and management costs of handling someone external to a company.

LumpSum

Baselines. We consider two baselines. The Always-Hire baseline solves the SetCover problem for finding a low-cost set of workers that cover the task's uncovered skills and hires them. The Always-Outsource baseline never hires; instead, it outsources to workers that cover the required skills for the task, by solving a SetCover problem instance.

Results. Figure 1 (Left) summarizes our results for LumpSum for workloads generated with the UpWork, Freelancer, and Guru datasets, depicting total cost as a function of the number of tasks.

We observe that under all these workloads the algorithms behave similarly. Always-Outsource has cost proportional to the number of tasks and is not competitive; its cost is mostly outside the range of Figure 1 (Left). As expected, Always-Hire performs the best in the long run, because if the number of tasks is large, hiring is a dominant strategy; however, the online algorithm does not know the number of tasks. Experimentally, the LumpSum algorithm has a cost that does not exceed that of Always-Hire by more than a factor of 2, across all the scenarios that we tested. We note that for short sequences LumpSum has lower cost; its cost can sometimes be an order of magnitude smaller (plots omitted for brevity). We also note that although LumpSum-Heuristic can, theoretically, perform arbitrarily badly, in our experiments it performs quite well, although worse than the theoretically justified LumpSum.

Variations (plots omitted for brevity). Figure 1 (Left) is obtained with C_r = λ_r. We do not observe dramatic variations in the results when varying this parameter in the studied range ($1\lambda_r$ through $30\lambda_r$): LumpSum has a smaller cost than Always-Outsource. In general, higher hiring costs mean that the number of tasks required before hiring a worker is larger, the costs of LumpSum and Always-Hire are higher, and the advantage of LumpSum over Always-Hire for a small number of tasks holds for a longer period of time.

In all plots of Figure 1 we use coherence parameter p = p = LumpSum is still better than Always-Outsource.

TFO

Baselines. As in LumpSum, we consider baselines Always-Hire and Always-Outsource. Additionally, we consider TFO-Heuristic (defined in Section 4.4), which does not have a theoretical guarantee.
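Both baselines reduce to a SetCover instance over the skills of the current task. The text does not say which SetCover solver is used, so the following is only a minimal sketch assuming the standard greedy heuristic for weighted set cover, which repeatedly picks the worker with the lowest cost per newly covered skill. The function name, the sample workers, and their costs are hypothetical.

```python
def greedy_set_cover(required_skills, worker_skills, worker_cost):
    """Greedy weighted set cover over workers.

    required_skills -- set of skills that still need to be covered
    worker_skills   -- dict: worker id -> set of skills the worker possesses
    worker_cost     -- dict: worker id -> non-negative cost of using the worker
    Returns the list of chosen worker ids, in the order they were picked.
    Raises ValueError if some required skill is possessed by no worker.
    """
    uncovered = set(required_skills)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for worker, skills in worker_skills.items():
            gain = len(skills & uncovered)
            if gain == 0:
                continue
            ratio = worker_cost[worker] / gain  # cost per newly covered skill
            if ratio < best_ratio:
                best, best_ratio = worker, ratio
        if best is None:
            raise ValueError(f"no worker covers: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= worker_skills[best]
    return chosen


skills = {"ann": {"java", "sql"}, "bob": {"java"}, "eve": {"sql", "design"}}
cost = {"ann": 6.0, "bob": 2.0, "eve": 5.0}
team = greedy_set_cover({"java", "sql", "design"}, skills, cost)
```

Passing the outsourcing costs $\lambda_r$ as `worker_cost` would correspond to an Always-Outsource step; for Always-Hire, `required_skills` would be restricted to the skills not covered by the already hired team. Which cost the authors feed into SetCover for Always-Hire is not stated in this section.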

Results. Figure 1 (Right) summarizes our results for TFO. We observe that TFO, TFO-Heuristic, and TFO-Adaptive have the smallest total cost, followed by Always-Outsource. In contrast, the Always-Hire strategy has much higher cost due to mounting salary costs. We also observe that while TFO-Heuristic does not offer the theoretical guarantees of TFO, it performs well in practice.

Variations. Similarly to LumpSum, varying $C_r$ does not bring dramatic changes, but as $C_r$ increases while workload coherence and the salary-to-outsource cost ratio are kept constant, the advantage of TFO over Always-Outsource decreases, and for large hiring costs Always-Outsource has the smallest cost (plots omitted for brevity). Concretely, for $p = 100$ and $\sigma_r = \lambda_r / 10$, if we vary the hiring cost $C_r$ (from $1\lambda_r$ to $30\lambda_r$), the total cost of TFO remains less than or equal to the total cost of Always-Outsource until C_r = λ_r, when the cost of TFO becomes larger than the cost of Always-Outsource for the workload generated using the Guru dataset. The corresponding values of $C_r$ for workloads generated with Freelancer and UpWork data are C_r = λ_r and C_r = λ_r, respectively. As expected, if the hiring costs are sufficiently large, Always-Outsource becomes a dominant strategy.

[Figure 2: six heat maps with axes Coherence (p) and Salary-to-Outsource Ratio. Panels: (a) UpWork: TFO vs. Always-Outsource; (b) UpWork: TFO-Adaptive vs. Always-Outsource; (c) Freelancer: TFO vs. Always-Outsource; (d) Freelancer: TFO-Adaptive vs. Always-Outsource; (e) Guru: TFO vs. Always-Outsource; (f) Guru: TFO-Adaptive vs. Always-Outsource.]

Figure 2: (Best seen in color.) Left: ratio of the cost achieved by TFO and Always-Outsource. Right: ratio of the cost achieved by TFO-Adaptive and Always-Outsource. Coherence parameter p varies from 20 to 200; salary-to-outsource ratio varies from 0.02 to 0.24; the number of tasks is 10K. Colors represent the ratio of costs: blue (dominant towards the bottom-left) indicates the region where our algorithms TFO and TFO-Adaptive have smaller cost, while red indicates the region where the baseline Always-Outsource has smaller cost. In the white region both algorithms have similar costs.

Figure 2 (Left) compares TFO and Always-Outsource experimentally by varying the coherence parameter p from 20 to 200 and $\sigma_r$ from $\lambda_r / 50$ to $\lambda_r / 4$. We observe that less coherent workloads and high salaries make hiring more expensive; Always-Outsource then becomes a dominant strategy. Figure 2 (Right) shows the power of the TFO-Adaptive algorithm. We observe that it performs as well as or better than Always-Outsource over the whole range of parameters.
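To make the role of the coherence parameter p concrete, the workload-generation process of Section 5.1 can be sketched as follows. This is a minimal sketch, not the released code: the Jaccard threshold is exposed as the hypothetical parameter `min_similarity`, similarity is assumed to be measured against the current pivot, and repeating the pivot when no similar task exists is an assumption made so that the generator never stalls.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two skill sets (taken as 1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def generate_workload(tasks, p, length, min_similarity, rng):
    """Sample a task stream with coherence parameter p (p >= 1).

    tasks          -- list of distinct tasks, each a frozenset of skills
    min_similarity -- Jaccard threshold for "similar" tasks (assumed parameter)
    rng            -- random.Random instance, so that runs are reproducible
    """
    pivot = rng.choice(tasks)
    stream = [pivot]
    while len(stream) < length:
        if rng.random() < 1 / p:
            # With probability 1/p: jump to a random task and make it the pivot.
            pivot = rng.choice(tasks)
            stream.append(pivot)
        else:
            # Otherwise: pick another task that is similar to the pivot.
            similar = [t for t in tasks
                       if t != pivot and jaccard(t, pivot) >= min_similarity]
            # Assumption: if the pivot has no similar task, repeat the pivot.
            stream.append(rng.choice(similar) if similar else pivot)
    return stream


tasks = [frozenset("ab"), frozenset("abc"), frozenset("xy")]
stream = generate_workload(tasks, p=100, length=50,
                           min_similarity=0.5, rng=random.Random(0))
```

A larger p makes jumps to a new pivot rarer, so consecutive tasks share more skills and a hired worker is more likely to be reused, which is consistent with the observation above that hiring pays off on more coherent workloads.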

Performance. Our code, which will be released with this paper, is a relatively straightforward mapping of the algorithm to simple counters. Written in Java, it requires about 5 to 8 seconds on average to process 10K incoming tasks using commodity hardware. We remark that, although our formulation is a linear program, the method does not involve solving the linear program; instead, we obtain the solution using the specific primal–dual method that we have described and analyzed.

To the best of our knowledge, we are the first to introduce and solve the Team Formation with Outsourcing (TFO) problem. However, our work is related to existing work on crowdsourcing, team formation, and the design of online algorithms, which we outline next.

Crowdsourcing. Among the extensive literature on crowdsourcing, the most related to ours is the work of Ho and Vaughan [13]. Their goal is to assign individual workers to tasks, based on the workers' skills. Although Ho and Vaughan also deploy the primal–dual technique to solve the task-assignment problem, the tasks they consider can be performed by individual workers and not by teams. Thus, both their problem and their algorithm are different from ours.

Team formation. A large body of work in team formation considers the following problem: given a social or a collaboration network among the workers and a set of skills that need to be covered, select a team of experts that can collectively cover all the required skills, while minimizing the communication cost between the team members [2, 4, 10, 11, 16, 18, 19, 27]. Other variants of this problem have also considered optimizing the cost of recruiting promising candidates for a set of pre-defined tasks in an offline fashion [12] and minimizing the workload assigned to each individual team member [3, 21].

Although the concept of set cover is common between our work and previous work, the framework we propose in this paper is different in multiple dimensions. First, we do not focus on optimizing the communication cost; in fact, we do not assume any network among the individual workers. Our goal is to minimize the overall cost paid in hiring, outsourcing, and salaries. This difference in the objectives leads to different (and new) optimization problems that we need to solve. Second, most of the work above focuses on the offline version of the team-formation problem, where the tasks to be completed are known a priori to the algorithm. The exception is the work of Anagnostopoulos et al. [3, 4]. However, in their setting they aim to distribute the workload as evenly as possible among the workers, while our objective is to minimize the overall cost of maintaining a team that can complete the arriving tasks. Moreover, the option of outsourcing that we propose is new with respect to the team formation literature. Finally, in the design of our online algorithms we use the primal–dual framework, which was not the case for previous work on online team formation.

Primal–dual algorithms for online problems. The algorithms we design for our problems use the primal–dual technique. A thorough analysis of the applicability of this technique to online problems can be found in the book by Buchbinder and Naor [8] and in [5]. Probably the problems most closely related to ours are the ski-rental and the set cover problems. We have already discussed the connection of TFO to ski rental and set cover in Section 2. One can also draw an analogy with caching: bringing a page to the main memory is analogous to hiring a person. The main differences are that in the typical caching problem we do not have covering constraints, there are no recurring costs for keeping pages in the cache, and there is a fixed limit on the number of pages we can insert in the cache.

In this paper, we introduced and studied Team Formation with Outsourcing. We showed that hiring, firing, and outsourcing decisions can be taken by an online algorithm, leading to cost savings with respect to alternatives. These cost savings are more striking when (1) the hiring and salary costs are low, because then hiring becomes an attractive option; (2) the tasks exhibit high coherence, i.e., consecutive tasks are similar to each other; and (3) the time horizon is long enough that we can find a core pool of workers to stay hired and satisfy a large fraction of the skills required by incoming tasks.

Technically, the problems we have analyzed in this paper involve embedding a set-cover problem in an online algorithm. Our main algorithms (LumpSum, TFO) are able to give results that are competitive in practice and, equally importantly, theoretically close to the best one can hope for. The design of our algorithms is based on the online primal–dual technique; we provide experimental evidence of the effectiveness of this method even for a complex real-world problem. Furthermore, we present two heuristics which, although not competitive in theory, perform well in practice. Future work may extend this by considering worker compatibility [4, 18], learning of new skills by hired workers, or other extensions.

Future work. As with most problems, we can introduce further elements to obtain even more generality. For instance, the algorithms we have described assume that one and only one task arrives per unit of time, but they can be extended trivially to cases in which task arrivals occur at arbitrary times.

As we noted in Section 6, there are also parallels with scenarios of caching and paging. Extending TFO to the case in which the number of hired workers is limited turned out to be a challenging combination of set cover, weighted caching, and ski rental. We have begun to study these problems; our preliminary results show that we can achieve an $O(\log k \log m)$ approximation, in which k is the maximum size of the worker pool. A more natural constraint could be that, for instance, the total cost paid per unit of time cannot exceed a certain budget, which would represent a cap on weekly or monthly personnel expenses. Another element we could incorporate is the possibility of not handling a task, but instead paying a penalty when a task is too difficult to handle with current workers and it is expensive to replace the worker pool with new workers. Other variants can include workers with different ability levels. We plan to study some of these variants in future work.

Additionally, we note that all the algorithms we have presented in this paper are deterministic. Just as randomized algorithms for paging can be defined in the primal–dual framework [8], it is of interest to introduce other update rules for the primal variables that allow us to describe a randomized algorithm.

Reproducibility.

The code and data of this paper can be found at https://github.com/adrianfaz/Algorithms-for-Hiring-and-Outsourcing-in-the-Online-Labor-Market.

REFERENCES

[1] Noga Alon, Baruch Awerbuch, Yossi Azar, Niv Buchbinder, and Joseph Naor. 2009. The Online Set Cover Problem. SIAM J. Comput. 39, 2 (2009), 361–370.
[2] Aijun An, Mehdi Kargar, and Morteza ZiHayat. 2013. Finding Affordable and Collaborative Teams from a Network of Experts. In SDM. 587–595.
[3] Aris Anagnostopoulos, Luca Becchetti, Carlos Castillo, Aristides Gionis, and Stefano Leonardi. 2010. Power in unity: forming teams in large-scale community systems. In ACM CIKM. 599–608.
[4] Aris Anagnostopoulos, Luca Becchetti, Carlos Castillo, Aristides Gionis, and Stefano Leonardi. 2012. Online team formation in social networks. In WWW. 839–848.
[5] Nikhil Bansal. 2013. The Primal-Dual Approach for Online Algorithms. In Approximation and Online Algorithms, Thomas Erlebach and Giuseppe Persiano (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 1–1.
[6] Jerome Barthelemy. 2001. The hidden costs of IT outsourcing. MIT Sloan Management Review 42, 3 (2001), 60.
[7] Niv Buchbinder, Kamal Jain, and Joseph Seffi Naor. 2007. Online Primal-dual Algorithms for Maximizing Ad-auctions Revenue. In Proceedings of the 15th Annual European Conference on Algorithms (ESA'07). Springer-Verlag, Berlin, Heidelberg, 253–264. http://dl.acm.org/citation.cfm?id=1778580.1778606
[8] Niv Buchbinder and Joseph Naor. 2009. The design of competitive online algorithms via a primal-dual approach. Foundations and Trends in Theoretical Computer Science 3, 2–3 (2009), 93–263.
[9] Nikhil R. Devanur and Thomas P. Hayes. 2009. The Adwords Problem: Online Keyword Matching with Budgeted Bidders Under Random Permutations. In Proceedings of the 10th ACM Conference on Electronic Commerce (EC '09). ACM, New York, NY, USA, 71–78. https://doi.org/10.1145/1566374.1566384
[10] Christoph Dorn and Schahram Dustdar. 2010. Composing Near-Optimal Expert Teams: A Trade-Off between Skills and Connectivity. In OTM Conferences (1). 472–489.
[11] Amita Gajewar and Atish Das Sarma. 2012. Multi-skill Collaborative Teams based on Densest Subgraphs. In SDM. 165–176.
[12] Behzad Golshan, Theodoros Lappas, and Evimaria Terzi. 2014. Profit-maximizing cluster hires. In ACM SIGKDD. 1196–1205.
[13] Chien-Ju Ho and Jennifer Wortman Vaughan. 2012. Online Task Assignment in Crowdsourcing Markets. In AAAI, Vol. 12. 45–51.
[14] J. Howe. 2006. The rise of crowdsourcing. WIRED (June) (2006).
[15] L.B. Jeppesen and K.R. Lakhani. 2010. Marginality and problem-solving effectiveness in broadcast search. Organization Science 21, 5 (2010), 1016–1033.
[16] Mehdi Kargar and Aijun An. 2011. Discovering top-k teams of experts with/without a leader in social networks. In CIKM. 985–994.
[17] A. Kittur, B. Smus, S. Khamkar, and R.E. Kraut. 2011. CrowdForge: Crowdsourcing Complex Work. In Annual ACM Symposium on User Interface Software and Technology. 43–52.
[18] Theodoros Lappas, Kun Liu, and Evimaria Terzi. 2009. Finding a team of experts in social networks. In ACM SIGKDD. 467–476.
[19] Cheng-Te Li and Man-Kwan Shan. 2010. Team Formation for Generalized Tasks in Expertise Social Networks. In SocialCom/PASSAT. 9–16.
[20] Ann Majchrzak and Arvind Malhotra. 2013. Towards an information systems perspective and research agenda on crowdsourcing for innovation. The Journal of Strategic Information Systems 22, 4 (2013), 257–268.
[21] Anirban Majumder, Samik Datta, and K. V. M. Naidu. 2012. Capacitated team formation problem on social networks. In KDD. 1005–1013.
[22] T.W. Malone, R. Laubacher, and C. Dellarocas. 2010. The collective intelligence genome. MIT Sloan Management Review 51, 3 (2010), 21–31.
[23] Mark S Manasse. 2008. Ski Rental Problem. In Encyclopedia of Algorithms. Springer, 849–851.
[24] OECD. 2016. Organization for Economic Cooperation and Development Data on Self-Employment. https://data.oecd.org/emp/self-employment-rate.htm.
[25] D. Retelny, S. Robaszkiewicz, A. To, W.S. Lasecki, and J. Patel. 2014. Expert crowdsourcing with flash teams. In ACM Symposium on User Interface Software and Technology. 75–85.
[26] C. Riedl and A. W. Woolley. 2016. Teams vs. Crowds: Incentives, member ability, and collective intelligence in temporary online team organizations. (2016).
[27] Mauro Sozio and Aristides Gionis. 2010. The Community-search Problem and How to Plan a Successful Cocktail Party. In ACM SIGKDD. 939–948.
[28] J. Surowiecki. 2004. The wisdom of crowds: Why the many are smarter than the few and how collective wisdom shapes business, economies, societies and nations. Anchor Books.
[29] Sushma Un. 2017. The waning days of Indian IT workers being paid to do nothing. Quartz India (2017).
[30] Vijay V Vazirani. 2013.