Algorithms for Hiring and Outsourcing in the Online Labor Market
Aris Anagnostopoulos, Carlos Castillo, Adriano Fazzone, Stefano Leonardi, Evimaria Terzi
Aris Anagnostopoulos
Sapienza University of Rome
Carlos Castillo
Universitat Pompeu Fabra
Adriano Fazzone
Sapienza University of Rome
Stefano Leonardi
Sapienza University of Rome
Evimaria Terzi
Boston University
ABSTRACT
Although freelancing work has grown substantially in recent years, in part facilitated by a number of online labor marketplaces, traditional forms of “in-sourcing” work continue being the dominant form of employment. This means that, at least for the time being, freelancing and salaried employment will continue to co-exist. In this paper, we provide algorithms for outsourcing and hiring workers in a general setting, where workers form a team and contribute different skills to perform a task. We call this model team formation with outsourcing. In our model, tasks arrive in an online fashion: neither the number nor the composition of the tasks is known a priori. At any point in time, there is a team of hired workers who receive a fixed salary independently of the work they perform. This team is dynamic: new members can be hired and existing members can be fired, at some cost. Additionally, some parts of the arriving tasks can be outsourced and thus completed by non-team members, at a premium. Our contribution is an efficient online cost-minimizing algorithm for hiring and firing team members and outsourcing tasks. We present theoretical bounds obtained using a primal–dual scheme proving that our algorithms have a logarithmic competitive approximation ratio. We complement these results with experiments using semi-synthetic datasets based on actual task requirements and worker skills from three large online labor marketplaces.
ACM Reference Format:
Aris Anagnostopoulos, Carlos Castillo, Adriano Fazzone, Stefano Leonardi, and Evimaria Terzi. 2018. Algorithms for Hiring and Outsourcing in the Online Labor Market. In KDD 2018: 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 19–23, 2018, London, United Kingdom. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3219819.3220056
∗ Because of space limitations, some details and proofs have been omitted; they will appear in the full version of this work.
† The research for this work has been supported by the Google Focused Research Award “Algorithms for Large-Scale Data Analysis,” the EU FET project MULTIPLEX 317532, the ERC Advanced Grant 788893 “Algorithmic and Mechanism Design Research for Online MArkets (AMDROMA),” NSF grants CAREER 1253393 and IIS 1421759, and La Caixa project LCF/PR/PR16/11110009.
© 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5552-0/18/08...$15.00
Self-employment is an increasing trend; for instance, between 10% and 20% of workers in developed countries are self-employed [24]. This phenomenon can be partially attributed to business downsizing and employee dissatisfaction, as well as to the existence of online labor markets (e.g., Guru.com, Freelancer.com). This trend has enabled freelancers to work remotely on specialized tasks, and prompted researchers and practitioners to explore the benefits of outsourcing and crowdsourcing [14, 15, 17, 22, 25, 28].

Although crowdsourcing adoption was driven, at least in part, by the assumption that problems can be decomposed into parts that can be addressed separately by independent workers, crowdsourcing results can be improved by allowing some degree of collaboration among them [20, 26]. The idea of combining collaboration with crowdsourcing has led to research on team formation [2–4, 10–12, 16, 18, 19, 21, 27], in which a common thread is the need for complementary skills, and problem settings differ in aspects such as objectives (e.g., load balancing and/or compatibility), constraints (e.g., worker capacity), and algorithmic set up (online or offline).
Overview of problem setting and assumptions. We consider tasks that arrive in an online fashion and must be completed by assigning them to one or more workers, who jointly cover the skills required for each task. At any point in time, there is a team of hired workers who are paid a salary, independently of the work they perform. This team is dynamic: new members can be hired and existing members can be fired. Hiring and firing workers is expensive, which is why companies routinely keep on the payroll skilled workers even if they are temporarily idle; however, they also seek to keep “benching” to a minimum [29]. Outsourcing provides additional flexibility as some parts of the incoming tasks can be completed by non-team members who are outsourced. In practice, outsourcing involves additional costs such as searching, contracting, communicating with, and managing an expert or specialist external to a company [6].

Deciding when to hire, fire, and outsource workers is a difficult online problem with parameters that depend on job market conditions and employment regulations. Intuitively: (1) if the cost of hiring or firing workers is too high, outsourcing becomes preferable to hiring; (2) if the cost of outsourcing work relative to salaries of hired workers is too high, hiring becomes preferable to outsourcing; and (3) if the workload consists of many repetitions of similar tasks, hiring becomes preferable to outsourcing.

In this paper, we formulate this as an online cost minimization problem, which we call Team Formation with Outsourcing (TFO). We formally define this problem in Section 2 and solve it in Sections 3 and 4. Despite this being a model and hence not capturing every aspect of employment decisions in a company, we show how it brings formalism to the intuitions we have outlined, helps understand under which circumstances a combination of hiring and outsourcing can be cost effective, and motivates experimentation on semi-synthetic data allowing us to cover a broad range of cases, as we show in Section 5.
Algorithmic techniques. To the best of our knowledge, we are the first to consider this problem and study some of its variants. Our problem turns out to be an original generalization of online set cover and online ski rental, two of the most paradigmatic online problems. In fact TFO has elements that make it more complex; to solve it, an algorithm has to address its various characteristics: (1) it is also online, so decisions should be taken with limited information on the input, but at each step, an entirely new instance of the set-cover problem needs to be solved by using hired and outsourced workers; (2) hired and outsourced workers collaborate with each other, and this needs to be taken into account; and (3) workers can be hired, fired, hired again, and so on, so one has to keep track of their status at every point in time.

Several natural approaches inspired by online algorithms for the problems we mentioned previously fail to provide solutions with theoretical guarantees. Therefore, we consider an approach introduced in recent years for studying complex online problems, the online primal–dual scheme [8]. The idea is to create a sequence of integer programs to model the online problem by incrementally introducing variables and constraints. We then consider their linear relaxations and their duals to design an online algorithm and we analyze it by comparing the costs of the primal and the dual programs as they evolve over time with the arrival of new tasks. This is a powerful approach, which has so far been applied with success to several classical online problems: packing and covering problems, ski rental, weighted caching, and k-server, among others [5]. We refer to [5, 8] for a survey of the applications of the online primal–dual method.

Our analysis results in polynomial-time algorithms that have logarithmic competitive approximation ratios. This means that despite the fact that our algorithms work in an online fashion and do not have any knowledge of the number and the composition of future tasks, we can guarantee that the cost they incur will be, at every time instance, only a logarithmic factor worse than the cost incurred by an optimal algorithm that knows the set of requests a priori.
Contributions. The key contributions of our work are:
• We formalize TFO: the problem of designing an online cost-minimizing algorithm for hiring, firing and outsourcing.
• We design efficient and effective approximation algorithms for TFO using an online primal–dual scheme, and provide approximation guarantees on their performance.
• We experiment on semi-synthetic data based on actual task requirements and worker skills from three large online labor marketplaces, testing algorithms under a broad range of conditions.
• We provide experimental evidence of the quality of the performance of online primal–dual algorithms for a complex real-world problem. Prior work has performed theoretical analysis mostly for classical or practically motivated online problems [7, 9]. To the best of our knowledge, the empirical validation was previously addressed only for the Adwords matching problem [13]. We demonstrate that such approaches, even though they are based on heavy theoretical machinery, can be easily implemented and are efficient in practice.
In this section, we formally describe our setting and problem, and provide some necessary background.
Skills. We consider a set $S$ of skills with $|S| = m$. Skills can be any kind of qualification a worker can have or a task may require, such as video editing, technical writing, or project management.

Tasks. We consider a set of $T^*$ tasks (or jobs), $\mathcal{J} = \{J^t ; t = 1, 2, \ldots, T^*\}$, which arrive one by one in a streaming fashion; $J^t$ is the $t$th task that arrives. Each task $J \in \mathcal{J}$ requires a set of skills from $S$; therefore, $J \subseteq S$. We use $J^t$ to refer to both the task and the skills that it requires.

Workers. Throughout we assume that we have a set $\mathcal{W}$ of $n$ workers: $\mathcal{W} = \{W_r ; r = 1, \ldots, n\}$. Every worker $r$ possesses a set of skills ($W_r \subseteq S$), and $P_\ell$ denotes the subset of workers possessing a given skill $\ell$: $P_\ell = \{r ; \ell \in W_r\}$. Similarly to the tasks, we use $W_r$ to denote both the worker and his/her skills.

We partition the set of available workers $\mathcal{W}$ into the set of workers who are hired at time $t$, denoted by $\mathcal{H}^t$, and the set of workers who are not hired, denoted by $\mathcal{F}^t$ (we sometimes refer to these workers as freelancers, and they can be outsourced for $J^t$), so that $\mathcal{H}^t \cap \mathcal{F}^t = \emptyset$ and $\mathcal{W} = \mathcal{H}^t \cup \mathcal{F}^t$.

Coverage of tasks. Whenever task $J^t \subseteq S$ arrives, an algorithm has to assign one or more workers to it, i.e., a team. We say that $J^t$ can be completed or covered by a team $\mathcal{Q} \subseteq \mathcal{W}$ if for every skill required by $J^t$, there exists at least one worker in $\mathcal{Q}$ who possesses this skill: $J^t \subseteq \bigcup_{W \in \mathcal{Q}} W$. We assume that for every skill in the incoming task there is at least one worker possessing that skill, so all tasks can be covered.

Costs. Every worker $W_r$ potentially can charge the following nonnegative, worker-specific fees: (1) an outsourcing fee $\lambda_r$, (2) a hiring fee $C_r$, and (3) a salary $\sigma_r$. Outsourcing fees $\lambda_r$ denote the payment required by a (non-hired) worker when a task is outsourced to him/her. Note that $\lambda_r$ depends on the worker but does not depend on the task. Hiring fees $C_r$ reflect all expenses associated with hiring and firing a worker, such as signup bonuses and severance payments. Given that any algorithm commits to pay the firing costs the moment in which it hires a worker, we follow a standard methodology used in online algorithms for caching [8] and account for both hiring and firing costs when the worker is hired. Once a worker $r$ is hired, s/he is paid a recurring salary $\sigma_r$, which recurs for every step $t$ that the worker is hired. The above notation is summarized in Table 1.

Table 1: Notation
Symbol | Meaning
$S$ | Set of skills, size $m$
$\mathcal{J}$ | Set of tasks, size $T^*$
$T$ | Number of tasks till current time
$J^t$ | The $t$th task arriving
$J^t_\ell$ | 1 if task $t$ requires skill $\ell$, 0 otherwise
$\mathcal{W}$ | Set of workers, size $n$
$W^r_\ell$ | 1 if worker $r$ possesses skill $\ell$, 0 otherwise
$P_\ell$ | Subset of workers possessing skill $\ell$
$C_r$ | Hiring fee, paid when worker $r$ is hired
$\lambda_r$ | Outsourcing fee, paid every time $r$ performs a task
$\sigma_r$ | Salary paid to a hired worker $r$

Assumptions. To avoid making the model overly complicated, we assume that the salary periods are defined by the arriving tasks, that is, there is one task per salary period, and task completion takes one salary period. A further assumption will be that $\sigma_r < \lambda_r$, as in practice requesting a single task from an external worker involves extra costs [6], which are reduced when the worker is hired (or when an outsourcing arrangement for an external group of workers to perform a specific recurring task is done, which is different from the individual outsourcing we discuss here). Finally, we assume $\lambda_r < C_r + \sigma_r$, because otherwise workers would be hired and fired for every task.

We now define the problem that we study:
Problem 1 (Team Formation with Outsourcing – TFO). There exists a set of skills $S$. We have a pool of workers $\mathcal{W}$, where each worker $W_r \in \mathcal{W}$ is characterized by a subset of skills $W_r \subseteq S$, an outsourcing cost $\lambda_r \in \mathbb{R}_{\ge 0}$, a hiring cost $C_r \in \mathbb{R}_{\ge 0}$, and a salary cost $\sigma_r \in \mathbb{R}_{\ge 0}$. Given a set of tasks $\mathcal{J} = \{J^1, J^2, \ldots, J^{T^*}\}$, with $J^t \subseteq S$, which arrive in a streaming fashion, the goal is to design an algorithm that, when task $J^t$ arrives, decides which workers to hire (paying cost $C_r + \sigma_r$), keep hired (paying cost $\sigma_r$), and outsource (paying cost $\lambda_r$), such that all the tasks are covered by the workers who are hired or outsourced and the total cost paid over all the tasks is minimized.

TFO is an online problem: $\mathcal{J}$ is revealed one task at a time. Our goal is to guarantee that for any input stream $\mathcal{J}$ the total cost of our online algorithm, $\mathrm{ALG}(\mathcal{J})$, is at most a small factor greater than the total cost of the optimal (offline) algorithm that knows $\mathcal{J}$ in advance, $\mathrm{OPT}(\mathcal{J})$. This factor, $\max_{\mathcal{J}} \mathrm{ALG}(\mathcal{J}) / \mathrm{OPT}(\mathcal{J})$, is called the competitive ratio of the algorithm.

We solve the TFO problem in Section 4. Because neither the algorithm nor its analysis is trivial, we introduce them gradually by first solving a simplified version of TFO, which we describe and solve in Section 3.
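To make the cost model of Problem 1 concrete, the following is a minimal Python sketch of the coverage condition and of the cost accounting (hiring fee on every new hire, salary for every step on the payroll, outsourcing fee per outsourced task). The names `Worker`, `covers`, and `total_cost`, the representation of a solution as per-step index sets, and the two example workers are assumptions made here for illustration; they are not part of the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set


@dataclass(frozen=True)
class Worker:
    skills: FrozenSet[str]
    outsourcing_fee: float  # lambda_r, paid each time the worker is outsourced
    hiring_fee: float       # C_r, paid each time the worker is (re)hired
    salary: float           # sigma_r, paid for every step on the payroll


def covers(team: Sequence[Worker], task: Set[str]) -> bool:
    """True if every skill required by the task is possessed by some team member."""
    covered: Set[str] = set()
    for worker in team:
        covered |= worker.skills
    return task <= covered


def total_cost(workers: List[Worker], tasks: List[Set[str]],
               hired_per_step: List[Set[int]],
               outsourced_per_step: List[Set[int]]) -> float:
    """Cost of a given (hypothetical) solution: one hired set and one outsourced set per task."""
    cost = 0.0
    on_payroll: Set[int] = set()  # workers hired at the previous step
    for task, hired, outsourced in zip(tasks, hired_per_step, outsourced_per_step):
        if not covers([workers[r] for r in hired | outsourced], task):
            raise ValueError("task is not covered by the chosen team")
        for r in hired:
            if r not in on_payroll:
                cost += workers[r].hiring_fee   # new hire: pay C_r
            cost += workers[r].salary           # every hired step: pay sigma_r
        for r in outsourced - hired:
            cost += workers[r].outsourcing_fee  # outsourced for this task: pay lambda_r
        on_payroll = set(hired)
    return cost


# Illustrative instance (values chosen to satisfy sigma_r < lambda_r < C_r + sigma_r).
alice = Worker(frozenset({"a", "b"}), outsourcing_fee=4.0, hiring_fee=5.0, salary=1.0)
bob = Worker(frozenset({"c"}), outsourcing_fee=2.0, hiring_fee=10.0, salary=1.0)
tasks = [{"a"}, {"a", "c"}, {"b"}]
# Keep alice hired for all three tasks and outsource bob for the second one.
example_cost = total_cost([alice, bob], tasks, [{0}, {0}, {0}], [set(), {1}, set()])
```

In this instance the mixed plan costs 5 + 3·1 + 2 = 10, whereas outsourcing everything would cost 4 + (4 + 2) + 4 = 14, which is the kind of trade-off an online algorithm has to make without seeing future tasks.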
Two special cases of TFO are SetCover and SkiRental.

SetCover: the single-task, multiple-skill case. The set cover problem is an instance of our problem when there is a single task $J \subseteq S$ and for each worker $W_r$, $C_r = \infty$. Then, as soon as the task $J$ arrives, the algorithm needs to cover all skills in $J$ by selecting a set of workers $\mathcal{Q} \subseteq \mathcal{W}$ such that $\mathcal{Q}$ covers $J$ and $\sum_{r \in \mathcal{Q}} \lambda_r$ is minimized. In this case, our problem can be solved using the greedy algorithm for the set-cover problem (see [30, Chapter 2]).

SkiRental: the single-skill, single-worker case. The ski rental problem is an instance of our problem when the sequence of tasks $\mathcal{J}$ consists of a repetition of the same single-skill task $J$ and the workforce $\mathcal{W}$ consists of a single worker $W_r$ who possesses the same one skill, and has $\sigma_r = 0$, with hiring fee $C_r$ and outsourcing fee $\lambda_r$. In this ski-rental version of our problem [23], the question is the following: without knowledge of the total number of tasks that will arrive, when should worker $W_r$ be hired so that the total cost paid to him/her in outsourcing plus hiring fees is minimized?

A well-known algorithm for this problem is the following: for every instance of $J^t$ that arrives, outsource $J^t$ to worker $W_r$ as long as $\sum_{t'=1}^{t} \lambda_r < C_r$. Then, hire the worker when $\sum_{t'=1}^{t} \lambda_r \ge C_r$. The above algorithm achieves a competitive ratio of 2.

LUMPSUM PROBLEM
First, we solve a simplified version of the TFO problem, where for every worker $W_r$ the salary is equal to 0 ($\sigma_r = 0$). In LumpSum, a hired worker $W_r$ is paid a lump sum of $C_r$ the moment s/he is hired and this amount is assumed to cover all future work done by the worker. Instead, when a worker $W_r$ is outsourced, a payment of $\lambda_r$ is made every time s/he performs a task.

LumpSum-Heuristic Algorithm
A natural algorithm for solving the LumpSum problem combines ideas from SetCover and SkiRental as follows: first, it starts with no worker being hired and each worker $W_r$ is associated with a variable $\delta_r$ initially set to 0.

For any $T \in \{1, \ldots, T^*\}$, when task $J^T$ arrives, the algorithm proceeds as follows: first, it identifies $J^T_F$ to be the set of skills of $J^T$ that cannot be covered by already-hired workers. Then, it covers the skills in $J^T_F$ using the greedy algorithm for SetCover. This way it finds $\mathcal{Q}^T \subseteq \mathcal{W}$ such that $\sum_{W_r \in \mathcal{Q}^T} \lambda_r$ is (approximately) minimized. Finally, for each worker $W_r \in \mathcal{Q}^T$, it updates $\delta_r \leftarrow \delta_r + \lambda_r$. Worker $W_r$ is hired when $\delta_r \ge C_r$. Clearly, since there are no salaries there is no motivation to fire a worker once s/he is hired.

LumpSum-Heuristic has arbitrarily bad competitive ratio. Although our experiments (Section 5) demonstrate that the above algorithm, which we call LumpSum-Heuristic, performs quite well in many practical cases, we can show that its competitive ratio can be arbitrarily bad. For this, consider an example where $\mathcal{W} = \{W_1, W_2\}$ and both workers have the same skill: $W_1 = W_2 = \{\ell\}$. Further assume that $\lambda_1 = 1$, $\lambda_2 = 1 + \epsilon$ and $C_1 = M$, $C_2 = 1$, where $M$ is a large value and $\epsilon$ a small one. For a sequence of tasks $J^1 = J^2 = \ldots = J^{T^*} = \{\ell\}$, it is clear that LumpSum-Heuristic will always outsource to $W_1$ until hiring him/her and will incur worst-case cost $2M$, whereas the optimal algorithm pays just $C_2 = 1$.

The above discussion illustrates that to obtain an algorithm with bounded competitive ratio, we need to take into account both the outsourcing and hiring costs of all workers. To do so, we deploy an online primal–dual scheme, which drives our algorithm design.
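The heuristic and its failure mode can be reproduced in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the greedy step is assumed to be the standard weighted set-cover rule (lowest fee per newly covered skill), hiring is assumed to happen immediately once $\delta_r \ge C_r$, and the numeric values ($M = 100$, $\epsilon = 0.1$) are illustrative.

```python
def greedy_cover(skills_needed, candidates, workers, fee):
    """Greedy weighted set cover (assumed variant): repeatedly pick the candidate
    with the lowest fee per newly covered skill. Raises ValueError if some skill
    cannot be covered by any candidate."""
    uncovered = set(skills_needed)
    chosen = []
    while uncovered:
        best = min(
            (r for r in candidates if workers[r] & uncovered),
            key=lambda r: fee[r] / len(workers[r] & uncovered),
        )
        chosen.append(best)
        uncovered -= workers[best]
    return chosen


def lumpsum_heuristic(workers, hire_fee, out_fee, tasks):
    """Total cost paid by the heuristic; workers[r] is the skill set of worker r."""
    hired = set()
    delta = [0.0] * len(workers)  # outsourcing fees accumulated per worker
    cost = 0.0
    for task in tasks:
        covered_by_hired = set().union(*(workers[r] for r in hired))
        need = set(task) - covered_by_hired           # skills hired workers cannot cover
        free = [r for r in range(len(workers)) if r not in hired]
        for r in greedy_cover(need, free, workers, out_fee):
            cost += out_fee[r]                        # outsource r for this task
            delta[r] += out_fee[r]
            if delta[r] >= hire_fee[r]:               # ski-rental style threshold
                hired.add(r)
                cost += hire_fee[r]                   # lump sum, no salary afterwards
    return cost


# Two workers with the same single skill: W_1 is slightly cheaper to outsource
# but very expensive to hire; W_2 is cheap to hire.
bad_cost = lumpsum_heuristic(
    workers=[{"l"}, {"l"}],
    hire_fee=[100.0, 1.0],   # C_1 = M = 100, C_2 = 1
    out_fee=[1.0, 1.1],      # lambda_1 = 1, lambda_2 = 1 + epsilon
    tasks=[{"l"}] * 150,
)
optimal_cost = 1.0           # hire W_2 when the first task arrives

# With a single worker the heuristic reduces to the ski-rental rule.
ski_short = lumpsum_heuristic([{"s"}], [10.0], [1.0], [{"s"}] * 3)
ski_long = lumpsum_heuristic([{"s"}], [10.0], [1.0], [{"s"}] * 50)
```

On the two-worker instance the heuristic pays 100 in outsourcing fees and then 100 to hire $W_1$, i.e. $2M$, against an optimal cost of 1; increasing $M$ makes the ratio as large as desired. On the single-worker instance with 50 tasks it pays 20 against an offline optimum of 10, matching the ratio of 2 quoted for ski rental.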
The integer and linear programs. The first step of the primal–dual approach is to define an integer formulation for the problem, for each step $T \in \{1, \ldots, T^*\}$. We assume that the current task is the $T$th task and we use the following variables:
• $x_r = 1$ if worker $W_r$ is hired when task $J^T$ arrives; otherwise $x_r = 0$.
• $f_{rt} = 1$ if worker $W_r$ is outsourced for performing task $J^t$; otherwise $f_{rt} = 0$.
Then LumpSum can be formulated as follows.

Linear program for LumpSum:
$$\min \sum_{r=1}^{n} \left( C_r x_r + \lambda_r \sum_{t=1}^{T} f_{rt} \right)$$
subject to:
$$\forall t = 1, \ldots, T,\ \ell \in J^t: \quad \sum_{W_r \in P_\ell} (x_r + f_{rt}) \ge 1 \qquad (1)$$
$$\forall t = 1, \ldots, T,\ r = 1, \ldots, n: \quad x_r, f_{rt} \ge 0$$

These constraints, together with the integrality constraints $x_r, f_{rt} \in \mathbb{N}$, form the integer program for LumpSum. In this formulation, the objective function sums over all workers the hiring costs (paid if the corresponding worker has been hired by time $t$) and the outsourcing cost for the tasks for which the worker has been outsourced. This is the total cost of the solution until the current task $J^T$. Note that in this formulation of the problem there is no motivation for a worker who is hired to be fired. Therefore, once $x_r$ is set to 1, it does not change its value to become 0 again.

The first constraint (1) in the above program is the covering constraint: it simply enforces that for every skill required for each task, there exists a hired or outsourced worker who has this skill. This guarantees that the team selected for each task $J^t$ covers all the required skills. The nonnegativity and the integrality constraints ensure that the solutions that we obtain from the integer-program formulation can be transformed to a solution to our problem: eventually, every variable will take the value 0 or 1.

To apply the online primal–dual approach, we first consider the linear relaxation of the integer program, which simply drops the integrality constraints $x_r, f_{rt} \in \mathbb{N}$. In a solution to this linear program (LP) each variable takes values in $[0, 1]$ (a solution in which some variables take values greater than 1 can be transformed to another feasible solution with lower cost by setting these variables to 1). Given this LP, we can write its dual as follows.

The dual of the linear program for LumpSum:
$$\max \sum_{t=1}^{T} \sum_{\ell \in J^t} u_{\ell t}$$
subject to:
$$\forall r = 1, \ldots, n: \quad \sum_{t=1}^{T} \sum_{\ell \in J^t \cap W_r} u_{\ell t} \le C_r \qquad (2)$$
$$\forall t = 1, \ldots, T,\ r = 1, \ldots, n: \quad \sum_{\ell \in J^t \cap W_r} u_{\ell t} \le \lambda_r \qquad (3)$$
$$\forall t = 1, \ldots, T,\ \ell \in J^t: \quad u_{\ell t} \ge 0$$

Note that at every time $t \in \{1, \ldots, T\}$ we have such a pair of primal–dual formulations. We are now going to use these two formulations for designing and analyzing our algorithm.

The LumpSum algorithm:
Next, we present the LumpSum algorithm, which is designed and analyzed using the primal and the dual linear programs. We assume that task $J^T$, $T \in \{1, \ldots, T^*\}$, has just arrived and the algorithm must act before task $J^{T+1}$ arrives (or the stream finishes if $T = T^*$).

All the variables used in our algorithm are initialized to 0 before the arrival of the first task. When task $J^T$ arrives the following steps are done:
1. Let $\mathcal{F}^T$ and $\mathcal{H}^T$ represent the workers who are not hired and hired, respectively, at the time that $J^T$ arrives. Clearly, when the first task arrives ($T = 1$), $\mathcal{F}^T = \mathcal{W}$ and $\mathcal{H}^T = \emptyset$. For $T > 1$, the values of $\mathcal{H}^T$ and $\mathcal{F}^T$ are updated in the last step (step 10) of the previous round.
2. Let $J^T_H = J^T \cap \bigcup_{W_r \in \mathcal{H}^T} W_r$ be the skills from $J^T$ that are covered by already-hired workers and $J^T_F = J^T \setminus J^T_H$.
3. For every skill $\ell \in J^T_F$ let $P^F_\ell = P_\ell \cap \mathcal{F}^T$ be the set of workers in $\mathcal{F}^T$ who have skill $\ell$. Also let $P^F_T = \bigcup_{\ell \in J^T_F} P^F_\ell$ be the set of unhired workers who possess at least one skill that is required and not covered by already-hired workers.
4. for each $W_r \in P^F_T$: set $\tilde{x}'_r \leftarrow \tilde{x}_r$.
5. for each skill $\ell \in J^T_F$:
     while $\sum_{W_r \in P_\ell} (\tilde{x}_r + \tilde{f}_{rT}) < 1$:
       for each $W_r \in P_\ell$: $\tilde{x}_r \leftarrow \tilde{x}_r \left(1 + \frac{1}{C_r}\right) + \frac{1}{n C_r}$
       for each $W_r \in P_\ell$: $\tilde{f}_{rT} \leftarrow \tilde{f}_{rT} \left(1 + \frac{1}{\lambda_r}\right) + \frac{1}{n \lambda_r}$
6. for each $W_r \in P^F_T$: set $\Delta\tilde{x}_r \leftarrow \tilde{x}_r - \tilde{x}'_r$.
7. Set $\mathcal{H}' \leftarrow \emptyset$.
8. repeat $\rho$ times:
     for each $W_r \in P^F_T$:
       with probability $\Delta\tilde{x}_r$: hire worker $W_r$ (set $x_r \leftarrow 1$, $\mathcal{H}' \leftarrow \mathcal{H}' \cup \{r\}$)
       with probability $\tilde{f}_{rT}$: outsource worker $W_r$ (set $f_{rT} \leftarrow 1$)
9. for each skill $\ell \in J^T_F$:
     if skill $\ell$ is not covered: hire worker $W_r \in P^F_\ell$ with minimum cost $C_r$ (set $x_r \leftarrow 1$, $\mathcal{H}' \leftarrow \mathcal{H}' \cup \{r\}$)
10. $\mathcal{H}^{T+1} \leftarrow \mathcal{H}^T \cup \mathcal{H}'$, $\mathcal{F}^{T+1} \leftarrow \mathcal{W} \setminus \mathcal{H}^{T+1}$.

For $T = 1$, LumpSum starts with no worker being hired. Intuitively, as tasks arrive, the algorithm tries to gauge two quantities: (1) the usefulness of every worker for the task at hand $J^T$ and (2) the overall usefulness of every worker for tasks $J^1, \ldots, J^T$. This is done in step 5, via variables $\tilde{f}_{rT}$ (for (1)) and $\tilde{x}_r$ (for (2)). In particular, the more useful $W_r$ proves over time, the larger the value $\tilde{x}_r$. Subsequently, in step 8 every worker is outsourced or hired based on the increase in the values of $\tilde{f}_{rT}$ and $\tilde{x}_r$ observed in step 5. Finally, for every skill that remains uncovered after step 8 (which is randomized), LumpSum hires worker $W_r$ with the minimum $C_r$ that covers the skill. Note that the increase of the variables $u_{\ell T}$ in step 5 is not required for solving LumpSum, but it is used in our analysis and thus we leave it in the description above.

Our analysis requires setting the value of $\rho$ in step 8 to $\rho = \ln m + \ln C^*$, where $C^* = \max_{W_r \in \mathcal{W}} C_r$.

Although one may think that an additive update of variables in step 5 would be more natural, such an update would introduce a $\Theta(m)$ factor in the competitive ratio. On the other hand, the multiplicative update that we adopt has the property that the more a worker $W_r$ is required over time, the higher the increase of the corresponding variable $\tilde{x}_r$. This fact leads us to Theorem 3.1 below.

Analysis.
We have the following result for LumpSum.

Theorem 3.1. LumpSum is an $O(\log n (\log m + \log C^*))$-competitive algorithm for the LumpSum problem, where $C^* = \max_{W_r \in \mathcal{W}} C_r$.

Running time. The running time of LumpSum per task is dominated by the execution of steps 5 and 8. For step 5, using binary search, the algorithm can determine in $O(\log C^*)$ steps the minimum increase of $\tilde{x}_r$ and $\tilde{f}_{rT}$ that makes the condition of the while loop false for at least one uncovered skill $\ell$. Therefore, the running time of step 5 is $O(|J^T|\, n \log C^*)$. Step 8, using a hash table to store hired workers, can be executed in expected time $O(\rho n) = O(n (\log m + \log C^*))$. Therefore, the expected time required for processing task $J^T$ is $O(n (\log C^* |J^T| + \log m))$.

TFO PROBLEM
In this section, we provide an algorithm for the general version of TFO (Problem 1). In contrast with LumpSum, now after hiring a worker we must pay a salary $\sigma_r \ge 0$, complicating the problem significantly as it may now be cost-effective to fire workers.
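Because the algorithm for the general problem is described as a modification of the LumpSum algorithm of Section 3, it may help to see that baseline as code first. The following Python sketch follows steps 1 to 10 of LumpSum under several assumptions of ours: all fees are strictly positive, $\rho$ is rounded up to an integer of at least 1, the step-5 loop is run literally rather than with the binary search mentioned in the running-time discussion, a worker who is both hired and outsourced in the same round is charged only the hiring fee, and the dual variables (used only in the analysis) are omitted. The function name and the demo instance are hypothetical.

```python
import math
import random


def lumpsum_primal_dual(workers, hire_fee, out_fee, tasks, num_skills, seed=0):
    """Returns (total cost, team used for each task); workers[r] is a skill set."""
    rng = random.Random(seed)
    n = len(workers)
    # rho = ln m + ln C*, rounded up (rounding is an assumption of this sketch).
    rho = max(1, math.ceil(math.log(num_skills) + math.log(max(hire_fee))))
    x = [0.0] * n            # fractional hiring variables, kept across tasks
    hired = set()
    cost = 0.0
    teams = []
    for task in tasks:
        hired_skills = set().union(*(workers[r] for r in hired))
        need = set(task) - hired_skills                        # step 2: J^T_F
        cand = [r for r in range(n)
                if r not in hired and workers[r] & need]       # step 3: P^F_T
        x_before = {r: x[r] for r in cand}                     # step 4
        f = {r: 0.0 for r in cand}     # fractional outsourcing for this task
        for skill in sorted(need):                             # step 5
            owners = [r for r in cand if skill in workers[r]]
            if not owners:
                raise ValueError(f"no worker possesses skill {skill!r}")
            while sum(x[r] + f[r] for r in owners) < 1.0:
                for r in owners:       # multiplicative updates
                    x[r] = x[r] * (1 + 1 / hire_fee[r]) + 1 / (n * hire_fee[r])
                    f[r] = f[r] * (1 + 1 / out_fee[r]) + 1 / (n * out_fee[r])
        new_hires, outsourced = set(), set()                   # step 7
        for _ in range(rho):                                   # step 8: rounding
            for r in cand:
                if rng.random() < x[r] - x_before[r]:          # step 6: delta x
                    new_hires.add(r)
                if rng.random() < f[r]:
                    outsourced.add(r)
        for skill in sorted(need):                             # step 9: fallback
            if not any(skill in workers[r] for r in new_hires | outsourced):
                owners = [r for r in cand if skill in workers[r]]
                new_hires.add(min(owners, key=lambda r: hire_fee[r]))
        outsourced -= new_hires        # a newly hired worker is not also paid lambda_r
        cost += sum(hire_fee[r] for r in new_hires)
        cost += sum(out_fee[r] for r in outsourced)
        teams.append(hired | new_hires | outsourced)
        hired |= new_hires                                     # step 10
    return cost, teams


# Hypothetical demo instance with three skills and four workers.
demo_workers = [{"a", "b"}, {"b", "c"}, {"c"}, {"a"}]
demo_hire = [8.0, 6.0, 4.0, 5.0]
demo_out = [2.0, 2.0, 1.0, 1.5]
demo_tasks = [{"a"}, {"a", "c"}, {"b"}, {"a", "b", "c"}, {"c"}, {"b", "c"}]
demo_cost, demo_teams = lumpsum_primal_dual(
    demo_workers, demo_hire, demo_out, demo_tasks, num_skills=3, seed=1)
```

The output is randomized, so only feasibility is guaranteed by construction: step 9 ensures that every task is covered by the returned team regardless of the random draws.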
The integer and linear programs for TFO. Given that workers can be hired, then fired and potentially hired again, and so on, we introduce in this new LP the notion of intervals. These intervals are used to model periods in which workers are hired: $\mathcal{I} = \{\{t_a, t_b\} \mid t_a, t_b \in \mathbb{N}, t_a \le t_b\}$. Intuitively, an interval is a subset of time steps during which an algorithm decides to hire a given worker. The new LP (omitted) uses the following variables:
• $x(r, I)$ with $I \in \mathcal{I}$: $x(r, I) = 1$ if worker $W_r$ is hired during the entire interval $I$; otherwise $x(r, I) = 0$.
• $f_{rt}$: $f_{rt} = 1$ if worker $W_r$ is outsourced for performing $J^t$.

It turns out that it is hard to design an approximation algorithm with proven guarantees using this program, mostly because it is hard to keep track of the costs being paid for every worker when the intervals of him/her being hired, outsourced, or idle are of variable length. Therefore, we resort to a different overall strategy: First, we define the Alt-TFO problem, in which the solutions are restricted such that every worker is hired for fixed-length (worker-specific) intervals (Section 4.1). Then, we design an algorithm for Alt-TFO with good competitive ratio (Section 4.2). Finally, we prove that a solution to Alt-TFO can be transformed to a solution for TFO, and that any solution of TFO can be transformed to a feasible solution of Alt-TFO whose cost is at most 3 times higher (Section 4.3), obtaining an approximation algorithm for TFO.

Alt-TFO Problem
The difference between Alt-TFO and TFO is that we restrict the solutions of the former to have a specific structure: whenever worker $W_r$ is hired, s/he is then fired after $\eta_r \triangleq \lceil C_r / \sigma_r \rceil$ time units, independently of whether s/he is used or not in tasks within these $\eta_r$ time units.

In this case, every worker $W_r$ is associated with a new hiring cost $\hat{C}_r$, which is the sum of his/her original hiring cost $C_r$ plus the salaries paid to him/her for the $\eta_r$ time units s/he is hired. Thus, the total hiring cost and salary for an entire interval is
$$C_r + \eta_r \cdot \sigma_r \le C_r + \left(\frac{C_r}{\sigma_r} + 1\right) \cdot \sigma_r \le 3 C_r.$$
We will use $\hat{C}_r \triangleq 3 \cdot C_r$.

We can now write the LP for Alt-TFO. In addition to the notation we discussed in the previous paragraph, we use $I_t \in \mathcal{I}$ to denote the interval that starts at time $t$. Worker $W_r$ has $x(r, I) = 1$ if s/he is hired during the entire interval $I$. All intervals $I$ for which $x(r, I) = 1$ have length $\eta_r$.

Linear program for Alt-TFO:
$$\min \sum_{r=1}^{n} \left[ \sum_{I \in \mathcal{I}} \hat{C}_r\, x(r, I) + \sum_{t=1}^{T} \lambda_r f_{rt} \right]$$
subject to:
$$\forall t = 1 \ldots T,\ \ell \in J^t: \quad \sum_{W_r \in P_\ell} \left( f_{rt} + \sum_{I \in \mathcal{I} : t \in I} x(r, I) \right) \ge 1$$
$$\forall t = 1 \ldots T,\ r = 1 \ldots n,\ I \in \mathcal{I}: \quad x(r, I), f_{rt} \ge 0$$

Alt-TFO Problem
In this section, we design and analyze an algorithm for the Alt-TFO problem. The similarity between the LPs for Alt-TFO and LumpSum (Section 3) translates into a similarity in the algorithms (and their analysis) of the two problems. The key difference now is that we need to take care of the firings.

Our algorithm for Alt-TFO differs from the algorithm for LumpSum in steps 1, 5, 8, and 9, which are changed as follows:
1'. Let $\mathcal{F}^T$ and $\mathcal{H}^T$ represent the workers who are not hired and hired, respectively, at the time that $J^T$ arrives. Clearly, when the first task arrives ($T = 1$), $\mathcal{F}^T = \mathcal{W}$ and $\mathcal{H}^T = \emptyset$. For $T > 1$, the values of $\mathcal{H}^T$ and $\mathcal{F}^T$ are updated in the last step (step 10) of the previous round and then we remove workers whose hiring interval finished in the previous step:
     $\mathcal{F}' \leftarrow \{r \in \mathcal{W} ; x(r, I_{T - \eta_r}) = 1\}$
     $\mathcal{H}^T \leftarrow \mathcal{H}^T \setminus \mathcal{F}'$, $\mathcal{F}^T \leftarrow \mathcal{W} \setminus \mathcal{H}^T$
     for each $W_r \in \mathcal{F}'$: set $\tilde{x}_r \leftarrow 0$
5'. for each skill $\ell \in J^T_F$:
     while $\sum_{r \in P_\ell} (\tilde{x}_r + \tilde{f}_{rT}) < 1$:
       for each $r \in P_\ell$: $\tilde{x}_r \leftarrow \tilde{x}_r \left(1 + \frac{1}{\hat{C}_r}\right) + \frac{1}{n \hat{C}_r}$
       for each $r \in P_\ell$: $\tilde{f}_{rT} \leftarrow \tilde{f}_{rT} \left(1 + \frac{1}{\lambda_r}\right) + \frac{1}{n \lambda_r}$
8'. repeat $\rho(T)$ times:
     for each $r \in P^F_T$:
       with probability $\Delta\tilde{x}_r$: hire worker $W_r$ (set $x(r, I_T) \leftarrow 1$, $\mathcal{H}' \leftarrow \mathcal{H}' \cup \{r\}$)
       with probability $\tilde{f}_{rT}$: outsource worker $W_r$ (set $f_{rT} \leftarrow 1$)
9'. for each skill $\ell \in J^T_F$:
     if skill $\ell$ is not covered: outsource worker $W_r$, $r \in P^F_\ell$, with minimum cost $\lambda_r$ (set $f_{rT} \leftarrow 1$)

We set $\rho(T)$ to $\ln m + \ln \lambda^*$ plus a term logarithmic in $T$, where $\lambda^* = \max_{W_r \in \mathcal{W}} \lambda_r$.

Analysis of Alt-TFO. Algorithm Alt-TFO gives a solution with proven theoretical guarantees for the Alt-TFO problem. As before, the multiplicative update is needed to obtain this competitive ratio. We have the following theorem (proof omitted due to space constraints).
Theorem 4.1. Alt-TFO is an $O(\log n (\log m + \log \lambda^* + \log T^*))$-competitive algorithm for the Alt-TFO problem.
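The fixed-length intervals of Alt-TFO are easy to check numerically. The sketch below computes the interval length $\eta_r = \lceil C_r / \sigma_r \rceil$, the cost of one hiring interval, and the inflated hiring cost $3 C_r$; it also shows the expiry test used when hired workers are removed at the start of a round. The function names are hypothetical, exact rational arithmetic is used to avoid floating-point rounding in the ceiling, and the bound by $3 C_r$ is checked only under the assumption $0 < \sigma_r \le C_r$, which is what the last inequality in the derivation of $\hat{C}_r$ requires.

```python
import math
from fractions import Fraction


def interval_length(hire_fee, salary):
    """eta_r = ceil(C_r / sigma_r), computed exactly."""
    return math.ceil(Fraction(hire_fee) / Fraction(salary))


def interval_cost(hire_fee, salary):
    """Hiring fee plus the salaries paid over one fixed-length interval."""
    return hire_fee + interval_length(hire_fee, salary) * salary


def inflated_hiring_cost(hire_fee):
    """The cost 3 * C_r charged per interval in the Alt-TFO linear program."""
    return 3 * hire_fee


def hire_expired(start, eta, now):
    """A worker hired for the interval starting at `start` covers steps
    start, ..., start + eta - 1 and is removed from the hired set at step start + eta."""
    return now >= start + eta


# Example: C_r = 10, sigma_r = 3 gives eta_r = 4 and an interval cost of 10 + 4 * 3.
example_eta = interval_length(10, 3)
example_interval_cost = interval_cost(10, 3)
```

For $C_r = 10$ and $\sigma_r = 3$ the interval costs 22, below the charged amount of 30; the slack is the price paid for having intervals of a single, worker-specific length.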
TFO Using Alt-TFO
Note that any solution output by Alt-TFO can be transformed into a feasible solution to the original TFO problem by setting $g_{rt} \leftarrow 1$ for every $r$, $t \in I$ for which $x(r, I) = 1$, and $g_{rt} \leftarrow 0$ otherwise. We call the algorithm that runs Alt-TFO and subsequently does this transformation as its final step the TFO algorithm.

The question is whether TFO provides a solution with bounded competitive ratio for the TFO problem. We answer this question affirmatively by showing (1) that the solution of TFO for the TFO problem is feasible and has a cost bounded by the cost of Alt-TFO for the Alt-TFO problem, and (2) that any solution for the TFO problem can be turned into a feasible solution to the Alt-TFO problem at the expense of a small loss in the approximation factor. These two suffice to prove that the solution produced by TFO is a good solution for the TFO problem. We have the following result (proof omitted due to space constraints):

Theorem 4.2. TFO is an $O(\log n (\log m + \log \lambda^* + \log T^*))$-competitive algorithm for the TFO problem.
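The two-step argument behind Theorem 4.2 can be summarized as one chain of inequalities. This is only a sketch of how the stated pieces fit together, not the omitted proof: $\alpha$ is shorthand introduced here for the competitive ratio of Theorem 4.1, the subscripts indicate which problem's cost model is used, the costs of the randomized algorithms are understood in expectation (an assumption of this sketch), and the factor 3 is the loss stated in the overview of the strategy in Section 4.

$$
\mathrm{ALG}_{\mathrm{TFO}}(\mathcal{J})
\;\le\; \mathrm{ALG}_{\text{Alt-TFO}}(\mathcal{J})
\;\le\; \alpha \cdot \mathrm{OPT}_{\text{Alt-TFO}}(\mathcal{J})
\;\le\; 3\alpha \cdot \mathrm{OPT}_{\mathrm{TFO}}(\mathcal{J}),
\qquad \alpha = O\big(\log n\,(\log m + \log \lambda^* + \log T^*)\big).
$$

The first inequality is claim (1), the second is Theorem 4.1, and the third is claim (2); the constant 3 is absorbed by the $O(\cdot)$ notation.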
Running time. Similarly to Section 3, the expected time required to process task $J^T$ is $O(n (\log C^* |J^T| + \log m + \log T))$.

Lower bound. Note that there is little hope for significant improvement of our theoretical results. In particular, Alon et al. [1] have proven a lower bound of $\Omega\left(\frac{\log n \log m}{\log\log n + \log\log m}\right)$ on the competitiveness of any deterministic algorithm for the unweighted online set cover problem. The unweighted online set cover problem is a special case of TFO (and of LumpSum) where for each worker $W_r$ we have $C_r = \lambda_r = 1$, $\sigma_r = 0$, and for each task $J^T$ we have $J^T = J^{T-1} \cup \{\ell\}$, for some skill $\ell \in S \setminus J^{T-1}$ (with $J^0 = \emptyset$).

TFO-Heuristic
Similarly to LumpSum, we also consider the heuristic TFO-Heuristic, which is a generalization of LumpSum-Heuristic for general values of $\sigma_r$. Specifically, the difference is that worker $W_r$ is hired when $\delta_r \ge C_r + \eta_r \cdot \sigma_r$, and is fired after $\eta_r$ tasks (see Sections 3.1 and 4.1 for definitions of $\delta_r$ and $\eta_r$). Note that theoretically TFO-Heuristic may perform arbitrarily badly: the example of Section 3.1 holds for TFO-Heuristic for small $\sigma_r$. Yet, in Section 5 we observe that even though it does not offer the theoretical guarantees of TFO, it performs well in practice.
Table 2: Characteristics of the three source datasets used to generate workloads for our experiments. Numbers in italics correspond to tasks generated for the Upwork dataset, as explained in Section 5.1.

Dataset                          UpWork    Freelancer    Guru
Skills (m)                       2,335     175           1,639
Workers (n)                      18,000    1,211         6,119
Tasks (T)                                  992           3,194
... distinct                               600           2,939
... avg. similarity (Jaccard)
TFO-Adaptive algorithm
As we will see in Section 5, although TFO gives theoretical guarantees for the worst-case performance, in practice some of our other algorithms for the TFO problem may perform better under some input parameters. Given the low running time of all our solution approaches to TFO, we implemented the TFO-Adaptive algorithm. This algorithm runs in parallel all the presented methods for solving the TFO problem (TFO, TFO-Heuristic, Always-Outsource, and Always-Hire), and selects at each time the current minimum-cost algorithm to apply to solve the current task, switching between algorithms when it is advantageous. The asymptotic worst-case results hold for the TFO-Adaptive algorithm as well. Furthermore, our experiments (see Section 5) show that it is beneficial to change the hiring policy even if we pay switching costs.
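The selection step of TFO-Adaptive can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the function name `adaptive_choices` is hypothetical, each policy is represented only by its cumulative cost after every task (as if all policies had been simulated side by side), and switching costs are ignored because their exact form is not specified here.

```python
from typing import Dict, List


def adaptive_choices(cumulative_costs: Dict[str, List[float]]) -> List[str]:
    """For each task index, return the policy with the lowest cumulative cost so far.

    cumulative_costs maps a policy name to its cumulative cost after each task;
    all lists are assumed to have the same length. Ties go to the policy that
    appears first in the dictionary. Switching costs are NOT modeled.
    """
    names = list(cumulative_costs)
    horizon = len(cumulative_costs[names[0]])
    choices = []
    for t in range(horizon):
        best = min(names, key=lambda name: cumulative_costs[name][t])
        choices.append(best)
    return choices


# Usage: policy "A" is cheaper at first, policy "B" takes over later.
print(adaptive_choices({"A": [1, 2, 10], "B": [2, 3, 4]}))
```

A fuller version would charge a cost whenever two consecutive entries of the returned list differ and switch only when the expected saving exceeds that charge.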
Our experiments seek to compare the total cost that would be incurred by companies using different algorithms to assign workers to a stream of incoming tasks. We use synthetic datasets representing possible workloads, built using actual task requirements and worker skills from three large online marketplaces. Synthetic data, while having the limitation of not reflecting the particular conditions of a specific company, allow us to evaluate the effectiveness of our algorithms under a broad range of conditions. Section 5.1 introduces our datasets, Section 5.2 presents results on the LumpSum problem, and Section 5.3 on the TFO problem.
We start by introducing our datasets and discussing our choice of cost parameters for experimentation.
Source datasets.
To create a large pool of tasks from which to sample workloads, we use datasets obtained from three large online marketplaces for outsourcing: UpWork, Freelancer, and Guru (the authors are not associated with any of these services). All three are in the top 30 by traffic in their category (“consulting marketplaces”) according to data from Alexa (Feb. 2018); indeed, Freelancer and Guru are respectively number 1 and number 3. General statistics of these datasets are shown in Table 2.
Worker skills.
The input data that we obtained contain anonymized profiles for people registered as freelancers in these marketplaces. These include their self-declared sets of skills, as well as the average rate that they charge for their services. There is a large variation in the number of skills per worker among datasets, as can be seen in Table 2. Data have been cleaned to remove skills that were not possessed by any worker and skills that were never required by any task. The numbers in Table 2 refer to the clean datasets.

[Figure 1: six panels plotting cost against tasks: (a) LumpSum, UpWork; (b) TFO, UpWork; (c) LumpSum, Freelancer; (d) TFO, Freelancer; (e) LumpSum, Guru; (f) TFO, Guru. Curves in the LumpSum panels: Always-Outsource, LumpSum-Heuristic, LumpSum, Always-Hire. Curves in the TFO panels: Always-Hire, Always-Outsource, TFO, TFO-Adaptive, TFO-Heuristic.]

Figure 1: Experimental comparison of algorithms showing total cost due to outsourcing, hiring, and paying salaries as a function of the number of tasks in the input, averaged over 100 workloads generated with $p = [\ldots]$. Left: Algorithms for problem LumpSum. As expected, Always-Hire has the smallest cost if the number of tasks is large; however, an online algorithm does not know the number of tasks. Our online algorithm and its heuristic version (LumpSum-Heuristic) show a cost that does not exceed twice that of Always-Hire. In contrast, Always-Outsource has cost proportional to the number of tasks. Parameters: $C_r = [\ldots]\,\lambda_r$ and $T = [\ldots]$. Right: Algorithms for problem TFO. Our online algorithm, its heuristic version (TFO-Heuristic), and TFO-Adaptive have smaller cost than Always-Outsource and Always-Hire. The latter diverges rapidly due to salary costs. Parameters: $C_r = [\ldots]\,\lambda_r$, $\sigma_r = [\ldots]\,\lambda_r$, and $T = [\ldots]$.
Tasks.
For both Freelancer and Guru we have access to a large sample of tasks commissioned by buyers in the marketplace; they are included as tasks in Table 2. They correspond to actual tasks brought to these marketplaces by actual users. These samples are anonymized: we do not know the name of the company commissioning them, and there are no timestamps in this data. In the case of Upwork, we generate synthetic tasks following a data-generation procedure used in previous work [4]: we remove a small number of workers (10%), who are excluded from the pool of workers in the dataset, and then repeatedly sample subsets of them to create tasks, by interpreting the union of their skills as task requirements.
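The synthetic-task procedure described for Upwork can be sketched in Python as below. This is a minimal illustration under assumptions, not the procedure's reference implementation: the function name `generate_tasks`, the `max_team_size` bound on the sampled subset, and the uniform choice of subset size are all hypothetical choices made for the example.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple


def generate_tasks(
    worker_skills: Dict[str, Set[int]],
    num_tasks: int,
    holdout_fraction: float = 0.10,   # the 10% of workers set aside
    max_team_size: int = 3,           # assumption: not stated in the text
    seed: int = 0,
) -> Tuple[List[str], List[FrozenSet[int]]]:
    """Return (remaining worker pool, synthetic tasks)."""
    rng = random.Random(seed)
    workers = list(worker_skills)
    rng.shuffle(workers)
    k = max(1, int(len(workers) * holdout_fraction))
    held_out, pool = workers[:k], workers[k:]

    tasks = []
    for _ in range(num_tasks):
        size = rng.randint(1, min(max_team_size, len(held_out)))
        team = rng.sample(held_out, size)
        # The union of the sampled workers' skills becomes the task requirement.
        tasks.append(frozenset().union(*(worker_skills[w] for w in team)))
    return pool, tasks
```

Because the held-out workers are removed from the pool, no single remaining worker is guaranteed to match a generated task exactly, which keeps the tasks from being trivially coverable.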
Workloads.
Marketplaces for online work cover a broad range of tasks from graphic design and web development to accounting, administrative assistance, and legal consulting. Except for huge conglomerates, most firms will not outsource work across all categories at the same time. The workload-generation process that we use has a single parameter $p$, which we call the coherence parameter of the workload, and works as follows. First, we start with a random task, which we select as pivot. To select the next task, with probability $1/p$ we select a random task from the pool of distinct tasks in the dataset and make this task the new pivot, and with probability $1 - 1/p$ we select another task with Jaccard similarity at least 0.[…] to the pivot. Each workload stream that we create has 10K tasks. We also experimented with streams of up to 100K tasks, but we observed that 10K tasks suffice to expose the trends of the algorithms that we compare. We believe that in general a large value of $p$ is realistic for a company, as customers would probably procure from it services exhibiting a certain coherence; we also evaluate our algorithms for a broad range of values for $p$.

For each dataset and for each coherence parameter that we use, we generated 100 workload streams; the costs that we report in our experiments are averages over these 100 workloads.

Cost parameters. We have data about the rates charged by workers in each marketplace, which we directly interpret as their outsourcing costs $\lambda_r$. However, we do not have their hiring or salary costs, so we experiment with different values for these costs.

For hiring costs, which are characterized by $C_r > \lambda_r$, we assume they are a multiplicative factor larger than the outsourcing cost, $C_r = \alpha_r \lambda_r$. We performed extensive experiments in which $C_r$ varied between $1\lambda_r$ and $30\lambda_r$, either as a fixed value, or setting $\alpha_r$ to be a random variable distributed uniformly in a small range.

For salary costs, we assume that they are a fraction of outsourcing costs, experimenting with values from $\sigma_r = \lambda_r/100$ to $\sigma_r = \lambda_r/[\ldots]$. Salary costs $\sigma_r$ are smaller than outsourcing costs $\lambda_r$ because the latter include many costs that a company incurs when outsourcing [6], including: (i) outside-hired consultants are usually more highly paid per hour/day than regular employees of a company, (ii) there are transaction costs involved in locating and contracting an outsourced worker that do not exist for regular employees, and (iii) there are communication and management costs of handling someone external to a company.

LumpSum
Baselines.
We consider two baselines. The Always-Hire baseline solves the SetCover problem for finding a low-cost set of workers that cover the task’s uncovered skills and hires them. The Always-Outsource baseline never hires; instead, it outsources to workers that cover the required skills for the task, by solving a SetCover problem instance.

Results.
Figure 1 (Left) summarizes our results for LumpSum for workloads generated with the UpWork, Freelancer, and Guru datasets, depicting total cost as a function of the number of tasks.

We observe that under all these workloads the algorithms behave similarly. Always-Outsource has cost proportional to the number of tasks and is not competitive; its cost is mostly outside the range of Figure 1 (Left). As expected, Always-Hire performs the best in the long run, because if the number of tasks is large, hiring is a dominant strategy; however, the online algorithm does not know the number of tasks. Experimentally, the LumpSum algorithm has a cost that does not exceed that of Always-Hire by more than a factor of 2, across all the scenarios that we tested. We note that for short sequences LumpSum has lower cost than Always-Hire, sometimes by an order of magnitude (plots omitted for brevity). We also note that although LumpSum-Heuristic can, theoretically, perform arbitrarily badly, in our experiments it performs quite well, although worse than the theoretically justified LumpSum.

Variations (plots omitted for brevity).
Figure 1 (Left) is obtained with $C_r = [\ldots]\,\lambda_r$. We do not observe dramatic variations in the results when varying this parameter in the studied range ($1\lambda_r$ through $30\lambda_r$): LumpSum has a smaller cost than Always-Outsource. In general, higher hiring costs mean that the number of tasks required before hiring a worker is larger, the costs of LumpSum and Always-Hire are higher, and the advantage of LumpSum over Always-Hire for a small number of tasks holds for a longer period of time.

In all plots of Figure 1 we use coherence parameter $p = [\ldots]$. […] $p = [\ldots]$, LumpSum is still better than Always-Outsource.

TFO
Baselines.
As in LumpSum, we consider baselines Always-Hire and Always-Outsource. Additionally, we consider TFO-Heuristic (defined in Section 4.4), which does not have a theoretical guarantee.
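Both baselines rely on solving a SetCover instance over the skills that a task requires. The solver itself is not specified in the text; the sketch below assumes the standard greedy heuristic for weighted set cover, and the names `greedy_set_cover`, `worker_skills`, and `worker_cost` are hypothetical. For Always-Hire the cost passed in would be the hiring cost $C_r$, and for Always-Outsource the outsourcing cost $\lambda_r$.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(
    required: Set[Hashable],
    worker_skills: Dict[str, Set[Hashable]],
    worker_cost: Dict[str, float],
) -> List[str]:
    """Pick workers greedily by cost per newly covered skill.

    Assumption: this greedy rule stands in for whatever SetCover
    routine the baselines actually use.
    """
    uncovered = set(required)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for worker, skills in worker_skills.items():
            gain = len(uncovered & skills)
            if gain and worker_cost[worker] / gain < best_ratio:
                best, best_ratio = worker, worker_cost[worker] / gain
        if best is None:
            raise ValueError("some required skill is not offered by any worker")
        chosen.append(best)
        uncovered -= worker_skills[best]
    return chosen


# Usage: two cheap specialists beat one expensive generalist.
skills = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2, 3}}
costs = {"a": 1.0, "b": 1.0, "c": 3.0}
print(greedy_set_cover({1, 2, 3}, skills, costs))
```

The greedy rule is a natural choice here because it runs quickly per task and is the classical logarithmic-factor approximation for set cover.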
Results.
Figure 1 (Right) summarizes our results for TFO. We observe that TFO, TFO-Heuristic, and TFO-Adaptive have the smallest total cost, followed by Always-Outsource. In contrast, the Always-Hire strategy has much higher cost due to mounting salary costs. We also observe that while TFO-Heuristic does not offer the theoretical guarantees of TFO, it performs well in practice.
Variations.
Similarly to LumpSum, varying $C_r$ does not bring dramatic changes, but as $C_r$ increases while the workload coherence and the salary-to-outsource cost ratio are kept constant, the advantage of TFO over Always-Outsource decreases, and for large hiring costs Always-Outsource has the smallest cost (plots omitted for brevity). Concretely, for $p = 100$ and $\sigma_r = \lambda_r/10$, if we vary the hiring cost $C_r$ (from $1\lambda_r$ to $30\lambda_r$), the total cost of TFO remains less than or equal to the total cost of Always-Outsource until $C_r = [\ldots]\,\lambda_r$, when the cost of TFO becomes larger than the cost of Always-Outsource for the workload generated using the Guru dataset. The corresponding values of $C_r$ for workloads generated with Freelancer and Upwork data are $C_r = [\ldots]\,\lambda_r$ and $C_r = [\ldots]\,\lambda_r$, respectively. As expected, if the hiring costs are sufficiently large, Always-Outsource becomes a dominant strategy.

[Figure 2: six color-coded panels with axes coherence (p) and salary-to-outsource ratio: (a) UpWork: TFO vs. Always-Outsource; (b) UpWork: TFO-Adaptive vs. Always-Outsource; (c) Freelancer: TFO vs. Always-Outsource; (d) Freelancer: TFO-Adaptive vs. Always-Outsource; (e) Guru: TFO vs. Always-Outsource; (f) Guru: TFO-Adaptive vs. Always-Outsource.]

Figure 2: (Best seen in color.) Left: ratio of the cost achieved by TFO and Always-Outsource. Right: ratio of the cost achieved by TFO-Adaptive and Always-Outsource. Coherence parameter $p$ varies from 20 to 200; salary-to-outsource ratio varies from 0.02 to 0.24; the number of tasks is 10K. Colors represent the ratio of costs: blue (dominant towards the bottom-left) indicates the region where our algorithms TFO and TFO-Adaptive have smaller cost, while red indicates the region where the baseline Always-Outsource has smaller cost. In the white region both algorithms have similar costs.

Figure 2 (Left) compares TFO and Always-Outsource experimentally by varying the coherence parameter $p$ from 20 to 200 and $\sigma_r$ from $\lambda_r/50$ to $\lambda_r/4$. We observe that less coherent workloads and high salaries make hiring more expensive; Always-Outsource then becomes a dominant strategy. Figure 2 (Right) shows the power of the TFO-Adaptive algorithm. We observe that it performs as well as or better than Always-Outsource over the whole range of parameters.
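To make the role of the coherence parameter $p$ concrete, the workload-generation process of Section 5.1 can be sketched as follows. This is an illustrative sketch, not the authors' generator: the function names are hypothetical, the similarity threshold is left as the parameter `min_similarity`, and we assume that the similar task is drawn uniformly among all tasks close enough to the pivot (which may return the pivot itself).

```python
import random
from typing import FrozenSet, List, Sequence


def jaccard(a: FrozenSet, b: FrozenSet) -> float:
    """Jaccard similarity of two skill sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def generate_workload(
    tasks: Sequence[FrozenSet],
    length: int,
    p: float,
    min_similarity: float,
    seed: int = 0,
) -> List[FrozenSet]:
    """Generate a task stream with coherence parameter p (p >= 1)."""
    rng = random.Random(seed)
    pivot = rng.choice(tasks)
    workload = [pivot]
    while len(workload) < length:
        if rng.random() < 1.0 / p:
            # With probability 1/p: jump to a random task, which becomes the pivot.
            pivot = rng.choice(tasks)
            workload.append(pivot)
        else:
            # With probability 1 - 1/p: stay close to the current pivot.
            similar = [t for t in tasks if jaccard(t, pivot) >= min_similarity]
            workload.append(rng.choice(similar))
    return workload
```

With $p = 1$ every task is an independent random draw, while larger values of $p$ produce longer runs of tasks that resemble the same pivot, which is the regime in which keeping hired workers pays off.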
Performance.
Our code, which will be released with this paper, is a relatively straightforward mapping of the algorithm to simple counters. Written in Java, it requires about 5 to 8 seconds on average to process 10K incoming tasks using commodity hardware. We remark that, although our formulation is a linear program, the method does not involve solving the linear program; instead, we obtain the solution using the specific primal–dual method that we have described and analyzed.
To the best of our knowledge, we are the first to introduce and solve the Team Formation with Outsourcing (TFO) problem. However, our work is related to existing work on crowdsourcing, team formation, and online algorithm design, which we outline next.
Crowdsourcing.
Among the extensive literature on crowdsourcing, the most related to ours is the work of Ho and Vaughan [13]. Their goal is to assign individual workers to tasks, based on the workers’ skills. Although Ho and Vaughan also deploy the primal–dual technique to solve the task-assignment problem, the tasks they consider can be performed by individual workers and not by teams. Thus, both their problem and their algorithm are different from ours.
Team formation.
A large body of work in team formation considers the following problem: given a social or a collaboration network among the workers and a set of skills that need to be covered, select a team of experts that can collectively cover all the required skills, while minimizing the communication cost between the team members [2, 4, 10, 11, 16, 18, 19, 27]. Other variants of this problem have also considered optimizing the cost of recruiting promising candidates for a set of pre-defined tasks in an offline fashion [12] and minimizing the workload assigned to each individual team member [3, 21].

Although the concept of set cover is common between our work and previous work, the framework we propose in this paper is different in multiple dimensions. First, we do not focus on optimizing the communication cost; in fact we do not assume any network among the individual workers. Our goal is to minimize the overall cost paid on hiring, outsourcing, and salary costs. This difference in the objectives leads to different (and new) optimization problems that we need to solve. Second, most of the work above focuses on the offline version of the team-formation problem, where the tasks to be completed are known a priori to the algorithm. The exception is the work of Anagnostopoulos et al. [3, 4]. However, in their setting they aim to distribute the workload as evenly as possible among the workers, while our objective is to minimize the overall cost of maintaining a team that can complete the arriving tasks. Moreover, the option of outsourcing that we propose is new with respect to the team formation literature. Finally, in the design of our online algorithms we use the primal–dual framework, which was not the case for previous work on online team formation.
Primal–dual algorithms for online problems.
The algorithms we design for our problems use the primal–dual technique. A thorough analysis of the applicability of this technique to online problems can be found in the book by Buchbinder and Naor [8] and in [5]. Probably the problems most closely related to ours are the ski-rental and the set cover problems. We have already discussed the connection of TFO to ski-rental and set cover in Section 2. One can also draw an analogy with caching: bringing a page to the main memory is analogous to hiring a person. The main differences are that in the typical caching problem we do not have covering constraints, there are no recurring costs for keeping pages in the cache, and there is a fixed limit on the number of pages we can insert in the cache.
In this paper, we introduced and studied Team Formation with Outsourcing. We showed that hiring, firing, and outsourcing decisions can be taken by an online algorithm leading to cost savings with respect to alternatives. These cost savings are more striking when (1) the hiring and salary costs are low, because then hiring becomes an attractive option; (2) the tasks exhibit high coherence, i.e., consecutive tasks are similar to each other; and (3) the time horizon is long enough that we can find a core pool of workers to stay hired and satisfy a large fraction of the skills required by incoming tasks.

Technically, the problems we have analyzed in this paper involve embedding a set-cover problem in an online algorithm. Our main algorithms (LumpSum, TFO) are able to give results that are competitive in practice and, equally importantly, theoretically close to the best one can hope for. The design of our algorithms is based on the online primal–dual technique; we provide experimental evidence of the effectiveness of this method even for a complex real-world problem. Furthermore, we present two heuristics which, although in theory are not competitive, perform well in practice. Future work may extend this by considering worker compatibility [4, 18], learning of new skills by hired workers, or other extensions.
Future work.
As with most problems, we can introduce further elements to achieve even more generality. For instance, the algorithms we have described assume that one and only one task arrives per unit of time, but they can be extended trivially to cases in which task arrivals occur at arbitrary times.

As we noted in Section 6, there are also parallels with scenarios of caching and paging. Extending TFO to the case in which the number of hired workers is limited turned out to be a challenging combination of set cover, weighted caching, and ski rental. We have begun to study these problems; our preliminary results show that we can achieve an $O(\log k \log m)$ approximation, in which $k$ is the maximum size of the worker pool. A more natural constraint could be that, for instance, the total cost paid per unit of time cannot exceed a certain budget, which would represent a cap on weekly or monthly personnel expenses. Another element we could incorporate is the possibility of not handling a task, but instead paying a penalty when a task is too difficult to handle with current workers and it is expensive to replace the worker pool with new workers. Other variants can include workers with different ability levels. We plan to study some of these variants in future work.

Additionally, we note that all the algorithms we have presented in this paper are deterministic. Just as randomized algorithms for paging can be defined in the primal–dual framework [8], it is of interest to introduce other update rules for the primal variables that allow us to describe a randomized algorithm.

Reproducibility.
The code and data of this paper can be found at https://github.com/adrianfaz/Algorithms-for-Hiring-and-Outsourcing-in-the-Online-Labor-Market .

REFERENCES
[1] Noga Alon, Baruch Awerbuch, Yossi Azar, Niv Buchbinder, and Joseph Naor. 2009. The Online Set Cover Problem. SIAM J. Comput. 39, 2 (2009), 361–370.
[2] Aijun An, Mehdi Kargar, and Morteza ZiHayat. 2013. Finding Affordable and Collaborative Teams from a Network of Experts. In SDM. 587–595.
[3] Aris Anagnostopoulos, Luca Becchetti, Carlos Castillo, Aristides Gionis, and Stefano Leonardi. 2010. Power in unity: forming teams in large-scale community systems. In ACM CIKM. 599–608.
[4] Aris Anagnostopoulos, Luca Becchetti, Carlos Castillo, Aristides Gionis, and Stefano Leonardi. 2012. Online team formation in social networks. In WWW. 839–848.
[5] Nikhil Bansal. 2013. The Primal-Dual Approach for Online Algorithms. In Approximation and Online Algorithms, Thomas Erlebach and Giuseppe Persiano (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 1–1.
[6] Jerome Barthelemy. 2001. The hidden costs of IT outsourcing. MIT Sloan management review 42, 3 (2001), 60.
[7] Niv Buchbinder, Kamal Jain, and Joseph Seffi Naor. 2007. Online Primal-dual Algorithms for Maximizing Ad-auctions Revenue. In Proceedings of the 15th Annual European Conference on Algorithms (ESA’07). Springer-Verlag, Berlin, Heidelberg, 253–264. http://dl.acm.org/citation.cfm?id=1778580.1778606
[8] Niv Buchbinder and Joseph Naor. 2009. The design of competitive online algorithms via a primal–dual approach. Foundations and Trends in Theoretical Computer Science 3, 2–3 (2009), 93–263.
[9] Nikhil R. Devanur and Thomas P. Hayes. 2009. The Adwords Problem: Online Keyword Matching with Budgeted Bidders Under Random Permutations. In Proceedings of the 10th ACM Conference on Electronic Commerce (EC ’09). ACM, New York, NY, USA, 71–78. https://doi.org/10.1145/1566374.1566384
[10] Christoph Dorn and Schahram Dustdar. 2010. Composing Near-Optimal Expert Teams: A Trade-Off between Skills and Connectivity. In OTM Conferences (1). 472–489.
[11] Amita Gajewar and Atish Das Sarma. 2012. Multi-skill Collaborative Teams based on Densest Subgraphs. In SDM. 165–176.
[12] Behzad Golshan, Theodoros Lappas, and Evimaria Terzi. 2014. Profit-maximizing cluster hires. In ACM SIGKDD. 1196–1205.
[13] Chien-Ju Ho and Jennifer Wortman Vaughan. 2012. Online Task Assignment in Crowdsourcing Markets. In AAAI, Vol. 12. 45–51.
[14] J. Howe. 2006. The rise of crowdsourcing. WIRED (June) (2006).
[15] L.B. Jeppesen and K.R. Lakhani. 2010. Marginality and problem-solving effectiveness in broadcast search. Organization Science 21, 5 (2010), 1016–1033.
[16] Mehdi Kargar and Aijun An. 2011. Discovering top-k teams of experts with/without a leader in social networks. In CIKM. 985–994.
[17] A. Kittur, B. Smus, S. Khamkar, and R.E. Kraut. 2011. CrowdForge: Crowdsourcing Complex Work. In Annual ACM Symposium on User Interface Software and Technology. 43–52.
[18] Theodoros Lappas, Kun Liu, and Evimaria Terzi. 2009. Finding a team of experts in social networks. In ACM SIGKDD. 467–476.
[19] Cheng-Te Li and Man-Kwan Shan. 2010. Team Formation for Generalized Tasks in Expertise Social Networks. In SocialCom/PASSAT. 9–16.
[20] Ann Majchrzak and Arvind Malhotra. 2013. Towards an information systems perspective and research agenda on crowdsourcing for innovation. The Journal of Strategic Information Systems 22, 4 (2013), 257–268.
[21] Anirban Majumder, Samik Datta, and K. V. M. Naidu. 2012. Capacitated team formation problem on social networks. In KDD. 1005–1013.
[22] T.W. Malone, R. Laubacher, and C. Dellarocas. 2010. The collective intelligence genome. MIT Sloan Management Review 51, 3 (2010), 21–31.
[23] Mark S Manasse. 2008. Ski Rental Problem. In Encyclopedia of Algorithms. Springer, 849–851.
[24] OECD. 2016. Organization for Economic Cooperation and Development Data on Self-Employment. https://data.oecd.org/emp/self-employment-rate.htm.
[25] D. Retelny, S. Robaszkiewicz, A. To, W.S. Lasecki, and J. Patel. 2014. Expert crowdsourcing with flash teams. In ACM symposium on User interface software and technology. 75–85.
[26] C. Riedl and A. W. Woolley. 2016. Teams vs. Crowds: Incentives, member ability, and collective intelligence in temporary online team organizations. (2016).
[27] Mauro Sozio and Aristides Gionis. 2010. The Community-search Problem and How to Plan a Successful Cocktail Party. In ACM SIGKDD. 939–948.
[28] J. Surowiecki. 2004. The wisdom of crowds: Why the many are smarter than the few and how collective wisdom shapes business, economies, societies and nations. Anchor Books.
[29] Sushma Un. 2017. The waning days of Indian IT workers being paid to do nothing. Quartz India (2017).
[30] Vijay V Vazirani. 2013.