How to Build Your Network? A Structural Analysis

Abstract

Creating new ties in a social network facilitates knowledge exchange and affects positional advantage. In this paper, we study the process, which we call network building, of establishing ties between two existing social networks in order to reach certain structural goals. We focus on the case when one of the two networks consists only of a single member and motivate this case from two perspectives. The first perspective is socialization: we ask how a newcomer can forge relationships with an existing network to place herself at the center. We prove that obtaining optimal solutions to this problem is NP-complete, and present several efficient algorithms to solve this problem and compare them with each other. The second perspective is network expansion: we investigate how a network may preserve or reduce its diameter through linking with a new node, hence ensuring small distance between its members. We give two algorithms for this problem. For both perspectives the experiment demonstrates that a small number of new links is usually sufficient to reach the respective goal.


Anastasia Moskvina and Jiamou Liu
Auckland University of Technology, New Zealand
The University of Auckland, New Zealand


The creation of interpersonal ties has been a fundamental question in the structural analysis of social networks. While strong ties emerge between individuals with similar social circles, forming a basis of trust and hence community structure, weak ties link two members who share few common contacts. The influential work of Granovetter reveals the vital roles of weak ties: It is weak ties that enable information transfer between communities and provide individuals positional advantage and hence influence and power [8].

Natural questions arise regarding the establishment of weak ties between communities: How to merge two departments in an organization into one? How does a company establish trade with an existing market? How to create a transport map from existing routes? We refer to such questions as network building. The basic setup involves two networks; the goal is to establish ties between them to achieve certain desirable properties in the combined network. A real-life example of network building is the inter-marriages between members of the Medici, the leading family of Renaissance Florence, and numerous other noble Florentine families, towards gaining power and control over the city [11]. Another example is by Paul Revere, a prominent Patriot during the American Revolution, who strategically created social ties to raise a militia [24].

The examples of the Medici and Paul Revere pose a more restricted scenario of network building: Here one of the two networks involved is only a single node, and the goal is to establish this node in the other network. We motivate this setup from two directions:

1. This setup amounts to the problem of socialization: the situation when a newcomer joins a network as an organizational member. A natural question for the newcomer is the following: How should I forge new relationships in order to take an advantageous position in the organization? As indicated in [18], socialization is greatly influenced by the social relations formed by the newcomer with “insiders” of the network.
2. This setup also amounts to the problem of network expansion. For example, an airline expands its existing route map with a new destination, while trying to ensure a small number of legs between any two cities.

Distance refers to the length of a shortest path between two members in a network; this is an important measure of the amount of influence one may exert on another in the network [13]. The radius of a network refers to the maximal distance from a central member to all others in a network. Hence when a newcomer joins an established network, it is in the interest of the newcomer to keep her distance to others bounded by the radius. The diameter of a network refers to the longest distance between any two members. It has long been argued in network science that the small-world property (the property that any two members of a network are linked by short paths) improves network robustness and facilitates information flow [25]. Hence it is in the interest of the network to keep the diameter small as the network expands. Furthermore, each relation requires time and effort to establish and maintain; thus one is interested in minimizing the number of new ties while building a network.

Contribution.

The novelty of this work is in proposing a formal, algorithmic study of organizational socialization. More specifically we investigate the following network building problems: Given a network $G$, add a new node $u$ to $G$ and create as few ties as possible for $u$ such that:

(1) $u$ is in the center of the resulting network; or
(2) the diameter of the resulting network is not larger than a specific value.

Intuitively, (1) asks how a newcomer $u$ may optimally connect herself with members of $G$, so that she belongs to the center. We prove that this problem is in fact NP-complete (Theorem 3). Nevertheless, we give several efficient algorithms for this problem; in particular, we demonstrate a “simplification” process that significantly improves performance. Intuitively, (2) asks how a network may preserve or reduce its diameter by connecting with a new member $u$. We show that “preserving the diameter” is trivial for most real-life networks and give two algorithms for “reducing the diameter”. We experimentally test and compare the performance of all our algorithms. Quite surprisingly, the experiments demonstrate that a very small number of new edges is usually sufficient for each problem even when the graph becomes large.

Related works.

This work is predated by organizational behavioral studies [21,9,18], which look at how social ties affect a newcomer’s integration and assimilation to the organization. The authors in [4,24] argue that brokers (those who bridge and connect to diverse groups of individuals) enable good network building; creating ties with and even becoming a broker oneself allows a person to gain private information, a wide skill set and hence power. Network building theory has also been applied to various other contexts such as economics (strategic alliance of companies) [23], governance (forming inter-government contracts) [1], and politics (individuals’ joining of political movements) [20]. Compared to these works, the novelty here is in proposing a formal framework of network building, which employs techniques from complexity theory and algorithmics.

This work is also related to two forms of network formation: dynamic models and agent-based models, both of which aim to capture the natural emergence of social structures [11]. The former originates from random graphs, viewing the emergence of ties as a stochastic process which may or may not lead to an optimal structure [5]. The latter comes from economics, treating a network as a multiagent system where utility-maximizing nodes establish ties in a competitive setting [12,10]. Our work differs from network formation as the focus here is on calculated strategies that achieve desirable goals in the combined network.

We view a network as an undirected unweighted connected graph $G = (V, E)$ where $V$ is a set of nodes and $E$ is a set of (undirected) edges on $V$. We denote an edge $\{u, v\}$ as $uv$. If $uv \in E$ then $v$ is said to be adjacent to $u$. A path (of length $k$) is a sequence of nodes $u_0, u_1, \ldots, u_k$ where $u_i u_{i+1} \in E$ for any $0 \le i < k$. The distance between $u$ and $v$, denoted by $\mathrm{dist}(u, v)$, is the length of a shortest path from $u$ to $v$. The eccentricity of $u$ is the maximum distance from $u$ to any other node, i.e., $\mathrm{ecc}(u) = \max_{v \in V} \mathrm{dist}(u, v)$. The diameter of the network $G$ is $\mathrm{diam}(G) = \max_{u \in V} \mathrm{ecc}(u)$. The radius $\mathrm{rad}(G)$ of $G$ is $\min_{u \in V} \mathrm{ecc}(u)$. The center of $G$ consists of those nodes that are closest to all other nodes; it is the set $C(G) := \{u \in V \mid \mathrm{ecc}(u) = \mathrm{rad}(G)\}$.

Definition 1.

Let $G = (V, E)$ be a network and $u$ be a node not in $V$. For $S \subseteq V$, denote by $E_S$ the set of edges $\{uv \mid v \in S\}$. Define $G \oplus_S u$ as the graph $(V \cup \{u\}, E \cup E_S)$. We require that $S \neq \emptyset$ and thus $G \oplus_S u$ is a network built by incorporating $u$ into $G$.

By [24], for a newcomer $u$ to establish herself in $G$ it is essential to identify information brokers who connect to diverse parts of the network. Following this intuition, we make the following definition.

Definition 2.

A set $S \subseteq V$ is a broker set of $G$ if $\mathrm{ecc}(u) = \mathrm{rad}(G \oplus_S u)$; namely, linking with $S$ enables $u$ to get in the center of the network.

Formally, given a network $G = (V, E)$, the problem of network building for $u$ means selecting a set $S \subseteq V$ so that the combined network $G \oplus_S u$ satisfies certain conditions. Moreover, the desired set $S$ should contain as few nodes as possible. We focus on the following two key problems:

1. $\mathrm{BROKER}$: The set $S$ is a broker set.
2. $\mathrm{DIAM}_\Delta$: The diameter $\mathrm{diam}(G \oplus_S u) \le \Delta$ for a given $\Delta \le \mathrm{diam}(G)$.

Note that for any network $G$, if $u$ is adjacent to all nodes in $G$, it will have eccentricity 1, i.e., in the network $G \oplus_V u$, $\mathrm{ecc}(u) = 1 = \mathrm{rad}(G \oplus_V u)$ and $\mathrm{diam}(G \oplus_V u) \le 2$. Hence a desired $S$ must exist for $\mathrm{BROKER}$ and $\mathrm{DIAM}_\Delta$ where $\Delta \ge 2$. In the subsequent sections we systematically investigate these two problems.
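The definitions above translate almost directly into code. The following is a minimal sketch in plain Python, assuming a network is stored as a dictionary mapping each node to the set of its neighbors; the function names (`bfs_dist`, `attach`, `is_broker_set`, and so on) and the default newcomer label `"u"` are illustrative choices, not part of the paper.

```python
from collections import deque

def bfs_dist(adj, src):
    """Distances from src to every reachable node (unweighted BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def eccentricities(adj):
    """ecc(v) for every node v of a connected network."""
    return {v: max(bfs_dist(adj, v).values()) for v in adj}

def radius(adj):
    return min(eccentricities(adj).values())

def diameter(adj):
    return max(eccentricities(adj).values())

def center(adj):
    """C(G): the nodes whose eccentricity equals the radius."""
    ecc = eccentricities(adj)
    r = min(ecc.values())
    return {v for v, e in ecc.items() if e == r}

def attach(adj, u, S):
    """Return G (+)_S u as a new adjacency dict; adj itself is not modified."""
    if not S:
        raise ValueError("S must be non-empty")
    if u in adj:
        raise ValueError("the newcomer must not already be in the network")
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    new_adj[u] = set(S)
    for v in S:
        new_adj[v].add(u)
    return new_adj

def is_broker_set(adj, S, u="u"):
    """Definition 2: ecc(u) equals the radius of the combined network."""
    ecc = eccentricities(attach(adj, u, S))
    return ecc[u] == min(ecc.values())

# Example: the path 0 - 1 - 2 - 3 - 4
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(radius(path), diameter(path), center(path))
print(is_broker_set(path, {1, 3}), is_broker_set(path, {2}))
```

On this path, linking the newcomer only to the central node 2 is not enough, because the newcomer then sits one step further from everyone than node 2 does; linking to both neighbors of the center works.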

BROKER

We investigate the computational complexity of the decision problem $\mathrm{BROKER}(G, k)$, which is defined as follows:

INPUT: A network $G = (V, E)$, and an integer $k \ge 1$
OUTPUT: Does $G$ have a broker set of size $k$?

The $\mathrm{BROKER}(G, k)$ problem is trivial if $G$ has radius 1, as then $V$ is the only broker set. When $\mathrm{rad}(G) > 1$, we recall the following notion: A set of nodes $S \subseteq V$ is a dominating set if every node not in $S$ is adjacent to at least one member of $S$. The domination number $\gamma(G)$ is the size of a smallest dominating set for $G$. The $\mathrm{DOM}(G, k)$ problem concerns testing whether $\gamma(G) \le k$ for a given graph $G$ and input $k$; it is a classical NP-complete decision problem [7].

Theorem 3.

The $\mathrm{BROKER}(G, k)$ problem is NP-complete.

Proof. The $\mathrm{BROKER}(G, k)$ problem is clearly in NP. Therefore we only show NP-hardness. We present a reduction from $\mathrm{DOM}(G, k)$ to $\mathrm{BROKER}(G, k)$. Note that when $\mathrm{rad}(G) = 1$, $\gamma(G) = 1$. Hence $\mathrm{DOM}(G, k)$ remains NP-complete if we assume $\mathrm{rad}(G) > 1$. Given a graph $G = (V, E)$ where $\mathrm{rad}(G) > 1$, we construct a graph $H$. The set of nodes in $H$ is $\{v_i \mid v \in V, 1 \le i \le 3\}$. The edges of $H$ are as follows:

– Add an edge $v_i v_{i+1}$ for every $v \in V$, $1 \le i < 3$
– Add an edge $v_1 w_1$ for every pair of distinct $v, w \in V$
– Add an edge $v_2 w_2$ for every edge $vw \in E$

Namely, for each node $v \in V$ we create three nodes $v_1, v_2, v_3$ which form a path. We link the nodes in $\{v_1 \mid v \in V\}$ to form a complete graph, and nodes in $\{v_2 \mid v \in V\}$ to form a copy of $G$. Since $\mathrm{rad}(G) \ge 2$, for each node $v \in V$ there is $w \in V$ with $\mathrm{dist}(v, w) \ge 2$. Hence in $H$, $\mathrm{dist}(v_3, w_3) \ge 4$, and $\mathrm{dist}(v_2, w_3) \ge 3$. Since the maximum distance from $v_1$ to any other node is 3, we have $\mathrm{rad}(H) = 3$.

Suppose $S$ is a dominating set of $G$. If we add all edges $uv$ where $v \in D = \{v_2 \mid v \in S\}$, then $\mathrm{ecc}(u) = 3 = \mathrm{rad}(H \oplus_D u)$. Hence $D$ is a broker set for $H$. Thus the size of a minimal broker set of $H$ is at most the size of a minimal dominating set of $G$.

Conversely, for any set $D$ of nodes in $H$, define the projection $p(D) = \{v \mid v_i \in D \text{ for some } 1 \le i \le 3\}$. Suppose $p(D)$ is not a dominating set of $G$. Then there is some $v \in V$ such that for all $w \in p(D)$, $\mathrm{dist}(v_2, w_2) \ge 2$. Hence after adding the edges $\{ux \mid x \in D\}$, we have $\mathrm{dist}(u, v_3) \ge 4$. But then $\mathrm{ecc}(w_1) = 3$ for any $w \in p(D)$. So $D$ is not a broker set. This shows that the size of a minimal dominating set of $G$ is at most the size of a minimal broker set.

The above argument implies that the size of a minimal broker set for $H$ coincides with the size of a minimal dominating set for $G$. This finishes the reduction and hence the proof. □

Theorem 3 implies that computing an optimal solution of $\mathrm{BROKER}$ is computationally hard. Nevertheless, we next present a number of efficient algorithms that take as input a network $G = (V, E)$ with radius $r$ and output a small broker set $S$ for $G$. A set $S \subseteq V$ is called sub-radius dominating if for all $v \in V$ not in $S$, there exists some $w \in S$ with $\mathrm{dist}(v, w) < r$. Our algorithms are based on the following fact, which is clear from the definition:

Fact 1

Any sub-radius dominating set is also a broker set.

(a) Three greedy algorithms

We first present three greedy algorithms; each algorithm applies a heuristic that iteratively adds new nodes to the broker set $S$. The starting configuration is $S = \emptyset$ and $U = V$. During its computation, the algorithm maintains a subgraph $F = (U, E \restriction U)$, which is induced by the set $U$ of all “uncovered” nodes, i.e., nodes that have distance $> (r - 1)$ from every current node in $S$. It repeatedly performs the following operations until $U = \emptyset$, at which point it outputs $S$:

1. Select a node $v \in U$ based on the corresponding heuristic and add $v$ to $S$.
2. Compute all nodes at distance at most $(r - 1)$ from $v$. Remove these nodes and all attached edges from $F$.

Algorithm 1: Max (Max-Degree). The first heuristic is based on the intuition that one should connect to the person with the highest number of social ties; at each iteration, it adds to $S$ a node with maximum degree in the graph $F$.

Algorithm 2: Btw (Betweenness). The second heuristic is based on betweenness, an important centrality measure in networks [3]. More precisely, the betweenness of a node $v$ is the number of shortest paths from all nodes to all others that pass through $v$. Hence high betweenness of $v$ implies, in some sense, that $v$ is more likely to have short distance to others. This heuristic works in the same manner as Max but picks nodes with maximum betweenness in $F$.

Algorithm 3: ML (Min-Leaf). The third heuristic is based on the following intuition: A node is called a leaf if it has minimum degree in the network; leaves correspond to the least connected members in the network, and may become outliers once nodes with higher degrees are removed from the network. Hence this heuristic gives first priority to leaves. Namely, at each iteration, the heuristic adds to $S$ a node that has distance at most $r - 1$ from a leaf $v$. More precisely, the heuristic first picks a leaf $v$ in $F$, then applies a sub-procedure to find the next node $w$ to be added to $S$. The sub-procedure determines a path $v = u_0, u_1, \ldots$ in $F$ iteratively as follows:

1. Suppose $u_i$ is picked. If $i = r - 1$ or $u_i$ has no adjacent node in $F$, set $u_i$ as $w$ and terminate the process.
2. Otherwise select a $u_{i+1}$ (which is different from $u_{i-1}$) among adjacent nodes of $u_i$ with maximum degree.

After the process above terminates, the algorithm adds $w$ to $S$. Note that the distance between $w$ and $v$ is at most $r - 1$. This heuristic has been studied by Duckworth and Mans [6] for regular graphs, i.e., graphs where all nodes have the same degree. In particular, ML has been shown to produce small $k$-dominating sets for given $k$ in the average case for regular graphs.

(b) Simplified greedy algorithms

One significant shortcoming of Algorithms 1–3 is that, by deleting nodes from the network $G$, the network may become disconnected, and nodes that could have been connected via short paths are no longer reachable from each other.
This process may produce isolated nodes in $F$, i.e., nodes having degree 0, which are subsequently all added to the output set $S$. Moreover, maintaining the graph $F$ at each iteration also makes implementations more complex. Therefore we next propose simplified versions of Algorithms 1–3.

Algorithms 4 S-Max, 5 S-Btw, 6 S-ML. The simplified algorithms act in a similar way as their “non-simplified” counterparts; the difference is that here the heuristic works over the original network $G$ as opposed to the updated network $F$. Hence the graph $F$ is no longer computed. Instead we only need to maintain a set $U$ of “uncovered” nodes. The simplified algorithms have the following general structure: Start from $S = \emptyset$ and $U = V$, and repeatedly perform the following until $U = \emptyset$, at which point output $S$:

1. Select a node $v$ from $U$ based on the corresponding heuristic and add $v$ to $S$.
2. Compute all nodes with distance $< \mathrm{rad}(G)$ from $v$, and remove these nodes from $U$.

We stress that here the same heuristics as described above in Algorithms 1–3 are applied, except that we replace any mention of “$F$” in the description with “$U$”, while all notions of degrees, distances, and betweenness are calculated based on the original network $G$.

As an example, in Fig. 1 we run Max and S-Max on the same network $G$, which contains 30 nodes. The figures show the result of both algorithms, and in particular, how S-Max outputs a smaller sub-radius dominating set. We further verify via experiments below that the simplified algorithms lead to much smaller output $S$ in almost all cases.

(c) Center-based algorithms

The 6 algorithms presented above can all be applied to find a $k$-dominating set for arbitrary $k \ge 1$. Since our focus is on finding a sub-radius dominating set to answer the $\mathrm{BROKER}$ problem, we describe two algorithms that are specifically designed for this task. When building a network for a newcomer, it is natural to consider nodes that are already in the center of the network $G$. Hence our two algorithms are based on utilizing the center of $G$.

Algorithm 7 Center. The algorithm finds a center $v$ in $G$ with minimum degree, then outputs all nodes that are adjacent to $v$. Since $v$ belongs to the center, for all $w \in V$, we have $\mathrm{dist}(v, w) \le \mathrm{rad}(G)$ and thus there is $v'$ adjacent to $v$ such that $\mathrm{dist}(w, v') = \mathrm{dist}(w, v) - 1 < \mathrm{rad}(G)$. Hence the algorithm returns a sub-radius dominating set. Despite its apparent simplicity, Center returns surprisingly good results in many cases, as shown in the experiments below.
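The simplified max-degree heuristic (S-Max) and the Center algorithm described above can be sketched as follows. This is an illustrative plain-Python sketch rather than the authors' Sage implementation: networks are assumed to be dictionaries mapping each node to its neighbor set, $\mathrm{rad}(G) \ge 2$ is assumed, ties between equally good candidates are broken by smallest node label (an assumption; the paper does not specify tie-breaking), and all function names are hypothetical.

```python
from collections import deque

def bfs_dist(adj, src):
    """Distances from src to every reachable node (unweighted BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def radius_and_ecc(adj):
    """Return (rad(G), eccentricity map) for a connected network."""
    ecc = {v: max(bfs_dist(adj, v).values()) for v in adj}
    return min(ecc.values()), ecc

def s_max(adj):
    """S-Max: greedy cover using degrees and distances in the ORIGINAL network."""
    r, _ = radius_and_ecc(adj)
    uncovered = set(adj)
    S = []
    while uncovered:
        # Uncovered node of maximum degree in G; ties go to the smallest label
        # (tie-breaking is an assumption, node labels must be comparable).
        v = min(uncovered, key=lambda x: (-len(adj[x]), x))
        S.append(v)
        # Every node at distance < rad(G) from v is now covered (v included).
        uncovered -= {w for w, d in bfs_dist(adj, v).items() if d < r}
    return S

def center_alg(adj):
    """Center: output the neighbors of a minimum-degree center node."""
    r, ecc = radius_and_ecc(adj)
    centers = [v for v in adj if ecc[v] == r]
    v = min(centers, key=lambda x: (len(adj[x]), x))
    return set(adj[v])

def is_sub_radius_dominating(adj, S):
    """Check that every node is within distance < rad(G) of some member of S."""
    r, _ = radius_and_ecc(adj)
    covered = set()
    for s in S:
        covered |= {w for w, d in bfs_dist(adj, s).items() if d < r}
    return covered == set(adj)

# Example: the path 0 - 1 - 2 - 3 - 4 has radius 2 and center {2}.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(s_max(path), center_alg(path))
```

On this small path both procedures select nodes 1 and 3. By Fact 1, any output that passes `is_sub_radius_dominating` is also a broker set.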

Algorithm 8 Imp-Center. We present a modified version of Center, which we call Imp-Center. The algorithm first picks a center with minimum degree, and then orders all its neighbors in decreasing degree. It adds the first neighbor to $S$ and removes from $F$ all nodes at distance $\le (r - 1)$ from it. Let $C$ be the largest connected component of the remaining graph $F$. If $C$ has a smaller radius than $r$, we add the center of this component to $S$; otherwise we add the next neighbor to $S$. We then remove from $F$ all nodes at distance $\le (r - 1)$ from the newly added node. This is repeated until $F$ is empty. See Procedure 1. Fig. 2 shows an example where Imp-Center out-performs Center.

Finally, we note that all of Algorithms 1–8 output a sub-radius dominating set $S$ for the network $G$. Thus the following theorem is a direct implication of Fact 1.

Theorem 4.

All of Algorithms 1–8 output a broker set for the network $G$.

BROKER

We implemented the algorithms using Sage [22]. We apply two models of random graphs: The first (BA) is Barabási-Albert’s preferential attachment model, which generates scale-free graphs whose degree distribution follows a power law; this is an essential property of numerous real-world networks [2]. The second (NWS) is Newman-Watts-Strogatz’s small-world network [19], which produces graphs with small average path lengths and high clustering coefficient.

Fig. 1. The network $G$ contains 30 nodes and has radius $\mathrm{rad}(G) = 4$. The Max algorithm: The algorithm first puts node 3 (shown in green) into $S$. It then removes all nodes (and attached edges) that are at distance at most three from node 3; these nodes are considered “covered” by 3. In the remaining graph, there are three isolated nodes 8, 14, 26, as well as a line of length 2. The algorithm then puts the node 18 into $S$, which “covers” 27 and 13. Thus the output set is $S = \{3, 8, 14, 18, 26\}$. The S-Max algorithm: The algorithm first puts 3 into the set $S$, but does not remove the covered nodes. It simply constructs a set containing all “uncovered” nodes, namely, $\{8, 13, 14, 18, 26, 27\}$. The algorithm then selects the node 13, which has max degree among these nodes, and puts it into $S$. It then turns out that all nodes are covered. Therefore the output set is $S = \{3, 13\}$. Thus S-Max is superior in this example.

Procedure 1 Imp-Center: Given G = (V, E) (with radius r)
  Pick a center node v in G with minimum degree d
  Sort all adjacent nodes of v to a list u_1, u_2, ..., u_d in decreasing order of degrees
  Set S ← ∅, U ← V, F ← G and i ← 1
  while U ≠ ∅ do
    Set C as the largest connected component in F
    if rad(C) < rad(G) − 1 then
      Pick a center node w of C. Set S ← S ∪ {w}
      Set U ← U \ {w′ ∈ U | dist(w, w′) < r}
    else
      Set S ← S ∪ {u_i}
      Set U ← U \ {w′ ∈ U | dist(u_i, w′) < r}
      Set i ← i + 1
    end if
    Set F as the subgraph induced by the current U
  end while
  return S

Fig. 2. The graph $G$ has radius $\mathrm{rad}(G) = 3$. The yellow node 0 is a center with min degree 4. Thus Center outputs the 4 nodes adjacent to node 0. The dark green node 29 adjacent to 0 has max degree; the red nodes are “uncovered” by 29. Thus Imp-Center outputs the 3 blue circled nodes.

For each algorithm we are interested in two indicators of its performance:

1) Output size: The average size of the output broker set (for a specific class of random graphs).
2) Optimality rate: The probability that the algorithm gives an optimal broker set for a random graph. To compute this we need to first compute the size of an optimal broker set (by brute force) and count the number of times the algorithm produces an optimal solution for the generated graphs.
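The brute-force step behind the optimality rate can be illustrated with a short sketch. It assumes the same dictionary-of-neighbor-sets representation as before and simply tries all subsets in order of increasing size, so it is only practical for very small networks; the names `min_broker_set` and `is_optimal` and the newcomer label `"u"` are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, src):
    """Distances from src to every reachable node (unweighted BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def is_broker_set(adj, S, u="u"):
    """True if linking the newcomer u to S puts u in the center."""
    H = {v: set(nbrs) for v, nbrs in adj.items()}
    H[u] = set(S)
    for v in S:
        H[v].add(u)
    ecc = {v: max(bfs_dist(H, v).values()) for v in H}
    return ecc[u] == min(ecc.values())

def min_broker_set(adj, u="u"):
    """Smallest broker set found by exhaustive search (exponential time)."""
    nodes = list(adj)
    for k in range(1, len(nodes) + 1):
        for S in combinations(nodes, k):
            if is_broker_set(adj, S, u):
                return set(S)
    return None  # not reached for a connected network: V is a broker set

def is_optimal(adj, S, u="u"):
    """Does a heuristic output S match the brute-force optimum in size?"""
    best = min_broker_set(adj, u)
    return is_broker_set(adj, S, u) and len(set(S)) == len(best)

# Example: the path 0 - 1 - 2 - 3 - 4
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(len(min_broker_set(path)), is_optimal(path, {1, 3}))
```

An optimality rate in the sense above would then be the fraction of generated graphs on which `is_optimal` holds for a given heuristic's output.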

Experiment 1: Output sizes.

We generate 300 graphs whose numbers of nodes vary between 100 and 1000 using each random graph model. We compute averaged output sizes of generated graphs by their number of nodes $n$ and radius $r$. The results are shown in Fig. 3. From the result we see: a) The simplified algorithms produce significantly smaller broker sets compared to their unsimplified counterparts. This shows superiority of the simplified algorithms. b) BA graphs in general allow smaller output sets than NWS graphs. This may be due to the scale-free property, which results in high skewness of the degree distribution.

Fig. 3. Comparing results: average performance of the Max, Btw, ML algorithms versus their simplified versions on randomly generated graphs (BA graphs on the left; NWS on the right)

Experiment 2: Optimality rates.

For the second goal, we compute the optimality rates of algorithms when applied to random graphs, which are shown in Fig. 4. For BA graphs, the simplified algorithm S-ML has significantly higher optimality rate ($\ge$ […]); ML has the worst optimality rate. This is somewhat contrary to Duckworth and Mans’s work showing ML gives very small solution sets for regular graphs [6]. For NWS graphs, several algorithms have almost equal optimality rate. The three best algorithms are S-Max, S-Btw and S-ML, which have varying performance for graphs with different sizes (see Fig. 5).

Fig. 4. Optimality rates for different types of random graphs

Fig. 5. Optimality rates when graphs are classified by sizes

Experiment 3: Real-world datasets.

We test the algorithms on several real-world datasets: The Facebook dataset, collected from survey participants of Facebook App, consists of friendship relations on Facebook [17]. Enron is an email network of the company made public by the FERC [14]. Nodes of the network are email addresses and if an address $i$ sent at least one email to address $j$, the graph contains an undirected edge from $i$ to $j$. Col1 and Col2 are collaboration networks (Collaboration 1 and Collaboration 2) [13].

                              Facebook   Enron     Col1     Col2
  Number of nodes             4,039      33,969    4,158    8,638
  Number of edges             88,234     180,811   13,422   24,806
  Largest connected subgraph  4,039      33,696    4,158    8,638
  Diameter                    8          13        17       18
  Radius                      4          7         9        10

Table 1. Network properties

Results on the datasets are shown in Fig. 6. The Btw and S-Btw algorithms become too inefficient as they require computing shortest paths between all pairs in each iteration. Moreover, S-Max also did not terminate within reasonable time for the Enron dataset. Even though the datasets have many nodes, the output sizes are in fact very small (within 10). For instance, the smallest output sets of the Enron, Col1 and Col2 networks have size 2.

Fig. 6. The number of new ties for the four real-world networks

Among all algorithms Imp-Center has the best performance, producing the smallest output set for all networks. Moreover, for Enron, Col1 and Col2, Imp-Center returns the optimal broker set with cardinality 2. A rather surprising fact is that, despite its straightforward and seemingly naive logic, Center also produces small outputs in three networks. This reflects the fact that in order to become central it is often a good strategy to create ties with the friends of a central person.

$\mathrm{DIAM}_\Delta$

Let $G = (V, E)$ be a network and $u \notin V$. The $\mathrm{DIAM}_\Delta$ problem asks for a set $S \subseteq V$ such that the network $G \oplus_S u$ has diameter $\le \Delta$; we refer to any such $S$ as $\Delta$-enabling.

We first look at a special case when $\Delta = \mathrm{diam}(G)$, which has a natural motivation: How can an airline expand its existing route map with an additional destination while ensuring the maximum number of hops between any two destinations is not increased? We are interested in creating as few new connections as possible to reach this goal. Let $\delta(G)$ denote the size of the smallest $\mathrm{diam}(G)$-enabling set for $G$. We say a graph is diametrically uniform if all nodes have the same eccentricity.

Theorem 5.
(a) If $G$ is not diametrically uniform, $\delta(G) = 1$.
(b) If $G$ is complete, then $\delta(G) = |V|$.
(c) If $G$ is diametrically uniform and incomplete, then $1 < \delta(G) \le d$ where $d$ is the minimum degree of any node in $G$, and the upper bound $d$ is sharp.

Proof. For (a), suppose $G$ is not diametrically uniform. Take any $v$ where $\mathrm{ecc}(v) < \mathrm{diam}(G)$. Then in the expanded network $G \oplus_{\{v\}} u$, we have $\mathrm{ecc}(u) = \mathrm{ecc}(v) + 1 \le \mathrm{diam}(G)$. (b) is clear. For (c), suppose $G$ is diametrically uniform and incomplete. For the lower bound, suppose $\delta(G) = 1$. Then there is some $v \in V$ with the following property: In the network $G \oplus_{\{v\}} u$ we have $\mathrm{ecc}(u) \le \mathrm{diam}(G)$, which means that $\mathrm{ecc}(v) < \mathrm{diam}(G)$. This contradicts the fact that $G$ is diametrically uniform. For the upper bound, take a node $v \in V$ with the minimum degree $d$. Let $N$ be the set of nodes adjacent to $v$. From any node $w \neq v$, there is a shortest path of length $\le \mathrm{diam}(G)$ to $v$. This path contains a node in $N$. Hence $w$ is at distance $\le \mathrm{diam}(G) - 1$ from some node in $N$. Furthermore, as $G$ is not complete, $\mathrm{diam}(G) \ge 2$, and thus $v$ is at distance $1 \le \mathrm{diam}(G) - 1$ from $N$. □

Remark

We point out that in case (c) calculating the exact value of $\delta(G)$ is hard: In [16], its parametrized complexity is shown to be complete for $W[2]$, the second level of the $W$-hierarchy. Hence $\mathrm{DIAM}_\Delta$ is unlikely to be in P. On the other hand, we argue that real-life networks are rarely diametrically uniform. Hence by Thm. 5(a), the smallest number of new connections needed to preserve the diameter is 1.

Reducing the diameter

We now explore the question $\mathrm{DIAM}_\Delta$ where $2 \le \Delta < \mathrm{diam}(G)$; this refers to the goal of placing a new member in the network and creating ties to allow a closer distance between all pairs of members. We suggest two heuristics to solve this problem.

Algorithm 9 Periphery. The periphery $P(G)$ of $G$ consists of all nodes $v$ with $\mathrm{ecc}(v) = \mathrm{diam}(G)$. Suppose $\mathrm{diam}(G) > 2$. Then the combined network $G \oplus_{P(G)} u$ has diameter smaller than $\mathrm{diam}(G)$. Hence we apply the following heuristic: Two nodes $v, w$ in $G$ are said to form a peripheral pair if $\mathrm{dist}(v, w) = \mathrm{diam}(G)$. The algorithm first adds the new node $u$ to $G$ and repeats the following procedure until the current graph has diameter $\le \Delta$:

1) Randomly pick a peripheral pair $v, w$ in the current graph
2) Add the edges $uv, uw$ if they have not been added already
3) Compute the diameter of the updated graph

Note that once $v, w$ are chosen as a peripheral pair and the corresponding edges $uv, uw$ added, $v$ and $w$ will have distance 2 and they will not be chosen as a peripheral pair again. Hence the algorithm eventually terminates and produces a graph with diameter at most $\Delta$.

Algorithm 10 CP (Center-Periphery). This algorithm applies a similar heuristic as Periphery, but instead of picking peripheral pairs at each iteration, it first picks a node $v$ in the center and adds the edge $uv$; it then repeats the following procedure until the current graph has diameter $\le \Delta$:

1) Randomly pick a node $w$ in the periphery of the current graph
2) Add the edge $uw$ if it has not been added already
3) Compute the diameter of the updated graph

Suppose at one iteration the algorithm picks $w$ in the periphery. Then after this iteration the eccentricity of $w$ is at most $r + 2$ where $r$ is the radius of the graph.

$\mathrm{DIAM}_\Delta$

We implement and test the performance of Algorithms 9, 10 for the problem $\mathrm{DIAM}_\Delta$. The performance of these algorithms is measured by the number of new ties created.

Experiment 4: Random graphs.

We apply the two models of random graphs, BA and NWS, as described above. We generated 350 graphs and considered the case when $\Delta = \mathrm{diam}(G) - 1$, i.e., the aim was to improve the diameter by one. For both types of random graphs (fixing size and radius), the average numbers of new ties are shown in Fig. 7. The experiments show that Periphery performs better when the radius of the graph is close to the diameter (when the radius is > […] of the diameter), while CP is slightly better when the radius is significantly smaller than the diameter.

Fig. 7. Comparing two methods for improving diameter applied to BA (left) and NWS (right) graphs
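The two heuristics compared in this experiment, Periphery (Algorithm 9) and CP (Algorithm 10), can be sketched as follows. This is a hedged illustration, not the authors' implementation: networks are dictionaries of neighbor sets, random choices use a seeded `random.Random` for reproducibility, a peripheral pair that involves the newcomer itself only yields one new edge, CP draws only from periphery nodes not yet linked to the newcomer (which has the same effect as redrawing until a new edge is added), and CP takes the first center node found. All of these details and the function names are assumptions.

```python
import random
from collections import deque

def bfs_dist(adj, src):
    """Distances from src to every reachable node (unweighted BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def ecc_map(adj):
    """Eccentricity of each node, measured over the nodes it can reach."""
    return {v: max(bfs_dist(adj, v).values()) for v in adj}

def with_newcomer(adj, u):
    """Copy of the network with an (initially unlinked) newcomer u."""
    H = {v: set(nbrs) for v, nbrs in adj.items()}
    H[u] = set()
    return H

def link(H, u, v):
    H[u].add(v)
    H[v].add(u)

def periphery_alg(adj, delta, u="u", seed=0):
    """Periphery: link u to random peripheral pairs until diam <= delta."""
    if delta < 2:
        raise ValueError("delta must be at least 2")
    rng = random.Random(seed)
    H = with_newcomer(adj, u)
    while True:
        dist = {v: bfs_dist(H, v) for v in H}
        diam = max(max(d.values()) for d in dist.values())
        if H[u] and diam <= delta:
            return set(H[u])
        nodes = list(H)
        pairs = [(v, w) for i, v in enumerate(nodes) for w in nodes[i + 1:]
                 if dist[v].get(w) == diam]
        for x in rng.choice(pairs):
            if x != u:  # assumption: skip the newcomer if it is in the pair
                link(H, u, x)

def cp_alg(adj, delta, u="u", seed=0):
    """CP: link u to a center node, then to random periphery nodes."""
    if delta < 2:
        raise ValueError("delta must be at least 2")
    rng = random.Random(seed)
    H = with_newcomer(adj, u)
    ecc = ecc_map(adj)
    r = min(ecc.values())
    link(H, u, next(v for v in adj if ecc[v] == r))  # first center found
    while True:
        ecc = ecc_map(H)
        diam = max(ecc.values())
        if diam <= delta:
            return set(H[u])
        candidates = [w for w in H
                      if ecc[w] == diam and w != u and w not in H[u]]
        link(H, u, rng.choice(candidates))

# Example: reduce the diameter of the path 0 - 1 - ... - 6 from 6 to 5.
path7 = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}
print(periphery_alg(path7, 5), cp_alg(path7, 5))
```

Both functions recompute all shortest paths in every iteration, which mirrors step 3 of the descriptions but is expensive on large networks.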

Experiment 5: Real-World Datasets.

We run both Periphery and CP on the networks Col1 and Col2, for $\Delta = \mathrm{diam}(G) - i$ with $1 \le i \le 4$. The numbers of new ties created by Periphery and CP are shown in Figure 8; naturally for increasing $i$, more ties need to be created. We point out that, despite the large total number of nodes, one needs fewer than 19 new edges to improve the diameter even by four. This reveals an interesting phenomenon: While a collaboration network may be large, a few more collaborations are sufficient to reduce the diameter of the network.

On the Facebook dataset, Periphery is significantly better than CP: To reduce the diameter of this network from 8 to 7, Periphery requires 2 edges while CP requires 47. When one wants to reach the diameter 6, the numbers of new edges increase to 6 for Periphery and 208 for CP.

Fig. 8. Applying algorithms for improving diameter to Collaboration 1 and Collaboration 2 datasets

Conclusion and Outlook

This work studies how ties are built between a newcomer and an established network to reach certain structural properties. Although achieving optimality is often computationally hard, there are efficient heuristics that reach the desired goals using few new edges. We also observe that the number of new links required to achieve the specified properties remains small even for large networks.

This work amounts to an effort towards an algorithmic study of network building. Along this effort, natural questions that have yet to be explored include: (1) Investigating the creation of ties between two arbitrary networks, namely, how ties are created between two established networks to maintain or reduce diameter. (2) When building networks in an organizational context (such as merging two departments in a company), one normally needs to take into account not only the informal social relations, but also formal ties such as the reporting relations, which are typically directed edges [15]. We plan to investigate network building from an organizational management perspective by incorporating both types of ties.

References

1. Andrew, S.A.: Adaptive versus restrictive contracts: Can they resolve different risk problems? In: Feiock, R., Scholz, J. (eds.) Self-Organizing Federalism: Collaborative Mechanisms to Mitigate Institutional Collective Action Dilemmas. Cambridge University Press (2010)
2. Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (Oct 1999)
3. Barthélemy, M.: Betweenness centrality in large complex networks. Eur. Phys. J. B 38, 163–168 (2004)
4. Cross, R., Thomas, R.: Managing yourself: a smarter way to network. Harvard Business Review 89(7–8), 149–153 (Jul–Aug 2011)
5. Donetti, L., Hurtado, P.I., Munoz, M.A.: Entangled networks, synchronization and optimal network topology. Phys. Rev. Lett. 95(188701) (2005)
6. Duckworth, W., Mans, B.: Randomized greedy algorithms for finding small k-dominating sets of regular graphs. Random Structures and Algorithms 27(3), 401–412 (2005)
7. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman (1979)
8. Granovetter, M.S.: The strength of weak ties. The American Journal of Sociology 78(6), 1360–1380 (1973)
9. Jablin, F.M., Krone, K.J.: Organizational assimilation. In: Berger, C., Chaffee, S. (eds.) Handbook of communication science, pp. 711–746. Sage (1987)
10. Jackson, M.O.: A survey of models of network formation: Stability and efficiency. In: Demange, G., Wooders, M. (eds.) Group Formation in Economics; Networks, Clubs and Coalitions. Cambridge University Press (2004)
11. Jackson, M.O.: The economics of social networks. In: Blundell, R., Newey, W., Persson, T. (eds.) Proceedings of the 9th World Congress of the Econometric Society. Cambridge University Press (2006)
12. Kleinberg, J., Suri, S., Tardos, E., Wexler, T.: Strategic network formation with structural holes. ACM SIGecom Exchanges 7(3) (November 2008)
13. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (ACM TKDD) 1(1) (2007)
14. Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.: Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6(1), 29–123 (2009)
15. Liu, J., Moskvina, A.: Hierarchies, ties and power in organizational networks: Model and analysis. In: ASONAM ’15 Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. pp. 202–209 (2015)
16. Lokshtanov, D., Misra, N., Philip, G., Ramanujan, M.S., Saurabh, S.: Hardness of r-dominating set on graphs of diameter ( r −−