Weight Prediction for Variants of Weighted Directed Networks
DONG QUAN NGOC NGUYEN, LIN XING, AND LIZHEN LIN
Abstract.
A weighted directed network (WDN) is a directed graph in which each edge is associated with a unique value called a weight. These networks are well suited to modeling real-world social networks in which one vertex assesses other vertices. One of the main problems studied in this paper is the prediction of edge weights in such networks. We introduce, for the first time, a metric geometry approach to studying edge weight prediction in WDNs. We modify the usual notion of a WDN and introduce a new type of WDN for which we coin the term almost weighted directed networks (AWDNs). AWDNs can capture the weight information of a network from a given training set. We then construct a class of metrics (or distances) for AWDNs which equips such networks with a metric space structure. Using the metric geometry structure of AWDNs, we propose modified k nearest neighbors (kNN) methods and modified support-vector machine (SVM) methods, which are then used to predict edge weights in AWDNs. In many real-world datasets, in addition to edge weights, one can also associate weights to vertices which capture information about the vertices; associating weights to vertices plays an especially important role in graph embedding problems. Adopting a similar approach, we introduce two new types of directed networks in which weights are associated to either a subset of origin vertices or a subset of terminal vertices. We construct, for the first time, novel classes of metrics on such networks, and based on these new metrics we propose modified kNN and SVM methods for predicting weights of origins and terminals in these networks. We provide experimental results on several real-world datasets, using our geometric methodologies.

1. Introduction
Many real-world datasets can be modeled as weighted directed networks (WDNs), which are the main objects studied throughout this paper. In a WDN, each edge is associated with a weight (a real number in a given closed interval [a, b] in R). For example, in a digraph G = (O, T, E), where O is the set of buyers and T is the set of products, an edge e = (o, t) is formed in G if the buyer o buys the product t. A natural question in this social network is how much a buyer o likes or dislikes a product t. The degree of liking varies, and so it is natural to assign a value to each liking. Hence one obtains a map W : E → [a, b], where [a, b] is a closed interval in R representing the intensity of the evaluation of users towards products. Many real-world datasets, for example Bitcoin exchanges or Wikipedia networks, are explicit WDNs. A natural question in social network analysis is how to predict the weights of edges in WDNs. Edge weight prediction plays an important role in other network tasks such as community detection [1, 2], anomaly detection [3, 4], and information diffusion [5, 6], among others. Thus the question of how to predict edge weights in WDNs is important in network analysis. In this paper, we not only deal with the problem of predicting edge weights in WDN datasets, but also, for the first time, propose methodology for predicting weights associated with the vertices, i.e., the origins and terminals of the graphs in digraph datasets. Note that prediction of vertex weights plays an important role in other problems in network analysis, for example graph embedding problems. In order to achieve these aims, we introduce several classes of metrics (or distances) on digraphs which equip such digraphs with metric space structures. Using these geometric structures, we introduce modified k nearest neighbors methods and modified support-vector machine (SVM) methods for such digraphs.
To the best of our knowledge, this is the first work that has equipped digraphs with such geometric structures, for the set of edges, the set of origins, or the set of terminals. The geometric approach introduced in this paper may also see applications elsewhere in network analysis [7, 8, 9].

Date: September 29, 2020.

The work of [10] focuses on the edge weight prediction problem in real-world directed weighted signed networks (DWSNs), which are a special type of WDN. They introduce two measurements of vertices in DWSNs: the first, fairness, measures how fair a vertex is in evaluating other vertices, and the second, goodness, measures how good a vertex is from the viewpoint of other vertices. In the real-world DWSN datasets studied in [10], it is reasonable to view the weight of an edge as the product of the fairness and goodness of the two vertices forming the edge. Based on this view, [10] provides an algorithm to predict edge weights in DWSNs. One potential drawback of this approach is that the edge weight prediction algorithm depends strongly on specific DWSN datasets in which the edge weights are computed using both fairness and goodness values. Our geometric approach to edge weight prediction is more direct, and focuses entirely on the weight information of edges instead of depending on node features or measurements associated with vertices. In a usual set-up for edge weight prediction, one is given a set of edges, called a training set, in a WDN whose weights are explicitly given. A novel feature of our geometric approach is that we incorporate weight information from the training set to introduce a notion of topological neighborhoods of edges, and construct a class of metrics on the WDN using such topological neighborhoods. Many real-world WDNs also carry weights on vertices.
For example, the work of [10] introduces two measures of vertices: the fairness of an origin vertex captures how fair the origin vertex is in assessing terminal vertices, and the goodness of a terminal vertex signifies how well this vertex is assessed by origin vertices. If one attaches fairness and goodness values to the set of origin vertices and the set of terminal vertices, respectively, one obtains two types of directed networks with weighted origins or terminals. This weight information on origins and terminals is important in these real-world social networks; thus a natural question is whether one can also predict weights of origins or terminals in such networks. Adopting a similar geometric approach, we propose two new types of directed networks for which we coin the terms almost origin-weighted directed networks (AOWDNs) and almost terminal-weighted directed networks (ATWDNs). In AOWDNs, a subset of origins with known weights is given; in ATWDNs, a subset of terminals with known weights is given. This bears a resemblance to the notion of AWDNs described above. We then propose, for the first time, methodologies using modified kNN and SVM methods for predicting weights of vertices in such networks.

2. Variants of Weighted Directed Networks
In this section, we introduce some basic notions and notation that will be used throughout the paper.
Definition 2.1.
A directed graph (or digraph) G is a pair G = (V, E), where V is a set of elements (called the vertices of G), and E is a collection of ordered pairs of vertices of the form (u, p) (called directed edges), where u, p belong to V. We indicate the direction of an edge e = (u, p) by specifying that e starts from the first vertex u and heads to the second vertex p. The vertex u is called the origin of e, and the vertex p is called the terminal of e.

Example 2.2.
Fig. 1 is an example of a digraph G = (O, T, E), where O = {a, b, c, d} and T = {1, 2, 3, 4}. The directed arrows represent the edges of G, and the set of edges E consists of exactly seven edges.

Remark 2.3.
For a directed graph G = (V, E), throughout this paper we denote by O the set of all origins of directed edges in G, and by T the set of all terminals of directed edges in G. It is clear that V = O ∪ T. Note that a vertex v may belong to O ∩ T, i.e., v may be the origin of one directed edge and the terminal of another. In many places in this paper, we also write G = (O, T, E) to specify the sets of origins and terminals of G. We begin by introducing several variants of WDNs. The first variant is a digraph equipped with weights for a subset of its origins.

Definition 2.4. (Almost origin-weighted directed networks)
Figure 1.
An example of a weighted directed network.

Let G = (O, T, E) be a digraph. G is called an almost origin-weighted directed network (written as AOWDN for brevity) if there is a subset O_A of O together with a mapping F_{O_A} : O_A → [a, b], where [a, b] is a finite interval in the reals R. We use the notation G = (O, T, E, O_A, F_{O_A}) to denote the above AOWDN.

Example 2.5.
Fig. 1 represents an example of an AOWDN G = (O, T, E, O_A, F_{O_A}) in which O, T, E are the same as in Example 2.2, O_A = {b, c}, and F_{O_A} : O_A → [0, 1] is a mapping of weights of origins in O_A, assigning to b and c the weights F_{O_A}(b) and F_{O_A}(c) shown in Fig. 1.

Remark 2.6.
In real data examples that can be viewed as a digraph G = (O, T, E), one typically assumes that there is some weight F_{O_A}(o) associated to each origin o in a subset O_A of O, where O_A can be viewed as a training set. The main aim is to extend the map F_{O_A} to the whole set of origins O, i.e., to construct a predictive model for F_{O_A} on O which allows one to predict the weight of every vertex in O. As an explicit example of an AOWDN, one can take G to be a digraph of users and products, where O denotes the set of users and T denotes the set of products. An edge e = (u, p) in G is formed if the user u buys the product p. One is interested in knowing how fairly a user u in G evaluates products in G, which we call the weight of u. In practice, the set of users in G may evolve over time, so that at a given time one only knows a subset O_A of users in G whose weights are known. Our goal is to predict, at a future time, how fair a user u in the complement of O_A in O is, thus providing insight into the dynamic network G of users and products. In this particular example, prediction of user weights provides an overall evaluation of users with respect to a fixed set of products in the network G. Another variant is a digraph equipped with weights for a subset of terminals.

Definition 2.7. (Almost terminal-weighted directed networks)
Let G = (O, T, E) be a digraph. G is called an almost terminal-weighted directed network (written as ATWDN for short) if there is a subset T_B of T together with a mapping G_{T_B} : T_B → [a, b], where [a, b] is a finite interval in the reals R. We use the notation G = (O, T, E, T_B, G_{T_B}) to denote the above ATWDN.

Example 2.8.
Fig. 1 represents an example of an ATWDN G = (O, T, E, T_B, G_{T_B}) in which O, T, E are the same as in Example 2.2, T_B = {2, 4}, and G_{T_B} : T_B → [−1, 1] is a mapping of weights of terminals in T_B, with G_{T_B}(2) negative and G_{T_B}(4) positive.

Remark 2.9.
In real data examples that can be viewed as a digraph G = (O, T, E), one typically assumes that there is some weight G_{T_B}(t) associated to each terminal t in a subset T_B of T, where T_B can be viewed as a training set. The main aim is to extend the map G_{T_B} to the whole set of terminals T, i.e., to construct a predictive model for G_{T_B} on T.
The last variant of directed networks considered in this paper is a digraph in which only a subset of the edges is equipped with weights.
Definition 2.10. (Almost weighted directed networks)
Let G = (O, T, E) be a digraph. G is called an almost weighted directed network (written as AWDN for short) if there is a subset E_L of E together with a mapping W_{E_L} : E_L → [a, b], where [a, b] is a finite interval in the reals R. We use the notation G = (O, T, E, E_L, W_{E_L}) to denote the above AWDN.

Example 2.11.
Fig. 1 represents an example of an AWDN G = (O, T, E, E_L, W_{E_L}) in which O, T, E are the same as in Example 2.2, E_L consists of the four red directed edges in Fig. 1 (two with origin b and two with origin c), and W_{E_L} : E_L → [−1, 1] is a mapping of weights of edges in E_L; for instance, one of the edges with origin c has weight W_{E_L}((c, ·)) = −0.15.

Remark 2.12.
In real data examples that can be viewed as a digraph G = (O, T, E), one typically assumes that there is some weight W_{E_L}(e) associated to each edge e in a subset E_L of E, where E_L can be viewed as a training set for edge weight prediction in G. The main aim is to extend the map W_{E_L} to the whole set of edges E, i.e., to construct a predictive model for W_{E_L} on E which allows one to predict the weight of every edge in E.

3. Metrics modulo equivalence relations
We present in this section a notion of metric spaces modulo equivalence relations. We first recall the notion of an equivalence relation on a set.
Definition 3.1. (Equivalence relation)
Let X be a set. An equivalence relation on X, denoted by ≅, is a subset of X × X such that the following hold:
(i) (Reflexivity) (a, a) ∈ ≅ for every a ∈ X.
(ii) (Symmetry) (a, b) ∈ ≅ if and only if (b, a) ∈ ≅.
(iii) (Transitivity) if (a, b) ∈ ≅ and (b, c) ∈ ≅, then (a, c) ∈ ≅.
When (a, b) ∈ ≅, we say that a is ≅-equivalent to b. Throughout this paper, in order to signify this relation, we write a ≅ b whenever (a, b) ∈ ≅. An equivalence relation ≅ on a set provides a way to identify similar elements of the set. Equivalently, if one can find a measurement of how similar elements of a set are, then one can modify this measurement to introduce an equivalence relation on the set. We now recall the notion of a metric modulo an equivalence relation on a set.

Definition 3.2. (Metric modulo an equivalence relation)
Let X be a set, and ≅ an equivalence relation on X. A mapping d : X × X → R is said to be a metric on X modulo the equivalence relation ≅ if the following conditions are satisfied:
(i) d(a, b) ≥ 0 for all a, b ∈ X.
(ii) d(a, b) = 0 if and only if a ≅ b.
(iii) (Symmetry) d(a, b) = d(b, a) for all a, b ∈ X.
(iv) (Triangle inequality) for any a, b, c ∈ X, d(a, b) ≤ d(a, c) + d(c, b).
A metric modulo an equivalence relation ≅ on a set X behaves almost like a metric. The only difference is that for a metric, condition (ii) in Definition 3.2 is replaced by the stronger condition that d(a, b) = 0 if and only if a = b. But in studying weight prediction for a given dataset, it is often the case that distinct elements of the dataset share similar weights. In this case, it is natural to view the distance between these distinct elements as zero, since they are considered equivalent with respect to the property that their associated weights are approximately close.
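As a concrete illustration of Definition 3.2 (with a toy invariant that is not from this paper), any real-valued invariant f on a set induces d(a, b) = |f(a) − f(b)|, which is a metric modulo the relation "f(a) = f(b)":

```python
# A metric modulo an equivalence relation, induced by a real-valued
# invariant f: define d(a, b) = |f(a) - f(b)|.  Then d(a, b) = 0 exactly
# when f(a) = f(b), i.e. when a and b are equivalent under "same f-value".

def metric_modulo(f):
    """Return the map d(a, b) = |f(a) - f(b)| induced by the invariant f."""
    return lambda a, b: abs(f(a) - f(b))

# Toy invariant: word length (a stand-in for the counts C used in Section 4).
d = metric_modulo(len)

assert d("cat", "dog") == 0                      # distinct but equivalent
assert d("cat", "horse") == d("horse", "cat")    # symmetry
words = ["a", "cat", "horse", "giraffe"]
for x in words:                                  # triangle inequality, spot-checked
    for y in words:
        for z in words:
            assert d(x, y) <= d(x, z) + d(z, y)
```

Here "cat" and "dog" are distinct yet at distance zero, which is exactly the behavior condition (ii) permits for equivalent elements.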
Note that if one uses a usual metric on this dataset, then one cannot identify similarities among distinct elements sharing almost the same weights. Thus it is more natural to use a metric modulo an equivalence relation on the dataset to study weight prediction for such a dataset.

4. Classes of metrics on variants of WDNs
4.1. A class of metrics on the set of origins in AOWDNs.
In this subsection, we introduce a class of metrics modulo equivalence relations on the set of origins of an AOWDN. This is the first time that such a class of metrics is defined on an AOWDN G = (O, T, E, O_A, F_{O_A}). A novel feature of these metrics is that they encode weight information from the training set O_A and transfer it to weight prediction for origins not contained in O_A. We begin by introducing a notion of neighbors of origins in an AOWDN.

Definition 4.1. (Neighbors of origins)
Let G = (O, T, E, O_A, F_{O_A}) be an AOWDN, and let o be an element of O. A neighbor of o is an element α of O_A such that there is a terminal t ∈ T for which (o, t) and (α, t) both belong to the set of directed edges E. We let N_O(o) denote the set of all neighbors of o, and set n_O(o) = |N_O(o)|, the number of neighbors of o.

Let h > 0; we use h to introduce a metric on O. Let o ∈ O, and write

N_O(o) = {α_1, . . . , α_{n_O(o)}}. (1)

We begin by defining, for each o ∈ O,

AvgF(o) = (F_{O_A}(α_1) + · · · + F_{O_A}(α_{n_O(o)})) / n_O(o). (2)

For each o ∈ O, let C_{O,h}(o) be the number of neighbors α ∈ O_A of o such that

|F_{O_A}(α) − AvgF(o)| ≤ h. (3)

We introduce an equivalence relation on O as follows: we say that u ≅_O v for origins u, v ∈ O if and only if C_{O,h}(u) = C_{O,h}(v). It is clear that ≅_O is an equivalence relation. We define a mapping D_{O,h} : O × O → R_{≥0} as follows. For each pair of origins (u, v) ∈ O × O, define

D_{O,h}(u, v) = |C_{O,h}(u) − C_{O,h}(v)|. (4)

One obtains the following theorem, whose proof is given in the appendix.

Theorem 4.2. D_{O,h} is a metric on O modulo the equivalence relation ≅_O.

4.2. A class of metrics on the set of terminals in ATWDNs.
In this subsection, we introduce a class of metrics modulo equivalence relations on the set of terminals of an ATWDN.
Definition 4.3. (Neighbors of terminals)
Let G = (O, T, E, T_B, G_{T_B}) be an ATWDN, and let t be an element of T. A neighbor of t is an element β of T_B such that there is an origin o ∈ O for which (o, t) and (o, β) both belong to the set of directed edges E. We let N_T(t) denote the set of all neighbors of t, and set n_T(t) = |N_T(t)|, the number of neighbors of t.

Let h > 0; we use h to introduce a metric on T. Let t ∈ T, and write

N_T(t) = {β_1, . . . , β_{n_T(t)}}. (5)

We begin by defining, for each t ∈ T,

AvgG(t) = (G_{T_B}(β_1) + · · · + G_{T_B}(β_{n_T(t)})) / n_T(t). (6)

For each t ∈ T, let C_{T,h}(t) be the number of neighbors β ∈ T_B of t such that

|G_{T_B}(β) − AvgG(t)| ≤ h. (7)

We introduce an equivalence relation on T as follows: we say that p ≅_T q for terminals p, q ∈ T if and only if C_{T,h}(p) = C_{T,h}(q). It is clear that ≅_T is an equivalence relation. We define a mapping D_{T,h} : T × T → R_{≥0} as follows. For each pair of terminals (p, q) ∈ T × T, define

D_{T,h}(p, q) = |C_{T,h}(p) − C_{T,h}(q)|. (8)

One obtains the following result.

Theorem 4.4. D_{T,h} is a metric on T modulo the equivalence relation ≅_T.

The proof of Theorem 4.4 will be given in the appendix.
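The construction of Definition 4.3 and the metric (8) can be sketched as follows; the digraph, the training weights, and the bandwidth h are made-up illustrative values, not data from this paper.

```python
from statistics import mean

# Illustrative ATWDN: edges (origin, terminal), training terminals T_B with
# known weights G_{T_B}, and a bandwidth h (all values invented for the sketch).
edges = {("a", 1), ("a", 2), ("b", 2), ("b", 3), ("c", 3), ("c", 4)}
G_TB = {2: -0.2, 4: 0.6}
h = 0.5

def neighbors(t):
    """N_T(t): terminals in T_B sharing some origin with t (Definition 4.3)."""
    origins = {o for (o, s) in edges if s == t}
    return [b for b in G_TB if any((o, b) in edges for o in origins)]

def C(t):
    """C_{T,h}(t): neighbors whose weight is within h of AvgG(t), eqs. (6)-(7)."""
    ns = neighbors(t)
    if not ns:
        return 0
    avg = mean(G_TB[b] for b in ns)
    return sum(1 for b in ns if abs(G_TB[b] - avg) <= h)

def D(p, q):
    """D_{T,h}(p, q) = |C_{T,h}(p) - C_{T,h}(q)|, eq. (8)."""
    return abs(C(p) - C(q))
```

On this toy network, C(1) = 1 and C(3) = 2, so D(1, 3) = 1, while D(1, 4) = 0 even though 1 and 4 are distinct terminals: they are ≅_T-equivalent. The origin metric D_{O,h} of Subsection 4.1 follows the same pattern with the roles of origins and terminals exchanged.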
4.3. A class of metrics on the set of edges in AWDNs.
In this subsection, we introduce a class of metrics modulo equivalence relations on the set of edges of an AWDN. We begin by introducing a notion of neighbors of edges in an AWDN.
Definition 4.5. (Neighbors of edges)
Let G = (O, T, E, E_L, W_{E_L}) be an AWDN, and let e be an element of E. For an edge e, write o(e) for its origin and t(e) for its terminal. A neighbor of e is an element a of E_L such that either o(e) = o(a) or t(e) = t(a). We let N_E(e) denote the set of all neighbors of e, and set n_E(e) = |N_E(e)|, the number of neighbors of e.

Let h > 0; we use h to introduce a metric on E. Let e ∈ E, and write

N_E(e) = {a_1, . . . , a_{n_E(e)}}. (9)

We begin by defining, for each e ∈ E,

AvgW(e) = (W_{E_L}(a_1) + · · · + W_{E_L}(a_{n_E(e)})) / n_E(e). (10)

For each e ∈ E, let C_{E,h}(e) be the number of neighbors a ∈ E_L of e such that

|W_{E_L}(a) − AvgW(e)| ≤ h. (11)

We introduce an equivalence relation on E as follows: we say that e ≅_E a if and only if C_{E,h}(e) = C_{E,h}(a). It is clear that ≅_E is an equivalence relation. We define a mapping D_{E,h} : E × E → R_{≥0} as follows. For each pair (e, a) ∈ E × E, define

D_{E,h}(e, a) = |C_{E,h}(e) − C_{E,h}(a)|. (12)

We obtain the following theorem, whose proof is given in the appendix.

Theorem 4.6. D_{E,h} is a metric on E modulo the equivalence relation ≅_E.

5. kNN for variants of WDNs

In this section, using the metrics D_{O,h}, D_{T,h}, and D_{E,h}, we introduce our modified k nearest neighbors (kNN) method. The method bears a resemblance to the classical kNN for sign prediction; instead of relying on Euclidean distances as in classical kNN, we employ the metrics constructed in Section 4.

5.1. kNN for AOWDNs. In this subsection, we introduce a method for extending F_{O_A} to a predictive model on the whole set O of origin vertices, using the metrics constructed in Subsection 4.1. We first take an arbitrary positive integer k of our choice, which can be viewed as a tuning parameter for the kNN method proposed here. Let G = (O, T, E, O_A, F_{O_A}) be an AOWDN. Our aim is to extend the map F_{O_A} to the whole of O, i.e., to construct a map F on the set of origins such that the restriction of F to O_A is F_{O_A}. Let x be an arbitrary origin in O. Let d_1, . . . , d_k be the k smallest nonzero distance values from x to O_A, i.e., the k smallest nonzero values among the values D_{O,h}(x, α) for α ∈ O_A, where h is a given tuning parameter and D_{O,h} is the metric introduced in Subsection 4.1.
Let kNN_O(x) be the set of all elements α ∈ O_A such that d_i = D_{O,h}(x, α) for some integer 1 ≤ i ≤ k. We propose to define a predictive model for F_{O_A} as follows. For each x ∈ O, define

F̂(x) = ( Σ_{α ∈ kNN_O(x)} F_{O_A}(α) ) / k. (13)

5.2. kNN for ATWDNs. In this subsection, we introduce a method for extending G_{T_B} to a predictive model on the set T of terminal vertices. Let k be an arbitrary positive integer of our choice, which can be viewed as a tuning parameter for the kNN method. Let G = (O, T, E, T_B, G_{T_B}) be an ATWDN. Our aim is to extend the map G_{T_B} to the whole set T of terminals, i.e., to construct a map G on the set of terminals such that the restriction of G to T_B is G_{T_B}. Let t be an arbitrary terminal in T. Let d_1, . . . , d_k be the k smallest nonzero distance values from t to T_B with respect to the metric D_{T,h}, where h is a given tuning parameter. Let kNN_T(t) be the set of elements β ∈ T_B such that d_i = D_{T,h}(t, β) for some integer 1 ≤ i ≤ k. We propose to define an extension of G_{T_B} as follows. For each t ∈ T, define

Ĝ(t) = ( Σ_{β ∈ kNN_T(t)} G_{T_B}(β) ) / k. (14)

5.3. kNN for AWDNs. In this subsection, we introduce a modified kNN method for extending W_{E_L} to a predictive model on the set E of directed edges. Again we take an arbitrary positive integer k as a tuning parameter for the kNN method. Let G = (O, T, E, E_L, W_{E_L}) be an AWDN. Our aim is to extend the map W_{E_L} to the whole set E of edges, i.e., to construct a map W on the set of edges such that the restriction of W to E_L is W_{E_L}. Let e be an arbitrary edge in E. Let d_1, . . . , d_k be the k smallest nonzero distance values from e to E_L with respect to the metric D_{E,h}, where h is a tuning parameter. Let kNN_E(e) be the set of elements a ∈ E_L such that d_i = D_{E,h}(e, a) for some integer 1 ≤ i ≤ k.
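The same averaging rule as in (13) and (14) is used in all three cases. A sketch for edges follows, with illustrative stand-in values for C_{E,h} and W_{E_L}; we read "the k smallest nonzero distance values" as the k smallest distinct nonzero distances, so ties can place more than k training edges into kNN_E(e), while the sum is still divided by k.

```python
# Sketch of the modified kNN step for edges.  C_vals stands in for the
# counts C_{E,h}, and W for the training weights W_{E_L}; both are invented.
C_vals = {"e1": 3, "e2": 1, "e3": 4, "e4": 1, "e_new": 2}
W = {"e1": 0.5, "e2": -0.1, "e3": 0.8, "e4": 0.3}   # weights on E_L
k = 2

def D(e, a):
    """D_{E,h}(e, a) = |C_{E,h}(e) - C_{E,h}(a)|, eq. (12)."""
    return abs(C_vals[e] - C_vals[a])

def knn_predict(e):
    """Average the weights of training edges attaining the k smallest
    nonzero distance values, divided by k."""
    d_vals = sorted({D(e, a) for a in W} - {0})[:k]   # d_1, ..., d_k
    nbrs = [a for a in W if D(e, a) in d_vals]        # kNN_E(e)
    return sum(W[a] for a in nbrs) / k
```

For e_new, the distances to e1, e2, e4 equal 1 and the distance to e3 equals 2, so all four training edges fall into kNN_E(e_new) and the prediction is (0.5 − 0.1 + 0.8 + 0.3)/2 = 0.75.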
We propose to define an extension of W_{E_L} as follows. For each e ∈ E, define

Ŵ(e) = ( Σ_{a ∈ kNN_E(e)} W_{E_L}(a) ) / k. (15)

6. SVM for variants of WDNs
In this section, we introduce a method for predicting weights in the different types of WDNs. The method resembles the classical support-vector machine (SVM) method in regression analysis (see [11]). In order to compute the kernel of the SVM model, we introduce a transfer map that embeds the objects we want to study into R. Throughout this section, we fix a kernel function κ : R × R → R. There are many choices for such a kernel function, such as the linear kernel, polynomial kernel, or Gaussian radial basis kernel (see, for example, [11]).

6.1. SVM for AOWDNs.
In this subsection, let G = (O, T, E, O_A, F_{O_A}) be an arbitrary AOWDN. Fix a tuning parameter h > 0. In order to construct the kernel function for the SVM model on G, we first define a transfer mapping T_O : O → R (which allows us to view each origin as a real number) of the form

T_O(o) = C_{O,h}(o) (16)

for each o ∈ O, where C_{O,h}(o) is given in Definition 4.1. The SVM model for predicting the weights of origins in G is given by

F̂(o) = y(T_O(o)) = ω_0 + Σ_{i=1}^{m} ω_i F_{O_A}(α_i) κ(T_O(o), T_O(α_i)), (17)

where m is the size of the subset O_A, α_1, . . . , α_m are the origins in O_A, and the ω_i are coefficients of the SVM model which need to be estimated using the values F_{O_A}(α_1), . . . , F_{O_A}(α_m).
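The shape of model (17) can be sketched as follows. Everything here is illustrative: the values of the transfer map, the training weights, the Gaussian radial basis kernel, and especially the coefficients ω_i, which are fixed uniformly below purely for illustration; in practice they are estimated by fitting the SVM on the training weights F_{O_A}.

```python
from math import exp

C = {"a": 1, "b": 2, "c": 3, "x": 2}   # stand-in values of the transfer map T_O = C_{O,h}
F_OA = {"a": 0.2, "b": 0.5, "c": 0.9}  # training weights on O_A (invented)

def kappa(s, t, gamma=0.5):
    """Gaussian radial basis kernel on the embedded (real) values."""
    return exp(-gamma * (s - t) ** 2)

def predict(o, w0=0.0):
    """Model form (17), with illustrative uniform coefficients w_i = 1/m."""
    m = len(F_OA)
    return w0 + sum((1.0 / m) * F_OA[a] * kappa(C[o], C[a]) for a in F_OA)
```

The transfer map reduces each origin to a single real feature, so any standard one-dimensional kernel machine can be trained on the pairs (T_O(α_i), F_{O_A}(α_i)); the terminal and edge models (19) and (21) have the same shape.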
6.2. SVM for ATWDNs.
In this subsection, let G = (O, T, E, T_B, G_{T_B}) be an arbitrary ATWDN. Fix a tuning parameter h > 0. The transfer map defined on the set of terminals, denoted T_T : T → R (which allows us to view each terminal as a real number), is of the form

T_T(t) = C_{T,h}(t) (18)

for each t ∈ T, where C_{T,h}(t) is given in Definition 4.3. The SVM model for predicting the weights of terminals in G is given by

Ĝ(t) = y(T_T(t)) = ω_0 + Σ_{i=1}^{n} ω_i G_{T_B}(β_i) κ(T_T(t), T_T(β_i)), (19)

where n is the size of the subset T_B, β_1, . . . , β_n are the terminals in T_B, and the ω_i are coefficients of the SVM model which need to be estimated using the values G_{T_B}(β_1), . . . , G_{T_B}(β_n).

6.3. SVM for AWDNs.
In this subsection, let G = (O, T, E, E_L, W_{E_L}) be an arbitrary AWDN. Fix a tuning parameter h > 0. The transfer map defined on the set of edges, denoted T_E : E → R (which allows us to view each edge as a real number), is of the form

T_E(e) = C_{E,h}(e) (20)

for each e ∈ E, where C_{E,h}(e) is given in Definition 4.5. The SVM model for predicting the weights of edges in G is given by

Ŵ(e) = y(T_E(e)) = ω_0 + Σ_{j=1}^{J} ω_j W_{E_L}(a_j) κ(T_E(e), T_E(a_j)), (21)

where J is the size of the subset E_L, a_1, . . . , a_J are the edges in E_L, and the ω_j are coefficients of the SVM model which need to be estimated using the values W_{E_L}(a_1), . . . , W_{E_L}(a_J).

7. Experimental analysis on real datasets
In this section, we apply our modified kNN methods and modified SVM methods to predicting the weights of origins, terminals, and edges in three real weighted directed networks: the Bitcoin OTC network, the Epinions dataset, and the WikiSigned dataset. Below we first give a description of each network.

• Bitcoin OTC. This is a weighted signed directed network of people who trade using Bitcoin on a platform called Bitcoin OTC. The dataset is available at http://snap.stanford.edu/data/soc-sign-bitcoin-otc.html (see [10] and [4]). Since users are anonymous, it is necessary to maintain a record of users' reputations to prevent transactions with fraudulent and risky users. Members of Bitcoin OTC rate other members' level of trustworthiness on a scale from −10 (total distrust) to +10 (total trust).

• Epinions. This dataset was collected by Paolo Massa in a 5-week crawl (November/December 2003) from the Epinions.com website (see [12]). In Epinions, each user rates the helpfulness of a review on a 1–5 scale, where 1 means totally not helpful and 5 means totally helpful.

• WikiSigned. This is a WDN between Wikipedia editors. An edge from an editor i to another editor j represents the degree of trust of i in the edits made by j. More details on the dataset can be found in [13].

Note that all of the datasets above are weighted directed networks in which there is a map of weights W from the set of edges to a closed interval [a, b] in R; in Bitcoin OTC, the interval is [−10, 10]. We randomly choose a subset E_L consisting of 3500 edges equipped with weights, and we then use our modified kNN and SVM methods to predict the weights of the remaining 1500 edges. We are not aware of any digraph datasets containing weights of origins and terminals, so based on the notions of fairness and goodness introduced in [10], we associate to each network above weights of origins and terminals. In each of the three networks, an edge is represented by (o, t), where o is the origin, representing the rater, and t is the terminal of the edge, representing the ratee. The weight of an origin o is computed by the fairness metric, which indicates how fair the rater o is in assessing terminals. The weight of a terminal t is computed by the goodness metric, which indicates how good the ratee t is from the viewpoint of the raters. The fairness and goodness metrics are described in more detail in [10]. From the three datasets Bitcoin OTC, Epinions, and WikiSigned, we create AOWDNs in which the fairness of an origin o represents the weight of o, and ATWDNs in which the goodness of a terminal t represents the weight of that terminal.
In each AOWDN, we randomly choose 70% of the set of origins O as the training set O_A, and similarly in each ATWDN we randomly choose 70% of the set of terminals T as the training set T_B. For each AOWDN, since the map of weights of origins F_{O_A} is constructed using fairness scores of origins (see Section III(B) in [10] for the algorithm and formula to compute the fairness metric), the range of F_{O_A} is [0, 1]. For each ATWDN, since the map of weights of terminals G_{T_B} is constructed using goodness scores of terminals (see Section III(B) in [10] for the algorithm and formula to compute the goodness metric), the range of G_{T_B} is [−1, 1]. For the metrics D_{O,h}, D_{T,h}, and D_{E,h}, we choose the tuning parameter h to be the standard deviation of all weights in the training sets O_A, T_B, and E_L, respectively. To assess the accuracy of our prediction methods, we use the mean absolute error (MAE) and the root mean square error (RMSE). In Tables 1, 2, 3, and 4, each cell reports a pair of numbers (MAE, RMSE). Our modified kNN and SVM methods perform very well for all predictions; the detailed results are reported in Tables 2, 3, and 4.

Table 1.
Descriptions of Datasets
Network  Origins  Terminals  Edges  % Positive Edges
Bitcoin OTC
Epinions
WikiSigned
Table 2.
Results of Predicting Weights of Origins
Network  kNN  SVM
Bitcoin OTC  (0.075, 0.139)  (0.073, 0.138)
Epinions (0.125, 0.163) (0.116, 0.186)
WikiSigned (0.095, 0.155) (0.081, 0.158)
Table 3.
Results of Predicting Weights of Terminals
Network  kNN  SVM
Bitcoin OTC  (0.099, 0.163)  (0.087, 0.163)
Epinions  (0.193, 0.221)  (0.183, 0.247)
WikiSigned (0.134, 0.209) (0.096, 0.218)
Table 4.
Results of Predicting Edge Weights.
Network  kNN  SVM
Bitcoin OTC  (0.193, 0.312)  (0.158, 0.315)
Epinions (0.278, 0.353) (0.245, 0.408)
WikiSigned  (0.189, 0.312)  (0.158, 0.315)

8. Conclusions
Our paper proposes novel geometric approaches for predicting edge weights in weighted directed networks. It also studies weight prediction for vertices in networks, which, to the best of our knowledge, has not been investigated before. Our main contributions are as follows:

• Variants of weighted directed networks
Our work is the first to introduce several variants of directed networks equipped with weights on origins, terminals, or edges. We call these networks almost origin-weighted directed networks, almost terminal-weighted directed networks, and almost weighted directed networks, respectively. These types of networks are well suited to modeling real-world datasets, since in most cases one only knows the weights of certain subsets of the origins, terminals, or edges, especially for temporal or dynamic networks whose graph structures change over time. For the purpose of predicting weights, it is very useful to have the weight information of the network at a given time, which can be viewed as a training set of weights.

• Novel geometric approaches
We introduce a metric geometry approach to studying weight prediction problems in digraphs, together with several classes of metrics modulo equivalence relations on different types of weighted digraphs.

• Modified kNN and SVM methods
We introduce modified k nearest neighbors methods and support-vector machine methods for predicting weights in digraphs. These methods are based on the metric geometry structures of digraphs that we introduce in this work.

9. Appendix
In this appendix, we prove Theorems 4.2, 4.4, and 4.6.
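The axioms verified in these proofs (nonnegativity, symmetry, the triangle inequality, and vanishing exactly on equivalent vertices) can also be checked numerically for any metric of the absolute-difference form. The sketch below uses arbitrary toy values in place of the paper's curve function C_{O,h}:

```python
import itertools
import random

# Toy curve values standing in for C_{O,h}; arbitrary, for illustration only.
random.seed(1)
C = {v: random.uniform(-1.0, 1.0) for v in range(20)}

def D(u, v):
    # Absolute-difference metric on vertices, as in Theorems 4.2/4.4/4.6.
    return abs(C[u] - C[v])

# (i) nonnegativity, (iii) symmetry, (iv) triangle inequality.
for u, v, w in itertools.product(C, repeat=3):
    assert D(u, v) >= 0
    assert D(u, v) == D(v, u)
    assert D(u, w) <= D(u, v) + D(v, w) + 1e-12

# (ii) D(u, v) = 0 exactly when C(u) = C(v), i.e. when u and v are equivalent.
print("all metric axioms hold on the sample")
```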
A. Proof of Theorem 4.2.
It is clear that D_{O,h}(u, v) ≥ 0 for all u, v ∈ O. Thus (i) in Definition 3.2 follows. Condition (iii) in Definition 3.2 is straightforward. By definition of ≅_O, C_{O,h}(u) = C_{O,h}(v) if and only if u ≅_O v. Thus D_{O,h}(u, v) = |C_{O,h}(u) − C_{O,h}(v)| = 0 if and only if u ≅_O v, which proves (ii) in Definition 3.2.

For any origins u, v, w ∈ O, we see that

D_{O,h}(u, w) = |C_{O,h}(u) − C_{O,h}(w)|
             = |(C_{O,h}(u) − C_{O,h}(v)) + (C_{O,h}(v) − C_{O,h}(w))|
             ≤ |C_{O,h}(u) − C_{O,h}(v)| + |C_{O,h}(v) − C_{O,h}(w)|
             = D_{O,h}(u, v) + D_{O,h}(v, w),

which verifies (iv) in Definition 3.2. Thus D_{O,h} is a metric on O modulo the equivalence relation ≅_O. □

B. Proof of Theorem 4.4.
It is clear that D_{T,h}(p, q) ≥ 0 for all p, q ∈ T. Thus (i) in Definition 3.2 follows. Condition (iii) in Definition 3.2 is straightforward. By definition of ≅_T, C_{T,h}(p) = C_{T,h}(q) if and only if p ≅_T q. Thus D_{T,h}(p, q) = |C_{T,h}(p) − C_{T,h}(q)| = 0 if and only if p ≅_T q, which proves (ii) in Definition 3.2.
For any terminals p, q, r ∈ T, we see that

D_{T,h}(p, r) = |C_{T,h}(p) − C_{T,h}(r)|
             = |(C_{T,h}(p) − C_{T,h}(q)) + (C_{T,h}(q) − C_{T,h}(r))|
             ≤ |C_{T,h}(p) − C_{T,h}(q)| + |C_{T,h}(q) − C_{T,h}(r)|
             = D_{T,h}(p, q) + D_{T,h}(q, r),

which verifies (iv) in Definition 3.2. Thus D_{T,h} is a metric on T modulo the equivalence relation ≅_T. □

C. Proof of Theorem 4.6.
It is clear that D_{E,h}(a, e) ≥ 0 for all a, e ∈ E. Thus (i) in Definition 3.2 follows. Condition (iii) in Definition 3.2 is straightforward. By definition of ≅_E, C_{E,h}(a) = C_{E,h}(e) if and only if a ≅_E e. Thus D_{E,h}(a, e) = |C_{E,h}(a) − C_{E,h}(e)| = 0 if and only if a ≅_E e, which proves (ii) in Definition 3.2.

For any edges a, b, e ∈ E, we see that

D_{E,h}(a, b) = |C_{E,h}(a) − C_{E,h}(b)|
             = |(C_{E,h}(a) − C_{E,h}(e)) + (C_{E,h}(e) − C_{E,h}(b))|
             ≤ |C_{E,h}(a) − C_{E,h}(e)| + |C_{E,h}(e) − C_{E,h}(b)|
             = D_{E,h}(a, e) + D_{E,h}(e, b),

which verifies (iv) in Definition 3.2. Thus D_{E,h} is a metric on E modulo the equivalence relation ≅_E. □

References

[1] V. A. Traag and J. Bruggeman, "Community detection in networks with positive and negative links," Physical Review E, vol. 80, no. 3, p. 036115, 2009.
[2] B. Yan and S. Gregory, "Detecting community structure in networks using edge prediction methods," Journal of Statistical Mechanics: Theory and Experiment, vol. 2012, no. 09, p. P09008, 2012.
[3] S. Kumar, F. Spezzano, and V. Subrahmanian, "Accurately detecting trolls in slashdot zoo via decluttering," pp. 188–195, IEEE, 2014.
[4] S. Kumar, B. Hooi, D. Makhija, M. Kumar, C. Faloutsos, and V. Subrahmanian, "Rev2: Fraudulent user prediction in rating platforms," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 333–341, 2018.
[5] E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic, "The role of social networks in information diffusion," in Proceedings of the 21st International Conference on World Wide Web, pp. 519–528, 2012.
[6] M. Shafaei and M. Jalili, "Community structure and information cascade in signed networks," New Generation Computing, vol. 32, no. 3-4, pp. 257–269, 2014.
[7] A. Roy, C. Sarkar, J. Srivastava, and J. Huh, "Trustingness & trustworthiness: A pair of complementary trust measures in a social network," pp. 549–554, IEEE, 2016.
[8] Z. Wu, C. C. Aggarwal, and J. Sun, "The troll-trust model for ranking in signed networks," in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp. 447–456, 2016.
[9] J. Leskovec, D. Huttenlocher, and J. Kleinberg, "Predicting positive and negative links in online social networks," in Proceedings of the 19th International Conference on World Wide Web, pp. 641–650, 2010.
[10] S. Kumar, F. Spezzano, V. Subrahmanian, and C. Faloutsos, "Edge weight prediction in weighted signed networks," pp. 221–230, IEEE, 2016.
[11] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media, 2009.
[12] P. Massa and P. Avesani, "Trust-aware recommender systems," in Proceedings of the 2007 ACM Conference on Recommender Systems, pp. 17–24, 2007.
[13] S. Maniu, B. Cautis, and T. Abdessalem, "Building a signed network from interactions in wikipedia," in Databases and Social Networks, pp. 19–24, 2011.
Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, Indiana 46556, USA

E-mail address: [email protected]
URL: http://nd.edu/~dnguye15
E-mail address: [email protected]
E-mail address: