Online Reciprocal Recommendation with Theoretical Performance Guarantees
Fabio Vitale
Department of Computer Science, Sapienza University of Rome (Italy) & INRIA Lille (France)
Rome, Italy & Lille, France
[email protected]
Nikos Parotsidis
Department of Computer Science, University of Rome Tor Vergata
Rome, Italy
[email protected]
Claudio Gentile
INRIA Lille Nord Europe & Google (New York, USA)
Lille, France & New York, USA
[email protected]
Abstract
A reciprocal recommendation problem is one where the goal of learning is not just to predict a user's preference towards a passive item (e.g., a book), but to recommend to the targeted user on one side another user from the other side such that a mutual interest between the two exists. The problem is thus sharply different from the more traditional items-to-users recommendation, since a good match requires meeting the preferences of both sides. We initiate a rigorous theoretical investigation of the reciprocal recommendation task in a specific framework of sequential learning. We point out general limitations, formulate reasonable assumptions enabling effective learning and, under these assumptions, design and analyze a computationally efficient algorithm that uncovers mutual likes at a pace comparable to that achieved by a clairvoyant algorithm knowing all user preferences in advance. Finally, we validate our algorithm against synthetic and real-world datasets, showing improved empirical performance over simple baselines.
Recommendation Systems are at the core of many successful online businesses, from e-commerce, to online streaming, to computational advertising, and beyond. These systems have been extensively investigated by both academic and industrial researchers, following the standard paradigm of items-to-users preference prediction/recommendation. In this standard paradigm, a targeted user is presented with a list of items that s/he may prefer according to a preference profile that the system has learned based on both explicit user features (item data, demographic data, explicitly declared preferences, etc.) and past user activity. In more recent years, due to the hugely increasing interest in the online dating and job recommendation domains, a special kind of recommendation systems called
Reciprocal Recommendation Systems (RRS) have gained big momentum. The reciprocal recommendation problem is sharply different from the more traditional items-to-users recommendation, since recommendations must satisfy both parties, i.e., both parties can express their likes and dislikes, and a good match requires meeting the preferences of both. Examples of RRS include: online recruitment systems (e.g.,
LinkedIn), where a job seeker searches for jobs matching his/her preferences, say salary and expectations, and a recruiter seeks suitable candidates to fulfil the job requirements; heterosexual online dating systems (e.g., Tinder), where people have the common goal of finding a partner of the opposite gender; roommate matching systems (e.g., Badi), used to connect people looking for a room to those looking for a roommate; online mentoring systems; customer-to-customer marketplaces; etc.

From a Machine Learning perspective, the main challenge in a RRS is thus to learn reciprocated preferences, since the goal of the system is not just to predict a user's preference towards a passive item (a book, a movie, etc.), but to recommend to the targeted user on one side another user from the other side such that a mutual interest exists. Importantly enough, the interaction the two involved users have with the system is often staged and unsynced. Consider, for instance, a scenario where a user, Geena, is recommended to another user, Bob. The recommendation is successful only if both Geena and Bob mutually agree that the recommendation is good. In the first stage, Bob logs into the system and Geena gets recommended to him; this is like in a standard recommendation system: Bob will give a feedback (say, positive) to the system regarding Geena. Geena may never know that she has been recommended to Bob. In a subsequent stage, some time in the future, Geena also logs in. In an attempt to find a match, the system now recommends Bob to Geena.
It is only when Geena also responds positively that the reciprocal recommendation becomes successful.

The problem of reciprocal recommendation has so far been studied mainly in the Data Mining, Recommendation Systems, and Social Network Analysis literature (e.g., [7, 1, 14, 13, 11, 17, 21, 3, 15]), with some interesting adaptations of standard collaborative filtering approaches to user feature similarity, but it has remained largely unexplored from a theoretical standpoint. Although each application domain has its own specificity, in this paper we abstract such details away and focus on the broad problem of building matches between the two parties in the reciprocal recommendation problem based on behavioral information only. In particular, we do not consider explicit user preferences (e.g., those evinced by user profiles), but only the implicit ones, i.e., those derived from past user behavior. The explicit-vs-implicit user feature distinction is a standard dichotomy in Recommendation System practice, and it is by now common knowledge that collaborative effects (aka implicit features) carry far more information about actual user preferences than explicit features, like, for instance, demographic metadata [16]. Similar experimental findings are also reported in the context of RRS in the online dating domain [2].

In this paper, we initiate a rigorous theoretical investigation of the reciprocal recommendation problem, and we view it as a sequential learning problem where learning proceeds in a sequence of rounds. At each round, a user from one of the two parties becomes active and, based on past feedback, the learning algorithm (called matchmaker) is compelled to recommend one user from the other party. The broad goal of the algorithm is to uncover as many mutual interests (called matches) as possible, and to do so as quickly as possible.
We formalize our learning model in Section 2. After observing that, in the absence of structural assumptions about matches, learning is virtually precluded (Section 3), we come to consider a reasonable clusterability assumption on the preferences of users at both sides. Under these assumptions, we design and analyze a computationally efficient matchmaking algorithm that leverages the correlation across matches. We show that the number of uncovered matches within T rounds is comparable (up to constant factors) to that achieved by an optimal algorithm that knows beforehand all user preferences, provided T and the total number of matches to be uncovered are not too small (Sections 3 and 4). Finally, in Section 5 we present a suite of initial experiments, where we contrast (a version of) our algorithm to noncluster-based random baselines on both synthetic and publicly available real-world benchmarks in the domain of online dating. Our experiments serve the twofold purpose of validating our structural assumptions on user preferences against real data, and showing the improved matchmaking performance of our algorithm, as compared to simple noncluster-based baselines.

Footnotes: https://tinder.com. https://badiapp.com/en. For instance, users in an online dating system have relevant visual features, and the system needs specific care in removing popular user bias, i.e., ensuring that popular users are not recommended more often than unpopular ones.
[Figure 1 here. Panel (b) displays the two sign matrices σ: B × G → {−1,+1} and σ: G × B → {−1,+1}; the matrix entries are omitted from this text version.]

Figure 1: (a)
The (complete and directed) bipartite graph (⟨B, G⟩, E, σ) with n = |B| = |G| = 4; edges are only sketched. (b) Representation of the σ function through its two pieces σ: B × G → {−1,+1} (B × G matrix on the left) and σ: G × B → {−1,+1} (G × B matrix on the right). For instance, in this graph, Boy 1 likes Girl 1 and Girl 3, and dislikes Girl 2 and Girl 4, while Girl 3 likes Boy 1, and dislikes Boys 2, 3, and 4. Out of the n² = 16 pairs of reciprocal edges, this graph admits only M = 4 matches, which are denoted by green circles on both matrices. For instance, the pairing of edges (1, 3) and (3, 1) is a match, since Boy 1 likes Girl 3 and, at the same time, Girl 3 likes Boy 1. (c) The associated (undirected and bipartite) matching graph M. We have, for instance, deg_M(Girl 1) = 3 and deg_M(Boy 2) = 1.

We first introduce our basic notation. We have a set of users V partitioned into two parties. Though a number of alternative metaphors could be adopted here, for concreteness we call the two parties B (for "boys") and G (for "girls"). Throughout this paper, g, g′ and g″ will be used to denote generic members of G, and b, b′ and b″ to denote generic members of B. For simplicity, we assume the two parties B and G have the same size n. A hidden ground truth about the mutual preferences of the members of the two parties is encoded by a sign function σ: (B × G) ∪ (G × B) → {−1,+1}. Specifically, for a pairing (b, g) ∈ B × G, the assignment σ(b, g) = +1 means that boy b likes girl g, and σ(b, g) = −1 means that boy b dislikes girl g. Likewise, given a pairing (g, b) ∈ G × B, we have σ(g, b) = +1 when girl g likes boy b, and σ(g, b) = −1 when girl g dislikes boy b.
The ground truth σ therefore defines a directed bipartite signed graph, collectively denoted as (⟨B, G⟩, E, σ), where E, the set of directed edges in this graph, is simply (B × G) ∪ (G × B), i.e., the set of all possible 2n² directed edges in this bipartite graph. A "+1" edge will sometimes be called a positive edge, while a "−1" edge will be called a negative edge. Any pair of directed edges (g, b) ∈ G × B and (b, g) ∈ B × G involving the same two subjects g and b is called a reciprocal pair of edges. We also say that (g, b) is reciprocal to (b, g), and vice versa. The pairing of signed edges (g, b) and (b, g) is called a match if and only if σ(b, g) = σ(g, b) = +1. The total number of matches will often be denoted by M. See Figure 1 for a pictorial illustration.

Coarsely speaking, the goal of a learning algorithm A is to uncover in a sequential fashion as many matches as possible, as quickly as possible. More precisely, we are given a time horizon T ≤ n², e.g., T = n√n, and at each round t = 1, ..., T:

(1B) A receives the id of a boy b chosen uniformly at random from B (b is meant to be the "next boy" that logs into the system);
(2B) A selects a girl g′ ∈ G to recommend to b;
(3B) b provides feedback to the learner, in that the sign σ(b, g′) of the selected boy-to-girl edge is revealed to A.

Within the same round t, the three steps described above are subsequently executed after switching the roles of G and B (and will therefore be called Steps (1G), (2G), and (3G)). Hence, each round t is made up of two halves: the first half, where a boy drawn at random is logged into the system and the learner A is compelled to select a girl, and the second half, where a girl drawn at random is logged in and A has to select a boy.
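To make the round structure concrete, the protocol above can be sketched as a small simulation harness. This is our own illustration, not part of the paper: the `run_protocol` name, the matchmaker interface, and the `("B", i)`/`("G", j)` user encoding are all assumptions.

```python
import random

def run_protocol(n, T, sigma, matchmaker, seed=0):
    """Simulate T rounds of the two-sided protocol.

    sigma maps directed pairs (u, v) to a sign in {-1, +1};
    the matchmaker exposes recommend_girl(b, observed) and
    recommend_boy(g, observed), playing Steps (2B) and (2G).
    """
    rng = random.Random(seed)
    observed = {}    # directed edges whose sign has been revealed so far
    matches = set()  # uncovered matches, stored as frozensets {b, g}
    for _ in range(T):
        # First half: a random boy logs in and a girl is recommended.
        b = ("B", rng.randrange(n))
        g_rec = matchmaker.recommend_girl(b, observed)
        observed[(b, g_rec)] = sigma[(b, g_rec)]
        # Second half: a random girl logs in and a boy is recommended.
        g = ("G", rng.randrange(n))
        b_rec = matchmaker.recommend_boy(g, observed)
        observed[(g, b_rec)] = sigma[(g, b_rec)]
        # A match is uncovered once both reciprocal edges are seen positive.
        for u, v in ((b, g_rec), (g, b_rec)):
            if observed.get((u, v)) == 1 and observed.get((v, u)) == 1:
                matches.add(frozenset((u, v)))
    return matches
```

Any concrete matchmaker (such as OOMM or SMILE below) can be plugged into this loop through the two `recommend_*` callbacks.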
Thus at each round t, A observes the sign of the two directed edges (b, g′) and (g, b′), where b ∈ B and g ∈ G are generated uniformly at random by the environment, and g′ and b′ are the outcome of A's recommendation effort. (Though different distributional assumptions could be made, for technical simplicity in this paper we decided to focus on the uniform distribution only.) Notice that we assume the ground truth encoded by σ is persistent and noiseless, so that, whereas the same user (boy or girl) may recur several times throughout the rounds due to their random generation, there is no point for the learner to request the sign of the same edge twice at two different rounds. The goal of algorithm A is to maximize the number of uncovered matches within the T rounds. The signs of the two reciprocal edges giving rise to a match need not be selected by A in the same round; the round where the match is uncovered
is the round when the reciprocating edge is selected. For example, if in round t we observe σ(b, g′) = −1 and σ(g, b′) = +1, and in round t′ > t we observe σ(b′, g) = +1 and σ(g″, b″) = +1, we say that the match involving b′ and g has been uncovered only in round t′. In fact, if A has uncovered a positive edge g → b′ in (the second half of) round t, the reciprocating positive edge (b′, g) need not be uncovered any time soon, since A has at the very least to wait until b′ logs into the system, an event which on average will occur only n rounds later.

We call matching graph, and denote by M, the bipartite and undirected graph having B ∪ G as nodes, where (b, g) ∈ B × G is an edge in M if and only if b and g determine a match in the original graph (⟨B, G⟩, E, σ). Given b ∈ B, we let N_M(b) ⊆ G be the set of matching girls for b according to σ, and deg_M(b) be the number of such girls. N_M(g) and deg_M(g) are defined symmetrically. See again Figure 1 for an example.

The performance of algorithm A is measured by the number of matches found by A within the T rounds. Specifically, if M_t(A) is the number of matches uncovered by A after t rounds of a given run, we would like to obtain lower bounds on M_T(A) that hold with high probability over the random generation of boys and girls that log into the system, as well as the internal randomization of A. To this effect, we shall repeatedly use in our statements the acronym w.h.p. to signify with probability at least 1 − O(1/n), as n → ∞. It will also be convenient to denote by E_t(A) the set of directed edges selected by A during the first t rounds, with E_0(A) = ∅. A given run of A may therefore be summarized by the sequence {E_t(A)}_{t=1}^T. Likewise, E^r_t(A) will denote the set of reciprocal (not necessarily matching) directed edges selected by A up to time t.
Finally, E^r will denote the set of all |B| · |G| = n² pairs of reciprocal (not necessarily matching) edges between B and G.

We will first show (Section 3) that, in the absence of further assumptions on the way the matches are located, there is not much one can do but try and simulate a random sampler. In order to further illustrate our model, the same section introduces a reference optimal behavior that assumes prior knowledge of the whole sign function σ. This will be taken as a yardstick to be contrasted to the performance of our algorithm SMILE (Section 4), which works under more specific, yet reasonable, structural assumptions on σ.

We now show that, in the absence of specific assumptions on σ, the best thing to do in order to uncover matches is to reciprocate at random, no matter how big the number M of matches actually is.

Theorem 1
Given B and G such that |B| = |G| = n, and any integer m ≤ n², there exists a randomized strategy for generating σ such that M = m, and the expected number of matches uncovered by any algorithm A operating on (⟨B, G⟩, E, σ) satisfies E[M_T(A)] = O((T/n²) M).

An algorithm matching the above upper bound is described next. We call this algorithm
OOMM (Oblivious Online Match Maker). The main idea is to develop a strategy that is able to draw uniformly at random as many pairs of reciprocal edges as possible from E^r (recall that E^r is the set of all reciprocal edge pairs between B and G). In particular, within the T rounds, OOMM will draw uniformly at random Θ(T)-many such pairs. The pseudocode of OOMM is given next. For brevity, throughout this paper an algorithm will be described only through Steps (2B) and (2G) (recall Section 2).

OOMM simply operates as follows. In Step (2B) of round t, the algorithm chooses a girl g′ uniformly at random from the whole set G. OOMM maintains over time the set B_{g,t} ⊆ B of all boys that so far gave their feedback (either positive or negative) on g, but for whom the feedback from g is not available yet. In Step (2G), if B_{g,t} is not empty, OOMM chooses a boy uniformly at random from B_{g,t}; otherwise, it selects a boy uniformly at random from the whole set B.

Footnotes: All proofs are provided in the appendix. Recall that an upper bound on M_T(A) is a negative result here, since we aim at making M_T(A) as large as possible. A boy could be selected more than once while serving a girl g during the T rounds; the optimality of OOMM (see Theorems 1 and 2) implies that this redundancy does not significantly affect
OOMM's performance.

Algorithm 1: OOMM (Oblivious Online Match Maker)
INPUT: B and G.
At each round t:
  (2B) Select g′ uniformly at random from G;
  (2G) B_{g,t} ← { b″ ∈ B : (b″, g) ∈ E_t(OOMM), (g, b″) ∉ E_{t−1}(OOMM) };
       If B_{g,t} ≠ ∅ then select b′ uniformly at random from B_{g,t};
       else select b′ uniformly at random from B.

Note that, the way it is designed, the selection of g′ and b′ does not depend on the signs σ(b, g) or σ(g, b) collected so far. The following theorem guarantees that E[M_T(OOMM)] = Θ((T/n²) M), which is as if we were able to directly sample, in most of the T rounds, pairs of reciprocal edges.

Theorem 2
Given any input graph (⟨B, G⟩, E, σ), with |B| = |G| = n, if T − n = Ω(n) then E^r_T(OOMM) is selected uniformly at random (with replacement) from E^r, its size |E^r_T(OOMM)| is such that E[|E^r_T(OOMM)|] = Θ(T), and the expected number of matches disclosed by OOMM is such that E[M_T(OOMM)] = Θ((T/n²) M).

We now describe an optimal behavior (called
Omniscient Matchmaker) that assumes prior knowledge of the whole edge sign assignment σ. This optimal behavior will be taken as a reference performance for our algorithm of Section 4. It will also help to better clarify our learning model.

Definition 1
The
Omniscient Matchmaker A* is an optimal strategy based on the prior knowledge of the signs σ(b, g) and σ(g, b) for all b ∈ B and g ∈ G. Specifically, based on this information, A* maximizes the number of matches uncovered during the T rounds over all n^{2T} possible selections that can be made in Steps (2B) and (2G). We denote this optimal number of matches by M*_T = M_T(A*).

Observe that when the matching graph M is such that deg_M(u) > T/n for some user u ∈ B ∪ G, no algorithm will be able to uncover all M matches in expectation, since Steps (1B) and (1G) of our learning protocol entail that the expected number of times each user u logs into the system is equal to T/n. In fact, this holds even for the Omniscient Matchmaker A*, despite the prior knowledge of σ. For instance, when M turns out to be a random bipartite graph, the expected number of matches that any algorithm can achieve is always upper bounded by O((T/n²) M) (this is how Theorem 1 is proven; see Appendix B). On the other hand, in order to have M*_T = Θ(M) as n grows large, it is sufficient that deg_M(u) ≤ T/n holds for all users u ∈ B ∪ G, even with such a random M. In order to avoid the pitfalls of M being a random bipartite graph (and hence the negative result of Theorem 1), we need to slightly depart from our general model of Section 2, and make structural assumptions on the way matches can be generated. The next section formulates such assumptions, and analyzes an algorithm that, under these assumptions, is essentially optimal in terms of the number of uncovered matches. The assumptions and the algorithm itself are then validated against simple baselines on real-world data in the domain of online dating (Section 5).

In a nutshell, our model is based on the extent to which it is possible to arrange the users in
(possibly) overlapping clusters by means of the feedbacks they may potentially receive from the other party. In order to formally describe our model, it will be convenient to introduce the Boolean preference matrices B, G ∈ {0,1}^{n×n}. These two matrices collect in their rows the ground truth contained in σ, separating the two parties B and G. Specifically, B_{i,j} = (1 + σ(b_i, g_j))/2 and G_{i,j} = (1 + σ(g_i, b_j))/2 (these are essentially the matrices exemplified in Figure 1(b), where the "−1" signs therein are replaced by "0"). Then, we consider the n column vectors of B (resp. G), i.e., the whole set of feedbacks that each g ∈ G (resp. b ∈ B) may receive from members of B (resp. G) and, for a given radius ρ ≥ 0, the associated covering number of this set of Boolean vectors w.r.t. the Hamming distance. We recall that the covering number at radius ρ is the smallest number of balls of radius ≤ ρ that are needed to cover the entire set of n vectors. The smaller ρ, the higher the covering number. If the covering number stays small despite a small ρ, then our n vectors can be clustered into a small number of clusters, each one having a small (Hamming) radius.

As we mentioned in Section 3, a reasonable model for this problem is one for which our learning task can be solved in a nontrivial manner, thereby specifically avoiding the pitfalls of M being a random bipartite graph. (The matching graph M is a random bipartite graph if any edge (b, g) ∈ B × G is generated independently with the same probability p ∈ [0, 1].) It is therefore worth exploring what pairs of radii and covering numbers may be associated with the two preference matrices G and B when M is indeed random bipartite. Assume M = o(n²), so as to avoid pathological cases. When M is random bipartite, one can show that we may have ρ = Ω(M/n) even when the two covering numbers are both 1. Hence, the only interesting regime is when ρ = o(M/n). Within this regime, our broad modeling assumption is that the resulting covering numbers for G and B are o(n), i.e., less than linear in n when n grows large.
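As an illustration of the covering-number notion (our own sketch, not part of the paper's machinery), a simple greedy procedure upper-bounds the covering number of a set of Boolean column vectors at a given radius:

```python
def hamming(u, v):
    """Hamming distance between two equal-length 0/1 vectors."""
    return sum(a != b for a, b in zip(u, v))

def greedy_cover(vectors, rho):
    """Greedily pick centers until every vector lies within Hamming
    distance rho of some center.  The number of centers returned
    upper-bounds the covering number at radius rho (greedy covering
    is only an approximation, not the exact optimum)."""
    centers = []
    uncovered = list(vectors)
    while uncovered:
        c = uncovered[0]      # promote any still-uncovered vector to a center
        centers.append(c)
        uncovered = [v for v in uncovered if hamming(c, v) > rho]
    return centers
```

Running it at ρ = 0 returns one center per distinct vector, consistent with the remark above that the covering number grows as ρ shrinks.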
Related work.
The approach of clustering users according to their description/preference similarities while exploiting user feedback is similar in spirit to the two-sided clusterability assumptions investigated, e.g., in [1], which is based on a mixture of explicit and implicit (collaborative filtering-like) user features. Yet, as far as we are aware, ours is the first model that lends itself to a rigorous theoretical quantification of matchmaking performance (see Section 4.1). Moreover, in general, in our case the user set is not partitioned as in previous RRS models. Each user may in fact belong to more than one cluster, which is apparently more natural for this problem.

The reader might also wonder whether the reciprocal recommendation task and the associated modeling assumptions share any similarity with the problem of (online) matrix completion/prediction. Recovering a matrix from a sample of its entries has been widely analyzed by a number of authors with different approaches, viewpoints, and assumptions, e.g., in Statistics and Optimization (e.g., [5, 12]), in Online Learning (e.g., [18, 19, 20, 9, 8, 6, 10]), and beyond. In fact, one may wonder if the problem of predicting the entries of matrices B and G may somehow be equivalent to the problem of disclosing matches between B and G. A closer look reveals that the two tasks are somewhat related, but not quite equivalent, since in reciprocal recommendation the task is to search for matching "ones" between the two binary matrices B and G by observing entries of the two matrices separately. In addition, because we get to see at each round the sign of two pairings (b, g′) and (g, b′), where b and g are drawn at random and b′ and g′ are selected by the matchmaker, our learning protocol is rather half-stochastic and half-active, which makes the way we gather information about matrix entries quite different from what is usually assumed in the available literature on matrix completion.
Under the above modeling assumptions, our goal is to design an efficient matchmaker. We specifically focus on the ability of our algorithm to disclose Θ(M) matches in the regime where the optimal number of matches M*_T is also Θ(M). Recall from Section 3 that the latter assumption is needed so as to make the uncovering of Θ(M) matches possible within the T rounds. Our algorithm, called SMILE (Sampling Matching Information Leaving out Exceptions), is described as Algorithm 2. The algorithm depends on an input parameter S ∈ [log(n), n/log(n)] and, after randomly shuffling both B and G, operates in three phases: Phase 0 (described at the end), Phase I, and Phase II.

Algorithm 2:
SMILE (Sampling Matching Information Leaving out Exceptions)
INPUT: B and G; parameter S > 0.
Randomly shuffle sets B and G;
Phase 0: Run OOMM to provide an estimate M̂ of M;
Phase I: (C, F) ← Cluster Estimation(⟨B, G⟩, S);
Phase II: User Matching(⟨B, G⟩, (C, F));

Phase I (Cluster Estimation).
SMILE approximates the clustering over users by: (i) asking, for each cluster representative b ∈ B, Θ(n) feedbacks (i.e., edge signs) selected at random from G (and operating symmetrically for each representative g ∈ G); (ii) asking Θ(S)-many feedbacks for each remaining user, where parameter S will be set later. In doing so, SMILE will be in a position to estimate the clusters each user belongs to, that is, to estimate the matching graph M, the misprediction per user being w.h.p. of the order of n log(n)/S. The estimated M will then be used in Phase II.

Procedure Cluster Estimation –
SMILE (Phase I)
INPUT: B and G, parameter S > 0.
OUTPUT: Set of clusters C, set of feedbacks F.
Init:
  • F_u ← ∅ for all u ∈ B ∪ G;  /* One set of feedbacks per user u ∈ B ∪ G */
  • B_r ← ∅; G_r ← ∅;  /* One set of cluster representatives per side */
  • r_u ← 0 for all u ∈ B ∪ G;  /* No user is candidate to belong to B_r ∪ G_r */
Let G = {g_1, ..., g_n}, B = {b_1, ..., b_n}, S′ = 2S + 4√(S log n), i, j ← 1;
At each round t:
  if i ≤ n ∨ j ≤ n then
    (2B) Let b ∈ B be the boy selected in Step (1B);
      if i ≤ n then
        Select g_i; F_{g_i} ← F_{g_i} ∪ {b};
        if |F_{g_i}| = S′ ∧ r_{g_i} = 0 then  /* Try to assign g_i to some cluster based on G_r */
          if ∃ g_r ∈ G_r : ∀ b′ ∈ F_{g_i} ∩ F_{g_r}, s(b′, g_i) = s(b′, g_r) then
            Set cluster(g_i) = g_r; i ← i + 1;
          else r_{g_i} ← 1;  /* g_i will be included into G_r as soon as |F_{g_i}| = n */
        if |F_{g_i}| = n then  /* g_i is a cluster representative */
          G_r ← G_r ∪ {g_i}; i ← i + 1;
      else Select g ∈ G arbitrarily;
    (2G) Do the same as in Step (2B) after switching B with G, b with g, B_r with G_r, i with j, etc.
  else
    Set: • cluster(g_r) = g_r for all g_r ∈ G_r; • C ← ∪_{u ∈ B ∪ G} {(u, cluster(u))}; • F ← ∪_{u ∈ B ∪ G} {(u, F_u)};
    return (C, F).

Procedure
User Matching –
SMILE (Phase II)
INPUT: B and G, set of clusters C, set of feedbacks F.
At each round t:
  (2B) Let b ∈ B be the boy selected in Step (1B);
    if ∃ g ∈ G : b ∈ F(cluster(g)) ∧ g ∈ F(cluster(b)) ∧ s(b, cluster(g)) = s(g, cluster(b)) = 1 ∧ (b, g) ∉ E_t(SMILE) then select g;
    else select g ∈ G arbitrarily;
  (2G) Do the same as in Step (2B) after switching B with G, and b with g.

A more detailed description of the Cluster Estimation procedure follows (see also the pseudocode). For convenience, we focus on clustering G (hence observing feedbacks from B to G); the procedure operates in a completely symmetric way on B. Let F_g be the set of all b ∈ B who provided feedback on g ∈ G so far. Assume for the moment that we have at our disposal a subset G_r ⊆ G containing one representative for each cluster induced by the feedback of B, and that for each g ∈ G_r we have already observed n feedbacks provided by n distinct members of B, selected uniformly at random from B. Also, let B(g, S) be a subset of B obtained by sampling at random S′ = 2S + 4√(S log n)-many b from B. Then a Chernoff–Hoeffding bound argument shows that for any g ∈ G \ G_r and any g_r ∈ G_r we have w.h.p. |B(g, S) ∩ F_{g_r}| ≥ S. We use the above to estimate the cluster each g ∈ G \ G_r belongs to. This task could be accomplished by finding g_r ∈ G_r who receives the same set of feedbacks as that of g, i.e., who belongs to the same cluster as g. Yet, in the absence of the feedback provided by all b ∈ B to both g and g_r, it is not possible to obtain this information with certainty. The algorithm simply estimates g's cluster by exploiting Step (1B) of the protocol to ask for feedback on g from S′ = S′(S) randomly selected b ∈ B, which will be seen as forming the subset B(g, S). We shall therefore assign g to the cluster represented by an arbitrary g_r ∈ G_r such that s(b, g) = s(b, g_r) for all b ∈ B(g, S) ∩ F_{g_r}.
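The assignment rule just described (agree with a representative on every commonly rated boy) can be sketched as follows. The dict-based feedback layout and the function name are our own assumptions, not the paper's notation:

```python
def assign_cluster(g, reps, feedback):
    """Return a representative whose observed feedback agrees with g's
    on every boy who rated both, or None if no representative fits
    (in which case g would itself be promoted to the set G_r).

    feedback[(b, u)] is the sign boy b gave to user u, when observed."""
    boys_g = {b for (b, u) in feedback if u == g}
    for g_r in reps:
        boys_r = {b for (b, u) in feedback if u == g_r}
        common = boys_g & boys_r   # plays the role of B(g, S) ∩ F_{g_r}
        # Require a nonempty overlap (the Chernoff-Hoeffding argument in
        # the text guarantees an overlap of size >= S w.h.p.) and full
        # agreement of signs on it.
        if common and all(feedback[(b, g)] == feedback[(b, g_r)]
                          for b in common):
            return g_r
    return None
```

As the text notes, agreement on the sampled overlap only estimates cluster membership; certainty would require the feedback of all b ∈ B on both g and g_r.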
We proceed this way for all g ∈ G \ G_r.

We now remove the assumption on G_r. Although we initially do not have G_r, we can build, through a concentration argument, an approximate version of G_r while asking for the feedback B(g, S) on each unclustered g. The Cluster Estimation procedure does so by processing girls g sequentially, as described next. Recall that G was randomly shuffled into an ordered sequence G = {g_1, g_2, ..., g_n}. The algorithm maintains an index i over G that only moves forward, and collects feedback information for g_i. At any given round, G_r contains all cluster representatives found so far. Given b ∈ B that needs to be served during round t (Step (1B)), we include b in F_{g_i}. If |F_{g_i}| becomes as big as S′, then we look for g_r ∈ G_r so as to estimate g_i's cluster. If we succeed, index i is incremented and the algorithm will collect feedback for the next girl during the next rounds. If we do not succeed, g_i will be included in G_r, and the algorithm will continue to collect feedback on g_i until |F_{g_i}| = n. When |F_{g_i}| = n, index i is incremented, so as to consider the next member of G. Phase I terminates when we have estimated the cluster of each b and g that are themselves not representative of any cluster. Finally, when we have concluded with one of the two sides, but not with the other (e.g., we are done with G but not with B), we continue with the unterminated side, while for the terminated one we can select members (g ∈ G in this case) in Step (2B) arbitrarily.

Phase II (User Matching). In Phase II (see pseudocode), we exploit the feedback collected in Phase I so as to match as many pairs (b, g) as possible.
For each user u ∈ B ∪ G selected in Step (1B) or Step (1G), we pick in Step (2G) or (2B) a user u′ from the other side such that u′ belongs to an estimated cluster which is among the set of clusters whose members are liked by u, and vice versa. When no such u′ exists, we select u′ from the other side arbitrarily.

Phase 0: Estimating M. In the appendix we show that the optimal tuning of S is to set it as a function of the number of hidden matches M. Since M is unknown, we run a preliminary phase where we run OOMM (from Section 3) for a few rounds. Using Theorem 2, it is not hard to show that the number T_M̂ of rounds taken by this preliminary phase to find an estimate M̂ of M which is w.h.p. accurate up to a constant factor satisfies T_M̂ = Θ(n² log(n)/M).

In order to quantify the performance of SMILE, it will be convenient to refer to the definition of the Boolean preference matrices B, G ∈ {0,1}^{n×n}. For a given radius ρ ≥ 0, we denote by C^G_ρ the covering number of the n column vectors of B w.r.t. the Hamming distance. In a similar fashion we define C^B_ρ. Moreover, let C^G and C^B be the total number of cluster representatives for girls and boys, respectively, found by SMILE, i.e., C^G = |G_r| and C^B = |B_r| at the end of the T rounds. The following theorem shows that when the optimal number of matches M*_T is M, then so is M_T(SMILE), up to a constant factor, provided M and T are not too small.

Theorem 3
Given any input graph $(\langle B, G \rangle, E, \sigma)$ with $|B| = |G| = n$, such that $M^*_T = M$ w.h.p. as $n$ grows large, we have
$$C_G \le \min\Bigl\{\min_{\rho \ge 0}\Bigl(C^G_{\rho/2} + \tfrac{3\rho}{S'}\Bigr),\, n\Bigr\}, \qquad C_B \le \min\Bigl\{\min_{\rho \ge 0}\Bigl(C^B_{\rho/2} + \tfrac{3\rho}{S'}\Bigr),\, n\Bigr\}.$$
Furthermore, when $T$ and $M$ are such that $T = \omega\bigl(n(C_G + C_B + S')\bigr)$ and $M = \omega\bigl(\tfrac{n^2 \log(n)}{S}\bigr)$, then we have w.h.p. $M_T(\mathrm{SMILE}) = \Theta(M)$.

Notice the role played in the above theorem by the bounds on $C_G$ and $C_B$. If the minimizing $\rho$ therein gives $C_G = C_B = n$, we have enough degrees of freedom for $M$ to be generated as a random bipartite graph. On the other hand, when $C_G$ and $C_B$ are significantly smaller than $n$ at the minimizing $\rho$ (which is what we expect to happen in practice), the resulting $M$ has a cluster structure that is not compatible with a random bipartite graph. This entails that, on both sides of the bipartite graph, each subject receives from the other side a set of preferences that can be collectively clustered into a relatively small number of clusters of small radius. Then the number of rounds $T$ that SMILE takes to achieve (up to a constant factor) the same number of matches $M^*_T$ as the Omniscient Matchmaker drops significantly. In particular, when $S$ in SMILE is set as a function of (an estimate of) $M$, we have the following result.

Corollary 1
Given any input graph $(\langle B, G \rangle, E, \sigma)$ with $|B| = |G| = n$, such that $M^*_T = M$ w.h.p. as $n$ grows large, with $T$ and $M$ satisfying $T = \omega\bigl(n(C_G + C_B) + \tfrac{n^3 \log(n)}{M}\bigr)$, where $C_G$ and $C_B$ are upper bounded as in Theorem 3, we have w.h.p. $M_T(\mathrm{SMILE}) = \Theta(M)$.

In order to evaluate in detail the performance of
SMILE, it is natural to ask to what extent the conditions bounding $T$ from below in Theorem 3 are necessary. We have the following general limitation, holding for any matchmaker $A$.

Theorem 4
Given $B$ and $G$ such that $|B| = |G| = n$, any integer $m \in \bigl(n \log(n),\, n^2 - n \log(n)\bigr)$, and any algorithm $A$ operating on $(\langle B, G \rangle, E, \sigma)$, there exists a randomized strategy for generating $\sigma$ such that $m - \tfrac{n}{C^G + C^B} - 1 < M \le m$, and the number of rounds $T$ needed to achieve $\mathbb{E}\, M_T(A) = \Theta(M)$ satisfies $T = \Omega\bigl(n(C^G + C^B) + M\bigr)$ as $n \to \infty$.

Remark 1
One can verify that the time bound for SMILE established in Corollary 1 is nearly optimal whenever $M = \omega\bigl(n^2/\sqrt{\log(n)}\bigr)$. To see this, observe that by definition we have $C_G \le C^G$ and $C_B \le C^B$. Now, if $M = \omega\bigl(n^2/\sqrt{\log(n)}\bigr)$, then the additive term $\tfrac{n^3 \log(n)}{M}$ becomes $o(M)$, and the condition on $T$ in Corollary 1 simply becomes $T = \omega\bigl(n(C_G + C_B + M')\bigr)$, where $M' = o(M)$. This has to be contrasted with the lower bound on $T$ contained in Theorem 4.

We now explain why it is possible that, when $M = \omega\bigl(n^2/\sqrt{\log n}\bigr)$, the additive term $\tfrac{n^3 \log n}{M}$ in the bound $T = \omega\bigl(n(C_G + C_B) + \tfrac{n^3 \log(n)}{M}\bigr)$ of Corollary 1 becomes $o(M)$, while the first term $n(C_G + C_B)$ can be upper bounded by $n(C^G + C^B)$. Since the lower bound $T = \Omega\bigl(n(C^G + C^B) + M\bigr)$ of Theorem 4 has a linear dependence on $M$, it might seem quite surprising that the larger $M$ is, the smaller the second term in the bound of Corollary 1 becomes. However, it is important to take into account that in Corollary 1, $T$ must also be large enough to satisfy the condition $M^*_T = M$. Let $T^*$ be the number of rounds $T$ necessary to satisfy w.h.p. $M^*_T = M$. In Corollary 1, both conditions $T \ge T^*$ and $T = \omega\bigl(n(C_G + C_B) + \tfrac{n^3 \log(n)}{M}\bigr)$ must hold simultaneously. When $M$ is large, the number of rounds needed to satisfy the former condition becomes much larger than the one needed for the latter.

As a further insight, consider the following. We either have $M = O\bigl(n(C^G + C^B)\bigr)$ or $M = \omega\bigl(n(C^G + C^B)\bigr)$. In the first case, the lower bound in Theorem 4 clearly becomes $T = \Omega\bigl(n(C^G + C^B)\bigr)$, hence not directly depending on $M$. In the second case, whenever $M = \omega\bigl(n^2/\sqrt{\log(n)}\bigr)$, $T^*$ is larger than $n(C_G + C_B) + \tfrac{n^3 \log(n)}{M}$ since, by definition, we must have $T^* = \Omega(M)$, while in this case $n(C_G + C_B) + \tfrac{n^3 \log(n)}{M} = o(M)$. In conclusion, if the number of rounds SMILE takes to uncover $\Theta(M)$ matches equals the number of rounds taken by the Omniscient Matchmaker to uncover exactly $M$ matches, then SMILE is optimal up to a constant factor, because no algorithm can outperform the Omniscient Matchmaker. This provides a crucially important insight into the key factors allowing the additive term $\tfrac{n^3 \log n}{M}$ to be $o(M)$ in Corollary 1, and is indeed one of the keystones in the proof of Theorem 3 (see Appendix B).

Table 1: Relevant properties of our datasets. The last columns report an approximation to the numbers of clusters $|C(B)|$ and $|C(G)|$ when we allow three decreasing radii of order $n/\log(n)$ between users of the same cluster.

Synthetic datasets (2000 boys and 2000 girls):

  Dataset        |C(B)|  |C(G)|   approx. clusters at the three radii (B/G)
  S-20-23          20      23     20/23,    20/23,    445/429
  S-95-100         95     100     95/100,   95/100,   603/624
  S-500-480       500     480     500/480,  500/480,  983/950
  S-2000-2000    2000    2000     (entries not recoverable)

Real-world datasets:

  Dataset         |B|     |G|    approx. clusters at the three radii (B/G)
  RW-1007-1286   1007    1286    53/48,   177/216,  385/508
  RW-1526-2564   1526    2564    37/45,   138/216,  339/601
  RW-2265-3939   2265    3939    42/45,   145/215,  306/622

We conclude this section by emphasizing the fact that
SMILE is indeed quite scalable. As proven in Appendix B, there exists an implementation of SMILE that leverages a combined use of suitable data structures, leading to both time and space efficiency.
Theorem 5
The running time of SMILE is $O\bigl(T + nS(C_G + C_B)\bigr)$, and the memory requirement is $O\bigl(n(C_G + C_B)\bigr)$. Furthermore, when $T = \omega\bigl(n(C_G + C_B) + \tfrac{n^3 \log(n)}{M}\bigr)$, as required by Corollary 1, the amortized time per round is $\Theta(1) + o(C_G + C_B)$, which is always sublinear in $n$.

Experiments

In this section we evaluate the performance of (a variant of) our algorithm by empirically contrasting it with simple baselines on synthetic and real-world datasets from the online dating domain. The comparison on real-world data also serves as a validation of our modeling assumptions.
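As a concrete illustration of the clustered preference structure our model assumes (and which underlies the synthetic datasets described below), noisy block-structured preference matrices can be generated along the following lines. The cluster-level like probability `p_like`, the round-robin cluster assignment, and all helper names are illustrative assumptions; only the $1/(2 \log n)$ sign-flip noise is taken from the text.

```python
import math
import random

def make_synthetic(n, num_b_clusters, num_g_clusters, p_like=0.5, seed=0):
    """Sketch of clustered synthetic preference generation (assumed parameters).

    Users on each side are split into clusters; every boy-cluster likes a given
    girl-cluster with probability p_like (one coin per cluster pair), and then
    each individual preference is flipped with probability 1/(2*log n).
    Returns Boolean matrices B (boys -> girls) and G (girls -> boys).
    """
    rng = random.Random(seed)
    flip = 1.0 / (2.0 * math.log(n))
    b_cluster = [i % num_b_clusters for i in range(n)]   # boy  -> cluster id
    g_cluster = [j % num_g_clusters for j in range(n)]   # girl -> cluster id
    bg = [[rng.random() < p_like for _ in range(num_g_clusters)]
          for _ in range(num_b_clusters)]                # boy-cluster likes girl-cluster?
    gb = [[rng.random() < p_like for _ in range(num_b_clusters)]
          for _ in range(num_g_clusters)]                # girl-cluster likes boy-cluster?

    def noisy(bit):
        return (not bit) if rng.random() < flip else bit

    B = [[noisy(bg[b_cluster[i]][g_cluster[j]]) for j in range(n)] for i in range(n)]
    G = [[noisy(gb[g_cluster[j]][b_cluster[i]]) for i in range(n)] for j in range(n)]
    return B, G

# a match is a pair (i, j) with B[i][j] and G[j][i] both True
```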
Datasets.
The relevant properties of our datasets are given in Table 1. Each of our synthetic datasets has $|B| = |G| = 2000$. We randomly partitioned $B$ and $G$ into $C_B$ and $C_G$ clusters, respectively. Each boy likes all the girls of a given cluster with a fixed probability, and dislikes them otherwise; we do the same for the preferences from girls to boy clusters. Finally, for each preference (either positive or negative) we reverse its sign with probability $1/(2 \log n)$ (in our case, $n = 2000$). Notice in Table 1 that, for all four datasets we generated, the number of likes is a constant fraction of $|B| \cdot |G|$.

As for real-world datasets, we used the one from [4], which is also publicly available. This is a dataset from a Czech dating website, where 220,970 users rate each other on a scale from 1 (worst) to 10 (best). The gender of the users is not always available. To get two disjoint parties $B$ and $G$, where each user rates only users from the other party, we disregarded all users whose gender is not specified. As this dataset is very sparse, we extracted dense subsets as follows. We considered as "like" any rating above a fixed threshold, while all other ratings, including the missing ones, are "dislikes". Next, we iteratively removed the users with the smallest number of ratings until we met some desired density level. Specifically, we executed the above process until we obtained two sets $B$ and $G$ such that the number of likes between the two parties exceeds a prescribed density threshold; the three threshold values we used result in the datasets RW-1007-1286, RW-1526-2564, and RW-2265-3939, respectively.

Figure 2: Empirical comparison of the 3 algorithms on datasets S-95-100 (left), RW-1007-1286 (middle), RW-2265-3939 (right). Each plot gives the number of disclosed matches vs. time (number of recommendations). I-SMILE's yellow curve flattens out when there are no more matches to uncover.

Random baselines.
We included as baselines OOMM, from Section 3, and a random method that asks a user for his/her feedback on another user (of the opposite gender) picked uniformly at random. We refer to this algorithm as UROMM.

Implementation of SMILE. In the implementation of SMILE, we slightly deviated from the description in Section 4.1. One important modification is that we interleaved Phase I and Phase II. The high-level idea is to start exploiting the clusters immediately, once some clusters have been identified, without waiting to learn all of them. Additionally, we gave higher priority to exploring the reciprocal feedback of a discovered like, and we avoided doing so in the case of a dislike. Finally, whenever we test whether two users belong to the same cluster, we allow a Hamming radius of a $1/\log(n)$ fraction of the tested entries. The parameter $S'$ in SMILE has been set to $S + \sqrt{S \log n}$. We call the resulting algorithm I-SMILE (Improved SMILE). See Appendix C for more details.
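The relaxed cluster-membership test just described can be sketched as follows. This is a hypothetical reconstruction (the dictionary-based bookkeeping and the function name are our own); only the $1/\log(n)$ disagreement fraction comes from the text.

```python
import math

def same_cluster(obs_u, obs_v, n):
    """Relaxed cluster-membership test in the spirit of I-SMILE (sketch).

    obs_u, obs_v: dicts mapping a tested user id -> observed binary feedback.
    Users u and v are declared to be in the same cluster when their observed
    feedback disagrees on at most a 1/log(n) fraction of the commonly tested
    entries.
    """
    common = set(obs_u) & set(obs_v)
    if not common:
        return False                       # nothing to compare on
    disagreements = sum(obs_u[i] != obs_v[i] for i in common)
    return disagreements <= len(common) / math.log(n)
```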
Evaluation.
To get a complete picture of the behavior of the algorithms for different time horizons, we present for each algorithm the number of discovered matches as a function of $T \in \{1, \ldots, |B||G|\}$. Figure 2 contains 3 representative cases; further plots are given in Appendix C. In all datasets we tested, I-SMILE clearly outperforms UROMM and OOMM. Our experiments confirm that SMILE (and therefore I-SMILE) quickly learns the underlying structure of the likes between users, and uses this structure to reveal the matches between them. Moreover, the variant I-SMILE that we implemented not only performs well on graphs with no underlying structure in the likes, but also discovers matches during the exploration phase, while the clusters are still being learned. A summary of the overall performance of the algorithms is reported in Table 2 in Appendix C, where we give the area-under-the-curve metric, capturing how quickly, on average, the different algorithms learn over time. Again, I-SMILE largely outperforms its competitors.
Conclusions

We have initiated a theoretical investigation of the problem of reciprocal recommendation in an ad hoc model of sequential learning. Under suitable clusterability assumptions, we have introduced an efficient matchmaker called SMILE, and have proven its ability to uncover matches at a speed comparable to that of the Omniscient Matchmaker, so long as $M$ and $T$ are not too small (Theorem 3 and Corollary 1). Our theoretical findings also include a computational complexity analysis (Theorem 5), as well as limitations on the number of disclosable matches in both the general case (Theorem 1) and the cluster case (Theorem 4). We complemented our results with an initial set of experiments on synthetic and real-world datasets in the online dating domain, showing encouraging evidence.

Current ongoing research includes:

i. Introducing suitable noise models for the sign function $\sigma$.

ii. Generalizing our learning model to nonbinary feedback preferences.

iii. Investigating algorithms whose goal is to maximize the area under the curve "number of matches vs. time", i.e., the criterion $\sum_{t \in [T]} M_t(A)$, rather than the one we analyzed in this paper; maximizing this criterion requires interleaving the phases where we collect matches (exploration) and the phases where we do actually disclose them (exploitation).

iv. More experimental comparisons on different datasets against heuristic approaches available in the literature.

References

[1] Joshua Akehurst, Irena Koprinska, Kalina Yacef, Luiz Augusto Pizzato, Judy Kay, and Tomasz Rej. CCR - A content-collaborative reciprocal recommender for online dating. In
IJCAI Int. Jt. Conf. Artif. Intell., pages 2199–2204, 2011.
[2] Joshua Akehurst, Irena Koprinska, Kalina Yacef, Luiz Augusto Pizzato, Judy Kay, and Tomasz Rej. Explicit and Implicit User Preferences in Online Dating. New Front. Appl. Data Min., pages 15–27, 2012.
[3] Ammar Alanazi and Michael Bain. A Scalable People-to-People Hybrid Reciprocal Recommender Using Hidden Markov Models. In , 2016.
[4] Lukas Brozovsky and Vaclav Petricek. Recommender system for online dating service. In Proceedings of Znalosti 2007 Conference, Ostrava, 2007. VSB.
[5] Emmanuel J. Candes and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
[6] Paul Christiano. Online local learning via semidefinite programming. In Proceedings of the Forty-sixth Annual ACM Symposium on Theory of Computing, STOC '14, pages 468–474, 2014.
[7] F. Diaz, D. Metzler, and S. Amer-Yahia. Relevance and ranking in online dating systems. In , pages 66–73, 2010.
[8] C. Gentile, M. Herbster, and S. Pasteris. Online similarity prediction of networked data from known and unknown graphs. In Proceedings of the 26th Conference on Learning Theory (COLT), 2013.
[9] E. Hazan, S. Kale, and S. Shalev-Shwartz. Near-optimal algorithms for online matrix prediction. In Proceedings of the 25th Annual Conference on Learning Theory (COLT'12), 2012.
[10] M. Herbster, S. Pasteris, and M. Pontil. Mistake bounds for binary matrix completion. In NIPS 29, pages 3954–3962, 2016.
[11] Wenxing Hong, Siting Zheng, Huan Wang, and Jianchao Shi. A job recommender system based on user clustering. Journal of Computers, 8(8):1960–1967, 2013.
[12] V. Koltchinskii, K. Lounici, and A. Tsybakov. Nuclear norm penalization and optimal rates for noisy matrix completion. arXiv:1011.6256v4, 2016.
[13] J. Kunegis, G. Gröner, and T. Gottron. Online dating recommender systems: The split-complex number approach. In , 2012.
[14] Lei Li and Tao Li. MEET: A Generalized Framework for Reciprocal Recommender Systems. In Proc. 21st ACM Int. Conf. Inf. Knowl. Manag. (CIKM '12), pages 35–44, 2012.
[15] Saket Maheshwary and Hemant Misra. Matching resumes to jobs via deep siamese network. In Companion Proceedings of The Web Conference 2018, WWW '18, pages 87–88, 2018.
[16] Istvan Pilaszy and Domonkos Tikk. Movies: Even a few ratings are more valuable than metadata. In Proceedings of the 3rd ACM Conference on Recommender Systems (RecSys), 2009.
[17] Luiz Augusto Pizzato, Tomasz Rej, Joshua Akehurst, Irena Koprinska, Kalina Yacef, and Judy Kay. Recommending people to people: the nature of reciprocal recommenders with a case study in online dating. User Model. User-adapt. Interact., 23(5):447–488, 2013.
[18] S. Shalev-Shwartz, Y. Singer, and A. Ng. Online and batch learning of pseudo-metrics. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML 2004. ACM, 2004.
[19] K. Tsuda, G. Rätsch, and M. K. Warmuth. Matrix exponentiated gradient updates for on-line learning and Bregman projections. Journal of Machine Learning Research, 6:995–1018, 2005.
[20] M. K. Warmuth. Winnowing subspaces. In Proceedings of the 24th International Conference on Machine Learning, pages 999–1006, 2007.
[21] Peng Xia, Benyuan Liu, Yizhou Sun, and Cindy Chen. Reciprocal Recommendation System for Online Dating. In Proc. 2015 IEEE/ACM Int. Conf. Adv. Soc. Networks Anal. Min. (ASONAM '15), pages 234–241. ACM Press, 2015.
A Ancillary Lemmas
A.1 Hamming distance-based clustering lemmas
Given an $r \times c$ matrix $A$, an $r$-dimensional vector $\mathbf{c}$, and a subset of indices $Z \subseteq [r]$, let $A(Z, \mathbf{c})$ be the set containing all the column vectors $v$ of $A$ such that $v_i = c_i$ for all indices $i \in Z$. Furthermore, given an integer $k > 0$, we denote by $R_k$ a set of $k$ distinct integers drawn uniformly at random from $[r]$. We have the following lemma.

Lemma 1
Given any matrix $A \in \{0,1\}^{r \times c}$ with $r \ge c > 1$, any column vector $\mathbf{c}$ of $A$, any positive constant $\beta$, and any integer $k \ge \lceil \beta \log r \rceil$, the Hamming distance between $\mathbf{c}$ and any column vector of $A(R_k, \mathbf{c})$ is upper bounded by $\tfrac{\beta r \log r}{k}$, with probability at least $1 - r^{1-\beta}$.

Proof
Let $R_k = \{i_1, i_2, \ldots, i_k\}$. Let $\mathcal{V}(A, \mathbf{c})$ be the set of column vectors $v$ of $A$ such that the Hamming distance between $\mathbf{c}$ and $v$ is larger than $\tfrac{\beta r \log r}{k}$. Clearly, we have $|\mathcal{V}(A, \mathbf{c})| \le c$. Thus, given any vector $v \in \mathcal{V}(A, \mathbf{c})$, the probability that it belongs to $A(R_k, \mathbf{c})$ can be upper bounded as follows:
$$P\bigl(v \in A(R_k, \mathbf{c})\bigr) = P\bigl(v_{i_j} = c_{i_j}\ \forall j \in [k]\bigr) \le \Bigl(1 - \frac{\beta r \log r}{k r}\Bigr)^{k} = \Bigl(1 - \frac{\beta \log r}{k}\Bigr)^{k} \le r^{-\beta}.$$
The probability that there exists at least one column vector belonging to both $\mathcal{V}(A, \mathbf{c})$ and $A(R_k, \mathbf{c})$ can therefore be bounded as follows:
$$P\bigl(\mathcal{V}(A, \mathbf{c}) \cap A(R_k, \mathbf{c}) \ne \emptyset\bigr) \le \sum_{v \in \mathcal{V}(A, \mathbf{c})} P\bigl(v \in A(R_k, \mathbf{c})\bigr) \qquad (1)$$
$$\le |\mathcal{V}(A, \mathbf{c})|\, r^{-\beta} \le c\, r^{-\beta} \le r^{1-\beta}, \qquad (2)$$
where in Equation (1) we simply used the union bound, and in Equation (2) we took into account that $|\mathcal{V}(A, \mathbf{c})| \le c \le r$.

A.2 Setting the SMILE parameter $S$

When putting together the information gathered during Phase I, we may both fail to detect pairs of matching users, and consider some pairs of users as part of $M$ while they are not. In fact, SMILE does not completely recover the structure of $M$; it only creates an approximate matching graph $M'$. Let $E_M$ and $E_{M'}$ be the edge sets of the two matching graphs. The error in reconstructing $M$ through $M'$ is represented by all edges in $E_M \,\triangle\, E_{M'}$, the symmetric difference between $E_M$ and $E_{M'}$.

During Phase I, applying Lemma 1 with $\beta = 3$, we have that for any user in $B \cup G$, the number of mispredicted feedbacks is w.h.p. bounded by $\tfrac{3 n \log n}{S}$. It is not difficult to see that, requesting $n$ feedbacks selected uniformly at random for each cluster representative, the number of edges of $M$ recovered is w.h.p. equal to $|E_M| - o(|E_M|) = M - o(M)$. Hence, the total number of matches that we fail to detect or that we mispredict is w.h.p. upper bounded by $\tfrac{3 n^2 \log n}{S} + o(M)$. Since our goal is to find w.h.p. $\Theta(M)$ matches (under the assumption that $M^*_T = M$ holds w.h.p.), a lower bound on $M$ required to achieve this goal is $M \ge \tfrac{\gamma n^2 \log n}{S}$ for some constant $\gamma$. This implies that, by setting $S = \tfrac{\gamma n^2 \log n}{M}$, we are guaranteed to find w.h.p. at least a constant fraction of the total number of matches $M$.

B Proofs
B.1 Proof of Theorem 1

Proof
Consider the following adversarial random strategy. We select uniformly at random $m$ elements from the set of pairs $B \times G$. For each selected pair $(b, g)$, we set both $\sigma(b, g)$ and $\sigma(g, b)$ to $+1$, and then assign the value $-1$ to all remaining directed edges of $E$. We have therefore $M = m$. Given any algorithm $A$, recall that $E_T(A)$ denotes the set of directed edges selected by $A$ during the $T$ rounds. We now define $E'_T(A)$ as the following superset of $E_T(A)$:
$$E'_T(A) \stackrel{\Delta}{=} E_T(A) \cup \{(g', b') : (b', g') \in E_T(A)\} \cup \{(b'', g'') : (g'', b'') \in E_T(A)\}.$$
$E'_T(A)$ contains all directed edges $(b', g')$ and $(g'', b'')$ already contained in $E_T(A)$, together with their respective reciprocal edges $(g', b')$ and $(b'', g'')$. Let now $M'_T(A)$ be the number of matches formed by the edges in $E'_T(A)$:
$$M'_T(A) \stackrel{\Delta}{=} \bigl|\{\{b, g\} : (b, g), (g, b) \in E'_T(A),\ \sigma(b, g) = \sigma(g, b) = +1\}\bigr|.$$
By the definition of $M'_T(A)$, we know that $M'_T(A) \ge M_T(A)$ and $|E'_T(A)| \le 2|E_T(A)|$, which in turn is at most $4T$, because during each round two distinct edges are selected. The number of pairs of reciprocal edges of $E'_T(A)$ is at most $|E'_T(A)|/2 \le 2T$, because for each edge $(u, u') \in E'_T(A)$ we always have $(u', u) \in E'_T(A)$. Furthermore, because of the randomized sign assignment strategy described above, for any pair of reciprocal edges in $E'_T(A)$, the probability that this pair is a match is equal to $\tfrac{M}{n^2}$, irrespective of the behavior of algorithm $A$. By the linearity of expectation, we can sum over all pairs of reciprocal edges of $E'_T(A)$ to obtain $\mathbb{E}\, M'_T(A) \le \tfrac{2T}{n^2} M$.
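The randomized sign assignment used in this construction can be sketched as follows (the tuple encoding of directed edges and the helper names are illustrative):

```python
import random

def adversarial_sigma(n, m, seed=0):
    """Sketch of the adversarial strategy in the proof of Theorem 1.

    Plant exactly m mutual '+1' pairs chosen uniformly at random from B x G;
    every other directed edge gets sign -1, so that M = m by construction.
    Directed edges are encoded as tuples ('b', i, 'g', j) and ('g', j, 'b', i).
    """
    rng = random.Random(seed)
    all_pairs = [(i, j) for i in range(n) for j in range(n)]
    planted = set(rng.sample(all_pairs, m))   # the m matching pairs
    sigma = {}
    for i, j in all_pairs:
        s = 1 if (i, j) in planted else -1
        sigma[('b', i, 'g', j)] = s           # boy i -> girl j
        sigma[('g', j, 'b', i)] = s           # girl j -> boy i (reciprocal)
    return sigma, planted
```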
Finally, recalling that $M_T(A) \le M'_T(A)$, we can therefore conclude that the inequality $\mathbb{E}\, M_T(A) \le \tfrac{2T}{n^2} M$ holds for any algorithm $A$, where the expectation is taken over the generation of the function $\sigma$ for the input graph $(\langle B, G \rangle, E, \sigma)$.

B.2 Proof of Theorem 2

Proof [Sketch].
We first prove that $E^r_T(\mathrm{OOMM})$ is selected uniformly at random from $E^r$. Thereafter, we will prove that $E_T(\mathrm{OOMM})$ contains in expectation $\Theta(T)$ pairs of reciprocal edges, i.e., $|E^r_T(\mathrm{OOMM})| = \Theta(T)$. (As we assume in our analysis that $n$ goes to infinity, we also assume that $M$, as a function of $n$, diverges as $n \to \infty$. Note that, even in the lower bound on $M$ contained in the statement of Theorem 3, $M$ is in fact always superlinear in $n$ because of the definition of the range of values of $S$, i.e., $S \in [\log(n), n/\log(n)]$.) Since these pairs are selected uniformly at random from $E^r$, this implies that $\mathbb{E}\, M_T(\mathrm{OOMM}) = \Theta\bigl(\tfrac{T}{n^2} M\bigr)$. In fact, each match must necessarily be a pair of reciprocal edges, and will be selected this way with probability equal, up to a constant factor, to $\tfrac{T}{|E^r|} = \tfrac{T}{n^2}$. In order to prove that $|E^r_T(\mathrm{OOMM})| = \Theta(T)$ in expectation, we will define an event related to each girl $g \in G$. We will show that, throughout the algorithm's execution, each new occurrence of this event is a sufficient condition for a new pair of reciprocal edges to enter the set $E^r_T(\mathrm{OOMM})$. We will find a lower bound for the expected number of times this event occurs, proving that it is equal to $\Theta(T)$, which implies that $\mathbb{E}\,|E^r_T(\mathrm{OOMM})| = \Theta(T)$. Since the pairs of reciprocal edges in $E^r_T(\mathrm{OOMM})$ are selected uniformly at random from $E^r$, this will allow us to conclude that the number of matches found is $\Theta\bigl(\tfrac{T}{|E^r|} M\bigr) = \Theta\bigl(\tfrac{T}{n^2} M\bigr)$.

OOMM operates in Steps (B) and (G) without making any distinction between any two boys or any two girls. In addition, the algorithm does not depend on the observed values of $\sigma$. Hence, OOMM can be seen as a random process dealing with the sets $B$ and $G$ solely, where each user is indistinguishable within the set s/he belongs to. During any round $t$, the edge $(b, g)$ contained in $E_t(\mathrm{OOMM}) \setminus E_{t-1}(\mathrm{OOMM})$ is selected uniformly at random from $B \times G$ at Step (B).
At Step (G) of each round $t$, the algorithm selects uniformly at random either a boy from $B_{g,t}$ or a boy from the whole set $B$. At each round $t$, $B_{g,t}$ is the result of the actions accomplished by OOMM during the previous rounds. As we pointed out, all these actions are carried out without making any distinction between any two users in $B$ and in $G$. Hence, during any given round $t$, if $B_{g,t} \ne \emptyset$, no boy is more likely to be part of $B_{g,t}$ than any other one. The probability that any pair $\{(b, g), (g, b)\}$ of reciprocal edges belongs to $E^r_T(\mathrm{OOMM})$ must therefore be the same for each pair of users $b \in B$ and $g \in G$ during any given round $t$.

Throughout this proof, for a relevant event $E$, we denote by $t(E) \in [T]$ any round where event $E$ occurs. We also denote by $S(E) \subseteq [T]$ the set of all rounds where $E$ occurs. We now define the relevant events associated with each girl $g \in G$.

Definition of event $E_g(\Delta)$. Given any girl $g \in G$, and any round $t \le T - \Delta$ with $\Delta > 0$, let $E_g(\Delta)$ be the conjunction of the following two events:

Event $E^G_g(\Delta)$: girl $g$ is selected in Step (G) during both round $t$ and round $t + \Delta$, while she has never been selected in Step (G) during any round $t'$ such that $t < t' < t + \Delta$;

Event $E^B_g(\Delta)$: (i) there exists one and only one round $t' \in (t, t + \Delta]$ in which $g$ receives a feedback (uncovered during Step (B)), say feedback $\sigma(b'', g)$, and (ii) we have $(b'', g) \notin E_t(\mathrm{OOMM})$, i.e., this feedback was not uncovered until round $t$.

We define the occurrence rounds $t(E^G_g(\Delta))$ and $t(E^B_g(\Delta))$ of the events $E^G_g(\Delta)$ and $E^B_g(\Delta)$, respectively, as well as the occurrence round $t(E_g(\Delta))$ of the joint event $E_g(\Delta)$, as the round $t$ in the above definition of $E^G_g(\Delta)$ and $E^B_g(\Delta)$.
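For concreteness, one round of the sampling behavior of OOMM described above can be sketched as follows. This is a hedged reconstruction from the proof's description, not the paper's pseudocode: we stand in for $B_{g,t}$ with the set of boys whose uncovered feedback toward $g$ is a like, and we omit any bookkeeping that removes already-served boys from that set.

```python
import random

def oomm_round(n, sigma_b, sigma_g, uncovered, liked_by, rng):
    """One round of (a reconstruction of) OOMM.

    Step (B): uncover the preference of a uniformly random pair (b, g).
    Step (G): pick a uniformly random girl g2 and serve her a boy from
    our stand-in for B_{g2,t} (boys known to like her) if nonempty,
    otherwise a uniformly random boy.
    sigma_b[b][g], sigma_g[g][b] in {+1, -1} are the hidden preferences.
    """
    b, g = rng.randrange(n), rng.randrange(n)          # Step (B)
    uncovered.add((b, g))
    if sigma_b[b][g] == 1:
        liked_by[g].add(b)                             # stand-in for B_{g,t}
    g2 = rng.randrange(n)                              # Step (G)
    pool = liked_by[g2]
    b2 = rng.choice(sorted(pool)) if pool else rng.randrange(n)
    matched = sigma_b[b2][g2] == 1 and sigma_g[g2][b2] == 1
    return b2, g2, matched
```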
To better clarify this definition, consider as an example the following sequence of rounds, described next. Recall that, during the run of
OOMM over the $T$ rounds, for any given pair of users $(b, g) \in B \times G$, the feedback $\sigma(b, g)$ may be uncovered in Step (B) more than once. Consider rounds $t$ to $t + \Delta$, with $\Delta = 9$: girl $g$ is selected in Step (G) during round $t$ and during round $t + \Delta$, and never in between; moreover, during rounds $t + 1, t + 2, \ldots, t + \Delta$, exactly one feedback toward $g$ is uncovered in Step (B), the one from some boy $\bar b$ at round $t + 3$. If $\sigma(\bar b, g)$ was never uncovered during any round $t'' \le t$, we say that the events $E^G_g(9)$, $E^B_g(9)$ and $E_g(9)$ have occurred at round $t$, i.e., that $t(E^G_g(9)) = t(E^B_g(9)) = t(E_g(9)) = t$. Observe that in this example girl $g$ is selected twice (round $t$ and round $t + \Delta$) and, during rounds $t + 1, t + 2, \ldots, t + \Delta$, she receives one feedback (uncovered in Step (B)), the one from boy $\bar b$ at round $t + 3$.

Finally, we define $E(\Delta)$ as the union of $E_g(\Delta)$ over all $g \in G$.

Fact 1
Events $E^G_g(\Delta)$ and $E^B_g(\Delta)$ are independent for all $g \in G$ and all $\Delta > 0$, i.e., we always have $P(E_g(\Delta)) = P(E^G_g(\Delta)) \cdot P(E^B_g(\Delta))$.

Fact 2
For any girl $g \in G$ and any pair of positive integers $\Delta$ and $\Delta'$ with $\Delta \ne \Delta'$, the events $E_g(\Delta)$ and $E_g(\Delta')$ are mutually exclusive. This mutual exclusion property also holds for the events $E^G_g(\Delta)$ and $E^G_g(\Delta')$.

Fact 3
For any positive $\Delta$, given any pair of distinct girls $g$ and $g'$, the events $E_g(\Delta)$ and $E_{g'}(\Delta)$ are mutually exclusive. This mutual exclusion property also holds for the events $E^G_g(\Delta)$ and $E^G_{g'}(\Delta)$.

Given any girl $g$, when $E_g(\Delta)$ occurs, we must have one of the two following mutually exclusive consequences $C_1$ and $C_2$; namely, any occurrence of $E_g(\Delta)$ implies either $C_1$ or $C_2$, but not both. When we disclose the preference of boy $b''$ for girl $g$ during round $t' \in (t(E_g(\Delta)), t(E_g(\Delta)) + \Delta]$, we have either $(g, b'') \notin E_{t'-1}(\mathrm{OOMM})$ or $(g, b'') \in E_{t'-1}(\mathrm{OOMM})$. This in turn implies:

Consequence $C_1$: $(g, b'') \notin E_{t'-1}(\mathrm{OOMM})$. Boy $b''$ must belong to $B_{g, \tilde t}$ (Step (G)) for all rounds $\tilde t \in [t', t + \Delta]$. Since $B_{g, t+\Delta}$ is not empty, because it contains at least boy $b''$, a new pair $\{(\tilde b, g), (g, \tilde b)\}$ of reciprocal edges is uncovered (note that we need not have $\tilde b \equiv b''$, since $B_{g, t+\Delta}$ may also include some other boys besides $b''$). Hence, the set $E^r_{t+\Delta}(\mathrm{OOMM}) \setminus E^r_{t+\Delta-1}(\mathrm{OOMM})$ must contain $\{(\tilde b, g), (g, \tilde b)\}$.

Consequence $C_2$: $(g, b'') \in E_{t'-1}(\mathrm{OOMM})$. In this case OOMM finds the new pair $\{(b'', g), (g, b'')\}$ of reciprocal edges during round $t'$ (Step (B)), i.e., the set $E^r_{t'}(\mathrm{OOMM}) \setminus E^r_{t'-1}(\mathrm{OOMM})$ must contain $\{(b'', g), (g, b'')\}$.

Thus, taking into account that $E_g(\Delta)$ is a sufficient condition for $C_1 \vee C_2$, we can always associate a new occurrence of $E_g(\Delta)$ with a distinct pair $\{(b, g), (g, b)\}$ of reciprocal edges in $E^r_T(\mathrm{OOMM})$.
Hence, OOMM finds at least $|S(E(\Delta))|$ distinct pairs of reciprocal edges, i.e., $|E^r_T(\mathrm{OOMM})| \ge |S(E(\Delta))|$. Let now $\alpha \in (0, 1)$ be a constant parameter. We focus on computing $\mathbb{E} \sum_{\Delta = \alpha n}^{n} |S(E(\Delta))|$. We set for brevity
$$E(\alpha n, n) = \cup_{\Delta \in [\alpha n, n]} E(\Delta), \qquad E_g(\alpha n, n) = \cup_{\Delta \in [\alpha n, n]} E_g(\Delta),$$
$$E^B_g(\alpha n, n) = \cup_{\Delta \in [\alpha n, n]} E^B_g(\Delta), \qquad E^G_g(\alpha n, n) = \cup_{\Delta \in [\alpha n, n]} E^G_g(\Delta).$$
We recall that we defined the occurrence round $t(E_g(\Delta))$ as the first of the $(\Delta + 1)$-many rounds related to the definition of event $E_g(\Delta)$. We define the occurrence rounds $t(E_g(\alpha n, n))$ and $t(E(\alpha n, n))$ in a similar manner, as the earliest round $t$ when, respectively, $E_g(\Delta)$ and $E(\Delta)$ occur over all $\Delta \in [\alpha n, n]$.

Fact 4
Given any $\alpha \in (0, 1)$ and any $g \in G$, Fact 1 and Fact 2 ensure that the events $E^G_g(\alpha n, n)$ and $E^B_g(\alpha n, n)$ are independent, i.e., we always have $P(E_g(\alpha n, n)) = P(E^G_g(\alpha n, n)) \cdot P(E^B_g(\alpha n, n))$.

Fact 5
Given any $\alpha \in (0, 1)$ and any pair of distinct girls $g'$ and $g''$, Fact 3 and Fact 2 ensure that $E_{g'}(\alpha n, n)$ and $E_{g''}(\alpha n, n)$ are mutually exclusive. Furthermore, Fact 4, together with the definition of $E^G_g(\alpha n, n)$ and $E^B_g(\alpha n, n)$ for any girl $g \in G$, ensures $P(E_{g'}(\alpha n, n)) = P(E_{g''}(\alpha n, n))$.

We now prove that any constant $\alpha \in (0, 1)$ leads to $\mathbb{E}|S(E(\alpha n, n))| = \Theta(T)$. This implies $M_T(\mathrm{OOMM}) = \Theta\bigl(\tfrac{T}{n^2} M\bigr)$, since $E^r_T(\mathrm{OOMM})$ is made up of pairs of reciprocal edges which are selected uniformly at random from $E^r$. In order to estimate $\mathbb{E}|S(E(\alpha n, n))|$, we will lower bound the probability $P(E(\alpha n, n))$, which in turn will require us to lower bound $P(E_g(\alpha n, n))$. Since in Step (G) a girl is selected uniformly at random from $G$, for any $g \in G$ we can write:
$$P(E^G_g(\alpha n, n)) = \sum_{\Delta = \alpha n + 1}^{n-1} P(E^G_g(\Delta)) = \sum_{\Delta = \alpha n + 1}^{n-1} \frac{1}{n^2} \Bigl(1 - \frac{1}{n}\Bigr)^{\Delta - 1} \qquad (3)$$
$$\ge \frac{1}{n^2} \sum_{\Delta = \alpha n + 1}^{n-1} \Bigl(1 - \frac{1}{n}\Bigr)^{\Delta} = \frac{1}{n}\Bigl((1 - n^{-1})^{\alpha n + 1} - (1 - n^{-1})^{n}\Bigr) \sim_{n \to \infty} \frac{e^{-\alpha} - e^{-1}}{n}, \qquad (4)$$
where in Equation (3) we used Fact 2.

We now bound $P(E^B_g(\alpha n, n))$ for all $g \in G$. We define the event $E^B_{g, b''}(\Delta)$ based on the definition of $E^B_g(\Delta)$ provided at the beginning of the proof.
Given any boy $b'' \in B$, the event $E^B_{g, b''}(\Delta)$ occurs whenever: (i) there exists one and only one round $t' \in (t, t + \Delta]$ in which $g$ receives a feedback (uncovered in Step (B)) from $b''$, (ii) $g$ does not receive any feedback from any other boy during any round in $(t, t + \Delta]$, and (iii) we have $(b'', g) \notin E_t(\mathrm{OOMM})$, i.e., this feedback was not uncovered until round $t$. Observe that, by this definition, we have $E^B_g(\Delta) \equiv \cup_{b'' \in B}\, E^B_{g, b''}(\Delta)$ (see the definition of $E^B_g(\Delta)$ provided above to compare the events $E^B_g(\Delta)$ and $E^B_{g, b''}(\Delta)$). Now, given any girl $g \in G$, we define $E^B_{g, b''}(\alpha n, n) \stackrel{\Delta}{=} \cup_{\Delta \in [\alpha n, n]}\, E^B_{g, b''}(\Delta)$.

Fact 6
Given any girl $g \in G$, for each pair of distinct boys $b', b'' \in B$, the events $E^B_{g, b'}(\alpha n, n)$ and $E^B_{g, b''}(\alpha n, n)$ are mutually exclusive by their definition. Furthermore, the uniform random selections performed in Step (B) ensure that $P(E^B_{g, b'}(\alpha n, n)) = P(E^B_{g, b''}(\alpha n, n))$. Mutual exclusion also holds for the events $E^B_{g', b}(\alpha n, n)$ and $E^B_{g'', b}(\alpha n, n)$ for any $b \in B$ and any pair of distinct $g', g'' \in G$.
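The asymptotic approximations invoked in the chain (5)-(7) below, e.g. $(1 - 1/n)^n \to e^{-1}$ and $(1 - 1/n^2)^{n^2} \to e^{-1}$, can be checked numerically:

```python
import math

def limit_check(n):
    """Numeric sanity check for the two e^{-1} limits used in (5)-(7)."""
    return ((1 - 1 / n) ** n,               # -> e^{-1} as n grows
            (1 - 1 / (n * n)) ** (n * n))   # -> e^{-1} as n grows
```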
We can now conclude that, for any given occurrence round of E^B_g(αn, n) and any integer T ∈ {n, n+1, ..., n²}, we have:

∀g ∈ G:  P[E^B_g(αn, n)] = P[∪_{b″∈B} E^B_{g,b″}(αn, n)]   (5)
  = n · P[E^B_{g,b″}(αn, n)]   (6)
  ≥ n (1/n²)(1 − 1/n²)^T min(αn, n)(1 − 1/n)^{max(αn,n)−1}
  ≥ n (1/n²)(1 − 1/n²)^{n²} αn (1 − 1/n)^n  ∼_{n→∞}  αe^{−2},   (7)

where in Equation (6) we used Fact 6. We can finally bound the probability of event E(αn, n) (as n grows large):

P[E(αn, n)] = P[E^G(αn, n)] · P[E^B(αn, n)]   (8)
  = (P[∪_{g∈G} E^G_g(αn, n)]) · αe^{−2}   (9)
  ≥ (e^{−α} − e^{−1}) · αe^{−2},   (10)

where in Equation (8) we used Fact 4, in Equation (9) we used the chain of inequalities (5)–(7), and in Equation (10) we used Fact 5, together with the chain of inequalities (3)–(4).

Let us denote for brevity αe^{−2}(e^{−α} − e^{−1}) by c(α). We clearly have c(α) > 0 for all α ∈ (0, 1). Event E(αn, n) can occur at any round t ≤ T − n. Recall that we denoted by S(E) the set of rounds where event E occurs. For all integers T such that T − n = Ω(n) we now have:

E[M_T(OOMM)] = (E|E^r_T(OOMM)| / |E^r|) · M ≥ (E|S(E(αn, n))| / n²) · M   (11)
  ≥ ((T − n) P[E(αn, n)] / n²) · M   (12)
  ≥ ((T − n) c(α) / n²) · M = Θ(TM/n²),   (13)

where in Equation (12) we used the linearity of expectation of events E(αn, n), by summing P[E(αn, n)] over the first T − n rounds.

B.3 Proof of Theorem 3

Proof
Let T_I and T_II be the number of rounds used during Phase I and Phase II, respectively. Thus we have T_II = T − T_I. The proof structure is as follows. After bounding C_G and C_B, we will show that T_I = O(n(C_G + C_B + S′)). Note that this implies T_I = o(T) for any T satisfying the lower bound T = ω(n(C_G + C_B + S′)). Then we will prove that, during Phase II, T_II-many rounds are sufficient to serve in Step (1) each user a total number of times which is w.h.p. larger than max_{u∈B∪G} deg_{M′}(u), where M′ is the matching graph estimated by SMILE. This fact can be proven by combining the two conditions M*_T = M (which is assumed to hold with high probability) and M = ω(n² log(n)/S). Hence, after o(T)-many rounds of Phase I, SMILE can start to greedily simulate the Omniscient Matchmaker on the graph M′ estimated during Phase I. Finally, we prove that the number of edges of M which are also contained in M′ is Θ(M), which implies that during Phase II SMILE will uncover w.h.p. Θ(M) matches. This will conclude the proof.

Now, for the sake of this proof, we will focus on set B and Steps (1B), (2B) and (3B). The corresponding claims for G and Steps (1G), (2G) and (3G) are completely symmetrical.

We start by briefly recalling the parts of the algorithm which are relevant for this proof. We define the boys and girls as arranged in sequences ⟨b_1, b_2, ..., b_n⟩ and ⟨g_1, g_2, ..., g_n⟩. Let G^r_t be the set of the cluster representative girls found by SMILE during all rounds up to t. Let t(g) be the round in which girl g is included in a subset of G^r_T during the execution of the algorithm, i.e., the round when she becomes a cluster representative girl. The construction of G^r_T is accomplished in a greedy fashion. Specifically, if in round t of Phase I all girls g_1, g_2, ...
, g_i are either part of G^r_t or are included in a cluster, then, at the beginning of round t+1, SMILE picks the next girl g_{i+1}. Note that g_{i+1} can be any member of G who has not been processed yet. Thereafter, SMILE estimates whether the feedback received by g_{i+1} is similar to that of at least one cluster representative girl found so far. More precisely, after having collected S′ feedbacks for her, SMILE uses a randomized strategy relying on Lemma 1. Let then t′ be the round in which |F_{g_{i+1}}| becomes equal to S′. (Recall that, for each user u ∈ B ∪ G, F_u is the set of all feedbacks received until the current round.) If at round t′ we have that for all b ∈ F_{g_{i+1}} there exists a girl g_r ∈ G^r_{t′−1} such that σ(b, g_{i+1}) = σ(b, g_r), then g_{i+1} is included in the same cluster as g_r. Otherwise, SMILE collects feedback for g_{i+1} until we have |F_{g_{i+1}}| = n, and then g_{i+1} becomes a new cluster representative girl.

In order to prove that T_I = O(n(C_G + C_B + S′)), we need to upper bound |C_G| = |C^G_T| and |C_B| = |C^B_T|. As in Section 4, we denote by B the matrix of all ground-truth preferences of the boys. Namely, for each i, j ∈ [n], B_{i,j} is equal to (1 + σ(b_i, g_j))/2. Given girl g_j, we denote by g_j the vector of feedback received by g_j, i.e., the j-th column vector of B. Let C^B_ρ be the covering number of radius ρ of all the column vectors of B. Given two {0,1}-valued vectors v and v′, we denote by d(v, v′) the Hamming distance between them. Given any non-negative integer ρ, let B_ρ(g) be the set of all v such that d(g, v) ≤ ρ, i.e., the ball of radius ρ centered at g.
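At radius ρ = 0 and with full information (i.e., S′ = n, every column fully observed), the greedy construction of the representative set reduces to collecting the distinct columns of B. A simplified sketch (function and variable names are ours):

```python
def greedy_representatives(B_cols):
    """Greedy clustering of {0,1} column vectors: a column joins the cluster of
    the first representative it matches exactly (Hamming distance 0); otherwise
    it becomes a new representative -- the rho = 0 analogue of Phase I."""
    reps, cluster_of = [], []
    for col in B_cols:
        for k, r in enumerate(reps):
            if col == r:                 # full agreement with representative k
                cluster_of.append(k)
                break
        else:                            # no match: new cluster representative
            reps.append(col)
            cluster_of.append(len(reps) - 1)
    return reps, cluster_of

# Two underlying column types -> two representatives, i.e., C_G = 2.
cols = [(1, 0, 1), (0, 1, 1), (1, 0, 1), (1, 0, 1), (0, 1, 1)]
reps, assignment = greedy_representatives(cols)
print(len(reps), assignment)   # → 2 [0, 1, 0, 0, 1]
```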
Finally, let G^{r,ρ}_T ⊆ G^r_T be the set of all girls g included by SMILE in G^r_T while there exists at least one girl g_r ∈ G^r_{t(g)−1} such that g belongs to ball B_ρ(g_r) centered at g_r. In this proof, we single out subset G^{r,ρ}_T ⊆ G^r_T since, in order to upper bound |G^r_T|, it is convenient to bound |G^{r,ρ}_T| and |G^r_T \ G^{r,ρ}_T| separately, and then use the sum of these two bounds to limit |G^r_T|. Notice that, by its very definition, G^{r,ρ}_T can be seen as containing all girl representative members g of G^r_T satisfying the following property: given any radius ρ, there exists at least one girl g_r ∈ G^r_{t(g)−1} such that g belongs to the ball B_ρ(g_r) centered at g_r. This property states that, given any radius ρ, SMILE creates a new representative girl g instead of including g into the cluster of g_r. In fact, after round t(g), both g ∈ B_ρ(g_r) and g_r ∈ B_ρ(g) simultaneously hold, because d(g, g_r) ≤ ρ. This event may happen because, while SMILE is looking for a cluster including g, there exists at least one boy b″ ∈ B(g, S) ∩ F_{g_r} (see Section 4.1 – Phase I) such that σ(g, b″) ≠ σ(g_r, b″). Clearly, the larger the considered ρ, the more frequent this event is. Since SMILE operates without considering any specific radius ρ, this fact holds for all values of ρ.

Taking into account the greedy way SMILE constructs G^r_T, we have |G^r_T \ G^{r,ρ}_T| ≤ C^G_{ρ/2}. In fact, given any optimal (ρ/2)-covering, by the definition of G^{r,ρ}_T we know that at most one girl of G^r_T \ G^{r,ρ}_T can be included in any ball of the covering. Now, since we know that |G^r_T \ G^{r,ρ}_T| ≤ C^G_{ρ/2}, in order to upper bound |G^r_T| in terms of C^G_{ρ/2}, it remains to bound |G^{r,ρ}_T|. A union bound shows that the probability that any given girl g belongs to G^{r,ρ}_T is upper bounded by ρS′/n.
In fact, from the definition of G^{r,ρ}_T, we know that there is already at least one girl g_r in G^r_{t(g)−1} such that g ∈ B_ρ(g_r). Let F_{S′,g} be the set of feedbacks received by g when |F_g| becomes equal to S′ and SMILE verifies whether g can be part of a previously discovered cluster. For each boy b ∈ F_{S′,g}, the probability that σ(b, g) ≠ σ(b, g_r) is at most ρ/n. The probability that σ(b, g) ≠ σ(b, g_r) holds for at least one b ∈ F_{S′,g} can therefore be bounded from above by |F_{S′,g}| · ρ/n = ρS′/n. Since |G| = n, the cardinality of G^{r,ρ}_T is therefore upper bounded by ρS′ in expectation. Applying now a Chernoff bound, and taking into account that S′ > S ≥ log n and that the radius ρ is at least 1 when it is not null, we obtain that

|G^{r,ρ}_T| ≤ ρS′ + 2√(S′ρ log n)

holds w.h.p. (By "optimal" in the covering argument above we mean a covering having a number of balls exactly equal to the covering number.) Hence, we conclude that

C_G = |G^r_T| ≤ C^G_{ρ/2} + ρS′ + 2√(ρS′ log n) ≤ C^G_{ρ/2} + 3ρS′

holds w.h.p. for all non-negative values of the radius ρ. Since C_G is clearly upper bounded by n, we can finally write

C_G ≤ min{ min_{ρ≥0} (C^G_{ρ/2} + 3ρS′), n }.

By symmetry, we can use the same arguments as above for bounding C_B. This concludes the first part of the proof.

We now prove that T_I = O(n(C_G + C_B + S′)). Let T^B_I be the number of rounds during which SMILE asks boys for feedback in Phase I. T^B_I is bounded by the sum of the number of rounds used to obtain n feedbacks for each girl in G^r_T, and the number of rounds used to obtain S′ feedbacks for each girl in G \ G^r_T. These two quantities are upper bounded w.h.p. by O(n|G^r_T|) and O(S′|G \ G^r_T|) = O(S′n), respectively.
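The union-bound step above (probability at most ρS′/n of seeing a disagreement among S′ sampled coordinates of two columns at Hamming distance ρ) can be checked by a small simulation. The sketch below uses hypothetical sizes n, ρ, S′ of our choosing; names are ours:

```python
import random

def disagreement_prob(n, rho, s_prime, trials=20_000, seed=0):
    """Empirical probability that s_prime uniformly sampled coordinates of two
    {0,1}^n vectors at Hamming distance rho contain at least one disagreement."""
    rng = random.Random(seed)
    diff = set(range(rho))               # WLOG the rho differing coordinates
    hits = 0
    for _ in range(trials):
        sample = rng.sample(range(n), s_prime)
        if any(i in diff for i in sample):
            hits += 1
    return hits / trials

n, rho, s_prime = 1000, 5, 20
emp = disagreement_prob(n, rho, s_prime)
bound = rho * s_prime / n                # union bound from the proof
print(emp, bound)                        # empirical value stays below the bound
```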
Hence, the total number of rounds SMILE takes for asking all feedbacks from boys during Phase I is upper bounded w.h.p. by O(n(C_G + S′)). Since T_I ≤ T^G_I + T^B_I and T^G_I = O(n(C_B + S′)), we conclude that

T_I = O(n(C_G + C_B + S′)).   (14)

We now show that, under the assumptions of the theorem, the strategy of Phase II yields w.h.p. Θ(M) matched users. For each cluster representative member, the number of feedbacks obtained by selecting uniformly at random users from the other side during Step (1) is equal to n. Hence, if we disregarded the number of mispredicted matches, SMILE would recover w.h.p. at least M − O(√(M log n)) matches selected uniformly at random from E_M. The number of mispredicted matches quantified by Lemma 1 is O(n log(n)/S) per user; these mispredictions are caused by the fact that M is not recovered exactly by SMILE, but only in an approximate manner. Denote by M′ the approximation to M computed by SMILE. Using a Chernoff bound and the conditions M = ω(n² log(n)/S) and S < n (which together imply M = ω(n log(n)) as n → ∞), we have that the total number |E_M △ E_{M′}| of mispredicted matches satisfies w.h.p.

|E_M △ E_{M′}| ≤ (3/4)M + O(√(M log n)) + O(n² log(n)/S) = (3/4)M + o(M),

where E_M △ E_{M′} is the symmetric difference between the edge sets of M and M′. Set for brevity d_max = max_{u∈V} deg_M(u). We now claim that

d_max − o(d_max) ≥ deg_{M′}(u)   (15)

holds w.h.p. for each user u ∈ B ∪ G. The operations performed by SMILE guarantee that w.h.p. deg_{M′}(u) − deg_M(u) = O(n log n / S) holds for all u ∈ B ∪ G. In fact, for each user u ∈ B ∪ G, the total number of users u′ on the other side who dislike u and are adjacent to u in M′ is upper bounded w.h.p.
by O(n log n / S), as Lemma 1 guarantees. Now, we have w.h.p.

deg_{M′}(u) ≤ (1/4)d_max + O(√(d_max log(n))) + O(n log n / S)
  = (1/4)d_max + o(d_max) + o(M/n)
  = (1/4)d_max + o(d_max)
  ≤ d_max − o(d_max),

where the term O(√(d_max log(n))) arises from the application of a Chernoff bound, and we took into account that M = ω(n² log(n)/S), combined with S < n, implies d_max = ω(log(n)). This concludes the proof of (15).

During Phase II, SMILE matches pairs of users corresponding to E_{M′} in a greedy way. If we can show that each user u is served w.h.p. at least deg_{M′}(u) times, then we are done. Now, since M*_T = M w.h.p., the Omniscient Matchmaker must be able to match w.h.p. all the users corresponding to E_M in T rounds. This implies that in Steps (2B) and (2G) each user u ∈ B ∪ G is served w.h.p. at least d_max times during the T rounds. Hence, each user is served w.h.p. at least d_max − o(d_max) times during the last T − T_I = (1 − o(1))T rounds, where T_I is the time used by Phase I. Recalling now (15) and (14), we conclude that T = ω(n(C_G + C_B + S′)) rounds are always sufficient to serve each user u at least deg_{M′}(u) times, thereby completing the proof.

B.4 Proof of Theorem 4

Proof [Sketch.] Term M in the lower bound clearly derives from the fact that we need to match Θ(M) users. When M is the dominant term, the bound is therefore trivially true. In the sequel, we thus focus on the case M = o(n(C_G + C_B)), i.e., when the dominant term is n(C_G + C_B). We show how to build a sign function σ such that the number of rounds needed to uncover Θ(M) matches is Ω(n(C_G + C_B)). First of all, we set σ(g, b) = 1 for all g ∈ G and all b ∈ B. This implies C_B = 1. The matches depend therefore solely on the boy preference matrix B.
We create an instance of B where, for ρ = 0, the number of girls belonging to each cluster of the columns of B is equal to n/C_G, i.e., all these clusters of girls have the same size n/C_G. Let d be any divisor of n. Without loss of generality, consider B after having rearranged its columns in such a way that all column indices are grouped according to the girl clustering. More precisely, given any i ∈ {0, 1, ..., d−1}, the column indices of B in the range [i·(n/d), (i+1)·(n/d)] belong to the same girl cluster. We obtain this way a block matrix B made up of (nd)-many blocks, where each block is a submatrix having 1 row and n/d columns. We then choose uniformly at random ⌊md/n⌋ blocks, and set equal to 1 all entries in each selected block. Finally, we set all the remaining entries of B to 0. With this random assignment, we have that in expectation C_G equals d. In fact, since m ∈ (n log(n), n² − n log(n)), we can always select at least d log(d)-many blocks. By using a classical Coupon Collector argument, we see that in expectation we have at least one block of entries equal to 1 (and one block of entries equal to 0, both selected uniformly at random) per set of n/d columns grouped together as explained above. Note also that this way we have m − n/C_G < M ≤ m, which is equivalent to m − n/(C_G + C_B − 1) < M ≤ m, since C_B = 1.

Assume now T = o(n(C_G + C_B)), which is equal to o(nC_G) in our specific construction. In this case, for any matchmaking algorithm A, the number of feedbacks from boys revealed in Steps (G) and (B) must be o(n(C_G + C_B)) = o(nC_G). This implies that, in expectation, the fraction of matches that are not uncovered by A is asymptotically equal to 1 as n → ∞. Hence, our construction of σ shows that, in order to uncover Θ(M) matches in expectation, it is necessary to have T = Ω(n(C_G + C_B) + M), as claimed.
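Under our reading of the construction (nd blocks of shape 1 × n/d, ⌊md/n⌋ of them filled with ones), the instance can be generated as follows; since σ(g, b) = +1 for all pairs, every 1-entry of B is a match. All names are ours:

```python
import random

def build_lower_bound_instance(n, d, m, seed=0):
    """B is n x n; its columns are grouped into d clusters of n/d columns each.
    Blocks are 1 x (n/d) submatrices; floor(m*d/n) random blocks are set to 1."""
    assert n % d == 0
    w = n // d                                  # block width
    rng = random.Random(seed)
    B = [[0] * n for _ in range(n)]
    blocks = [(r, c) for r in range(n) for c in range(d)]   # n*d blocks
    for r, c in rng.sample(blocks, (m * d) // n):
        for j in range(c * w, (c + 1) * w):
            B[r][j] = 1
    return B

n, d, m = 20, 4, 60
B = build_lower_bound_instance(n, d, m)
matches = sum(map(sum, B))          # sigma(g, b) = +1 everywhere: ones = matches
print(matches, m)                   # → 60 60
```

Columns inside one group are identical by construction, so the instance has at most d distinct girl clusters.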
B.5 Proof of Theorem 5

Proof [Sketch.] We describe an efficient implementation of
SMILE, analyzing step by step the time and space complexity of the algorithm. Without loss of generality, we focus on B and the operations performed on matrix B. Similar operations can be performed on G and on matrix G, so that the total time and space complexity of the algorithm is obtained by simply doubling the complexities computed within this proof (this does not affect the final results because of the big-Oh notation).

We create a balanced tree T whose records contain the feedbacks collected for all cluster representative members of G^r during Phase I. More precisely, T contains all ordered sets of indices of B's columns according to their lexicographic order. We insert each column one by one, reading all its binary digits. This way, we can quickly insert new elements while maintaining them sorted, even within each node of T. At the end of this process, we will have C_G records. The resulting time complexity is O(n C_G log C_G); the space complexity is O(n C_G).

Each time we collect S′ feedbacks for a girl g, we check whether we can put her in a cluster based on the available information. We look for one girl g_r ∈ G^r such that σ(b, g_r) = σ(b, g) for all b ∈ F_{g_r} ∩ F_g. If we do not find any such girl, we continue to collect feedback for g until |F_g| = n, and thereafter we insert g into G^r. This operation is repeated for all girls except the first one. This is the computational bottleneck of the whole implementation. Overall, it takes O(n) · O(S′) · O(C_G) = O(n C_G S′) time. The space complexity is still O(n C_G), because of the use of tree T.

At the end of this phase, we create a matrix B̃ ∈ {0,1}^{n×C_G} containing all the columns in T in the same order. We also create two other ancillary data structures: (i) an n-dimensional array A_B where each record contains an integer in {1, ..., C_B}, representing the estimated cluster of each boy.
Array A_B allows us to get in constant time the estimated cluster of each boy. (ii) A C_G-dimensional array A′_B, where each record represents a distinct cluster of girls. The j-th entry A′_B[j] of A′_B contains the ordered list of the indices of all girls belonging to the j-th estimated cluster. Symmetrically, for the girl preference matrix G, we will have matrix G̃ and arrays A_G and A′_G.

Finally, we create a C_B-by-C_G matrix M which can be exploited in Phase II to match users according to the information collected during Phase I. Matrix M represents, in a very compact form, the approximation to the matching graph M computed by Phase I. Specifically, entry M_{i,j} contains two ordered lists of user indices, L_B(i,j) and L_G(i,j). The integers in L_B(i,j) correspond to all boy indices that belong to the i-th cluster of B and that, according to what the algorithm estimates, are matching girls in the j-th cluster of girls. Symmetrically, L_G(i,j) contains all the indices of the girls belonging to the j-th cluster of girls matching boys of the i-th cluster. It is not difficult to see that, using the data structures described so far, this matching matrix M can be generated by reading all elements of B̃ and G̃ only once, and its construction thus requires only O(n(C_G + C_B)) time. The space complexity of the matching matrix M is again O(n(C_G + C_B)). To see why, first observe that the number of entries of M is C_G · C_B < n(C_G + C_B). As for the space needed to store the boy and girl lists contained in the entries of M, consider the following. Let us focus on boys only; a similar argument can be made for girls. List L_B(i,j), stored in M_{i,j}, must be a subset of the i-th estimated cluster of B. Since B is partitioned by SMILE into C_B-many estimated clusters, call these clusters B_1, ...
, B_{C_B}, we have that the total number of items contained in all the lists of the i-th row of M can be upper bounded by |B_i| · C_G. Thus, the total number of items contained in the lists of boys in M can in turn be upper bounded by

Σ_{1≤i≤C_B} |B_i| · C_G = |B| · C_G = n · C_G.

Hence, the space needed to store M is bounded by O(C_G C_B + n C_G + n C_B) = O(n(C_G + C_B)), as claimed.

During Phase II, we match users according to the information obtained from Phase I. The procedure is greedy, and can be efficiently implemented by maintaining, for each b ∈ B, a pointer p_b that can only move forward in the corresponding row of M. More precisely, p_b scans the estimated matches for b contained in the corresponding row of M. Without loss of generality, assume b is contained in the i-th estimated cluster of boys, and that L_B(i,j) contains b. During each round where boy b is selected (in some Step (B)), pointer p_b moves forward in the list L_G(i,j), where M_{i,j} is the current entry processed by SMILE during Phase II for b. If during the last round where b was selected p_b was pointing to the last element of L_G(i,j), then we continue to increment j until we find an entry M_{i,j′} such that the associated list of boys L_B(i,j′) contains b. In order to find such an entry j′, we perform (j′ − j)-many binary searches over the j′ − j lists L_B(i,j+1), L_B(i,j+2), ..., L_B(i,j′). Thereafter, we make p_b point to the first girl in list L_G(i,j′). When p_b reaches the end of the list of girls L_G(i,C_G) of the last column of M, SMILE predicts arbitrarily in all subsequent rounds where b is selected.

The total running time for Phase II is O(T + n(C_G + C_B) log n), where the term O((C_G + C_B) log n) per user is due to the dichotomic searches performed in the lists of M for each user of B ∪ G.
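The forward-only pointer scan of a row of M can be sketched as follows. This is a simplified generator (in the actual implementation p_b advances one girl per round, and the binary searches over the sorted boy lists give the stated log factor); names are ours:

```python
from bisect import bisect_left

def served_girls(M_row, b):
    """Scan row i of the matching matrix for boy b: for each entry j whose boy
    list L_B(i, j) contains b (checked by binary search, as the lists are
    sorted), yield the girls of L_G(i, j), one per round, left to right."""
    for boys, girls in M_row:                # entry j = (L_B(i, j), L_G(i, j))
        k = bisect_left(boys, b)
        if k < len(boys) and boys[k] == b:   # b belongs to L_B(i, j)
            yield from girls
    # after the last column, SMILE would predict arbitrarily

row = [([1, 4], [10, 11]), ([2, 3], [12]), ([1, 2], [13, 14])]
print(list(served_girls(row, 1)))   # → [10, 11, 13, 14]
```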
To see why, let us refer to a specific boy b: the number of operations performed during Phase II is either constant, when p_b moves forward inside a list of girls of M, or O((j′ − j) log(n)), when SMILE is looking for the next list of boys L_B(i,j′) containing b, starting from the lists of entry M_{i,j}. Hence the overall time complexity becomes

O(T + n(C_G + C_B) log n + n S(C_G + C_B)) = O(T + n(C_G + C_B)S),

where we used S ≥ log(n). As for the amortized time per round, when T = ω(n(C_G + C_B) + n³ log(n)/M) this can be calculated as follows. Since S = Θ(n² log(n)/M), the overall time complexity becomes O(T + (C_G + C_B) n³ log(n)/M). Dividing by T = ω(n(C_G + C_B) + n³ log(n)/M), we immediately obtain

O( (T + (C_G + C_B) n³ log(n)/M) / T ) = O( (C_G + C_B) n³ log(n)/M / ω(n(C_G + C_B) + n³ log(n)/M) )
  = O( (C_G + C_B) n³ log(n)/M / ω(n³ log(n)/M) )
  = Θ(1) + o(C_G + C_B),

which is the claimed amortized time per round. This concludes the proof.

C Supplementary material on the experiments
Implementation of
SMILE. As we mention in Section 5, our variant I-SMILE: (i) deals with the cases where the input dataset is uniformly random, (ii) avoids asking arbitrary queries if more valuable queries are available, and (iii) discovers matches during the exploration phase of the algorithm. To achieve all these goals, we adapted the implementation of SMILE along different axes.

First, we combined Phase I and Phase II of
SMILE. The high-level idea of this modification is to start exploiting clusters immediately once some of them are identified, without waiting to estimate all of them. We only describe the process of serving recommendations to boys, the process for girls being symmetric. We maintain for each b ∈ B a set of girl clusters C_to-ask(b) for which we do not yet know the preference of b, and a set of girl clusters C_verified(b) which we already know b likes. Whenever b logs in, if C_verified(b) ≠ ∅ we pick a cluster C ∈ C_verified(b) and a girl g ∈ C, and ask b about g. If C_verified(b) = ∅ and C_to-ask(b) ≠ ∅, we pick a cluster C ∈ C_to-ask(b), ask b his preference for any girl in C, remove C from C_to-ask(b), update the preference of b for cluster C accordingly, and finally add C into C_verified(b) if b likes cluster C. If, on the other hand, C_verified(b) = C_to-ask(b) = ∅, and there are no prioritized queries for b (see the second modification), we proceed as we would in Phase I of SMILE (asking b for feedback that helps estimating the clusters). Whenever the exploration phase discovers a new girl cluster C represented by g, we add C into C_verified(b) if σ(b, g) = +1, and into C_to-ask(b) if b was not asked about g.
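The serving priority described above can be sketched as follows. This is a minimal, set-based sketch; the function name, data layout, and fallback handling are ours:

```python
def choose_query(b, verified, to_ask, cluster_members, fallback_girl=None):
    """Priority rule of I-SMILE when boy b logs in: (1) a girl from a cluster b
    is known to like, (2) a probe girl from an unexplored cluster, (3) the
    Phase-I-style fallback (here just a placeholder girl)."""
    if verified[b]:
        c = min(verified[b])                 # any cluster b is known to like
        return cluster_members[c][0], "exploit"
    if to_ask[b]:
        c = min(to_ask[b])                   # probe one unexplored cluster
        return cluster_members[c][0], "probe"
    return fallback_girl, "explore"          # Phase-I-style feedback gathering

members = {0: ["g0", "g1"], 1: ["g2"]}
print(choose_query("b0", {"b0": {1}}, {"b0": {0}}, members))      # exploit: g2
print(choose_query("b0", {"b0": set()}, {"b0": {0}}, members))    # probe: g0
print(choose_query("b0", {"b0": set()}, {"b0": set()}, members))  # explore
```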
Whenever a girl g is classified into an existing girl cluster C, for each boy b′ that provided feedback for g with C ∈ C_verified(b′), we remove C from C_verified(b′), as we now know whether b′ likes cluster C or not.

Second, whenever we discover a positive feedback from b to g, we prioritize for g the query asking her feedback on b. The feedback received by such queries is taken into account when classifying users into clusters.

Third, instead of having Phase II choose girl g arbitrarily (the "else" branch in the pseudocode), we let I-SMILE choose a girl g′ who likes b; if no such g′ exists, we select a girl g″ for whom we have not yet discovered whether she likes or dislikes b. If no such girl exists for b, then we serve an arbitrary girl to b.
[Figure 3 panels: (a) Dataset S-20-23, (b) Dataset S-500-480, (c) Dataset S-2000-2000, (d) Dataset RW-1007-1286; each panel plots matches found over time for UROMM, OOMM, and I-SMILE.]
Figure 3:
Empirical comparison of the three algorithms I-SMILE, OOMM, and
UROMM on the remaining datasets considered in this paper. Each plot reports number of disclosed matches vs. time (no. of recommendations).
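I-SMILE also relaxes SMILE's exact agreement test when deciding cluster membership (the fourth modification, described next). A sketch, with the tolerance fraction eps as a hypothetical parameter name of ours:

```python
def joins_cluster(g_feedback, rep_feedback, eps):
    """Relaxed membership test of I-SMILE: g joins the cluster of the
    representative if they agree on at least a (1 - eps) fraction of the boys
    who rated both (eps = 0 recovers the exact test of SMILE)."""
    common = set(g_feedback) & set(rep_feedback)
    if not common:
        return False
    agree = sum(g_feedback[b] == rep_feedback[b] for b in common)
    return agree >= (1 - eps) * len(common)

g   = {"b1": 1, "b2": 0, "b3": 1, "b4": 1}
rep = {"b1": 1, "b2": 0, "b3": 0, "b4": 1}   # one disagreement out of four
print(joins_cluster(g, rep, eps=0.0), joins_cluster(g, rep, eps=0.3))  # → False True
```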
Finally, we amended as follows the way we compare the feedbacks received by two users, say girl g and g_r ∈ G^r, in order to determine whether g belongs to the cluster of g_r. We insert g into the cluster of g_r by requiring that σ(b, g) = σ(b, g_r) holds for at least a (1 − δ) fraction of the boys in F_g ∩ F_{g_r} (for a small fraction δ), in place of all boys belonging to F_g ∩ F_{g_r}. This modification aims to cope with the problem of clustering similar users into different clusters due to a very small value of |F_g △ F_{g′}|, that is, the number of boys that like only one out of g and g′. In the real-world dataset that we use, we noticed that if we allow no boys to disagree in their feedback to two girls, then the number of girl clusters is almost equal to |G|, while allowing a small number of disagreements (a small fraction of the total number of boys) reduces the number of girl clusters drastically. Recall the last six columns of Table 1. The same holds for clusters over boys when we consider feedback from girls.

Further experimental results.
In Table 2 we give the area-under-the-curve metric, which sums over time the number of matches that are uncovered by each time step t, divided by the total number of time steps. This metric captures how quickly, on average, the different algorithms disclose matches. Figure 3 contains the plots for the remaining datasets described in Section 5.

Algorithm   S-20-22   S-95-100   S-500-480   S-2000-2000   RW-1007-   RW-1526-   RW-2265-
UROMM       K         K          K           K             .K         .K         .K
OOMM        K         K          K           K             .K         .K         .K
I-SMILE     K         K          K           K             .K         .K         .K

Table 2: