[PDF] Maximizing Agreements for Ranking, Clustering and Hierarchical Clustering via MAX-CUT

Abstract

In this paper, we study a number of well-known combinatorial optimization problems that fit in the following paradigm: the input is a collection of (potentially inconsistent) local relationships between the elements of a ground set (e.g., pairwise comparisons, similar/dissimilar pairs, or ancestry structure of triples of points), and the goal is to aggregate this information into a global structure (e.g., a ranking, a clustering, or a hierarchical clustering) in a way that maximizes agreement with the input. Well-studied problems such as rank aggregation, correlation clustering, and hierarchical clustering with triplet constraints fall in this class of problems. We study these problems on stochastic instances with a hidden embedded ground truth solution. Our main algorithmic contribution is a unified technique that uses the maximum cut problem in graphs to approximately solve these problems. Using this technique, we can often get approximation guarantees in the stochastic setting that are better than the known worst case inapproximability bounds for the corresponding problem. On the negative side, we improve the worst case inapproximability bound on several hierarchical clustering formulations through a reduction to related ranking problems.

Full PDF

MMaximizing Agreements for Ranking, Clusteringand Hierarchical Clustering via MAX-CUT

Vaggos Chatziafratis ∗ Google Research NY Mohammad Mahdian † Google Research NY Sara Ahmadian ‡ Google Research NYFebruary 24, 2021

Abstract

In this paper, we study a number of well-known combinatorial optimization problems that ﬁt in thefollowing paradigm: the input is a collection of (potentially inconsistent) local relationships between theelements of a ground set (e.g., pairwise comparisons, similar/dissimilar pairs, or ancestry structure oftriples of points), and the goal is to aggregate this information into a global structure (e.g., a ranking,a clustering, or a hierarchical clustering) in a way that maximizes agreement with the input. Well-studied problems such as rank aggregation, correlation clustering, and hierarchical clustering with tripletconstraints fall in this class of problems. We study these problems on stochastic instances with a hiddenembedded ground truth solution. Our main algorithmic contribution is a uniﬁed technique that usesthe maximum cut problem in graphs to approximately solve these problems. Using this technique, wecan often get approximation guarantees in the stochastic setting that are better than the known worstcase inapproximability bounds for the corresponding problem. On the negative side, we improve theworst case inapproximability bound on several hierarchical clustering formulations through a reductionto related ranking problems.

In many learning/optimization problems, the input data is in the form of a number of ordinal judgementsabout the local relationships among a set of n items. A prominent example is the problem of ranking n alternatives, where the input is often pairwise comparisons between these items. For example, sports teamsare often ranked by aggregating the results of matches played between pairs of teams, and election outcomesare decided by aggregating individual votes.Learning from comparisons has been prevalent across diﬀerent domains, as humans are typically good atquickly answering ordinal questions (“which movie/restaurant/candidate do you prefer”), but often respondslowly and inaccurately to cardinal questions (“how much do you like this option”). In the psychologyliterature, the method of paired comparisons that has been in use since the 1920’s is based on this principle(see [Thu59, Chapter 7]). Moreover, modern online platforms can organically extract such ordinal preferencesby observing the users (e.g., “which movie did they ﬁrst watch”, or “did they skip a search result andclick on the next one”) and later use them for improving search or recommendation rankings (see, forexample, [Joa02]). The same principle applies to settings other than ranking. For example, when trying tolearn a clustering of n items, it is easier for a human judge to answer questions of the form “should x and y be in the same cluster” than to measure the similarity of x and y . Or, to reconstruct the evolutionary tree(also known as the phylogenetic tree) between n species, biologists often start by answering questions of theform “between three species x, y , and z , which two are evolutionarily closer”.At the heart of each of these examples is the non-trivial algorithmic task of reconciling potentiallyinconsistent judgements into a global solution. This deﬁnes a number of algorithmic problems that we studyin this paper. Though seemingly unrelated, all of these problems seek to ﬁnd a global structure that has the ∗ [email protected] and [email protected] † [email protected] ‡ [email protected] a r X i v : . [ c s . D S ] F e b aximum number of agreements with the given collection of local ordinal relationships. As we shall see laterin the paper, the problems are also linked in that we can apply a common technique (based on graph maxcut) to them all. The problems, shown in Figure 1, fall under the three categories of ranking, clustering,and hierarchical clustering: • Ranking:

The goal is to ﬁnd an ordering of n items. In the Maximum Acyclic Subgraph (

Mas ) , theinput is a number of pairwise comparisons of the form a < b . In Betweenness , the input is a numberof triples a | b | c meaning that b is between a and c in the ordering. In Non-Betweenness , the input is anumber of triples b | ac meaning that b is not between a and c . • Clustering:

In the

Correlation Clustering problem, the goal is to ﬁnd a partitioning of n items, andthe input is a number of pairs of the form ab , meaning that a and b should be in the same cluster, anda number of pairs of the form a | b , meaning that a and b should be in diﬀerent clusters. • Hierarchical clustering:

The goal is to ﬁnd a (rooted or unrooted) tree with the set of n items as itsleaves. In the Desired Triplets problem, the input is a number of triplets ab | c , meaning that the leastcommon ancestor of a and b is a descendant of the least common ancestor of a, b , and c . In the DesiredQuartets problem, the input is a number of quartets ab | cd , meaning that the unique path connecting a and b in the tree does not intersect with the unique path connecting c and d . The Forbidden Triplets and

Forbidden Quartets problems are deﬁned similarly with the opposite requirements.These problems come from a variety of applications:

Mas is a formulation of the rank aggregationproblem and has many applications, e.g., in search ranking. Correlation Clustering is a central problem inunsupervised learning and data analysis [BBC04]. Hierarchical clustering problems are motivated by appli-cations in reconstructing phylogenetic trees [Fel04], and are also related to the objective-driven formulationsof [Das16], [MW17] and [CAKMTM19] for hierarchical clustering. In fact, the Desired Triplets formulationdescribed above is tightly connected with objective-based approaches for Hierarchical Clustering as can beseen in [CCN19, CCNY19]. Betweenness and Non-Betweenness are motivated by applications in genomesequencing in bioinformatics [SKSL97]. We are interested in algorithms that can provide an approximationguarantee, i.e., a provable bound on the multiplicative factor between the solution found by the algorithmand the optimal solution. We will consider this problem both in the worst case and under a stochastic modelwith an embedded ground-truth solution.

Main Results:

Our contribution is two-fold (see Table 1 for a summary): On the positive side, in Section 3,under a simple stochastic model akin to the well-known stochastic block model , we are able to improve uponworst-case approximations for all problems and in some cases (e.g., for problems on rankings and hierarchies)even overcome impossibility results. Interestingly, our algorithms are all based on variants of

MaxCut ongraphs that can have both positive and negative weights and may also be directed. Some approaches fortree reconstruction based on

MaxCut had been used in previous experimental works [SR06, SR08, SR12],and in this way our work provides concrete proof for why these heuristics are reported to perform wellon “real-world” instances. Our natural stochastic model captures “real-world” instances via an embeddedground-truth from which we generate “noisy” constraints, similar to the Stochastic Block Model [MNS12] incommunity detection.On the negative side, we obtain new hardness of approximation results for four problems on hierarchicalclustering: Forbidden Triplets, Desired Triplets, Forbidden Quartets, Desired Quartets. Brieﬂy, we may referto them as triplets/quartets consistency problems. These are instances of Constraint Satisfaction Problems(CSP) on trees [BM10, BJVP16], analogous to SAT formulas in complexity. Even though such problems onhierarchies have been studied for decades, the current best approximations are achieved by trivial baselinealgorithms. Our hardness results give some explanation why previous approaches were not able to obtainanything better. Our result on the Forbidden Triplets problem is tight and is the ﬁrst tight hardness forCSPs on trees, extending analogous hardness results by [GHM +

11] from linear orderings (i.e., rankings) totrees. This is carried out in Section 4.Our stochastic model for collecting information is the simplest form of embedded model on n items, and ismotivated by crowdsourcing and biological applications [Vau17, KvL17, GPvL19, SY12]. We simply chooseitems at random and include a pairwise/triplet/quartet constraint depending on the task. For example, to2enerate constraints for the Mas problem on rankings, let π ∗ denote a ground-truth ranking (e.g., of chessplayers or ads to show a user). We select uniformly at random m pairs of items a i , b i and then we generate m pairs a i < b i ; if a i precedes b i in π ∗ the constraint is included with probability (1 − ε ), otherwise the oppositeconstraint is generated. Thus, some fraction of the constraints can be erroneous. After generating m (noisy)constraints in this way, our goal is to ﬁnd a global solution (ranking, partition, or tree) that satisﬁes as manyas possible.Figure 1: A schematic representation of all problems considered in the paper. The left column has theproblem names, the middle the types of constraints and the right column has a candidate solution. Withgreen are constraints that are correctly resolved in the given candidate solution, whereas with red are thosethat are incorrect. For more examples, see Section 2. Techniques:

Our hardness reductions for Maximum Forbidden Triplets consistency are based on mappingtrees to permutations on their leaves and back, and showing that any constant factor improvement overtrivial baselines would refute the Unique Games Conjecture ( Ugc ) [Kho02]. Regarding our

MaxCut algorithm (see Algorithm 1), it is based on

MaxCut variations on directed and undirected graphs withnegative weights and is conceptually simple. Brieﬂy, given an instance for any of the problems we consider,we map it to a graph where edges encode the underlying constraints; perhaps the most intuitive suchconstruction is for Correlation Clustering where a “must-link” or “cannot-link” constraint between items i, j is captured by a negative or positive edge ( i, j ) respectively. Then, we show how large (positive) cuts in thisgraph yield partitions that satisfy many of the constraints. The existence of a large cut can be guaranteedby analyzing our stochastic model and so an approximate

MaxCut algorithm can yield improvements overprevious results. An interesting ingredient that we need for the case of

Mas , is how to approximate the

MaxCut problem on directed graphs with both positive and negative weights which, to the best of ourknowledge, hadn’t been analyzed before.More broadly, we justify theoretically why prior experimental heuristics work and we extend them towork for new problems with provable approximation guarantees. Our work also presents the ﬁrst case of aCSP on trees that is approximation resistant; recall that many important CSPs, including Max3SAT, are Khot’s

Ugc is a major open question in complexity. We will not deﬁne it here as we only use some of its consequences onordering problems [GHM + ε .Approx. Hardness Stochastic Mas

Btw non-Btw

Correl. Cl. 0.76 APX-hard (*)Forb. Triplet 2/3 (*)Des. Triplet 1/3 (*)Forb. Quartet 2/3

Des. Quartet 1/3

Remark 1.

We want to point out that all our approximation results here hold with high probability asa standard concentration argument about the stochastic process guarantees that the weight of the cuts iswell-concentrated around its mean (as long as the number of generated constraints m ≥ Ω(log n ) ). Remark 2.

Our results for ranking and quartets hold with no assumption on the optimal solution. Forthe positive results (denoted with (*) in Table 1) via

MaxCut for correlation clustering and triplets how-ever, we need a mild balancedness assumption, roughly stating that the optimal solution contains a relativelybalanced (cid:0) : (cid:1) partition, to ensure the existence of a good cut in the ground-truth (see Appendix, Assump-tion 1). Usually, such assumptions are common in generative graph models for clustering, e.g., the Stochas-tic Block Model [MNS12, ABH15] and for hierarchical clustering, e.g., the Hierarchical Stochastic BlockModel [LTA +

16, CAKMTM19, GPvL19], where we expect to see at least two large communities emerge.

As the paper discusses multiple problems on rankings, partitions and hierarchies, we devote this section indescribing the multitude of problems. A familiar reader can skip this section and proceed to Section 3.There are 3 categories of problems we study here, depending on the type of the output: ranking (alsocalled a permutation or a leaf ordering in biology [BJGJ01]), clustering (partitioning of the data points) andhierarchical clustering (also called phylogenetic tree ). There has been signiﬁcant amounts of work on each ofthese tasks, that we only partially cover here as we go over our problems and results.

In all problems, we are given m constraints and we want to maximize the number of constraints satisﬁedby our output, whether it be a ranking, a partition or a hierarchy. We describe below the types of diﬀerentconstraints (see also Figure 1): Ranking (i.e., a permutation or leaf ordering):

Given n labels { , , . . . , n } , we want to ﬁnd apermutation that maximizes the number of satisﬁed constraints of the following form: • Pairwise comparisons : A constraint here is of the form “ a < b ”, indicating that in the outputpermutation, item a should precede b . If this information is encoded as a directed graph G with arcs a → b , this gives rise to the Maximum Acyclic Subgraph ( Mas ) or Feedback Arc Set (

Fas ), twofundamental problems in computer science [Kar72].4

Betweenness (BTW) and Non-Betweenness (Non-BTW) constraints : In the

Btw prob-lem [Opa79, CS98, Mak12], we are given relative ordering constraints of the form a | b | c indicating “ b should be between a and c ”. This allows for abc or cba out of the 6 possible orderings for the 3 labels.As the name suggests, non-Btw is the complement of Btw , where a constraint bc | a (equivalently a | bc )indicates that in the output permutation “ a should not lie between b and c ”. This allows for 4 validrelative orderings abc, acb, bca, cba . Generally, these are the two most common examples of orderingConstraint Satisfaction Problems (ordering CSPs) of arity 3 and are mainly motivated by applicationsin bioinformatics [SKSL97]. They have also played a major role in complexity [GHM +

11, AMW13].Just to give a sense of the approximability of these problems in the worst-case, the current best constantfactor is a -approximation for Mas , a -approximation for Btw , and a -approximation for non-Btw ,all achieved by a random permutation. We also know that under the Unique Games Conjecture ( Ugc )of [Kho02], the ﬁrst two results are tight, whereas the third is tight under P (cid:54) = NP. Such problems, wherea random output is provably the best, are called approximation resistant and have been studied extensivelyby theoreticians [CGM09, GMR08, H˚as01, AM09]. Our work gives strong evidence pointing to the fact thatimportant CSPs on trees (triplets/quartets) may be approximation resistant.

Clustering:

Here we want to maximize agreements with

Must-Link/Cannot-Link constraints: Theinput is a graph with “+” or “ − ” edges indicating if the two endpoints should belong to the same clusteror not. Such constraints give rise to Correlation Clustering, an important paradigm for data analysis bothin practice [DB07, WC00, WCRS01] and theory [BBC04, ACN08, CGW05, Swa04]. The current best formaximizing agreements is a 0 . . Hierarchical Clustering (i.e., phylogenetic trees):

There are two common types of trees: rooted andunrooted. Given n data points, a rooted binary tree on n leaves, where each leaf corresponds to a datapoint, is usually called a hierarchical clustering and is a standard tool for data analysis across diﬀerentdisciplines [SKK00, LRU14, TLM10, SPT + phylogenetic trees and are prevalent in computational biologyas they describe speciation events throughout the evolution of species [Bry97, Fel04]. Here we will use thetwo terms interchangeably to describe hierarchies on n leaves. Since in a hierarchy all data are eventuallyseparated at the leaves, pairwise constraints no longer make sense and the analogue of “must-link/cannot-link” are so-called “must-link-before/cannot-link-before” constraints: • Desired/Forbidden Triplets:

The output here is a rooted binary tree T on n leaves. We say atriplet relation “ t = ab | c ” is obeyed by T (or T obeys t ), if the lowest common ancestor (LCA) of a, b is a descendant of the LCA of a, c in T . Otherwise T disobeys ab | c . A triplet can be desired (we write t ∈ T D ) and we want the output T to obey it or forbidden (we write t ∈ T F ) and we want T todisobey/avoid it, giving rise to important optimization problems studied in computational biology andgraph theory under the name of rooted triplets consistency [Ste92, Bry97, BGJ10, HHJS06]. Noticethat a forbidden triplet ab | c is less restrictive, since it only speciﬁes that T should either obey ac | b or bc | a , but not ab | c . This is reﬂected in the complexity of the problems: given a set of forbidden triplets,it is NP-complete to check consistency (i.e., if there is a tree avoiding all of them), whereas checkingconsistency of desired triplets in polynomial time was established long ago by [ASSU81]. • Desired/Forbidden Quartets:

The desired output here is a ternary unrooted tree T . We say aquartet q = ab | cd is obeyed by T (or T obeys q ) if the (unique) path from a to b in T does not shareany vertices with the (unique) path from c to d in T . Otherwise T disobeys q . Similarly to triplets, aquartet can be desired ( q ∈ Q D ) or forbidden ( q ∈ Q F ), giving rise to important quartets consistencyproblems in biology and graph theory [Fel04, Bry97, JKL01, SR06]. For both problems, even if theinput is consistent, checking consistency is NP-complete. For example, “penguin, dolphin | tiger” could be a desired triplet as the tiger is the least relevant item. -approximation and for forbidden triplets or quartets, the current best is a -approximation.Embarrassingly, in all four cases these are achieved by a random (rooted or unrooted) tree or a simplegreedy construction [HHJS06]. Here, we further make a comparison to other relevant works. For ranking, many diﬀerent types of probabilisticmodels have been considered [BM09, SBGW16, SW17, NOS12, FOPS17] giving statistical guarantees forreconstructing the desired permutation. Instead of pairwise comparisons, the problem has also been studiedin the case where partial rankings or complete information (“tournaments”) is provided [FKM +

06, Ail10,KMS07]. Clustering with constraints and qualitative information (both max and min versions) were studiedin [BBC04, CGW05] where approximations via linear programs were derived or practical improvements weremade possible [WCRS01, WC00]. In crowdsourcing and biological applications, both triplet and quartetsqueries have been deployed [VH16, Vau17, KvL17, GPvL19, SR06, Bry97] as they can be more intuitive fornon-expert users compared to pairwise comparisons. Semi-supervised models, where triplet queries dependon answers to previous queries have been studied in [EZK18, VD16].To further motivate our stochastic model and results, we include a slightly more detailed comparisonwith 3 important prior works [BM09, EZK18, SY12] that study “ground-truth” stochastic models similarto ours. The authors in [BM09] study the ranking problem and assume that there exists a ground-truthranking π ∗ , as we do. However, their stochastic model assumes either that we have access to all pairwisecomparisons, or that we have access to complete rankings σ on the n items, where each complete ranking σ isgenerated with probability inverse exponential in the Kemeny distance between π ∗ and σ (Kemeny distanceis the number of inversions, i.e., the number of pairs ordered in π ∗ diﬀerently from σ ).As it will become obvious, their assumptions are much stricter than our simple stochastic model thatgenerates m pairwise comparisons uniformly at random. Moreover, notice that our approximation guaranteeshold for any number m of given constraints without requiring it to be Ω( n ). Given their more reﬁned model,they are of course in a position to analyze the maximum likelihood estimator and prove approximate recoveryresults, e.g., that no element is misplaced by more than log n positions with high probability; however noguarantees are given for the number of violated pairwise constraints, which is the focus of our paper.For triplets hierarchical clustering, the authors in [EZK18] assume there exists a ground-truth binarytree T , as we do. However, they are allowed adaptive triplet queries and show that ≈ n log n such queriessuﬃce to recover T using a clever partition algorithm similar to Quickselect and Quicksort. Once again,our model is not adaptive, and we do not pose any constraints on the number m of given constraints. Forquartets hierarchical clustering, our model is similar to [SY12], but we generalize their results to hold bothfor forbidden and desired quartets.Finally, our constrained version of Hierarchical Clustering based on triplet constraints was studiedin [CNC18] under the assumption that the input contains pairwise similarities as well as triplet constraints. We present our main strategy

MaxCut behind our positive results. As we will see, by modifying the graphs,our method is ﬂexible to allow for combinations of constraints, e.g., both

Btw and non-Btw constraintsfor rankings, or both desired and forbidden triplets (or quartets) for trees.

Stochastic Model for Generating Constraints:

Since our goal is to beat the worst-case approximationand hardness results, we use a simple stochastic model with an embedded ground-truth solution on n items.The form of the ground-truth changes depending on which problem we consider; it can be a ranking (for Mas , Btw , non-Btw ), a partition (for Correlation Clustering) or a hierarchical tree (rooted for Tripletsand unrooted for Quartets). For generating the m input constraints, we simply choose items at random andwith probability (1 − ε ) we add a pairwise/triplet/quartet constraint that is consistent with the ground-truth, otherwise with probability ε we add an erroneous constraint on the selected items. For example, inthe introduction, we saw the Mas constraints. Similarly, for

Btw , we would uniformly at random pick m a, b, c and then add w.p. (1 − ε ) the constraint a | b | c if b appears in between a and c in theground-truth ordering. Also, for the Triplets Consistency problem, we would again uniformly at randompick m triples of items a, b, c and then add w.p. (1 − ε ) the constraint ab | c if c is separated ﬁrst from a, b inthe ground-truth (rooted binary) tree. For all problems, after getting m (noisy) constraints in the analogousmanner, our goal is to ﬁnd a global solution that satisﬁes as many constraints as possible. Positive Results:

Using our stochastic model we can escape worst-case impossibility results and for all 3categories of problems, we present improved approximation algorithms. At a high-level, we ﬁrst construct agraph by encoding each of the local constraints on the items as a set of positive or negative edges betweenthem. The graph captures the desired relationships and then, we ﬁnd a good ﬁrst split maximizing theratio of satisﬁed over violated constraints by the cut. Naturally, our algorithm

MaxCut (see Algorithm 1)is based on variants of

MaxCut on graphs with negative weights. An interesting building block in ouranalysis when solving for better Maximum Acyclic Subgraphs, is the directed

MaxCut problem on graphswith negative weights which, to the best of our knowledge, hadn’t been analyzed before. We note that for thetriplets problem on trees, analogous

MaxCut heuristics had been successfully used before in experimentalwork for computational biology, however with no theoretical guarantees [SR06, SR12, SR08]. An exception isthe work of [SY12], where they focus only on the desired quartets problem, however their analysis is a specialcase of ours for when Q F = ∅ (i.e., the input contains no forbidden quartets). Our ﬁnal approximationscircumvent known hardness results for the case of rankings [GHM +

11] and our new hardness results for treesdescribed in detail later in Section 4.

We start with

Mas as it is perhaps the easiest to describe (see also Algorithm 1):

Theorem 3.

Given m constraints generated according to our stochastic model on n items, MaxCut satisﬁesat least (0 . − . ε ) m on average, where ε is the fraction of erroneous comparisons. If moreover m ≥ Ω(log n ) , the result holds w.h.p. Remark 4.

For example, if the error parameter ε = 0 . , hence of the m generated constraints areerroneous, we still satisfy ≈ of them, and we still beat the previous best -approximation together withthe known hardness [GMR08]. Our general proof template has 5 steps: • Building a graph: For a sampled constraint a < b indicating that a should precede b in the ranking,we add two directed edges:+1 directed from a → b, − b → a Since the problem has orientation, we deﬁne the weight of a directed cut ( S, ¯ S ) as the sum of all(positively or negatively) weighted arcs going from S to ¯ S (and we ignore the arcs going from ¯ S to S ). • Cuts and constraints: The goal of constructing the graph is to use information about its cuts and relatethem to the pairwise constraints. Notice that a cut ( S, ¯ S ) can either obey, disobey or leave unaﬀectedthe status of a a < b constraint, depending on if a or b belongs to S or ¯ S . Let m s , m v denote thesatisﬁed, violated constraints by the cut, respectively. The weight of any directed ( S, ¯ S ) cut is thus: w ( S, ¯ S ) = m s ( S, ¯ S ) − m v ( S, ¯ S ) (1)as satisﬁed pairs m s (with a ∈ S, b ∈ ¯ S ) contribute +1 and violated pairs m v (with a ∈ ¯ S, b ∈ S )contribute − • Lower Bounding

MaxCut : The constructed graph from the ﬁrst step, is directed and has both positiveand negative weights. Based on eq. (1), we should ﬁnd a large cut in this graph as this translates to manysatisﬁed constraints. In order to ﬁnd the cut, we use a

MaxCut variant that ﬁnds a cut comparable7o the optimal max cut in graphs that are directed and contain both positive and negative weights.However, we cannot use the standard Goemans-Williamson algorithm and guarantees [GW95], as thegraph is directed with positive and negative weights. A new ingredient in our proof is a semideﬁniteprogramming relaxation and analysis for this variant that achieves: E ( w ( S, ¯ S )) ≥ . w ( OPT ) − . · W − (2)where w ( OPT ) is the weight of the optimum cut and W − is the total negative weight in the graph inabsolute value. Based on the graph construction in this case, W − = m as every constraint contributeda − .

143 and 0 .

857 sum to 1, and they just arise fromthe rounding scheme used to obtain an integral solution from the relaxation. • Now that we have a lower bound for w ( S, ¯ S ) based on the optimum cut, in order to conclude thealgorithm’s cut is large (and hence satisﬁes many constraints), we need to lower bound the optimum’scut weight w ( OPT ). To do this we consider the weight of a median directed cut: the median cut isdeﬁned to be the one that assigns the ﬁrst n/ Mas , on one sideof the cut, and the rest n/ ≈ m of the generated constraints are satisﬁed by the median cut and hence also by OPT .To see this, observe that for nearly half of the a < b constraints, a belongs to the ﬁrst n/ b belongs to the remaining n/ OPT is by deﬁnition even betterthan the median cut, we get that it has a large cut value. If we wanted to be slightly more precise, weshould say that due to errors in an ε fraction of the generated constraints, we actually lose a small ε fraction of the constraints (we defer details to Appendix A) but this discounts the optimum cut onlyby a small amount. • Output of

MaxCut : Finally, we need to ﬁnd a good permutation overall, not just a good top split.Our algorithm starts by ﬁnding an approximate

MaxCut ( S, ¯ S ) in G and then proceeds by outputtinga random permutation on the items in S and in ¯ S and concatenating them. Finally, we can computethe overall value of ALG (dropping the notation with ( S, ¯ S )): ALG = m s + m u == m s + ( m − m s − m v ) = m + ( w ( S, ¯ S )) (3)where m u are the constraints that were unaﬀected by the ( S, ¯ S ) cut. By eq. (3), we already see thatwe get some advantage over the m baseline which is optimal in the worst-case (and is achieved by arandom permutation on all n items). Remark 5.

A natural question is to attempt to use MaxCut repeatedly on each of the two generated partsof the ﬁrst split. However analyzing the repeated MaxCut approach is not that simple, as once the ﬁrstapproximate MaxCut is performed, there is no randomness in the two generated subgraphs that we canexploit. Analogous diﬃculties arise in dissimilarity-based and quartets-based hierarchical clustering [CCN19,SY12, ACE + + The same proof template as presented here can be modiﬁed to deal with the remaining problems:

Btw , non-Btw , forbidden and desired triplets, forbidden and desired quartets. As each of these constraints,involve 3 or 4 points, the construction and analyses become more involved. We present brieﬂy the mainmodiﬁcations for the graph construction (see Appendix A for details).For a Btw constraint { a | b | c } , we add undirected edges: +2 for ( a, c ) and − b, a ) , ( b, c ). The edgescapture that a cut violates the constraint if it separates b from a, c . For a non-Btw constraint { ab | c } indicating that c should not be between a, b in the ﬁnal ordering, we add the following 3 undirected edges:+18 lgorithm 1 Our

MaxCut template as instantiated for

Mas . Input: m pairwise constraints for Mas . For each a < b constraint, insert a +1 arc directed from a → b and another arc with negative weight − b → a . Call the resulting graph G . Run our approximate

MaxCut algorithm suitable for directed graphs with negative weights to get aﬁrst split ( S, ¯ S ), satisfying eq. (2). Construct a random permutation π on the nodes in S and a random permutation π on the nodes in¯ S . Let π be the ranking obtained by concatenating π and then π . Return π .for pairs (c,a),(c,b) and -2 for the pair (a,b). Recall, that for Btw and non-Btw , the ultimate goal is tobeat the factors and which are currently optimal in the worst-case: Theorem 6.

Given m = Ω(log n ) noisy constraints on n items, variations of MaxCut satisfy at least (0 . − . ε ) m and (0 . − . ε ) m constraints w.h.p. for Btw and non-Btw , respectively, where ε is the fraction of erroneous constraints. For Correlation Clustering, for each

Cannot-Link constraint ab , we add a +1 for ( a, b ), and for each Must-Link constraint ab , we add − . a, b ). The chosen numerical value − . Theorem 7.

Given m = Ω(log n ) noisy “must-link/cannot-link” constraints on n items, MaxCut (modiﬁedappropriately) satisﬁes at least (0 . − . ε ) m constraints w.h.p., where ε is the fraction of erroneousconstraints. Analogous theorems hold for the Triplets/Quartets consistency problems. Due to space constraints, weomit the statements but we refer the reader to Table 1 for the ﬁnal ratios and to Appendix A for the proofs.

Negative Results:

As mentioned, previous work [BGJ10, JKL01, Bry97, HHJS06, Ste92] tried to getbetter approximations for triplets/quartets consistency compared to trivial baselines. Recall, that the trivialbaseline is to simply output a random tree (either rooted or unrooted depending on the problem). In ourpaper, near optimal hardness of approximation results for the maximum desired/forbidden triplets/quartetsconsistency problems (4 problems in total) are presented shedding light to why, despite signiﬁcant eﬀortsfrom diﬀerent communities, no improvement had been made for nearly thirty years. As a consequence, weget the ﬁrst tight hardness for an ordering problem on trees, thus extending the work of [GHM +

11] fromorderings on the line to hierarchical clustering.Speciﬁcally, for maximizing forbidden triplets, we show that no polynomial time algorithm can achieve aconstant better than -approximation. Similar to [GMR08, GHM +

11] this is assuming the Unique GamesConjecture, however for maximizing desired triplets, we show a threshold of , assuming P (cid:54) = NP. The abovealso implies that forbidden triplets is approximation resistant as a random tree also achieves a factor. Infact our hardness results for all 4 problems are stronger, as we show it’s not possible to distinguish almostperfectly consistent inputs from inputs where the optimum solution achieves almost the same as a randomsolution.Technically, in order to get the hardness results, we give algorithms to obtain permutations on the leavesof a tree, such that if the tree obeyed many triplet/quartet constraints, then the permutation would alsoobey a large fraction of them when viewed as appropriate ordering constraints. Speciﬁcally, we prove thatunder the Ugc , it is hard to approximate the Forbidden Triplets Consistency problem better than a factorof , even in the unweighted case. Fact 1.

Let K be the total number of triplet constraints in an instance of Btw . For any (cid:15) > , it isUGC-hard to distinguish between Btw instances of the following two cases:

YES : val ( π ∗ ) ≥ (1 − (cid:15) ) K , i.e. the optimal permutation satisﬁes almost all constraints. NO : val ( π ∗ ) ≤ ( + (cid:15) ) K , i.e. the optimal permutation does not satisfy more than 1/3 fraction. + -inapproximability result for Forbidden Triplets: Theorem 8.

Let K be the total number of the triplet constraints in an instance of Forbidden TripletsConsistency. For any δ > , it is UGC-hard to distinguish between the following two cases: YES : val ( T ∗ ) ≥ (1 − δ ) K , i.e. the optimal tree satisﬁes almost all the triplet constraints. NO : val ( T ∗ ) ≤ ( + δ ) K , i.e. the optimal tree does not satisfy more than fraction of triplets.Proof. Start with a YES instance of the

Btw problem with optimal permutation π ∗ and val ( π ∗ ) ≥ (1 − (cid:15) ) K .Viewing each Btw constraint a | b | c as a forbidden triplet ac | b , we show how to construct a tree T such that val ( T ) ≥ (1 − δ ( ε )) K . In fact, the construction is straightforward: simply assign the n labels, in the orderthey appear in π ∗ , as the leaves of a caterpillar tree (every internal node has its left child being a leaf).Observe that this caterpillar tree satisﬁes: val ( T ) ≥ (1 − (cid:15) ) K . This is because if a Btw constraint a | b | c wasobeyed by π ∗ , it will also be avoided (viewed as a forbidden triplet ac | b ) by the caterpillar tree above: if a appears ﬁrst in the permutation then the caterpillar will avoid ac | b as a gets separated ﬁrst, otherwise if c appears ﬁrst, then again the caterpillar tree will avoid ac | b as c gets separated ﬁrst.The NO instance is more challenging. Start with a NO instance of the Btw problem with optimal π ∗ ofvalue val ( π ∗ ) ≤ ( + ε ) K . Viewing the Btw constraints as forbidden triplets, we show that the optimumtree T ∗ cannot achieve better than > (2 / (cid:15) ) K , because this would imply that val ( π ∗ ) > ( + ε ) K ,which is a contradiction. For this, assume that some tree T scored a value val ( T ) > (2 / (cid:15) ) K . We willconstruct a permutation π from the tree T with value val ( π ) > (1 / (cid:15) ) K , a contradiction. Notice thatthere are forbidden triplets that may be avoided by the tree, yet obeyed by the permutation: for examplefor a forbidden triplet t = ac | b , the tree R that ﬁrst removes a and then splits b, c will successfully avoid t ,however the permutation acb can come from R by projection, however acb does not obey the Btw constraint a | b | c . Hence directly projecting the leaves of T onto a line may not satisfy > (1 / (cid:15) ) K , since everyforbidden triplet ac | b avoided by T , can be ordered by this projected permutation in a way that wouldnot obey the corresponding Btw constraint a | b | c . However, just by randomly swapping each left and rightchild for every internal node in the tree before we do the projection to the permutation, would satisfy1 / · (2 / (cid:15) ) K = (1 / (cid:15) ) K number of constraints. To see this, note that with probability a forbidden ac | b avoided by T will be mapped to the desired abc (and not acb ) or cba (and not cab ) ordering.Finally, we get val ( π ∗ ) ≥ val ( π ) > (1 / (cid:15) ) K , a contradiction that we were given a NO instance. Toconclude, -inapproximability follows from the gap of these two instances.For the Desired Triplets problem, the proof proceeds in a similar fashion. One main diﬀerence is that weprove hardness of under P (cid:54) = NP, without assuming Ugc . The reason is that we reduce from the non-Btw problem that is known to be approximation resistant, subject only to P (cid:54) = NP. Of course, one open questionis to close the gap between this factor and the current best approximation of . Theorem 9.

Let K be the total number of the triplet constraints in an instance of Desired Triplets Consis-tency. For any δ > , it is NP-hard to distinguish: YES : val ( T ∗ ) ≥ ( − δ ) K NO : val ( T ∗ ) ≤ ( + δ ) K Switching to quartet problems, our reductions are more challenging. The ﬁrst challenge is that constraintsare on 4 items so we need to resort to an ordering CSP of arity 4, that we term . Next,trees are unrooted and we want to generate an ordering on their leaves. To do this we ﬁrst root the tree atsome internal node and then follow a similar strategy for randomly reordering their children. For desiredquartets we show hardness of and for forbidden quartets a hardness of (see App. A for statements).Recall that the best approximations are and respectively, achieved by a random (unrooted) tree. Remark 10.

Note that our hardness results give optimal results when restricted to (rooted or unrooted)caterpillar trees, an important tree family, where each internal node has at least one leaf as a child.

We studied ranking, correlation clustering and hierarchical clustering under qualitative constraints and wepresented a simple algorithm based on

MaxCut that is able to overcome known hardness results under10 random model. We also provided the ﬁrst tight hardness of approximation for CSPs on trees sheddinglight to basic problems in computational biology and extending previous results by [GHM +

11] from orderingCSPs to trees.In fact, we believe that a nice open question is to prove that the two most important families of CSPson trees (triplets and quartets consistency) are approximation resistant. Here we showed this for the caseof forbidden triplets. More generally, we conjecture that all non-trivial CSPs on trees are approximationresistant, implying that the inapproximability results of [GHM +

11] can be extended from linear orderingsto trees.

Acknowledgments

The authors would like to thank Alessandro Epasto for interesting discussions during early stages of thiswork.

References [ABH15] Emmanuel Abbe, Afonso S Bandeira, and Georgina Hall. Exact recovery in the stochasticblock model.

IEEE Transactions on Information Theory , 62(1):471–487, 2015.[ACE +

20] Sara Ahmadian, Vaggos Chatziafratis, Alessandro Epasto, Euiwoong Lee, Mohammad Mah-dian, Konstantin Makarychev, and Grigory Yaroslavtsev. Bisect and conquer: Hierarchicalclustering via max-uncut bisection.

The 23rd International Conference on Artiﬁcial Intelli-gence and Statistics , 2020.[ACN08] Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information:ranking and clustering.

Journal of the ACM (JACM) , 55(5):1–27, 2008.[Ail10] Nir Ailon. Aggregation of partial rankings, p-ratings and top-m lists.

Algorithmica ,57(2):284–300, 2010.[AM09] Per Austrin and Elchanan Mossel. Approximation resistant predicates from pairwise inde-pendence.

Computational Complexity , 18(2):249–271, 2009.[AMW13] Per Austrin, Rajsekar Manokaran, and Cenny Wenner. On the NP-hardness of approxi-mating ordering constraint satisfaction problems. In

Approximation, Randomization, andCombinatorial Optimization. Algorithms and Techniques , pages 26–41. Springer, 2013.[ASSU81] Alfred V. Aho, Yehoshua Sagiv, Thomas G. Szymanski, and Jeﬀrey D. Ullman. Inferringa tree from lowest common ancestors with an application to the optimization of relationalexpressions.

SIAM Journal on Computing , 10(3):405–421, 1981.[BBC04] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering.

Machine Learning ,56(1-3):89–113, 2004.[BGJ10] Jaroslaw Byrka, Sylvain Guillemot, and Jesper Jansson. New results on optimizing rootedtriplets consistency.

Discrete Applied Mathematics , 158(11):1136–1147, 2010.[BJGJ01] Ziv Bar-Joseph, David K Giﬀord, and Tommi S Jaakkola. Fast optimal leaf ordering forhierarchical clustering.

Bioinformatics , 17(suppl 1):S22–S29, 2001.[BJVP16] Manuel Bodirsky, Peter Jonsson, and Trung Van Pham. The complexity of phylogeny con-straint satisfaction. In , 2016.[BM09] Mark Braverman and Elchanan Mossel. Sorting from noisy information. arXiv preprintarXiv:0910.1191 , 2009.[BM10] Manuel Bodirsky and Jens K Mueller. The complexity of rooted phylogeny problems. In

Proceedings of the 13th International Conference on Database Theory , pages 165–173, 2010.11Bry97] David Bryant. Building trees, hunting for trees, and comparing trees: theory and methodsin phylogenetic analysis.

PhD Thesis , 1997.[CAKMTM19] Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn, and Claire Mathieu. Hi-erarchical clustering: Objective functions and algorithms.

Journal of the ACM (JACM) ,66(4):1–42, 2019.[CC17] Moses Charikar and Vaggos Chatziafratis. Approximate hierarchical clustering via spars-est cut and spreading metrics. In

Proceedings of the Twenty-Eighth Annual ACM-SIAMSymposium on Discrete Algorithms , pages 841–854. SIAM, 2017.[CCN19] Moses Charikar, Vaggos Chatziafratis, and Rad Niazadeh. Hierarchical clustering better thanaverage-linkage. In

Proceedings of the Thirtieth Annual ACM-SIAM Symposium on DiscreteAlgorithms , pages 2291–2304. SIAM, 2019.[CCNY19] Moses Charikar, Vaggos Chatziafratis, Rad Niazadeh, and Grigory Yaroslavtsev. Hierarchicalclustering for euclidean data. In

The 22nd International Conference on Artiﬁcial Intelligenceand Statistics , pages 2721–2730, 2019.[CGM09] Moses Charikar, Venkatesan Guruswami, and Rajsekar Manokaran. Every permutation csp ofarity 3 is approximation resistant. In , pages 62–73. IEEE, 2009.[CGW05] Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitativeinformation.

Journal of Computer and System Sciences , 71(3):360–383, 2005.[CNC18] Vaggos Chatziafratis, Rad Niazadeh, and Moses Charikar. Hierarchical clustering with struc-tural constraints. In

International Conference on Machine Learning , pages 774–783, 2018.[CS98] Benny Chor and Madhu Sudan. A geometric approach to betweenness.

SIAM Journal onDiscrete Mathematics , 11(4):511–523, 1998.[Das16] Sanjoy Dasgupta.

A Cost Function for Similarity-Based Hierarchical Clustering , page118–127. Association for Computing Machinery, New York, NY, USA, 2016.[DB07] Ian Davidson and Sugato Basu. A survey of clustering with instance level constraints.

ACMTransactions on Knowledge Discovery from data , 1(1-41):2–42, 2007.[EZK18] Ehsan Emamjomeh-Zadeh and David Kempe. Adaptive hierarchical clustering using ordinalqueries. In

Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on DiscreteAlgorithms , pages 415–429. SIAM, 2018.[Fel04] Joseph Felsenstein.

Inferring phylogenies , volume 2. Sinauer associates Sunderland, MA,2004.[FG95] Uriel Feige and Michel Goemans. Approximating the value of two power proof systems,with applications to max 2sat and max dicut. In

Proceedings Third Israel Symposium on theTheory of Computing and Systems , pages 182–189. IEEE, 1995.[FKM +

06] Ronald Fagin, Ravi Kumar, Mohammad Mahdian, D Sivakumar, and Erik Vee. Comparingpartial rankings.

SIAM Journal on Discrete Mathematics , 20(3):628–648, 2006.[FOPS17] Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati, and Ananda Theertha Suresh.Maximum selection and ranking under noisy comparisons. In

International Conference onMachine Learning , pages 1088–1096. PMLR, 2017.[GHM +

11] Venkatesan Guruswami, Johan H˚astad, Rajsekar Manokaran, Prasad Raghavendra, andMoses Charikar. Beating the random ordering is hard: Every ordering csp is approxima-tion resistant.

SIAM Journal on Computing , 40(3):878–914, 2011.12GMR08] Venkatesan Guruswami, Rajsekar Manokaran, and Prasad Raghavendra. Beating the randomordering is hard: Inapproximability of maximum acyclic subgraph. In , pages 573–582. IEEE, 2008.[GPvL19] Debarghya Ghoshdastidar, Micha¨el Perrot, and Ulrike von Luxburg. Foundations ofcomparison-based hierarchical clustering. In

Advances in Neural Information ProcessingSystems , pages 7454–7464, 2019.[GW95] Michel X Goemans and David P Williamson. Improved approximation algorithms for max-imum cut and satisﬁability problems using semideﬁnite programming.

Journal of the ACM(JACM) , 42(6):1115–1145, 1995.[H˚as01] Johan H˚astad. Some optimal inapproximability results.

Journal of the ACM (JACM) ,48(4):798–859, 2001.[HHJS06] Ying-Jun He, Trinh ND Huynh, Jesper Jansson, and Wing-Kin Sung. Inferring phylogeneticrelationships avoiding forbidden rooted triplets.

Journal of Bioinformatics and Computa-tional Biology , 4(01):59–74, 2006.[JKL01] Tao Jiang, Paul Kearney, and Ming Li. A polynomial time approximation scheme for inferringevolutionary trees from quartet topologies and its application.

SIAM Journal on Computing ,30(6):1942–1961, 2001.[Joa02] Thorsten Joachims. Optimizing search engines using clickthrough data. In

Proceedings of theEighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ,KDD ’02, page 133–142, New York, NY, USA, 2002. Association for Computing Machinery.[Kar72] Richard M Karp. Reducibility among combinatorial problems. In

Complexity of computercomputations , pages 85–103. Springer, 1972.[Kho02] Subhash Khot. On the power of unique 2-prover 1-round games. In

Proceedings of thethiry-fourth annual ACM symposium on Theory of computing , pages 767–775. ACM, 2002.[KMS07] Claire Kenyon-Mathieu and Warren Schudy. How to rank with few errors. In

Proceedings ofthe thirty-ninth annual ACM symposium on Theory of computing , pages 95–103, 2007.[KvL17] Matth¨aus Kleindessner and Ulrike von Luxburg. Kernel functions based on triplet compar-isons. In

Advances in Neural Information Processing Systems , pages 6807–6817, 2017.[LRU14] Jure Leskovec, Anand Rajaraman, and Jeﬀrey David Ullman.

Mining of massive datasets .Cambridge university press, 2014.[LTA +

16] Vince Lyzinski, Minh Tang, Avanti Athreya, Youngser Park, and Carey E Priebe. Commu-nity detection and classiﬁcation in hierarchical stochastic blockmodels.

IEEE Transactionson Network Science and Engineering , 4(1):13–26, 2016.[Mak12] Yury Makarychev. Simple linear time approximation algorithm for betweenness.

Operationsresearch letters , 40(6):450–452, 2012.[MNS12] Elchanan Mossel, Joe Neeman, and Allan Sly. Stochastic block models and reconstruction. arXiv preprint arXiv:1202.1499 , 2012.[MW17] Benjamin Moseley and Joshua Wang. Approximation bounds for hierarchical clustering:Average linkage, bisecting k-means, and local search. In

Advances in Neural InformationProcessing Systems , pages 3094–3103, 2017.[NOS12] Sahand Negahban, Sewoong Oh, and Devavrat Shah. Iterative ranking from pair-wise com-parisons. In

Advances in neural information processing systems , pages 2474–2482, 2012.[Opa79] Jaroslav Opatrny. Total ordering problem.

SIAM Journal on Computing , 8(1):111–114, 1979.13SBGW16] Nihar Shah, Sivaraman Balakrishnan, Aditya Guntuboyina, and Martin Wainwright.Stochastically transitive models for pairwise comparisons: Statistical and computationalissues. In

International Conference on Machine Learning , pages 11–20, 2016.[SBV +

15] Erwan Scornet, G´erard Biau, Jean-Philippe Vert, et al. Consistency of random forests.

TheAnnals of Statistics , 43(4):1716–1741, 2015.[SKK00] Michael Steinbach, George Karypis, and Vipin Kumar. A comparison of document clusteringtechniques. In

KDD workshop on text mining , volume 400, pages 525–526. Boston, 2000.[SKSL97] Donna Slonim, Leonid Kruglyak, Lincoln Stein, and Eric Lander. Building human genomemaps with radiation hybrids.

Journal of Computational Biology , 4(4):487–504, 1997.[SPT +

01] Therese Sørlie, Charles M Perou, Robert Tibshirani, Turid Aas, Stephanie Geisler, HildeJohnsen, Trevor Hastie, Michael B Eisen, Matt Van De Rijn, Stefanie S Jeﬀrey, et al. Geneexpression patterns of breast carcinomas distinguish tumor subclasses with clinical implica-tions.

Proceedings of the National Academy of Sciences , 98(19):10869–10874, 2001.[SR06] Sagi Snir and Satish Rao. Using max cut to enhance rooted trees consistency.

IEEE/ACMtransactions on computational biology and bioinformatics , 3(4):323–333, 2006.[SR08] Sagi Snir and Satish Rao. Quartets maxcut: a divide and conquer quartets algorithm.

IEEE/ACM Transactions on Computational Biology and Bioinformatics , 7(4):704–718, 2008.[SR12] Sagi Snir and Satish Rao. Quartet maxcut: a fast algorithm for amalgamating quartet trees.

Molecular phylogenetics and evolution , 62(1):1–8, 2012.[Ste92] Michael Steel. The complexity of reconstructing trees from qualitative characters and sub-trees.

Journal of classiﬁcation , 9(1):91–116, 1992.[SW17] Nihar B Shah and Martin J Wainwright. Simple, robust and optimal ranking from pairwisecomparisons.

The Journal of Machine Learning Research , 18(1):7246–7283, 2017.[Swa04] Chaitanya Swamy. Correlation clustering: maximizing agreements via semideﬁnite program-ming. In

Proceedings of the ﬁfteenth annual ACM-SIAM symposium on Discrete algorithms ,pages 526–527. Society for Industrial and Applied Mathematics, 2004.[SY12] Sagi Snir and Raphael Yuster. Reconstructing approximate phylogenetic trees from quartetsamples.

SIAM Journal on Computing , 41(6):1466–1480, 2012.[Thu59] L. L. Thurstone.

The Measurement of Values . The University of Chicago Press, 1959.[TLM10] Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna. Correlation, hierarchies, andnetworks in ﬁnancial markets.

Journal of economic behavior & organization , 75(1):40–58,2010.[Vau17] Jennifer Wortman Vaughan. Making better use of the crowd: How crowdsourcing can advancemachine learning research.

The Journal of Machine Learning Research , 18(1):7026–7071,2017.[VD16] Sharad Vikram and Sanjoy Dasgupta. Interactive bayesian hierarchical clustering. In

Inter-national Conference on Machine Learning , pages 2081–2090, 2016.[VH16] Ramya Korlakai Vinayak and Babak Hassibi. Crowdsourced clustering: Querying edges vstriangles. In

Advances in Neural Information Processing Systems , pages 1316–1324, 2016.[WC00] Kiri Wagstaﬀ and Claire Cardie. Clustering with instance-level constraints.

AAAI/IAAI ,1097:577–584, 2000.[WCRS01] Kiri Wagstaﬀ, Claire Cardie, Seth Rogers, and Stefan Schr¨odl. Constrained k-means clus-tering with background knowledge. In

ICML , volume 1, pages 577–584, 2001.14

Omitted Proofs - Improved Approximations via MaxCut

In this ﬁrst section of the Appendix, we present the omitted details for our positive results. Speciﬁcally,we show how to overcome impossibility results (see also Appendix B) by going beyond the hardness ofapproximation thresholds ρ for each of the problems considered in the paper. As noted, to escape the worst-case analysis, we will assume the input is given as a set of m noisy constraints generated according to ourstochastic model and the goal is to obtain a solution with strictly more than ρm satisﬁed constraints.Recall that in Table 1, only for the results on Correlation Clustering and on Triplets Consistency markedwith an asterisk (*), we required a mild balancedness assumption. The assumption here on the balancednessof the ground truth partition or ground truth hierarchical clustering is used in our reduction, and speciﬁcallywhen analyzing our MaxCut approach. It is needed in order to ensure that based on our stochastic model,our

MaxCut approach can ﬁnd a large cut in the constructed graph which later translates into a largeportion of satisﬁed constraints.

Assumption 1.

For a tree with n leaves, a split ( L, R ) at an internal node is called balanced if | L | = cn, | R | = (1 − c ) n with ≤ c ≤ . We assume that in the optimum tree there exists one split that isbalanced. Similarly, for a clustering on n nodes, if there exists a partition of the clusters into two sides ( L, R ) such that | L | = cn, | R | = (1 − c ) n with ≤ c ≤ , we say the clustering is balanced. This is a reasonable assumption since hierarchical clusterings tend to be balanced and indeed recursivebalanced cuts tend to recover good hierarchies [CC17]. In essence, we exclude caterpillar trees or moregenerally highly skewed trees that are generated by always removing tiny pieces out of a giant compo-nent. Moreover, such assumptions are common in generative graph models for clustering, e.g., the Stochas-tic Block Model [MNS12, ABH15] and for hierarchical clustering, e.g., the Hierarchical Stochastic BlockModel [LTA +

16, CAKMTM19, GPvL19], where we expect to see at least two large communities emerge.For example, recent generative models like the Hierarchical Stochastic Block Model in [GPvL19] satisfy thebalancedness assumption with c = . A.1 Quartets Consistency from Noisy constraints

Let Q F , Q D be the set of forbidden and desired quartet constraints with sizes |Q F | = m , |Q D | = m respectively. The total number of generated constraints according to our stochastic model is denoted by m = m + m . Out of those constraints, let ε , ε denote the fraction of the erroneous forbidden anderroneous desired quartet constraints respectively. Our main theorem here is: Theorem 11.

Given m = m + m constraints as above on n items, our algorithm MaxCut satisﬁesat least (0 . − . ε ) m + (0 . − . ε ) m on average, where ε , ε are as above. If moreover m , m ≥ Ω(log n ) , the result holds w.h.p. For example, if the constraints are not erroneous (i.e., ε = ε = 0), we satisfy 42.5% of the desiredquartets, while avoiding 67.2% of the forbidden quartets, improving upon prior best approximations.In order to prove Theorem 11, we will require several intermediate lemmas and constructions.Recall that forbidden quartets should be avoided, whereas desired quartets should be satisﬁed by the treeour algorithm ﬁnds. We use the following notation: Let Q A ( ALG ) denote the number of quartets q ∈ Q F avoided, Q F ( ALG ) the number of quartets q ∈ Q F not avoided (of course, Q F = m = Q A ( ALG ) + Q F ( ALG ))and Q D ( ALG ) the number of quartets q ∈ Q D satisﬁed by the output phylogenetic tree. For the case of noerrors ε = 0 , ε = 0, the best approximation under worst-case analysis is: Q D ( ALG ) − Q F ( ALG ) ≥ ( |Q D | − |Q F | ) ⇐⇒Q D ( ALG ) + Q A ( ALG ) ≥ m + m In fact, the guarantees hold separately Q D ( ALG ) ≥ |Q D | and Q A ( ALG ) ≥ |Q F | and are achieved eitherby a simple greedy algorithm or by a random tree [HHJS06]. Our goal is to ﬁnd a tree beating the aboveguarantees, i.e., satisfying strictly more than fraction of desired quartets and strictly more than fractionof forbidden quartets. Our approach is based on extending a previous analysis from [SR06] that only handledthe case with Q F = ∅ . 15he end result of our algorithm ALG , which is based on

MaxCut , is a tree with the following guarantees: Q D ( ALG ) + Q A ( ALG ) ≥≥ (0 . − . ε ) m + (0 . − . ε ) m (4)We start by instantiating our general algorithmic template in Algorithm 1 to the case of the QuartetsConsistency problem, and we describe the necessary changes for the appropriate graph construction below: Graph Construction from constraints:

The goal here is to construct a graph encoding the qualitativeinformation from the generated quartets so that a

MaxCut subroutine can yield a reasonable ﬁrst split ofthe output phylogenetic tree. Quartets q ∈ Q F needs to be handled diﬀerently from quartets q ∈ Q D . Foreach forbidden q = { ab | cd } ∈ Q F we add the following six + or − weighted edges:+2 for pairs ( a, b ) , ( c, d ) and − a, c ) , ( a, d ) , ( b, c ) , ( b, d )and for a q = { ab | cd } ∈ Q D we add the following + or − edges: − a, b ) , ( c, d ) and+1 for pairs ( a, c ) , ( a, d ) , ( b, c ) , ( b, d )Let G be the undirected weighed multigraph constructed from the constraints as above and let ( S, ¯ S ) denoteany graph cut into two parts. We say that a quartet q = { ab | cd } ∈ Q F ∪ Q D is unaﬀected by the cut ( S, ¯ S if all four labels a, b, c, d end up in one of the two parts. For quartets whose endpoints are separated by thecut, we distinguish 3 cases: if one of the labels goes to one of the two parts while the remaining 3 labels goto the other part, we say that q is postponed . If precisely a, b are contained in some part, while the otherpart contains precisely c, d , we say q is obeyed . In any other case, q is disobeyed (e.g., a, c ∈ S and b, d ∈ ¯ S or the symmetric split a, d ∈ S and b, c ∈ ¯ S ). The perhaps more natural terms satisf ied and violated werenot used as we deal both with desired and forbidden quartets and would be misleading when accounting forthe maximization objective: Lemma 12.

The weight of any cut ( S, ¯ S ) can be computed based on the status of the quartets as: w ( S, ¯ S ) = 2 m Q F d ( S, ¯ S ) − m Q F o ( S, ¯ S )++4 m Q D o ( S, ¯ S ) − m Q D d ( S, ¯ S ) (5) where m Q F d , m Q D d is the number of disobeyed quartets by the cut that belong to Q F , Q D respectively andsimilarly m Q F o , m Q D o is the number of obeyed quartets from Q F , Q D respectively.Proof. Note that by our choice for the edge weights, if q = { ab | cd } is postponed or unaﬀected by the cut( S, ¯ S ), its contribution to w ( S, ¯ S ) is 0 regardless of q ∈ Q F or q ∈ Q D . Now, if a forbidden q ∈ Q F is obeyed,that counts as a mistake and it decreases the weight of the cut by -4, whereas if it is disobeyed, that countsas a correct choice and it increases the weight of the cut by +2. Accordingly we compute the contributionfor the desired quartets q ∈ Q D as +4 if obeyed and -2 if disobeyed. Summing over all constraints gives usthe lemma.The ﬁnal step is to compute the overall quartets our algorithm had success on, relative to the samplesizes m , m : Lemma 13. If ( S, ¯ S ) is the ﬁrst split of ALG , the total number of quartets decomposed correctly is:

ALG = Q A ( ALG ) + Q D ( ALG ) ≥ m + m + w ( S, ¯ S )16 roof. Let m Q F p , m Q F u denote the number of postponed or unaﬀected by the cut forbidden quartets, and m Q D p , m Q D u denote the number of postponed or unaﬀected by the cut desired quartets. Our algorithmﬁrst uses an approximation to MaxCut and then proceeds greedily (or randomly) to achieve the baselineguarantees by building a tree on S and on ¯ S : ALG ≥ m Q F d ( S, ¯ S ) + ( m Q F u ( S, ¯ S ) + m Q F p ( S, ¯ S ))++ m Q D o ( S, ¯ S ) + ( m Q D u ( S, ¯ S ) + m Q D p ( S, ¯ S ))For notation purposes, from now on we drop the parentheses ( S, ¯ S ) from the terms since we always refer tothe ( S, ¯ S ) cut. Observe that m = m Q F d + m Q F o + m Q F u + m Q F p and similarly m = m Q D d + m Q D o + m Q D u + m Q D p .By substituting the terms for unaﬀected and postponed quartets we get: ALG ≥ m Q F d + ( m − m Q F d − m Q F o ) + m Q D o + ( m − m Q D d − m Q D o )= m + m Q F d − m Q F o + m + m Q D o − m Q D d = m + m + (2 m Q F d − m Q F o + 4 m Q D o − m Q D d )From equation (5), the last term is equal to the weight of the ( S, ¯ S ) cut and this ﬁnishes the proof.Now we need to show that there is a good cut with high weight in the graph. Recall that the graphhas positive and negative edges. For such graphs, the guarantee of the rounding algorithm of [GW95] is asfollows: Fact 2.

For graphs with both positive and negative weights, one can eﬃciently ﬁnd a cut ( S, ¯ S ) with weight: w ( S, ¯ S ) ≥ . w ( S ∗ , ¯ S ∗ ) − . W − where ( S ∗ , ¯ S ∗ ) is the optimum solution for MaxCut and W − is the absolute sum of all negative edge weights. The cut ( S, ¯ S ) is produced in the same manner as in the standard Goemans-Williamson algorithm viarandom hyperplane rounding on their semideﬁnite relaxation for MaxCut . We will use this fact to provethe following:

Lemma 14.

The weight of the top split relative to the sizes of the quartet constraints is: w ( S, ¯ S ) ≥ (0 . − . ε ) m + (0 . − . ε ) m Proof.

Observe that in the constructed graph, the total negative weight is W − = 4 m + 4 m as each quartetadds a total negative weight of -4. In order to use Fact 2, we require a lower bound on the optimum value w ( S ∗ , ¯ S ∗ ).Notice that for any phylogenetic tree, since all internal vertices have three neighbors each (a trivalenttree), we can always ﬁnd an edge that induces a balanced cut. For n leaves, a cut ( L, R ) is called balancedif | L | = cn, | R | = (1 − c ) n with ≤ c ≤ . From our uniform generating model, recall that the number ofquartet constraints the cut ( L, R ) succeeds at is: E ( m Q F d ) = 6 c (1 − c ) (1 − ε ) m E ( m Q D o ) = 6 c (1 − c ) (1 − ε ) m and the number of constraints the cut ( L, R ) fails at, due to the erroneous constraints is: E ( m Q F o ) = 6 c (1 − c ) ε m , E ( m Q D d ) = 6 c (1 − c ) ε m The quantity c (1 − c ) with ≤ c ≤ attains a minimum value of when c = ; hence, from Lemma 12,the weight of the cut ( L, R ) on the constructed graph is: w ( L, R ) ≥ m Q F d ( L, R ) − m Q F o ( L, R )+174 m Q D o ( L, R ) − m Q D d ( L, R ) ≥ (1 − ε ) m − ε m + (1 − ε ) m − ε m (6)Of course the optimum cut has even larger weight than the speciﬁc balanced ( L, R ) cut so: w ( S ∗ , ¯ S ∗ ) ≥ w ( L, R ). Substituting equation (6) in Fact 2 yields the lemma.

Proof of Theorem 11.

From Lemma 13, we have a lower bound on our algorithm’s performance via theapproximate max cut. Substituting the quantity w ( S, ¯ S ) based on Lemma 14, yields the theorem.From the above, notice that we can still beat the prior best baselines as long as the error rates are nottoo big ( ε ≤ .

4% and ε ≤ . A.2 Triplets Consistency from Noisy constraints

Here we show a similar approximation result but for Triplets. Let T F , T D be the set of forbidden and desiredtriplet constraints with sizes |T F | = m , |T D | = m respectively. The total number of generated constraints isdenoted by m = m + m . Out of those constraints, let ε , ε denote the fraction of the erroneous forbiddenand erroneous desired triplet constraints respectively. Theorem 15.

Given m = m + m constraints as above on n items, our algorithm MaxCut satisﬁes atleast ( + 0 . − . ε ) m + ( + 0 . − . ε ) m on average, where ε , ε are as above. Ifmoreover m , m ≥ Ω(log n ) , the result holds w.h.p. For example, if the constraints are not erroneous (i.e., ε = ε = 0), we satisfy 64% of the desired triplets,while avoiding 78% of the forbidden triplets. This latter ratio beats our worst-case inapproximability resultsfor triplets (see also Appendix B).The reason we stated the numerical values in this form is that the trivial baselines achieve ratios of and for m and m respectively.Recall that forbidden triplets should be avoided, whereas desired triplets should be satisﬁed by the treeour algorithm ﬁnds. We use the following notation: Let T A ( ALG ) denote the number of triplets t ∈ T F avoided, T F ( ALG ) the number of triplets t ∈ Q F not avoided (of course, T F = m = T A ( ALG ) + T F ( ALG )) and T D ( ALG ) the number of triplets t ∈ T D satisﬁed by the output rooted binary hierarchical tree. For the caseof no errors ε = 0 , ε = 0, the best approximation under worst-case analysis is: T D ( ALG ) − T F ( ALG ) ≥ ( |T D | − |T F | ) ⇐⇒T D ( ALG ) + T A ( ALG ) ≥ m + m In fact, the guarantees hold separately T D ( ALG ) ≥ |T D | and T A ( ALG ) ≥ |T F | and are achieved either bya simple greedy algorithm or by a random tree [HHJS06]. Our goal is to ﬁnd a tree beating the aboveguarantees, i.e., satisfying strictly more than fraction of desired triplets and strictly more than fractionof forbidden triplets.The end result of our algorithm ALG , which is based on

MaxCut , is a tree with the following guarantees: T D ( ALG ) + T A ( ALG ) ≥ ( + 0 . − . ε ) m + ( + 0 . − . ε ) m (7)We proceed by describing the necessary changes to be made in our algorithmic template in Algorithm 1,in order to handle the triplet constraints. 18 raph Construction from constraints: The goal here is to construct a graph encoding the qualitativeinformation from the generated triplets so that a

MaxCut subroutine can yield a reasonable ﬁrst split ofthe output binary hierarchical tree. Triplets t ∈ T F need to be handled diﬀerently from triplets t ∈ T D . Foreach forbidden t = { ab | c } ∈ T F we add the following 3 + or − undirected weighted edges:+2 for the pair ( a, b ) and − c, a ) , ( c, b )and for a t = { ab | c } ∈ T D we add the following + or − edges: − a, b ) and + 1 for pairs ( c, a ) , ( c, b )Let G be the undirected weighed multigraph constructed from the constraints as above and let ( S, ¯ S ) denoteany graph cut into two parts. We say that a triplet t = { ab | c } ∈ T F ∪ T D is unaﬀected by the cut ( S, ¯ S if allthree labels a, b, c end up in one of the two parts. For triplets whose endpoints are separated by the cut, wedistinguish 2 cases: if precisely a, b are contained in some part, while the other part contains precisely c , wesay t is obeyed . In any other case, t is disobeyed (e.g., a, c ∈ S and b ∈ ¯ S or the symmetric split b, c ∈ S and a ∈ ¯ S ). The perhaps more natural terms satisf ied and violated were not used as we deal both with desiredand forbidden quartets and would be misleading when accounting for the maximization objective: Lemma 16.

The weight of any cut ( S, ¯ S ) can be computed based on the status of the triplets as: w ( S, ¯ S ) = m T F d ( S, ¯ S ) − m T F o ( S, ¯ S ) + 2 m T D o ( S, ¯ S ) − m T D d ( S, ¯ S ) (8) where m T F d , m T D d is the number of disobeyed triplets by the cut that belong to T F , T D respectively andsimilarly m T F o , m T D o is the number of obeyed triplets from T F , T D respectively.Proof. Note that by our choice for the edge weights, if t = { ab | c } is unaﬀected by the cut ( S, ¯ S ]), itscontribution to w ( S, ¯ S ) is 0 regardless of t ∈ T F or t ∈ T D . Now, if a forbidden t ∈ T F is obeyed, that countsas a mistake and it decreases the weight of the cut by −

2, whereas if it is disobeyed, that counts as a correctchoice and it increases the weight of the cut by +1. Accordingly we compute the contribution for the desiredtriplets t ∈ T D as +2 if obeyed and − m , m : Lemma 17. If ( S, ¯ S ) is the ﬁrst split of ALG , the total number of triplets decomposed correctly is:

ALG = T A ( ALG ) + T D ( ALG ) ≥ m + m + w ( S, ¯ S ) Proof.

Let m T F u denote the number of unaﬀected by the cut forbidden triplets, and m T D u denote the numberof unaﬀected by the cut desired triplets. Our algorithm ﬁrst uses an approximation to MaxCut and thenproceeds greedily (or randomly) to achieve the baseline guarantees by building a tree on S and on ¯ S : ALG ≥ m T F d ( S, ¯ S ) + m T F u ( S, ¯ S ) + m T D o ( S, ¯ S ) + m T D u ( S, ¯ S )For notation purposes, from now on we drop the parentheses ( S, ¯ S ) from the terms since we always refer tothe ( S, ¯ S ) cut. Observe that m = m T F d + m T F o + m T F u and similarly m = m T D d + m T D o + m T D u . By substitutingthe terms for the unaﬀected triplets we get: ALG ≥ m T F d + ( m − m T F d − m T F o ) + m T D o + ( m − m T D d − m T D o )= m + m T F d − m T F o + m + m T D o − m T D d = m + m + ( m T F d − m T F o + 2 m T D o − m T D d )From equation (8), the last term is equal to the weight of the ( S, ¯ S ) cut and this ﬁnishes the proof.19ow we can use again Fact 2 to give a lower bound on the optimal cut. The cut ( S, ¯ S ) is produced in thesame manner as in the standard Goemans-Williamson algorithm via random hyperplane rounding on theirsemideﬁnite relaxation for MaxCut . We will use the fact to prove the following:

Lemma 18.

The weight of the top split relative to the sizes of the triplet constraints is: w ( S, ¯ S ) ≥ (0 . − . ε ) m + (0 . − . ε ) m Proof.

Observe that in the constructed graph, the total negative weight is W − = 2 m + 2 m as each tripletadds a total negative weight of −

2. In order to use Fact 2, we require a lower bound on the optimum value w ( S ∗ , ¯ S ∗ ).Here is the ﬁrst time where we require Assumption 1 about the balancedness of the ground truth tree.From our stochastic model, recall that the number of triplet constraints the cut ( L, R ) succeeds at is: E ( m T F d ) = (3 c (1 − c ) + 3 c (1 − c ) )(1 − ε ) m E ( m T D o ) = (3 c (1 − c ) + 3 c (1 − c ) )(1 − ε ) m and the number of constraints the cut ( L, R ) fails at, due to the erroneous constraints is: E ( m T F o ) = (3 c (1 − c ) + 3 c (1 − c ) ) ε m E ( m T D d ) = (3 c (1 − c ) + 3 c (1 − c ) ) ε m The quantity c (1 − c ) + c (1 − c ) with ≤ c ≤ attains a minimum value of when c = ; hence, fromLemma 16, the expected weight of the cut ( L, R ) on the constructed graph is: w ( L, R ) ≥ m T F d ( L, R ) − m T F o ( L, R ) + 2 m T D o ( L, R ) − m T D d ( L, R ) ≥ (1 − ε ) m − ε m + (1 − ε ) m − ε m (9)Of course the optimum cut has even larger weight than the speciﬁc balanced ( L, R ) cut so: w ( S ∗ , ¯ S ∗ ) ≥ w ( L, R ). Substituting equation (9) in Fact 2 yields the lemma.

Proof of Theorem 15.

From Lemma 17, we have a lower bound on our algorithm’s performance via theapproximate max cut. Substituting the quantity w ( S, ¯ S ) based on Lemma 18, yields the theorem.From the above, notice that we beat the trivial baselines as we avoid ≈ > of the forbidden tripletsand we satisfy ≈ > of the desired triplets. A.3 Rankings from Noisy constraints

Here we will show how to beat the approximability thresholds for 3 problems:

Mas , Btw and non-Btw ,even though our techniques can be extended to handle many other ordering problems and combinations ofdesired or forbidden ordering constraints.

Non-BTW:

The goal here is to beat the threshold of -approximation and as we will see a 0 . { ab | c } indicating that c should not be between a, b in the ﬁnal ordering we add thefollowing 3 undirected edges:+1 for pairs ( c, a ) , ( c, b ) and − a, b )The graph is as always constructed by inserting all these edges for each of the triplet constraints. We describebelow the necessary changes for each of the steps of the template.20 Contrary to previous ordering problems, here a cut into two pieces can either satisfy, postpone or leaveunaﬀected the status of a triplet { ab | c } . The weight of the cut is: w ( S, ¯ S ) = 2 m s ( S, ¯ S ) − m p ( S, ¯ S )as a satisﬁed triplet contributes +2 in the objective (( c, a ) , ( c, b ) are cut) while a postponed tripletcontributes a total of − a and b are separated). • Our algorithm

ALG , starting with the ( S, ¯ S ) cut and continuing randomly after that, scores a totalobjective (we drop the ( S, ¯ S ) notation): ALG = m s + m u + m p since even for postponed constraints there is still a probability of correctly placing c either ﬁrst orlast among the three labels. Substituting m = m s + m p + m u which is true for any cut: ALG == m s + ( m − m s − m p ) + m p == m + m s − m p == m + (2 m s − m d ) = m + w ( S, ¯ S ) (10) • The graph’s total negative weight is W − = 2 m so the Goemans-Williamson guarantee is: E ( w ( S, ¯ S )) = 0 . w ( OPT ) − . · m (11)We lower bound the weight w ( OPT ) by the weight of the median cut: consider the median element q in the unknown optimum permutation and then let one part of the split be the elements that precede q . Generally, in permutation problems, ensuring that a balanced cut with large cut value exists,is easier than problems on trees, as the median cut guarantees a 50-50 split. Since the labels forthe constraints were chosen at random, a simple counting argument implies that in expectation m (i.e., 3 c (1 − c ) m + 3(1 − c ) cm with c = ) constraints are satisﬁed by the OPT cut, so w ( OPT ) ≥ · (1 − ε ) m − εm and we get (0 . − . ε )-approximation by substituting in equation (11) andthen to (10). For example, even when ≈

10% are erroneous, we still get a 0 . BTW:

The goal here is to beat the -approximation which is the current best for inconsistent instances of Btw . We will get a 0 . -approximation. It is a divide and conquer algorithm that is simple and runsin linear time. A signiﬁcantly slower algorithm based on semideﬁnite program with the same approximationguarantee was previously proposed by Chor and Sudan [CS98].For a triplet { a | b | c } indicating that b should be between a and c in the ordering we construct a graphwith undirected edges: +2 for the pair ( a, c ) and − b, a ) , ( b, c )The edges try to capture that a cut violates the constraint if it separates b from a, c . We give our main steps: • Contrary to non-Btw , a cut into two pieces here can either violate, postpone or leave unaﬀected thestatus of the triplet { a | b | c } . The weight of a cut is: w ( S, ¯ S ) = m p ( S, ¯ S ) − m v ( S, ¯ S )as violated triplets contribute − Crucially, a postponed by the cut triplet, can still be satisﬁed with probability and this gives us theadvantage: ALG = m u + m p == ( m − m p − m v ) + m p == m + ( m p − m v ) = m + w ( S, ¯ S ) (12) • Again the graph’s total negative weight is W − = 2 m so the Goemans-Williamson guarantee is: E ( w ( S, ¯ S )) = 0 . w ( OPT ) − . · m (13)As before, we lower bound the weight w ( OPT ) by the weight of the median cut. Since the labels forthe constraints were chosen at random, a simple counting argument implies that in expectation m (i.e., 3 c (1 − c ) m + 3(1 − c ) cm with c = ) constraints are postponed by the OPT cut, so w ( OPT ) ≥ (1 − ε ) m − · εm and we get a (0 . − . ε )-approximation by substituting in equation (13) andthen to (12). For an error rate of ≈

10% we still get ≥ . . MAS:

The goal here is to beat the trivial -approximation achieved by an arbitrary or its reversed (or arandom) ordering. We will indeed be able to achieve a 0.642-approximation: Theorem 19.

Given m constraints generated according to our stochastic model on n items, MaxCut satisﬁes at least (0 . − . ε ) m on average, where ε is the fraction of erroneous comparisons. If moreover m ≥ Ω(log n ) , the result holds w.h.p. The constraints here are on pairs of labels, e.g., a < b . Contrary to

Btw and non-Btw where theconstructed graph and cuts were undirected,

Mas is orientated in the sense that it matters which side of thecut the labels end up at. This introduces the ﬁrst challenge since we have to solve approximate MaxCut indirected graphs with negative weights. For a query a < b indicating that a should precede b in the ranking,we add two directed edges: +1 directed from a → b and another arc withnegative weight − b → a Here the weight of a directed cut ( S, ¯ S ) is the sum of all (positively or negatively) weighted arcs goingfrom S to ¯ S (and we ignore the arcs going from ¯ S to S ). Here a cut can either satisfy, violate or leaveunaﬀected the status of a query and there are no postponed constraints as they only involve two labels. Wedescribe our steps: • It is easy to see that the weight of any directed ( S, ¯ S ) cut is: w ( S, ¯ S ) = m s ( S, ¯ S ) − m v ( S, ¯ S )as satisﬁed pairs contribute +1 and violated pairs contribute − • Again we can compute the value of

ALG (dropping the notation with ( S, ¯ S )): ALG = m s + m u == m s + ( m − m s − m v ) = m + ( w ( S, ¯ S )) (14) • Again the graph’s total negative weight is W − = m . However now that the graph is directed and withnegative weights, we cannot use the Goemans-Williamson guarantee. A new ingredient in our proof isan SDP relaxation and rounding scheme that achieves: E ( w ( S, ¯ S )) = 0 . w ( OPT ) − . · W − (15)22 Continuing as before, we will lower bound the weight w ( OPT ) by the weight of the median directed cut(as noted in the main body, this cut simply separates the ﬁrst half of the items in the optimal orderingfrom the last half). Since the labels for the constraints were chosen at random, a simple countingargument implies that in expectation m (i.e., 2 c (1 − c ) m with c = ) constraints are satisﬁed by the OPT cut, so w ( OPT ) ≥ m (1 − ε ) − εm due to errors in ε fraction of the constraints. Proof of Theorem 19.

Given the above observations, in order to get a (0 . − . ε )-approximation, weﬁrst substitute in equation (15) the lower bound we got for w( OPT ), and then we substitute w ( S, ¯ S ) to(14).For example, if 10% of the constraints are erroneous we still satisfy ≈

60% of all constraints, beating theworst-case inapproximability results of [GHM + A.3.1 Directed MaxCut with negative weights

Here we proceed by proving an important ingredient in our proof relating to ﬁnding directed cuts in graphswith negative weights.In the seminal paper by [GW95], they show how directed

MaxCut can be solved approximately ondirected graphs with non-negative weights. They used the following semideﬁnite programming relaxationwhere A denotes the arcs of the graph and V the vertices ( | V | = n ):maximize (cid:88) ( i,j ) ∈ A w ij (1 + v v i − v v j − v i v j )subject to: || v i || = 1 , v i ∈ R n +1 , ∀ i ∈ V ∪ v , which is used to break the symmetry indicating that we want tomaximize edges going from left to right where left is the side in which v belongs to. Observe that in anintegral {± } solution if vertex i is on the same side with v and j is on the other side then (1 + v v i − v v j − v i v j ) = 4 that’s why we chose the coeﬃcient in front of the summation. Also note that due to thesymmetry if instead of v j we set − v j the relaxation won’t change so we can instead think of:maximize (cid:88) ( i,j ) ∈ A w ij (1 + v v i + v v j + v i v j )This will just simplify some trigonometric expressions later.In this subsection we will prove a bound on the weight of the cut for directed graphs with positive andnegative edge weights. The bound we will be able to show is: E ( w ( S, ¯ S )) = 0 . w ( OPT ) − . · W − (16)where W − denotes the total weight in absolute value of all negative edges. Notice that if no negative weightsare present ( W − = 0 then we almost recover the Goemans-Williamson 0 .

878 coeﬃcient. The above boundfollows from the following theorem by rearranging terms:

Theorem 20.

Let W − = (cid:80) ( i,j ) ∈ A | w − ij | where x − = min(0 , x ) . Then we can eﬃciently ﬁnd a cut ( L, R ) such that: E ( w ( L, R )) + W − ≥ . (cid:0) w ( OPT ) + W − (cid:1) where OPT denotes the optimum directed cut in the graph.Proof.

Let

SDP denote the optimal

SDP value which is larger than w ( OPT ) since we relaxed the problem.We will show the above bound where w ( OPT ) is replaced by

SDP . We need to rewrite the

SDP relaxation toincorporate the W − term and then we need to compute the probabilities an edge ( i, j ) ∈ A participatesor does not participate in the cut and how it compares to the contribution in the SDP relaxation. Theprobability an edge does not participate in the cut is needed here because negatively weighted edges exist,23hich could potentially decrease the value of the cut. Separating the positive and negative weights ( A = A + ∪ A − ) and rewriting the SDP ( θ ij denotes the angle between v i , v j ): (cid:88) ( i,j ) ∈ A w ij (1 + v v i + v v j + v i v j ) + W − == (cid:88) ( i,j ) ∈ A + w ij (1 + v v i + v v j + v i v j )++ (cid:88) ( i,j ) ∈ A − | w ij | (4 − (1 + v v i + v v j + v i v j )) == (cid:88) ( i,j ) ∈ A + w ij (1 + cos θ i + cos θ j + cos θ ij )++ (cid:88) ( i,j ) ∈ A − | w ij | (3 − cos θ i − cos θ j − cos θ ij )For the rounding algorithm we can use the standard Goemans Williamson rounding although this willonly guarantee a sub-optimal coeﬃcient of 0 .

796 instead of 0 .

857 in Equation (15). We will show later howa non-standard but better rounding scheme by [FG95] gives us the desired 0 .

857 factor.Let r be a vector drawn uniformly from the unit sphere. Let’s evaluate the contribution of a positive arc( i, j ) ∈ A + to the quantity E ( w ( L, R )) + W − : w ij (4 · Pr [ sgn ( v i r ) = sgn ( v j r ) = sgn ( v r )]) == w ij Pr [ sgn ( v i r ) = sgn ( v j r ) = sgn ( v r )]For a negative arc ( i, j ) ∈ A − , the contribution to the quantity E ( w ( L, R )) + W − is: − | w ij | (4 Pr [ sgn ( v i r ) = sgn ( v j r ) = sgn ( v r )]) + | w ij | == | w ij | (1 − Pr [ sgn ( v i r ) = sgn ( v j r ) = sgn ( v r )])Finally, if we can manage to lower bound Pr [ sgn ( v i r ) = sgn ( v j r ) = sgn ( v r )] by (1 + cos θ i + cos θ j +cos θ ij ) and simultaneously lower bound (1 − Pr [ sgn ( v i r ) = sgn ( v j r ) = sgn ( v r )]) by (3 − cos θ i − cos θ j − cos θ ij ) we will have ﬁnished as the ﬁnal result will follow by linearity of expectations. This can indeed bedone using some trigonometric facts and the symmetry of spherical geometry: Fact 3.

Let r be chosen uniformly at random from the unit sphere. Then for any three vectors v i , v j , v inthe unit sphere: Pr [ sgn ( v i r ) = sgn ( v j r ) = sgn ( v r )] == 1 − π ( θ ij + θ j + θ i ) ≥≥ . · (1 + cos θ i + cos θ j + cos θ ij ) and also: − Pr [ sgn ( v i r ) = sgn ( v j r ) = sgn ( v r )] == π ( θ ij + θ j + θ i ) ≥≥ . · (3 − cos θ i − cos θ j − cos θ ij )Putting it all together and using linearity of expectations we have shown: E ( w ( L, R )) + W − ≥ . (cid:0) SDP + W − (cid:1) ≥≥ . (cid:0) w ( OPT ) + W − (cid:1) As we shall see next the ﬁrst inequality is the one that determines the approximation coeﬃcient. Theabove proves so far that 0.796 is possible. However there exists a more complicated rounding scheme which24oes not choose r uniformly at random. It was developed in the context of Max-2-SAT problem by Feigeand Goemans and their main idea behind their improvement is to take advantage of the special role of v .They crucially use v : they map each v i to another vector w i that depends both on v i and on v , and onlythen they proceed with the Goemans-Williamson rounding algorithm. Speciﬁcally, w i is coplanar with v , onthe same side of v as v i is, and forms an angle with v equal to f ( θ i ). By choosing the function f to be: f / ( θ ) = θ + ( π (1 − cos θ ))they report that they get a coeﬃcient 0 .

857 for the ﬁrst inequality above (instead of 0 . . . E ( w ( L, R )) + W − ≥ . (cid:0) w ( OPT ) + W − (cid:1) A.4 Correlation Clustering from Noisy Constraints

The last of the proofs for the positive results will be for the Correlation Clustering problem, following thesame ideas as in the proofs above. In correlation clustering, the information comes as

Must-Link ( ab ) or Cannot-Link ( a | b ) constraints indicating if two labels should be in the same or in diﬀerent parts of anoptimal partition. The current best algorithm is a 0 . . − . ε )-approximation. ALG = m s + 0 . m u = m s + 0 . m − m s − m v ) == 0 . m + 0 . m s − . m v )We construct an undirected graph where for every Cannot-Link constraint ab we add a +1 edge between a, b and for every Must-Link constraint ab we add an edge a, b now with negative weight − . w ( S, ¯ S ) = m s ( S, ¯ S ) − . m v ( S, ¯ S )Hence: ALG = 0 . m + 0 . w ( S, ¯ S )Assuming that the largest cluster in the optimum partition has size at most n , our stochastic model willgenerate at least m Cannot-Link constraints by a simple counting argument. This is in expectation, butof course using a standard large deviation Chernoﬀ bound, all our claims in this paper can be made to holdwith high probability. This also implies that the total number of

Must-Link constraints is at most m .Thus, once again using MaxCut for the ﬁrst split: E ( w ( S, ¯ S )) = 0 . w ( OPT ) − . · . · m An easy lower bound for the value of the

OPT cut is: w ( OPT ) ≥ m hence we obtain a 0.8226-approximation. B Hardness via Ordering CSPs

In this part of the Appendix, we present our hardness of approximation results for the constraint satisfactionproblems on trees, extending in some cases the inapproximability results of [GHM +

11, AMW13] from linearorderings to trees.

B.1 Hardness for Rooted Triplets Consistency

We prove that under the

Ugc , it is hard to approximate the Desired Triplets Consistency problem betterthan a factor of , even in the unweighted case. Notice that the current best approximation is achieved bya random tree (or a simple greedy algorithm). In fact our result is slightly stronger: it is hard to distinguishbetween two instances one of which is almost perfect (e.g., 99% of constraints are consistent) and the otheris far from perfect (e.g., 67% of constraints are consistent). We base our hardness result on the followingtheorem by [AMW13] about the Non-Betweeness problem and its -inapproximability:25 act 4. Let K be the total number of triplet constraints in an instance of non-Btw . For any (cid:15) > , it isNP-hard to distinguish between non-Btw instances of the following two cases: YES : val ( π ∗ ) ≥ (1 − (cid:15) ) K , i.e. the optimal permutation satisﬁes almost all constraints. NO : val ( π ∗ ) ≤ ( + (cid:15) ) K , i.e. the optimal permutation does not satisfy more than 2/3 fraction of theconstraints. Given the above fact, we prove our -inapproximability result for Triplets Consistency: Theorem 21.

Let K be the total number of the triplet constraints in an instance of Desired Triplets Con-sistency. For any δ > , it is NP-hard to distinguish between instances of the following two cases: YES : val ( T ∗ ) ≥ ( − δ ) K , i.e. the optimal tree satisﬁes almost half of all the triplet constraints. NO : val ( T ∗ ) ≤ ( + δ ) K , i.e. the optimal tree does not satisfy more than fraction of the triplet constraints. Then, our -inapproximability result follows directly from the gap of these instances: / = . Proof.

Start with a YES instance of the non-Btw problem with optimal permutation π ∗ and val ( π ∗ ) ≥ (1 − (cid:15) ) K . Viewing each non-Btw constraint as a desired triplet, we show how to construct a tree T suchthat val ( T ) ≥ ( − δ ( ε )) K . In fact, the construction is straightforward: simply assign the n labels, either inthe order they appear in π ∗ or reversed, as the leaves of a caterpillar tree (every internal node has at leastone child that is a leaf). Observe that this tree satisﬁes: val ( T ) ≥ (1 − (cid:15) ) K/ non-Btw constraint ab | c was obeyed by π ∗ , it will also be obeyed by one of the twocaterpillar trees above: if c appears ﬁrst in the permutation then the former caterpillar will obey ab | c as c gets separated ﬁrst, otherwise if c appears last, then the reversed caterpillar tree will obey ab | c . Here the factor is tight, since for example, the two non-Btw constraints ab | c and bc | a are both satisﬁed by theordering abc , but when viewed as desired triplets, they cannot both be satisﬁed by a tree.The NO instance is slightly more challenging. Start with a NO instance of the non-Btw problemwith optimal π ∗ of value val ( π ∗ ) ≤ ( + ε ) K . Viewing the non-Btw constraints as desired triplets, weshow that the optimum tree T ∗ cannot achieve better than > (1 / (cid:15) ) K , because this would imply that val ( π ∗ ) > ( + ε ) K , which is a contradiction.For this, assume that some tree T scored a value val ( T ) > (1 / (cid:15) ) K . We will construct a permutation π from the tree T with value val ( π ) > (2 / (cid:15) ) K . Observe that directly projecting the leaves of T ontoa line (just outputting the n leaves from left to right as they appear in the tree) would already satisfy > (1 / (cid:15) ) K , since every desired triplet ab | c obeyed by the tree, will also be obeyed (as a non-Btw constraint) by π as c will either be ﬁrst or last among the three labels a, b, c .Moreover, there are potentially desired triplet constraints that are disobeyed by the tree T , yet obeyedby the permutation. We know that the number of remaining constraints is K − (1 / (cid:15) ) K = (2 / − (cid:15) ) K .By randomly swapping each left and right child in the tree T before we do the projection to the permutation π , will actually lead to an excess of 1 / · (2 / − (cid:15) ) K = (1 / − (cid:15) ) K number of non-Btw constraints. Tosee this notice that for every triplet that is disobeyed in the tree, there is a probability that it becomesobeyed in the permutation. Summing up, we get val ( π ) > (1 / (cid:15) ) K + (1 / − (cid:15) ) K > (2 / (cid:15) ) K = ⇒ val ( π ∗ ) ≥ val ( π ) > (2 / (cid:15) ) K , a contradiction. B.2 Hardness for Forbidden Triplets: Random is Optimal

We prove that under the

Ugc , it is hard to approximate the Forbidden Triplets Consistency problem betterthan a factor of , even in the unweighted case. Notice that the current best approximation is in fact achieved by a random tree (or a simple greedy algorithm), hence we settle the computational complexity ofthe problem. Our result is slightly stronger: it is hard to distinguish between two instances one of whichis almost perfect (e.g., 99% of constraints are consistent) and the other is far from perfect (e.g., 67% ofconstraints are consistent). We base our hardness result on the following theorem by [GHM +

11] about the

Btw problem and its -inapproximability: Fact 5.

Let K be the total number of triplet constraints in an instance of Btw . For any (cid:15) > , it isUGC-hard to distinguish between Btw instances of the following two cases: ES : val ( π ∗ ) ≥ (1 − (cid:15) ) K , i.e. the optimal permutation satisﬁes almost all constraints. NO : val ( π ∗ ) ≤ ( + (cid:15) ) K , i.e. the optimal permutation does not satisfy more than 1/3 fraction of theconstraints. Given the above fact, we prove our -inapproximability result for Forbidden Triplets Consistency: Theorem 22.

Let K be the total number of the triplet constraints in an instance of Forbidden TripletsConsistency. For any δ > , it is UGC-hard to distinguish between instances of the following two cases: YES : val ( T ∗ ) ≥ (1 − δ ) K , i.e. the optimal tree satisﬁes almost half of all the triplet constraints. NO : val ( T ∗ ) ≤ ( + δ ) K , i.e. the optimal tree does not satisfy more than fraction of the triplet constraints. Then, our -inapproximability result follows directly from the gap of these instances: / . Proof.

Start with a YES instance of the

Btw constraint a | b | c was obeyed by π ∗ , it will also be avoided (viewed as a forbiddentriplet ac | b ) by the caterpillar tree above: if a appears ﬁrst in the permutation then the caterpillar will avoid ac | b as a gets separated ﬁrst, otherwise if c appears ﬁrst, then again the caterpillar tree will avoid ac | b as c gets separated ﬁrst.The NO instance is slightly more challenging. Start with a NO instance of the Btw problem with optimal π ∗ of value val ( π ∗ ) ≤ ( + ε ) K . Viewing the Btw constraints as forbidden triplets, we show that the optimumtree T ∗ cannot achieve better than > (2 / (cid:15) ) K , because this would imply that val ( π ∗ ) > ( + ε ) K , whichis a contradiction.For this, assume that some tree T scored a value val ( T ) > (2 / (cid:15) ) K . We will construct a permutation π from the tree T with value val ( π ) > (1 / (cid:15) ) K , a contradiction. Notice that there are forbidden tripletsthat may be avoided by the tree, yet obeyed by the permutation: for example for a forbidden triplet t = ac | b ,the tree R that ﬁrst removes a and then splits b, c will successfully avoid t , however the permutation acb cancome from R by projection, however acb do not obey the Btw constraint a | b | c .Hence directly projecting the leaves of T onto a line may not satisfy > (1 / (cid:15) ) K , since every forbiddentriplet ac | b avoided by T , can be ordered by this projected permutation in a way that would not obey thecorresponding Btw constraint a | b | c .However, just by randomly swapping each left and right child for every internal node in the tree beforewe do the projection to the permutation, would satisfy 1 / · (2 / (cid:15) ) K = (1 / (cid:15) ) K number of constraints.To see this, note that with probability a forbidden ac | b avoided by T will be mapped to the desired abc (and not acb ) or cba (and not cab ) ordering.Finally, we get val ( π ) > (1 / (cid:15) ) K = ⇒ val ( π ∗ ) ≥ val ( π ) > (1 / (cid:15) ) K , a contradiction that we weregiven a NO instance. B.3 Hardness for Desired Quartets Consistency

The main result in this section is that for the desired quartets problem, one cannot do better than -approximation. Notice that a random unrooted tree achieve -approximation which is currently the bestknown algorithm.To prove our results, we make use of a consequence from the results in [GHM +

11] for orderings CSPs ofarity 4. Speciﬁcally, we deﬁne the following problem, which we call . Deﬁnition 1.

For an ordering problem, a constraint { ab | cd } speciﬁes that both elements a, b should precede c, d or that both c, d should precede a, b in the output ordering (e.g., badc , but not acbd ).No constraints are placed on the relative ordering between a, b or on the ordering between c, d . act 6. Given constraints, no polynomial time algorithm can beat the performance of arandom permutation, which achieves a -approximation, assuming Ugc . In fact, if K is the total numberof constraints, for any (cid:15) > , it is UGC-hard to distinguish between the two cases: YES : val ( π ∗ ) ≥ (1 − (cid:15) ) K , i.e. the optimal permutation satisﬁes almost all constraints. NO : val ( π ∗ ) ≤ ( + (cid:15) ) K , i.e. the optimal permutation does not satisfy more than 1/3 fraction of theconstraints. Observe that from the 4! = 24 permutations on a, b, c, d only 8 of them obey the constraint, that’s why random achieves . Theorem 23.

Let K be the total number of the quartet constraints in an instance of Desired QuartetsConsistency. For any δ > , it is UGC-hard to distinguish between instances of the following two cases: YES : val ( T ∗ ) ≥ (1 − δ ) K , i.e. the optimal tree satisﬁes almost all the quartet constraints. NO : val ( T ∗ ) ≤ ( + δ ) K , i.e. the optimal tree does not satisfy more than a fraction of the quartetconstraints.Proof. We will make a reduction from the problem. Start from a

YES instance andconsider the optimum permutation π ∗ . Construct an unrooted caterpillar tree T with leaves the labels of π ∗ as they appear in the permutation. It is easy to see that if a constraint ab | cd was obeyedby the permutation, then the corresponding quartet constraint ab | cd was also obeyed in the caterpillar tree T . For that, we can assume w.l.o.g. that the elements appear with relative order abcd in π ∗ and observethat the paths a → b and c → d in T are disjoint, so the quartet is obeyed.The harder case is the NO instance. For that we will show how from a tree T with high value, we canconstruct a permutation π with high value. Speciﬁcally, we will show that if val ( T ) > ( + 2 (cid:15) ) K then wecan ﬁnd π with val ( π ) > ( + 2 (cid:15) ) K = ( + (cid:15) ) K , a contradiction since we started from a NO instance.The tree T is an unrooted tree on n ≥ T rooted by selecting an arbitrary internal node r and making it the root of a binary tree whose internalnodes have exactly 2 children and one parent. The only exception is the root r that has 3 children and noparent. Call this tree T r . Let A, B, C denote the leftmost, middle and rightmost child of r respectively,which are themselves rooted binary trees. Assume w.l.o.g. that A contains the largest number of leavesamong A, B, C , so | A | ≥

2, where | A | denotes the number of leaves contained in the subtree rooted at A .From this rooted tree T r , we generate a permutation π by randomly swapping every left and right childon each internal node of T r and also randomly swapping A, B, C at the root r ; then we simply project theleaves onto a line to get π . We show that each quartet q q | q q obeyed by T will be obeyed in π withprobability p ≥ . We have several cases depending on the labels q , q , q , q : • If q , q ∈ A and q ∈ B and q ∈ C : Notice that the status of the quartet is decided by the randomchoices at the root r since after the ﬁnal projection, labels from A will be consecutive in π and similarlyfor B and C . Here, π will actually obey the quartet with probability , as there are 3 equally likelyoutcomes ABC , BCA and

CAB and the ﬁrst two

ABC and

BCA obey the quartet, irrespectively ofhow labels from A , B , C are ordered. • If q , q ∈ A and q , q ∈ B : This is the easiest case as every quartet of this form will be obeyed in π with probability 1. This follows as labels from A will be consecutive in π and similarly for B . • If q , q , q ∈ A and q ∈ B : The status of this quartet only depends on how the elements q , q , q are placed. Speciﬁcally, depending on the random choices at the root r , q can appear either ﬁrst (if BA was chosen) or last (if AB was chosen) among the 4 elements in π . If the former is true, then q should appear second and we get q q | · · otherwise q should appear third and we get · · | q q . We needto compute the probability for each of these events. Notice that the lowest common ancestor both for q , q and for q , q is A . Hence, the status of the quartet is determined at A and with probability , q is correctly placed on the same side as B (and q ). • If q , q , q , q ∈ A : This case essentially reduces to the analyses of the previous two cases. Just ﬁndthe lowest common ancestor A of all 4 labels q , q , q , q in T r . If two of the labels belong to one childand the remaining to the other child, then the quartet will be obeyed with probability 1, irrespectivelyof the random choices at A (similar to the second case above). Moreover, if one child contains threeof the 4 elements, then the analysis is the same as the previous case yielding a probability of .28he other cases are symmetric for B, C . This proves that if a quartet is obeyed by the tree then withprobability will be obeyed in π which means that val ( π ) > ( + 2 (cid:15) ) K = ( + (cid:15) ) K by linearity ofexpectation. This contradicts the fact that we were given a NO instance. B.4 Hardness for Forbidden Quartets Consistency

The proof proceeds in the same way as the previous paragraph, where we now account for the forbid-den quartets and we make use of the complement problem to , which we call : Deﬁnition 2.

For an ordering problem, a constraint { ab | cd } speciﬁes that either a or b should be between c, d or that either c or d should be between a, b in the output ordering (e.g., adcb ,but not abcd ). No constraints are placed on the relative ordering between a, b or on the ordering between c, d . Fact 7.

Given constraints, no polynomial time algorithm can beat the performanceof a random permutation, which achieves a -approximation, assuming Ugc . In fact, if K is the totalnumber of constraints, for any (cid:15) > , it is UGC-hard to distinguish between the two cases: YES : val ( π ∗ ) ≥ (1 − (cid:15) ) K , i.e. the optimal permutation satisﬁes almost all constraints. NO : val ( π ∗ ) ≤ ( + (cid:15) ) K , i.e. the optimal permutation does not satisfy more than 2/3 fraction of theconstraints. Observe that from the 4! = 24 permutations on a, b, c, d , 16 of them obey the constraint, that’s why random achieves . Theorem 24.

Let K be the total number of the quartet constraints in an instance of Forbidden QuartetsConsistency. For any δ > , it is UGC-hard to distinguish between instances of the following two cases: YES : val ( T ∗ ) ≥ (1 − δ ) K , i.e. the optimal tree satisﬁes almost all the quartet constraints. NO : val ( T ∗ ) ≤ ( + δ ) K , i.e. the optimal tree does not satisfy more than a fraction of the quartetconstraints.Proof. We will make a reduction from the problem. Start from a