An Efficient Adversarial Attack for Tree Ensembles
Chong Zhang Huan Zhang Cho-Jui Hsieh
Department of Computer Science, UCLA
[email protected], [email protected], [email protected]

Abstract
We study the problem of efficient adversarial attacks on tree-based ensembles such as gradient boosted decision trees (GBDTs) and random forests (RFs). Since these models are non-continuous step functions and the gradient does not exist, most existing efficient adversarial attacks are not applicable. Although decision-based black-box attacks can be applied, they cannot utilize the special structure of trees. In our work, we transform the attack problem into a discrete search problem specially designed for tree ensembles, where the goal is to find a valid "leaf tuple" that leads to mis-classification while having the shortest distance to the original input. With this formulation, we show that a simple yet effective greedy algorithm can be applied to iteratively optimize the adversarial example by moving the leaf tuple to its neighborhood within hamming distance 1. Experimental results on several large GBDT and RF models with up to hundreds of trees demonstrate that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach, while also providing smaller (better) adversarial examples than decision-based black-box attacks on general ℓ_p (p = 1, 2, ∞) norm perturbations. Our code is available at https://github.com/chong-z/tree-ensemble-attack.

1 Introduction

It has been widely studied that machine learning models are vulnerable to adversarial examples (Szegedy et al., 2013; Goodfellow et al., 2015; Athalye et al., 2018), where a small imperceptible perturbation on the input can easily alter the prediction of a model. A series of adversarial attack methods have been proposed for continuous models such as neural networks, which can generally be split into two types.
The gradient-based methods formulate the attack as an optimization problem on a specially designed loss function, where the gradient can be acquired either through back-propagation in the white-box setting (Carlini, Wagner, 2017; Madry et al., 2018) or through numerical estimation in the soft-label black-box setting (Chen et al., 2017; Tu et al., 2018; Ilyas et al., 2018). The decision-based (or hard-label black-box) methods only have access to the output label; they usually start with an initial adversarial example and minimize the perturbation along the decision boundary (Brendel et al., 2018; Brunner et al., 2018; Cheng et al., 2019, 2020; Chen et al., 2019c).

In this paper we study the problem of efficient adversarial attacks on tree-based ensembles such as gradient boosted decision trees (GBDTs) and random forests (RFs), which have been widely used in practice (Chen, Guestrin, 2016; Ke et al., 2017; Zhang et al., 2017; Prokhorenkova et al., 2018). We minimize the perturbation to find the smallest possible attack, in order to uncover the true weakness of a model. Different from neural networks, tree-based ensembles are non-continuous step functions, so existing gradient-based methods are not applicable. Decision-based methods can be applied, but they usually require a large number of queries and may easily fall into a local optimum due to the rugged decision boundary. In general, finding the exact minimal adversarial perturbation for tree ensembles is NP-complete (Kantchelian et al., 2015), and a feasible approximate solution is necessary to evaluate the robustness of large ensembles.

The major difficulty of attacking tree ensembles is that the prediction remains unchanged within regions of the input space, where a region can be large and makes continuous updates inefficient. To overcome this difficulty, we transform the continuous R^d input space into a discrete {1, 2, . . . , N}^K "leaf tuple" space, where N is the number of leaves per tree and K is the number of trees. On the leaf tuple space we define the distance between two input examples as the number of trees that have different prediction leaves (i.e., hamming distance), and define the neighborhood of a tuple as all valid tuples within a small hamming distance. In practice, we propose an attack that iteratively optimizes the adversarial leaf tuple by moving it to the best adversarial tuple within the neighborhood of distance 1. Intuitively, we can reach a far-away adversarial tuple through a series of smaller updates, based on the fact that each tree makes its prediction independently.

In experiments, we compare ℓ1, ℓ2, and ℓ∞ norm perturbation metrics across 10 datasets, and show that our method is thousands of times faster than MILP (Kantchelian et al., 2015) on most of the large ensembles, and 3∼72x faster than decision-based and empirical attacks on all datasets while achieving a smaller distortion. For instance, with the standard (natural) GBDT on the MNIST dataset with 10 classes and 200 trees per class, our method finds an adversarial example with only 2.07 times larger ℓ∞ perturbation than the optimal solution produced by MILP and uses only 0.237 seconds per test example, whereas MILP requires 375 seconds. As for other approximate attacks, SignOPT (Cheng et al., 2020) finds a 13.93 times larger ℓ∞ perturbation (compared to MILP) using 3.7 seconds, HSJA (Chen et al., 2019c) achieves a 8.36 times larger ℓ∞ perturbation using 1.8 seconds, and Cube (Andriushchenko, Hein, 2019) achieves a 4 times larger ℓ∞ perturbation using 4.42 seconds. Additionally, although the ℓ_p distance is widely used in previous attacks and a small ℓ_p perturbation is usually invisible, our method is general and can also be adapted to other distance metrics.

Problem Setting
While the main idea can be applied to multi-class classification models and targeted attacks, for simplicity we consider a binary classification model f : R^d → {−1, +1} consisting of K decision trees. Each tree t is a weak learner f_t : R^d → R with N leaves, and the ensemble returns the sign f(x) = sign(Σ_{t=1}^K f_t(x)). Given a victim input example x with y = f(x), we want to find the minimal adversarial perturbation r*_p, which determines the adversarial robustness under the ℓ_p norm:

    r*_p = min_δ ‖δ‖_p   s.t.  f(x + δ) ≠ y.    (1)

Exact Solutions
In general, computing the exact (optimal) solution for Eq. (1) requires exponential time: Kantchelian et al. (2015) showed that the problem is NP-complete for general ensembles and proposed a MILP based method. On the other hand, faster algorithms exist for models of special form: Zhang et al. (2020) restricted both the input and the prediction of every tree t to binary values, f_t : {−1, 1}^d → {−1, 1}, and provided an integer linear programming (ILP) based formulation about 4 times faster than Kantchelian et al. (2015); Andriushchenko, Hein (2019) showed that the exact robustness of boosted decision stumps (i.e., depth = 1) can be solved in polynomial time; Chen et al. (2019b) proposed a polynomial time algorithm for a single decision tree.

Approximate Solutions
To get a feasible solution for general models, a series of methods have been proposed to compute lower bounds (robustness verification) and upper bounds (adversarial attacks) of r*_p. Chen et al. (2019b) formulated the robustness verification problem as a max-clique problem on a multi-partite graph and produced a lower bound on ℓ∞; Wang et al. (2020) extended the verification method and produced a lower bound on general ℓ_p norms; Lee et al. (2019) verified the ℓ0 robustness of randomly smoothed ensembles, which is not directly related to our work. On the other hand, decision-based attacks (Brendel et al., 2018; Brunner et al., 2018; Cheng et al., 2019, 2020; Chen et al., 2019c) can be applied here to produce an upper bound since they make no architecture or smoothness assumptions on f(·); however, they are usually ineffective due to the discrete nature of tree models. Andriushchenko, Hein (2019) proposed the Cube attack for tree ensembles, which performs stochastic updates along the ℓ∞ boundary and typically achieves better results than decision-based attacks; Yang et al. (2019) focused on the theoretical analysis of search space decomposition, and proposed RBA-Appr to search over a subset of the N^K convex polyhedra containing training examples; Zhang et al. (2020) restricted both the input and the prediction of all trees to binary values and provided a heuristic attack by assigning empirical weights to each feature, in both the white-box setting and a special "black-box" setting via training substitute models. In contrast, our method works on ensembles of general trees f_t : R^d → R, and utilizes the special properties of tree models to efficiently produce a tighter upper bound on general ℓ_p (p = 1, 2, ∞) norms. Table 1 highlights our key differences to prior adversarial attacks that are applicable to general tree ensembles.

Table 1: Key differences to prior adversarial attacks that are applicable to general tree ensembles.

                          | SignOPT     | HSJA        | Cube        | RBA-Appr         | Ours
Access Level              | black-box   | black-box   | white-box   | white-box + data | white-box
Search Space              | input space | input space | input space | training data    | leaf tuple space
Step Size                 | small steps in continuous space | small steps in continuous space | ℓ∞ boundary | N/A | one leaf node
Model Queries / iteration | 200         | 100∼632     | 100         | N/A              | ∼1

Robust Training
To overcome the vulnerability, Kantchelian et al. (2015) proposed adversarial boosting, which appends adversarial examples to the training dataset; Chen et al. (2019a) optimized the worst-case perturbation through a max-min saddle point problem and effectively increased the minimal adversarial perturbation; Andriushchenko, Hein (2019) upper bounded the robust test error by the sum of the max loss of each tree, and proposed a training scheme that minimizes this upper bound; recently, Chen et al. (2019d) approximated the saddle point objective with a greedy heuristic algorithm and further increased the robustness. We consider both the standard (natural) models and the robustly trained models (Chen et al., 2019a) to demonstrate our performance in different settings.
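To make the problem setting concrete, the following toy sketch (our illustration, not the authors' code; the stump thresholds and leaf values are invented) shows why Eq. (1) cannot be attacked with gradients: the ensemble output sign(Σ_t f_t(x)) is piecewise constant, so small moves change nothing until a split threshold is crossed.

```python
# Toy sketch: a 2-stump "ensemble" on R^2 illustrating the step-function nature
# of f(x) = sign(sum_t f_t(x)). Thresholds/leaf values are made up, not from the paper.

def stump(j, eta, v_left, v_right):
    """A depth-1 tree f_t: outputs v_left if x[j] <= eta, else v_right."""
    return lambda x: v_left if x[j] <= eta else v_right

trees = [stump(0, 0.5, -1.0, 2.0),   # splits on feature 0 at 0.5
         stump(1, 0.3,  2.0, -1.0)]  # splits on feature 1 at 0.3

def f(x):
    """Ensemble prediction: the sign of the summed leaf values."""
    return 1 if sum(t(x) for t in trees) > 0 else -1

x = [0.4, 0.2]                      # victim example with label y = f(x) = +1
assert f([0.4 + 1e-6, 0.2]) == 1    # infinitesimal moves change nothing: zero gradient
assert f([0.4, 0.31]) == -1         # crossing the eta = 0.3 split flips a whole tree at
                                    # once, so the minimal l_inf perturbation is ~0.1
```

Every region bounded by split thresholds maps to one prediction, which is exactly the structure the leaf-tuple formulation below exploits.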
We propose an iterative approach in which the algorithm starts with an initial adversarial example x′ s.t. f(x′) ≠ y and greedily moves closer to x. At each iteration we choose a new adversarial example x′_new within a small neighborhood around x′ that has the minimum ℓ_p distance to x. Formally, we define the update rule below, and the algorithm stops when x′_new does not give a smaller perturbation than x′. The key problem is to define the neighborhood so that Eq. (2) can be efficiently solved, and we provide an efficient formulation in the following sections. We defer all proofs to Appendix C.

    x′_new = argmin_{x̂} ‖x̂ − x‖_p   s.t.  x̂ ∈ Neighbor(x′), f(x̂) ≠ y.    (2)

In most existing attacks, Eq. (2) is solved by a continuous optimization algorithm where the
Neighbor(x′) (the region where we find an improved solution) is a small ℓ_p ball around the current solution x′. The major difficulty of attacking tree ensembles is that the model prediction remains unchanged within a region (which may not be small) containing x′, so traditional continuous distance measurements are not suitable here. And if we define the neighborhood as a large ℓ_p ball, then due to the non-continuity of trees, f(x) becomes intractable to enumerate. To handle these difficulties we introduce the concept of a leaf tuple and rewrite Eq. (2) into a discrete form.

Given an input example x = [x_1, . . . , x_d], the traverse starts from the root node of each tree. Each internal node of index i has two children and a feature split threshold (j^i, η^i); x is passed to the left child if x_{j^i} ≤ η^i and to the right child otherwise. A leaf node has a prediction label v^i, which it outputs when x reaches it. We use C(x) = (i^(1)(x), . . . , i^(K)(x)) to denote the index tuple of the K prediction leaves for input x. In general we use the subscript ·_j to denote the j-th dimension, and the superscripts ·^i or ·^(t) to denote the i-th node and the t-th tree respectively.

Definition 1. (Bounding Box) B^i = (l^i_1, r^i_1] × · · · × (l^i_d, r^i_d] denotes the bounding box of node i, which is the region such that x falls into this node when following the feature split thresholds along the traverse path; each l and r is either ±∞ or equal to one of the thresholds η^k along the traverse path. We use B(C(x)) = ∩_{i∈C(x)} B^i = ∩_{i∈C(x)} (l^i_1, r^i_1] × · · · × ∩_{i∈C(x)} (l^i_d, r^i_d] to denote the Cartesian product of the per-dimension intersections of the K bounding boxes of the ensemble.

Definition 2. (Valid Tuple) C = {C(x) | ∀x ∈ R^d} denotes the set of all possible tuples that correspond to at least one point in the input space, and C = (i^(1), . . . , i^(K)) is a valid tuple iff. C ∈ C.

Figure 1: An ensemble of three trees on a two-dimensional input space and its corresponding decision boundaries. Numbers inside circles are indexes of leaves, and the number below each leaf is the corresponding prediction label v^i. For clarity we mark the boundaries belonging to trees 1, 2, 3 with red, green, and blue respectively, and fill the +1 area with gray. x is the victim example and x′ is an initial adversarial example. Assume we are optimizing the ℓ1 perturbation: our method can reach d by changing one leaf of the tuple at a time (black arrows). On the other hand, decision-based attacks update the solution along the decision boundary, and easily fall into local minimums such as x′ and a (brown arrows) since they look only at the continuous neighborhood. To move from a to b, the path along the decision boundary must first pass through an intermediate tuple that increases the distortion, so they won't find this path.

Theorem 1. (Chen et al., 2019b)
The intersection B(C) can also be written as the Cartesian product of d intervals (similar to B^i), and C ∈ C ⟺ B(C) ≠ ∅ ⟺ ∀i, j ∈ C, B^i ∩ B^j ≠ ∅. (These concepts were used in Chen et al. 2019b for verification instead of attack.)

Corollary 1. (Tuple to Example Distance) The shortest distance between a valid leaf tuple and an example, defined as dist_p(C, x) = min_{x̂∈B(C)} ‖x̂ − x‖_p, can be solved in O(d) time.

Observe that C(x̂) = C(x′) for all x̂ ∈ B(C(x′)), so we can transform the intractable number of points x̂ into a tractable number of leaf tuples C. We abuse the notation f(C) to denote the model prediction sign(Σ_{i∈C} v^i), which is constant within B(C). Combined with Corollary 1, we can rewrite Eq. (2) into the discrete form below, where Neighbor(C′) denotes the neighborhood space around C′, i.e., a set of leaf tuples close to C′ under a certain distance measurement. Fig. 1 presents an example demonstrating that it is less likely to fall into a local optimum in our newly defined neighborhood space.

    C′_new = argmin_C dist_p(C, x)   s.t.  C ∈ Neighbor(C′) ∩ C, f(C) ≠ y.    (3)

Now we discuss how to define
Neighbor( C (cid:48) ) to facilitate our attack. Intuitively the space should beefficient to compute, and has a reasonable coverage to avoid falling into local minimums too easily. Inthis section we discuss two naive approaches that fail on these two properties, and provide empiricalresults in Table 2. Enumerating all leaves is not efficient ( NaiveLeaf ): Given current adversarial example x (cid:48) andits corresponding leaf tuple C (cid:48) = ( i (1) , . . . , i ( K ) ) , an intuitive approach is to change a single i ( t ) toa different leaf i ( t ) new . However the resulting tuple may not be valid, and we will have to query theensemble to get a valid tuple in C . We provide a possible implementation in Appendix D.2 where wemove x (cid:48) to the closest point within B i ( t ) new . NaiveLeaf requires multiple full model queries and takes O ( K · l · Kl ) time per iteration for K trees of depth l , which is too time consuming (see Table 2). Mutating one feature at a time has poor coverage ( NaiveFeature ): Given current adversarialexample x (cid:48) and its corresponding leaf tuple C (cid:48) , another intuitive approach is to move x (cid:48) outside of B ( C (cid:48) ) on each feature dimension. This approach is efficient since there are at most d neighborhood,and each neighborhood is only different by one tree (assuming unique split thresholds). However the4able 2: Average (cid:96) perturbation over 500 test examples on the standard (natural) GBDT models.("*"): For a fair comparison we disabled the random noise optimization discussed in §3.5. OurLT-Attack searches in a subspace of NaiveLeaf so ¯ r our is slightly larger, but it is significantly faster. Standard GBDT NaiveLeaf NaiveFeature LT-Attack (Ours)* Ours vs. 
NaiveLeaf (cid:96) Perturbation ¯ r leaf time ¯ r time ¯ r our time ¯ r our / ¯ r leaf SpeedupMNIST .081 2.37s .229 .069s .108 .105s 1.33 22.6XF-MNIST .080 3.93s .181 .061s .096 .224s 1.20 17.5XHIGGS .008 3.17s .011 .023s .009 .031s 1.13 102.3X method easily falls into local minimums due to the fact that each leaf is bounded by up to l featuresjointly, thus it’s unlikely to reach certain leaves by only changing one feature at a time. Taking Fig. 1as an example and assume we are at x (cid:48) , notice that B ( C ( x (cid:48) )) = [0 , × [0 , and the algorithm stopshere since both neighborhood { (3 + (cid:15), , (3 , (cid:15) ) } are not adversarial examples. In this section we introduce a neighborhood space through discrete hamming distance, and show thatit’s fast to compute and has good coverage. We define the distance D ( C , C (cid:48) ) between two tuples asthe number of different leaves, and the neighborhood of C (cid:48) with distance h by Neighbor h ( C (cid:48) ) = {C | ∀C ∈ C , D ( C , C (cid:48) ) = h } . (4)The intuition is that each tree can be queried independently, and we want to utilize such property bylimiting the number of affected trees at each iteration. Neighbor h ( · ) has a nice property where wecan increase h for larger search scope, or decrease h to improve speed. Observe that Neighbor ( C (cid:48) ) is a subset of NaiveLeaf (minus invalid leaf tuples that requires an expensive model query), and asuperset of NaiveFeature (plus leaf tuples that may affect multiple features). In experiments we areable to achieve good results with Neighbor ( · ) , and we provide an empirical greedy algorithm inAppendix D.3 to estimate the minimal h required to reach the exact solution. ( · ) We propose Leaf Tuple attack (LT-Attack) in Algorithm 1 that efficiently solves Eq. (3) through twoadditional concepts T Bound ( · ) and Neighbor ( t )1 ( · ) as defined below. Let C (cid:48) be any valid adversarialtuple, and assume unique feature split thresholds. 
By definition tuples in Neighbor ( C (cid:48) ) are onlydifferent from C (cid:48) by one leaf, and we use Neighbor ( t )1 ( C (cid:48) ) to denote the neighborhood that hasdifferent prediction leaf on tree t . Formally Neighbor ( t )1 ( C (cid:48) ) = {C | C ( t ) (cid:54) = C (cid:48) ( t ) , C ∈
Neighbor ( C (cid:48) ) } . (5) Definition 3. ( Bound Trees ) Let x (cid:48) ∈ B ( C (cid:48) ) be the example that minimizes dist p ( C (cid:48) , x ) , we denotethe indexes of trees that bounds x (cid:48) by T Bound ( C (cid:48) ) = { t | OnEdge ( x (cid:48) , B C (cid:48) ( t ) ) , ∀ t ∈ { , . . . , K }} .Here OnEdge ( x, B ) is true iff. x equals to the left or the right bound of B on at least one dimension. Definition 4. ( Advanced Neighborhood ) We denote the set of neighborhood with smaller (advanced)perturbation than C (cid:48) by Neighbor +1 ( C (cid:48) ) = {C | C ∈ Neighbor ( C (cid:48) ) , dist p ( C , x ) < dist p ( C (cid:48) , x ) } . Theorem 2. ( Bound Neighborhood ) Let
Neighbor
Bound ( C (cid:48) ) = (cid:83) t ∈ T Bound ( C (cid:48) ) Neighbor ( t )1 ( C (cid:48) ) , then Neighbor +1 ( C (cid:48) ) ⊆ Neighbor
Bound ( C (cid:48) ) .Theorem 2 suggests that we can solve Eq. (3) by searching over Neighbor
Bound ( C (cid:48) ) since it is asuperset of the advanced neighborhood which leads to smaller perturbation. In general the algorithmconsists of an outer loop and an inner Neighbor
Bound ( · ) function. The outer loop iterates until nobetter adversarial example can be found, while the inner function generates bound neighborhood withdistance 1. The inner function computes T Bound and runs the top-down traverse for each t ∈ T Bound with the intersection of other K − bounding boxes, denoted by B ( − t ) . According to Theorem 1, aleaf node of t is guaranteed to form a valid tuple if it has non-empty intersection with B ( − t ) .To efficiently obtain B ( − t ) we cache K bounding boxes in B (cid:48) , and for each feature dimension wemaintain the sorted list of left and right bounds from K boxes respectively. Note that B i of leaf node i lgorithm 1: Our proposed LT-Attack for constructing adversarial examples.
Data: white-box model f, victim example (x, y), initial adversarial example x′.
begin
    r′, C′ ← dist_p(C(x′), x), C(x′)
    B′ ← BuildSortedBoxes(C′, f)            ▷ O(Kl log K). B′ maintains K sorted bounding boxes on d feature dimensions.
    has_better_neighbor ← True
    while has_better_neighbor do
        N ← Neighbor_Bound(C′, B′, f)       ▷ See complexity in §3.4.1.
        N′ ← {C | C ∈ N, f(C) ≠ y}          ▷ O(|N|). f(C) can be calculated from f(C′) in O(1) using the diff leaf.
        r*, C* ← argmin_{r,C} {dist_p(C, x) | C ∈ N′}   ▷ O(l|N′|). dist_p(C, x) can be calculated from r′ in O(l) using the diff leaf.
        has_better_neighbor ← r* < r′
        if has_better_neighbor then
            r′, C′ ← r*, C*
            B′ ← B′.ReplaceBox(C*)          ▷ O(l log K). Remove and add one box.
        end
    end
    return r′, C′
end

Function Neighbor_Bound(C′, B′, f):
    N ← ∅
    T ← T_Bound(C′, B′)                     ▷ O(d log K). Needs O(log K) to get the first (tightest) tree on each dimension; assumes C′ caches the closest x′ to x. We give a closer complexity analysis for |T_Bound(·)| in the following section.
    for t ∈ T do
        B^(−t) ← B′.RemoveBox(t)            ▷ O(l log K). Remove the bounding box of tree t from the sorted lists (lazily); the box has at most l non-infinite dimensions.
        I ← {i | B^i ∩ B^(−t) ≠ ∅, i ∈ S(t) \ C′(t)}   ▷ O(2^l). Traverse tree t top-down and return leaves for Neighbor_1^(t)(C′). S(t) denotes the set of leaves of tree t.
        N ← N ∪ {C′ with C′(t) ← i | i ∈ I} ▷ O(|I|). Construct the neighborhood tuple by replacing the t-th leaf. In practice we only need to return the diff (t, i).
    end
    return N
end

comes from the feature split thresholds along the top-down traverse path; thus it has at most l non-infinite dimensions, where l is the depth of the tree. In conclusion, we can add/remove a bounding box B^i to/from B′ in O(l log K) time. We provide the time complexity of most operations in Algorithm 1 inline, and give a detailed analysis of the size of Neighbor_Bound(C′) in the next section. See Appendix D.1 for the algorithm generating random initial adversarial examples.

Our LT-Attack enumerates all leaf tuples in the bound neighborhood at each iteration, thus the complexity of each iteration largely depends on the size of
Neighbor_Bound(C′). In this section we analyze the size |Neighbor_Bound(C′)| and show that it will not be too large on real datasets.

Corollary 2. (Size of Neighbor_1^(t)(C′)) Let k^(t) = |{η ∈ B^(−t)_j, (j, η) ∈ H^(t)}| be the number of feature split thresholds inside B^(−t); then |Neighbor_1^(t)(C′)| ≤ 2^min(k^(t), l) − 1. Here B^(−t) = ∩_{i∈C′, i≠C′(t)} B^i, and H^(t) denotes the set of feature split thresholds on all internal nodes of tree t.

In practice, k^(t) ≪ l since B^(−t) is the intersection of K − 1 bounding boxes and only covers a small region of the input space R^d. |T_Bound(C′)| ≤ d and is also usually small in real datasets, which can be explained by the intuition that some features are less important and can easily reach the same value as x. Both |T_Bound(C′)| and |Neighbor_1^(t)(C′)| characterize the complexity of |Neighbor_Bound(C′)|. We provide empirical statistics in Appendix A, which suggest that computing Neighbor_Bound(C′) has a similar complexity to a single full model query; for instance, on the MNIST dataset with 784 features and 400 trees the mean |Neighbor_Bound(C′)| stays small, and we report the number of iterations before the algorithm stops in Appendix A as well.

In this section, C̄ denotes the converged tuple when the outer loop stops, and we discuss the properties of the converged solution. Trivially, C̄ attains the minimal adversarial perturbation within Neighbor_1(C̄), and we can show that the guarantee is actually stronger.

Table 3: Average ℓ∞ and ℓ1 perturbation of 500 test examples (or the entire test set when its size is less than 500) on standard (natural) GBDT models. Datasets are ordered by training data size. Bold and blue highlight the best and the second best entries respectively (not including MILP). ("*"): Average of 50 examples due to long running time. ("⋆"): HSJA has fluctuating running time.
Standard GBDT, ℓ∞ | SignOPT r̄ (time) | HSJA r̄ (time) | RBA-Appr r̄ (time) | Cube r̄ (time) | Ours r̄_our (time) | MILP r* (time) | r̄_our/r* | Speedup
breast-cancer | .258 (.308s) | .256 (.070s)  | .247 (.0008s) | .530 (.230s) | .235 (.001s) | .222 (.013s)  | 1.06 | 13X
diabetes      | .083 (.343s) | .078 (.066s)  | .113 (.0009s) | .080 (.240s) | .059 (.002s) | .056 (.084s)  | 1.05 | 42X
MNIST2-6      | .480 (2.73s) | .277 (1.23s)  | .963 (.155s)  | .143 (2.43s) | .097 (.222s) | .065 (28.7s)  | 1.49 | 129.3X
ijcnn         | .043 (.313s) | .043 (.096s)  | .074 (.020s)  | .035 (.334s) | .033 (.007s) | .031 (6.60s)  | 1.06 | 942.9X
MNIST         | .195 (3.70s) | .117 (28.7s⋆) | .983 (4.11s)  | .056 (4.42s) | .029 (.237s) | .014 (375s*)  | 2.07 | 1582.3X
F-MNIST       | .155 (4.38s) | .065 (1.81s)  | .607 (5.55s)  | .038 (5.45s) | .028 (.370s) | .013 (15min*) | 2.15 | 2473X
webspam       | .013 (1.01s) | .023 (.445s)  | .051 (.720s)  | .003 (.866s) | .001 (.051s) | .0008 (27.5s) | 1.25 | 539.2X
covtype       | .047 (.508s) | .074 (.209s)  | .086 (3.05s)  | .036 (.958s) | .032 (.038s) | .028 (10min*) | 1.14 | 15736.8X
HIGGS         | .009 (.465s) | .012 (.157s)  | .099 (55.3s*) | .005 (.862s) | .004 (.036s) | .004 (52min*) | 1.00 | 87166.7X
Standard GBDT, ℓ1 | SignOPT r̄ (time) | HSJA r̄ (time) | RBA-Appr r̄ (time) | Cube r̄ (time) | Ours r̄_our (time) | MILP r* (time) | r̄_our/r* | Speedup
breast-cancer | .310 (.811s) | .370 (.072s)  | .352 (.0008s) | .678 (.248s) | .283 (.001s) | .280 (.011s)  | 1.01 | 11X
diabetes      | .106 (.650s) | .123 (.064s)  | .158 (.001s)  | .136 (.269s) | .077 (.003s) | .073 (.055s)  | 1.05 | 18.3X
MNIST2-6      | 2.18 (7.29s) | 2.45 (1.54s)  | 3.98 (.155s)  | .801 (3.73s) | .245 (.235s) | .183 (2.52s)  | 1.34 | 10.7X
ijcnn         | .051 (.544s) | .052 (.094s)  | .112 (.020s)  | .067 (.355s) | .044 (.010s) | .043 (2.18s)  | 1.02 | 218X
MNIST         | 1.20 (8.71s) | 1.45 (23.7s⋆) | –             | –            | .072 (.243s) | .043 (32.5s)  | 1.67 | 133.7X
F-MNIST       | .870 (9.57s) | .581 (2.03s)  | 3.85 (5.48s)  | .225 (9.20s) | .073 (.400s) | .049 (49.3s)  | 1.49 | 123.3X
webspam       | .023 (3.26s) | .112 (.529s)  | .128 (.721s)  | .009 (1.04s) | .002 (.053s) | .002 (3.17s)  | 1.00 | 59.8X
covtype       | .061 (.976s) | .123 (.217s)  | .129 (3.03s)  | .070 (1.06s) | .045 (.039s) | .042 (237s)   | 1.07 | 6076.9X
HIGGS         | .015 (1.02s) | .015 (.154s)  | .196 (55.5s*) | .013 (.905s) | .008 (.037s) | .007 (13min*) | 1.14 | 20621.6X
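As a concrete illustration of the greedy leaf-tuple search described above, here is a minimal sketch under strong simplifying assumptions: the ensemble consists of depth-1 stumps on distinct features, so every leaf tuple is valid and dist_p reduces to per-dimension threshold gaps. The names (STUMPS, lt_attack_sketch, etc.) are ours, and this is not the released implementation, which additionally restricts the search to Neighbor_Bound and maintains sorted bounding boxes so that each iteration costs roughly one model query.

```python
# Minimal sketch of the greedy hamming-distance-1 leaf-tuple search, specialized to
# depth-1 stumps (so all 2^K leaf tuples are valid and each box is a single interval
# per split feature). Data and names are illustrative, not the paper's released code.
EPS = 1e-9
STUMPS = [(0, 0.5, -1.0, 2.0),  # (feature j, threshold eta, v_left, v_right)
          (1, 0.3,  2.0, -1.0)]

def predict(C):
    """Ensemble sign prediction from a leaf tuple (0 = left leaf, 1 = right leaf)."""
    s = sum(vl if c == 0 else vr for c, (_, _, vl, vr) in zip(C, STUMPS))
    return 1 if s > 0 else -1

def dist_inf(C, x):
    """Shortest l_inf distance from x to the box B(C) (Corollary 1, stump case)."""
    d = 0.0
    for c, (j, eta, _, _) in zip(C, STUMPS):
        if c == 0 and x[j] > eta:        # left leaf requires x[j] <= eta
            d = max(d, x[j] - eta)
        elif c == 1 and x[j] <= eta:     # right leaf requires x[j] > eta
            d = max(d, eta - x[j] + EPS)
    return d

def lt_attack_sketch(x, y, C0):
    """Greedily move the adversarial tuple within the hamming-distance-1 neighborhood."""
    C = C0
    while True:
        candidates = []
        for t in range(len(STUMPS)):     # Neighbor_1: flip one tree's leaf at a time
            Cn = tuple(1 - c if i == t else c for i, c in enumerate(C))
            if predict(Cn) != y:         # keep only adversarial neighbors
                candidates.append(Cn)
        best = min(candidates, key=lambda Cn: dist_inf(Cn, x), default=None)
        if best is None or dist_inf(best, x) >= dist_inf(C, x):
            return C, dist_inf(C, x)     # converged: no closer adversarial neighbor
        C = best
```

For x = [0.4, 0.2] with y = +1, the only adversarial tuple of this toy ensemble is (0, 1), and the sketch converges there with an ℓ∞ distance just above 0.1.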
Theorem 3. (Convergence Guarantee) Let V+ = {i | i ∈ C, C ∈ Neighbor_{+1}(C̄)} be the union of the leaves appearing in the advanced neighborhood Neighbor_{+1}(C̄) (Definition 4); then C̄ is the optimum adversarial tuple within the valid combinations of V+.

Note that V+ is a union of leaves, and leaves from multiple tuples of distance 1 can form a new valid tuple of larger distance to C̄ (by combining the differing leaves together). In other words, Theorem 3 states that our solution is not only a local optimum in Neighbor_1(C̄), but also better than certain tuples in Neighbor_h(C̄) with h > 1. In our illustrated example (Fig. 1), assume the algorithm converged at C̄ = C(a) under the ℓ∞ norm, where the advanced neighborhood Neighbor_{+1}(C(a)) contains two tuples at hamming distance 1. Theorem 3 claims that there is no better adversarial tuple among any valid combinations of the leaves in V+, including combined tuples at hamming distance 2 from C̄.

For the initial point, we draw 20 random initial adversarial examples from a Gaussian distribution, and optimize each with a fine-grained binary search before feeding it to the proposed algorithm. We return the best adversarial example found among them (see Appendix D.1 for details). The ensemble is likely to contain duplicate feature split thresholds even though it is defined on R^d; for example, the data may come from the image space [255]^d and be scaled to R^d. Duplicate split thresholds are problematic since we cannot move across such a threshold without affecting multiple trees; to overcome this issue we use a relaxed version of Neighbor_1(·) that allows changing multiple trees in one iteration, as long as the change is caused by the same split threshold. When searching for the best neighbor it is likely to have perturbation ties in the ℓ∞ and ℓ1 norms, in which case we use a secondary ℓ2 norm to break the tie. Eq. (3) looks for the best tuple across the whole neighborhood, which may be unnecessary at an early stage of the iterations; to improve efficiency we sort the feature dimensions by abs(x′ − x) (large first), and terminate the search early if a better tuple was found in the top 1 feature. To escape converged local minimums we change each coordinate to a nearby value drawn from a Gaussian distribution with 0.1 probability, and continue the iteration if a better adversarial example is found within 300 trials.

We evaluate the proposed algorithm on 9 public datasets (Smith et al., 1988; Lecun et al., 1998; Chang, Lin, 2011; Wang et al., 2012; Baldi et al., 2014; Xiao et al., 2017; Dua, Graff, 2017) with both the standard (natural) GBDT and RF models, and on an additional 10th dataset (Bosch, 2016) with

Table 4: Average ℓ∞ and ℓ1 perturbation of 5000 test examples (or the entire test set when its size is less than 5000) on robustly trained GBDT models. Datasets are ordered by training data size. Bold and blue highlight the best and the second best entries respectively (not including MILP). ("*" / "⋆"): Average of 1000 / 500 examples due to long running time.
Robust GBDT SignOPT HSJA RBA-Appr Cube LT-Attack (Ours) MILP Ours vs. MILP (cid:96) ∞ Perturbation ¯ r time ¯ r time ¯ r time ¯ r time ¯ r our time r ∗ time ¯ r our /r ∗ Speedupbreast-cancer .403 .371s .405 .073s .405 .002s .888 .238s .404 .002s .401 .010s* 1.01 5.6Xdiabetes .119 .364s .123 .068s .138 .001s .230 .239s .113 .003s .112 .039s* 1.01 14.4XMNIST2-6 .588 3.06s .470 1.30s .671 .137s .337 2.15s .333 .275s .313 177s* 1.06 641.6Xijcnn .032 .353s .030 .105s .032 .018s .027 .313s .025 .006s .022 4.24s* 1.14 759.6XMNIST .513 3.93s .389 1.68s .690 6.42s .296 3.95s .290 .234s .270 20min* 1.07 5067.5XF-MNIST .254 4.31s .154 1.79s .596 7.83s .101 4.45s .095 .412s .076 74min* 1.25 10778.5Xwebspam .047 1.00s .043 .414s .061 .641s .020 .756s .017 .031s .015 129s* 1.13 4129.4Xcovtype .064 .540s .080 .186s .093 3.61s .055 .720s .047 .047s .045 14min* 1.04 17164.9Xbosch .343 3.28s .337 1.42s .533 1.22s .158 2.49s .143 .213s .100 237s* 1.43 1112XHIGGS .015 .466s .016 .134s .048 72.4s* .012 .644s .01 .050s .009 73min (cid:63)
Robust GBDT SignOPT HSJA RBA-Appr Cube LT-Attack (Ours) MILP Ours vs. MILP (cid:96) Perturbation ¯ r time ¯ r time ¯ r time ¯ r time ¯ r our time r ∗ time ¯ r our /r ∗ Speedupbreast-cancer .437 .711s .449 .069s .436 .002s .940 .239s .434 .002s .431 .011s* 1.01 5.2Xdiabetes .142 .591s .150 .061s .161 .003s .274 .240s .133 .005s .132 .025s* 1.01 4.8XMNIST2-6 2.97 7.37s 3.32 1.28s 2.95 .156s .971 .438s .762 25.0s* 1.27 57.1Xijcnn .033 .572s .035 .096s .040 .014s .042 .307s .030 .006s .025 .853s* 1.20 140.3XMNIST 3.08 9.14s 3.04 1.61s 4.07 5.11s 1.33 6.26s .932 .291s .670 7min* 1.39 1523.6XF-MNIST 1.67 9.27s 1.34 1.64s 3.72 7.01s .500 7.01s .310 .385s .233 231s* 1.33 600.8Xwebspam .097 3.24s .100 .431s .148 .589s .068 .869s .041 .034s .035 28.3s* 1.17 840.6Xcovtype .076 1.11s .104 .196s .137 3.26s .096 .726s .062 .047s .058 9min* 1.07 11183.1Xbosch .750 9.62s 2.33 1.54s 1.45 1.21s .480 3.84s .258 .232s .214 28.0s* 1.21 120.7XHIGGS .020 .879s .020 .128s .085 66.5s* .023 .580s .016 .045s .014 24min (cid:63) the robustly trained GBDT. Datasets have a mix of small/large scale and binary/multi classification(statistics in Appendix A), and are normalized to [0 , to make results comparable across datasets. Weorder datasets by training data size in all of our tables, where HIGGS is the largest dataset with 10.5million training examples. All GBDTs were trained using the XGBoost framework (Chen, Guestrin,2016) and we use the models provided by Chen et al. (2019b) as target models (except bosch). Wecompare with the following existing adversarial attacks that are applicable to tree ensembles:• SignOPT (Cheng et al., 2020): The decision based attack that constructs adversarial examplesbased on hard-label black-box queries. We report the average distortion, denoted as ¯ r in the results(since the norm of adversarial example is an upper bound of minimal adversarial perturbation r ∗ ).• HSJA (Chen et al., 2019c): Another decision based attack for constructing adversarial examples.•
RBA-Appr (Yang et al., 2019): an approximate attack for tree ensembles that constructs adversarial examples by searching over training examples of the opposite class.
•
Cube (Andriushchenko, Hein, 2019): an empirical attack for tree ensembles that constructs adversarial examples by stochastically changing a few coordinates to the ℓ∞ boundary, accepting a change if it decreases the functional margin. The method provides good experimental results in general, but lacks a theoretical guarantee and can be unreliable on certain datasets such as breast-cancer. Cube does not support an ℓ2 objective by default, so we report the ℓ2 perturbation of the adversarial examples constructed by its ℓ∞-objective attacks.
• LT-Attack (Ours): our proposed attack, which constructs adversarial examples for tree ensembles. We report the average distortion of the adversarial examples, denoted as r̄our in the results.
• MILP (Kantchelian et al., 2015): the mixed-integer linear programming based method, which provides the exact minimal adversarial perturbation r* but can be very slow on large models.

We run our experiments with 20 threads per task. Black-box attacks conventionally measure efficiency by the number of queries; here we compare running time instead, since it is difficult to quantify queries for white-box attacks. To minimize the efficiency variance between programming languages we feed an XGBoost model (Chen, Guestrin, 2016), which has an efficient C++ implementation and supports multi-threaded batch queries, to SignOPT, HSJA, and Cube. MILP uses a thin wrapper around the Gurobi solver (Gurobi Optimization, 2020). Baseline methods spend the majority of their time on XGBoost model inference rather than on Python code: on Fashion-MNIST, SignOPT, HSJA, and Cube spent 72.8%, 57.3%, and 73.4% of their runtime in XGBoost library (C++) calls, respectively. HSJA and SignOPT start with 1 initial adversarial example and run 100∼632 and 200 queries per iteration, respectively, to approximate the gradient, and Cube uses 20 initial examples, which utilizes batch queries.

In Table 3 and Table 4 we show the empirical comparisons on ℓ∞ and ℓ2; due to the space limit we provide ℓ1 results as well as the attack success rate in Appendix B. We can see that our method provides a tight upper bound r̄our compared to the exact r* from MILP, which means the adversarial examples found are very close to those with the minimal adversarial perturbation, and our method achieves speedups of 1,000× or more over MILP on many datasets in the ℓ2 case. For instance, on Fashion-MNIST our r̄/r* ratio is 1.49× and 1.33× for the standard and robustly trained models respectively, while the respective Cube ratios are 4.59× and 2.15× using ∼21× the time, and the respective HSJA ratios are 11.86× and 5.75×. We also report the ℓ2 perturbation on the standard random forest models in Table 5, where the r̄our/r* ratio is close to 1 across all datasets. Additional experimental results on ℓ∞ perturbation can be found in Appendix B.1, and we include model parameters as well as statistics in Appendix A.

Table 5: Average ℓ2 perturbation over 100 test examples on the standard (natural) random forest (RF) models. Datasets are ordered by training data size. Bold and blue highlight the best and the second best entries respectively (not including MILP).

Dataset        Cube            LT-Attack (Ours)  MILP            r̄our/r*  Speedup
breast-cancer  1.03 / .224s    .413 / .001s      .402 / .008s    1.03      8X
diabetes       .260 / .285s    .151 / .003s      .146 / .042s    1.03      14X
MNIST2-6       .439 / 2.13s    .207 / .045s      .194 / .071s    1.07      1.6X
ijcnn          .046 / .336s    .028 / .003s      .028 / .185s    1.00      61.7X
MNIST          .057 / 2.88s    .018 / .057s      .018 / 4.56s    1.00      80X
F-MNIST        .141 / 3.51s    .066 / .080s      .066 / 7.44s    1.00      93X
webspam        .005 / .704s    .003 / .033s      .003 / .664s    1.00      20.1X
covtype        .087 / .700s    .055 / .040s      .055 / 30.1s    1.00      752.5X
HIGGS          .015 / .423s    .009 / .013s      .009 / 6.66s    1.00      512.3X

To study the impact of using different numbers of initial examples, we conduct experiments with seven different numbers of initial examples on SignOPT, HSJA, Cube, and LT-Attack, allocating 2 threads per task, and report the smallest (best) adversarial perturbation among those initial examples. Using more initial examples can lead to smaller (better) adversarial perturbations, but requires linearly increasing computational cost. Fig. 2 presents the ℓ2 perturbation vs. runtime per test example in log scale; our method constructs small (good) adversarial examples (y-axis) with a few initial examples while being orders of magnitude faster than the other methods (x-axis).

Figure 2: Average ℓ2 perturbation of 50 test examples vs. runtime per test example in log scale (panels: standard and robust GBDT on covtype, F-MNIST, HIGGS, and webspam; methods: MILP, Ours, RBA-Appr, Cube, HSJA, SignOPT). Methods in the bottom-left corner are better.

Broader Impact

To the best of our knowledge, this is the first practical attack algorithm (in terms of both computational time and solution quality) that can be used to evaluate the robustness of tree ensembles.
The study of robust training algorithms for tree ensemble models has been difficult due to the lack of attack tools to evaluate their robustness. Our method can serve as a benchmark tool for robustness evaluation (similar to the FGSM, PGD, and C&W attacks for neural networks; Goodfellow et al., 2015; Madry et al., 2018; Carlini, Wagner, 2017) and stimulate research on the robustness of tree ensembles.
Acknowledgments and Disclosure of Funding
We acknowledge the support by NSF IIS-1901527, IIS-2008173, ARL-0011469453, Google Cloudand Facebook.
References
Andriushchenko Maksym, Hein Matthias. Provably robust boosted decision stumps and trees against adversarial attacks // Advances in Neural Information Processing Systems 32. 2019. 13017–13028.

Athalye Anish, Carlini Nicholas, Wagner David A. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples // ICML. 2018.

Baldi Pierre, Sadowski Peter, Whiteson D. O. Searching for exotic particles in high-energy physics with deep learning // Nature communications. 2014. 5. 4308.

Bosch. Bosch Production Line Performance. 2016.

Brendel Wieland, Rauber Jonas, Bethge Matthias. Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models // International Conference on Learning Representations. 2018.

Brunner Thomas, Diehl Frederik, Truong-Le Michael, Knoll Alois. Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks // 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 2019. 4957–4965.

Carlini Nicholas, Wagner David A. Towards Evaluating the Robustness of Neural Networks // 2017 IEEE Symposium on Security and Privacy (SP). 2017. 39–57.

Chang Chih-Chung, Lin Chih-Jen. LIBSVM: A library for support vector machines // ACM Transactions on Intelligent Systems and Technology. 2011. 2. 27:1–27:27.

Chen Hongge, Zhang Huan, Boning Duane S., Hsieh Cho-Jui. Robust Decision Trees Against Adversarial Examples // ICML. 2019a. 1122–1131.

Chen Hongge, Zhang Huan, Si Si, Li Yang, Boning Duane, Hsieh Cho-Jui. Robustness Verification of Tree-based Models // Advances in Neural Information Processing Systems 32. 2019b. 12317–12328.

Chen Jianbo, Jordan Michael I., Wainwright Martin J. HopSkipJumpAttack: A Query-Efficient Decision-Based Adversarial Attack // arXiv preprint arXiv:1904.02144. 2019c.

Chen Pin-Yu, Zhang Huan, Sharma Yash, Yi Jinfeng, Hsieh Cho-Jui. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models // Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. 2017.

Chen Tianqi, Guestrin Carlos. XGBoost: A Scalable Tree Boosting System // Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery, 2016. 785–794. (KDD '16).

Chen Yizheng, Wang Shiqi, Jiang Weifan, Cidon Asaf, Jana Suman. Training Robust Tree Ensembles for Security. 2019d.

Cheng Minhao, Le Thong, Chen Pin-Yu, Zhang Huan, Yi JinFeng, Hsieh Cho-Jui. Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach // International Conference on Learning Representations. 2019.

Cheng Minhao, Singh Simranjit, Chen Patrick H., Chen Pin-Yu, Liu Sijia, Hsieh Cho-Jui. Sign-OPT: A Query-Efficient Hard-label Adversarial Attack // International Conference on Learning Representations. 2020.

Dua Dheeru, Graff Casey. UCI Machine Learning Repository. 2017.

Goodfellow Ian J., Shlens Jonathon, Szegedy Christian. Explaining and Harnessing Adversarial Examples // CoRR. 2015. abs/1412.6572.

Gurobi Optimization LLC. Gurobi Optimizer Reference Manual. 2020.

Ilyas Andrew, Engstrom Logan, Athalye Anish, Lin Jessy. Black-box Adversarial Attacks with Limited Queries and Information // ICML. 2018.

Kantchelian Alex, Tygar J. Doug, Joseph Anthony D. Evasion and Hardening of Tree Ensemble Classifiers // ICML. 2015.

Ke Guolin, Meng Qi, Finley Thomas, Wang Taifeng, Chen Wei, Ma Weidong, Ye Qiwei, Liu Tie-Yan. LightGBM: A Highly Efficient Gradient Boosting Decision Tree // Advances in Neural Information Processing Systems 30. 2017. 3146–3154.

Lecun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition // Proceedings of the IEEE. 1998. 86, 11. 2278–2324.

Lee Guang-He, Yuan Yang, Chang Shiyu, Jaakkola Tommi. Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers // Advances in Neural Information Processing Systems 32. 2019. 4910–4921.

Madry Aleksander, Makelov Aleksandar, Schmidt Ludwig, Tsipras Dimitris, Vladu Adrian. Towards Deep Learning Models Resistant to Adversarial Attacks // International Conference on Learning Representations. 2018.

Prokhorenkova Liudmila, Gusev Gleb, Vorobev Aleksandr, Dorogush Anna Veronika, Gulin Andrey. CatBoost: unbiased boosting with categorical features // Advances in Neural Information Processing Systems. 2018. 6638–6648.

Smith J. Walter, Everhart James E., Dickson William C., Knowler William C., Johannes Richard S. Using the ADAP Learning Algorithm to Forecast the Onset of Diabetes Mellitus // Proceedings of the Annual Symposium on Computer Application in Medical Care. 1988.

Szegedy Christian, Zaremba Wojciech, Sutskever Ilya, Bruna Joan, Erhan Dumitru, Goodfellow Ian J., Fergus Rob. Intriguing properties of neural networks // CoRR. 2013. abs/1312.6199.

Tu Chun-Chen, Ting Pai-Shun, Chen Pin-Yu, Liu Sijia, Zhang Huan, Yi Jinfeng, Hsieh Cho-Jui, Cheng Shin-Ming. AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking Black-box Neural Networks // AAAI. 2018.

Wang De, Irani Danesh, Pu Calton. Evolutionary study of web spam: Webb Spam Corpus 2011 versus Webb Spam Corpus 2006 // 8th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom). 2012. 40–49.

Wang Yihan, Zhang Huan, Chen Hongge, Boning Duane, Hsieh Cho-Jui. On Lp-norm Robustness of Ensemble Decision Stumps and Trees // ICML. 2020.

Xiao Han, Rasul Kashif, Vollgraf Roland. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. 2017.

Yang Yao-Yuan, Rashtchian Cyrus, Wang Yizhen, Chaudhuri Kamalika. Robustness for Non-Parametric Classification: A Generic Attack and Defense. 2019.

Zhang Fuyong, Wang Yi, Liu Shigang, Wang Hua. Decision-based evasion attacks on tree ensemble classifiers // World Wide Web. 04 2020.

Zhang Huan, Si Si, Hsieh Cho-Jui. GPU-acceleration for Large-scale Tree Boosting // arXiv preprint arXiv:1706.08359. 2017.
A Dataset and Model Statistics
We use 9 datasets and pre-trained models provided by Chen et al. (2019b), which can be downloaded from https://github.com/chenhongge/RobustTrees . Table 6 summarizes the statistics of the datasets as well as the standard (natural) GBDT models, and we report the average complexity statistics for |Neighbor_Bound(·)| over 500 test examples. For multi-class datasets we count trees that belong either to the victim class or to the class of the initial adversarial example. Datasets may contain duplicate feature split thresholds, and the extra complexity is covered in the statistics. We disabled the random noise optimization discussed in §3.5 to provide a cleaner picture of Algorithm 1. We train standard (natural) RF models using XGBoost's native RF APIs, and provide the statistics in Table 7.

Table 6: The average complexity statistics for |Neighbor_Bound(·)| over 500 test examples.

Dataset        features  classes  trees  depth l  iterations  |T_Bound(·)|  |Neighbor^(t)_1(·)|  |Neighbor_Bound(·)|
breast-cancer  10        2        4      6        2.1         3.2           5.2                  9.2
diabetes       8         2        20     5        6.3         6.1           3.4                  10.6
MNIST2-6       784       2        1,000  4        121.7       374.2         14.9                 256.5
ijcnn          22        2        60     8        26.5        7.4           3.3                  17.8
MNIST          784       10       400    8        159.4       124.7         5.0                  367.9
F-MNIST        784       10       400    8        236.8       149.1         6.5                  717.4
webspam        254       2        100    8        100.7       37.0          3.8                  129.7
covtype        54        7        160    8        36.7        30.8          10.6                 39.2
HIGGS          28        2        300    8        107.1       13.5          2.1                  24.0

Table 7: Parameters and statistics for datasets and the standard (natural) RFs.

Dataset        train size  test size  trees  depth l  subsampling  test acc.
breast-cancer  546         137        4      6        .8           .974
diabetes       614         154        25     8        .8           .775
MNIST2-6       11,876      1,990      1,000  4        .8           .963
ijcnn          49,990      91,701     100    8        .8           .919
MNIST          60,000      10,000     400    8        .8           .907
F-MNIST        60,000      10,000     400    8        .8           .823
webspam        300,000     50,000     100    8        .8           .948
covtype        400,000     180,000    160    8        .8           .745
HIGGS          10,500,000  500,000    300    8        .8           .702

B Supplementary Experiments
B.1 Additional Experimental Results on Random Forests
Table 8: Average ℓ∞ perturbation over 100 test examples on the standard (natural) random forest (RF) models. Datasets are ordered by training data size. Bold and blue highlight the best and the second best entries respectively (not including MILP).

Dataset        Cube            LT-Attack (Ours)  MILP            r̄our/r*  Speedup
breast-cancer  .797 / .208s    .340 / .001s      .332 / .008s    1.02      8X
diabetes       .159 / .271s    .111 / .002s      .103 / .054s    1.08      27X
MNIST2-6       .135 / 1.85s    .130 / .041s      .121 / .335s    1.07      8.2X
ijcnn          .032 / .340s    .026 / .003s      .026 / .338s    1.00      112.7X
MNIST          .017 / 1.98s    .010 / .056s      .009 / 21.4s    1.11      382.1X
F-MNIST        –               .036 / .084s      .032 / 34.1s    1.13      406X
webspam        .004 / .652s    .002 / .023s      .002 / 2.63s    1.00      114.3X
covtype        .050 / .684s    .048 / .037s      .048 / 72.2s    1.00      1951.4X
HIGGS          .008 / .389s    .007 / .011s      .006 / 20.9s    1.17      1900X
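Each entry in the tables above is an aggregate over per-example attack results. A minimal sketch of how such entries (average perturbation r̄, tightness ratio r̄our/r*, and speedup over MILP) can be computed; the function and variable names are illustrative, not taken from the paper's code:

```python
def summarize(our_norms, our_times, milp_norms, milp_times):
    """Aggregate per-example attack results into table-style entries.

    our_norms / milp_norms: perturbation norms found by our attack and by
    the exact MILP solver, one value per test example.
    our_times / milp_times: corresponding per-example running times (s).
    """
    n = len(our_norms)
    r_bar_our = sum(our_norms) / n   # average perturbation (an upper bound)
    r_star = sum(milp_norms) / n     # average exact minimal perturbation
    ratio = r_bar_our / r_star       # tightness: close to 1 is better
    speedup = (sum(milp_times) / n) / (sum(our_times) / n)
    return r_bar_our, r_star, ratio, speedup
```

For example, two test examples attacked with norms 0.2 and 0.4 in 0.01s and 0.03s, against MILP norms 0.2 and 0.3 obtained in 1s and 3s, give a tightness ratio of 1.2 and a 100X speedup.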
B.2 Attack Success Rate
We present the attack success rate in Fig. 3, calculated as the ratio of constructed adversarial examples that have a smaller perturbation than the given threshold. We use 50 test examples for MILP due to its long running time, and 500 test examples for the other methods.

XGBoost's native RF APIs: https://xgboost.readthedocs.io/en/release_1.0.0/tutorials/rf.html .
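The success-rate computation described above reduces to a thresholded count over the per-example perturbation norms; a minimal sketch (names are illustrative, not from the paper's code):

```python
def success_rate(perturbations, threshold):
    """Fraction of constructed adversarial examples whose perturbation
    norm is smaller than the given threshold (one norm per test example)."""
    return sum(1 for r in perturbations if r < threshold) / len(perturbations)

def success_curve(perturbations, thresholds):
    """Success rate at each threshold, one point per threshold,
    as plotted against the x-axis of a figure like Fig. 3."""
    return [success_rate(perturbations, t) for t in thresholds]
```

Sweeping the threshold from 0 upward traces the cumulative distribution of perturbation sizes, so a curve that rises earlier corresponds to a stronger attack.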
Figure 3: Attack success rate vs. perturbation thresholds (panels: standard and robust GBDT on covtype and F-MNIST, under ℓ∞ and ℓ2 thresholds; methods: MILP, Ours, RBA-Appr, Cube, HSJA, SignOPT).
B.3 ℓ∞ Perturbation Using Different Numbers of Initial Examples
Fig. 4 presents the average ℓ∞ perturbation of 50 test examples vs. runtime per test example in log scale. We plot the results for SignOPT, HSJA, Cube, and LT-Attack using seven different numbers of initial examples, with 2 threads per task. Initial examples are not applicable to RBA-Appr and MILP, thus we plot only a single point for each of those methods.

Figure 4: Average ℓ∞ perturbation of 50 test examples vs. runtime per test example in log scale (panels: standard and robust GBDT on covtype, F-MNIST, HIGGS, and webspam; methods: MILP, Ours, RBA-Appr, Cube, HSJA, SignOPT). Methods in the bottom-left corner are better.

B.4 Experimental Results for ℓ1 Norm and Verification
Table 9 presents the experimental results on ℓ1 norm perturbation. Cube does not support an ℓ1 objective by default, so we report the ℓ1 perturbation of the adversarial examples constructed by its ℓ∞-objective attacks. For completeness we include verification results (Chen et al., 2019b; Wang et al., 2020) in Table 9 and Table 10, which output lower bounds r of the minimal adversarial perturbation (in contrast to adversarial attacks, which aim to output an upper bound r̄).

Table 9: Average ℓ1 perturbation over 50 test examples on the standard (natural) GBDT models and robustly trained GBDT models. Datasets are ordered by training data size. Bold and blue highlight the best and the second best entries respectively (not including MILP and Verification).

Standard GBDT (ℓ1 perturbation):

Dataset        SignOPT        RBA-Appr        Cube           LT-Attack (Ours)  MILP           Verification    r̄our/r*  Speedup
breast-cancer  .413 / .686s   .535 / .0002s   –              .372 / .002s      .372 / .012s   .367 / .003s    1.00      6.0X
diabetes       .183 / 1.04s   .294 / .0005s   .290 / .238s   .131 / .003s      .126 / .080s   .095 / .133s    1.04      26.7X
MNIST2-6       35.4 / 6.81s   25.9 / .109s    –              .783 / .230s      .568 / 2.51s   .057 / 1.17s    1.38      10.9X
ijcnn          .075 / .889s   .286 / .016s    .247 / .340s   .064 / .008s      .060 / 3.28s   .044 / 1.68s    1.07      410.0X
MNIST          22.8 / 8.77s   46.5 / 3.13s    2.16 / 7.11s   .341 / .267s      .207 / 56.8s   .013 / 5.40s    1.65      212.7X
F-MNIST        13.8 / 9.55s   42.7 / 4.57s    1.20 / 9.04s   .260 / .479s      .181 / 70.3s   .013 / 7.19s    1.44      146.8X
webspam        .287 / 3.43s   .614 / .630s    .036 / 1.01s   .006 / .062s      .004 / 4.17s   .0003 / 6.40s   1.50      67.3X
covtype        .074 / 1.74s   .223 / 2.29s    .132 / 1.01s   .057 / .039s      .052 / 314s    .024 / 2.26s    1.10      8051.3X
HIGGS          .050 / 1.20s   .611 / 44.7s    .039 / .854s   .016 / .049s      .013 / 25min   .002 / 7.36s    1.23      30183.7X

Robust GBDT (ℓ1 perturbation):

Dataset        SignOPT        RBA-Appr        Cube           LT-Attack (Ours)  MILP           Verification    r̄our/r*  Speedup
breast-cancer  .654 / .869s   .598 / .0008s   –              .574 / .002s      .574 / .008s   .506 / .001s    1.00      4.0X
diabetes       .201 / .667s   .228 / .001s    .514 / .235s   .189 / .002s      .189 / .028s   .166 / .007s    1.00      14.0X
MNIST2-6       23.8 / 6.51s   17.8 / .106s    –              – / .523s         1.78 / 23.2s   .381 / 2.91s    1.55      44.4X
ijcnn          .076 / .693s   .205 / .015s    .233 / .337s   .067 / .007s      .065 / 1.02s   .043 / .279s    1.03      145.7X
MNIST          57.4 / 7.93s   32.7 / 3.70s    9.76 / 8.35s   –                 .720 / 244s    .077 / 13.6s    1.67      511.5X
webspam        .186 / 3.65s   .540 / .522s    .309 / 1.00s   .119 / .037s      .073 / 67.7s   .014 / 2.17s    1.63      1829.7X
covtype        .097 / 1.65s   .217 / 2.43s    .241 / 1.02s   .080 / .075s      .071 / 437s    .033 / 3.12s    1.13      5826.7X
HIGGS          .033 / 1.12s   .226 / 43.6s    .071 / .839s   .028 / .052s      .021 / 40min   .006 / 5.55s    1.33      46576.9X
Table 10: Average ℓ∞ and ℓ2 perturbation over 500 test examples on the standard (natural) GBDT models and robustly trained GBDT models. ("*"): average of 50 examples due to long running time.

Standard GBDT (ℓ∞ perturbation):

Dataset        LT-Attack (Ours)  MILP            Verification
breast-cancer  .235 / .001s      .222 / .013s    .220 / .002s
diabetes       .059 / .002s      .056 / .084s    .047 / .910s
MNIST2-6       .097 / .222s      .065 / 28.7s    .053 / 1.27s
ijcnn          .033 / .007s      .031 / 6.60s    .027 / 6.25s
MNIST          .029 / .237s      .014 / 375s*    .011 / 9.38s
F-MNIST        .028 / .370s      .013 / 15min*   .012 / 6.96s
webspam        .001 / .051s      .0008 / 27.5s   .0002 / 9.79s
covtype        .032 / .038s      .028 / 10min*   .021 / 4.21s
HIGGS          .004 / .036s      .004 / 52min*   .002 / 13.2s

Robust GBDT (ℓ∞ perturbation):

Dataset        LT-Attack (Ours)  MILP            Verification
breast-cancer  .415 / .001s      .415 / .008s    .414 / .001s
diabetes       .122 / .002s      .121 / .036s    .119 / .011s
MNIST2-6       .331 / .302s      .317 / 98.7s    .311 / 29.4s
ijcnn          .038 / .006s      .036 / 3.60s    .032 / .799s
MNIST          .298 / .315s      .278 / 13min*   .255 / 7.47s
F-MNIST        .098 / .403s      .078 / 29min*   .075 / 13.3s
webspam        .016 / .038s      .014 / 51.2s    .011 / 5.85s
covtype        .047 / .053s      .044 / 518s*    .031 / 3.24s
HIGGS          .01 / .054s       .009 / 45min*   .005 / 8.42s

Standard GBDT (ℓ2 perturbation):

Dataset        LT-Attack (Ours)  MILP            Verification
breast-cancer  .283 / .001s      .280 / .011s    .277 / .002s
diabetes       .077 / .003s      .073 / .055s    .058 / .458s
MNIST2-6       .245 / .235s      .183 / 2.52s    .058 / 1.06s
ijcnn          .044 / .010s      .043 / 2.18s    .030 / 4.62s
MNIST          .072 / .243s      .043 / 32.5s    .013 / 6.81s
F-MNIST        .073 / .400s      .049 / 49.3s    .013 / 6.72s
webspam        .002 / .053s      .002 / 3.17s    .0002 / 8.95s
covtype        .045 / .039s      .042 / 237s     .023 / 2.96s
HIGGS          .008 / .037s      .007 / 13min*   .002 / 10.3s

Robust GBDT (ℓ2 perturbation):

Dataset        LT-Attack (Ours)  MILP            Verification
breast-cancer  .452 / .001s      .452 / .007s    .450 / .001s
diabetes       .144 / .002s      .143 / .024s    .130 / .009s
MNIST2-6       .968 / .401s      .803 / 17.3s    .358 / 4.42s
ijcnn          .048 / .007s      .046 / .728s    .035 / .575s
MNIST          .996 / .395s      .701 / 200s     .273 / 10.1s
F-MNIST        .326 / .468s      .251 / 99.3s    .079 / 13.5s
webspam        .039 / .036s      .033 / 12.0s    .012 / 4.72s
covtype        .063 / .054s      .059 / 280s     .033 / 3.10s
HIGGS          .016 / .054s      .015 / 15min*   .006 / 7.16s

C Proofs
C.1 Proof of Theorem 2
Proof.
By contradiction. Given an adversarial tuple $C'$ and victim example $x_0$, assume $\exists C_1 \in \text{Neighbor}_{+1}(C')$ s.t. $C_1 \notin \text{Neighbor}_{\text{Bound}}(C')$. Assume $p \in \{1, 2, \infty\}$.

Let $x_1 = \operatorname{argmin}_{x \in B(C_1)} \|x - x_0\|_p$, $x' = \operatorname{argmin}_{x \in B(C')} \|x - x_0\|_p$, and let $J$ be the set of dimensions on which $x_1$ is closer to $x_0$: $J = \{\, j \mid |x_{1,j} - x_{0,j}| < |x'_j - x_{0,j}| \,\}$.

According to the definition of $\text{Neighbor}_{+1}(C')$ we have $\text{dist}_p(C_1, x_0) < \text{dist}_p(C', x_0)$, and consequently $J \neq \emptyset$. We choose any $j' \in J$; for cleanness we use $(l', r']$ to denote the interval from $B(C')$ on the $j'$-th dimension, and let $d_0 = x_{0,j'}$, $d_1 = x_{1,j'}$, $d' = x'_{j'}$.

Observe $d_0 \notin (l', r']$, since otherwise we would have $|d' - d_0| = 0$ and $|d_1 - d_0|$ could not be smaller. W.l.o.g. assume $d_0$ is on the right side of the interval, i.e., $r' < d_0$; then according to the argmin property of $x'$ we have $d' = r'$.

Recall that $B(C')$ is the intersection of $K$ bounding boxes from the ensemble, so $\exists t' \in [K]$ s.t. $B^{C'(t')}_{j'}.r = r'$. Observe $r' = d' < d_1$, since $d_1$ is on the right side of $d'$ and has smaller distance to $d_0$, which means $C_1$ has a different leaf than $C'$ on tree $t'$. Also, according to the above equation, $t' \in T_{\text{Bound}}(C')$ and $C_1 \in \text{Neighbor}^{(t')}_1(C')$, thus $C_1 \in \bigcup_{t \in T_{\text{Bound}}(C')} \text{Neighbor}^{(t)}_1(C') = \text{Neighbor}_{\text{Bound}}(C')$, a contradiction.

C.2 Proof of Corollary 2
Proof.
According to Theorem 1, $B^{(-t)}$ is the intersection of $K - 1$ bounding boxes and can be written as the Cartesian product of $d$ intervals; we call it a box. Each tree $t$ of depth $l$ splits the $\mathbb{R}^d$ space into up to $2^l$ non-overlapping axis-aligned boxes, and $|\text{Neighbor}^{(t)}_1(C')| + 1$ (plus the current box) corresponds to the number of boxes that have non-empty intersection with $B^{(-t)}$, thus $|\text{Neighbor}^{(t)}_1(C')| \le 2^l - 1$.

Observe that $k^{(t)}$ axis-aligned feature split thresholds can split $\mathbb{R}^d$ into at most $2^{k^{(t)}}$ non-overlapping boxes, assuming $d \ge k^{(t)}$, and the maximum can be reached by having at most one split threshold on each dimension. In conclusion there are at most $\min(2^{k^{(t)}}, 2^l)$ boxes that have non-empty intersection with $B^{(-t)}$, thus $|\text{Neighbor}^{(t)}_1(C')| \le \min(2^{k^{(t)}}, 2^l) - 1$ (minus the current box).

C.3 Proof of Theorem 3
Proof.
By contradiction. Let $x_0, y_0$ be the victim example and its label, and assume $\exists C^* \in (V^+)^K \cap \mathcal{C}$ s.t. $f(C^*) \neq y_0 \wedge \text{dist}_p(C^*, x_0) < \text{dist}_p(C', x_0)$. Assume $p \in \{1, 2, \infty\}$.

Recall $f(C) = \operatorname{sign}(\sum_{i \in C} v_i)$. We compute the tree-wise prediction difference $v^{(t)}_{\text{diff}} = v_{C^*(t)} - v_{C'(t)}$, $t \in [K]$, and let $t_{\min}$ be the tree with the smallest functional margin difference, $t_{\min} = \operatorname{argmin}_{t \in [K]} y_0 \cdot v^{(t)}_{\text{diff}}$.

We construct a tuple $C_1$ which is the same as $C'$ except on $t_{\min}$, where $C_1^{(t_{\min})} = C^{*(t_{\min})}$; consequently we have $C_1 \in \mathcal{C} \wedge \text{dist}_p(C_1, x_0) < \text{dist}_p(C', x_0)$. Now we show $f(C_1) \neq y_0$, i.e., $y_0 \sum_{i \in C_1} v_i < 0$:

i. Case $y_0 \cdot v^{(t_{\min})}_{\text{diff}} \le 0$. Then
$$y_0 \sum_{i \in C_1} v_i = y_0 \sum_{i \in C'} v_i + y_0 \cdot v^{(t_{\min})}_{\text{diff}} \le y_0 \sum_{i \in C'} v_i < 0.$$

ii. Case $y_0 \cdot v^{(t_{\min})}_{\text{diff}} > 0$. Then
$$y_0 \sum_{i \in C_1} v_i = y_0 \sum_{i \in C'} v_i + y_0 \cdot v^{(t_{\min})}_{\text{diff}} < y_0 \sum_{i \in C'} v_i + K y_0 \cdot v^{(t_{\min})}_{\text{diff}} \le y_0 \sum_{i \in C'} v_i + y_0 \sum_{t \in [K]} v^{(t)}_{\text{diff}} = y_0 \sum_{i \in C^*} v_i < 0.$$

In conclusion, $C_1$ is a valid adversarial tuple within $\text{Neighbor}_1(C')$ that has smaller perturbation than $C'$, thus the algorithm would not have stopped, a contradiction.

D Supplementary Algorithms
D.1 Generating Initial Adversarial Examples for LT-Attack

Algorithm 2: Generating Initial Adversarial Examples for LT-Attack
Data: target white-box model f, victim example x_0.
begin
    y_0 ← f(x_0) ;
    r', C' ← MAX, None ;
    num_attack ← … ;
    for i ← 1, …, num_attack do
        repeat
            x' ← x_0 + Normal(0, I_d) ;
        until f(x') ≠ y_0 ;
        x' ← BinarySearch(x', x_0, f) ;
            ▷ Do a fine-grained binary search between x_0 and x' to optimize the
              initial perturbation of x'; similar to g(θ) proposed by Cheng et al. (2019).
        r*, C* ← LT-Attack(f, x_0, x') ;
        if r* < r' then
            r', C' ← r*, C* ;
        end
    end
    return r', C' ;
end

D.2 Algorithm for NaiveLeaf

Algorithm 3:
Compute NaiveLeaf
Data: target white-box model f, current adversarial example x', victim example x_0.
Result: the NaiveLeaf neighborhood of C(x').
begin
    (i^(1), …, i^(K)) ← C(x') ;
    N ← ∅ ;
    for t ← 1 … K do
        for i ∈ S^(t), i ≠ i^(t) do    ▷ S^(t) denotes the leaves of tree t.
            x_new ← x' ;
            for j, (l, r] ∈ B^i do
                ▷ The j-th dimension of the bounding box B^i, which can be acquired from f.
                if x_new,j ∉ (l, r] then
                    x_new,j ← min(r, max(l + ε, x_{0,j})) ;
                end
            end
            N ← N ∪ {C(x_new)} ;
        end
    end
    return N ;
end

D.3 A Greedy Algorithm to Estimate the Minimum Neighborhood Distance
To understand the quality of the constructed adversarial examples, we use an empirical greedy algorithm to estimate the minimum neighborhood distance h such that Neighbor_h(·) can reach the exact solution. Assume our method converged at C' and the optimum solution is C*, and let T_diff = {t | C'(t) ≠ C*(t)} be the set of trees with different predictions; then the Hamming distance h̄ = |T_diff| is a trivial upper bound, since Neighbor_h̄(C') can reach C* with a single additional iteration. To estimate a realistic h∼ we want to find a disjoint split T_diff = ∪_{i∈[k]} T_i such that we can mutate C' into C* by changing the trees in T_i to match C* at the i-th iteration. We make sure all intermediate tuples are valid and have strictly decreasing perturbation as required by Eq. (3), and report h∼ = max_{i∈[k]} |T_i|. h∼ is an estimate of the minimum h because we cannot guarantee the argmin constraint due to the large complexity. As shown in Table 11 we have median(h̄) = 23 and median(h∼) = 8 on an ensemble with 300 trees (HIGGS), which suggests that our method is likely to reach the exact optimum on half of the test examples through Neighbor_{h∼}(·). In this experiment we disabled the random noise optimization discussed in §3.5 to provide a cleaner picture of Algorithm 1.

Algorithm 4: A greedy algorithm to estimate the minimum neighborhood distance h∼.
Data: the model f, our adversarial point x_our, exact MILP solution x*.
Result: an estimation of the neighborhood distance h∼.
begin
    C_our, C* ← C(x_our), C(x*) ;
    r* ← dist_p(C*, x_0) ;
    y* ← f(x*) ;
    h_min ← D(C_our, C*) ;    ▷ Hamming distance is the upper bound.
    I_diff ← {(v_{C*(t)} − v_{C_our(t)}, C_our(t), C*(t)) | C_our(t) ≠ C*(t), t ∈ [K]} ;
        ▷ I_diff is the list of tuples in the form of (label diff, our leaf, MILP leaf).
    num_trial ← … ;
    for i ← 1, …, num_trial do
        h ← 0 ;
        I ← shuffle(I_diff) ;
        C_tmp ← C_our ;
        while I ≠ ∅ do
            r_last ← dist_p(C_tmp, x_0) ;
            C_tmp ← pop the first tuple from I with positive label diff and replace with the MILP leaf ;
            h ← h + 1 ;
            while C_tmp ∉ C or dist_p(C_tmp, x_0) ∉ [r*, r_last) or f(C_tmp) ≠ y* do
                ▷ Making sure C_tmp satisfies Equation 3 except the argmin; we cannot
                  guarantee argmin due to the high complexity. The while loop is guaranteed
                  to exit since we can pop all tuples in I to become C*.
                C_tmp ← pop the first tuple from I and replace with the MILP leaf ;
                h ← h + 1 ;
            end
        end
        h_min ← min(h_min, h) ;
    end
    return h_min ;
end

Table 11: Convergence statistics for the standard (natural) GBDT models between our solution and the optimum MILP solution. We collect the data after the fine-grained binary search but before applying LT-Attack (Initial), and the data after LT-Attack (Converged). We disabled the random noise optimization discussed in §3.5.
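The trivial upper bound h̄ = |T_diff| used by Algorithm 4 is the Hamming distance between two leaf tuples. A minimal sketch, with leaf tuples represented as sequences of per-tree leaf indices (an illustrative representation, not the paper's data structure):

```python
def hamming_distance(tuple_a, tuple_b):
    """Number of trees on which two leaf tuples select different leaves,
    i.e. the upper bound h-bar = |T_diff| from Appendix D.3."""
    assert len(tuple_a) == len(tuple_b), "tuples must cover the same trees"
    return sum(1 for a, b in zip(tuple_a, tuple_b) if a != b)
```

For a 4-tree ensemble, tuples (1, 2, 3, 4) and (1, 5, 3, 6) disagree on two trees, so h̄ = 2 and Algorithm 4 searches for a split of those two trees into strictly improving iterations.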