A Sketching Method for Finding the Closest Point on a Convex Hull
Roozbeh Yousefzadeh [email protected]
Yale University and VA Connecticut Healthcare System
Abstract
We develop a sketching algorithm to find the point on the convex hull of a dataset that is closest to a query point outside the hull. Studying the convex hull of a dataset can provide useful information about its geometric structure and its distribution. Many machine learning datasets have a large number of samples with a large number of features, but exact algorithms in computational geometry are usually not designed for such settings. Alternatively, the problem can be formulated as a linear least-squares problem with linear constraints. However, solving the problem using standard optimization algorithms can be very expensive for large datasets. Our algorithm uses a sketching procedure to exploit the structure of the data and to unburden the optimization process from irrelevant points. This involves breaking the data into pieces and gradually putting the pieces back together, while improving the optimal solution using a gradient projection method that can rapidly change its active set of constraints. Our method eventually leads to the optimal solution of our convex problem faster than off-the-shelf algorithms.
Keywords:
Convex hulls, Numerical optimization
1. Introduction
Studying the convex hull of a dataset can provide useful information about its geometric structure and its distribution. Such information may then help with downstream tasks such as out-of-distribution detection. Studying the relationship of testing sets with respect to the convex hull of training sets may also provide insights about the generalization of models (Yousefzadeh, 2020).

Standard convex hull algorithms in computational geometry are highly tailored and efficient, but most of them are not practical in high-dimensional domains. There are approximation algorithms with useful guarantees as well (Blum et al., 2019), but sometimes we may need to seek exact solutions.

Here, we provide a fast algorithm to understand the geometric relation of a query point, q, in high-dimensional space with respect to the convex hull of a dataset, H, possibly with a large number of datapoints. Specifically, we find the point on the body of the convex hull closest to the query point outside it. The vector connecting the query point to the convex hull reveals useful information about the query point and also about the contents of the dataset. The length of that vector, in comparison with the size of H, tells us how far q is from the distribution of H. Beyond its length, the vector reveals the direction that would bring q to the convex hull, which also tells us something about the dataset. Moreover, the closest point on the convex hull is a convex combination of a certain number of points in the dataset, and investigating those points and their geometric relationship w.r.t. q can reveal further information about the data.
2. Formulation
Let's consider a dataset D formed as a matrix, with n rows corresponding to the samples and d columns corresponding to the features, i.e., dimensions. H is the convex hull of all the samples in D. Our query point, q, sits outside H, and we seek to find x_H, the point in H that is closest to q.

To ensure that x_H belongs to H, we can define

\[ x_H = \alpha D, \tag{1} \]

where α is a row vector of size n. If all elements of α are bounded between 0 and 1, and their summation also equals 1, then by definition x_H belongs to H. Given equation (1), we can change our optimization variable to α.

Our objective function is

\[ \min_{\alpha} \; f(\alpha) = \left\| q - \alpha D \right\|_2^2, \tag{2} \]

while our constraints ensure that x_H belongs to H:

\[ \alpha \, \mathbf{1}_{n,1} = 1, \tag{3} \]

\[ 0 \le \alpha \le 1. \tag{4} \]

This is a constrained least-squares problem which can be solved using standard algorithms from the numerical optimization literature (Nocedal and Wright, 2006). For any query point, we first compute the optimal α using equations (2)-(4). We then compute the corresponding x_H using the optimal α and equation (1).

However, we note that D can be quite large, and most likely a large portion of the points in H will have no effect on the optimal solution. This is easy to envision in 2D. Consider, for example, a large set of points forming a square convex hull, and a single point q outside it. Now assume that one edge of the square is closer to q than the other three edges. Then, for our optimization problem, we can use only the points on that closer edge. Whether we include the other points or not, the solution to our optimization problem will remain the same. Hence, we can exploit the geometric structure of D with respect to q, and solve our optimization problem faster.
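As a concrete baseline, the formulation (2)-(4) can be handed directly to an off-the-shelf solver. The sketch below uses NumPy and SciPy's SLSQP method (a tooling choice of ours; the paper does not prescribe a solver), with the 2D square example from the text as a sanity check.

```python
# Baseline solver for (2)-(4): minimize ||q - alpha D||^2 over the simplex-like
# feasible set. This is the "standard algorithm" route that becomes expensive
# for large n, which the sketching method is designed to avoid.
import numpy as np
from scipy.optimize import minimize

def closest_point_on_hull_baseline(D, q):
    """Solve min_alpha ||q - alpha D||^2 s.t. sum(alpha) = 1, 0 <= alpha <= 1."""
    n = D.shape[0]
    f = lambda a: np.sum((q - a @ D) ** 2)            # objective (2)
    grad = lambda a: 2.0 * (a @ D - q) @ D.T          # its gradient w.r.t. alpha
    cons = {"type": "eq", "fun": lambda a: a.sum() - 1.0}   # constraint (3)
    bounds = [(0.0, 1.0)] * n                               # constraint (4)
    a0 = np.full(n, 1.0 / n)               # start at the centroid, inside H
    res = minimize(f, a0, jac=grad, bounds=bounds,
                   constraints=cons, method="SLSQP")
    alpha = res.x
    return alpha @ D, alpha                # x_H from equation (1)

# Example: a unit square in 2D and a query point above its top edge.
D = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
q = np.array([0.5, 2.0])
x_H, alpha = closest_point_on_hull_baseline(D, q)
print(x_H)   # approximately [0.5, 1.0]; only the two top-edge vertices matter
```

For a small square this is instantaneous, but with n in the tens of thousands the solver must carry every point through every iteration, which is exactly the cost the sketching procedure of the next section avoids.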
3. Our Algorithm
Here, we develop an algorithm that is suitable for datasets with a large number of samples in high-dimensional space.
We expect the optimal solution to be highly sparse; for example, for a dataset with n = 60,000 samples, we expect only a small number of elements of the optimal α to be nonzero. Hence, we first solve our optimization problem for a subset of the dataset, denoted by D′, and then gradually enlarge that subset until it includes the entire D.

Let's denote the convex hull of D by H. Let's also assume that the starting point for our optimization problem is some point x inside H, meaning that x satisfies constraints (3)-(4). As the optimization algorithm makes progress, the solution gradually moves towards q until it reaches a point where it cannot move any closer to q without exiting H. That point, denoted by x_H, is the optimal solution we seek.

To develop our algorithm, let's consider a subset of n′ samples from D, and a specific q. The optimal solution of (2), subject to (3)-(4), for this subset would be α′, leading to the closest point x′. If some element i of α′ is zero, it means that sample i has no effect on the optimal solution α′ and, by extension, no effect on x′. Hence, if we had discarded sample i from the beginning, x′ would not change. However, if we add an additional point to the subset, so that its size increases to n′ + 1, sample i might become part of the optimal solution. Hence, the subspace that contains x′ may include dimensions that do not belong to the subspace of x_H. Moreover, there may be dimensions included in the subspace of x_H that are not included in that of x′.

Therefore, as we increase the size of D′, we have to consider all the included samples, because any of them may become part of the optimal solution. At each iteration, the solution tries to get closer to q by moving inside a convex body. Constraint (3) is an equality and will be satisfied as long as the solution stays within H. We expect the lower bound of constraint (4) to be binding for many of the samples, but we do not expect the upper bound to be binding unless the optimal solution coincides with a point in the dataset, in which case constraint (4) becomes binding for all the samples (at the upper bound for that point and at the lower bound for all other points in D).

To move from x to x_H, we can use the gradient projection method described by Nocedal and Wright (2006, Section 16.7), which we briefly review in the following.

At each iteration of the gradient projection method, we first compute the Cauchy point (the feasible minimizer along the direction of the derivatives), and then perform a subspace minimization for the samples that are not binding at the lower bound of constraint (4). For computing the Cauchy point, we use the same procedure described by Nocedal and Wright (2006, Section 16.7). To make sure that constraint (3) is satisfied and the Cauchy point does not exit the convex hull region, we normalize the gradient direction so that the sum of its values equals zero.

The subspace minimization that follows does not need to produce an exact solution of the sub-problem, as that could make the process unnecessarily expensive. As long as it improves the solution at hand, we can move to the next iteration and compute a new Cauchy point.
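Before turning to the dual form, the primal step just described can be made concrete. The sketch below is ours and simplifies the full Cauchy-point search of Nocedal and Wright (Section 16.7) by stopping at the first bound the step hits; it keeps constraint (3) satisfied by making the search direction sum to zero, and constraint (4) by capping the step length.

```python
# One simplified projected-gradient step for (2) subject to (3)-(4).
import numpy as np

def projected_gradient_step(alpha, D, q):
    g = 2.0 * (alpha @ D - q) @ D.T    # gradient of (2) w.r.t. alpha
    d = -(g - g.mean())                # sum-zero direction: keeps (3) satisfied
    # Largest step length before any coordinate leaves the box of constraint (4).
    with np.errstate(divide="ignore", invalid="ignore"):
        t_hi = np.where(d > 0, (1.0 - alpha) / d, np.inf)
        t_lo = np.where(d < 0, -alpha / d, np.inf)
    t_max = min(t_hi.min(), t_lo.min())
    # f is quadratic along the ray, so the 1-D minimizer has a closed form.
    r, dD = q - alpha @ D, d @ D
    t_star = (r @ dD) / (dD @ dD) if (dD @ dD) > 0 else 0.0
    return alpha + np.clip(t_star, 0.0, t_max) * d
```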
We note that constraint (3) is an equality constraint, so the dual form of our problem only has non-negativity constraints on the Lagrange multipliers. The gradient projection method can be applied to the dual form of our problem, and in certain situations, solving the dual form may be easier. In practice, for our datasets of interest, we observe that the Lagrange multipliers are mostly non-sparse, which makes the dual form more expensive to solve. We nevertheless present the dual form here because it may be useful in certain settings where the primal solution is non-sparse.

The Lagrangian for our optimization problem is

\[ \mathcal{L}(\alpha, \lambda) = \frac{1}{2} \left\| q - \alpha D \right\|_2^2 - (\alpha \mathbf{1}_{n,1} - 1)\,\lambda_0 - \alpha \lambda_1 - (1 - \alpha)\,\lambda_2, \tag{5} \]

where the λ's are Lagrange multipliers: λ₀ is a scalar, while λ₁ and λ₂ are column vectors with n elements each.

The Lagrange dual objective is

\[ g(\lambda) = \inf_{\alpha} \mathcal{L}(\alpha, \lambda). \tag{6} \]

Because $\mathcal{L}(\cdot, \lambda)$ is a strictly convex quadratic function, the infimum is achieved when $\nabla_\alpha \mathcal{L}(\alpha, \lambda) = 0$, i.e.,

\[ -D q^T + D D^T \alpha^T - \lambda_0 \mathbf{1}_{n,1} - \lambda_1 + \lambda_2 = 0. \tag{7} \]

If we compute the singular value decomposition $D = U \Sigma V^T$ and plug it into (7), we obtain

\[ -U \Sigma V^T q^T + U \Sigma^2 U^T \alpha^T - \lambda_0 \mathbf{1}_{n,1} - \lambda_1 + \lambda_2 = 0, \]

which leads to

\[ \alpha_d = q V \Sigma^{-1} U^T + \left( \lambda_0 \mathbf{1}_{1,n} + \lambda_1^T - \lambda_2^T \right) U \Sigma^{-2} U^T. \tag{8} \]

Then, the dual form of our problem is

\[ \max_{\lambda} \; g(\lambda) = \mathcal{L}(\alpha_d, \lambda) \quad \text{subject to} \quad \lambda_1 \ge 0, \; \lambda_2 \ge 0. \tag{9} \]

At each iteration of the gradient projection method applied to the dual, we first compute $\nabla_\lambda g(\lambda)$, then find the Cauchy point in the direction of the gradient, and perform an inexact minimization in the subspace of the Lagrange multipliers that are not in the active set. This process repeats until the KKT conditions are satisfied, at which point the corresponding α_d is our optimal solution. For the subspace minimization, one can use the alternating direction method of multipliers (ADMM) on each of λ₀, λ₁, and λ₂.
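The map from the multipliers to the primal minimizer in equation (8) is mechanical; the short sketch below evaluates it via the thin SVD. It assumes D has full column rank, so that Σ⁻¹ and Σ⁻² are well defined; the function and variable names are ours.

```python
# Evaluate equation (8): the minimizer alpha_d of the Lagrangian for given
# multipliers (lam0 scalar; lam1, lam2 length-n vectors).
import numpy as np

def alpha_from_duals(D, q, lam0, lam1, lam2):
    U, s, Vt = np.linalg.svd(D, full_matrices=False)  # D = U diag(s) Vt
    n = D.shape[0]
    term1 = (q @ Vt.T) / s @ U.T                      # q V Sigma^{-1} U^T
    w = lam0 * np.ones(n) + lam1 - lam2               # lam0*1 + lam1 - lam2
    term2 = (w @ U) / (s ** 2) @ U.T                  # (...) U Sigma^{-2} U^T
    return term1 + term2
```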
Now we have all the pieces to formalize our sketching method in Algorithm 1. The method divides the dataset D into η pieces: D₁, D₂, ..., D_η. It initializes Φ with only the first piece of data, D₁, and solves equations (2)-(4) for D₁ using the gradient projection method described above. When it finds the optimal solution, α∗, it appends D₂ to Φ and solves the problem again, starting from the optimal solution of the previous step. We expect the solution from the previous step to be sparse, implying that its active set is relatively large. We also expect the new Cauchy point to differ little in the subspace of D₁, and many of the active constraints to remain active in that subspace.

This is the key benefit of sketching: computing the Cauchy point is inexpensive, and we have already excluded a considerable portion of D₁ from the expensive part of the computations. The benefit repeats at the next sketching step, because a considerable portion of D₂ will be in the active set and thereby excluded from the following subspace minimization.

In other words, the subset of points that we pick at the beginning provides a relatively good sketch of the entire convex hull w.r.t. q, and we gradually improve that sketch until we obtain the full picture of H. The sketching algorithm ends when we have included all pieces of D in Φ, at which point we have the optimal solution for D. Algorithm 1 formalizes this entire process.

Algorithm 1: Finding the point on the convex hull of a dataset, closest to a query point outside it
Inputs: dataset D, query point q, number of partitions η
Output: x∗, the point on the convex hull of D closest to q

Sort the rows of D based on their closeness to q
Partition D into η subsets, call each subset D_i (D₁ contains the points closest to q)
Initialize an empty matrix Φ
Initialize the optimal solution α∗ as an empty vector
for i = 1 to η do
    Append D_i to Φ
    Append zeros to vector α∗ for each sample in D_i
    while KKT conditions are not satisfied for objective function (2), subject to constraints (3)-(4), on Φ do
        Compute ∇_α f(α) and normalize it
        Compute the Cauchy point in the direction of ∇_α f(α)
        Approximately solve equation (2), subject to (3)-(4), in the subspace of inactive constraints, such that the constraints remain satisfied
    end while
end for
x∗ = α∗ Φ
return x∗
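A compact Python rendering of Algorithm 1 might look as follows. It reuses the projected_gradient_step sketch from earlier as a stand-in for the Cauchy-point and subspace-minimization inner loop, and replaces the KKT test with a fixed iteration budget, so it illustrates the control flow of the sketching procedure rather than the exact stopping rule.

```python
# Requires numpy and the projected_gradient_step sketch defined above.
import numpy as np

def sketch_closest_point(D, q, eta, inner_iters=500):
    """Illustrative rendering of Algorithm 1 (fixed budget replaces the KKT test)."""
    # Sort rows by closeness to q, so the first piece holds the nearest points.
    order = np.argsort(np.linalg.norm(D - q, axis=1))
    pieces = np.array_split(D[order], eta)            # the eta subsets D_i
    Phi = np.empty((0, D.shape[1]))                   # empty matrix Phi
    alpha = np.empty(0)                               # empty solution vector
    for piece in pieces:
        Phi = np.vstack([Phi, piece])                 # append D_i to Phi
        alpha = np.concatenate([alpha, np.zeros(len(piece))])
        if alpha.sum() == 0:                          # first piece: feasible start
            alpha[:] = 1.0 / len(alpha)
        for _ in range(inner_iters):                  # inner loop in place of KKT test
            alpha = projected_gradient_step(alpha, Phi, q)
    return alpha @ Phi                                # x* = alpha* Phi

# Example usage on random data (hypothetical dimensions):
# D = np.random.randn(10000, 50); q = np.full(50, 10.0)
# x_star = sketch_closest_point(D, q, eta=10)
```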
Acknowledgments

R.Y. thanks Daniel Robinson for his nonlinear optimization course. R.Y. was supported by a fellowship from the Department of Veterans Affairs. The views expressed in this manuscript are those of the author and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.

References
Avrim Blum, Sariel Har-Peled, and Benjamin Raichel. Sparse approximation via generating point sets. ACM Transactions on Algorithms, 15(3):1–16, 2019.

Jorge Nocedal and Stephen Wright. Numerical Optimization. Springer, New York, 2nd edition, 2006.

Roozbeh Yousefzadeh. Deep learning generalization and the convex hull of training sets. arXiv preprint arXiv:2101.09849, 2020.