Abstract

This paper gives a more precise estimate of the complexity of the greedy algorithm for a special case of the set cover problem.

Greedy Set Cover Estimations

H. Aslanyan

Yerevan State University, 1 A. Manoukyan, 375049, Yerevan, Armenia

Introduction

The greedy heuristic is one of the most widely used approaches to optimization problems. The general approach is as follows: at each step, a procedure that minimizes (or maximizes) the local increase of the objective function is applied. In some cases the greedy strategy guarantees an optimal solution (minimum spanning trees, shortest paths, etc.); in others it provides acceptable approximations (e.g. disjunctive forms and tests). Typically, greedy algorithms use simple structures that require minimal computational resources, namely time and memory.

Consider the set cover problem. Given are a finite set $A = \{a_1, \dots, a_n\}$ and a family $F = \{A_1, \dots, A_m\}$ of subsets of $A$ such that every element of $A$ belongs to at least one subset from $F$, that is, $F$ covers $A$. The problem is to find a collection $C \subseteq F$ of minimal size that covers $A$. The set cover problem is one of the most typical NP-complete problems. It has been proven that there is no constant-factor approximation to this problem (unless P=NP) [2].

The problem can be represented in terms of (0,1)-matrices. The elements of $A$ correspond to the columns, and each subset from $F$ corresponds to a row of the matrix. The problem is to find the minimal number of rows that together contain at least one “1” in each column.

Reasonable greedy approximation algorithms are known for this problem. Consider the following greedy set cover algorithm: at the first step the algorithm selects the row that contains the maximal number of “1”s (that is, covers the maximal number of elements of $A$). At each subsequent step the row that covers the greatest number of still uncovered elements is selected. Clearly, continuing this process (at most until the last row is selected) covers all elements, and this may happen earlier. We consider the scheme of [1], in which a part of the elements of $A$ is covered by greedy steps, and then each remaining uncovered element is covered by taking some new row that has a “1” in that column.
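The two-phase scheme just described can be sketched in Python as follows. This is only an illustrative sketch, not the procedure as stated in [1]: the function name `two_phase_cover`, the parameter `greedy_steps`, and the tie-breaking rule (the first row with the maximal count wins) are assumptions made here. It also assumes that every column contains at least one “1”.

```python
from typing import List, Set


def two_phase_cover(matrix: List[List[int]], greedy_steps: int) -> List[int]:
    """Return indices of rows that together cover every column of a 0/1 matrix.

    Phase 1: at most `greedy_steps` greedy steps, each taking the row that
             covers the most still-uncovered columns.
    Phase 2: for each column that is still uncovered, take some row with a 1 there.
    Assumption: every column contains at least one 1.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    uncovered: Set[int] = set(range(n_cols))
    chosen: List[int] = []

    # Phase 1: greedy steps (ties are broken by the lowest row index).
    for _ in range(greedy_steps):
        if not uncovered:
            break
        best = max(range(n_rows),
                   key=lambda r: sum(matrix[r][c] for c in uncovered))
        chosen.append(best)
        uncovered -= {c for c in uncovered if matrix[best][c]}

    # Phase 2: cover the remaining columns one by one.
    # A row with a 1 in an uncovered column cannot have been chosen before.
    while uncovered:
        col = min(uncovered)
        row = next(r for r in range(n_rows) if matrix[r][col])
        chosen.append(row)
        uncovered -= {c for c in uncovered if matrix[row][c]}

    return chosen


if __name__ == "__main__":
    # Hypothetical 4 x 6 example: rows are subsets, columns are elements.
    example = [
        [1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 0],
        [1, 0, 0, 1, 0, 1],
        [0, 1, 0, 0, 1, 1],
    ]
    print(two_phase_cover(example, greedy_steps=1))
```

In the second phase a row taken for one column may also cover other uncovered columns, so the sketch never selects more rows than the count of one new row per uncovered element.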
The problem is to estimate the total number of selected rows (the size of the cover). We consider the estimate given for a special case in [1] and give a more precise formula for this case.

Formula Improvement

Consider a (0,1)-matrix of size $m \times n$ ($m$ is the number of rows) in which each column contains at least $\gamma m$ “1”s (the special case). The problem is to find the minimal number of rows that together contain at least one “1” in each column. The number of “1”s in the whole matrix is then not less than $\gamma m n$, so there is a row with at least $\gamma n$ “1”s. At the first step the greedy algorithm selects the row with the maximal number of “1”s, so the selected row covers at least $\gamma n$ elements. Suppose $k$ such steps have been made, and let the number of still uncovered elements not exceed $\delta_k n$. The estimate

$$\delta_{k+1} n \le \delta_k n (1-\gamma)$$

is the main related result, obtained in [1], Part 3, Chapter 3.5, pp. 136-137.

The table below outlines the part of the rows selected during the first $k$ greedy steps. The shaded columns contain no “1”s in these rows, and hence the corresponding elements are not yet covered. In the layout below the shaded block is shown with 0 entries.

$$
\begin{array}{c|ccc|ccc}
 & a_1 & \cdots & a_{(1-\delta_k)n} & a_{(1-\delta_k)n+1} & \cdots & a_n \\ \hline
A_1 & & & & 0 & \cdots & 0 \\
\vdots & & & & \vdots & & \vdots \\
A_k & & & & 0 & \cdots & 0 \\ \hline
A_{k+1} & & & & & & \\
\vdots & & & & & & \\
A_m & & & & & &
\end{array}
$$

The total number of “1”s in the rows $A_{k+1}$ to $A_m$ within the uncovered columns is at least $\gamma m \delta_k n$. The improvement takes into account that these “1”s cannot be situated in the shaded part of the table. Hence, among the subsets $A_{k+1}, \dots, A_m$ there exists one with at least $\frac{\gamma m \delta_k n}{m-k}$ “1”s in the uncovered columns. Therefore,

$$\delta_{k+1} n \le \delta_k n - \frac{\gamma m \delta_k n}{m-k} = \delta_k n \left(1 - \frac{\gamma m}{m-k}\right) = \delta_k n \, \frac{(1-\gamma)(m-k) - \gamma k}{m-k} = \delta_k n (1-\gamma)\left(1 - \frac{\gamma}{1-\gamma}\cdot\frac{k}{m-k}\right).$$

Applying this relation recursively, starting from $\delta_0 = 1$, we get

$$\delta_{k+1} \le \prod_{i=0}^{k}\left(1 - \frac{\gamma m}{m-i}\right) = (1-\gamma)^{k+1} \prod_{i=1}^{k}\left(1 - \frac{\gamma}{1-\gamma}\cdot\frac{i}{m-i}\right),$$

which is the proposed improvement of the set cover complexity estimate in the special case of the problem described above. The additional coefficient is easy to estimate using the following inequalities:

$$\left(1 - \frac{x}{y}\right)^{x-1} \le \prod_{i=1}^{x-1}\left(1 - \frac{i}{y}\right) \le \left(1 - \frac{x}{2y}\right)^{x-1}$$

for positive integers $x \le y$ ($\gamma$ is assumed to be adjusted so that this holds). As a result, asymptotically, as $n \to \infty$ and $m \to \infty$, the additional term is close to 1 when $\gamma \to 0$, but it is $\ll 1$ otherwise, which gives a noticeable improvement of the set cover estimate.

Bibliography

Дискретная математика и математические вопросы кибернетики [Discrete Mathematics and Mathematical Problems of Cybernetics], vol. 1, Moscow, Nauka, 2.
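As a numerical illustration of the Formula Improvement section, the following Python sketch compares the two bounds on the uncovered fraction after a given number of greedy steps: the bound $(1-\gamma)^{s}$ that follows from the estimate of [1], and the product of the factors $1 - \gamma m/(m-i)$ that follows from the improved recurrence. The function names, the parameter `steps`, and the sample values of $\gamma$ and $m$ are assumptions chosen here for illustration only. Clamping a factor at 0 is also an addition of this sketch: it reflects that once fewer than $\gamma m$ rows remain unselected, no column can still be uncovered.

```python
def classical_bound(gamma: float, steps: int) -> float:
    """Bound (1 - gamma)**steps on the uncovered fraction after `steps` greedy steps."""
    return (1.0 - gamma) ** steps


def improved_bound(gamma: float, m: int, steps: int) -> float:
    """Product of the factors 1 - gamma*m/(m - i) for i = 0, ..., steps - 1.

    Each factor is clamped at 0 (an assumption of this sketch): if at most
    gamma*m rows remain unselected, the bound on the uncovered fraction is 0.
    """
    bound = 1.0
    for i in range(steps):
        remaining_rows = m - i
        if remaining_rows <= 0:
            return 0.0
        bound *= max(0.0, 1.0 - gamma * m / remaining_rows)
    return bound


if __name__ == "__main__":
    gamma, m = 0.5, 10  # hypothetical sample values
    for steps in range(1, 7):
        print(steps, classical_bound(gamma, steps), improved_bound(gamma, m, steps))
```

For one step the two bounds coincide, since the factor for $i = 0$ is exactly $1-\gamma$. For more steps the improved bound is the smaller of the two, because each later factor is at most $1-\gamma$.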