Set Covering Problems
I. History
Set covering is a widely studied optimization problem and one of the first problems for which approximation algorithms were analyzed. A common approach to NP-hard problems is to apply approximation algorithms that run in polynomial time and deliver solutions that are close to optimal. In that case the quality of the approximation has to be proven, and this was done for set cover as well.
The set covering problem was one of the first problems shown to be NP-complete, in Karp's seminal paper. Soon afterwards, Johnson (1974) and Lovász (1975) demonstrated that the greedy algorithm is an H(d)-approximation algorithm for the set cover problem, where $H(d) = \sum_{i=1}^{d} 1/i$ is the d-th harmonic number and d is the size of the largest subset. This means that the solution of the greedy algorithm (the number of selected subsets) is at most H(d) times the optimum (the number of subsets selected by the optimal solution). Chvátal (1979) extended this result to weighted set cover, where a cost is assigned to the selection of each subset. Since H(d) ≤ 1 + ln d and d ≤ n, the ratio is bounded by 1 + ln n, where n is the size of the set to cover.
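A quick numerical check can make the bound H(d) ≤ 1 + ln d concrete. The sketch below is illustrative only; the helper name `harmonic` is an assumption introduced here and does not come from the cited papers.

```python
import math

def harmonic(d):
    """Return H(d) = 1 + 1/2 + ... + 1/d (hypothetical helper name)."""
    return sum(1.0 / i for i in range(1, d + 1))

for d in (1, 10, 100, 1000):
    # Print d, H(d) and the bound 1 + ln d side by side for comparison.
    print(d, round(harmonic(d), 3), round(1 + math.log(d), 3))
```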
Lund and Yannakakis (1994) showed hardness of approximation within a ratio of ≈ 0.72 ln n, and Feige (1998) proved that (1 − o(1)) ln n is the threshold below which set cover cannot be approximated efficiently. For the max k-cover problem the ratio of the greedy algorithm is $1 - (1 - 1/k)^k > 1 - 1/e \approx 0.632$, which means that the number of elements covered by the greedy algorithm divided by the number of elements covered by the optimal solution is at least 0.632 (Hochbaum, 1997). Feige (1998) showed that for any ε > 0, max k-cover cannot be approximated in polynomial time within a ratio of 1 − 1/e + ε, unless P = NP. These results show the effectiveness of the greedy algorithm for set cover and max k-cover.
II. Problem Introduction
2.1 Problem Definition:
Consider an m-row, n-column, zero-one matrix (aij), that is, a matrix whose elements are all either zero or one. If aij = 1 we say that column j covers row i; otherwise (aij = 0) column j does not cover row i. Let every column j of this matrix have an associated cost cj (> 0; j = 1,…,n). The set covering problem (SCP) is the problem of choosing a subset of the columns so as to cover all the rows of the matrix at minimum total cost. Mathematically:
Defining:
$x_j = 1$ if column $j$ is in the solution, $x_j = 0$ otherwise,
The SCP is:
minimize $\sum_{j=1}^{n} c_j x_j$   (1)
subject to $\sum_{j=1}^{n} a_{ij} x_j \ge 1$, $i = 1, \dots, m$   (2)
$x_j \in \{0, 1\}$, $j = 1, \dots, n$   (3)
Equation (2) ensures that each row is covered by at least one column chosen to be in the solution and (3) is the integrality constraint (a column is chosen or not).
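The definition can be made concrete on a tiny instance. The following is a minimal sketch, assuming the matrix is stored as a list of rows and columns are indexed from 0; the function names and the 4-row, 5-column data are hypothetical, and exhaustive enumeration is only practical for very small n.

```python
from itertools import combinations

def is_cover(a, cols):
    """True if the chosen columns cover every row of the 0-1 matrix a."""
    return all(any(row[j] for j in cols) for row in a)

def cover_cost(c, cols):
    """Total cost of the chosen columns."""
    return sum(c[j] for j in cols)

def brute_force_scp(a, c):
    """Enumerate every non-empty column subset and keep the cheapest cover."""
    n = len(c)
    best_cols, best_cost = None, float("inf")
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if is_cover(a, cols) and cover_cost(c, cols) < best_cost:
                best_cols, best_cost = cols, cover_cost(c, cols)
    return best_cols, best_cost

# Hypothetical instance: rows are constraints, columns are candidate sets.
A = [
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
]
C = [3, 2, 2, 3, 1]

best_cols, best_cost = brute_force_scp(A, C)
print(best_cols, best_cost)  # prints (1, 2) 4
```

Here columns 1 and 2 together cover all four rows at cost 2 + 2 = 4, and no cheaper subset is a cover.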
2.2 Different formulations of the set covering problem:
2.2.1 Probabilistic search
A probabilistic analysis of the minimum cardinality set covering problem (SCP) considers a stochastic model of the SCP with n variables and m constraints, in which the entries of the corresponding m × n incidence matrix are independent Bernoulli random variables, each with constant probability p of success.
The behavior of the optimal solution of the SCP is then investigated as both m and n grow asymptotically large, assuming either an incremental model for the evolution of the matrix (for each size, the matrix A is obtained by bordering a smaller matrix with new columns and rows) or an independent one (for each size, an entirely new set of entries for A is drawn).
Two results are obtained:
Functions of m are identified that represent a lower and an upper bound on n for the SCP to be almost everywhere (a.e.) feasible and not trivial.
For n lying within these bounds, an asymptotic formula for the optimum value of the SCP is derived and shown to hold.
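The stochastic model can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis of the cited work: the function names are hypothetical, and the closed form uses only the fact that a row is all-zero with probability (1 − p)^n and that rows are independent.

```python
import random

def random_incidence_matrix(m, n, p, rng):
    """m x n matrix whose entries are independent Bernoulli(p) draws."""
    return [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(m)]

def is_feasible(a):
    """An SCP instance is feasible exactly when every row contains a 1."""
    return all(any(row) for row in a)

def feasibility_probability(m, n, p):
    """Exact probability that a random instance has no all-zero row."""
    return (1.0 - (1.0 - p) ** n) ** m

def feasibility_rate(m, n, p, trials, seed=0):
    """Empirical fraction of feasible instances over a number of trials."""
    rng = random.Random(seed)
    hits = sum(is_feasible(random_incidence_matrix(m, n, p, rng))
               for _ in range(trials))
    return hits / trials

# Compare the empirical rate with the closed form on a small hypothetical size.
print(feasibility_rate(20, 10, 0.2, trials=2000), feasibility_probability(20, 10, 0.2))
```

For fixed p the closed form shows why n must grow with m for feasibility: with too few columns, some row is almost surely left uncovered.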
2.2.2 Greedy algorithm
The greedy heuristic starts with all variables set to zero. At each iteration, the (previously unselected) variable that appears with coefficient one in the largest number of unsatisfied constraints is selected and given a value of one. The process terminates when all constraints are satisfied.
A formal statement of the heuristic now follows:
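Step 0 (Input and initialization):
Input A = (aij). Set every variable to zero, mark every constraint as unsatisfied, and let H be empty.
Step 1 (Select a variable and update constraints):
Select the previously unselected variable that appears with coefficient one in the largest number of unsatisfied constraints. Set it to one, add its index to H, and mark the constraints it covers as satisfied.
Step 2 (Test for termination):
If any constraint remains unsatisfied, go to Step 1. Otherwise, output H, the index set of variables with value one, and Zg, the cost of the heuristic solution. Terminate.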
Besides the theoretical performance guarantee already described, the greedy heuristic has the property of producing consistently good solutions to large randomly generated versions of the problem.
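The steps of the heuristic can be sketched as follows for the minimum cardinality case. This is a minimal illustration, assuming the matrix is a list of rows, ties are broken by the lowest column index, and Zg is simply the number of selected variables; the function name and the example matrix are hypothetical.

```python
def greedy_cover(a):
    """Greedy heuristic for the minimum cardinality SCP; returns (H, Zg)."""
    m, n = len(a), len(a[0])
    uncovered = set(range(m))   # Step 0: every constraint starts unsatisfied
    chosen = []                 # H, the index set of variables set to one
    while uncovered:            # Step 2: loop until all constraints are satisfied
        best_j, best_gain = None, 0
        for j in range(n):      # Step 1: find the column hitting most uncovered rows
            if j in chosen:
                continue
            gain = sum(1 for i in uncovered if a[i][j])
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            raise ValueError("infeasible instance: some row cannot be covered")
        chosen.append(best_j)
        uncovered -= {i for i in uncovered if a[i][best_j]}
    return chosen, len(chosen)

# Hypothetical 5-row, 4-column instance.
A = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
H, Zg = greedy_cover(A)
print(H, Zg)  # prints [2, 0, 3] 3
```

Column 2 is taken first because it covers three rows; columns 0 and 3 then each cover one of the two remaining rows.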
2.2.3 Lagrangian heuristic
A more effective family of heuristics is based on the Lagrangian relaxation of the SCP. The use of Lagrangian relaxation for solving the set covering problem was originally proposed by Balas and Ho, who devised an exact branch-and-bound algorithm. They also proposed exploiting the reduced cost information obtained by the subgradient algorithm to produce a feasible solution of the SCP.
Subsequently, Fisher and Kedia proposed an efficient continuous dual heuristic in the more general case of mixed set-covering/set-partitioning constraints. Finally, Beasley proposed to perform reduced cost fixing and computation of a feasible solution at each iteration of the subgradient algorithm.
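To illustrate the underlying idea, relaxing the covering constraints with multipliers u_i ≥ 0 gives the lower bound L(u) = Σ_i u_i + Σ_j min(0, c_j − Σ_i u_i a_ij). The sketch below computes this bound and improves it with a plain subgradient update. It is not the procedure of Balas and Ho, Fisher and Kedia, or Beasley: reduced cost fixing and the primal heuristic are omitted, the step-size rule and starting multipliers are common textbook choices assumed here, and all names and data are hypothetical.

```python
def lagrangian_bound(a, c, u):
    """Return (L(u), x) where x solves the relaxed problem for multipliers u >= 0."""
    m, n = len(a), len(c)
    # Reduced cost of column j: c_j minus the multipliers of the rows it covers.
    reduced = [c[j] - sum(u[i] * a[i][j] for i in range(m)) for j in range(n)]
    x = [1 if r < 0 else 0 for r in reduced]
    value = sum(u) + sum(r for r in reduced if r < 0)
    return value, x

def subgradient_bound(a, c, upper_bound, iterations=100, step_scale=1.0):
    """Best Lagrangian lower bound found; assumes every row can be covered."""
    m, n = len(a), len(c)
    # Assumed start: cheapest column covering each row.
    u = [min(c[j] for j in range(n) if a[i][j]) for i in range(m)]
    best = float("-inf")
    for _ in range(iterations):
        value, x = lagrangian_bound(a, c, u)
        best = max(best, value)
        # Subgradient: violation of each relaxed covering constraint.
        g = [1 - sum(a[i][j] * x[j] for j in range(n)) for i in range(m)]
        norm = sum(gi * gi for gi in g)
        if norm == 0:
            break
        step = step_scale * (upper_bound - value) / norm
        u = [max(0.0, u[i] + step * g[i]) for i in range(m)]
    return best

# Hypothetical instance whose optimal cost is 4 (columns 1 and 2).
A = [
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
]
C = [3, 2, 2, 3, 1]
bound = subgradient_bound(A, C, upper_bound=4)
print(bound)
```

By weak duality every value of L(u) is a valid lower bound on the optimal cost, so `bound` cannot exceed 4 here apart from floating-point rounding.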
2.2.4 Genetic Algorithm
Genetic Algorithms (GAs) are search and optimization algorithms based on the mechanics of natural selection and natural genetics. To apply a GA to a particular problem, one first needs to select an internal string representation for the solution space. Set covering problems have a highly desirable string representation, namely binary strings of length N in which the j-th bit represents whether the j-th set Pj is in the cover or not.
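The binary string representation can be sketched as a list of bits together with two standard operators. This is only an illustration of the encoding, assuming one-point crossover and bit-flip mutation; it is not a complete GA and not the operator set of any particular paper, and all names and data are hypothetical.

```python
import random

def cover_cost(bits, c):
    """Cost of the sets whose bit is 1."""
    return sum(cj for bit, cj in zip(bits, c) if bit)

def is_cover(bits, a):
    """True if the selected sets (columns of a) cover every row."""
    return all(any(aij and bit for aij, bit in zip(row, bits)) for row in a)

def one_point_crossover(parent1, parent2, point):
    """Child takes the first `point` bits from parent1 and the rest from parent2."""
    return parent1[:point] + parent2[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

# Hypothetical instance with N = 4 sets and 3 elements to cover.
A = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
C = [3, 2, 2, 3]

parent1 = [1, 1, 0, 0]   # a cover of cost 5
parent2 = [0, 0, 1, 1]   # not a cover: the second element is missed
child = one_point_crossover(parent1, parent2, 1)
print(child, is_cover(child, A), cover_cost(child, C))  # prints [1, 0, 1, 1] True 8
```

Because crossover and mutation can produce strings that are not covers, a practical GA also needs a repair step or a penalty term, which is left out of this sketch.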
III. Applications
3.1 Problem Description:
The set covering problem has many diverse applications to problems arising in crew scheduling, facility location and other business areas.
• Crew scheduling
The transportation demand is represented by a timetable that fixes the trips which mass transit companies have to offer using the available crews and vehicles. Mass transit companies must therefore decide both the sequence of trips that each vehicle carries out (Vehicle Scheduling Problem or VSP) and the crews' work shifts (Crew Scheduling Problem or CSP).
The locations of the relief points (i.e. the places where one crew can relieve another) are assumed to be known, as is the presence of a single central depot where crews and vehicles start and end their activities.
The optimal subset of feasible round trips that "covers" each trip at least once is selected by minimizing the total number of work periods, assuming that the cost of each round trip is given by the number of work periods it requires. This problem may be formulated as a set covering problem.
• Location Set Covering Problems
The Location Set Covering Problem was originally stated by Toregas et al. (1970), who used relaxed linear programming (LP) to solve the problem.
This problem seeks to locate the least number of facilities such that each demand node has at least one facility sited at a node within a specified maximum distance or time.
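The Location Set Covering Problem (LSCP) is stated as a linear zero-one program as:
minimize $\sum_{j} x_j$   (1)
subject to $\sum_{j \in N_i} x_j \ge 1$ for every demand node $i$   (2)
$x_j \in \{0, 1\}$ for every candidate site $j$
where $x_j = 1$ if a facility is sited at node $j$ and $0$ otherwise, and $N_i = \{ j : d_{ij} \le S \}$ is the set of sites within distance or time $S$ of demand node $i$.
The objective (1) minimizes the number of facilities needed to cover every demand node by at least one facility. The constraints (2) ensure that all demand nodes i are “covered,” i.e., each has at least one facility within S distance or time units of it. The variables are required to take values of zero or one.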
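A small sketch can show how the covering sets arise from a distance table and a service standard S. It assumes a demand-by-site distance matrix and solves the tiny instance by enumeration rather than by the relaxed LP used by Toregas et al.; the function names and data are hypothetical.

```python
from itertools import combinations

def coverage_sets(dist, s):
    """For each demand node i, the set of sites j with dist[i][j] <= s."""
    return [{j for j, d in enumerate(row) if d <= s} for row in dist]

def solve_lscp(dist, s):
    """Smallest set of sites covering every demand node, or None if impossible."""
    n_i = coverage_sets(dist, s)
    n_sites = len(dist[0])
    for k in range(1, n_sites + 1):          # try 1 facility, then 2, ...
        for sites in combinations(range(n_sites), k):
            chosen = set(sites)
            if all(chosen & covering for covering in n_i):
                return sites
    return None

# Hypothetical distances from 4 demand nodes (rows) to 3 candidate sites (columns).
D = [
    [2, 9, 7],
    [4, 3, 8],
    [9, 2, 5],
    [8, 6, 1],
]
print(coverage_sets(D, 5))  # prints [{0}, {0, 1}, {1, 2}, {2}]
print(solve_lscp(D, 5))     # prints (0, 2)
```

With S = 5 the first demand node can only be served from site 0 and the last only from site 2, so two facilities are needed. Tightening the standard to S = 1 leaves some node with an empty covering set, and the function returns None.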
• Traffic checks
Link selection for traffic checks can be divided into two separate parts. The first part calculates the traffic volume of the links, in other words performs the traffic assignment, and the second part selects the required links.
The result of the traffic assignment can be considered as a collection of weighted paths in a network. The weights are given by the elements of O–D matrices. In this case the link selection can be stated explicitly:
1. Input: A collection of paths in a graph with certain weights;
2. Output: A subset of links (L) of minimal size such that each path contains at least one edge from L, or such that the sum of the weights of the paths that contain at least one edge from L is maximized.
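The second variant of the output (maximize the weight of the paths that are hit) corresponds to a weighted max k-cover problem in which each link "covers" the paths that use it. The sketch below applies the greedy rule to it, assuming a budget of k links, paths given as lists of directed links, and ties broken by sorted link order; the function name and the data are hypothetical, and the result is a heuristic solution, not a guaranteed optimum.

```python
def greedy_link_selection(paths, weights, k):
    """Pick up to k links, each time the one hitting the most not-yet-hit path weight."""
    links = sorted({link for path in paths for link in path})
    hit = set()       # indices of paths already containing a chosen link
    chosen = []
    for _ in range(k):
        best, best_gain = None, 0
        for link in links:
            if link in chosen:
                continue
            gain = sum(weights[i] for i, path in enumerate(paths)
                       if i not in hit and link in path)
            if gain > best_gain:
                best, best_gain = link, gain
        if best is None:   # every path is already hit
            break
        chosen.append(best)
        hit |= {i for i, path in enumerate(paths) if best in path}
    return chosen, sum(weights[i] for i in hit)

# Hypothetical assignment result: four paths with O-D weights.
PATHS = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("b", "d")],
    [("e", "b"), ("b", "c")],
    [("e", "f")],
]
WEIGHTS = [10, 5, 7, 2]

print(greedy_link_selection(PATHS, WEIGHTS, 1))  # prints ([('b', 'c')], 17)
print(greedy_link_selection(PATHS, WEIGHTS, 2))  # prints ([('b', 'c'), ('a', 'b')], 22)
```

Raising k until every path is hit gives a heuristic answer to the first variant as well, since the chosen links then intersect all paths.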
3.2 Literature application:
Application 1
Source:
Ula, N., Nouh, A., 1980. A gradient search algorithm for set covering problems. Decision and control including the symposium on adaptive processes, 19, 47-48
Description:
A ten-variable, six-constraint set covering problem is solved on a digital computer with the following parameters:
The values of x1,...,x10 at some iterative steps are listed below and Figure 2 shows the locus of x1, ..., x10 at various k.
Application 2
Source:
Hall, N.G., Vohra, R.V., 1989. Absolute bounds on optimal cost for a class of set covering problems. Mathematical methods of operations research, 33(3), 181-192.
Description:
The constraint matrix is shown in tableau form, and the parameters of the problem are m=6, n=6, k=3, d=4.
Application 3
Source:
Christofides, N., Paixao, J. 1993. Algorithms for large scale set covering problems. Annals of operations research, 45(3), 259-277.
Description:
A method for finding lower bounds to the SCP combining decomposition and state space relaxation is now described. Large size problems are decomposed into many small sub-problems to which state space relaxation is applied. Reduced costs are obtained from the combination of state space relaxation and decomposition, making possible further reductions in the number of variables. Subgradient optimization is used to update the costs in a Lagrangian fashion.
Application 4
Source:
Pezzella, F., Faggioli, E., 1997. Solving large set covering problems for crew scheduling. TOP, 5(1), 41-59.
Description:
Crew scheduling of Italian Railways Company
In these problems, each column stands for a train driver work shift (feasible round trip) which is composed, according to trade union agreements, of at most 12 trips. The cost of each round trip is 1 if it requires one work-day to be executed; otherwise it is 2, because the corresponding train driver work shift consists of two days separated by a rest in a location different from the driver's residence. Given the trips and all the feasible round trips of a railway district, the problem is to find the subset of feasible round trips that covers each trip in the timetable at least once, with the minimum number of driver work periods.
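The structure of such an instance, with columns of cost 1 or 2, can be illustrated with the weighted greedy rule analyzed by Chvátal: repeatedly take the round trip with the lowest cost per newly covered trip. This is a sketch on invented data and is not the solution method of Pezzella and Faggioli; the function name, trip labels and round trips are hypothetical.

```python
def weighted_greedy(trips, round_trips, costs):
    """Weighted greedy cover: returns (chosen column indices, total work periods)."""
    uncovered = set(trips)
    chosen, total = [], 0
    while uncovered:
        best, best_ratio = None, float("inf")
        for j, round_trip in enumerate(round_trips):
            gain = len(uncovered & round_trip)   # trips newly covered by column j
            if gain and costs[j] / gain < best_ratio:
                best, best_ratio = j, costs[j] / gain
        if best is None:
            raise ValueError("some trip is in no feasible round trip")
        chosen.append(best)
        total += costs[best]
        uncovered -= round_trips[best]
    return chosen, total

# Hypothetical district: five trips and four feasible round trips.
TRIPS = {"T1", "T2", "T3", "T4", "T5"}
ROUND_TRIPS = [
    {"T1", "T2"},              # one work-day
    {"T3", "T4", "T5"},        # one work-day
    {"T1", "T2", "T3", "T4"},  # two work-days with an away rest
    {"T5"},                    # one work-day
]
COSTS = [1, 1, 2, 1]

print(weighted_greedy(TRIPS, ROUND_TRIPS, COSTS))  # prints ([1, 0], 2)
```

The two one-day shifts 1 and 0 cover all five trips with two work periods, whereas using the two-day shift would need three.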
Application 5
Source:
Sanchez-Garcia, M., Sobron, M.I., Vitoriano, B., 1998. On the set covering polytope: Facets with coefficients in {0, 1, 2, 3}, Annals of operations research, 81, 343-356.
Description:
Balas and Ng characterized the class of valid inequalities for the set covering polytope with coefficients equal to 0, 1 or 2, and gave necessary and sufficient conditions for such an inequality to be facet defining. The paper extends this study, characterizing the class of valid inequalities with coefficients equal to 0, 1, 2 or 3, and giving necessary and sufficient conditions for such an inequality to be non-dominated and to be facet defining.
IV. References
[1] Beasley, J.E., 1990. OR-library: distributing test problems by electronic mail. The journal of the operational research society, 41(11), 1069-1072.
[2] Beasley, J.E., 1997. The set covering problem. Publishing ltd and oxford university press.
[3] Beasley, J.E., Chu, P.C., 1996. A genetic algorithm for the set covering problem. European journal of operational research, 94(2), 392-404.
[4] Beasley, J.E., Jornsten, K., 1992. Enhancing an algorithm for set covering problems. European journal of operational research, 58(2), 293-300.
[5] Caprara, A., Fischetti, M., Toth, P., 1999. A heuristic method for the set covering problem. Operations research, 47(5), 730-743.
[6] Caprara, A., Fischetti, M., Toth, P., 2000. Algorithms for the set covering problem. Annals of operations research, 98, 353-371.
[7] Etcheberry, J., 1977. The set-covering problem: a new implicit enumeration algorithm. Operations research, 25(5), 760-772.
[8] Chang, E., 1993. Neural computing for minimum set covering and gate-packing problems.
[9] Feo, T.A., Resende, M.G.C., 1989. A probabilistic heuristic for a computationally difficult set covering problem. Operations research letters, 8(2), 67-71.
[10] Hall, N.G., Vohra, R.V., 1989. Absolute bounds on optimal cost for a class of set covering problems. Mathematical methods of operations research, 33(3), 181-192.
[11] Haddadi, S., 1997. Simple Lagrangian heuristic for the set covering problem. European journal of operational research, 97(1), 200-204.
[12] Huang, W.C., Kao, C.Y., Horng, J.T., 1994. A genetic algorithm approach for set covering problems. Evolutionary computation, 2, 569-574.
[13] Pezzella, F., Faggioli, E., 1997. Solving large set covering problems for crew scheduling. TOP, 5(1), 41-59.