Implementing the Flow-Covering Location-Allocation Model with Geographic Information Systems

Daniel Turner

Master’s Project

Submitted in partial satisfaction of the requirements for the degree of

Master in Geographic Information Sciences

Introduction

Despite burgeoning commercial and cross-disciplinary academic interest, as well as the relatively long existence of a transportation-oriented sub-discipline, GIS-T, network analysis capabilities remain relatively under-developed in GIS (Miller 1999). Out-of-the-box network analysis extensions in ArcGIS 9.1, for example, are largely limited to elementary utility network modeling and shortest path calculations. The reasons for this are manifold, but major obstacles certainly include the difficulty of collecting data (such as precise traffic flow data) and the computational resources required to produce analytic results.[1] Network location problems, for example, face serious issues in how to aggregate demand, how to identify feasible sites, the viability of the different solution methodologies, and devising the data structures necessary to efficiently facilitate a range of different potential applications (Church and Sorenson 1994). This project addresses the latter two problems: solution methodologies and, to a lesser degree, data structures. It implements a flow-covering location-allocation model (FCLM) in ArcGIS 9.1 using variants of a greedy heuristic which are known to be efficient for FCLM problems.

The FCLM locates a set of facilities at nodes on a network such that the maximum amount of flow moving across that network is “covered” by a facility or, put another way, moves through a node with a facility. Practical applications of the FCLM include retail store and billboard placement, evaluation of communication network robustness, evaluation of hazardous waste transportation risk, location of speed traps, placement of medical facilities across evacuation routes, and location of pipeline monitoring stations (Goodchild and Noronha 1987; Hodgson 1990; Hodgson, Rosing et al. 1996). A related set of models, network flow interdiction models, has a long history of use in the military as a means of choosing targets for reducing the capacity of transportation and communication networks (Cormican, Morton et al. 1998). Another related set of models are flow interception models, which attempt to maximally cover flows as close to their inception point as possible (Berman, Hodgson et al. 1995).

The FCLM is considered a special case of the maximal covering location model (MCLM), to which it is structurally similar. The MCLM locates a set of facilities that can cover or satisfy demand on a network within a certain “service distance” such that the maximum amount of demand is satisfied. Both models fall within the class of complexity problems known as NP (Non-deterministic Polynomial-time); they can be solved using linear programming techniques, but require prohibitive amounts of processing time to solve optimally with all but the smallest data sets. The MCLM is known to be NP-hard, so the same can be expected of the FCLM (Hodgson 1990). Selection of a heuristic procedure which can quickly find a good (though not necessarily optimal) solution is therefore of particular importance when considering the application of GIS technology to flow covering problems.

This paper will address these issues by discussing the FCLM in more detail, then providing a literature review of the formulations of various covering problems, the applications of those problems, and the solution methods and solution difficulty for real-world problem instances. This is followed by the presentation of a dataset to which both optimal and heuristic solution procedures are applied. A description of the ArcGIS 9.1 implementation is provided. Conclusions and a discussion of possible future research avenues follow.

Research Goals

This project implements the FCLM in an add-on application for ArcGIS 9.1. It can utilize one of two variants of a greedy algorithm to obtain solutions. While such heuristics can be expected to produce sub-optimal solutions, they can be calculated relatively quickly and can often produce solutions that are “good enough” for many applications. The solutions calculated using the tool developed here are compared with optimal results obtained from ILOG CPLEX, an industry-standard linear programming application, in order to evaluate the effectiveness of the heuristics and the implementation.

The FCLM presents some unique problems compared to the better-known MCLM. The MCLM locates systems of facilities based on the demand that can be allocated to a particular configuration of locations, and in the MCLM and most models like it, demand is expressed as a weight at the nodes. However, this is not always the case. The FCLM addresses location-allocation problems where demand is expressed as flows. From an implementation standpoint, this requires some tailoring: since a flow can run through multiple nodes, the application must keep track of which flows run through which nodes.
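As a minimal sketch of this bookkeeping (the flows, paths, and volumes below are hypothetical; the actual project reads them from the ArcGIS network), an inverted index from nodes to the flows passing through them lets the coverage of any candidate facility site be looked up directly:

```python
from collections import defaultdict

# Hypothetical example: each OD flow carries the node sequence of its
# shortest path and a flow volume. The model needs a node -> flows index
# so the flow captured by siting a facility at a node can be evaluated.
flows = {
    ("A", "D"): {"path": ["A", "B", "C", "D"], "volume": 120},
    ("A", "E"): {"path": ["A", "B", "E"], "volume": 80},
    ("C", "E"): {"path": ["C", "B", "E"], "volume": 50},
}

flows_through_node = defaultdict(set)
for od, info in flows.items():
    for node in info["path"]:
        flows_through_node[node].add(od)

# A facility at node B captures every flow whose path includes B.
captured = flows_through_node["B"]
total = sum(flows[od]["volume"] for od in captured)
```

The same index also makes cannibalization visible: two candidate sites on the same path would both claim the same flow.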

Literature Review

The literature surrounding the FCLM that pertains to this research falls broadly into three categories: 1) the formulation of the FCLM itself and its predecessor, the MCLM, 2) solution procedures for the FCLM and MCLM, and 3) the implementation of combinatorially complex location models in GIS.

Formulations of the FCLM and MCLM

Church and ReVelle (Church and ReVelle 1974) present their seminal formulation of the MCLM, a model that locates a fixed number of facilities with a defined service distance for covering the maximum population (either with or without a minimum service distance constraint). A greedy adding and a greedy adding with substitution algorithm are considered and compared to an optimal linear programming solution.

Hodgson (Hodgson 1990) defined his formulation of the FCLM as a special case of the MCLM. A basic linear programming algorithm and cannibalizing and non-cannibalizing greedy heuristics are considered, with the non-cannibalizing heuristic noted as highly efficient. The article also points out how significant the issue of flow cannibalization is to the application of the model, as in some cases redundant flow capturing is beneficial.

Berman and Krass (Berman and Krass 2002) present a generalization of the maximal cover location problem, where degrees of coverage can be designated based on other values such as distance to the nearest facility. This paper marks an effort to formulate a more generalized version of the MCLM.

Church and ReVelle (Church and ReVelle 1976) show that the MCLM can be structured and solved as a p-median problem. They demonstrate how solution techniques for p-median problems can be applied to MCLM problems by essentially editing the distance matrix used in the formulation. They argue this correspondence between the two problems suggests that all location problems can, in fact, be structured as p-median problems.

Chung (Chung 1986) discusses the viability of some non-traditional applications of the MCLM outside of location problems. His proposals include data abstraction, cluster identification and analysis, and quantitative classification/categorization.

Solution Procedures

Adenso-Diaz and Rodriguez (Adenso-Diaz and Rodriguez 1997) describe the use of the tabu search heuristic in locating facilities under differing objective functions. They conclude the heuristic is effective even with small tabu tenures (the number of iterations a move remains forbidden, which helps the hill-climbing search avoid becoming trapped in local optima). While the results are not optimal, they conclude that the low computing time of tabu search validates use of the heuristic.

Berman and Krass (Berman and Krass 2002) evaluate a number of integer programming heuristics for location problems, demonstrating that heuristic methods usually produce optimal or near-optimal solutions, especially as the number of selected facilities increases.

Current and Schilling (Current and Schilling 1990) discuss the difficulties associated with aggregating demand data in p-median problems. Models like the MCLM are shown to be sensitive to small changes in source-demand distances, so aggregation processes can introduce significant error into the data and solution. They propose measures for mitigating these aggregation problems and, particularly, describe a procedure for aggregating demand data to eliminate the loss of locational information while minimizing solution time.

Implementing Location Problems in GIS

Church and Sorenson (Church and Sorenson 1994) discuss the problems of integrating location models into GIS. They point out that these problems fall into four major categories: how to aggregate demand, how to identify feasible sites, the viability of different solution methodologies, and the data structures necessary to facilitate a range of different potential applications. They focus, however, primarily on an evaluation of heuristic methods, concluding that the Teitz and Bart vertex substitution heuristic (Teitz and Bart 1968) and the GRIA heuristic (Densham and Rushton 1992) are the best candidates.

Hodgson, Rosing, and Zhang (Hodgson, Rosing et al. 1996) consider a practical application of the FCLM in locating vehicle inspection stations on a transportation network. They differentiate a punitive inspection approach (which would use the MCLM) from a preventive inspection approach (the FCLM). They implement the FCLM using a mixed integer programming method and compare it to a greedy heuristic. The greedy heuristic is found to be insufficiently robust for the example, so other heuristics are suggested, including a vertex substitution heuristic (Teitz and Bart 1968).

Pirkul and Schilling (Pirkul and Schilling 1991) propose a capacitated maximal covering location problem in which workload limit constraints are placed on facilities in the model. They demonstrate that such a constraint can significantly affect location assignment, and develop a solution procedure for the variant problem based on Lagrangian relaxation.

ReVelle, Schweitzer, and Snyder (ReVelle, Schweitzer et al. 1996) present three variations of covering location problems: two variations of a Maximal Conditional Covering Problem (MCCP I and II) and the Multi-objective Conditional Covering Problem (MOCCP). MCCP I and II are covering problems where a goal of the formulation is to maximize the number of facilities with supporting coverage subject to a constraint on the number of facilities. MCCP I does not allow more than one facility to be sited on a node, whereas MCCP II allows multiple facilities at the same node. MOCCP calculates an optimum compromise between primary and supporting coverage.

Maximal Covering Location Model

The MCLM from which the FCLM is derived is defined as follows (Church and ReVelle 1974):

Maximize z = \sum_{i \in I} a_i y_i (that is, satisfy as much demand as possible)

subject to:

\sum_{j \in N_i} x_j \ge y_i for all i \in I (that is, demand is satisfied at location i if there is at least one facility at a location j in the set of locations N_i)

\sum_{j \in J} x_j = P (that is, exactly P facilities are located)

x_j \in \{0, 1\} for all j \in J

y_i \in \{0, 1\} for all i \in I

Where:

I = the set of demand nodes;

J = the set of facility sites;

S = the distance beyond which a demand point is considered “uncovered” (the value of S can be chosen differently for each demand point if desired);

d_{ij} = the shortest distance from node i to node j;

x_j = a binary variable,

= 1 if a facility is allocated to site j,

= 0 otherwise;

N_i = \{ j \in J \mid d_{ij} \le S \} (the set of facility sites j capable of satisfying demand at location i);

a_i = the population to be serviced from demand node i;

y_i = a binary variable,

= 1 if demand is satisfied at site i,

= 0 if not;

P = the number of facilities to be located.

Flow Covering Location Model

The FCLM is very similar to the MCLM, the major difference being that in the MCLM the set N is composed of the nodes capable of covering a given demand node, while in the FCLM the set N is the set of nodes capable of capturing a given flow between two nodes. The flow-capturing location-allocation model is defined as follows (Hodgson 1990):

Maximize z = \sum_{q \in Q} f_q y_q (that is, capture as much flow as possible)

subject to:

\sum_{k \in N_q} x_k \ge y_q for all q \in Q (that is, flow on path q is captured if there is at least one facility at a node k on path q)

\sum_{k \in K} x_k = p (that is, exactly p facilities are located)

x_k \in \{0, 1\} for all k \in K

y_q \in \{0, 1\} for all q \in Q

Where:

q is a particular origin–destination (OD) pair;

Q is the set of all OD pairs;

f_q is the flow between OD pair q;

y_q is a binary variable,

= 1 if f_q is captured,

= 0 if not;

k indicates a potential facility location;

K is the set of all potential facility locations;

x_k is a binary variable,

= 1 if there is a facility at location k,

= 0 if not;

N_q = \{ k \in K \mid k lies on path q \} (the set of nodes capable of capturing f_q, that is, the nodes on the path q between its origin and destination).
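The non-cannibalizing greedy heuristic used in this project can be sketched in a few lines (a simplified illustration with hypothetical inputs, not the ArcGIS implementation itself): at each step the node capturing the most still-uncaptured flow is added, so no flow is ever counted twice.

```python
def greedy_fclm(flows, paths, p):
    """Non-cannibalizing greedy heuristic for the FCLM (sketch).

    flows: dict mapping each OD pair q to its flow volume f_q
    paths: dict mapping each OD pair q to the set of nodes N_q on its path
    p:     number of facilities to locate
    """
    candidates = set().union(*paths.values())
    facilities, uncaptured = [], set(flows)
    for _ in range(p):
        # Greedy step: pick the node capturing the most uncaptured flow.
        best = max(candidates - set(facilities),
                   key=lambda k: sum(flows[q] for q in uncaptured
                                     if k in paths[q]))
        facilities.append(best)
        uncaptured -= {q for q in uncaptured if best in paths[q]}
    return facilities, sum(flows[q] for q in set(flows) - uncaptured)

# Hypothetical three-flow network: node B lies on every path.
sites, covered = greedy_fclm(
    {("A", "D"): 120, ("A", "E"): 80, ("C", "E"): 50},
    {("A", "D"): {"A", "B", "C", "D"},
     ("A", "E"): {"A", "B", "E"},
     ("C", "E"): {"C", "B", "E"}},
    p=1)
```

A cannibalizing variant would instead score every node against all flows, captured or not, at each step.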

Applications

The FCLM can be applied to problems that can be reduced to a network structure with flows moving across that network. Many scheduling and workflow problems, for example, might provide potential applications. In the domain of GIS, most FCLM applications involve retail store placement. Goodchild and Noronha, for example, discuss the utility of the FCLM in locating gas stations (Goodchild and Noronha 1987). Other examples include the location of convenience stores, automatic teller machines, and other businesses that rely on impulse or “convenience” sales. Berman, Hodgson, and Krass discuss in detail a scenario in which vehicle inspection stations are located on a network such that a maximal amount of the total network flow can be inspected (Berman, Hodgson et al. 1995). Speculatively, the FCLM could also be applied to locating medical facilities along evacuation routes and monitoring stations on pipelines.

Solution Procedures

One major obstacle to implementing more extensive network analysis capabilities in GIS is that many such functions are prohibitively time-consuming to calculate optimally on networks of real-world size, even on present-day processors. Heuristic procedures, however, can produce good (if sub-optimal) solutions relatively efficiently.

Optimal solution procedures

Enumeration

The location models discussed here, the FCLM and MCLM, are among a class of complexity problems in which the number of possible facility configurations is:

\binom{n}{p} = \frac{n!}{p!\,(n-p)!}

where n = the number of potential facility locations and p = the number of facilities one wants to locate. For even modestly sized networks, the number of combinations becomes so large that a brute-force enumeration method which simply checks every possible arrangement of facility sites is impractical.
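The growth of this expression is easy to demonstrate; Python's math.comb computes the binomial coefficient directly:

```python
from math import comb

# Number of ways to choose p facility sites from n candidate nodes.
small = comb(10, 3)     # a toy network: 120 configurations
large = comb(100, 10)   # a modest network: ~1.7 * 10**13 configurations
```

Even at a billion evaluations per second, exhaustively checking the second case would take several hours, and realistic networks are far larger still.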

Linear Programming

The development of linear programming (LP) techniques since the 1950s has enabled more efficient computation of many combinatorially complex problems like the FCLM. LP requires that the problem be reduced to a set of linear functions. The standard form for LP problems requires an objective function, which is a function to be maximized or minimized, along with a set of linear constraint functions. For simple problems this can be represented graphically (Figure 1).

Figure 1: LP Feasible Region

In this example, four constraint functions (the two non-negativity constraints and the two sloped functions) bound the convex feasible region of possible x1 and x2 combinations. Corner points of the feasible region are called basic feasible solutions.

Perhaps the most common method of solving LP problems is the simplex algorithm. In effect, the simplex algorithm moves along the boundaries of the feasible region, evaluating each adjacent corner (feasible solution) against the objective function, and returning the x1, x2 combination that yields the highest (or lowest) value of the objective function. With enumeration, every combination of x1 and x2 in the feasible region must be inspected, whereas simplex evaluates only four (or fewer) combinations in this example.
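The corner-evaluation idea can be shown concretely for a small two-variable problem (the numbers below are hypothetical, chosen in the spirit of Figure 1; the corners are enumerated directly rather than walked simplex-style):

```python
# Hypothetical 2-variable LP: maximize z = 3*x1 + 2*x2 subject to
#   x1 + 2*x2 <= 6,  2*x1 + x2 <= 6,  x1 >= 0,  x2 >= 0.
# The optimum of an LP always lies at a corner of the feasible region,
# so only the four corner points need to be evaluated.
corners = [(0, 0), (3, 0), (2, 2), (0, 3)]

def objective(x1, x2):
    return 3 * x1 + 2 * x2

best = max(corners, key=lambda c: objective(*c))
```

Simplex does the same work implicitly, moving from corner to adjacent corner only while the objective improves.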

Clearly, LP procedures like simplex are much more efficient than enumeration. However, location problems like the FCLM still have an n-dimensional feasible region, so while LP can reduce the number of permutations that are evaluated, it does not reduce the problems’ inherent combinatorial complexity. Similarly, further refinements of simplex, such as a branch and bound procedure that divides the feasible region into smaller regions (branching) and determines the upper and lower bounds of the objective function for each sub-region (bounding), can further reduce the solution space (and facilitate integer solutions), but again do not affect the problems’ complexity.

Heuristic solution procedures

Greedy Heuristics

Greedy heuristics are simple solution algorithms that choose a local optimum at each step of the procedure in the hope of achieving a global optimum. For example, a greedy heuristic for the traveling salesman problem (an NP-hard problem where a salesman must visit a set of cities while covering a minimal distance) would always select the closest city that has not already been visited. The problem with greedy heuristics is that they do not operate on all the data and are liable to “commit” to a solution path that is optimal locally, but not globally.
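For the traveling salesman example, the greedy rule amounts to a nearest-neighbour tour (a sketch with hypothetical coordinates):

```python
import math

def nearest_neighbour_tour(cities, start=0):
    """Greedy TSP heuristic: always visit the closest unvisited city."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        here = cities[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(here, cities[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Hypothetical city coordinates; the distant city (5, 0) is deferred
# until last, illustrating how greedy choices commit the search early.
tour = nearest_neighbour_tour([(0, 0), (0, 1), (5, 0), (0, 2)])
```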

There are many variants of greedy algorithms, including the GRASP algorithm, which selects randomly from a set of the k best candidates. When k = 1, GRASP is identical to a basic greedy heuristic; when k = 2 or greater, GRASP functions as a randomized greedy heuristic. By collecting results from many searches with this algorithm, the best solution found can be expected to be a good one (Church and Sorenson 1994).
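The GRASP selection step itself is tiny; the sketch below (a hypothetical helper, with k as the candidate-list size described above) shows how k = 1 collapses to the plain greedy choice:

```python
import random

def grasp_pick(candidates, score, k, rng=random):
    """GRASP-style selection: choose randomly among the k best candidates.

    With k = 1 this is the ordinary greedy choice; with k >= 2 repeated
    runs explore different solution paths.
    """
    ranked = sorted(candidates, key=score, reverse=True)
    return rng.choice(ranked[:k])
```

A full GRASP run would call this inside a greedy construction loop and keep the best complete solution over many randomized restarts.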

Interchange Heuristics

Interchange heuristics start with a random feasible solution (i.e. a set of potential facility locations, a permutation in the feasible region), evaluate the objective function for that solution, interchange an element in the solution with one outside the solution, and evaluate the objective function again for the new permutation (Teitz and Bart 1968). Variants of interchange heuristics generally differ in the rules for whether an interchange is accepted, rules that aim to minimize the possibility of the algorithm becoming caught in a local optimum without achieving the global optimum. A simple greedy interchange heuristic, for example, would update the feasible solution whenever a new permutation with a better objective value is found. Results can vary depending on the initial random feasible solution, so the use of interchange heuristics often involves multiple runs seeded with differing initial solutions.
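A minimal greedy interchange sketch, assuming only that the objective function can score any candidate set of sites (sum is used below purely as a stand-in objective):

```python
import itertools

def interchange(solution, candidates, objective):
    """Greedy interchange: accept any swap that improves the objective,
    and repeat until no single swap improves it further."""
    solution = set(solution)
    improved = True
    while improved:
        improved = False
        for inside, outside in itertools.product(tuple(solution),
                                                 candidates - solution):
            trial = (solution - {inside}) | {outside}
            if objective(trial) > objective(solution):
                solution, improved = trial, True
                break  # restart the scan from the improved solution
    return solution

# Stand-in objective: total value of the selected sites.
result = interchange({1, 2}, {1, 2, 3, 10}, sum)
```

Because only improving swaps are accepted, the loop terminates at a local optimum, which is why multiple randomly seeded runs are used in practice.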

Tabu Heuristics

Tabu heuristics can be thought of as elaborated interchange heuristics in which solution paths (or permutation subsets) that have already been explored are not reevaluated: the paths are made tabu (Church and Sorenson 1994). In effect, this alters the local neighborhood structure of potential solution paths, reducing the possibility of the algorithm cycling through solutions it has already evaluated and enabling it to back out of local optima. Variants of this algorithm can be configured based on the size of the permutation subsets or the number of steps a particular path is “remembered”.
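A single tabu step might be sketched as follows (a simplified illustration, assuming an objective function over candidate site sets; real tabu implementations differ in how moves are encoded and in aspiration criteria):

```python
from collections import deque

def tabu_step(solution, candidates, objective, tabu, tenure=3):
    """One step of a tabu interchange: take the best non-tabu swap,
    then remember it so it cannot be immediately reversed."""
    best_move, best_val = None, float("-inf")
    for inside in solution:
        for outside in candidates - solution:
            if (outside, inside) in tabu:  # reversing a recent move is tabu
                continue
            val = objective((solution - {inside}) | {outside})
            if val > best_val:
                best_move, best_val = (inside, outside), val
    if best_move is None:
        return solution
    inside, outside = best_move
    tabu.append(best_move)
    while len(tabu) > tenure:
        tabu.popleft()  # forget moves older than the tenure
    return (solution - {inside}) | {outside}

# Stand-in objective (sum of site values) on a hypothetical candidate set.
tabu = deque()
result = tabu_step({1, 2}, {1, 2, 3, 10}, sum, tabu)
```

Unlike the greedy interchange, the best available move is taken even when it worsens the objective, which is what lets the search climb back out of a local optimum.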