Searching for Feasible Plans
The methods presented in this section are just graph search algorithms, but with the understanding that the state transition graph is revealed incrementally through the application of actions, instead of being fully specified in advance. The presentation in this section can therefore be considered as visiting graph search algorithms from a planning perspective. An important requirement for these or any search algorithms is to be systematic. If the graph is finite, this means that the algorithm will visit every reachable state, which enables it to correctly declare in finite time whether or not a solution exists. To be systematic, the algorithm should keep track of states already visited; otherwise, the search may run forever by cycling through the same states. Ensuring that no redundant exploration occurs is sufficient to make the search systematic.
If the graph is infinite, then we are willing to tolerate a weaker definition for being systematic. If a solution exists, then the search algorithm still must report it in finite time; however, if a solution does not exist, it is acceptable for the algorithm to search forever. This systematic requirement is achieved by ensuring that, in the limit, as the number of search iterations tends to infinity, every reachable vertex in the graph is explored. Since the number of vertices is assumed to be countable, this must always be possible.
As an example of this requirement, consider Example 2.1 on an infinite tile floor with no obstacles.
FORWARD SEARCH
 1  Q.Insert(xI) and mark xI as visited
 2  while Q not empty do
 3      x ← Q.GetFirst()
 4      if x ∈ XG
 5          return SUCCESS
 6      forall u ∈ U(x)
 7          x′ ← f(x, u)
 8          if x′ not visited
 9              Mark x′ as visited
10              Q.Insert(x′)
11          else
12              Resolve duplicate x′
13  return FAILURE
Figure 2.4: A general template for forward search.
If the search algorithm explores in only one direction, as depicted in Figure 2.3a, then in the limit most of the space will be left uncovered, even though no states are revisited. If instead the search proceeds outward from the origin in wavefronts, as depicted in Figure 2.3b, then it may be systematic. In practice, each search algorithm has to be carefully analyzed. A search algorithm could expand in multiple directions, or even in wavefronts, but still not be systematic. If the graph is finite, then it is much simpler: Virtually any search algorithm is systematic, provided that it marks visited states to avoid revisiting the same states indefinitely.
2.2.1 General Forward Search
Figure 2.4 gives a general template of search algorithms, expressed using the state-space representation. At any point during the search, there will be three kinds of states:
- Unvisited: States that have not been visited yet. Initially, this is every state except xI .
- Dead: States that have been visited, and for which every possible next state has also been visited. A next state of x is a state x′ for which there exists a u ∈ U(x) such that x′ = f(x, u). In a sense, these states are dead because there is nothing more that they can contribute to the search; there are no new leads that could help in finding a feasible plan. Section 2.3.3 discusses a variant in which dead states can become alive again in an effort to obtain optimal plans.
- Alive: States that have been encountered, but possibly have unvisited next states. These are considered alive. Initially, the only alive state is xI .
The set of alive states is stored in a priority queue, Q, for which a priority function must be specified. The only significant difference between various search algorithms is the particular function used to sort Q. Many variations will be described later, but for the time being, it might be helpful to pick one. Therefore, assume for now that Q is a common FIFO (First-In First-Out) queue; whichever state has been waiting the longest will be chosen when Q.GetFirst() is called. The rest of the general search algorithm is quite simple. Initially, Q contains the initial state xI. A while loop is then executed, which terminates only when Q is empty. This will only occur when the entire graph has been explored without finding any goal states, which results in a FAILURE (unless the reachable portion of X is infinite, in which case the algorithm should never terminate). In each while iteration, the highest ranked element, x, of Q is removed. If x lies in XG, then it reports SUCCESS and terminates; otherwise, the algorithm tries applying every possible action, u ∈ U(x). For each next state, x′ = f(x, u), it must determine whether x′ is being encountered for the first time. If it is unvisited, then it is inserted into Q; otherwise, there is no need to consider it because it must be either dead or already in Q.
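To make the template concrete, the following is a minimal Python sketch of Figure 2.4 with Q as a FIFO queue (which makes it the breadth-first variant of Section 2.2.2). The names goal, actions, and f are illustrative placeholders for whatever problem representation is available, and states are assumed to be hashable.

    from collections import deque

    def forward_search(x_init, goal, actions, f):
        """Forward search (Figure 2.4) with Q as a FIFO queue.

        x_init  -- initial state (must be hashable)
        goal    -- predicate: goal(x) is True iff x is in X_G
        actions -- actions(x) returns the set U(x)
        f       -- state transition function: f(x, u) yields the next state
        """
        Q = deque([x_init])               # the queue Q (here: FIFO)
        visited = {x_init}                # states already visited (alive or dead)
        while Q:                          # line 2
            x = Q.popleft()               # line 3: Q.GetFirst()
            if goal(x):                   # line 4
                return "SUCCESS"
            for u in actions(x):          # line 6
                x_next = f(x, u)          # line 7
                if x_next not in visited:     # line 8
                    visited.add(x_next)       # line 9
                    Q.append(x_next)          # line 10
                # else: nothing to resolve for plain feasibility search (line 12)
        return "FAILURE"                  # line 13

Using a deque here is deliberate: the same structure also supports the stack discipline needed for the depth-first variant described below.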
The algorithm description in Figure 2.4 omits several details that often become important in practice. For example, how efficient is the test to determine whether x ∈ XG in line 4? This depends, of course, on the size of the state space and on the particular representations chosen for x and XG. At this level, we do not specify a particular method because the representations are not given.
One important detail is that the existing algorithm only indicates whether a solution exists, but does not seem to produce a plan, which is a sequence of actions that achieves the goal. This can be fixed by inserting a line after line 7 that associates with x′ its parent, x. If this is performed each time, one can simply trace the pointers from the final state to the initial state to recover the plan. For convenience, one might also store which action was taken, in addition to the pointer from x′ to x.
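As a sketch of this bookkeeping, building on the fragment above: when x′ is first marked visited, record parent[x_next] = (x, u); the plan is then recovered by tracing these pointers back from a reached goal state (names are again illustrative).

    def recover_plan(parent, x_init, x_goal):
        """Trace parent pointers from x_goal back to x_init; return the action sequence."""
        plan = []
        x = x_goal
        while x != x_init:
            x, u = parent[x]    # predecessor state and the action applied there
            plan.append(u)
        plan.reverse()          # pointers were followed goal-to-start
        return plan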
Lines 8 and 9 are conceptually simple, but how can one tell whether x′ has been visited? For some problems the state transition graph might actually be a tree, which means that there are no repeated states. Although this does not occur frequently, it is wonderful when it does because there is no need to check whether states have been visited. If the states in X all lie on a grid, one can simply make a lookup table that can be accessed in constant time to determine whether a state has been visited. In general, however, it might be quite difficult because the state x′ must be compared with every other state in Q and with all of the dead states. If the representation of each state is long, as is sometimes the case, this will be very costly. A good hashing scheme or another clever data structure can greatly alleviate this cost, but in many applications the computation time will remain high. One alternative is to simply allow repeated states, but this could lead to an increase in computational cost that far outweighs the benefits: even if the graph is very small, the search could run in time exponential in the size of the state transition graph, or might not terminate at all, even though the graph is finite.
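For instance (a small illustration, not tied to any particular problem), the two options just mentioned look as follows in Python:

    # On a bounded grid, a lookup table gives a guaranteed constant-time visited test.
    W, H = 100, 100                        # hypothetical grid dimensions
    visited_grid = [[False] * W for _ in range(H)]
    visited_grid[3][7] = True              # mark state (i, j) = (3, 7) as visited

    # For general states, represent each as an immutable tuple so it is hashable;
    # a hash set then gives an expected constant-time test instead of scanning Q
    # and the dead states.
    visited = set()
    visited.add((3, 7, "facing_north"))
    print((3, 7, "facing_north") in visited)   # True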
One final detail is that some search algorithms will require a cost to be computed and associated with every state. If the same state is reached multiple times, the cost may have to be updated, which is performed in line 12, if the particular search algorithm requires it. Such costs may be used in some way to sort the priority queue, or they may enable the recovery of the plan on completion of the algorithm. Instead of storing pointers, as mentioned previously, the optimal cost to return to the initial state could be stored with each state. This cost alone is sufficient to determine the action sequence that leads to any visited state. Starting at a visited state, the path back to xI can be obtained by traversing the state transition graph backward in a way that decreases the cost as quickly as possible in each step. For this to succeed, the costs must have a certain monotonicity property, which is obtained by Dijkstra’s algorithm and A∗ search, and will be introduced in Section 2.2.2. More generally, the costs must form a navigation function, which is considered in Section 8.2.2 as feedback is incorporated into discrete planning.
2.2.2 Particular Forward Search Methods
This section presents several search algorithms, each of which constructs a search tree. Each search algorithm is a special case of the algorithm in Figure 2.4, obtained by defining a different sorting function for Q. Most of these are just classical graph search algorithms [243].
Breadth first The method given in Section 2.2.1 specifies Q as a First-In First-Out (FIFO) queue, which selects states using the first-come, first-serve principle. This causes the search frontier to grow uniformly and is therefore referred to as breadth-first search. All plans that have k steps are exhausted before plans with k + 1 steps are investigated. Therefore, breadth first guarantees that the first solution found will use the smallest number of steps. On detection that a state has been revisited, there is no work to do in line 12. Since the search progresses in a series of wavefronts, breadth-first search is systematic. In fact, it even remains systematic if it does not keep track of repeated states (however, it will waste time considering irrelevant cycles).
The asymptotic running time of breadth-first search is O(|V| + |E|), in which |V| and |E| are the numbers of vertices and edges, respectively, in the state transition graph (recall, however, that the graph is usually not the input; for example, the input may be the rules of the Rubik’s cube). This assumes that all basic operations, such as determining whether a state has been visited, are performed in constant time. In practice, these operations will typically require more time and must be counted as part of the algorithm’s complexity. The running time can be expressed in terms of the other representations. Recall that |V| = |X| is the number of states. If the same actions, U, are available from every state, then |E| = |U||X|. If the action sets U(x1) and U(x2) are pairwise disjoint for any x1, x2 ∈ X, then |E| = |U|.
Depth first By making Q a stack (Last-In, First-Out; or LIFO), aggressive exploration of the state transition graph occurs, as opposed to the uniform expansion of breadth-first search. The resulting variant is called depth-first search because the search dives quickly into the graph. The preference is toward investigating longer plans very early. Although this aggressive behavior might seem desirable, note that the particular choice of longer plans is arbitrary. Actions are applied in the forall loop in whatever order they happen to be defined. Once again, if a state is revisited, there is no work to do in line 12. Depth-first search is systematic for any finite X but not for an infinite X because it could behave like Figure 2.3a. The search could easily focus on one “direction” and completely miss large portions of the search space as the number of iterations tends to infinity. The running time of depth-first search is also O(|V| + |E|).
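In the Python sketch following Figure 2.4, this is a one-line change: popping from the same end of the deque where states are inserted turns Q into a LIFO stack.

    x = Q.pop()    # depth-first variant: replaces Q.popleft(); expand the newest state first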
Dijkstra’s algorithm Up to this point, there has been no reason to prefer one action over any other in the search. Section 2.3 will formalize optimal discrete planning and will present several algorithms that find optimal plans. Before going into that, we present a systematic search algorithm that finds optimal plans; it appears here because it is also useful for finding feasible plans. The result is the well-known Dijkstra’s algorithm for finding single-source shortest paths in a graph [273], which is a special form of dynamic programming. More general dynamic programming computations appear in Section 2.3 and throughout the book.
Suppose that every edge, e ∈ E, in the graph representation of a discrete planning problem has an associated nonnegative cost l(e), which is the cost to apply the action. The cost l(e) could be written using the state-space representation as l(x, u), indicating that it costs l(x, u) to apply action u from state x. The total cost of a plan is just the sum of the edge costs over the path from the initial state to a goal state.
The priority queue, Q, will be sorted according to a function C : X → [0, ∞], called the cost-to-come. For each state x, the value C∗(x) is called the optimal¹ cost-to-come from the initial state xI. This optimal cost is obtained by summing edge costs, l(e), over all possible paths from xI to x and using the path that produces the least cumulative cost. If the cost is not known to be optimal, then it is written as C(x).
The cost-to-come is computed incrementally during the execution of the search algorithm in Figure 2.4. Initially, C∗(xI ) = 0. Each time the state x′ is generated, a cost is computed as C(x′) = C∗(x) + l(e), in which e is the edge from x to x′ (equivalently, we may write C(x′) = C∗(x) + l(x, u)). Here, C(x′) represents the best cost-to-come that is known so far, but we do not write C∗ because it is not yet known whether x′ was reached optimally. Due to this, some work is required in line 12. If x′ already exists in Q, then it is possible that the newly discovered path to x′ is more efficient. If so, then the cost-to-come value C(x′) must be lowered for x′, and Q must be reordered accordingly.
¹As in optimization literature, we will use ∗ to mean optimal.
When does C(x) finally become C∗(x) for some state x? Once x is removed from Q using Q.GetFirst(), the state becomes dead, and it is known that x cannot be reached with a lower cost. This can be argued by induction. For the initial state, C∗(xI) is known, and this serves as the base case. Now assume that every dead state has its optimal cost-to-come correctly determined. This means that their cost-to-come values can no longer change. For the first element, x, of Q, the value must be optimal because any path that has a lower total cost would have to travel through another state in Q, but these states already have higher costs. All paths that pass only through dead states were already considered in producing C(x). Once all edges leaving x are explored, then x can be declared as dead, and the induction continues. This is not enough detail to constitute a proof of optimality; more arguments appear in Section 2.3.3 and in [243]. The running time is O(|V| lg |V| + |E|), in which |V| and |E| are the numbers of vertices and edges, respectively, in the graph representation of the discrete planning problem. This assumes that the priority queue is implemented with a Fibonacci heap, and that all other operations, such as determining whether a state has been visited, are performed in constant time. If other data structures are used to implement the priority queue, then higher running times may be obtained.
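The following is a compact Python sketch of this cost-to-come computation, in the style of the earlier fragments, using the standard heapq module. Since heapq has no decrease-key operation, stale queue entries are skipped lazily, which stands in for the reordering of line 12; the cost function l(x, u) is an assumed, nonnegative input.

    import heapq

    def dijkstra(x_init, goal, actions, f, l):
        """Figure 2.4 with Q sorted by cost-to-come; returns the optimal cost to a goal."""
        C = {x_init: 0.0}                  # best known cost-to-come, C(x)
        Q = [(0.0, x_init)]                # priority queue keyed by C
        dead = set()
        while Q:
            c, x = heapq.heappop(Q)
            if x in dead:                  # stale entry; a cheaper copy was handled
                continue
            dead.add(x)                    # x is now dead: C(x) has become C*(x)
            if goal(x):
                return c
            for u in actions(x):
                x_next = f(x, u)
                c_next = c + l(x, u)       # candidate cost-to-come for x_next
                if x_next not in dead and c_next < C.get(x_next, float("inf")):
                    C[x_next] = c_next     # line 12: lower C(x_next), reorder Q
                    heapq.heappush(Q, (c_next, x_next))
        return None                        # no goal state is reachable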
A-star The A∗ (pronounced “ay star”) search algorithm is an extension of Dijkstra’s algorithm that tries to reduce the total number of states explored by incorporating a heuristic estimate of the cost to get to the goal from a given state. Let C(x) denote the cost-to-come from xI to x, and let G(x) denote the cost-to-go from x to some state in XG. It is convenient that C∗(x) can be computed incrementally by dynamic programming; however, there is no way to know the true optimal cost-to-go, G∗, in advance. Fortunately, in many applications it is possible to construct a reasonable underestimate of this cost. As an example of a typical underestimate, consider planning in the labyrinth depicted in Figure 2.2. Suppose that the cost is the total number of steps in the plan. If one state has coordinates (i, j) and another has (i′, j′), then |i′ − i| + |j′ − j| is an underestimate because this is the length of a straightforward plan that ignores obstacles. Once obstacles are included, the cost can only increase as the robot tries to get around them (which may not even be possible). Of course, zero could also serve as an underestimate, but that would not provide any helpful information to the algorithm. The aim is to compute an estimate that is as close as possible to the optimal cost-to-go and is also guaranteed to be no greater. Let Ĝ∗(x) denote such an estimate.
The A∗ search algorithm works in exactly the same way as Dijkstra’s algorithm. The only difference is the function used to sort Q. In the A∗ algorithm, the sum C∗(x′) + Ĝ∗(x′) is used, implying that the priority queue is sorted by estimates of the optimal cost from xI to XG. If Ĝ∗(x) is an underestimate of the true optimal cost-to-go for all x ∈ X, the A∗ algorithm is guaranteed to find optimal plans [337, 777]. As Ĝ∗ becomes closer to G∗, fewer vertices tend to be explored in comparison with Dijkstra’s algorithm.
Figure 2.5: Here is a troublesome example for best-first search. Imagine trying to reach a state that is directly below the spiral tube. If the initial state starts inside of the opening at the top of the tube, the search will progress around the spiral instead of leaving the tube and heading straight for the goal.
This would always seem advantageous, but in some problems it is difficult or impossible to find a heuristic that is both efficient to evaluate and provides good search guidance. Note that when Ĝ∗(x) = 0 for all x ∈ X, A∗ degenerates to Dijkstra’s algorithm. In any case, the search will always be systematic.
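Relative to the Dijkstra sketch above, only the queue keys change: the entry for each state is ordered by its cost-to-come plus the heuristic estimate. A sketch, assuming grid states (i, j) and the Manhattan-distance underestimate from the example above (which is consistent on a grid, so the dead-state bookkeeping remains valid):

    import heapq

    def manhattan(x, x_goal):
        """Underestimate of optimal cost-to-go on a grid: |i' - i| + |j' - j|."""
        return abs(x_goal[0] - x[0]) + abs(x_goal[1] - x[1])

    def astar(x_init, x_goal, actions, f, l, h=manhattan):
        C = {x_init: 0.0}
        Q = [(h(x_init, x_goal), x_init)]   # key: C(x) + estimate of cost-to-go
        dead = set()
        while Q:
            _, x = heapq.heappop(Q)
            if x in dead:
                continue
            dead.add(x)
            if x == x_goal:
                return C[x]
            for u in actions(x):
                x_next = f(x, u)
                c_next = C[x] + l(x, u)
                if x_next not in dead and c_next < C.get(x_next, float("inf")):
                    C[x_next] = c_next
                    heapq.heappush(Q, (c_next + h(x_next, x_goal), x_next))
        return None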
Best first For best-first search, the priority queue is sorted according to an estimate of the optimal cost-to-go. The solutions obtained in this way are not necessarily optimal; therefore, it does not matter whether the estimate exceeds the true optimal cost-to-go, which was important to maintain optimality for A∗ search. Although optimal solutions are not found, in many cases, far fewer vertices are explored, which results in much faster running times. There is no guarantee, however, that this will happen. The worst-case performance of best-first search is worse than that of A∗ search and dynamic programming. The algorithm is often too greedy because it prefers states that “look good” very early in the search. Sometimes the price must be paid for being greedy! Figure 2.5 shows a contrived example in which the planning problem involves taking small steps in a 3D world. For any specified number, k, of steps, it is easy to construct a spiral example that wastes at least k steps in comparison to Dijkstra’s algorithm. Note that best-first search is not systematic.
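In the heap-based sketches above, greedy best-first search amounts to keying the queue on the heuristic estimate alone (the duplicate-cost bookkeeping can then be dropped, since optimality is not claimed):

    heapq.heappush(Q, (h(x_next, x_goal), x_next))   # order by estimated cost-to-go only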
Iterative deepening The iterative deepening approach is usually preferable if the search tree has a large branching factor (i.e., there are many more vertices in the next level than in the current level). This could occur if there are many actions per state and only a few states are revisited. The idea is to use depth-first search and find all states that are distance i or less from xI . If the goal is not found, then the previous work is discarded, and depth first is applied to find all states of distance i + 1 or less from xI . This generally iterates from i = 1 and proceeds indefinitely until the goal is found. Iterative deepening can be viewed as a way of converting depth-first search into a systematic search method. The motivation for discarding the work of previous iterations is that the number of states reached for i + 1 is expected to far exceed (e.g., by a factor of 10) the number reached for i. Therefore, once the commitment has been made to reach level i + 1, the cost of all previous iterations is negligible.
The iterative deepening method has better worst-case performance than breadth-first search for many problems. Furthermore, the space requirements are reduced because the queue in breadth-first search is usually much larger than for depth-first search. If the nearest goal state is i steps from xI, breadth-first search in the worst case might reach nearly all states of distance i + 1 before terminating successfully. This occurs each time a state x ∉ XG of distance i from xI is reached because all new states that can be reached in one step are placed onto Q. The A∗ idea can be combined with iterative deepening to yield IDA∗, in which i is replaced by C∗(x′) + Ĝ∗(x′). In each iteration of IDA∗, the allowed total cost gradually increases [777].
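A minimal sketch of plain iterative deepening in the style of the earlier fragments; the recursion tracks only the current branch, so memory stays proportional to the depth bound (the max_depth cutoff is a practical stand-in for "proceeds indefinitely"):

    def depth_limited(x, goal, actions, f, limit, branch):
        """Depth-first search from x over plans of at most `limit` steps."""
        if goal(x):
            return True
        if limit == 0:
            return False
        branch.add(x)                    # avoid cycling within the current branch
        for u in actions(x):
            x_next = f(x, u)
            if x_next not in branch and depth_limited(
                    x_next, goal, actions, f, limit - 1, branch):
                return True
        branch.remove(x)                 # backtrack
        return False

    def iterative_deepening(x_init, goal, actions, f, max_depth=1000):
        for i in range(1, max_depth + 1):    # discard and redo with a larger bound
            if depth_limited(x_init, goal, actions, f, i, set()):
                return "SUCCESS"
        return "FAILURE"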
2.2.3 Other General Search Schemes
This section covers two other general templates for search algorithms. The first one is simply a “backward” version of the tree search algorithm in Figure 2.4. The second one is a bidirectional approach that grows two search trees, one from the initial state and one from a goal state.
Backward search Backward versions of any of the forward search algorithms of Section 2.2.2 can be made. For example, a backward version of Dijkstra’s algorithm can be made by starting from xG. To create backward search algorithms, suppose that there is a single goal state, xG. For many planning problems, it might be the case that the branching factor is large when starting from xI. In this case, it might be more efficient to start the search at a goal state and work backward until the initial state is encountered. A general template for this approach is given in Figure 2.6. For forward search, recall that an action u ∈ U(x) is applied from x ∈ X to obtain a new state, x′ = f(x, u). For backward search, a frequent computation will be to determine, for some x′, the preceding state x ∈ X and action u ∈ U(x) such that x′ = f(x, u).
BACKWARD SEARCH
 1  Q.Insert(xG) and mark xG as visited
 2  while Q not empty do
 3      x′ ← Q.GetFirst()
 4      if x′ = xI
 5          return SUCCESS
 6      forall u−1 ∈ U−1(x′)
 7          x ← f−1(x′, u−1)
 8          if x not visited
 9              Mark x as visited
10              Q.Insert(x)
11          else
12              Resolve duplicate x
13  return FAILURE
Figure 2.6: A general template for backward search.
The template in Figure 2.6 can be extended to handle a goal region, XG, by inserting all xG ∈ XG into Q in line 1 and marking them as visited.
For most problems, it may be preferable to precompute a representation of the state transition function, f, that is “backward” to be consistent with the search algorithm. Some convenient notation will now be constructed for the backward version of f. Let U−1 = {(x, u) ∈ X × U | x ∈ X, u ∈ U(x)}, which represents the set of all state-action pairs and can also be considered as the domain of f. Imagine, from a given state x′ ∈ X, the set of all (x, u) ∈ U−1 that map to x′ using f. This can be considered as a backward action space, defined formally for any x′ ∈ X as

U−1(x′) = {(x, u) ∈ U−1 | x′ = f(x, u)}.    (2.3)

For convenience, let u−1 denote a state-action pair (x, u) that belongs to some U−1(x′). From any u−1 ∈ U−1(x′), there is a unique x ∈ X. Thus, let f−1 denote a backward state transition function that yields x from x′ and u−1 ∈ U−1(x′). This defines a backward state transition equation, x = f−1(x′, u−1), which looks very similar to the forward version, x′ = f(x, u).
The interpretation of f−1 is easy to capture in terms of the state transition graph: reverse the direction of every edge. This makes finding a plan in the reversed graph using backward search equivalent to finding one in the original graph using forward search. The backward state transition function is the variant of f that is obtained after reversing all of the edges. Each u−1 is a reversed edge. Since there is a perfect symmetry with respect to the forward search of Section 2.2.1, any of the search algorithm variants from Section 2.2.2 can be adapted to the template in Figure 2.6, provided that f−1 has been defined.
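When the state space is finite and can be enumerated, one way to realize f−1 is to precompute, for each state x′, the set U−1(x′) by sweeping over all forward transitions once; a sketch under that assumption, with names as in the earlier fragments:

    from collections import defaultdict

    def invert_transitions(states, actions, f):
        """Precompute U^{-1}: map each x' to the pairs (x, u) with x' = f(x, u)."""
        U_inv = defaultdict(list)
        for x in states:
            for u in actions(x):
                U_inv[f(x, u)].append((x, u))   # reverse the edge from x to f(x, u)
        return U_inv

    # Backward step: each u_inv = (x, u) in U_inv[x_prime] yields the unique
    # predecessor x; this plays the role of x = f_inv(x_prime, u_inv).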
Bidirectional search Now that forward and backward search have been covered, the next reasonable idea is to conduct a bidirectional search. The general search template given in Figure 2.7 can be considered as a combination of the two in Figures 2.4 and 2.6. One tree is grown from the initial state, and the other is grown from the goal state (assume again that XG is a singleton, {xG}). The search terminates with success when the two trees meet. Failure occurs if either priority queue has been exhausted. For many problems, bidirectional search can dramatically reduce the amount of required exploration. There are Dijkstra and A∗ variants of bidirectional search, which lead to optimal solutions. For best-first and other variants, it may be challenging to ensure that the two trees meet quickly. They might come very close to each other and then fail to connect. Additional heuristics may help in some settings to guide the trees into each other. One can even extend this framework to allow any number of search trees. This may be desirable in some applications, but connecting the trees becomes even more complicated and expensive.
2.2.4 A Unified View of the Search Methods
It is convenient to summarize the behavior of all search methods in terms of several basic steps. Variations of these steps will appear later for more complicated planning problems. For example, in Section 5.4, a large family of sampling-based motion planning algorithms can be viewed as an extension of the steps presented here. The extension in this case is made from a discrete state space to a continuous state space (called the configuration space). Each method incrementally constructs a search graph, G(V, E), which is the subgraph of the state transition graph that has been explored so far.
All of the planning methods from this section followed the same basic template:
- Initialization: Let the search graph, G(V, E), be initialized with E empty and V containing some starting states. For forward search, V = {xI}; for backward search, V = {xG}. If bidirectional search is used, then V = {xI, xG}. It is possible to grow more than two trees and merge them during the search process. In this case, more states can be initialized in V. The search graph will incrementally grow to reveal more and more of the state transition graph.
- Select Vertex: Choose a vertex ncur ∈ V for expansion; this is usually accomplished by maintaining a priority queue. Let xcur denote the state associated with ncur.
- Apply an Action: In either a forward or backward direction, a new state, xnew , is obtained. This may arise from xnew = f(x, u) for some u ∈ U(x) (forward) or x = f(xnew, u) for some u ∈ U(xnew) (backward).
- Insert a Directed Edge into the Graph: If certain algorithm-specific tests are passed, then generate an edge from x to xnew for the forward case, or an edge from xnew to x for the backward case. If xnew is not yet in V, it will be inserted into V.
BIDIRECTIONAL SEARCH
 1  QI.Insert(xI) and mark xI as visited
 2  QG.Insert(xG) and mark xG as visited
 3  while QI not empty and QG not empty do
 4      if QI not empty
 5          x ← QI.GetFirst()
 6          if x already visited from xG
 7              return SUCCESS
 8          forall u ∈ U(x)
 9              x′ ← f(x, u)
10              if x′ not visited
11                  Mark x′ as visited
12                  QI.Insert(x′)
13              else
14                  Resolve duplicate x′
15      if QG not empty
16          x′ ← QG.GetFirst()
17          if x′ already visited from xI
18              return SUCCESS
19          forall u−1 ∈ U−1(x′)
20              x ← f−1(x′, u−1)
21              if x not visited
22                  Mark x as visited
23                  QG.Insert(x)
24              else
25                  Resolve duplicate x
26  return FAILURE
Figure 2.7: A general template for bidirectional search.
- Check for Solution: Determine whether G encodes a path from xI to xG. If there is a single search tree, then this is trivial. If there are two or more search trees, then this step could be expensive.
- Return to Step 2: Iterate unless a solution has been found or an early termination condition is satisfied, in which case the algorithm reports failure.
Note that in this summary, several iterations may have to be made to generate one iteration in the previous formulations. The forward search algorithm in Figure 2.4 tries all actions for the first element of Q. If there are k actions, this corresponds to k iterations in the template above.