8.2 Discrete State Spaces

This section is provided mainly to help explain similar concepts that are coming in later sections. The presentation is limited to discrete spaces, which are much simpler to formulate and understand. Following this, an extension to configuration spaces and other continuous state spaces can be made. The discussion here is also relevant background for the feedback planning concepts that will be introduced in Section 8.4.1. In that case, uncertainty will be explicitly modeled. The resulting formulation and concepts can be considered as an extension of this section.

¹Section 8.4.4 will actually consider some simple differential constraints, such as acceleration bounds; the full treatment of differential constraints is deferred until Part IV.

8.2.1 Defining a Feedback Plan

Consider a discrete planning problem similar to the ones defined in Formulations 2.2 and 2.3, except that the initial state is not given. Due to this, the cost functional cannot be expressed only as a function of a plan. It is instead defined in terms of the state history and action history. At stage k, these are defined as

x̃k = (x1, x2, . . . , xk)    (8.1)

and

ũk = (u1, u2, . . . , uk),    (8.2)

respectively. Sometimes, it will be convenient to alternatively refer to x̃k as the state trajectory.

The resulting formulation is

Formulation 8.1 (Discrete Optimal Feedback Planning)

1. A finite, nonempty state space X.
2. For each state, x ∈ X, a finite action space U(x).
3. A state transition function f that produces a state, f(x, u) ∈ X, for every x ∈ X and u ∈ U(x). Let U denote the union of U(x) for all x ∈ X.
4. A set of stages, each denoted by k, that begins at k = 1 and continues indefinitely.
5. A goal set, XG ⊂ X.
6. Let L denote a stage-additive cost functional,

L(x̃F, ũK) = Σ_{k=1}^{K} l(xk, uk) + lF(xF),    (8.3)

in which F = K + 1.

There is one other difference in comparison to the formulations of Chapter 2. The state space is assumed here to be finite. This facilitates the construction of a feedback plan, but it is not necessary in general.
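To make the problem data concrete, Formulation 8.1 might be captured in code roughly as follows. This is a minimal sketch; all names (FeedbackProblem, State, Action) are invented for illustration and are reused by the later sketches in this section.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple

State = Tuple[int, int]   # e.g., grid coordinates
Action = str              # e.g., "up", "down", "left", "right", or "T" for u_T

@dataclass
class FeedbackProblem:
    X: Set[State]                          # 1. finite, nonempty state space
    U: Callable[[State], Set[Action]]      # 2. action space U(x) for each state
    f: Callable[[State, Action], State]    # 3. state transition function
    XG: Set[State]                         # 5. goal set
    l: Callable[[State, Action], float]    # 6. per-stage cost l(x, u)
    # 4. The stages k = 1, 2, ... are implicit in repeated application of f.
```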

Consider defining a plan that solves Formulation 8.1. If the initial condition is given, then a sequence of actions could be specified, as in Chapter 2. Without having the initial condition, one possible approach is to determine a sequence of actions for each possible initial state, x1 ∈ X. Once the initial state is given, the appropriate action sequence is known. This approach, however, wastes memory.

Suppose some x is given as the initial state and the first action is applied, leading to the next state x′. What action should be applied from x′? The second action in the sequence at x can be used; however, we can also imagine that x′ is now the initial state and use its first action. This implies that keeping an action sequence for every state is highly redundant. It is sufficient at each state to keep only the first action in the sequence. The application of that action produces the next state, at which the next appropriate action is stored. An execution sequence can be imagined from an initial state as follows. Start at some state, apply the action stored there, arrive at another state, apply its action, arrive at the next state, and so on, until the goal is reached.

It therefore seems appropriate to represent a feedback plan as a function that maps every state to an action. Therefore, a feedback plan π is defined as a function π : X → U. From every state, x ∈ X, the plan indicates which action to apply. If the goal is reached, then the termination action should be applied. This is specified as part of the plan: π(x) = uT, if x ∈ XG. A feedback plan is called a solution to the problem if it causes the goal to be reached from every state from which the goal is reachable.

If an initial state x1 and a feedback plan π are given, then the state and action histories can be determined. This implies that the execution cost, (8.3), also can be determined. It can therefore be alternatively expressed as L(π, x1), instead of L(x˜F , u˜K ). This relies on future states always being predictable. In Chapter 10, it will not be possible to make this direct correspondence due to uncertainties (see Section 10.1.3).

Feasibility and optimality The notions of feasible and optimal plans need to be reconsidered in the context of feedback planning because the initial condition is not given. A plan π is called a solution to the feasible planning problem if, from every x ∈ X from which XG is reachable, the goal set is indeed reached by executing π from x. This means that the cost functional is ignored (an alternative to Formulation 8.1 can be defined in which the cost functional is removed). For convenience, π will be called a feasible feedback plan.

Now consider optimality. From a given state x, it is clear that an optimal plan exists using the concepts of Section 2.3. Is it possible that a different optimal plan needs to be associated with every x ∈ X that can reach XG? It turns out that only one plan is needed to encode optimal paths from every initial state to XG. Why is this true? Suppose that the optimal cost-to-go is computed over X using Dijkstra’s algorithm or value iteration, as covered in Section 2.3. Every cost-to-go value at some x ∈ X indicates the cost received under the implementation of the optimal open-loop plan from x. The first step in this optimal plan can be determined by (2.19), which yields a new state x′ = f(x, u). From x′, (2.19) can be applied once again to determine the next optimal action. The cost at x′ represents both the optimal cost-to-go if x′ is the initial condition and also the optimal cost-to-go when continuing on the optimal path from x. The two must be equivalent because of the dynamic programming principle. Since all such costs must coincide, a single feedback plan can be used to obtain the optimal cost-to-go from every initial condition.

A feedback plan π is therefore defined as optimal if from every x ∈ X, the total cost, L(π, x), obtained by executing π is the lowest among all possible plans. The requirement that this holds for every initial condition is important for feedback planning.

Figure 8.2: (a) A 2D grid-planning problem. (b) A solution feedback plan.

Example 8.1 (Feedback Plan on a 2D Grid) This example uses the 2D grid model explained in Example 2.1. A robot moves on a grid, and the possible actions are up (↑), down (↓), left (←), right (→), and terminate (uT); some directions are not available from some states. A solution feedback plan is depicted in Figure 8.2. Many other possible solution plans exist. The one shown here happens to be optimal in terms of the number of steps to the goal. Some alternative feedback plans are also optimal (figure out which arrows can be changed). To apply the plan from any initial state, simply follow the arrows to the goal. In each stage, the application of the action represented by the arrow leads to the next state. The process terminates when uT is applied at the goal. ■
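To make this representation concrete, the following minimal sketch stores π as a dictionary and executes it from an arbitrary initial state. The tiny 3 × 3 grid and its plan are invented for illustration; they are not the instance drawn in Figure 8.2.

```python
# A feedback plan as a dictionary mapping each state to an action.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

# pi(x) for every state of a 3x3 grid; "T" is the termination action u_T,
# applied at the goal (2, 2).
pi = {
    (0, 0): "right", (1, 0): "right", (2, 0): "up",
    (0, 1): "right", (1, 1): "right", (2, 1): "up",
    (0, 2): "right", (1, 2): "right", (2, 2): "T",
}

def execute(x):
    """Follow the arrows from any initial state until u_T is applied."""
    history = [x]
    while pi[x] != "T":
        dx, dy = MOVES[pi[x]]
        x = (x[0] + dx, x[1] + dy)
        history.append(x)
    return history

print(execute((0, 0)))  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```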

8.2.2 Feedback Plans as Navigation Functions

It conveniently turns out that tools for computing a feedback plan were already given in Chapter 2. Methods such as Dijkstra’s algorithm and value iteration produce information as a side effect that can be used to represent a feedback plan. This section explains how this information is converted into a feedback plan. To achieve this, a feedback plan will be alternatively expressed as a potential function over the state space (recall potential functions from Section 5.4.3). The potential values are computed by planning algorithms and can be used to recover the appropriate actions during execution. In some cases, an optimal feedback plan can even be represented using potential functions.

Navigation functions Consider a (discrete) potential function, defined as φ : X → [0, ∞]. The potential function can be used to define a feedback plan through the use of a local operator, which is a function that selects the action that reduces the potential as much as possible. First, consider the case of a feasible planning problem. The potential function, φ, defines a feedback plan by selecting u through the local operator,

u* = argmin_{u ∈ U(x)} { φ(f(x, u)) },    (8.4)

which means that u* ∈ U(x) is chosen to reduce φ as much as possible. The local operator yields a kind of greedy descent of the potential. Note that the action u* may not be unique. In the continuous-space analog to this, the corresponding local operator performs a descent along the negative gradient (often referred to as gradient descent).

In the case of optimal planning, the local operator is defined as

u* = argmin_{u ∈ U(x)} { l(x, u) + φ(f(x, u)) },    (8.5)

which looks similar to the dynamic programming condition, (2.19). It becomes identical to (2.19) if φ is interpreted as the optimal cost-to-go. A simplification of (8.5) can be made if the planning problem is isotropic, which means that the cost is the same in every direction: l(x, u) = l(x, u′) for all u, u′ ∈ U(x) \ {uT}. In this case, the cost term l(x, u) does not affect the minimization in (8.5). A common example in which this assumption applies is if the cost functional counts the number of stages required to reach the goal. The costs of particular actions chosen along the way are not important. Using the isotropic property, (8.5) simplifies back to (8.4).

When is a potential function useful? Many useless potential functions can be defined that fail to reach the goal, or cause states to cycle indefinitely, and so on. The most desirable potential function is one that for any initial state causes arrival in XG, if it is reachable. This requires only a few simple properties. A potential function that satisfies these will be called a navigation function.²
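Either local operator is a one-line minimization over U(x). The following sketch assumes the problem data are available as plain Python functions and dictionaries, as in the sketch after Formulation 8.1; the names are illustrative, and ties in the argmin are broken arbitrarily.

```python
def local_operator_feasible(x, U, f, phi):
    # (8.4): choose u that minimizes phi(f(x, u)).
    return min(U(x), key=lambda u: phi[f(x, u)])

def local_operator_optimal(x, U, f, l, phi):
    # (8.5): choose u that minimizes l(x, u) + phi(f(x, u)).
    # Under an isotropic cost, l(x, u) is constant over U(x), so this
    # reduces to the feasible operator (8.4).
    return min(U(x), key=lambda u: l(x, u) + phi[f(x, u)])
```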

Suppose that the cost functional is isotropic. Let x′ = f(x, u*), which is the state reached after applying the action u* ∈ U(x) that was selected by (8.4). A potential function, φ, is called a (feasible) navigation function if

1. φ(x) = 0 for all x ∈ XG.
2. φ(x) = ∞ if and only if no point in XG is reachable from x.
3. For every reachable state, x ∈ X \ XG, the local operator produces a state x′ for which φ(x′) < φ(x).

²This term was developed for continuous configuration spaces in [541, 829]; it will be used more broadly in this book but still retains the basic idea.
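On a finite state space, these properties can be checked directly. Below is a minimal sketch, assuming φ is stored as a dictionary with float("inf") marking unreachable states; condition 2 is taken as encoded by those ∞ values rather than recomputed from reachability.

```python
INF = float("inf")

def is_navigation_function(phi, X, XG, U, f):
    """Check conditions 1 and 3 of a feasible navigation function."""
    for x in X:
        if x in XG and phi[x] != 0:
            return False                      # condition 1 violated
        if x not in XG and phi[x] < INF:
            # Condition 3: the local operator (8.4) must strictly descend.
            x_next = min((f(x, u) for u in U(x)), key=lambda y: phi[y])
            if phi[x_next] >= phi[x]:
                return False
    return True   # condition 2 is assumed encoded by phi(x) = INF
```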

Figure 8.3: The cost-to-go values serve as a navigation function.

The first condition requires the goal to have zero potential (this condition is actually not necessary but is included for convenience). The second condition requires that ∞ serves as a special indicator that the goal is not reachable from some state. The third condition means that the potential function has no local minima except at the goal. This means that the execution of the resulting feedback plan will progress without cycling and the goal region will eventually be reached.

An optimal navigation function is defined as the optimal cost-to-go, G*. This means that in addition to the three properties above, the navigation function must also satisfy the principle of optimality:

φ(x) = min_{u ∈ U(x)} { l(x, u) + φ(f(x, u)) },    (8.6)

which is just (2.18) with G* replaced by φ. See Section 15.2.1 for more on this connection.

Example 8.2 (Navigation Function on a 2D Grid) Return to the planning problem in Example 8.1. Assume that an isotropic cost model is used: l(x, u) = 1 if u ≠ uT. Figure 8.3 shows a navigation function. The numbers shown in the tiles represent φ. Verify that φ satisfies the three requirements for a navigation function.

At any state, an action is applied that reduces the potential value. This corresponds to selecting the action using (8.4). The process may be repeated from any state until XG is reached. This example clearly illustrates how a navigation function can be used as an alternative definition of a feedback plan. ■

Example 8.3 (Airport Terminal) You may have found yourself using a navigation function to find the exit after arriving in an unfamiliar airport terminal. Many terminals are tree-structured, with increasing gate numbers as the distance to the terminal exit increases. If you wish to leave the terminal, you should normally walk toward the lower numbered gates. ■

Computing navigation functions There are many ways to compute navigation functions. The cost-to-go function determined by Dijkstra’s algorithm working backward from XG yields an optimal navigation function. The third condition of a navigation function under the anisotropic case is exactly the stationary dynamic programming equation, (2.18), if the navigation function φ is defined as the optimal cost-to-go G*. It was mentioned previously that the optimal actions can be recovered using only the cost-to-go. This was actually an example of using a navigation function, and the resulting procedure could have been considered as a feedback plan.

If optimality is not important, then virtually any backward search algorithm from Section 2.2 can be used, provided that it records the distance to the goal from every reached state. The distance does not have to be optimal. It merely corresponds to the cost obtained if the current vertex in the search tree is traced back to the root vertex (or back to any vertex in XG, if there are multiple goal states).

If the planning problem does not even include a cost functional, as in Formulation 2.1, then a cost functional can be invented for the purposes of constructing a navigation function. At each x ∈ X from which XG is reachable, the number of edges in the search graph that would be traversed from x to XG can be stored as the cost. If Dijkstra’s algorithm is used to construct the navigation function, then the resulting feedback plan yields executions that are shortest in terms of the number of stages required to reach the goal.

The navigation function itself serves as the representation of the feedback plan, by recovering the actions from the local operator. Thus, a function, π : X → U, can be recovered from a navigation function, φ : X → [0, ∞]. Likewise, a navigation function, φ, can be constructed from π. Therefore, π and φ can be considered as interchangeable representations of feedback plans.
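One direction of this interchangeability is immediate to implement: apply the local operator at every state to tabulate π. A hedged sketch, reusing the dictionary conventions of the earlier sketches (the reverse direction, constructing φ from π, amounts to recording the cost accumulated while executing π from each state):

```python
def plan_from_potential(X, XG, U, f, phi, uT="T"):
    """Tabulate a feedback plan pi from a navigation function phi via (8.4)."""
    pi = {}
    for x in X:
        if x in XG:
            pi[x] = uT                                  # terminate at the goal
        elif phi[x] < float("inf"):                     # skip unreachable states
            pi[x] = min(U(x), key=lambda u: phi[f(x, u)])
    return pi
```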

8.2.3 Grid-Based Navigation Functions for Motion Planning

To consider feedback plans for continuous spaces, vector fields and other basic definitions from differential geometry will be needed. These will be covered in Section 8.3; however, before handling such complications, we first will describe how to use the ideas presented so far in Section 8.2 as a discrete approximation to feedback motion planning.

Examples 8.1 and 8.2 have already defined feedback plans and navigation functions for 2D grids that contain obstacles. Imagine that this model is used to approximate a motion planning problem for which C ⊂ R². Section 5.4.2 showed how to make a topological graph that approximates the motion planning problem with a grid of samples. The motions used in Example 8.1 correspond to the 1-neighborhood definition, (5.37). This idea was further refined in Section 7.7.1 to model approximate optimal motion planning by moving on a grid; see Formulation 7.4. By choosing the Manhattan motion model, as defined in Example 7.4, a grid with the same motions considered in Example 8.1 is produced.

WAVEFRONT PROPAGATION ALGORITHM

1. Initialize W0 = XG; i = 0.
2. Initialize Wi+1 = ∅.
3. For every x ∈ Wi, assign φ(x) = i and insert all unexplored neighbors of x into Wi+1.
4. If Wi+1 = ∅, then terminate; otherwise, let i := i + 1 and go to Step 2.

Figure 8.4: The wavefront propagation algorithm is a specialized version of Dijkstra’s algorithm that optimizes the number of stages to reach the goal.

To construct a navigation function that may be useful in mobile robotics, a high-resolution (e.g., 50 to 100 points per axis) grid is usually required. In Section 5.4.2, only a few points per axis were needed because feedback was not assumed. It was possible in some instances to find a collision-free path by investigating only a few points per axis. During the execution of a feedback plan, it is assumed that the future states of the robot are not necessarily predictable. Wherever the robot may end up, the navigation function in combination with the local operator must produce the appropriate action. If the current state (or configuration) is approximated by a grid, then it is important to reduce the approximation error as much as possible. This is accomplished by setting the grid resolution high. In the feedback case, the grid can be viewed as “covering” the whole configuration space, whereas in Section 5.4.2 the grid only represented a topological graph of paths that cut across the space.3

Wavefront propagation algorithms Once the approximation has been made, any of the methods discussed in Section 8.2.2 can be used to compute a navigation function. An optimal navigation function can be easily computed using Dijkstra’s algorithm from the goal. If each motion has unit cost, then a useful simplification can be made. Figure 8.4 describes a wavefront propagation algorithm that computes an optimal navigation function. It can be considered as a special case of Dijkstra’s algorithm that avoids explicit construction of the priority queue. In Dijkstra’s algorithm, the cost of the smallest element in the queue is monotonically nondecreasing during the execution of the algorithm. In the case of each motion having unit cost, there will be many states in the queue that have the same cost. Dijkstra’s algorithm could remove in parallel all elements that have the same, smallest cost. Suppose the common, smallest cost value is i. These states are organized into a wavefront, Wi. The initial wavefront is W0, which represents the states in XG. The algorithm can immediately assign an optimal cost-to-go value of 1 to every state that can be reached in one stage from any state in W0. These must be optimal because no smaller cost value is possible. The states that receive cost 1 can be organized into the wavefront W1. The unexplored neighbors of W1 are assigned cost 2, which also must be optimal. This process repeats inductively from i to i + 1 until all reachable states have been reached. In the end, the optimal cost-to-go is computed in O(n) time, in which n is the number of reachable grid states. For any states that were not reached, the value φ(x) = ∞ can be assigned. The navigation function shown in Figure 8.3 can actually be computed using the wavefront propagation algorithm.

³Difficulty in distinguishing between these two caused researchers for many years to believe that grids yield terrible performance for the open-loop path planning problems of Chapter 5. This was mainly because it was assumed that a high-resolution grid was necessary. For many problems, however, they could terminate early after only considering a few points per axis.
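For concreteness, here is a minimal sketch of the algorithm of Figure 8.4 on a 4-connected grid, with the free space given as a set of cells; the names are illustrative. Each reachable cell enters a wavefront exactly once, which is why the running time is O(n).

```python
def wavefront(free_cells, XG):
    """Return phi: optimal number of stages to reach XG, for reachable cells."""
    phi = {x: 0 for x in XG}               # Step 1: W0 = XG with phi = 0
    W, i = set(XG), 0
    while W:                               # Step 4: stop when the wave dies out
        W_next = set()
        for (r, c) in W:                   # Steps 2-3: expand unexplored neighbors
            for y in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if y in free_cells and y not in phi:
                    phi[y] = i + 1         # optimal, since every motion has unit cost
                    W_next.add(y)
        W, i = W_next, i + 1
    return phi                             # cells absent from phi are unreachable
```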

Maximum clearance One problem that typically arises in mobile robotics is that optimal motion plans bring robots too close to obstacles. Recall from Section 6.2.4 that the shortest Euclidean paths for motion planning in a polygonal environment must be allowed to touch obstacle vertices. This motivated the maximum clearance roadmap, which was covered in Section 6.2.3. A grid-based approximate version of the maximum clearance roadmap can be made. Furthermore, a navigation function can be defined that guides the robot onto the roadmap, then travels along the roadmap, and finally deposits the robot at a specified goal. In [588], the resulting navigation function is called NF2.

Assume that there is a single goal state, xG ∈ X. The computation of a maximum clearance navigation function proceeds as follows (a simplified code sketch appears after the steps):

        1. Instead of XG, assign W0 to be the set of all states from which motion in at least one direction is blocked. These are the states on the boundary of the discretized collision-free space.
        2. Perform wavefront iterations that propagate costs in waves outward from the obstacle boundaries.
        3. As the wavefronts propagate, they will meet approximately at the location of the maximum clearance roadmap for the original, continuous problem. Mark any state at which two wavefront points arrive from opposing directions as a skeleton state. It may be the case that the wavefronts simply touch each other, rather than arriving at a common state; in this case, one of the two touching states is chosen as the skeleton state. Let S denote the set of all skeleton states.
        4. After the wavefront propagation ends, connect xG to the skeleton by inserting xG and all states along the path to the skeleton into S. This path can be found using any search algorithm.
        5. Compute a navigation function φ1 over S by treating all other states as if they were obstacles and using the wavefront propagation algorithm. This navigation function guides any point in S to the goal.
        6. Treat S as a goal region and compute a navigation function φ2 using the wavefront propagation algorithm. This navigation function guides the state to the nearest point on the skeleton.
        7. Combine φ1 and φ2 as follows to obtain φ. For every x ∈ S, let φ(x) = φ1(x). For every remaining state, the value φ(x) = φ1(x′) + φ2(x) is assigned, in which x′ is the nearest state to x such that x′ ∈ S. The state x′ can easily be recorded while φ2 is computed.
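The following is a simplified sketch of the whole procedure on a 4-connected grid. The skeleton test here is a heuristic: it marks cells where waves seeded from different connected boundary components meet, and it therefore misses skeleton branches induced by a single boundary component; it also assumes the skeleton is nonempty and reachable from xG. The actual NF2 construction of [588] is more careful.

```python
from collections import deque

def nbrs(x):
    r, c = x
    return [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]

def bfs_phi(cells, sources):
    """Wavefront distances over `cells`, expanding outward from `sources`."""
    phi = {x: 0 for x in sources}
    q = deque(sources)
    while q:
        x = q.popleft()
        for y in nbrs(x):
            if y in cells and y not in phi:
                phi[y] = phi[x] + 1
                q.append(y)
    return phi

def nf2(free, xG):
    # Step 1: W0 = free cells with motion blocked in at least one direction.
    boundary = {x for x in free if any(y not in free for y in nbrs(x))}
    # Label each connected boundary component; these act as wave sources.
    label, next_label = {}, 0
    for b in boundary:
        if b in label:
            continue
        label[b] = next_label
        q = deque([b])
        while q:
            x = q.popleft()
            for y in nbrs(x):
                if y in boundary and y not in label:
                    label[y] = next_label
                    q.append(y)
        next_label += 1
    # Steps 2-3: propagate the labeled waves outward; mark a skeleton state
    # wherever two waves with different labels meet.
    dist = {x: 0 for x in boundary}
    q = deque(boundary)
    S = set()
    while q:
        x = q.popleft()
        for y in nbrs(x):
            if y not in free:
                continue
            if y not in dist:
                dist[y] = dist[x] + 1
                label[y] = label[x]
                q.append(y)
            elif label[y] != label[x]:
                S.add(y)
    # Step 4: connect xG to the skeleton by descending the distance to S,
    # inserting the path into S.
    d_to_S = bfs_phi(free, S)
    x = xG
    while x not in S:
        S.add(x)
        x = min((y for y in nbrs(x) if y in d_to_S), key=lambda y: d_to_S[y])
    # Step 5: phi1 guides along the skeleton (only S is traversable) to xG.
    phi1 = bfs_phi(S, {xG})
    # Step 6: phi2 guides any state to its nearest skeleton cell; record the
    # nearest cell x' (here `near`) while the wave propagates.
    phi2 = {x: 0 for x in S}
    near = {x: x for x in S}
    q = deque(S)
    while q:
        x = q.popleft()
        for y in nbrs(x):
            if y in free and y not in phi2:
                phi2[y] = phi2[x] + 1
                near[y] = near[x]
                q.append(y)
    # Step 7: phi(x) = phi1(x) on S (phi2 is 0 there), else phi1(x') + phi2(x).
    return {x: phi1[near[x]] + phi2[x] for x in phi2 if near[x] in phi1}
```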

If Cfree is multiply connected, then there may be multiple ways to reach xG by traveling around different obstacles (the paths are not homotopic). The method described above does not take into account the problem that one route may have a tighter clearance than another. The given approach only optimizes the distance traveled along the skeleton; it does not, however, maximize the nearest approach to an obstacle, if there are multiple routes.

Dial’s algorithm Now consider generalizing the wavefront propagation idea. Wavefront propagation can be applied to any discrete planning problem if l(x, u) = 1 for any x ∈ X and u ∈ U(x) (except u = uT). It is most useful when the transition graph is sparse (imagine representing the transition graph using an adjacency matrix). The grid problem is a perfect example where this becomes important. More generally, if the cost terms assume integer values, then Dial’s algorithm [272] results, which is a generalization of wavefront propagation, and a specialization of Dijkstra’s algorithm. The idea is that the priority queue can be avoided by assigning the alive vertices to buckets that correspond to different possible cost-to-go values. In the wavefront propagation case, there are never more than two buckets needed at a time. Dial’s algorithm allows all states in the smallest cost bucket to be processed in parallel. The scheme was enhanced in [939] to yield a linear-time algorithm.
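A minimal sketch of Dial's idea follows: the priority queue is replaced by an array of buckets indexed by integer cost-to-go, filled backward from the goal. The predecessor interface and the generous bucket sizing are assumptions for illustration; practical implementations use a circular array of only C + 1 buckets, where C bounds the stage cost.

```python
def dial(X, XG, preds, C):
    """preds(x) yields (x_prev, u, cost) triples with f(x_prev, u) = x and
    cost = l(x_prev, u), a positive integer bounded by C."""
    INF = float("inf")
    phi = {x: INF for x in X}
    buckets = [set() for _ in range(C * len(X) + 1)]  # generous upper bound
    for g in XG:
        phi[g] = 0
        buckets[0].add(g)
    for i, bucket in enumerate(buckets):   # process costs in increasing order
        for x in bucket:                   # all states of equal cost: one "wave"
            if phi[x] < i:
                continue                   # stale entry; x was improved earlier
            for (xp, u, cost) in preds(x):
                if i + cost < phi[xp]:
                    phi[xp] = i + cost
                    buckets[i + cost].add(xp)
    return phi
```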

Other extensions Several ideas from this section can be generalized to produce other navigation functions. One disadvantage of the methods discussed so far is that undesirable staircase motions (as shown in Figure 7.40) are produced. If the 2-neighborhood, as defined in (5.38), is used to define the action spaces, then the motions will generally be shorter. Dial’s algorithm can be applied to efficiently compute an optimal navigation function in this case.

A grid approximation can be made to higher dimensional configuration spaces. Since a high resolution is needed, however, it is practical only for a few dimensions (e.g., 3 or 4). If the 1-neighborhood is used, then wavefront propagation can be easily applied to compute navigation functions. Dial’s algorithm can be adapted for general k-neighborhoods.

Constructing navigation functions over grids may provide a practical solution in many applications. In other cases it may be unacceptable that staircase motions occur. In many cases, it may not even be possible to compute the navigation function quickly enough. Factors that influence this problem are 1) very high accuracy, and hence a high-resolution grid, may be necessary; 2) the dimension of the configuration space may be high; and 3) the environment may be frequently changing, and a real-time response is required. To address these issues, it is appealing to abandon grid approximations. This will require defining potential functions and velocities directly on the configuration space. Section 8.3 presents the background mathematical concepts to make this transition.
