Sampling-Based Methods for Continuous Spaces
The methods in Section 8.4 can be considered as the feedback-case analogs to the combinatorial methods of Chapter 6. Although such methods provide elegant solutions to the problem, the issue arises once again that they are either limited to lower dimensional problems or problems that exhibit some special structure. This motivates the introduction of sampling-based methods. This section presents the feedback-case analog to Chapter 5.
Figure 8.14: A navigation function and corresponding vector field can be designed as a composition of funnels.
- Computing a Composition of Funnels
Mason introduced the concept of a funnel as a metaphor for motions that converge to the same small region of the state space, regardless of the initial position [679]. As grains of sand in a funnel, they follow the slope of the funnel until they reach the opening at the bottom. A navigation function can be imagined as a funnel that guides the state into the goal. For example, the cost-to-go function depicted in Figure 8.13d can be considered as a complicated funnel that sends each piece of sand along an optimal path to the goal.
Rather than designing a single funnel, consider decomposing the state space into a collection of simple, overlapping regions. Over each region, a funnel can be designed that leads the state into another funnel; see Figure 8.14. As an example, the approach in [162] places a Lyapunov function (such functions are covered in Section 15.1.2) over each funnel to ensure convergence to the next funnel. A feedback plan can be constructed by composing several funnels. Starting from some initial state in X, a sequence of funnels is visited until the goal is reached. Each funnel essentially solves the subgoal of reaching the next funnel. Eventually, a funnel is reached that contains the goal, and a navigation function on this funnel causes the goal to be reached. In the context of sensing uncertainty, for which the funnel metaphor was developed, the composition of funnels becomes the preimage planning framework [659], which is covered in Section 12.5.1. In this section, however, it is assumed that the current state is always known.
- An approximate cover
Figure 8.15 illustrates the notion of an approximate cover, which will be used to represent the funnel domains. Let X˜ denote a subset of a state space X. A cover of X˜ is a collection O of sets for which

Figure 8.15: An approximate cover is shown. Every point of X˜ is contained in at least one neighborhood, and X˜ is a subset of X.

- O ⊆ X for each O ∈ O.
- X˜ is a subset of the union of all sets in the cover:

X˜ ⊆ ⋃_{O∈O} O. (8.50)
Let each O ∈ O be called a neighborhood. The notion of a cover was actually used in Section 8.3.2 to define a smooth manifold using a cover of coordinate neighborhoods.
In general, a cover allows the following:
- Any number of neighborhoods may overlap (have nonempty intersection).
- Any neighborhood may contain points that lie outside of X˜ .
A cell decomposition, which was introduced in Section 6.3.1, is a special kind
of cover for which the neighborhoods form a partition of X˜ , and they must fit together nicely (recall Figure 6.15).
So far, no constraints have been placed on the neighborhoods. They should be chosen in practice to greatly simplify the design of a navigation function over each one. For the original motion planning problem, cell decompositions were designed to make the determination of a collision-free path trivial in each cell. The same idea applies here, except that we now want to construct a feedback plan. Therefore, it is usually assumed that the cells have a simple shape.
A cover is called approximate if X˜ is a strict subset of X. Ideally, we would like to develop an exact cover, which implies that X˜ = X and each neighborhood has some nice property, such as being convex. Developing such covers is possible in practice for state spaces that are either low-dimensional or exhibit some special structure. This was observed for the cell decomposition methods of Chapter 6.
Consider constructing an approximate cover for X. The goal should be to cover as much of X as possible. This means that µ(X \ X˜) should be made as small as possible, in which µ denotes Lebesgue measure, as defined in Section 5.1.3. It is also desirable to ensure that X˜ preserves the connectivity of X. In other words, if a path between two points exists in X, then it should also exist in X˜.

Figure 8.16: A transition from O to O′ is caused by a vector field on O for which all integral curves lead into O ∩ O′.
- Defining a feedback plan over a cover
The ideas from Section 8.4.2 can be adapted to define a feedback plan over X˜ using a cover. Let Xˇ denote a discrete state space in which each superstate is a neighborhood. Most of the components of the associated discrete planning problem are the same as in Section 8.4.2. The only difference is in the definition of superactions, because neighborhoods can overlap in a cover. For each neighborhood O ∈ O, a superaction exists for each other neighborhood O′ ∈ O such that O ∩ O′ ≠ ∅ (usually, their interiors overlap to yield int(O) ∩ int(O′) ≠ ∅).
Note that in the case of a cell decomposition, this produces no superactions because it is a partition. To follow the metaphor of composing funnels, the domains of some funnels should overlap, as shown in Figure 8.14. A transition from one neighborhood, O, to another, O′, is obtained by defining a vector field on O that sends all states from O \ O′ into O ∩ O′; see Figure 8.16. Once O′ is reached, the vector field of O is no longer followed; instead, the vector field of O′ is used. Using the vector field of O′, a transition may be applied to reach another neighborhood. Note that the jump from the vector field of O to that of O′ may cause the feedback plan to be a discontinuous vector field on X˜. If the cover is designed so that O ∩ O′ is large (if they intersect), then gradual transitions may be possible by blending the vector fields from O and O′.
Once the discrete problem has been defined, a discrete feedback plan can be computed over Xˇ, as defined in Section 8.2. This is converted into a feedback plan over X by defining a vector field on each neighborhood that causes the appropriate transitions. Each xˇ ∈ Xˇ can be interpreted both as a superstate and as a neighborhood. For each xˇ, the discrete feedback plan produces a superaction uˇ = π(xˇ), which yields a new neighborhood xˇ′. The vector field over xˇ = O is then designed to send all states into xˇ′ = O′.
If desired, a navigation function φ over X can even be derived from a navigation function, φˇ, over Xˇ. Suppose that φˇ is constructed so that φˇ(xˇ) is distinct for every xˇ ∈ Xˇ. Any navigation function can be easily transformed to satisfy this constraint (because Xˇ is finite). Let φO denote a navigation function over some O ∈ O. Assume that XG is a point, xG (extensions can be made to more general cases). For every neighborhood O ∈ O such that xG ∉ O, φO is defined so that performing gradient descent leads into the overlapping neighborhood for which φˇ(xˇ) is smallest. If O contains xG, the navigation function φO simply guides the state to xG.
The navigation functions over each O ∈ O can be easily pieced together to yield a navigation function over all of X. In places where multiple neighborhoods overlap, φ is defined to be the navigation function associated with the neighborhood for which φˇ(xˇ) is smallest. This can be achieved by adding a large constant to each φO. Let c denote a constant for which φO(x) < c over all O ∈ O and x ∈ O (it is assumed that each φO is bounded). Suppose that φˇ assumes only integer values. Let O(x) denote the set of all O ∈ O such that x ∈ O. The navigation function over X is defined as

φ(x) = min_{O∈O(x)} { φO(x) + c φˇ(O) }. (8.51)
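The composition rule (8.51) is straightforward to implement once the cover and the discrete navigation function are available. The following is a minimal sketch, assuming ball neighborhoods and a caller-supplied local navigation function; all names (`phi`, `phi_local`, `phi_check`) are illustrative, not from [983].

```python
import math

def phi(x, neighborhoods, phi_check, phi_local, c):
    """Evaluate the composed navigation function at x via (8.51).

    neighborhoods: list of (center, radius) balls forming the cover.
    phi_check:     list of distinct integer values, one per neighborhood
                   (the discrete navigation function over the superstates).
    phi_local:     phi_local(i, x) -> bounded local navigation value on ball i.
    c:             constant exceeding every local value, so the discrete
                   level dominates the local one.
    """
    # O(x): the neighborhoods that contain x.
    containing = [i for i, (ctr, r) in enumerate(neighborhoods)
                  if math.dist(x, ctr) < r]
    if not containing:
        raise ValueError("x lies outside the cover (not in X~)")
    # Minimize phi_O(x) + c * phi_check(O) over neighborhoods containing x.
    return min(phi_local(i, x) + c * phi_check[i] for i in containing)
```

In an overlap region, the `c * phi_check[i]` term guarantees that the neighborhood closer to the goal (in the discrete plan) always wins the minimization, exactly as the text prescribes.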
- A sampling-based approach
There are numerous alternative ways to construct a cover. To illustrate the ideas, an approach called the sampling-based neighborhood graph is presented here [983]. Suppose that X = Cfree, which is a subset of some configuration space. As introduced in Section 5.4, let α be a dense, infinite sequence of samples in X. Assume that a collision detection algorithm is available that returns the distance, (5.28), between the robot and obstacles in the world. Such algorithms were described in Section 5.3.
An incremental algorithm is given in Figure 8.17. Initially, O is empty. In each iteration, if α(i) ∈ Cfree and it is not already contained in some neighborhood, then a new neighborhood is added to O. The two main concerns are 1) how to define a new neighborhood, O, such that O ⊂ Cfree, and 2) when to terminate. At any given time, the cover is approximate. The union of all neighborhoods is X˜, which is a strict subset of X. In comparison to Figure 8.15, the cover is a special case in which the neighborhoods do not extend beyond X˜.
Defining new neighborhoods For defining new neighborhoods, it is important to keep them simple because during execution, the neighborhoods that contain the state x must be determined quickly. Suppose that all neighborhoods are open balls:

B(x, r) = {x′ ∈ X | ρ(x, x′) < r}, (8.52)

in which ρ is the metric on C. There are efficient algorithms for determining whether x ∈ O for some O ∈ O, assuming all of the neighborhoods are balls [700].
INCREMENTAL COVER CONSTRUCTION
- Initialize O = ∅ and i = 1.
- Let x = α(i), and let d be the distance returned by the collision detection algorithm applied at x.
- If d > 0 (which implies that x ∈ Cfree) and x ∉ O for all O ∈ O, then insert a new neighborhood, On, into O. The neighborhood size and shape are determined from x and d.
- If the termination condition is not satisfied, then let i := i + 1, and go to Step 2.
- Remove any neighborhoods from O that are contained entirely inside of another neighborhood.
Figure 8.17: The cover is incrementally extended by adding new neighborhoods that are guaranteed to be collision-free.
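For a point robot with ball neighborhoods, the algorithm of Figure 8.17 can be sketched in a few lines. The clearance oracle below (a single disc obstacle) and the consecutive-failure termination rule are illustrative stand-ins for a real collision-detection module and for the statistical termination tests of [983].

```python
import math
import random

def dist_to_obstacles(x, obstacle_center=(0.5, 0.5), obstacle_radius=0.2):
    """Hypothetical clearance oracle: point robot, one disc obstacle."""
    return max(0.0, math.dist(x, obstacle_center) - obstacle_radius)

def build_cover(samples, max_new_failures=200):
    """Add a ball B(x, d) around each free, uncovered sample; stop after
    many consecutive samples fail to create a new neighborhood."""
    cover = []          # list of (center, radius) balls
    failures = 0
    for x in samples:
        d = dist_to_obstacles(x)
        covered = any(math.dist(x, c) < r for c, r in cover)
        if d > 0.0 and not covered:
            cover.append((x, d))   # B(x, d) is guaranteed collision-free
            failures = 0
        else:
            failures += 1
            if failures >= max_new_failures:
                break              # estimated coverage is high enough
    # Cleanup step: drop balls contained entirely inside another ball.
    cover = [(c, r) for i, (c, r) in enumerate(cover)
             if not any(j != i and math.dist(c, oc) + r <= orr
                        for j, (oc, orr) in enumerate(cover))]
    return cover
```

The containment test `dist(c, oc) + r <= orr` is the exact condition for one ball to lie inside another; a production implementation would replace the linear scans with the Kd-tree queries mentioned below.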
In practice, methods based on Kd-trees yield good performance [47, 52] (recall Section 5.5.2). A new ball, B(x, r), can be constructed in Step 3 for x = α(i), but what radius can be assigned? For a point robot that translates in R² or R³, the Hausdorff distance d between the robot and obstacles in W is precisely the distance to Cobs from α(i). This implies that we can set r = d, and B(x, r) is guaranteed to be collision-free.
In a general configuration space, it is possible to find a value of r such that B(x, r) ⊆ Cfree, but in general r < d. This issue arose in Section 5.3.4 for checking path segments. The transformations of Sections 3.2 and 3.3 become important in the determination of r. For illustrative purposes, suppose that C = R² × S¹, which corresponds to a rigid robot, A, that can translate and rotate in W = R². Each point a ∈ A is transformed using (3.35). Now imagine starting with some configuration q = (x, y, θ) and perturbing each coordinate by some ∆x, ∆y, and ∆θ. What is the maximum distance that a point on A could travel? Translation affects all points on A the same way, but rotation affects points differently. Recall Figure 5.12 from Section 5.3.4. Let ar ∈ A denote the point that is furthest from the origin (0, 0). Let r denote the distance from ar to the origin. If the rotation is perturbed by some small amount, ∆θ, then the displacement of any a ∈ A is no more than r∆θ. If all three configuration parameters are perturbed, then

(∆x)² + (∆y)² + (r∆θ)² < d² (8.53)
is the constraint that must be satisfied to ensure that the resulting ball is contained in Cfree. This is actually the equation of a solid ellipsoid, which becomes a ball if r = 1. It can be made into a ball by reparameterizing SE(2) so that ∆θ has the same effect as ∆x and ∆y. A transformation h : θ ↦ rθ maps θ into a new domain Z = [0, 2πr). In this new space, the equation of the ball is

(∆x)² + (∆y)² + (∆z)² < d², (8.54)

in which ∆z represents the change in z ∈ Z. The reparameterized version of (3.35) is

T = ⎡ cos(θ/r)  −sin(θ/r)  xt ⎤
    ⎢ sin(θ/r)   cos(θ/r)  yt ⎥ . (8.55)
    ⎣    0           0      1 ⎦
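The lift can be sketched numerically. The triangle-shaped robot below is illustrative; r is its farthest-point distance from the origin, and the safety check evaluates (8.53) directly, which is equivalent to membership in the Euclidean ball (8.54) after the substitution z = rθ.

```python
import math

# Illustrative rigid body: body-frame vertices of a small triangle.
robot = [(0.2, 0.0), (-0.1, 0.1), (-0.1, -0.1)]
# r: distance from the origin to the farthest robot point (the point a_r).
r = max(math.hypot(px, py) for px, py in robot)

def to_ball_coords(q):
    """Map q = (x, y, theta) to (x, y, z) with z = r * theta, so that the
    safe region around q becomes a Euclidean ball, per (8.54)."""
    x, y, theta = q
    return (x, y, r * theta)

def perturbation_is_safe(dq, d):
    """Check (8.53): a perturbation (dx, dy, dtheta) displaces every robot
    point by less than the clearance d."""
    dx, dy, dtheta = dq
    return dx * dx + dy * dy + (r * dtheta) ** 2 < d * d
```

Since rotation moves a point at distance at most r by at most r·∆θ, any perturbation passing this test keeps the whole body inside the clearance ball of radius d.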
For a 3D rigid body, similar reparameterizations can be made to Euler angles or quaternions to generate six-dimensional balls. Extensions can be made to chains of bodies [983]. One of the main difficulties, however, is that the balls are not the largest possible. In higher dimensions the problem becomes worse because numerous balls are needed, and the radii constructed as described above tend to be much smaller than what is possible. The number of balls can be reduced by also allowing axis-aligned cylinders, but it still remains difficult to construct a cover
over a large fraction of Cfree in more than six dimensions.
Termination The sampling-based planning algorithms in Chapter 5 were designed to terminate upon finding a solution path. In the current setting, termination is complicated by the fact that we are interested in solutions from all initial configurations. Since α is dense, the volume of uncovered points in Cfree tends to zero. After some finite number of iterations, it would be nice to measure the quality of the approximation and then terminate when the desired quality is achieved. This was also possible with the visibility sampling-based roadmap in Section 5.6.2. Using random samples, an estimate of the fraction of Cfree that is covered can be obtained by recording the percentage of failures in obtaining a sample in Cfree that is outside of the cover. For example, if a new neighborhood is created only once in 1000 iterations, then it can be estimated that 99.9 percent of Cfree is covered. High-probability bounds can also be determined. Termination conditions are given in [983] that ensure with probability greater than Pc that at least a fraction α ∈ (0, 1) of Cfree has been covered. The constants Pc and α are given as parameters to the algorithm, and it will terminate when the condition has been satisfied using rigorous statistical tests. If deterministic sampling is used, then termination can be made to occur based on the dispersion, which indicates the largest ball in Cfree that does not contain the center of another neighborhood. One problem with volume-based criteria, such as those suggested here, is that there is no way to ensure that the cover preserves the connectivity of Cfree. If two portions of Cfree are connected by a narrow passage, the cover may miss a neighborhood that has very small volume yet is needed to connect the two portions.
Example 8.18 (2D Example of Computed Funnels) Figure 8.18 shows a 2D example that was computed using random samples and the algorithm in Figure 8.17.

Figure 8.18: (a) An approximate cover for a 2D configuration space. (b) Level sets of a navigation function.

Note that once a cover is computed, it can be used to rapidly compute different navigation functions and vector fields for various goals. This example is mainly for illustrative purposes. For the case of a polygonal environment, constructing covers based on convex polygons would be more efficient.
- Dynamic Programming with Interpolation
This section concludes Part II by solving the motion planning problem with value iteration, which was introduced in Section 2.3. It has already been applied to obtain discrete feedback plans in Section 8.2. It will now be adapted to continuous spaces by allowing interpolation first over a continuous state space and then by additionally allowing interpolation over a continuous action space. This yields a numerical approach to computing optimal navigation functions and feedback plans for motion planning. The focus will remain on backward value iteration; however, the interpolation concepts may also be applied in the forward direction. The approach here views optimal feedback motion planning as a discrete-time optimal control problem [28, 84, 151, 583].
- Using interpolation for continuous state spaces
Consider a problem formulation that is identical to Formulation 8.1 except that X is allowed to be continuous. Assume that X is bounded, and assume for now that the action space, U(x), is finite for all x ∈ X. Backward value iteration can be applied. The dynamic programming arguments and derivation are identical to
Figure 8.19: Even though xk is a sample point, the next state, xk+1, may land between sample points. For each uk ∈ U(xk), interpolation may be needed for the resulting next state, xk+1 = f(xk, uk).
those in Section 2.3. The resulting recurrence is identical to (2.11) and is repeated here for convenience:

G∗k(xk) = min_{uk∈U(xk)} { l(xk, uk) + G∗k+1(xk+1) }. (8.56)
The only difficulty is that G∗k(xk) cannot be stored for every xk ∈ X because X is continuous. There are two general approaches. One is to approximate G∗k using a parametric family of surfaces, such as polynomials or nonlinear basis functions derived from neural networks [97]. The other is to store G∗k only over a finite set of sample points and use interpolation to obtain its value at all other points [582, 583].

Suppose that a finite set S ⊂ X of samples is used to represent cost-to-go functions over X. The evaluation of (8.56) using interpolation is depicted in Figure 8.19. In general, the samples should be chosen to reduce the dispersion (defined in Section 5.2.3) as much as possible. This prevents attempts to approximate the cost-to-go function on large areas that contain no sample points. The rate of convergence ultimately depends on the dispersion [92] (in combination with Lipschitz conditions on the state transition equation and the cost functional). To simplify notation and some other issues, assume that S is a grid of regularly spaced points in Rⁿ.
First, consider the case in which X = [0, 1] ⊂ R. Let S = {s0, s1, . . . , sr}, in which si = i/r. For example, if r = 3, then S = {0, 1/3, 2/3, 1}. Note that this always yields points on the boundary of X, which ensures that for any point in (0, 1) there are samples both above and below it. Let i be the largest integer such that si < x. This implies that si+1 > x. The samples si and si+1 are called interpolation neighbors of x.

The value of G∗k+1 in (8.56) at any x ∈ [0, 1] can be obtained via linear interpolation as

G∗k+1(x) ≈ αG∗k+1(si) + (1 − α)G∗k+1(si+1), (8.57)

in which the coefficient α ∈ [0, 1] is computed as

α = 1 − (x − si)/(1/r) = 1 − r(x − si), (8.58)

the fraction of the grid spacing 1/r that remains between x and si+1. If x = si, then α = 1, and (8.57) reduces to G∗k+1(si), as expected. If x = si+1, then α = 0, and (8.57) reduces to G∗k+1(si+1). At all points in between, (8.57) blends the cost-to-go values at si and si+1 using α to provide the appropriate weights.
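A minimal sketch of (8.57) and (8.58), assuming the cost-to-go values are stored in a Python list over the grid si = i/r:

```python
def interpolate_1d(G, x, r):
    """Linearly interpolate G*_{k+1}(x) from samples s_i = i/r on [0, 1].

    G is a list of length r + 1 with G[i] = G*_{k+1}(s_i).
    """
    i = min(int(x * r), r - 1)      # largest i with s_i <= x (clamped at x = 1)
    s_i = i / r
    alpha = 1.0 - r * (x - s_i)     # (8.58): alpha = 1 at s_i, 0 at s_{i+1}
    return alpha * G[i] + (1.0 - alpha) * G[i + 1]
```

The clamp at `r - 1` handles the right endpoint x = 1, where there is no sample above x; there the formula degenerates gracefully to G[r].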
Figure 8.20: Barycentric subdivision can be used to partition each cube into simplexes, which allows interpolation to be performed in O(n lg n) time, instead of O(2ⁿ). The cases n = 2 and n = 3 are shown.
The interpolation idea can be naturally extended to multiple dimensions. Let X be a bounded subset of Rⁿ. Let S represent an n-dimensional grid of points in Rⁿ. Each sample in S is denoted by s(i1, i2, . . . , in). For some x ∈ X, there are 2ⁿ interpolation neighbors that “surround” it. These are the corners of an n-dimensional cube that contains x. Let x = (x1, . . . , xn). Let ij denote the largest integer for which the jth coordinate of s(i1, i2, . . . , in) is less than xj. The 2ⁿ samples are all those for which either ij or ij + 1 appears in the expression s(·, ·, . . . , ·), for each j ∈ {1, . . . , n}. This requires that samples exist in S for all of these cases. Note that X may be a complicated subset of Rⁿ, provided that for any x ∈ X, all of the required 2ⁿ interpolation neighbors are in S. Using the 2ⁿ interpolation neighbors, the value of G∗k+1 in (8.56) on any x ∈ X can be obtained via multi-linear interpolation. In the case of n = 2, this is expressed as
G∗k+1(x) ≈ α1α2 G∗k+1(s(i1, i2)) +
           α1(1 − α2) G∗k+1(s(i1, i2 + 1)) +
           (1 − α1)α2 G∗k+1(s(i1 + 1, i2)) +
           (1 − α1)(1 − α2) G∗k+1(s(i1 + 1, i2 + 1)), (8.59)
in which α1 and α2 are defined similarly to α in (8.58) but are based on distances along the x1 and x2 directions, respectively. The expressions for multi-linear interpolation in higher dimensions are similar but are more cumbersome to express. Higher order interpolation, such as quadratic interpolation, may alternatively be used [583].
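For n = 2, (8.59) can be sketched directly, assuming unit grid spacing and the samples stored as a list of lists (so the α values are simply one minus the fractional parts of the coordinates):

```python
def interpolate_2d(G, x1, x2):
    """Bilinear interpolation of G*_{k+1} at (x1, x2), per (8.59).

    G[i][j] holds the value at grid point s(i, j); unit spacing is assumed,
    and (x1, x2) must lie in the interior cube range of the grid.
    """
    i1, i2 = int(x1), int(x2)                      # lower corner of the cube
    a1 = 1.0 - (x1 - i1)                           # alpha_1, as in (8.58)
    a2 = 1.0 - (x2 - i2)                           # alpha_2
    return (a1 * a2 * G[i1][i2]
            + a1 * (1 - a2) * G[i1][i2 + 1]
            + (1 - a1) * a2 * G[i1 + 1][i2]
            + (1 - a1) * (1 - a2) * G[i1 + 1][i2 + 1])
```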
Unfortunately, the number of interpolation neighbors grows exponentially with the dimension, n. Instead of using all 2ⁿ interpolation neighbors, one improvement is to decompose the cube defined by the 2ⁿ samples into simplexes. Each simplex
Figure 8.21: (a) An interpolation region, R(S), is shown for a set of sample points, S. (b) The interpolation region that arises due to obstacles. (c) The interpolation region for goal points must not be empty.
has only n + 1 samples as its vertices. Only the vertices of the simplex that contains x are declared to be the interpolation neighbors of x; this reduces the cost of evaluating G∗k+1(x) to O(n) time. The problem, however, is that determining the simplex that contains x may be a challenging point-location problem (a common problem in computational geometry [264]). If barycentric subdivision is used to decompose the cube using the midpoints of all faces, then the point-location problem can be solved in O(n lg n) time [263, 607, 721], which is an improvement over the O(2ⁿ) scheme described above. Examples of this decomposition are shown for two and three dimensions in Figure 8.20. This is sometimes called the Coxeter-Freudenthal-Kuhn triangulation. Even though n is not too large due to practical performance considerations (typically, n ≤ 6), substantial savings occur in implementations, even for n = 3.
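A sketch of simplex-based interpolation on the Coxeter-Freudenthal-Kuhn triangulation, assuming samples stored in a dictionary keyed by integer grid tuples. Sorting the fractional coordinates of x (the O(n lg n) step) simultaneously locates the containing simplex and yields the interpolation weights; the weights telescope to sum to one, and the scheme reproduces any linear function exactly.

```python
def simplex_interpolate(G, x):
    """Interpolate using only the n + 1 vertices of the Kuhn simplex
    containing x, instead of all 2^n cube corners.

    G maps integer grid tuples to stored cost-to-go values.
    """
    n = len(x)
    base = [int(c) for c in x]                  # lower corner of the cube
    frac = [c - b for c, b in zip(x, base)]     # fractional coordinates
    order = sorted(range(n), key=lambda j: -frac[j])   # descending fractions
    # Walk from the lower corner, raising one coordinate at a time; the
    # visited points are the vertices of the containing simplex.
    verts = [tuple(base)]
    v = list(base)
    for j in order:
        v[j] += 1
        verts.append(tuple(v))
    fs = [frac[j] for j in order]
    weights = ([1.0 - fs[0]]
               + [fs[k] - fs[k + 1] for k in range(n - 1)]
               + [fs[-1]])
    return sum(w * G[vtx] for w, vtx in zip(weights, verts))
```

This is a sketch for interior points only; near the boundary of X, the raised vertices must still exist in S, matching the requirement stated above for interpolation neighbors.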
It will be convenient to refer directly to the set of all points in X for which all required interpolation neighbors exist. For any finite set S ⊆ X of sample points, let the interpolation region R(S) be the set of all x ∈ X \ S for which G∗(x) can be computed by interpolation. This means that x ∈ R(S) if and only if all interpolation neighbors of x lie in S. Figure 8.21a shows an example. Note that some sample points may not contribute any points to R(S). If a grid of samples is used to approximate G∗, then the volume of X \ R(S) approaches zero as the sampling resolution increases.
Continuous action spaces Now suppose that U(x) is continuous, in addition to X. Assume that U(x) is both a closed and bounded subset of Rⁿ. Once again, the dynamic programming recurrence, (8.56), remains the same. The trouble now is that the min represents an optimization problem over an uncountably infinite number of choices. One possibility is to employ nonlinear optimization techniques to select the optimal u ∈ U(x). The effectiveness of this depends heavily on U(x), X, and the cost functional.
Another approach is to evaluate (8.56) over a finite set of samples drawn from U(x). Again, it is best to choose samples that reduce the dispersion as much as possible. In some contexts, it may be possible to eliminate some actions from
consideration by carefully utilizing the properties of the cost-to-go function and its representation via interpolation.
- The connection to feedback motion planning
The tools have now been provided to solve motion planning problems using value iteration. The configuration space is a continuous state space; let X = Cfree. The action space is also continuous, U(x) = Tx(X). For motion planning problems, 0 ∈ Tx(X) is obtained only when the termination action uT is applied. Therefore, it does not need to be represented separately. To compute optimal cost-to-go functions for motion planning, the main concerns are as follows:
- The action space must be bounded.
- A discrete-time approximation must be made to derive a state transition equation that works over stages.
- The cost functional must be discretized.
- The obstacle region, Cobs, must be taken into account.
- At least some interpolation region must yield G∗(x) = 0, which represents the goal region.
We now discuss each of these.
Bounding the action space Recall that using normalized vector fields does not alter the existence of solutions. This is convenient because U(x) needs to be bounded to approximate it with a finite set of samples. It is useful to restrict the action set to obtain

U(x) = {u ∈ Rⁿ | ‖u‖ ≤ 1}. (8.60)

To improve performance, it is sometimes possible to use only those u for which ‖u‖ = 1 or u = 0; however, numerical instability problems may arise. A finite sample set for U(x) should have low dispersion and always include u = 0.
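A minimal sketch of such a finite action set in the plane: m evenly spaced unit directions plus u = 0 (the choice m = 16 is arbitrary). Evenly spaced angles give the lowest dispersion on the set of unit actions.

```python
import math

def sample_actions(m=16):
    """Return m evenly spaced unit vectors plus the zero action u = 0."""
    actions = [(0.0, 0.0)]                     # always include u = 0
    for k in range(m):
        ang = 2.0 * math.pi * k / m
        actions.append((math.cos(ang), math.sin(ang)))
    return actions
```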
Obtaining a state transition equation Value iterations occur over discrete stages; however, the integral curves of feedback plans occur over continuous time. Therefore, the time interval T needs to be sampled. Let ∆t denote a small positive constant that represents a fixed interval of time. Let the stage index k refer to time (k − 1)∆t. Now consider representing a velocity field ẋ over Rⁿ. By definition,

dx/dt = lim_{∆t→0} (x(t + ∆t) − x(t))/∆t. (8.61)

In Section 8.3.1, a velocity field was defined by assigning some u ∈ U(x) to each x ∈ X. If the velocity vector u is integrated from x(t) over a small ∆t, then a new state, x(t + ∆t), results. If u remains constant, then

x(t + ∆t) = x(t) + ∆t u, (8.62)
which is called an Euler approximation. If a feedback plan is executed, then u is determined from x via u = π(x(t)). In general, this means that u could vary as the state is integrated forward. In this case, (8.62) is only approximate,
x(t + ∆t) ≈ x(t) + ∆t π(x(t)). (8.63)
The expression in (8.62) can be considered as a state transition equation that works over stages. Let xk+1 = x(t + ∆t) and xk = x(t). The transitions can now be expressed as
xk+1 = f(xk, u) = xk + ∆t u. (8.64)
The quality of the approximation improves as ∆t decreases. Better approximations can be made by using more sample points along time. The most widely known approximations are the Runge-Kutta family. For optimal motion planning, it turns out that the direction vector almost always remains constant along the integral curve. For example, in Figure 8.13d, observe that piecewise-linear paths are obtained by performing gradient descent of the optimal navigation function. The direction vector is constant over most of the resulting integral curve (it changes only as obstacles are contacted). Therefore, approximation problems tend not to arise in motion planning problems. When approximating dynamical systems, such as those presented in Chapter 13, then better approximations are needed; see Section 14.3.2. One important concern is that ∆t is chosen in a way that is compatible with the grid resolution. If ∆t is so small that the actions do not change the state enough to yield new interpolation neighbors, then the interpolated cost-to-go values will remain constant. This implies that ∆t must be chosen to ensure that x(t + ∆t) has a different set of interpolation neighbors than x(t).
An interesting connection can be made to the approximate motion planning problem that was developed in Section 7.7. Formulation 7.4 corresponds precisely to the approximation defined here, except that ǫ was used instead of ∆t because velocities were not yet considered (also, the initial condition was specified because there was no feedback). Recall the different possible action spaces shown in Figure
7.41. As stated in Section 7.7, if the Manhattan or independent-joint models are used, then the configurations remain on a grid of points. This enables discrete value iterations to be performed. A discrete feedback plan and navigation function, as considered in Section 8.2.3, can even be computed. If the Euclidean motion model is used, which is more natural, then the transitions allow a continuum of possible configurations. This case can finally be handled by using interpolation over the configuration space, as described in this section.
Approximating the cost functional A discrete cost functional must be derived from the continuous cost functional, (8.39). The final term is just assigned as lF(xF) = lF(x(tf)). The cost at each stage is

ld(xk, uk) = ∫₀^∆t l(x(t), u(t)) dt, (8.65)

and ld(xk, uk) is used in the place of l(xk, uk) in (8.56). For many problems, the integral does not need to be computed repeatedly. To obtain Euclidean shortest paths, ld(xk, uk) = ‖uk‖ can be safely assigned for all xk ∈ X and uk ∈ U(xk). A reasonable approximation to (8.65) if ∆t is small is l(x(t), u(t))∆t.
Handling obstacles A simple way to handle obstacles is to determine for each x ∈ S whether x ∈ Cobs. This can be computed and stored in an array before the value iterations are performed. For rigid robots, this can be efficiently computed using fast Fourier transforms [513]. For each x ∈ Cobs, G∗(x) = ∞. No value iterations are performed on these states; their values must remain at infinity. During the evaluation of (8.59) (or a higher dimensional version), different actions are attempted. For each action, it is required that all of the interpolation neighbors of xk+1 lie in Cfree. If one of them lies in Cobs, then that action produces infinite cost. This has the effect of automatically reducing the interpolation region, R(S), to all cubes whose vertices all lie in Cfree, as shown in Figure 8.21b. All samples in Cobs are assumed to be deleted from S in the remainder of this section; however, the full grid is still used for interpolation so that infinite values represent the obstacle region.

Note that as expressed so far, it is possible that points in Cobs may lie in R(S) because collision detection is performed only on the samples. In practice, either the grid resolution must be made fine enough to minimize the chance of this error occurring or distance information from a collision detection algorithm must be used to infer that a sufficiently large ball around each sample is collision-free. If an interpolation region cannot be assured to lie in Cfree, then the resolution may have to be increased, at least locally.
Handling the goal region Recall that backward value iterations start with the final cost-to-go function and iterate backward. Initially, the final cost-to-go is assigned as infinity at all states except those in the goal. To properly initialize the final cost-to-go function, there must exist some subset of X over which the zero value can be obtained by interpolation. Let G = S ∩ XG. The requirement is that the interpolation region R(G) must be nonempty. If this is not satisfied, then the grid resolution needs to be increased or the goal set needs to be enlarged. If XG is a single point, then it needs to be enlarged, regardless of the resolution (unless an alternative way to interpolate near a goal point is developed). In the interpolation region shown in Figure 8.21c, all states in the vicinity of xG yield an interpolated cost-to-go value of zero. If such a region did not exist, then all costs would remain at infinity during the evaluation of (8.59) from any state. Note that ∆t must be chosen large enough to ensure that new samples can reach G.
Using G∗ as a navigation function After the cost-to-go values stabilize, the resulting cost-to-go function, G∗, can be used as a navigation function. Even though G∗ is defined only over S ⊂ X, the value of the navigation function can be obtained using interpolation over any point in R(S). The optimal action is selected as the one that satisfies the min in (8.6). This means that the state trajectory does not have to visit the grid points as in the Manhattan model. A trajectory can visit any point in R(S), which enables trajectories to converge to the true optimal solution as ∆t and the grid spacing tend to zero.
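The pieces above can be combined into a small end-to-end sketch: value iteration with bilinear interpolation on a 2D grid, Euler transitions (8.64), unit actions plus u = 0, infinite cost whenever an interpolation neighbor lies in the obstacle set, and a single zero-cost goal sample. The grid size, wall, goal, and sweep count are all illustrative; updates are applied in place (Gauss-Seidel style), which converges to the same fixed point as synchronous sweeps for this shortest-path cost.

```python
import math
import itertools

INF = float("inf")
N = 10                                           # grid is {0,...,N-1}^2
dt = 1.0
obstacles = {(i, 4) for i in range(2, 8)}        # a wall with gaps at the ends
goal = (8, 8)
actions = [(0.0, 0.0)] + [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4))
                          for k in range(8)]

def interp(G, x1, x2):
    """Bilinear interpolation of G at (x1, x2), per (8.59); infinite cost if
    the point leaves the grid or any interpolation neighbor is an obstacle."""
    if not (0.0 <= x1 <= N - 1 and 0.0 <= x2 <= N - 1):
        return INF
    i, j = min(int(x1), N - 2), min(int(x2), N - 2)
    a1, a2 = 1.0 - (x1 - i), 1.0 - (x2 - j)
    corners = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
    weights = [a1 * a2, a1 * (1 - a2), (1 - a1) * a2, (1 - a1) * (1 - a2)]
    if any(c in obstacles for c in corners):
        return INF
    # Skip zero-weight corners so unreached (infinite) samples cannot
    # poison the sum with 0 * inf.
    return sum(w * G[c] for w, c in zip(weights, corners) if w > 0.0)

def value_iterate(sweeps=60):
    """Backward value iteration (8.56) with Euler transitions (8.64)."""
    G = {s: INF for s in itertools.product(range(N), range(N))}
    G[goal] = 0.0
    for _ in range(sweeps):
        for s in G:
            if s in obstacles or s == goal:
                continue                         # obstacle values stay at inf
            G[s] = min(math.hypot(*u) * dt
                       + interp(G, s[0] + dt * u[0], s[1] + dt * u[1])
                       for u in actions)
    return G
```

After the sweeps converge, gradient descent on the interpolated G (choosing the action minimizing the same expression) yields trajectories that cut across grid cells, illustrating the point above about the Manhattan model.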
Topological considerations So far there has been no explicit consideration of the topology of C. Assuming that C is a manifold, the concepts discussed so far can be applied to any open set on which coordinates are defined. In practice, it is often convenient to use the manifold representations of Section 4.1.2. The manifold can be expressed as a cube, [0, 1]ⁿ, with some faces identified to obtain [0, 1]ⁿ/∼. Over the interior of the cube, all of the concepts explained in this section work without modification. At the boundary, the samples used for interpolation must take the identification into account. Furthermore, actions, uk, and next states, xk+1, must function correctly on the boundary. One must be careful, however, in declaring that some solution is optimal, because Euclidean shortest paths depend on the manifold parameterization. This ambiguity is usually resolved by formulating the cost in terms of some physical quantity, such as time or energy. This often requires modeling dynamics, which will be covered in Part IV.
Value iteration with interpolation is extremely general. It is a generic algorithm for approximating the solution to optimal control problems. It can be applied to solve many of the problems in Part IV by restricting U(x) to take into account complicated differential constraints. The method can also be extended to problems that involve explicit uncertainty in predictability. This version of value iteration is covered in Section 10.6.
- Obtaining Dijkstra-like algorithms
For motion planning problems, it is expected that x(t + ∆t), as computed from (8.62), is always close to x(t) relative to the size of X. This suggests the use of a Dijkstra-like algorithm to compute optimal feedback plans more efficiently. As discussed for the finite case in Section 2.3.3, many values remain unchanged during the value iterations, as indicated in Example 2.5. Dijkstra’s algorithm maintains a data structure that focuses the computation on the part of the state space where values are changing. The same can be done for the continuous case by carefully considering the sample points [607].
During the value iterations, there are three kinds of sample points, just as in the discrete case (recall from Section 2.3.3):
- Dead: The cost-to-go has stabilized to its optimal value.
- Alive: The current cost-to-go is finite, but it is not yet known whether the value is optimal.
- Unvisited: The cost-to-go value remains at infinity because the sample has not been reached.
The sets are somewhat harder to maintain for the case of continuous state spaces because of the interaction between the sample set S and the interpolated region R(S).
Imagine the first value iteration. Initially, all points in G are assigned zero values. Among the collection of samples S, how many can reach R(G) in a single stage? Samples very far from G cannot reach R(G); this keeps their values at infinity. The samples that are close to G should reach it. It would be convenient to prune from consideration all samples that are too far from G for their values to be lowered. In every iteration, we avoid iterating over samples that are too far away from those already reached. It is also unnecessary to iterate over the dead samples because their values no longer change.
To keep track of reachable samples, it will be convenient to introduce the
notion of a backprojection, which will be studied further in Section 10.1. For a single state, x ∈ X, its backprojection is defined as
B(x) = {x′ ∈ X | ∃u′ ∈ U(x′) such that x = f(x′, u′)}. (8.66)
The backprojection of a set, X′ ⊆ X, of points is just the union of backprojections for each point:

B(X′) = ∪x∈X′ B(x). (8.67)
Now consider a version of value iteration that uses backprojections to eliminate some states from consideration because it is known that their values cannot change. Let i refer to the number of stages considered by the current value iteration. During the first iteration, i = 1, which means that all one-stage trajectories are considered.
Let S be the set of samples (assuming already that none lie in Cobs). Let Di and
Ai refer to the dead and alive samples, respectively. Initially, D1 = G, the set of
samples in the goal set. The first set, A1, of alive samples is assigned by using the concept of a frontier. The frontier of a set S′ ⊆ S of sample points is
Front(S′) = (B(R(S′)) \ S′) ∩ S. (8.68)
This is the set of sample points that can reach R(S′) in one stage, excluding those already in S′. Figure 8.22 illustrates the frontier. Using (8.68), A1 is defined as A1 = Front(D1).
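Equation (8.68) can be sketched directly in code. In this hypothetical version, the set B(R(S′)) is represented by an assumed predicate `reaches_interp_region`, which tests whether some action takes a sample into the interpolation region R(S′) in one stage; computing that predicate is the problem-specific part.

```python
def frontier(S_prime, S, reaches_interp_region):
    """Front(S') = (B(R(S')) \\ S') intersected with S, per (8.68):
    the samples in S, outside S', that can reach R(S') in one stage."""
    excluded = set(S_prime)
    return {s for s in S
            if s not in excluded and reaches_interp_region(s)}
```

As a toy example, if S is the integers 0 through 10, S′ = {0, 1}, and every sample at distance one from S′ can reach R(S′), the frontier is {2}.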
Now the approach is described for iteration i. The cost-to-go update (8.56) is computed at all points in Ai. If G∗k+1(s) = G∗k(s) for some s ∈ Ai, then s is declared
dead and moved to Di+1. Samples are never removed from the dead set; therefore, all points in Di are also added to Di+1. The next active set, Ai+1, includes all samples in Ai, excluding those that were moved to the dead set. Furthermore, all samples in Front(Ai) are added to Ai+1 because these will produce a finite cost-to-go value in the next iteration. The iterations continue as usual until some stage, m, is reached for which Am is empty, and Dm includes all samples from which the goal can be reached (under the approximation assumptions made in this section). For efficiency purposes, an approximation to Front may be used, provided that the true frontier is a proper subset of the approximate frontier. For example, the frontier might add all new samples within a specified radius of points in S′. In this case, the updated cost-to-go value for some s ∈ Ai may remain infinite. If this occurs, it is of course not added to Di+1. Furthermore, it is deleted from Ai in the computation of the next frontier (the frontier should only be computed for samples that have finite cost-to-go values).

Figure 8.22: An illustration of the frontier concept: (a) the shaded disc indicates the set of all reachable points in one stage, from the sample on the left. The sample cannot reach in one stage the shaded region on the right, which represents R(S′). (b) The frontier is the set of samples that can reach R(S′). The inclusion of the frontier increases the interpolation region beyond R(S′).
The approach considered so far can be expected to reduce the amount of computation in each value iteration by eliminating the evaluation of (8.56) on unnecessary samples. The same cost-to-go values are obtained in each iteration because only samples for which the value cannot change are eliminated in each iteration. The resulting algorithm still does not, however, resemble Dijkstra’s algorithm because value iterations are performed over all of Ai in each stage.
To make a version that behaves like Dijkstra’s algorithm, a queue Q will be introduced. The algorithm removes the smallest element of Q in each iteration. The interpolation version first assigns G∗(s) = 0 for each s ∈ G. It also maintains a set D of dead samples, which is initialized to D = G. For each s ∈ Front(G), the cost-to-go update (8.56) is computed. The priority queue Q is initialized to Front(G), and elements are sorted by their current cost-to-go values (which may not be optimal). The algorithm iteratively removes the smallest element from Q (because its optimal cost-to-go is known to be the current value) and terminates when Q is empty. Each time the smallest element, ss ∈ Q, is removed, it is inserted into
D. Two procedures are then performed: 1) Some elements in the queue need to have their cost-to-go values recomputed using (8.56) because the value G∗(ss) is known to be optimal, and their values may depend on it. These are the samples in Q that lie in Front(D) (in which D just got extended to include ss). 2) Any samples in B(R(D)) that are not in Q are inserted into Q after computing their cost-to-go values using (8.56). This enables the active set of samples to grow as the set of dead samples grows. Dijkstra’s algorithm with interpolation does not compute values that are identical to those produced by value iterations because
G∗k+1 is not explicitly stored when G∗k is computed. Each computed value is some
cost-to-go, but it is only known to be optimal when the sample is placed into D. It can be shown, however, that the method converges because computed values are no higher than what would have been computed in a value iteration. This is also the basis of dynamic programming using Gauss-Seidel iterations [96].
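The queue-driven propagation can be sketched as follows. This is a simplified, hypothetical version: exact one-stage costs between neighboring samples stand in for the interpolated update (8.56), and `neighbors` and `edge_cost` are assumed problem-specific callbacks. The stale-entry check plays the role of recomputing values for queue elements affected by a newly dead sample.

```python
import heapq
import math

def dijkstra_interpolated(samples, goal, neighbors, edge_cost):
    """Dijkstra-like propagation over a sample set: repeatedly remove the
    smallest queued value (now known optimal), declare it dead, and update
    the samples that can reach it."""
    cost = {s: math.inf for s in samples}
    for s in goal:
        cost[s] = 0.0
    Q = [(0.0, s) for s in goal]
    heapq.heapify(Q)
    dead = set()
    while Q:
        c, s = heapq.heappop(Q)      # smallest value is optimal
        if s in dead:
            continue                 # skip stale queue entries
        dead.add(s)
        for s2 in neighbors(s):      # samples whose values may depend on s
            new_cost = c + edge_cost(s2, s)
            if new_cost < cost[s2]:
                cost[s2] = new_cost
                heapq.heappush(Q, (new_cost, s2))
    return cost
```

On a chain of five samples with unit edge costs and the goal at one end, the computed cost-to-go grows linearly along the chain.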
A specialized, wavefront-propagation version of the algorithm can be made for the special case of finding solutions that reach the goal in the smallest number of stages. The algorithm is similar to the one shown in Figure 8.4. It starts with an
initial wavefront W0 = G in which G∗(s) = 0 for each s ∈ G. In each iteration,
the optimal cost-to-go value i is increased by one, and the wavefront, Wi+1, is computed from Wi as Wi+1 = Front(Wi). The algorithm terminates at the first iteration in which the wavefront is empty.
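The wavefront version reduces to a short loop. In this sketch, `front_fn` is an assumed Front operator over the sample set, as in (8.68); restricting the new wavefront to samples with no assigned value ensures termination.

```python
def wavefront(goal, front_fn):
    """Wavefront propagation for stage-count-optimal cost-to-go: the
    wavefront Wi+1 = Front(Wi) receives value i + 1, and the loop stops
    when the wavefront becomes empty."""
    cost = {s: 0 for s in goal}      # G*(s) = 0 for each s in G
    W, i = set(goal), 0
    while W:
        i += 1
        W = {s for s in front_fn(W) if s not in cost}  # unreached only
        for s in W:
            cost[s] = i
    return cost
```

On the integers 0 through 5 with the goal at 0 and a one-step frontier, each sample receives its distance to the goal in stages.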
Further Reading
There is much less related literature for this chapter in comparison to previous chapters. As explained in Section 8.1, there are historical reasons why feedback is usually separated from motion planning. Navigation functions [541, 829] were one of the most influential ideas in bringing feedback into motion planning; therefore, navigation functions were a common theme throughout the chapter. For other works that use or develop navigation functions, see [206, 274, 750]. The ideas of progress measures [317], Lyapunov
functions (covered in Section 15.1.1), and cost-to-go functions are all closely related.
For Lyapunov-based design of feedback control laws, see [278]. In the context of motion planning, the Error Detection and Recovery (EDR) framework also contains feedback ideas [284].
In [325], the topological complexity of C-spaces is studied by characterizing the minimum number of regions needed to cover C × C by defining a continuous path function
over each region. This indicates limits on navigation functions that can be constructed, assuming that both qI and qG are variables (throughout this chapter, qG was instead fixed). Further work in this direction includes [326, 327].
To gain better intuitions about properties of vector fields, [44] is a helpful reference, filled with numerous insightful illustrations. A good introduction to smooth manifolds that is particularly suited for control-theory concepts is [133]. Basic intuitions for 2D and 3D curves and surfaces can be obtained from [753]. Other sources for smooth manifolds and differential geometry include [4, 107, 234, 279, 872, 906, 960]. For discussions of
piecewise-smooth vector fields, see [27, 634, 846, 998].
Sections 8.4.2 and 8.4.3 were inspired by [235, 643] and [708], respectively. Many difficulties were avoided because discontinuous vector fields were allowed in these approaches. By requiring continuity or smoothness, the subject of Section 8.4.4 was obtained. The material is based mainly on [829, 830]. Other work on navigation functions includes [249, 651, 652].
Section 8.5.1 was inspired mainly by [162, 679], and the approach based on neighborhood graphs is drawn from [983].
Value iteration with interpolation, the subject of Section 8.5.2, is sometimes forgotten in motion planning because computers were not powerful enough at the time it was developed [84, 85, 582, 583]. Presently, however, solutions can be computed for challenging problems with several dimensions (e.g., 3 or 4). Convergence of discretized value iteration to solving the optimal continuous problem was first established in [92], based on Lipschitz conditions on the state transition equation and cost functional. Analyses that take interpolation into account, and general discretization issues, appear in [168, 292, 400, 565, 567]. A multi-resolution variant of value iteration was proposed in [722]. The discussion of Dijkstra-like versions of value iteration was based on [607, 946]. The level-set method is also closely related [532, 533, 534, 862].
Exercises
- Suppose that a very fast path planning algorithm runs on board a mobile robot (for example, it may find an answer in a few milliseconds, which is reasonable using trapezoidal decomposition in R2). Explain how this method can be used to simulate having a feedback plan on the robot. Explain the issues and trade-offs between having a fast on-line algorithm that computes open-loop plans vs. a better off-line algorithm that computes a feedback plan.
- Use Dijkstra’s algorithm to construct navigation functions on a 2D grid with obstacles. Experiment with adding a penalty to the cost functional for getting too close to obstacles.
- If there are alternative routes, the NF2 algorithm does not necessarily send the state along the route that has the largest maximum clearance. Fix the NF2 algorithm so that it addresses this problem.
- Tangent space problems:
- For the manifold of unit quaternions, find basis vectors for the tangent space in R4 at any point.
- Find basis vectors for the tangent space in R9, assuming that matrices in SO(3) are parameterized with quaternions, as shown in (4.20).
- Extend the algorithm described in Section 8.4.3 to make it work for polygons that have holes. See Example 8.16 for a similar problem.
- Give a complete algorithm that uses the vertical cell decomposition for a polygonal obstacle region in R2 to construct a vector field that serves as a feedback plan. The vector field may be discontinuous.
- Figure 8.23 depicts a 2D example for which Xfree is an open annulus. Consider designing a vector field for which all integral curves flow into XG and the vector field is continuous outside of XG. Either give a vector field that achieves this or explain why it is not possible.
- Use the maximum-clearance roadmap idea from Section 6.2.3 to define a cell decomposition and feedback motion plan (vector field) that maximizes clearance. The vector field may be discontinuous.
Figure 8.23: Consider designing a continuous vector field that flows into XG.
- Develop an algorithm that computes an exact cover for a polygonal configuration space and ensures that if two neighborhoods intersect, then their intersection always contains an open set (i.e., the overlap region is two-dimensional). The neighborhoods in the cover should be polygonal.
- Using a distance measurement and Euler angles, determine the expression for a collision-free ball that can be inferred (make the ball as large as possible). This should generalize (8.54).
- Using a distance measurement and quaternions, determine the expression for a collision-free ball (once again, make it as large as possible).
- Generalize the multi-linear interpolation scheme in (8.59) from 2 to n dimensions.
- Explain the convergence problems for value iteration that can result if ‖u‖ = 1 is used to constrain the set of allowable actions, instead of ‖u‖ ≤ 1.
Implementations
- Experiment with numerical methods for solving the function (8.49) in two dimensions under various boundary conditions. Report on the efficiency and accuracy of the methods. How well can they be applied in higher dimensions?
- Implement value iteration with interpolation (it is not necessary to use the method in Figure 8.20) for a polygonal robot that translates and rotates among polygonal
obstacles in W = R2. Define the cost functional so that the distance traveled is
obtained with respect to a weighted Euclidean metric (the weights that compare
rotation to translation can be set arbitrarily).
- Evaluate the efficiency of the interpolation method shown in Figure 8.20 applied to multi-linear interpolation given by generalizing (8.59) as in Exercise 12. You do not need to implement the full value iteration approach (alternatively, this could be done, which provides a better comparison of the overall performance).
- Implement the method of Section 8.4.2 of computing vector fields on a triangulation. For given input polygons, have your program draw a needle diagram of the
computed vector field. Determine how fast the vector field can be recomputed as the goal changes.
- Optimal navigation function problems:
- Implement the algorithm illustrated in Figure 8.13. Show the level sets of the optimal cost-to-go function.
- Extend the algorithm and implementation to the case in which there are polygonal holes in Xfree.
- Adapt value iteration with interpolation so that a point robot moving in the plane can keep track of a predictable moving point called a target. The cost functional should cause a small penalty to be added if the target is not visible. Optimizing
this should minimize the amount of time that the target is not visible. Assume that the initial configuration of the robot is given. Compute optimal feedback plans for the robot.
- Try to experimentally construct navigation functions by adding potential functions that repel the state away from obstacles and attract the state toward xG. For simplicity, you may assume that X = R2 and the obstacles are discs. Start with a single disc and then gradually construct more complicated obstacle regions. How difficult is it to ensure that the resulting potential function has no local minima outside of xG?