Sampling-Based Methods for Continuous Spaces
The methods in Section 8.4 can be considered as the feedback-case analogs to the combinatorial methods of Chapter 6. Although such methods provide elegant solutions to the problem, the issue arises once again that they are either limited to lower dimensional problems or problems that exhibit some special structure. This motivates the introduction of sampling-based methods. This section presents the feedback-case analog to Chapter 5.
Figure 8.14: A navigation function and corresponding vector field can be designed as a composition of funnels.
- Computing a Composition of Funnels
Mason introduced the concept of a funnel as a metaphor for motions that converge to the same small region of the state space, regardless of the initial position [679]. As grains of sand in a funnel, they follow the slope of the funnel until they reach the opening at the bottom. A navigation function can be imagined as a funnel that guides the state into the goal. For example, the cost-to-go function depicted in Figure 8.13d can be considered as a complicated funnel that sends each piece of sand along an optimal path to the goal.
Rather than designing a single funnel, consider decomposing the state space into a collection of simple, overlapping regions. Over each region, a funnel can be designed that leads the state into another funnel; see Figure 8.14. As an example, the approach in [162] places a Lyapunov function (such functions are covered in Section 15.1.2) over each funnel to ensure convergence to the next funnel. A feedback plan can be constructed by composing several funnels. Starting from some initial state in X, a sequence of funnels is visited until the goal is reached. Each funnel essentially solves the subgoal of reaching the next funnel. Eventually, a funnel is reached that contains the goal, and a navigation function on this funnel causes the goal to be reached. In the context of sensing uncertainty, for which the funnel metaphor was developed, the composition of funnels becomes the preimage planning framework [659], which is covered in Section 12.5.1. In this section, however, it is assumed that the current state is always known.
- An approximate cover
Figure 8.15 illustrates the notion of an approximate cover, which will be used to represent the funnel domains. Let X˜ denote a subset of a state space X. A cover of X˜ is a collection O of sets for which

Figure 8.15: An approximate cover is shown. Every point of X˜ is contained in at least one neighborhood, and X˜ is a subset of X.

- O ⊆ X for each O ∈ O.
- X˜ is a subset of the union of all sets in the cover:

X˜ ⊆ ⋃_{O∈O} O. (8.50)
Let each O ∈ O be called a neighborhood. The notion of a cover was actually used in Section 8.3.2 to define a smooth manifold using a cover of coordinate neighborhoods.
In general, a cover allows the following:
- Any number of neighborhoods may overlap (have nonempty intersection).
- Any neighborhood may contain points that lie outside of X˜ .
A cell decomposition, which was introduced in Section 6.3.1, is a special kind
of cover for which the neighborhoods form a partition of X˜ , and they must fit together nicely (recall Figure 6.15).
So far, no constraints have been placed on the neighborhoods. They should be chosen in practice to greatly simplify the design of a navigation function over each one. For the original motion planning problem, cell decompositions were designed to make the determination of a collision-free path trivial in each cell. The same idea applies here, except that we now want to construct a feedback plan. Therefore, it is usually assumed that the cells have a simple shape.
A cover is called approximate if X˜ is a strict subset of X. Ideally, we would like to develop an exact cover, which implies that X˜ = X and each neighborhood has some nice property, such as being convex. Developing such covers is possible in practice for state spaces that are either low-dimensional or exhibit some special structure. This was observed for the cell decomposition methods of Chapter 6.
Consider constructing an approximate cover for X. The goal should be to cover as much of X as possible. This means that µ(X \ X˜) should be made as small as possible, in which µ denotes Lebesgue measure, as defined in Section 5.1.3. It is also desirable to ensure that X˜ preserves the connectivity of X. In other words, if a path between two points exists in X, then it should also exist in X˜.

Figure 8.16: A transition from O to O′ is caused by a vector field on O for which all integral curves lead into O ∩ O′.
- Defining a feedback plan over a cover
The ideas from Section 8.4.2 can be adapted to define a feedback plan over X˜ using a cover. Let Xˇ denote a discrete state space in which each superstate is a neighborhood. Most of the components of the associated discrete planning problem are the same as in Section 8.4.2. The only difference is in the definition of superactions, because neighborhoods can overlap in a cover. For each neighborhood O ∈ O, a superaction exists for each other neighborhood O′ ∈ O such that O ∩ O′ ≠ ∅ (usually, their interiors overlap to yield int(O) ∩ int(O′) ≠ ∅).
Note that in the case of a cell decomposition, this produces no superactions because it is a partition. To follow the metaphor of composing funnels, the domains of some funnels should overlap, as shown in Figure 8.14. A transition from one neighborhood, O, to another, O′, is obtained by defining a vector field on O that sends all states from O \ O′ into O ∩ O′; see Figure 8.16. Once O′ is reached, the vector field of O is no longer followed; instead, the vector field of O′ is used. Using the vector field of O′, a transition may be applied to reach another neighborhood. Note that the jump from the vector field of O to that of O′ may cause the feedback plan to be a discontinuous vector field on X˜. If the cover is designed so that O ∩ O′ is large (if they intersect), then gradual transitions may be possible by blending the vector fields from O and O′.
Once the discrete problem has been defined, a discrete feedback plan can be computed over Xˇ, as defined in Section 8.2. This is converted into a feedback plan over X by defining a vector field on each neighborhood that causes the appropriate transitions. Each xˇ ∈ Xˇ can be interpreted both as a superstate and as a neighborhood. For each xˇ, the discrete feedback plan produces a superaction uˇ = π(xˇ), which yields a new neighborhood xˇ′. The vector field over xˇ = O is then designed to send all states into xˇ′ = O′.
If desired, a navigation function φ over X can even be derived from a navigation function, φˇ, over Xˇ. Suppose that φˇ is constructed so that φˇ(xˇ) is distinct for every xˇ ∈ Xˇ. Any navigation function can be easily transformed to satisfy this constraint (because Xˇ is finite). Let φO denote a navigation function over some O ∈ O. Assume that XG is a point, xG (extensions can be made to more general cases). For every neighborhood O ∈ O such that xG ∉ O, φO is defined so that performing gradient descent leads into the overlapping neighborhood for which φˇ(xˇ) is smallest. If O contains xG, the navigation function φO simply guides the state to xG.
The navigation functions over each O ∈ O can be easily pieced together to yield a navigation function over all of X. In places where multiple neighborhoods overlap, φ is defined to be the navigation function associated with the neighborhood for which φˇ(xˇ) is smallest. This can be achieved by adding a large constant to each φO. Let c denote a constant for which φO(x) < c over all O ∈ O and x ∈ O (it is assumed that each φO is bounded). Suppose that φˇ assumes only integer values. Let O(x) denote the set of all O ∈ O such that x ∈ O. The navigation function over X is defined as

φ(x) = min_{O∈O(x)} { φO(x) + c φˇ(O) }. (8.51)
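The composition rule (8.51) is straightforward to implement once the cover and the discrete navigation function are available. The following is a minimal sketch, assuming ball neighborhoods and a caller-supplied local navigation function; all names (`phi`, `phi_local`, `phi_check`) are illustrative, not from [983].

```python
import math

def phi(x, neighborhoods, phi_check, phi_local, c):
    """Evaluate the composed navigation function at x via (8.51).

    neighborhoods: list of (center, radius) balls forming the cover.
    phi_check:     list of distinct integer values, one per neighborhood
                   (the discrete navigation function over the superstates).
    phi_local:     phi_local(i, x) -> bounded local navigation value on ball i.
    c:             constant exceeding every local value, so the discrete
                   level dominates the local one.
    """
    # O(x): the neighborhoods that contain x.
    containing = [i for i, (ctr, r) in enumerate(neighborhoods)
                  if math.dist(x, ctr) < r]
    if not containing:
        raise ValueError("x lies outside the cover (not in X~)")
    # Minimize phi_O(x) + c * phi_check(O) over neighborhoods containing x.
    return min(phi_local(i, x) + c * phi_check[i] for i in containing)
```

In an overlap region, the `c * phi_check[i]` term guarantees that the neighborhood closer to the goal (in the discrete plan) always wins the minimization, exactly as the text prescribes.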
- A sampling-based approach
There are numerous alternative ways to construct a cover. To illustrate the ideas, an approach called the sampling-based neighborhood graph is presented here [983]. Suppose that X = Cfree, which is a subset of some configuration space. As introduced in Section 5.4, let α be a dense, infinite sequence of samples in X. Assume that a collision detection algorithm is available that returns the distance, (5.28), between the robot and obstacles in the world. Such algorithms were described in Section 5.3.
An incremental algorithm is given in Figure 8.17. Initially, O is empty. In each iteration, if α(i) ∈ Cfree and it is not already contained in some neighborhood, then a new neighborhood is added to O. The two main concerns are 1) how to define a new neighborhood, O, such that O ⊂ Cfree, and 2) when to terminate. At any given time, the cover is approximate. The union of all neighborhoods is X˜, which is a strict subset of X. In comparison to Figure 8.15, the cover is a special case in which the neighborhoods do not extend beyond X˜.
Defining new neighborhoods For defining new neighborhoods, it is important to keep them simple because during execution, the neighborhoods that contain the state x must be determined quickly. Suppose that all neighborhoods are open balls:

B(x, r) = {x′ ∈ X | ρ(x, x′) < r}, (8.52)

in which ρ is the metric on C. There are efficient algorithms for determining whether x ∈ O for some O ∈ O, assuming all of the neighborhoods are balls [700].
INCREMENTAL COVER CONSTRUCTION
- Initialize O = ∅ and i = 1.
- Let x = α(i), and let d be the distance returned by the collision detection algorithm applied at x.
- If d > 0 (which implies that x ∈ Cfree) and x ∉ O for all O ∈ O, then insert a new neighborhood, On, into O. The neighborhood size and shape are determined from x and d.
- If the termination condition is not satisfied, then let i := i + 1, and go to Step 2.
- Remove any neighborhoods from O that are contained entirely inside of another neighborhood.
Figure 8.17: The cover is incrementally extended by adding new neighborhoods that are guaranteed to be collision-free.
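For a point robot with ball neighborhoods, the algorithm of Figure 8.17 can be sketched in a few lines. The clearance oracle below (a single disc obstacle) and the consecutive-failure termination rule are illustrative stand-ins for a real collision-detection module and for the statistical termination tests of [983].

```python
import math
import random

def dist_to_obstacles(x, obstacle_center=(0.5, 0.5), obstacle_radius=0.2):
    """Hypothetical clearance oracle: point robot, one disc obstacle."""
    return max(0.0, math.dist(x, obstacle_center) - obstacle_radius)

def build_cover(samples, max_new_failures=200):
    """Add a ball B(x, d) around each free, uncovered sample; stop after
    many consecutive samples fail to create a new neighborhood."""
    cover = []          # list of (center, radius) balls
    failures = 0
    for x in samples:
        d = dist_to_obstacles(x)
        covered = any(math.dist(x, c) < r for c, r in cover)
        if d > 0.0 and not covered:
            cover.append((x, d))   # B(x, d) is guaranteed collision-free
            failures = 0
        else:
            failures += 1
            if failures >= max_new_failures:
                break              # estimated coverage is high enough
    # Cleanup step: drop balls contained entirely inside another ball.
    cover = [(c, r) for i, (c, r) in enumerate(cover)
             if not any(j != i and math.dist(c, oc) + r <= orr
                        for j, (oc, orr) in enumerate(cover))]
    return cover
```

The containment test `dist(c, oc) + r <= orr` is the exact condition for one ball to lie inside another; a production implementation would replace the linear scans with the Kd-tree queries mentioned below.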
In practice, methods based on Kd-trees yield good performance [47, 52] (recall Section 5.5.2). A new ball, B(x, r), can be constructed in Step 3 for x = α(i), but what radius can be assigned? For a point robot that translates in R² or R³, the Hausdorff distance d between the robot and obstacles in W is precisely the distance to Cobs from α(i). This implies that we can set r = d, and B(x, r) is guaranteed to be collision-free.
In a general configuration space, it is possible to find a value of r such that B(x, r) ⊆ Cfree, but in general r < d. This issue arose in Section 5.3.4 for checking path segments. The transformations of Sections 3.2 and 3.3 become important in the determination of r. For illustrative purposes, suppose that C = R² × S¹, which corresponds to a rigid robot, A, that can translate and rotate in W = R². Each point a ∈ A is transformed using (3.35). Now imagine starting with some configuration q = (x, y, θ) and perturbing each coordinate by some ∆x, ∆y, and ∆θ. What is the maximum distance that a point on A could travel? Translation affects all points on A the same way, but rotation affects points differently. Recall Figure 5.12 from Section 5.3.4. Let ar ∈ A denote the point that is furthest from the origin (0, 0). Let r denote the distance from ar to the origin. If the rotation is perturbed by some small amount, ∆θ, then the displacement of any a ∈ A is no more than r∆θ. If all three configuration parameters are perturbed, then

(∆x)² + (∆y)² + (r∆θ)² < d² (8.53)
is the constraint that must be satisfied to ensure that the resulting ball is contained in Cfree. This is actually the equation of a solid ellipsoid, which becomes a ball if r = 1. It can be made into a ball by reparameterizing SE(2) so that ∆θ has the same effect as ∆x and ∆y. A transformation h : θ ↦ rθ maps θ into a new domain Z = [0, 2πr). In this new space, the equation of the ball is

(∆x)² + (∆y)² + (∆z)² < d², (8.54)

in which ∆z represents the change in z ∈ Z. The reparameterized version of (3.35) is

T = ⎡ cos(θ/r)  −sin(θ/r)  xt ⎤
    ⎢ sin(θ/r)   cos(θ/r)  yt ⎥ . (8.55)
    ⎣    0           0      1 ⎦
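The lift can be sketched numerically. The triangle-shaped robot below is illustrative; r is its farthest-point distance from the origin, and the safety check evaluates (8.53) directly, which is equivalent to membership in the Euclidean ball (8.54) after the substitution z = rθ.

```python
import math

# Illustrative rigid body: body-frame vertices of a small triangle.
robot = [(0.2, 0.0), (-0.1, 0.1), (-0.1, -0.1)]
# r: distance from the origin to the farthest robot point (the point a_r).
r = max(math.hypot(px, py) for px, py in robot)

def to_ball_coords(q):
    """Map q = (x, y, theta) to (x, y, z) with z = r * theta, so that the
    safe region around q becomes a Euclidean ball, per (8.54)."""
    x, y, theta = q
    return (x, y, r * theta)

def perturbation_is_safe(dq, d):
    """Check (8.53): a perturbation (dx, dy, dtheta) displaces every robot
    point by less than the clearance d."""
    dx, dy, dtheta = dq
    return dx * dx + dy * dy + (r * dtheta) ** 2 < d * d
```

Since rotation moves a point at distance at most r by at most r·∆θ, any perturbation passing this test keeps the whole body inside the clearance ball of radius d.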
For a 3D rigid body, similar reparameterizations can be made to Euler angles or quaternions to generate six-dimensional balls. Extensions can be made to chains of bodies [983]. One of the main difficulties, however, is that the balls are not the largest possible. In higher dimensions the problem becomes worse because numerous balls are needed, and the radii constructed as described above tend to be much smaller than what is possible. The number of balls can be reduced by also allowing axis-aligned cylinders, but it still remains difficult to construct a cover
over a large fraction of Cfree in more than six dimensions.
Termination The sampling-based planning algorithms in Chapter 5 were designed to terminate upon finding a solution path. In the current setting, termination is complicated by the fact that we are interested in solutions from all initial configurations. Since α is dense, the volume of uncovered points in Cfree tends to zero. After some finite number of iterations, it would be nice to measure the quality of the approximation and then terminate when the desired quality is achieved. This was also possible with the visibility sampling-based roadmap in Section 5.6.2. Using random samples, an estimate of the fraction of Cfree that is covered can be obtained by recording the percentage of failures in obtaining a sample in Cfree that is outside of the cover. For example, if a new neighborhood is created only once in 1000 iterations, then it can be estimated that 99.9 percent of Cfree is covered. High-probability bounds can also be determined. Termination conditions are given in [983] that ensure with probability greater than Pc that at least a fraction α ∈ (0, 1) of Cfree has been covered. The constants Pc and α are given as parameters to the algorithm, and it will terminate when the condition has been satisfied using rigorous statistical tests. If deterministic sampling is used, then termination can be made to occur based on the dispersion, which indicates the largest ball in Cfree that does not contain the center of another neighborhood. One problem with volume-based criteria, such as those suggested here, is that there is no way to ensure that the cover preserves the connectivity of Cfree. If two portions of Cfree are connected by a narrow passage, the cover may miss a neighborhood that has very small volume yet is needed to connect the two portions.
Example 8.18 (2D Example of Computed Funnels) Figure 8.18 shows a 2D example that was computed using random samples and the algorithm in Figure 8.17.

Figure 8.18: (a) An approximate cover for a 2D configuration space. (b) Level sets of a navigation function.

Note that once a cover is computed, it can be used to rapidly compute different navigation functions and vector fields for various goals. This example is mainly for illustrative purposes. For the case of a polygonal environment, constructing covers based on convex polygons would be more efficient.
- Dynamic Programming with Interpolation
This section concludes Part II by solving the motion planning problem with value iteration, which was introduced in Section 2.3. It has already been applied to obtain discrete feedback plans in Section 8.2. It will now be adapted to continuous spaces by allowing interpolation first over a continuous state space and then by additionally allowing interpolation over a continuous action space. This yields a numerical approach to computing optimal navigation functions and feedback plans for motion planning. The focus will remain on backward value iteration; however, the interpolation concepts may also be applied in the forward direction. The approach here views optimal feedback motion planning as a discrete-time optimal control problem [28, 84, 151, 583].
- Using interpolation for continuous state spaces
Consider a problem formulation that is identical to Formulation 8.1 except that X is allowed to be continuous. Assume that X is bounded, and assume for now that the action space, U(x), is finite for all x ∈ X. Backward value iteration can be applied. The dynamic programming arguments and derivation are identical to
Figure 8.19: Even though xk is a sample point, the next state, xk+1, may land between sample points. For each uk ∈ U(xk), interpolation may be needed for the resulting next state, xk+1 = f(xk, uk).
those in Section 2.3. The resulting recurrence is identical to (2.11) and is repeated here for convenience:

G∗k(xk) = min_{uk∈U(xk)} { l(xk, uk) + G∗k+1(xk+1) }. (8.56)
The only difficulty is that G∗k(xk) cannot be stored for every xk ∈ X because X is continuous. There are two general approaches. One is to approximate G∗k using a parametric family of surfaces, such as polynomials or nonlinear basis functions derived from neural networks [97]. The other is to store G∗k only over a finite set of sample points and use interpolation to obtain its value at all other points [582, 583].

Suppose that a finite set S ⊂ X of samples is used to represent cost-to-go functions over X. The evaluation of (8.56) using interpolation is depicted in Figure 8.19. In general, the samples should be chosen to reduce the dispersion (defined in Section 5.2.3) as much as possible. This prevents attempts to approximate the cost-to-go function on large areas that contain no sample points. The rate of convergence ultimately depends on the dispersion [92] (in combination with Lipschitz conditions on the state transition equation and the cost functional). To simplify notation and some other issues, assume that S is a grid of regularly spaced points in Rⁿ.
First, consider the case in which X = [0, 1] ⊂ R. Let S = {s0, s1, . . . , sr}, in which si = i/r. For example, if r = 3, then S = {0, 1/3, 2/3, 1}. Note that this always yields points on the boundary of X, which ensures that for any point in (0, 1) there are samples both above and below it. Let i be the largest integer such that si < x. This implies that si+1 > x. The samples si and si+1 are called interpolation neighbors of x.

The value of G∗k+1 in (8.56) at any x ∈ [0, 1] can be obtained via linear interpolation as

G∗k+1(x) ≈ αG∗k+1(si) + (1 − α)G∗k+1(si+1), (8.57)

in which the coefficient α ∈ [0, 1] is computed as

α = 1 − (x − si)/(1/r) = 1 − r(x − si), (8.58)

the fraction of the grid spacing 1/r that remains between x and si+1. If x = si, then α = 1, and (8.57) reduces to G∗k+1(si), as expected. If x = si+1, then α = 0, and (8.57) reduces to G∗k+1(si+1). At all points in between, (8.57) blends the cost-to-go values at si and si+1 using α to provide the appropriate weights.
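A minimal sketch of (8.57) and (8.58), assuming the cost-to-go values are stored in a Python list over the grid si = i/r:

```python
def interpolate_1d(G, x, r):
    """Linearly interpolate G*_{k+1}(x) from samples s_i = i/r on [0, 1].

    G is a list of length r + 1 with G[i] = G*_{k+1}(s_i).
    """
    i = min(int(x * r), r - 1)      # largest i with s_i <= x (clamped at x = 1)
    s_i = i / r
    alpha = 1.0 - r * (x - s_i)     # (8.58): alpha = 1 at s_i, 0 at s_{i+1}
    return alpha * G[i] + (1.0 - alpha) * G[i + 1]
```

The clamp at `r - 1` handles the right endpoint x = 1, where there is no sample above x; there the formula degenerates gracefully to G[r].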
Figure 8.20: Barycentric subdivision can be used to partition each cube into simplexes, which allows interpolation to be performed in O(n lg n) time, instead of O(2ⁿ). The cases n = 2 and n = 3 are shown.
The interpolation idea can be naturally extended to multiple dimensions. Let X be a bounded subset of Rⁿ. Let S represent an n-dimensional grid of points in Rⁿ. Each sample in S is denoted by s(i1, i2, . . . , in). For some x ∈ X, there are 2ⁿ interpolation neighbors that “surround” it. These are the corners of an n-dimensional cube that contains x. Let x = (x1, . . . , xn). Let ij denote the largest integer for which the jth coordinate of s(i1, i2, . . . , in) is less than xj. The 2ⁿ samples are all those for which either ij or ij + 1 appears in the expression s(·, ·, . . . , ·), for each j ∈ {1, . . . , n}. This requires that samples exist in S for all of these cases. Note that X may be a complicated subset of Rⁿ, provided that for any x ∈ X, all of the required 2ⁿ interpolation neighbors are in S. Using the 2ⁿ interpolation neighbors, the value of G∗k+1 in (8.56) on any x ∈ X can be obtained via multi-linear interpolation. In the case of n = 2, this is expressed as
G∗k+1(x) ≈ α1α2 G∗k+1(s(i1, i2)) +
           α1(1 − α2) G∗k+1(s(i1, i2 + 1)) +
           (1 − α1)α2 G∗k+1(s(i1 + 1, i2)) +
           (1 − α1)(1 − α2) G∗k+1(s(i1 + 1, i2 + 1)), (8.59)
in which α1 and α2 are defined similarly to α in (8.58) but are based on distances along the x1 and x2 directions, respectively. The expressions for multi-linear interpolation in higher dimensions are similar but are more cumbersome to express. Higher order interpolation, such as quadratic interpolation, may alternatively be used [583].
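For n = 2, (8.59) can be sketched directly, assuming unit grid spacing and the samples stored as a list of lists (so the α values are simply one minus the fractional parts of the coordinates):

```python
def interpolate_2d(G, x1, x2):
    """Bilinear interpolation of G*_{k+1} at (x1, x2), per (8.59).

    G[i][j] holds the value at grid point s(i, j); unit spacing is assumed,
    and (x1, x2) must lie in the interior cube range of the grid.
    """
    i1, i2 = int(x1), int(x2)                      # lower corner of the cube
    a1 = 1.0 - (x1 - i1)                           # alpha_1, as in (8.58)
    a2 = 1.0 - (x2 - i2)                           # alpha_2
    return (a1 * a2 * G[i1][i2]
            + a1 * (1 - a2) * G[i1][i2 + 1]
            + (1 - a1) * a2 * G[i1 + 1][i2]
            + (1 - a1) * (1 - a2) * G[i1 + 1][i2 + 1])
```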
Unfortunately, the number of interpolation neighbors grows exponentially with the dimension, n. Instead of using all 2ⁿ interpolation neighbors, one improvement is to decompose the cube defined by the 2ⁿ samples into simplexes. Each simplex
Figure 8.21: (a) An interpolation region, R(S), is shown for a set of sample points, S. (b) The interpolation region that arises due to obstacles. (c) The interpolation region for goal points must not be empty.
has only n + 1 samples as its vertices. Only the vertices of the simplex that contains x are declared to be the interpolation neighbors of x; this reduces the cost of evaluating G∗k+1(x) to O(n) time. The problem, however, is that determining the simplex that contains x may be a challenging point-location problem (a common problem in computational geometry [264]). If barycentric subdivision is used to decompose the cube using the midpoints of all faces, then the point-location problem can be solved in O(n lg n) time [263, 607, 721], which is an improvement over the O(2ⁿ) scheme described above. Examples of this decomposition are shown for two and three dimensions in Figure 8.20. This is sometimes called the Coxeter-Freudenthal-Kuhn triangulation. Even though n is not too large due to practical performance considerations (typically, n ≤ 6), substantial savings occur in implementations, even for n = 3.
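A sketch of simplex-based interpolation on the Coxeter-Freudenthal-Kuhn triangulation, assuming samples stored in a dictionary keyed by integer grid tuples. Sorting the fractional coordinates of x (the O(n lg n) step) simultaneously locates the containing simplex and yields the interpolation weights; the weights telescope to sum to one, and the scheme reproduces any linear function exactly.

```python
def simplex_interpolate(G, x):
    """Interpolate using only the n + 1 vertices of the Kuhn simplex
    containing x, instead of all 2^n cube corners.

    G maps integer grid tuples to stored cost-to-go values.
    """
    n = len(x)
    base = [int(c) for c in x]                  # lower corner of the cube
    frac = [c - b for c, b in zip(x, base)]     # fractional coordinates
    order = sorted(range(n), key=lambda j: -frac[j])   # descending fractions
    # Walk from the lower corner, raising one coordinate at a time; the
    # visited points are the vertices of the containing simplex.
    verts = [tuple(base)]
    v = list(base)
    for j in order:
        v[j] += 1
        verts.append(tuple(v))
    fs = [frac[j] for j in order]
    weights = ([1.0 - fs[0]]
               + [fs[k] - fs[k + 1] for k in range(n - 1)]
               + [fs[-1]])
    return sum(w * G[vtx] for w, vtx in zip(weights, verts))
```

This is a sketch for interior points only; near the boundary of X, the raised vertices must still exist in S, matching the requirement stated above for interpolation neighbors.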
It will be convenient to refer directly to the set of all points in X for which all required interpolation neighbors exist. For any finite set S ⊆ X of sample points, let the interpolation region R(S) be the set of all x ∈ X \ S for which G∗(x) can be computed by interpolation. This means that x ∈ R(S) if and only if all interpolation neighbors of x lie in S. Figure 8.21a shows an example. Note that some sample points may not contribute any points to R(S). If a grid of samples is used to approximate G∗, then the volume of X \ R(S) approaches zero as the sampling resolution increases.
Continuous action spaces Now suppose that U(x) is continuous, in addition to X. Assume that U(x) is both a closed and bounded subset of Rⁿ. Once again, the dynamic programming recurrence, (8.56), remains the same. The trouble now is that the min represents an optimization problem over an uncountably infinite number of choices. One possibility is to employ nonlinear optimization techniques to select the optimal u ∈ U(x). The effectiveness of this depends heavily on U(x), X, and the cost functional.
Another approach is to evaluate (8.56) over a finite set of samples drawn from U(x). Again, it is best to choose samples that reduce the dispersion as much as possible. In some contexts, it may be possible to eliminate some actions from
consideration by carefully utilizing the properties of the cost-to-go function and its representation via interpolation.
- The connection to feedback motion planning
The tools have now been provided to solve motion planning problems using value iteration. The configuration space is a continuous state space; let X = Cfree. The action space is also continuous, U(x) = Tx(X). For motion planning problems, 0 ∈ Tx(X) is obtained only when the termination action uT is applied. Therefore, it does not need to be represented separately. To compute optimal cost-to-go functions for motion planning, the main concerns are as follows:
- The action space must be bounded.
- A discrete-time approximation must be made to derive a state transition equation that works over stages.
- The cost functional must be discretized.
- The obstacle region, Cobs, must be taken into account.
- At least some interpolation region must yield G∗(x) = 0, which represents the goal region.
We now discuss each of these.
Bounding the action space Recall that using normalized vector fields does not alter the existence of solutions. This is convenient because U(x) needs to be bounded to approximate it with a finite set of samples. It is useful to restrict the action set to obtain

U(x) = {u ∈ Rⁿ | ‖u‖ ≤ 1}. (8.60)

To improve performance, it is sometimes possible to use only those u for which ‖u‖ = 1 or u = 0; however, numerical instability problems may arise. A finite sample set for U(x) should have low dispersion and always include u = 0.
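A minimal sketch of such a finite action set in the plane: m evenly spaced unit directions plus u = 0 (the choice m = 16 is arbitrary). Evenly spaced angles give the lowest dispersion on the set of unit actions.

```python
import math

def sample_actions(m=16):
    """Return m evenly spaced unit vectors plus the zero action u = 0."""
    actions = [(0.0, 0.0)]                     # always include u = 0
    for k in range(m):
        ang = 2.0 * math.pi * k / m
        actions.append((math.cos(ang), math.sin(ang)))
    return actions
```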
Obtaining a state transition equation Value iterations occur over discrete stages; however, the integral curves of feedback plans occur over continuous time. Therefore, the time interval T needs to be sampled. Let ∆t denote a small positive constant that represents a fixed interval of time. Let the stage index k refer to time (k − 1)∆t. Now consider representing a velocity field ẋ over Rⁿ. By definition,

dx/dt = lim_{∆t→0} (x(t + ∆t) − x(t))/∆t. (8.61)

In Section 8.3.1, a velocity field was defined by assigning some u ∈ U(x) to each x ∈ X. If the velocity vector u is integrated from x(t) over a small ∆t, then a new state, x(t + ∆t), results. If u remains constant, then

x(t + ∆t) = x(t) + ∆t u, (8.62)
which is called an Euler approximation. If a feedback plan is executed, then u is determined from x via u = π(x(t)). In general, this means that u could vary as the state is integrated forward. In this case, (8.62) is only approximate,
x(t + ∆t) ≈ x(t) + ∆t π(x(t)). (8.63)
The expression in (8.62) can be considered as a state transition equation that works over stages. Let xk+1 = x(t + ∆t) and xk = x(t). The transitions can now be expressed as
xk+1 = f(xk, u) = xk + ∆t u. (8.64)
The quality of the approximation improves as ∆t decreases. Better approximations can be made by using more sample points along time. The most widely known approximations are the Runge-Kutta family. For optimal motion planning, it turns out that the direction vector almost always remains constant along the integral curve. For example, in Figure 8.13d, observe that piecewise-linear paths are obtained by performing gradient descent of the optimal navigation function. The direction vector is constant over most of the resulting integral curve (it changes only as obstacles are contacted). Therefore, approximation problems tend not to arise in motion planning problems. When approximating dynamical systems, such as those presented in Chapter 13, then better approximations are needed; see Section 14.3.2. One important concern is that ∆t is chosen in a way that is compatible with the grid resolution. If ∆t is so small that the actions do not change the state enough to yield new interpolation neighbors, then the interpolated cost-to-go values will remain constant. This implies that ∆t must be chosen to ensure that x(t + ∆t) has a different set of interpolation neighbors than x(t).
An interesting connection can be made to the approximate motion planning problem that was developed in Section 7.7. Formulation 7.4 corresponds precisely to the approximation defined here, except that ǫ was used instead of ∆t because velocities were not yet considered (also, the initial condition was specified because there was no feedback). Recall the different possible action spaces shown in Figure
7.41. As stated in Section 7.7, if the Manhattan or independent-joint models are used, then the configurations remain on a grid of points. This enables discrete value iterations to be performed. A discrete feedback plan and navigation function, as considered in Section 8.2.3, can even be computed. If the Euclidean motion model is used, which is more natural, then the transitions allow a continuum of possible configurations. This case can finally be handled by using interpolation over the configuration space, as described in this section.
Approximating the cost functional A discrete cost functional must be derived from the continuous cost functional, (8.39). The final term is just assigned as lF(xF) = lF(x(tf)). The cost at each stage is

ld(xk, uk) = ∫₀^∆t l(x(t), u(t)) dt, (8.65)

and ld(xk, uk) is used in the place of l(xk, uk) in (8.56). For many problems, the integral does not need to be computed repeatedly. To obtain Euclidean shortest paths, ld(xk, uk) = ‖uk‖ can be safely assigned for all xk ∈ X and uk ∈ U(xk). A reasonable approximation to (8.65) if ∆t is small is l(x(t), u(t))∆t.
Handling obstacles A simple way to handle obstacles is to determine for each x ∈ S whether x ∈ Cobs. This can be computed and stored in an array before the value iterations are performed. For rigid robots, this can be efficiently computed using fast Fourier transforms [513]. For each x ∈ Cobs, G∗(x) = ∞. No value iterations are performed on these states; their values must remain at infinity. During the evaluation of (8.59) (or a higher dimensional version), different actions are attempted. For each action, it is required that all of the interpolation neighbors of xk+1 lie in Cfree. If one of them lies in Cobs, then that action produces infinite cost. This has the effect of automatically reducing the interpolation region, R(S), to all cubes whose vertices all lie in Cfree, as shown in Figure 8.21b. All samples in Cobs are assumed to be deleted from S in the remainder of this section; however, the full grid is still used for interpolation so that infinite values represent the obstacle region.

Note that as expressed so far, it is possible that points in Cobs may lie in R(S) because collision detection is performed only on the samples. In practice, either the grid resolution must be made fine enough to minimize the chance of this error occurring or distance information from a collision detection algorithm must be used to infer that a sufficiently large ball around each sample is collision-free. If an interpolation region cannot be assured to lie in Cfree, then the resolution may have to be increased, at least locally.
Handling the goal region Recall that backward value iterations start with the final cost-to-go function and iterate backward. Initially, the final cost-to-go is assigned as infinity at all states except those in the goal. To properly initialize the final cost-to-go function, there must exist some subset of X over which the zero value can be obtained by interpolation. Let G = S ∩ XG. The requirement is that the interpolation region R(G) must be nonempty. If this is not satisfied, then the grid resolution needs to be increased or the goal set needs to be enlarged. If XG is a single point, then it needs to be enlarged, regardless of the resolution (unless an alternative way to interpolate near a goal point is developed). In the interpolation region shown in Figure 8.21c, all states in the vicinity of xG yield an interpolated cost-to-go value of zero. If such a region did not exist, then all costs would remain at infinity during the evaluation of (8.59) from any state. Note that ∆t must be chosen large enough to ensure that new samples can reach G.
Using G∗ as a navigation function After the cost-to-go values stabilize, the resulting cost-to-go function, G∗, can be used as a navigation function. Even though G∗ is defined only over S ⊂ X, the value of the navigation function can be obtained using interpolation over any point in R(S). The optimal action is selected as the one that satisfies the min in (8.6). This means that the state trajectory does not have to visit the grid points as in the Manhattan model. A trajectory can visit any point in R(S), which enables trajectories to converge to the true optimal solution as ∆t and the grid spacing tend to zero.
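The pieces above can be combined into a small end-to-end sketch: value iteration with bilinear interpolation on a 2D grid, Euler transitions (8.64), unit actions plus u = 0, infinite cost whenever an interpolation neighbor lies in the obstacle set, and a single zero-cost goal sample. The grid size, wall, goal, and sweep count are all illustrative; updates are applied in place (Gauss-Seidel style), which converges to the same fixed point as synchronous sweeps for this shortest-path cost.

```python
import math
import itertools

INF = float("inf")
N = 10                                           # grid is {0,...,N-1}^2
dt = 1.0
obstacles = {(i, 4) for i in range(2, 8)}        # a wall with gaps at the ends
goal = (8, 8)
actions = [(0.0, 0.0)] + [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4))
                          for k in range(8)]

def interp(G, x1, x2):
    """Bilinear interpolation of G at (x1, x2), per (8.59); infinite cost if
    the point leaves the grid or any interpolation neighbor is an obstacle."""
    if not (0.0 <= x1 <= N - 1 and 0.0 <= x2 <= N - 1):
        return INF
    i, j = min(int(x1), N - 2), min(int(x2), N - 2)
    a1, a2 = 1.0 - (x1 - i), 1.0 - (x2 - j)
    corners = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
    weights = [a1 * a2, a1 * (1 - a2), (1 - a1) * a2, (1 - a1) * (1 - a2)]
    if any(c in obstacles for c in corners):
        return INF
    # Skip zero-weight corners so unreached (infinite) samples cannot
    # poison the sum with 0 * inf.
    return sum(w * G[c] for w, c in zip(weights, corners) if w > 0.0)

def value_iterate(sweeps=60):
    """Backward value iteration (8.56) with Euler transitions (8.64)."""
    G = {s: INF for s in itertools.product(range(N), range(N))}
    G[goal] = 0.0
    for _ in range(sweeps):
        for s in G:
            if s in obstacles or s == goal:
                continue                         # obstacle values stay at inf
            G[s] = min(math.hypot(*u) * dt
                       + interp(G, s[0] + dt * u[0], s[1] + dt * u[1])
                       for u in actions)
    return G
```

After the sweeps converge, gradient descent on the interpolated G (choosing the action minimizing the same expression) yields trajectories that cut across grid cells, illustrating the point above about the Manhattan model.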
Topological considerations So far there has been no explicit consideration of the topology of C. Assuming that C is a manifold, the concepts discussed so far can be applied to any open set on which coordinates are defined. In practice, it is often convenient to use the manifold representations of Section 4.1.2. The manifold can be expressed as a cube, [0, 1]ⁿ, with some faces identified to obtain [0, 1]ⁿ/∼. Over the interior of the cube, all of the concepts explained in this section work without modification. At the boundary, the samples used for interpolation must take the identification into account. Furthermore, actions, uk, and next states, xk+1, must function correctly on the boundary. One must be careful, however, in declaring that some solution is optimal, because Euclidean shortest paths depend on the manifold parameterization. This ambiguity is usually resolved by formulating the cost in terms of some physical quantity, such as time or energy. This often requires modeling dynamics, which will be covered in Part IV.
Value iteration with interpolation is extremely general. It is a generic algorithm for approximating the solution to optimal control problems. It can be applied to solve many of the problems in Part IV by restricting U(x) to take into account complicated differential constraints. The method can also be extended to problems that involve explicit uncertainty in predictability. This version of value iteration is covered in Section 10.6.
- Obtaining Dijkstra-like algorithms
For motion planning problems, it is expected that x(t + ∆t), as computed from (8.62), is always close to x(t) relative to the size of X. This suggests the use of a Dijkstra-like algorithm to compute optimal feedback plans more efficiently. As discussed for the finite case in Section 2.3.3, many values remain unchanged during the value iterations, as indicated in Example 2.5. Dijkstra’s algorithm maintains a data structure that focuses the computation on the part of the state space where values are changing. The same can be done for the continuous case by carefully considering the sample points [607].
During the value iterations, there are three kinds of sample points, just as in the discrete case (recall from Section 2.3.3):
- Dead: The cost-to-go has stabilized to its optimal value.
- Alive: The current cost-to-go is finite, but it is not yet known whether the value is optimal.
- Unvisited: The cost-to-go value remains at infinity because the sample has not been reached.
The sets are somewhat harder to maintain for the case of continuous state spaces because of the interaction between the sample set S and the interpolated region R(S).
Imagine the first value iteration. Initially, all points in G are assigned zero values. Among the collection of samples S, how many can reach R(G) in a single stage? Samples very far from G cannot reach R(G); this keeps their values at infinity. The samples that are close to G should reach it. It would be convenient to prune from consideration all samples that are too far from G for their values to be lowered. In every iteration, we avoid iterating over samples that are too far away from those already reached. It is also unnecessary to iterate over the dead samples because their values no longer change.
To keep track of reachable samples, it will be convenient to introduce the
notion of a backprojection, which will be studied further in Section 10.1. For a single state, x ∈ X, its backprojection is defined as
B(x) = {x′ ∈ X | ∃u′ ∈ U(x′) such that x = f(x′, u′)}. (8.66)
The backprojection of a set, X′ ⊆ X, of points is just the union of backprojections for each point:

B(X′) = ∪x∈X′ B(x). (8.67)
Now consider a version of value iteration that uses backprojections to eliminate some states from consideration because it is known that their values cannot change. Let i refer to the number of stages considered by the current value iteration. During the first iteration, i = 1, which means that all one-stage trajectories are considered.
Let S be the set of samples (assuming already that none lie in Cobs). Let Di and
Ai refer to the dead and alive samples, respectively. Initially, D1 = G, the set of
samples in the goal set. The first set, A1, of alive samples is assigned by using the concept of a frontier. The frontier of a set S′ ⊆ S of sample points is
Front(S′) = (B(R(S′)) \ S′) ∩ S. (8.68)
This is the set of sample points that can reach R(S′) in one stage, excluding those already in S′. Figure 8.22 illustrates the frontier. Using (8.68), A1 is defined as A1 = Front(D1).
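Equation (8.68) can be sketched directly in code. In this hypothetical version, the set B(R(S′)) is represented by an assumed predicate `reaches_interp_region`, which tests whether some action takes a sample into the interpolation region R(S′) in one stage; computing that predicate is the problem-specific part.

```python
def frontier(S_prime, S, reaches_interp_region):
    """Front(S') = (B(R(S')) \\ S') intersected with S, per (8.68):
    the samples in S, outside S', that can reach R(S') in one stage."""
    excluded = set(S_prime)
    return {s for s in S
            if s not in excluded and reaches_interp_region(s)}
```

As a toy example, if S is the integers 0 through 10, S′ = {0, 1}, and every sample at distance one from S′ can reach R(S′), the frontier is {2}.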
Now the approach is described for iteration i. The cost-to-go update (8.56) is computed at all points in Ai. If G∗k+1(s) = G∗k(s) for some s ∈ Ai, then s is declared
dead and moved to Di+1. Samples are never removed from the dead set; therefore, all points in Di are also added to Di+1. The next active set, Ai+1, includes all samples in Ai, excluding those that were moved to the dead set. Furthermore, all samples in Front(Ai) are added to Ai+1 because these will produce a finite cost-to-go value in the next iteration. The iterations continue as usual until some stage, m, is reached for which Am is empty, and Dm includes all samples from which the goal can be reached (under the approximation assumptions made in this section). For efficiency purposes, an approximation to Front may be used, provided that the true frontier is a proper subset of the approximate frontier. For example, the frontier might add all new samples within a specified radius of points in S′. In this case, the updated cost-to-go value for some s ∈ Ai may remain infinite. If this occurs, it is of course not added to Di+1. Furthermore, it is deleted from Ai in the computation of the next frontier (the frontier should only be computed for samples that have finite cost-to-go values).

Figure 8.22: An illustration of the frontier concept: (a) the shaded disc indicates the set of all reachable points in one stage, from the sample on the left. The sample cannot reach in one stage the shaded region on the right, which represents R(S′). (b) The frontier is the set of samples that can reach R(S′). The inclusion of the frontier increases the interpolation region beyond R(S′).
The approach considered so far can be expected to reduce the amount of computation in each value iteration by eliminating the evaluation of (8.56) on unnecessary samples. The same cost-to-go values are obtained in each iteration because only samples for which the value cannot change are eliminated in each iteration. The resulting algorithm still does not, however, resemble Dijkstra’s algorithm because value iterations are performed over all of Ai in each stage.
To make a version that behaves like Dijkstra’s algorithm, a queue Q will be introduced. The algorithm removes the smallest element of Q in each iteration. The interpolation version first assigns G∗(s) = 0 for each s ∈ G. It also maintains a set D of dead samples, which is initialized to D = G. For each s ∈ Front(G), the cost-to-go update (8.56) is computed. The priority queue Q is initialized to Front(G), and elements are sorted by their current cost-to-go values (which may not be optimal). The algorithm iteratively removes the smallest element from Q (because its optimal cost-to-go is known to be the current value) and terminates when Q is empty. Each time the smallest element, ss ∈ Q, is removed, it is inserted into
D. Two procedures are then performed: 1) Some elements in the queue need to have their cost-to-go values recomputed using (8.56) because the value G∗(ss) is known to be optimal, and their values may depend on it. These are the samples in Q that lie in Front(D) (in which D just got extended to include ss). 2) Any samples in B(R(D)) that are not in Q are inserted into Q after computing their cost-to-go values using (8.56). This enables the active set of samples to grow as the set of dead samples grows. Dijkstra’s algorithm with interpolation does not compute values that are identical to those produced by value iterations because
G∗k+1 is not explicitly stored when G∗k is computed. Each computed value is some
cost-to-go, but it is only known to be optimal when the sample is placed into D. It can be shown, however, that the method converges because computed values are no higher than what would have been computed in a value iteration. This is also the basis of dynamic programming using Gauss-Seidel iterations [96].
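The queue-driven propagation can be sketched as follows. This is a simplified, hypothetical version: exact one-stage costs between neighboring samples stand in for the interpolated update (8.56), and `neighbors` and `edge_cost` are assumed problem-specific callbacks. The stale-entry check plays the role of recomputing values for queue elements affected by a newly dead sample.

```python
import heapq
import math

def dijkstra_interpolated(samples, goal, neighbors, edge_cost):
    """Dijkstra-like propagation over a sample set: repeatedly remove the
    smallest queued value (now known optimal), declare it dead, and update
    the samples that can reach it."""
    cost = {s: math.inf for s in samples}
    for s in goal:
        cost[s] = 0.0
    Q = [(0.0, s) for s in goal]
    heapq.heapify(Q)
    dead = set()
    while Q:
        c, s = heapq.heappop(Q)      # smallest value is optimal
        if s in dead:
            continue                 # skip stale queue entries
        dead.add(s)
        for s2 in neighbors(s):      # samples whose values may depend on s
            new_cost = c + edge_cost(s2, s)
            if new_cost < cost[s2]:
                cost[s2] = new_cost
                heapq.heappush(Q, (new_cost, s2))
    return cost
```

On a chain of five samples with unit edge costs and the goal at one end, the computed cost-to-go grows linearly along the chain.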
A specialized, wavefront-propagation version of the algorithm can be made for the special case of finding solutions that reach the goal in the smallest number of stages. The algorithm is similar to the one shown in Figure 8.4. It starts with an
initial wavefront W0 = G in which G∗(s) = 0 for each s ∈ G. In each iteration,
the optimal cost-to-go value i is increased by one, and the wavefront, Wi+1, is computed from Wi as Wi+1 = Front(Wi). The algorithm terminates at the first iteration in which the wavefront is empty.
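The wavefront version reduces to a short loop. In this sketch, `front_fn` is an assumed Front operator over the sample set, as in (8.68); restricting the new wavefront to samples with no assigned value ensures termination.

```python
def wavefront(goal, front_fn):
    """Wavefront propagation for stage-count-optimal cost-to-go: the
    wavefront Wi+1 = Front(Wi) receives value i + 1, and the loop stops
    when the wavefront becomes empty."""
    cost = {s: 0 for s in goal}      # G*(s) = 0 for each s in G
    W, i = set(goal), 0
    while W:
        i += 1
        W = {s for s in front_fn(W) if s not in cost}  # unreached only
        for s in W:
            cost[s] = i
    return cost
```

On the integers 0 through 5 with the goal at 0 and a one-step frontier, each sample receives its distance to the goal in stages.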
Further Reading
There is much less related literature for this chapter in comparison to previous chapters. As explained in Section 8.1, there are historical reasons why feedback is usually separated from motion planning. Navigation functions [541, 829] were one of the most influential ideas in bringing feedback into motion planning; therefore, navigation functions were a common theme throughout the chapter. For other works that use or develop navigation functions, see [206, 274, 750]. The ideas of progress measures [317], Lyapunov
functions (covered in Section 15.1.1), and cost-to-go functions are all closely related.
For Lyapunov-based design of feedback control laws, see [278]. In the context of motion planning, the Error Detection and Recovery (EDR) framework also contains feedback ideas [284].
In [325], the topological complexity of C-spaces is studied by characterizing the minimum number of regions needed to cover C × C by defining a continuous path function
over each region. This indicates limits on navigation functions that can be constructed, assuming that both qI and qG are variables (throughout this chapter, qG was instead fixed). Further work in this direction includes [326, 327].
To gain better intuitions about properties of vector fields, [44] is a helpful reference, filled with numerous insightful illustrations. A good introduction to smooth manifolds that is particularly suited for control-theory concepts is [133]. Basic intuitions for 2D and 3D curves and surfaces can be obtained from [753]. Other sources for smooth manifolds and differential geometry include [4, 107, 234, 279, 872, 906, 960]. For discussions of
piecewise-smooth vector fields, see [27, 634, 846, 998].
Sections 8.4.2 and 8.4.3 were inspired by [235, 643] and [708], respectively. Many difficulties were avoided because discontinuous vector fields were allowed in these approaches. By requiring continuity or smoothness, the subject of Section 8.4.4 was obtained. The material is based mainly on [829, 830]. Other work on navigation functions includes [249, 651, 652].
Section 8.5.1 was inspired mainly by [162, 679], and the approach based on neighborhood graphs is drawn from [983].
Value iteration with interpolation, the subject of Section 8.5.2, is sometimes forgotten in motion planning because computers were not powerful enough at the time it was developed [84, 85, 582, 583]. Presently, however, solutions can be computed for challenging problems with several dimensions (e.g., 3 or 4). Convergence of discretized value iteration to solving the optimal continuous problem was first established in [92], based on Lipschitz conditions on the state transition equation and cost functional. Analyses that take interpolation into account, and general discretization issues, appear in [168, 292, 400, 565, 567]. A multi-resolution variant of value iteration was proposed in [722]. The discussion of Dijkstra-like versions of value iteration was based on [607, 946]. The level-set method is also closely related [532, 533, 534, 862].
Exercises
- Suppose that a very fast path planning algorithm runs on board a mobile robot (for example, it may find an answer in a few milliseconds, which is reasonable using trapezoidal decomposition in R2). Explain how this method can be used to simulate having a feedback plan on the robot. Explain the issues and trade-offs between having a fast on-line algorithm that computes open-loop plans vs. a better off-line algorithm that computes a feedback plan.
- Use Dijkstra’s algorithm to construct navigation functions on a 2D grid with obstacles. Experiment with adding a penalty to the cost functional for getting too close to obstacles.
- If there are alternative routes, the NF2 algorithm does not necessarily send the state along the route that has the largest maximum clearance. Fix the NF2 algorithm so that it addresses this problem.
- Tangent space problems:
- For the manifold of unit quaternions, find basis vectors for the tangent space in R4 at any point.
- Find basis vectors for the tangent space in R9, assuming that matrices in SO(3) are parameterized with quaternions, as shown in (4.20).
- Extend the algorithm described in Section 8.4.3 to make it work for polygons that have holes. See Example 8.16 for a similar problem.
- Give a complete algorithm that uses the vertical cell decomposition for a polygonal obstacle region in R2 to construct a vector field that serves as a feedback plan. The vector field may be discontinuous.
- Figure 8.23 depicts a 2D example for which Xfree is an open annulus. Consider designing a vector field for which all integral curves flow into XG and the vector field is continuous outside of XG. Either give a vector field that achieves this or explain why it is not possible.
- Use the maximum-clearance roadmap idea from Section 6.2.3 to define a cell decomposition and feedback motion plan (vector field) that maximizes clearance. The vector field may be discontinuous.
Figure 8.23: Consider designing a continuous vector field that flows into XG.
- Develop an algorithm that computes an exact cover for a polygonal configuration space and ensures that if two neighborhoods intersect, then their intersection always contains an open set (i.e., the overlap region is two-dimensional). The neighborhoods in the cover should be polygonal.
- Using a distance measurement and Euler angles, determine the expression for a collision-free ball that can be inferred (make the ball as large as possible). This should generalize (8.54).
- Using a distance measurement and quaternions, determine the expression for a collision-free ball (once again, make it as large as possible).
- Generalize the multi-linear interpolation scheme in (8.59) from 2 to n dimensions.
- Explain the convergence problems for value iteration that can result if ‖u‖ = 1 is used to constrain the set of allowable actions, instead of ‖u‖ ≤ 1.
Implementations
- Experiment with numerical methods for solving the function (8.49) in two dimensions under various boundary conditions. Report on the efficiency and accuracy of the methods. How well can they be applied in higher dimensions?
- Implement value iteration with interpolation (it is not necessary to use the method in Figure 8.20) for a polygonal robot that translates and rotates among polygonal
obstacles in W = R2. Define the cost functional so that the distance traveled is
obtained with respect to a weighted Euclidean metric (the weights that compare
rotation to translation can be set arbitrarily).
- Evaluate the efficiency of the interpolation method shown in Figure 8.20 applied to multi-linear interpolation given by generalizing (8.59) as in Exercise 12. You do not need to implement the full value iteration approach (alternatively, this could be done, which provides a better comparison of the overall performance).
- Implement the method of Section 8.4.2 of computing vector fields on a triangulation. For given input polygons, have your program draw a needle diagram of the
computed vector field. Determine how fast the vector field can be recomputed as the goal changes.
- Optimal navigation function problems:
- Implement the algorithm illustrated in Figure 8.13. Show the level sets of the optimal cost-to-go function.
- Extend the algorithm and implementation to the case in which there are polygonal holes in Xfree.
- Adapt value iteration with interpolation so that a point robot moving in the plane can keep track of a predictable moving point called a target. The cost functional should cause a small penalty to be added if the target is not visible. Optimizing
this should minimize the amount of time that the target is not visible. Assume that the initial configuration of the robot is given. Compute optimal feedback plans for the robot.
- Try to experimentally construct navigation functions by adding potential functions that repel the state away from obstacles and attract the state toward xG. For simplicity, you may assume that X = R2 and the obstacles are discs. Start with a single disc and then gradually construct more complicated obstacle regions. How difficult is it to ensure that the resulting potential function has no local minima outside of xG?