Continuous State Spaces

Virtually all of the concepts covered in this chapter extend to continuous state spaces. This enables them, at least in theory, to be applied to configuration spaces. Thus, a motion planning problem that involves uncertainty or noncooperating robots can be modeled using the concepts of this chapter. Such problems also inherit the feedback concepts from Chapter 8. This section covers feedback motion planning problems that incorporate uncertainty due to nature. In particular contexts, it may be possible to extend some of the methods of Sections 8.4 and 8.5. Solution feedback plans must ensure that the goal is reached in spite of nature's efforts. Among the methods in Chapter 8, the easiest to generalize is value iteration with interpolation, which was covered in Section 8.5.2. Therefore, it is the main focus of the current section. For games in continuous state spaces, see Section 13.5.

      1. Extending the value-iteration method

The presentation follows in the same way as in Section 8.5.2, by beginning with the discrete problem and making various components continuous. Begin with Formulation 10.1 and let X be a bounded, open subset of R^n. Assume that U(x) and Θ(x, u) are finite. The value-iteration methods of Section 10.2.1 can be directly applied by using the interpolation concepts from Section 8.5.2 to compute the cost-to-go values over X. In the nondeterministic case, the recurrence is (10.39), in which G∗k+1 is represented on a finite sample set S ⊂ X and is evaluated on all other points in R(S) by interpolation (recall from Section 8.5.2 that R(S) is the interpolation region of S). In the probabilistic case, (10.45) or (10.46) may once again be used, but G∗k+1 is evaluated by interpolation.

If U(x) is continuous, then it can be sampled to evaluate the min in each recurrence, as suggested in Section 8.5.2. Now suppose Θ(x, u) is continuous. In the nondeterministic case, Θ(x, u) can be sampled to evaluate the max in (10.39), or it may be possible to employ a general optimization technique directly over Θ(x, u). In the probabilistic case, the expectation must be taken over a continuous probability space. A probability density function, p(θ | x, u), characterizes nature's action. A probabilistic state transition density function, p(xk+1 | xk, uk), can be derived from it. Using these densities, the continuous versions of (10.45) and (10.46) become

$$
G^*_k(x_k) = \min_{u_k \in U(x_k)} \left\{ \int_{\Theta(x_k,u_k)} \Big( l(x_k,u_k,\theta_k) + G^*_{k+1}\big(f(x_k,u_k,\theta_k)\big) \Big)\, p(\theta_k \mid x_k, u_k)\, d\theta_k \right\} \qquad (10.122)
$$

and

$$
G^*_k(x_k) = \min_{u_k \in U(x_k)} \left\{ l(x_k,u_k) + \int_{X} G^*_{k+1}(x_{k+1})\, p(x_{k+1} \mid x_k, u_k)\, dx_{k+1} \right\}, \qquad (10.123)
$$

respectively. Sampling can be used to evaluate the integrals. One straightforward method is to approximate p(θ | x, u) by a discrete distribution. For example, in one dimension, this can be achieved by partitioning Θ(x, u) into intervals, in which each interval is declared to be a discrete nature action. The probability associated with the discrete nature action is just the integral of p(θ | x, u) over the associated interval.
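As a rough illustration (not an implementation from the text), the following Python sketch performs one backup of (10.122) for a hypothetical one-dimensional problem with transition x' = x + Δt (u + θ): nature's density is approximated by partitioning Θ(x, u) into intervals, and G∗k+1 is evaluated on a grid of samples by linear interpolation. All names are assumptions, the density is taken to be uniform, and the cost term is assumed independent of nature for brevity.

```python
import numpy as np

def backup_probabilistic(G_next, xs, us, theta_lo, theta_hi, num_theta, dt, l):
    """One value-iteration backup of (10.122) on a 1D grid (xs must be increasing),
    with nature's continuous density replaced by a discrete distribution."""
    # Partition Theta(x, u) = [theta_lo, theta_hi] into intervals; each interval
    # becomes a discrete nature action whose probability is the integral of
    # p(theta | x, u) over the interval (uniform density here, so equal weights).
    edges = np.linspace(theta_lo, theta_hi, num_theta + 1)
    thetas = 0.5 * (edges[:-1] + edges[1:])        # interval midpoints
    weights = np.full(num_theta, 1.0 / num_theta)

    G_new = np.empty_like(G_next)
    for i, x in enumerate(xs):
        best = np.inf
        for u in us:                                # sampled U(x)
            x_next = x + dt * (u + thetas)          # f(x, u, theta) for each interval
            # Evaluate G*_{k+1} by linear interpolation over the sample grid,
            # clamping to the boundary of X (a crude stand-in for R(S)).
            g = np.interp(np.clip(x_next, xs[0], xs[-1]), xs, G_next)
            expected = np.sum(weights * (l(x, u) + g))   # cost assumed nature-independent
            best = min(best, expected)
        G_new[i] = best
    return G_new
```

Iterating this backup over the grid until the values stabilize approximates the optimal cost-to-go, from which a feedback plan is recovered by selecting the minimizing u at each sample.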

Section 8.5.2 concluded by describing Dijkstra-like algorithms for continuous spaces. These were derived mainly by using backprojections, (8.66), to conclude that some samples cannot change their values because they are too far from the active set. The same principle can be applied in the current setting; however, the weak backprojection, (10.20), must be used instead. Using the weak backprojection, the usual value iterations can be applied while removing all samples that are not in the active set. For many problems, however, the size of the active set may quickly become unmanageable because the weak backprojection often causes much faster propagation than the original backprojection. Continuous-state generalizations of the Dijkstra-like algorithms in Section 10.2.3 can be made; however, this requires the additional condition that in every iteration, it must be possible to extend D by forcing the next state to lie in R(D), in spite of nature.
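To make the pruning principle concrete, here is a minimal sketch (helper names are assumptions; it also assumes that f maps sampled states back into the sample set) that computes the strong and weak backprojections of a set S directly from their definitions:

```python
def weak_backprojection(S, samples, U, Theta, f):
    """Samples from which some u and some theta lead into S in one stage (cf. (10.20))."""
    return {x for x in samples
            if any(f(x, u, th) in S
                   for u in U(x) for th in Theta(x, u))}

def strong_backprojection(S, samples, U, Theta, f):
    """Samples from which some u leads into S for every nature action (cf. (8.66))."""
    return {x for x in samples
            if any(all(f(x, u, th) in S for th in Theta(x, u))
                   for u in U(x))}
```

In a Dijkstra-like sweep, only samples lying in the weak backprojection of the active set can change their values during the next iteration; all other samples may be skipped.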

      2. Motion planning with nature

Recall from Section 8.5.2 that value iteration with interpolation can be applied to motion planning problems that are approximated in discrete time. Nature can even be introduced into the discrete-time approximation. For example, (8.62) can be replaced by

x(t + ∆t) = x(t) + ∆t (u + θ), (10.124)

in which θ is chosen from a bounded set, Θ(x, u). Using (10.124), value iterations can be performed as described so far. An example of a 2D motion planning problem under this model using probabilistic uncertainty is shown in Figure 10.20. It is interesting that when the plan is executed from a fixed initial state, a different trajectory is obtained each time. The average cost over multiple executions, however, is close to the expected optimum.
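The variability across executions can be reproduced with a short simulation of (10.124). In the sketch below, nature is modeled as in Figure 10.20 by a uniformly distributed rotation of the commanded direction; the feedback plan pi and the disturbance bound max_angle are hypothetical placeholders:

```python
import numpy as np

def execute(pi, x0, dt, max_angle, num_steps, rng=None):
    """Simulate one execution of x(t + dt) = x(t) + dt * (u + theta) in the plane,
    with nature rotating the commanded direction by a uniformly random angle."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(num_steps):
        u = pi(x)                                   # feedback plan: commanded velocity
        angle = rng.uniform(-max_angle, max_angle)  # nature interferes with the heading
        c, s = np.cos(angle), np.sin(angle)
        u_actual = np.array([[c, -s], [s, c]]) @ u  # rotated motion direction
        x = x + dt * u_actual                       # same as u + theta with theta = u_actual - u
        trajectory.append(x.copy())
    return np.array(trajectory)
```

Calling execute repeatedly from the same initial state produces a different trajectory each time, as in Figure 10.20(d), while the average accumulated cost remains close to the expected optimum.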

Interesting hybrid system examples can be made in which nature is only allowed to interfere with the mode. Recall Formulation 7.3 from Section 7.3. Nature can be added to yield the following formulation.

Figure 10.20: (a) A 2D planning problem is shown in which nature is probabilistic (uniform density over an interval of angles) and can interfere with the direction of motion. Contact with obstacles is actually allowed in this problem. (b) Level sets of the computed, optimal cost-to-go (navigation) function. (c) The vector field derived from the navigation function. (d) Several dozen execution trials are superimposed [605].

Figure 10.21: Level sets of the optimal navigation function and resulting vector field are shown for a stochastic, hybrid motion planning problem. There are two modes, which correspond to whether a door is closed. The goal is to reach the rectangle at the bottom left [613]. (Panels: cost-to-go and vector field, each shown for the door-open and door-closed modes; XG marks the goal.)


Figure 10.22: Several executions from the same initial state are shown. A different trajectory results each time because of the different times when the door is open or closed.

Formulation 10.5 (Hybrid System Motion Planning with Nature)

    1. Assume all of the definitions from Formulation 7.3, except for the transition functions, fm and f. The state is represented as x = (q, m).
    2. A finite nature action space Θ(x, u) for each x ∈ X and u ∈ U(x).
    3. A mode transition function fm that produces a mode fm(x, u, θ) for every x ∈ X, u ∈ U(x), and θ ∈ Θ(x, u).
    4. A state transition function f that is derived from fm by changing the mode and holding the configuration fixed. Thus, f((q, m), u, θ) = (q, fm(q, m, θ)) (the only difference with respect to Formulation 7.3 is that θ has been included).
    5. An unbounded time interval T = [0, ∞).
    6. A continuous-time cost functional,

$$
L(\tilde{x}_{t_F}, \tilde{u}_{t_F}) = \int_0^{t_F} l(x(t), u(t))\, dt + l_F(x(t_F)). \qquad (10.125)
$$

Value iteration proceeds in the same way for such hybrid problems. Interpolation only needs to be performed over the configuration space. Along the mode "axis" no interpolation is needed because the mode set is already finite. The resulting computation time grows linearly in the number of modes. A 2D motion planning example for a point robot, taken from [613], is shown in Figures 10.21 and 10.22. In this case, the environment contains a door that is modeled as a stationary Markov process. The configuration space is sampled using a 40 × 40 grid. There are two modes: door open or door closed. Thus, the configuration space has two layers, one for each mode. The robot wishes to minimize the expected time to reach the goal. The navigation function for each layer cannot be computed independently because each takes into account the transition probabilities for the mode. For example, if the door is almost always open, then the resulting plan would be different from one in which the door is almost always closed. If the door is almost always open, then the robot should go toward the door, even if it is currently closed, because it is highly likely that it will open soon. Numerous variations can be made on this example. More modes could be added, and other interpretations are possible, such as hazardous regions and shelters (the mode might be imagined as rain occurring and the robot must run for shelter) or requests to deliver objects [613, 870, 871].
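A minimal sketch of value iteration for the door example is given below. It is not the implementation from [613]; the grid motion model, unit stage cost, and all helper names are assumptions. The important structural point is that each mode receives its own cost-to-go layer and the backup takes an expectation over the Markov mode transitions, while the doorway is traversable only in the open mode:

```python
import numpy as np

def hybrid_value_iteration(passable, door_edges, p_stay, goal, num_iters=500):
    """Value iteration for a two-mode (door open/closed) grid approximation.

    passable   : boolean array of free cells (a coarse stand-in for the sampled C-space)
    door_edges : set of frozensets {cell_a, cell_b}; these edges are usable only
                 while the door is open
    p_stay     : probability (strictly between 0 and 1) that the door mode is
                 unchanged at the next stage
    goal       : (row, col) goal cell

    Cost-to-go is stored as G[mode, row, col]; mode 0 = open, 1 = closed.
    No interpolation is needed because the grid itself is the sample set, and
    the mode axis is finite, so each mode simply gets its own layer.
    """
    rows, cols = passable.shape
    G = np.full((2, rows, cols), np.inf)
    G[:, goal[0], goal[1]] = 0.0
    moves = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]   # (0, 0) = wait in place
    P = np.array([[p_stay, 1.0 - p_stay],
                  [1.0 - p_stay, p_stay]])               # stationary Markov mode chain

    for _ in range(num_iters):
        G_new = G.copy()
        for m in range(2):
            for r in range(rows):
                for c in range(cols):
                    if not passable[r, c] or (r, c) == goal:
                        continue
                    best = np.inf
                    for dr, dc in moves:
                        r2, c2 = r + dr, c + dc
                        if not (0 <= r2 < rows and 0 <= c2 < cols and passable[r2, c2]):
                            continue
                        # The doorway edge can be crossed only while the door is open.
                        if m == 1 and frozenset({(r, c), (r2, c2)}) in door_edges:
                            continue
                        # Nature acts only on the mode: take the expectation over
                        # the next door state, with a unit cost per stage (time).
                        exp_next = sum(P[m, m2] * G[m2, r2, c2] for m2 in range(2))
                        best = min(best, 1.0 + exp_next)
                    G_new[m, r, c] = best
        G = G_new
    return G
```

The two layers G[0] and G[1] correspond to the cost-to-go functions in Figure 10.21, and descending them greedily yields a mode-dependent vector field.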

Further Reading

Since this chapter considers sequential versions of single-stage decision problems, the suggested reading at the end of Chapter 9 is also relevant here. The probabilistic formulation in Section 10.1 is a basic problem of stochastic control theory [95, 564]. The framework is also popular in artificial intelligence [79, 267, 471, 839]. For an early, influential work on stochastic control, see [109], in which the notion of sequential games against nature is developed. The forward projection and backprojection topics are not as common in control theory and are instead inspired from [281, 313, 659]. The nondeterministic formulation is obtained by eliminating probabilities from the formulation; worst-case analysis also appears extensively in control theory [57, 58, 301]. A case for using randomized strategies in robotics is made in [314].

Section 10.2 is based on classical dynamic programming work, but with emphasis on the stochastic shortest-path problem. For more reading on value and policy iteration in this context, see [95]. Section 10.2.3 is based on extending Dijkstra's algorithm. For convergence issues due to approximations of continuous problems, see [92, 567, 720]. For complexity results for games against nature, see [764, 767].

Section 10.3 was inspired by coverage in [95]. For further reading on reinforcement learning, the subject of Section 10.4, see [19, 74, 97, 895].

Section 10.5 was based on material in [59], but with an emphasis on unifying concepts from previous sections. Also contained in [59] are sequential game formulations on continuous spaces and even in continuous time. In continuous time, these are called differential games, and they are introduced in Section 13.5. Dynamic programming principles extend nicely into game theory. Furthermore, they extend to Pareto optimality [242].

The main purpose of Section 10.6 is to return to motion planning by considering continuous state spaces. Few works exist on combining stochastic optimal control with motion planning. The presented material is based mainly on [599, 605, 613, 867, 868].

Exercises

  1. Show that SB(S, u) cannot be expressed as the union of all SB(x, u) for x ∈ S.
  2. Show that for any S ⊆ X and any state transition equation, x′ = f(x, u, θ), it follows that SB(S) ⊆ WB(S).

  3. Generalize the strong and weak backprojections of Section 10.1.2 to work for multiple stages.
  4. Assume that nondeterministic uncertainty is used, and there is no limit on the number of stages. Determine an expression for the forward projection at any stage k > 1, given that π is applied.
  5. Give an algorithm for computing nondeterministic forward projections that uses matrices with binary entries. What is the asymptotic running time and space for your algorithm?
  6. Develop a variant of the algorithm in Figure 10.6 that is based on possibly achieving the goal, as opposed to guaranteeing that it is achieved.
  7. Develop a forward version of value iteration for nondeterministic uncertainty, by paralleling the derivation in Section 10.2.1.
  8. Do the same as in Exercise 7, but for probabilistic uncertainty.
  9. Give an algorithm that computes probabilistic forward projections directly from the plan-based state transition graph, Gπ.
  10. Augment the nondeterministic value-iteration method of Section 10.2.1 to detect and handle states from which the goal is possibly reachable but not guaranteed reachable.
  11. Derive a generalization of (10.39) for the case of stage-dependent state-transition equations, xk+1 = f(xk, uk, θk, k), and cost terms, l(xk, uk, θk, k), under nondeterministic uncertainty.
  12. Do the same as in Exercise 11, but for probabilistic uncertainty.
  13. Extend the policy-iteration method of Figure 10.4 to work for the more general case of nature-dependent cost terms, l(xk, uk, θk).
  14. Derive a policy-iteration method that is the nondeterministic analog to the method in Figure 10.4. Assume that the cost terms do not depend on nature.
  15. Can policy iteration be applied to solve problems under Formulation 2.3, which involve no uncertainties? Explain what happens in this case.
  16. Show that the probabilistic infinite-horizon problem under the discounted-cost model is equivalent in terms of cost-to-go to a particular stochastic shortest-path problem (under Formulation 10.1). [Hint: See page 378 of [95].]
  17. Derive a value-iteration method for the infinite-horizon problem with the discounted-cost model and nondeterministic uncertainty. This method should compute the cost-to-go given in (10.71).

Figure 10.23: A two-player, two-stage game expressed using a game tree.

  18. Figure 10.23 shows a two-stage, zero-sum game expressed as a game tree. Compute the randomized value of this sequential game and give the corresponding randomized security plans for each player.
  19. Generalize alpha-beta pruning beyond game trees so that it works for sequential games defined on a state space, starting from a fixed initial state.

20. Derive (10.110) and (10.111).

  21. Extend Formulation 2.4 to allow nondeterministic uncertainty. This can be accomplished by specifying sets of possible effects of operators.
  22. Extend Formulation 2.4 to allow probabilistic uncertainty. For this case, assign probabilities to the possible operator effects.

Implementations

  23. Implement probabilistic backward value iteration and study the convergence issue depicted in Figure 10.3. How does this affect performance in problems for which there are many cycles in the state transition graph? How does performance depend on particular costs and transition probabilities?
  24. Implement the nondeterministic version of Dijkstra's algorithm and test it on a few examples.
  25. Implement and test the probabilistic version of Dijkstra's algorithm. Make sure that the condition Gπ(xk+1) < Gπ(xk) from Section 10.2.3 is satisfied. Study the performance of the algorithm on problems for which the condition is almost violated.
  26. Experiment with the simulation-based version of value iteration, which is given by (10.101). For some simple examples, characterize how the performance depends on the choice of ρ.
  27. Implement a recursive algorithm that uses dynamic programming to determine the upper and lower values for a sequential game expressed using a game tree under the stage-by-stage model.
