12.1 General Methods

This section presents planning methods for the problems introduced in Section 11.1. They are based mainly on general-purpose dynamic programming, without exploiting any particular structure of the problem. Therefore, their application is limited to small state spaces; nevertheless, they are worth covering because of their extreme generality. The basic idea is to use either the nondeterministic or probabilistic I-map to express the problem entirely in terms of the derived I-space, Indet or Iprob, respectively. Once the derived information transition equation (recall Section 11.2.1) is defined, it can be imagined that Indet or Iprob is a state space in which perfect state measurements are obtained during execution (because the I-state is always known).

12.1.1 The Information Space as a Big State Space

Recall that any problem specified using Formulation 11.1 can be converted using derived I-states into a problem under Formulation 10.1. By building on the discussion from the end of Section 11.1.3, this can be achieved by treating the I-space as a big state space in which each state is an I-state in the original problem formulation. Some of the components were given previously, but here a complete formulation is given.

    Item                         Notation           Explanation
    State                        -x = ηder          Derived I-state
    State space                  X- = Ider          Derived I-space
    Action space                 U- = U             Original action space
    Nature action space          Θ- ⊆ Y             Original observation space
    State transition equation    f-(-x, -u, θ-)     Nature action is just y
    Initial state                -xI = η0           Initial I-state, η0 ∈ Ider
    Goal set                     X-G                Subsets of original XG
    Cost functional              L-                 Derived from original L

Figure 12.1: The derived I-space can be treated as an ordinary state space on which planning with perfect state information can be performed.

Suppose that a problem has been specified using Formulation 11.1, resulting in the usual components: X, U, Θ, f, Y, h, xI, XG, and L. The following concepts will work for any sufficient I-map; however, the presentation will be limited to two important cases: κndet and κprob, which yield the derived I-spaces Indet and Iprob, respectively (recall Sections 11.2.2 and 11.2.3).

The components of Formulation 10.1 will now be specified using components of the original problem. To avoid confusion between the two formulations, an arrow will be placed above all components of the new formulation. Figure 12.1 summarizes the coming definitions. The new state space, X-, is defined as X- = Ider, and a state, -x ∈ X-, is a derived I-state, -x = ηder. Under nondeterministic uncertainty, -xk means Xk(ηk), in which ηk is the history I-state. Under probabilistic uncertainty, -xk means P(xk | ηk). The action space remains the same: U- = U.

The strangest part of the formulation is the new nature action space, Θ-(-x, -u). The observations in Formulation 11.1 behave very much like nature actions because they are not selected by the robot, and, as will be seen shortly, they are the only unpredictable part of the new state transition equation. Therefore, Θ-(-x, -u) ⊆ Y, the original observation space. A new nature action, θ- ∈ Θ-, is just an observation, θ-(-x, -u) = y. The set Θ-(-x, -u) generally depends on -x and -u because some observations may be impossible to receive from some states. For example, if a sensor that measures a mobile robot's position is never wrong by more than 1 meter, then observations that are further than 1 meter from the true robot position are impossible.

A derived state transition equation is defined with f-(-xk, -uk, θ-k) and yields a new state, -xk+1. Using the original notation, this is just a function that uses κ(ηk), uk, and yk to compute the next derived I-state, κ(ηk+1), which is allowed because we are working with sufficient I-maps, as described in Section 11.2.1.
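To make the derived transition concrete, the following minimal Python sketch computes the nondeterministic case, in which a derived I-state is simply the set of states consistent with the history. The helper functions state_transition(x, u) (returning the set of possible next states) and consistent_states(y) (returning the states that could produce observation y) are hypothetical placeholders for the models of Formulation 11.1.

    def ndet_transition(X_k, u_k, y_next, state_transition, consistent_states):
        """Sketch of f-(-xk, -uk, th-k) for the nondeterministic I-space.

        X_k               -- set of states consistent with the history I-state
        u_k               -- action applied at stage k
        y_next            -- observation received at stage k+1 (the nature action)
        state_transition  -- maps (x, u) to the set of possible next states
        consistent_states -- maps an observation y to the set of states that
                             could have produced it
        """
        # Forward projection: every state reachable from X_k under u_k.
        forward = set()
        for x in X_k:
            forward |= set(state_transition(x, u_k))
        # Keep only the states consistent with the new observation.
        return forward & set(consistent_states(y_next))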

Initial states and goal sets are optional and can be easily formulated in the new representation. The initial I-state, η0, becomes the new initial state, -xI = η0. It is assumed that η0 is either a subset of X or a probability distribution, depending on whether planning occurs in Indet or Iprob. In the nondeterministic case, the new goal set X-G can be derived as

    \vec{X}_G = \{ X(\eta) \in \mathcal{I}_{ndet} \mid X(\eta) \subseteq X_G \},        (12.1)

which is the set of derived I-states for which it is guaranteed that the true state lies in XG. A probabilistic version can be made by requiring that all states assigned nonzero probability by P(x | η) lie in XG. Instead of being nonzero, a threshold could be used. For example, the goal may require being only 98% certain that the goal is reached.
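As a minimal sketch of these goal tests, an I-state is assumed to be represented either as a set of states (nondeterministic) or as a dictionary mapping states to probabilities (probabilistic); both representations are hypothetical.

    def ndet_goal_reached(X_eta, XG):
        """Test (12.1): every state consistent with the I-state lies in XG."""
        return set(X_eta) <= set(XG)

    def prob_goal_reached(belief, XG, threshold=0.98):
        """Probabilistic version: at least `threshold` of the mass lies in XG."""
        return sum(p for x, p in belief.items() if x in XG) >= threshold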

The only remaining portion of Formulation 10.1 is the cost functional. We will develop a cost model that uses only the state and action histories. A dependency on nature would imply that the costs depend directly on the observation, y = θ-, which was not assumed in Formulation 11.1. The general K-stage cost functional from Formulation 10.1 appears in this context as

    \vec{L} = \sum_{k=1}^{K} \vec{l}(\vec{x}_k, \vec{u}_k) + \vec{l}_F(\vec{x}_F),        (12.2)

with the usual cost assumptions regarding the termination action.

The cost functional L- must be derived from the cost functional L of the original problem. This is expressed in terms of states, which are unknown. First consider the case of Iprob. The state xk at stage k follows the probability distribution P(xk | ηk), as derived in Section 11.2.3. Using -xk, an expected cost is assigned as

    \vec{l}(\vec{x}_k, \vec{u}_k) = \vec{l}(\eta_k, u_k) = \sum_{x_k \in X} P(x_k \mid \eta_k) \, l(x_k, u_k)        (12.3)

and

    \vec{l}_F(\vec{x}_F) = \vec{l}_F(\eta_F) = \sum_{x_F \in X} P(x_F \mid \eta_F) \, l_F(x_F).        (12.4)
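As a concrete illustration, the expected stage cost in (12.3) can be computed directly from a belief stored as a dictionary; belief, u, and the original stage cost l are hypothetical placeholders.

    def expected_stage_cost(belief, u, l):
        """Expected cost (12.3) of applying action u under the belief P(x | eta).

        belief -- dict mapping each state x to its probability P(x | eta)
        u      -- candidate action
        l      -- original stage cost function l(x, u)
        """
        return sum(p * l(x, u) for x, p in belief.items())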

Ideally, we would like to make analogous expressions for the case of Indet; however, there is one problem. Formulating the worst-case cost for each stage is too pessimistic. For example, it may be possible to obtain high costs in two consecutive stages, but each of these may correspond to following different paths in X. There is nothing to constrain the worst-case analysis to the same path. In the probabilistic case there is no problem because probabilities can be assigned to paths. For the nondeterministic case, a cost functional can be defined, but the stage-additive property needed for dynamic programming is destroyed in general. Under some restrictions on allowable costs, the stage-additive property is preserved.

The state xk at stage k is known to lie in Xk(ηk), as derived in Section 11.2.2. For every history I-state, ηk = -xk, and uk ∈ U, assume that l(xk, uk) is invariant over all xk ∈ Xk(ηk). In this case,

    \vec{l}(\vec{x}_k, \vec{u}_k) = \vec{l}(\eta_k, u_k) = l(x_k, u_k),        (12.5)

in which xk ∈ Xk(ηk), and

    \vec{l}_F(\vec{x}_F) = \vec{l}_F(\eta_F) = l_F(x_F),        (12.6)

in which xF ∈ XF(ηF).
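A small sketch of the invariance requirement behind (12.5), with X_k a set of states and l the original stage cost (both hypothetical placeholders):

    def ndet_stage_cost(X_k, u, l):
        """Stage cost l_bar(eta_k, u_k) under the invariance assumption of (12.5)."""
        costs = {l(x, u) for x in X_k}
        if len(costs) != 1:
            raise ValueError("l(x, u) must be invariant over X_k(eta_k) for (12.5)")
        return costs.pop()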

A plan on the derived I-space, Indet or Iprob, can now also be considered as a plan on the new state space X-. Thus, state feedback is now possible, but in a larger state space X- instead of X. The outcomes of actions are still generally unpredictable due to the observations. An interesting special case occurs when there are no observations. In this case, the I-state is predictable because it is derived only from actions that are chosen by the robot. The new formulation then does not need nature actions, which reduces it down to Formulation 2.3. Due to this, feedback is no longer needed if the initial I-state is given. A plan can be expressed once again as a sequence of actions. Even though the original states are not predictable, the future information states are! This means that the state trajectory in the new state space is completely predictable as well.
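Because the sensorless I-state depends only on the actions chosen by the robot, the entire I-state trajectory of a candidate action sequence can be computed before execution. A minimal sketch, with state_transition as a hypothetical placeholder:

    def simulate_I_states(X0, actions, state_transition):
        """Open-loop I-state trajectory for the sensorless nondeterministic case."""
        X = frozenset(X0)
        trajectory = [X]
        for u in actions:
            # Pure forward projection; no observation is available to prune the set.
            X = frozenset(x2 for x in X for x2 in state_transition(x, u))
            trajectory.append(X)
        return trajectory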

12.1.2 Algorithms for Nondeterministic I-Spaces

Now that the problem of planning in Indet has been expressed using Formulation 10.1, the methods of Section 10.2 directly apply. The main limitation of their use is that the new state space X- is exponentially larger than X. If X contains n states, then X- contains 2^n − 1 states. Thus, even though some methods in Section 10.2 can solve problems in practice that involve a million states, this would correspond to only about 20 states in the original state space. Handling substantially larger problems requires developing application-specific methods that exploit some special structure of the I-space, possibly by defining an I-map that leads to a smaller derived I-space.

Value iteration The value-iteration method from Section 10.2.1 can be applied without modification. In the first step, initialize G∗F using (12.6). Using the notation for the new problem, the dynamic programming recurrence, (10.39), becomes

    G^*_k(\vec{x}_k) = \min_{\vec{u}_k \in \vec{U}} \Big\{ \max_{\vec{\theta}_k \in \vec{\Theta}(\vec{x}_k, \vec{u}_k)} \Big\{ \vec{l}(\vec{x}_k, \vec{u}_k) + G^*_{k+1}(\vec{x}_{k+1}) \Big\} \Big\},        (12.7)

in which -xk+1 = f-(-xk, -uk, θ-k).

The main difficulty in evaluating (12.7) is to determine the set Θ-(-xk, -uk), over which the maximization occurs. Suppose that a state-nature sensor mapping is used, as defined in Section 11.1.1. From the I-state -xk = Xk(ηk), the action -uk = uk is applied. This yields a forward projection Xk+1(ηk, uk). The set of all possible observations is

    \vec{\Theta}(\vec{x}_k, \vec{u}_k) = \{ y_{k+1} \in Y \mid \exists x_{k+1} \in X_{k+1}(\eta_k, u_k) \text{ and } \exists \psi_{k+1} \in \Psi \text{ such that } y_{k+1} = h(x_{k+1}, \psi_{k+1}) \}.        (12.8)

Without using forward projections, a longer, equivalent expression is obtained:

    \vec{\Theta}(\vec{x}_k, \vec{u}_k) = \{ y_{k+1} \in Y \mid \exists x_k \in X_k(\eta_k), \exists \theta_k \in \Theta, \text{ and } \exists \psi_{k+1} \in \Psi \text{ such that } y_{k+1} = h(f(x_k, u_k, \theta_k), \psi_{k+1}) \}.        (12.9)

Other variants can be formulated for different sensing models.
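The sketch below illustrates one backward pass of (12.7) over an explicitly enumerated nondeterministic I-space. It assumes that the I-states, actions, stage cost l_bar, the set of possible observations from (12.8), and the derived transition (for example, the ndet_transition sketch given earlier) are supplied as hypothetical arguments.

    import math

    def value_iteration_step(I_states, U, G_next, l_bar, possible_observations, next_I_state):
        """One backward pass of the worst-case recurrence (12.7).

        I_states              -- iterable of derived I-states (e.g., frozensets of states)
        U                     -- iterable of actions
        G_next                -- dict giving G*_{k+1} for every reachable I-state
        l_bar                 -- stage cost l_bar(I_state, u), as in (12.5)
        possible_observations -- observations possible from (I_state, u), as in (12.8)
        next_I_state          -- derived transition f_bar(I_state, u, y)
        """
        G_k = {}
        for xb in I_states:
            best = math.inf
            for u in U:
                # Worst case over the observations that nature may choose.
                worst = max(
                    (l_bar(xb, u) + G_next[next_I_state(xb, u, y)]
                     for y in possible_observations(xb, u)),
                    default=math.inf,
                )
                best = min(best, worst)
            G_k[xb] = best
        return G_k

Repeating this step from the final stage back to the first yields worst-case optimal cost-to-go values over the enumerated I-space.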

Policy iteration The policy iteration method of Section 10.2.2 can be applied in principle, but it is unlikely to solve challenging problems. For example, if |X| = 10, then each iteration will require solving matrices that have about 1 million entries! At least they are likely to be sparse in many applications.

Graph-search methods The methods from Section 10.2.3, which are based on backprojections, can also be applied to this formulation. These methods must initially set S = X-G. If S is initially nonempty, then backprojections can be attempted using the general algorithm in Figure 10.6. Dijkstra's algorithm, as given in Figure 10.8, can be applied to yield a plan that is worst-case optimal.

The sensorless case If there are no sensors, then better methods can be applied because the formulation reduces from Formulation 10.1 to Formulation 2.3. The simpler value iterations of Section 2.3 or Dijkstra's algorithm can be applied to find a solution. If optimality is not required, then any of the search methods of Section 2.2 can even be applied. For example, one can imagine performing a bidirectional search on X- to attempt to connect -xI to some -xG.
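As a minimal illustration of the sensorless nondeterministic case, the sketch below performs a breadth-first search over I-states (sets of states) for an action sequence that drives the initial set into the goal region; state_transition, X0, XG, and U are hypothetical placeholders.

    from collections import deque

    def sensorless_search(X0, XG, U, state_transition):
        """Breadth-first search over sensorless I-states (frozensets of states).

        Returns a list of actions that guarantees arrival in XG, or None.
        """
        start, goal = frozenset(X0), frozenset(XG)
        queue = deque([(start, [])])
        visited = {start}
        while queue:
            X, plan = queue.popleft()
            if X <= goal:                     # every possible state lies in XG
                return plan
            for u in U:
                nxt = frozenset(x2 for x in X for x2 in state_transition(x, u))
                if nxt and nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, plan + [u]))
        return None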

12.1.3 Algorithms for Probabilistic I-Spaces (POMDPs)

For the probabilistic case, the methods of Section 10.2 cannot be applied because Iprob is a continuous space. Dynamic programming methods for continuous state spaces, as covered in Section 10.6, are needed. The main difficulty is that the dimension of X- grows linearly with the number of states in X. If there are n states in X, the dimension of X- is n − 1. Since the methods of Section 10.6 suffer from the curse of dimensionality, the general dynamic programming techniques are limited to problems in which X has only a few states.

Approximate value iteration The continuous-space methods from Section 10.6 can be directly applied to produce an approximate solution by interpolating over X- to determine cost-to-go values. The initial cost-to-go value G∗F over the collection of samples is obtained by (12.4). Following (10.46), the dynamic programming recurrence is

    G^*_k(\vec{x}_k) = \min_{\vec{u}_k \in \vec{U}} \Big\{ \vec{l}(\vec{x}_k, \vec{u}_k) + \sum_{\vec{x}_{k+1} \in \vec{X}} G^*_{k+1}(\vec{x}_{k+1}) P(\vec{x}_{k+1} \mid \vec{x}_k, \vec{u}_k) \Big\}.        (12.10)

If Θ-(-x, -u) is finite, the probability mass is distributed over a finite set of points, y = θ- ∈ Θ-(-x, -u). This in turn implies that P(-xk+1 | -xk, -uk) is also distributed over a finite subset of X-. This is somewhat unusual because X- is a continuous space, which ordinarily requires the specification of a probability density function. Since the set of future states is finite, this enables a sum to be used in (12.10) as opposed to an integral over a probability density function. This technically yields a probability density over X-, but this density must be expressed using Dirac functions.1 An approximation is still needed, however, because the -xk+1 points may not be exactly the sample points on which the cost-to-go function G∗k+1 is represented.
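A rough sketch of this approximate value iteration is given below, assuming finite X, U, and Y, with the transition probabilities T, observation model O, sampled beliefs, and stage costs l all supplied as hypothetical arrays; nearest-sample lookup stands in for the interpolation schemes of Section 10.6.

    import numpy as np

    def belief_update(b, u, y, T, O):
        """Bayes filter: belief after action u and observation y, plus P(y | b, u).

        b -- belief vector over the n states
        T -- T[u][x, x2] = P(x2 | x, u)
        O -- O[x2, y]    = P(y | x2)
        """
        pred = T[u].T @ b            # prediction step
        post = pred * O[:, y]        # measurement update
        total = post.sum()
        return (post / total if total > 0 else None), total

    def approximate_backup(samples, G_next, U, Y, T, O, l):
        """One backup of (12.10) over sampled beliefs, with crude interpolation.

        samples -- array whose rows are the sampled beliefs
        G_next  -- cost-to-go values G*_{k+1} at those samples
        l       -- l[u] is the vector of stage costs l(x, u) over all states
        """
        def interp(b):
            # Nearest sampled belief stands in for proper interpolation.
            return G_next[np.argmin(np.linalg.norm(samples - b, axis=1))]

        G_k = np.empty(len(samples))
        for i, b in enumerate(samples):
            costs = []
            for u in U:
                value = float(b @ l[u])          # expected stage cost, as in (12.3)
                for y in Y:
                    b_next, p_y = belief_update(b, u, y, T, O)
                    if b_next is not None:
                        value += p_y * interp(b_next)
                costs.append(value)
            G_k[i] = min(costs)
        return G_k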

Exact methods If the total number of stages is small, it is possible in practice to compute exact representations. Some methods are based on an observation that the cost-to-go is piecewise linear and convex [494]. A linear-programming problem results, which can be solved using the techniques that were described for finding randomized saddle points of zero-sum games in Section 9.3. Due to the numerous constraints, methods have been proposed that dramatically reduce the number that need to be considered in some circumstances (see the suggested reading on POMDPs at the end of the chapter).

An exact, discrete representation can be computed as follows. Suppose that the initial condition space I0 consists of one initial condition, η0 (or a finite number of initial conditions), and that there are no more than K stages at which decisions are made. Since Θ(x, u) and Ψ(x) are assumed to be finite, there is a finite number of possible final I-states, ηF = (η0, ũK, ỹF). For each of these, the distribution P(xF | ηF) can be computed, which is alternatively represented as -xF. Following this, (12.4) is used to compute G∗(-xF) for each possible -xF. The number of these states is unfortunately exponential in the total number of stages, but at least there are finitely many. The dynamic programming recurrence (12.10) can be applied for k = K to roll back one stage. It is known that each possible -xk+1 will be a point in X- at which a value was computed because values were computed for all possible I-states. Therefore, interpolation is not necessary. Equation (12.10) can be applied repeatedly until the first stage is reached. In each iteration, no interpolation is needed because the cost-to-go G∗k+1 was computed for each possible next I-state. Given the enormous size of this I-space, the method is practical only for very small problems.
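A sketch of the enumeration idea, reusing the hypothetical belief_update from the previous sketch: starting from a single initial belief, every belief reachable within K stages is generated, so the backups of (12.10) only ever need values at beliefs that were explicitly enumerated and no interpolation is required.

    def enumerate_reachable_beliefs(b0, K, U, Y, T, O):
        """Enumerate all beliefs reachable from b0 in at most K stages.

        Returns a list of layers; layer k holds the beliefs reachable after k
        actions. The number of beliefs grows exponentially with K.
        """
        layers = [[b0]]
        for _ in range(K):
            layer = []
            for b in layers[-1]:
                for u in U:
                    for y in Y:
                        b_next, p_y = belief_update(b, u, y, T, O)
                        if b_next is not None and p_y > 0:
                            layer.append(b_next)
            layers.append(layer)
        return layers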

The sensorless case In the case of having no observations, the path through Iprob becomes predictable. Suppose that a feasible planning problem is formulated; for example, there may be complicated constraints on the probability distributions over X that are permitted during the execution of the plan. Since X- = Iprob is a continuous space, it is tempting to apply motion planning techniques from Chapter 5 to find a successful path. The adaptation of such techniques may be possible, but they must be formulated to use actions and state transition functions, which was not done in Chapter 5. Such adaptations of these methods, however, will be covered in Chapter 14. They could be applied to this problem to search the I-space and produce a sequence of actions that traverses it while satisfying hard constraints on the probabilities.

1 These are single points that are assigned a nonzero probability mass, which is not allowed, for example, in the construction of a continuous probability density function.
