Discrete State Spaces
- Sensors
As the name suggests, sensors are designed to sense the state. Throughout all of this section it is assumed that the state space, X, is finite or countably infinite, as in Formulations 2.1 and 2.3. A sensor is defined in terms of two components:
- an observation space, which is the set of possible readings for the sensor, and
- a sensor mapping, which characterizes the readings that can be expected if the current state or other information is given. Be aware that in the planning model, the state is not really given; it is only assumed to be given when modeling a sensor. The sensing model given here generalizes the one given in Section 9.2.3. In that case, the sensor provided information regarding θ instead of x because state spaces were not needed in Chapter 9.
Let Y denote an observation space, which is a finite or countably infinite set. Let h denote the sensor mapping. Three different kinds of sensor mappings will be considered, each of which is more complicated and general than the previous one:
- State sensor mapping: In this case, h : X → Y , which means that given the state, the observation is completely determined.
- State-nature sensor mapping: In this case, a finite set, Ψ(x), of nature sensing actions is defined for each x ∈ X. Each nature sensing action, ψ ∈ Ψ(x), interferes with the sensor observation. Therefore, the state-nature mapping, h, produces an observation, y = h(x, ψ) ∈ Y , for every x ∈ X and ψ ∈ Ψ(x). The particular ψ chosen by nature is assumed to be unknown during planning and execution. However, it is specified as part of the sensing model.
- History-based sensor mapping: In this case, the observation could be based on the current state or any previous states. Furthermore, a nature sensing action could be applied. Suppose that the current stage is k. The set of nature sensing actions is denoted by Ψk(x), and the particular nature
sensing action is ψk ∈ Ψk(x). This yields a very general sensor mapping,
yk = hk(x1, . . . , xk, ψk), (11.1)
in which yk is the observation obtained in stage k. Note that the mapping is denoted as hk because the domain is different for each k. In general, any of the sensor mappings may be stage-dependent, if desired.
Many examples of sensors will now be given. These are provided to illustrate the definitions and to provide building blocks that will be used in later examples of I-spaces. Examples 11.1 to 11.6 all involve state sensor mappings.
Example 11.1 (Odd/Even Sensor) Let X = Z, the set of integers, and let
Y = {0, 1}. The sensor mapping is
y = h(x) = { 0  if x is even,
             1  if x is odd.        (11.2)
The limitation of this sensor is that it only tells whether x ∈ X is odd or even. When combined with other information, this might be enough to infer the state, but in general it provides incomplete information.
Example 11.2 (Mod Sensor) Example 11.1 can be easily generalized to yield the remainder when x is divided by k for some fixed integer k. Let X = Z, and
let Y = {0, 1, . . . , k − 1}. The sensor mapping is
y = h(x) = x mod k. (11.3)
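The two sensors above can be written as one-line functions. The following Python sketch (function names are my own, not from the text) relies on Python's % operator returning a value in {0, . . . , k − 1} even for negative x, matching the mathematical mod:

```python
def h_odd_even(x: int) -> int:
    """State sensor mapping of Example 11.1: X = Z, Y = {0, 1}."""
    return x % 2  # 0 if x is even, 1 if x is odd


def h_mod(x: int, k: int) -> int:
    """Mod sensor of Example 11.2: X = Z, Y = {0, 1, ..., k-1}."""
    return x % k
```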
Example 11.3 (Sign Sensor) Let X = Z, and let Y = {−1, 0, 1}. The sensor mapping is

y = h(x) = sgn x.        (11.4)

This sensor provides very limited information because it only indicates on which side of the boundary x = 0 the state may lie. It can, however, precisely determine whether x = 0.
Example 11.4 (Selective Sensor) Let X = Z × Z, and let (i, j) ∈ X denote a state in which i, j ∈ Z. Suppose that only the first component of (i, j) can be observed. This yields the sensor mapping

y = h(i, j) = i.        (11.5)

An obvious generalization can be made for any state space that is formed from Cartesian products. The sensor may reveal the values of one or more components, and the rest remain hidden.
Example 11.5 (Bijective Sensor) Let X be any state space, and let Y = X. Let the sensor mapping be any bijective function h : X → Y . This sensor provides information that is equivalent to knowing the state. Since h is bijective, it can be inverted to obtain h−1 : Y → X. For any y ∈ Y , the state can be determined as x = h−1(y).

A special case of the bijective sensor is the identity sensor, for which h is the identity function. This was essentially assumed to exist for all planning problems covered before this chapter because it immediately yields the state. However, any bijective sensor could serve the same purpose.
Example 11.6 (Null Sensor) Let X be any state space, and let Y = {0}. The null sensor is obtained by defining the sensor mapping as h(x) = 0. The sensor reading remains fixed and hence provides no information regarding the state.
From the examples so far, it is tempting to think about partitioning X based on sensor observations. Suppose that in general a state sensor mapping, h, is not bijective, and let H(y) denote the following subset of X:
H(y) = {x ∈ X | y = h(x)}, (11.6)
which is the preimage of y. The set of preimages, one for each y ∈ Y , forms a partition of X. In some sense, this indicates the "resolution" of the sensor. A bijective sensor partitions X into singleton sets because it contains perfect information. At the other extreme, the null sensor partitions X into a single set, X itself. The sign sensor appears slightly more useful because it partitions X into three sets: H(1) = {1, 2, . . .}, H(−1) = {. . . , −2, −1}, and H(0) = {0}. The preimages of the selective sensor are particularly interesting. For each i ∈ Z, H(i) = {i} × Z. The partitions induced by the preimages may remind those with an algebra background of the construction of quotient groups via homomorphisms [769].
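On any finite subset of X, the preimage partition in (11.6) can be computed by grouping states according to their observations. A minimal Python sketch (helper names are hypothetical), using the sign sensor of Example 11.3 on a window of Z:

```python
from collections import defaultdict


def preimages(states, h):
    """Group states by observation: H(y) = {x in X | h(x) = y}, as in (11.6)."""
    H = defaultdict(set)
    for x in states:
        H[h(x)].add(x)
    return dict(H)


# Sign sensor (Example 11.3) restricted to a finite window of Z:
sgn = lambda v: (v > 0) - (v < 0)
H = preimages(range(-3, 4), sgn)
# H[0] == {0}; H[1] and H[-1] collect the positive and negative states.
```

Since every state maps to exactly one observation, the resulting blocks are disjoint and cover the window, i.e., they form a partition.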
Next consider some examples that involve a state-nature sensor mapping. There are two different possibilities regarding the model for the nature sensing action:
- Nondeterministic: In this case, there is no additional information regarding which ψ ∈ Ψ(x) will be chosen.
- Probabilistic: A probability distribution is known. In this case, the probability, P(ψ|x), that ψ will be chosen is known for each ψ ∈ Ψ(x).
These two possibilities also appeared in Section 10.1.1, for nature actions that interfere with the state transition equation.
It is sometimes useful to consider the state-nature sensor model as a probability distribution over Y for a given state. Recall the conversion from P(ψ|θ) to P(y|θ) in (9.28). By replacing Θ by X, the same idea can be applied here. Assume that if the domain of h is restricted to some x ∈ X, it forms an injective (one-to-one) mapping from Ψ to Y . In this case,

P(y|x) = { P(ψ|x)  for the unique ψ such that y = h(x, ψ)
           0       if no such ψ exists.        (11.7)
If the injective assumption is lifted, then P(ψ|x) is replaced by a sum over all ψ for which y = h(x, ψ).
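This conversion can be sketched in Python; summing the probabilities, as just described, covers the non-injective case automatically. The helper names below are hypothetical, and the example uses a sign sensor disturbed by ψ ∈ {−1, 0, 1} with a uniform nature distribution:

```python
def observation_distribution(x, Psi, P_psi, h):
    """Convert P(psi|x) into P(y|x) per (11.7); accumulating into a dict
    sums over all psi with y = h(x, psi), handling the non-injective case."""
    P_y = {}
    for psi in Psi(x):
        y = h(x, psi)
        P_y[y] = P_y.get(y, 0.0) + P_psi(psi, x)
    return P_y  # any y absent from the dict has P(y|x) = 0


# Sign sensor disturbed by psi in {-1, 0, 1}, uniform nature, at x = 0:
sgn = lambda v: (v > 0) - (v < 0)
P = observation_distribution(0, lambda x: (-1, 0, 1),
                             lambda psi, x: 1.0 / 3.0,
                             lambda x, psi: sgn(x + psi))
# Each of y = -1, 0, 1 receives probability 1/3.
```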
Example 11.7 (Sensor Disturbance) Let X = Z, Y = Z, and Ψ = {−1, 0, 1}. The idea is to construct a sensor that would be the identity sensor if it were not for the interference of nature. The sensor mapping is

y = h(x, ψ) = x + ψ.        (11.8)

It is always known that |x − y| ≤ 1. Therefore, if y is received as a sensor reading, one of the following must be true: x = y − 1, x = y, or x = y + 1.
Example 11.8 (Disturbed Sign Sensor) Let X = Z, Y = {−1, 0, 1}, and Ψ = {−1, 0, 1}. Let the sensor mapping be

y = h(x, ψ) = sgn(x + ψ).        (11.9)

In this case, if y = 0, it is no longer known for certain whether x = 0. It is possible that x = 1 or x = −1. If x = 0, then it is possible for the sensor to read −1, 0, or 1.
Example 11.9 (Disturbed Odd/Even Sensor) It is not hard to construct examples for which some mild interference from nature destroys all of the information. Let X = Z, Y = {0, 1}, and Ψ = {0, 1}. Let the sensor mapping be

y = h(x, ψ) = { 0  if x + ψ is even,
                1  if x + ψ is odd.        (11.10)

Under the nondeterministic model for the nature sensing action, the sensor provides no useful information regarding the state. Regardless of the observation, it is never known whether x is even or odd. Under a probabilistic model, however, this sensor may provide some useful information.
It is once again informative to consider preimages. For a state-nature sensor mapping, the preimage is

H(y) = {x ∈ X | ∃ψ ∈ Ψ(x) for which y = h(x, ψ)}.        (11.11)

In comparison to state sensor mappings, the preimage sets are larger for state-nature sensor mappings. Also, they do not generally form a partition of X. For example, the preimages of Example 11.8 are H(1) = {0, 1, . . .}, H(0) = {−1, 0, 1}, and H(−1) = {. . . , −2, −1, 0}. This is not a partition because every preimage contains 0. If desired, H(y) can be directly defined for each y ∈ Y , instead of explicitly defining nature sensing actions.
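The overlap can be verified computationally on a finite window of Z. The following sketch (names are mine) computes the preimages in (11.11) for the disturbed sign sensor of Example 11.8:

```python
def nature_preimage(y, states, Psi, h):
    """H(y) = {x | exists psi in Psi(x) with y = h(x, psi)}, as in (11.11)."""
    return {x for x in states if any(h(x, psi) == y for psi in Psi(x))}


# Disturbed sign sensor of Example 11.8 on a finite window of Z:
sgn = lambda v: (v > 0) - (v < 0)
h = lambda x, psi: sgn(x + psi)
Psi = lambda x: (-1, 0, 1)
window = range(-3, 4)

H1 = nature_preimage(1, window, Psi, h)    # {0, 1, 2, 3}
H0 = nature_preimage(0, window, Psi, h)    # {-1, 0, 1}
Hm1 = nature_preimage(-1, window, Psi, h)  # {-3, -2, -1, 0}
# The state 0 lies in all three preimages, so they do not form a partition.
```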
Finally, one example of a history-based sensor mapping is given.
Example 11.10 (Delayed-Observation Sensor) Let X = Y = Z. A delayed-observation sensor can be defined for some fixed positive integer i as yk = xk−i. It indicates what the state was i stages ago. In this case, it gives a perfect measurement of the old state value. Many other variants are possible. For example, it might only give the sign of the state from i stages ago.
Figure 11.2: In each stage, k, an observation, yk ∈ Y , is received and an action
uk ∈ U is applied. The state, xk, however, is hidden from the decision maker.
- Defining the History Information Space
This section defines the most basic and natural I-space. Many others will be derived from it, which is the topic of Section 11.2. Suppose that X, U, and f have been defined as in Formulation 10.1, and the notion of stages has been defined as in Formulation 2.2. This yields a state sequence x1, x2, . . ., and an action sequence u1, u2, . . ., during the execution of a plan. However, in the current setting, the state sequence is not known. Instead, at every stage, an observation, yk, is obtained. The process is depicted in Figure 11.2.
In previous formulations, the action space, U(x), was generally allowed to depend on x. Since x is unknown in the current setting, it would seem strange to allow the actions to depend on x. This would mean that inferences could be made regarding the state by simply noticing which actions are available.1 Instead, it will be assumed by default that U is fixed for all x ∈ X. In some special contexts, however, U(x) may be allowed to vary.
Initial conditions As stated at the beginning of the chapter, the initial conditions provide one of the three general sources of information regarding the state. Therefore, three alternative types of initial conditions will be allowed:
- Known State: The initial state, x1 ∈ X, is given. This initializes the problem with perfect state information. Assuming nature actions interfere with the state transition function, f, uncertainty in the current state will generally develop.
- Nondeterministic: A set of states, X1 ⊂ X, is given. In this case, the initial state is only known to lie within a particular subset of X. This can be considered as a generalization of the first type, which only allowed singleton subsets.
- Probabilistic: A probability distribution, P(x1), over X is given.
In general, let η0 denote the initial condition, which may be any one of the three alternative types.
1Such a problem could be quite interesting to study, but it will not be considered here.
History Suppose that the kth stage has passed. What information is available? It is assumed that at every stage, a sensor observation is made. This results in a sensing history,
y˜k = (y1, y2, . . . , yk). (11.12)
At every stage an action can also be applied, which yields an action history,
u˜k−1 = (u1, u2, . . . , uk−1). (11.13)
Note that the action history only runs to uk−1; if uk is applied, the state xk+1 and stage k + 1 are obtained, which lie beyond the current stage, k. By combining the sensing and action histories, the history at stage k is (u˜k−1, y˜k).
History information states The history, (u˜k−1, y˜k), in combination with the initial condition, η0, yields the history I-state, which is denoted by ηk. This cor- responds to all information that is known up to stage k. In spite of the fact that the states, x1, . . ., xk, might not be known, the history I-states are always known because they are defined directly in terms of available information. Thus, the history I-state is
ηk = (η0, u˜k−1, y˜k). (11.14)
When representing I-spaces, we will generally ignore the problem of nesting parentheses. For example, (11.14) is treated as a single sequence, instead of a sequence that contains two sequences. This distinction is insignificant for the purposes of decision making.
The history I-state, ηk, can also be expressed as
ηk = (ηk−1, uk−1, yk), (11.15)
by noticing that the history I-state at stage k contains all of the information from the history I-state at stage k − 1. The only new information is the most recently
applied action, uk−1, and the current sensor observation, yk.
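The recursion (11.15) amounts to appending the newest action and observation onto the previous histories. A minimal Python sketch, storing the history I-state as a flat triple per (11.14) (the representation is an illustrative choice, not prescribed by the text):

```python
def update_history_istate(eta_prev, u_prev, y):
    """eta_k = (eta_{k-1}, u_{k-1}, y_k) per (11.15), kept flat as in (11.14)."""
    eta0, u_hist, y_hist = eta_prev
    return (eta0, u_hist + (u_prev,), y_hist + (y,))


eta0 = frozenset({0, 1, 2})    # a nondeterministic initial condition X1
eta1 = (eta0, (), ("y1",))     # stage 1: no action applied yet, one observation
eta2 = update_history_istate(eta1, "u1", "y2")
# eta2 == (eta0, ("u1",), ("y1", "y2"))
```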
The history information space The history I-space is simply the set of all possible history I-states. Although the history I-states appear to be quite complicated, it is helpful to think of them abstractly as points in a new space. To define the set of all possible history I-states, the sets of all initial conditions, actions, and observations must be precisely defined.
The set of all observation histories is denoted as Y˜k and is obtained by a Cartesian product of k copies of the observation space:

Y˜k = Y × Y × · · · × Y    (k copies).        (11.16)
Similarly, the set of all action histories is U˜k−1, the Cartesian product of k − 1 copies of the action space U.
It is slightly more complicated to define the set of all possible initial conditions because three different types of initial conditions are possible. Let I0 denote the initial condition space. Depending on which of the three types of initial conditions are used, one of the following three definitions of I0 is used:
- Known State: If the initial state, x1, is given, then I0 ⊆ X. Typically, I0 = X; however, it might be known in some instances that certain initial states are impossible. Therefore, it is generally written that I0 ⊆ X.
- Nondeterministic: If X1 is given, then I0 ⊆ pow(X) (the power set of X). Again, a typical situation is I0 = pow(X); however, it might be known that certain subsets of X are impossible as initial conditions.
- Probabilistic: Finally, if P(x1) is given, then I0 ⊆ P(X), in which P(X) is the set of all probability distributions over X.
The history I-space at stage k is expressed as
Ik = I0 × U˜k−1 × Y˜k. (11.17)
Each ηk ∈ Ik yields an initial condition, an action history, and an observation history. It will be convenient to consider I-spaces that do not depend on k. This will be defined by taking a union (be careful not to mistakenly think of this construction as a Cartesian product). If there are K stages, then the history I-space is
Ihist = I0 ∪ I1 ∪ I2 ∪ · · · ∪ IK . (11.18)
Most often, the number of stages is not fixed. In this case, Ihist is defined to be the union of Ik over all k ∈ {0} ∪ N:
Ihist = I0 ∪ I1 ∪ I2 ∪ · · · . (11.19)
This construction is related to the state space obtained for time-varying motion planning in Section 7.1. The history I-space is stage-dependent because information accumulates over time. In the discrete model, the reference to time is only implicit through the use of stages. Therefore, stage-dependent I-spaces are defined. Taking the union of all of these is similar to the state space that was formed in Section 7.1 by making time be one axis of the state space. For the history I-space, Ihist, the stage index k can be imagined as an "axis."

One immediate concern regarding the history I-space Ihist is that its I-states may be arbitrarily long because the history grows linearly with the number of stages. For now, it is helpful to imagine Ihist abstractly as another kind of state space, without paying close attention to how complicated each η ∈ Ihist may be to represent. In many contexts, there are ways to simplify the I-space. This is the topic of Section 11.2.
- Defining a Planning Problem
Planning problems will be defined directly on the history I-space, which makes it appear as an ordinary state space in many ways. Keep in mind, however, that it was derived from another state space for which perfect state observations could not be obtained. In Section 10.1, a feedback plan was defined as a function of the state. Here, a feedback plan is instead a function of the I-state. Decisions cannot be based on the state because it will be generally unknown during the execution of the plan. However, the I-state is always known; thus, it is logical to base decisions on it.
Let πK denote a K-step information-feedback plan, which is a sequence (π1, π2, . . ., πK) of K functions, πk : Ik → U. Thus, at every stage k, the I-state ηk ∈ Ik is used as a basis for choosing the action uk = πk(ηk). Due to interference of nature through both the state transition equation and the sensor mapping, the action sequence (u1, . . . , uK) produced by a plan, πK, will not be known until the plan terminates.
As in Formulation 2.3, it will be convenient to assume that U contains a termination action, uT . If uT is applied at stage k, then it is repeatedly applied forever. It is assumed once again that the state xk remains fixed after the termination condition is applied. Remember, however, xk is still unknown in general; it becomes fixed but unknown. Technically, based on the definition of the history I-space, the I-state must change after uT is applied because the history grows. These changes can be ignored, however, because no new decisions are made after uT is applied. A plan that uses a termination condition can be specified as π = (π1, π2, . . .) because the number of stages may vary each time the plan is executed. Using the history I-space definition in (11.19), an information-feedback plan is expressed as
π : Ihist → U. (11.20)
We are almost ready to define the planning problem. This will require the specification of a cost functional. The cost depends on the histories of states x˜ and actions u˜ as in Section 10.1. The planning formulation involves the following components, summarizing most of the concepts introduced so far in Section 11.1 (see Formulation 10.1 for similarities):
Formulation 11.1 (Discrete Information Space Planning)
- A nonempty state space X that is either finite or countably infinite.
- A nonempty, finite action space U. It is assumed that U contains a special
termination action, which has the same effect as defined in Formulation 2.3.
- A finite nature action space Θ(x, u) for each x ∈ X and u ∈ U.
- A state transition function f that produces a state, f(x, u, θ), for every
x ∈ X, u ∈ U, and θ ∈ Θ(x, u).
- A finite or countably infinite observation space Y .
- A finite nature sensing action space Ψ(x) for each x ∈ X.
- A sensor mapping h which produces an observation, y = h(x, ψ), for each x ∈ X and ψ ∈ Ψ(x). This definition assumes a state-nature sensor mapping. A state sensor mapping or history-based sensor mapping, as defined in Section 11.1.1, could alternatively be used.
- A set of stages, each denoted by k, which begins at k = 1 and continues indefinitely.
- An initial condition η0, which is an element of an initial condition space, I0.
- A history I-space Ihist which is the union of I0 and Ik = I0 × U˜k−1 × Y˜k for every stage k ∈ N.
- Let L denote a stage-additive cost functional, which may be applied to any pair (x˜K+1, u˜K ) of state and action histories to yield
L(x˜K+1, u˜K) = ∑_{k=1}^{K} l(xk, uk) + lF (xK+1).        (11.21)
If the termination action uT is applied at some stage k, then for all i ≥ k, ui = uT , xi = xk, and l(xi, uT ) = 0. Either a feasible or optimal planning problem can be defined, as in Formulation 10.1; however, the plan here is specified as π : Ihist → U.
A goal set may be defined as XG ⊂ X. Alternatively, the goal could be expressed as a desirable set of history I-states. After Section 11.2, it will be seen that the goal can be expressed in terms of I-states that are derived from histories.
Some immediate extensions of Formulation 11.1 are possible, but we avoid them here to simplify notation in the coming concepts. One extension is to allow different action sets, U(x), for each x ∈ X. Be careful, however, because information regarding the current state can be inferred if the action set U(x) is given, and it varies depending on x. Another extension is to allow the costs to depend on nature, to obtain l(xk, uk, θk), instead of l(xk, uk) in (11.21).
The cost of a plan The next task is to extend the definition of the cost-to-go under a fixed plan, which was given in Section 10.1.3, to the case of imperfect state information. Consider evaluating the quality of a plan, so that the “best” one might be selected. Suppose that the nondeterministic uncertainty is used to model nature and that a nondeterministic initial condition is given. If a plan π is fixed, some state and action trajectories are possible, and others are not.
It is impossible to know in general what histories will occur; however, the plan constrains the choices substantially. Let H(π, η0) denote the set of state-action
histories that could arise from π applied to the initial condition η0.
The cost of a plan π from an initial condition η0 is measured using worst-case analysis as
Gπ(η0) = max_{(x˜,u˜) ∈ H(π,η0)} { L(x˜, u˜) }.        (11.22)
Note that x˜ includes x1, which is usually not known. It may be known only to lie in X1, as specified by η0. Let Π denote the set of all possible plans. An optimal plan using worst-case analysis is any plan for which (11.22) is minimized over all π ∈ Π and η0 ∈ I0. In the case of feasible planning, there are usually numerous equivalent alternatives.
Under probabilistic uncertainty, the cost of a plan can be measured using
expected-case analysis as
Gπ(η0) = E_{H(π,η0)} [ L(x˜, u˜) ],        (11.23)

in which E denotes the mathematical expectation of the cost, with the probability distribution taken over H(π, η0). The task is to find a plan π ∈ Π that minimizes (11.23).
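Given an explicit enumeration of the finitely many state-action histories in H(π, η0), both cost measures reduce to a few lines. The following sketch is schematic (enumerating H(π, η0) is problem-specific and assumed given; the toy cost is my own):

```python
def worst_case_cost(histories, L):
    """G_pi(eta0) = max of L(x, u) over (x, u) in H(pi, eta0), as in (11.22)."""
    return max(L(x, u) for (x, u) in histories)


def expected_cost(weighted_histories, L):
    """G_pi(eta0) = E[L(x, u)], distribution over H(pi, eta0), as in (11.23)."""
    return sum(p * L(x, u) for (x, u, p) in weighted_histories)


# Toy cost: the number of actions applied before termination.
L = lambda x, u: len(u)
hs = [((0, 1), ("a",)), ((0, 1, 2), ("a", "b"))]
# worst_case_cost(hs, L) == 2
# expected_cost with uniform weights 0.5 each gives 1.5
```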
The information space is just another state space It will become important throughout this chapter and Chapter 12 to view the I-space as an ordinary state space. It only seems special because it is derived from another state space, but once this is forgotten, it exhibits many properties of an ordinary state space in planning. One nice feature is that the state in this special space is always known. Thus, by converting from an original state space to its I-space, we also convert from having imperfect state information to always knowing the state, albeit in a larger state space.
One important consequence of this interpretation is that the state transition equation can be lifted into the I-space to obtain an information transition function,
fI. Suppose that there are no sensors, and therefore no observations. In this case, future I-states are predictable, which leads to
ηk+1 = fI(ηk, uk). (11.24)
The function fI generates ηk+1 by concatenating uk onto ηk.
Now suppose that there are observations, which are generally unpredictable. In Section 10.1, the nature action θk ∈ Θ(x, u) was used to model the unpredictability.
In terms of the information transition equation, yk+1 serves the same purpose. When the decision is made to apply uk, the observation yk+1 is not yet known (just as θk is unknown in Section 10.1). In a sequential game against nature with perfect state information, xk+1 is directly observed at the next stage. For the information transition equation, yk+1 is instead observed, and ηk+1 can be determined. Using the history I-state representation, (11.14), simply concatenate uk and yk+1 onto the histories in ηk to obtain ηk+1. The information transition equation is expressed as
ηk+1 = fI(ηk, uk, yk+1), (11.25)
Figure 11.3: Many alternative information mappings may be proposed. Each leads to a derived information space.
with the understanding that yk+1 plays the same role as θk in the case of perfect state information and unpredictable future states. Even though nature causes future I-states to be unpredictable, the current I-state is always known. A plan,
π : I → U, now seems like a state-feedback plan, if the I-space is viewed as a state
space. The transitions are all specified by fI.
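One stage of this process can be simulated as follows. The sketch assumes placeholder interfaces for f, h, Θ, and Ψ, and draws nature's hidden choices at random; the decision maker sees only the updated I-state, per (11.25):

```python
import random


def simulate_step(x, eta, u, f, h, Theta, Psi):
    """Advance the hidden state via f, and the known I-state via (11.25)."""
    theta = random.choice(Theta(x, u))   # nature's transition action (hidden)
    x_next = f(x, u, theta)
    psi = random.choice(Psi(x_next))     # nature's sensing action (hidden)
    y_next = h(x_next, psi)
    eta0, u_hist, y_hist = eta
    eta_next = (eta0, u_hist + (u,), y_hist + (y_next,))
    return x_next, eta_next              # only eta_next is visible to the plan
```

With deterministic nature (singleton Θ and Ψ), the I-state trajectory is fully predictable, recovering the sensorless case of (11.24) plus the observation history.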
The costs in this new state space can be derived from the original cost functional, but a maximization or expectation is needed over all possible states given the current information. This will be covered in Section 12.1.