Derived Information Spaces
The history I-space appears to be quite complicated. Every I-state corresponds to a history of actions and observations. Unfortunately, the length of the history I-state grows linearly with the number of stages. To overcome this difficulty, it is common to map history I-states to some simpler space. In many applications, the ability to perform this simplification is critical to finding a practical solution. In some cases, the simplification fully preserves the history I-space, meaning that completeness, and optimality if applicable, is not lost. In other cases, we are willing to tolerate a simplification that destroys much of the structure of the history I-space. This may be necessary to obtain a dramatic reduction in the size of the I-space.
- Information Mappings
Consider a function that maps the history I-space into a space that is simpler to manage. Formally, let κ : Ihist → Ider denote a function from a history I-space, Ihist, to a derived I-space, Ider. The function κ is called an information mapping, or I-map. The derived I-space may be any set; hence, there is great flexibility in defining an I-map.² Figure 11.3 illustrates the idea. The starting place is Ihist, and mappings are made to various derived I-spaces. Some generic mappings, κ1, κ2, and κ3, are shown, along with some very important kinds, κest, κndet, and κprob. The last two are the subjects of Sections 11.2.2 and 11.2.3, respectively. The other important I-map is κest, which uses the history to estimate the state; hence, the derived I-space is X (see Example 11.11). In general, an I-map can even map any derived I-space to another, yielding κ : Ider → I′der, for any I-spaces Ider and I′der. Note that any composition of I-maps yields an I-map. The derived I-spaces I2 and I3 from Figure 11.3 are obtained via compositions.

²Ideally, the mapping should be onto Ider; however, to facilitate some definitions, this will not be required.
Making smaller information-feedback plans The primary use of an I-map is to simplify the description of a plan. In Section 11.1.3, a plan was defined as a function on the history I-space, Ihist. Suppose that an I-map, κ, is introduced that maps from Ihist to Ider. A feedback plan on Ider is defined as π : Ider → U. To execute a plan defined on Ider, the derived I-state is computed at each stage k by applying κ to ηk to obtain κ(ηk) ∈ Ider. The action selected by π is π(κ(ηk)) ∈ U.

To understand the effect of using Ider instead of Ihist as the domain of π, consider the set of possible plans that can be represented over Ider. Let Πhist and Πder be the sets of all plans over Ihist and Ider, respectively. Any π ∈ Πder can be converted into an equivalent plan, π′ ∈ Πhist, as follows: For each η ∈ Ihist, define π′(η) = π(κ(η)).

It is not always possible, however, to construct a plan, π ∈ Πder, from some π′ ∈ Πhist. The problem is that there may exist some η1, η2 ∈ Ihist for which π′(η1) ≠ π′(η2) and κ(η1) = κ(η2). In words, this means that the plan in Πhist requires that two histories cause different actions, but in the derived I-space the histories cannot be distinguished. For a plan in Πder, both histories must yield the same action.
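The conversion π′(η) = π(κ(η)) can be made concrete with a minimal Python sketch. The names `lift` and `kappa_sf`, and the toy histories, are illustrative assumptions, not from the text; the I-map used here simply keeps the last observation, as in the sensor-feedback model discussed later in this section.

```python
# Hypothetical sketch: lifting a plan defined on a derived I-space back to the
# history I-space by composing it with the I-map, pi'(eta) = pi(kappa(eta)).
def lift(pi_der, kappa):
    """Convert a plan in Pi_der into the equivalent plan in Pi_hist."""
    return lambda eta: pi_der(kappa(eta))

# Toy derived plan: act on the last observation only (sensor feedback).
kappa_sf = lambda eta: eta[-1]
pi_der = {"y1": "left", "y2": "right"}.get
pi_hist = lift(pi_der, kappa_sf)

# Histories that kappa cannot distinguish necessarily receive the same action.
assert pi_hist(("u1", "y1")) == pi_hist(("u2", "y1")) == "left"
```

This makes the limitation visible: any two histories with the same image under κ are forced to share an action under the lifted plan.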
An I-map κ has the potential to collapse Ihist down to a smaller I-space by inducing a partition of Ihist. For each ηder ∈ Ider, let the preimage κ−1(ηder) be defined as
κ−1(ηder) = {η ∈ Ihist | ηder = κ(η)}. (11.26)
This yields the set of history I-states that map to ηder. The induced partition can intuitively be considered as the “resolution” at which the history I-space is characterized. If the sets in (11.26) are large, then the I-space is substantially reduced. The goal is to select κ to make the sets in the partition as large as possible; however, one must be careful to avoid collapsing the I-space so much that the problem can no longer be solved.
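For a finite toy problem, the partition induced by (11.26) can be computed directly. The following sketch is illustrative only; the helper `induced_partition` and the toy histories are hypothetical, and the I-map used (keep only the last observation) is the sensor-feedback map discussed later in this section.

```python
# Hypothetical sketch: computing the partition of a finite history I-space
# induced by an I-map kappa, via the preimages in (11.26).
from collections import defaultdict

def induced_partition(histories, kappa):
    """Group history I-states by their image under the I-map kappa."""
    preimages = defaultdict(set)
    for eta in histories:
        preimages[kappa(eta)].add(eta)  # eta lands in kappa^{-1}(kappa(eta))
    return preimages

# Toy histories as (action, observation) tuples; the I-map keeps only the
# last observation.
histories = {("u1", "y1"), ("u2", "y1"), ("u1", "y2")}
kappa_sf = lambda eta: eta[-1]
partition = induced_partition(histories, kappa_sf)
# Two histories collapse to the derived I-state "y1": the "resolution" at
# which the history I-space is characterized has been coarsened.
```

The larger the sets in the partition, the smaller the derived I-space, exactly as described above.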
Example 11.11 (State Estimation) In this example, the I-map is the classical approach that is conveniently taken in numerous applications. Suppose that a technique has been developed that uses the history I-state η ∈ Ihist to compute an estimate of the current state. In this case, the I-map is κest : Ihist → X. The derived I-space happens to be X in this case! This means that a plan is specified as π : X → U, which is just a state-feedback plan.

Consider the partition of Ihist that is induced by κest. For each x ∈ X, the set κ−1est(x), as defined in (11.26), is the set of all histories that lead to the same state estimate. A plan on X can no longer distinguish between various histories that led to the same state estimate. One implication is that the ability to encode the amount of uncertainty in the state estimate has been lost. For example, it might be wise to make the action depend on the covariance in the estimate of x; however, this is not possible because decisions are based only on the estimate itself.
Example 11.12 (Stage Indices) Consider an I-map, κstage, that returns only the current stage index. Thus, κstage(ηk) = k. The derived I-space is the set of stages, which is N. A feedback plan on the derived I-space is specified as π : N → U. This is equivalent to specifying a plan as an action sequence, (u1, u2, . . .), as in Section 2.3.2. Since the feedback is trivial, this is precisely the original case of planning without feedback, which is also referred to as an open-loop plan.
Constructing a derived information transition equation As presented so far, the full history I-state is needed to determine a derived I-state. It may be preferable, however, to discard histories and work entirely in the derived I-space. Without storing the histories on the machine or robot, a derived information transition equation needs to be developed. The important requirement in this case is as follows:
If ηk is replaced by κ(ηk), then κ(ηk+1) must be correctly determined using only κ(ηk), uk, and yk+1.
Whether this requirement can be met depends on the particular I-map. Another way to express the requirement is that if κ(ηk) is given, then the full history ηk does not contain any information that could further constrain κ(ηk+1). The information provided by κ is sufficient for determining the next derived I-state. This is similar to the concept of a sufficient statistic, which arises in decision
theory [89]. If the requirement is met, then κ is called a sufficient I-map. One peculiarity is that the sufficiency is relative to Ider, as opposed to being absolute in some sense. For example, any I-map that maps onto Ider = {0} is sufficient
because κ(ηk+1) is always known (it remains fixed at 0). Thus, the requirement for sufficiency depends strongly on the particular derived I-space.
For a sufficient I-map, a derived information transition equation is determined as

κ(ηk+1) = fIder(κ(ηk), uk, yk+1).   (11.27)
The implication is that Ider is the new I-space in which the problem “lives.” There is no reason for the decision maker to consider histories. This idea is crucial to the success of many planning algorithms. Sections 11.2.2 and 11.2.3 introduce nondeterministic I-spaces and probabilistic I-spaces, which are two of the most important derived I-spaces and are obtained from sufficient I-maps. The I-map κstage from Example 11.12 is also sufficient. The estimation I-map from Example 11.11 is usually not sufficient because some history is needed to provide a better estimate.

Figure 11.4: (a) For an I-map to be sufficient, the same result must be reached in the lower right, regardless of the path taken from the upper left. (b) The problem is that κ−1 images may contain many histories, which eventually map to multiple derived I-states.
The diagram in Figure 11.4a indicates the problem of obtaining a sufficient I-map. The top of the diagram shows the history I-state transitions before the I-map was introduced. The bottom of the diagram shows the attempted derived information transition equation, fIder. The requirement is that the derived I-state obtained in the lower right must be the same regardless of which path is followed from the upper left. Either fI can be applied to η, followed by κ, or κ can be applied to η, followed by some fIder. The problem with the existence of fIder is that κ is usually not invertible. The preimage κ−1(ηder) of some derived I-state ηder ∈ Ider yields a set of histories in Ihist. Applying fI to all of these yields a set of possible next-stage history I-states. Applying κ to these may yield a set of derived I-states because of the ambiguity introduced by κ−1. This chain of mappings is shown in Figure 11.4b. If a singleton is obtained under all circumstances, then this yields the required values of fIder. Otherwise, new uncertainty arises about the current derived I-state. This could be handled by defining an information space over the information space, but this nastiness will be avoided here.

Since I-maps can be defined from any derived I-space to another, the concepts presented in this section do not necessarily require Ihist as the starting point. For example, an I-map, κ : Ider → I′der, may be called sufficient with respect to Ider rather than with respect to Ihist.
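For a finite problem, the commutative-diagram requirement of Figure 11.4a can be checked by brute force: the next derived I-state must be a function of (κ(ηk), uk, yk+1) alone. The sketch below is a hypothetical illustration; `is_sufficient`, `extend`, the toy histories, and both toy I-maps are assumptions chosen here, not taken from the text.

```python
# Hypothetical brute-force test of sufficiency on a finite problem: for every
# (history, action, observation), the next derived I-state must be determined
# by (kappa(eta), u, y) alone, as required for (11.27) to exist.
def is_sufficient(histories, actions, observations, kappa, extend):
    """extend(eta, u, y) appends (u, y) to the history, mirroring (11.15)."""
    table = {}
    for eta in histories:
        for u in actions:
            for y in observations:
                key = (kappa(eta), u, y)
                nxt = kappa(extend(eta, u, y))
                if table.setdefault(key, nxt) != nxt:
                    return False  # two paths through Figure 11.4a disagree
    return True

histories = {("y1",), ("y2",), ("y1", "u", "y2")}  # toy histories
extend = lambda eta, u, y: eta + (u, y)

# The stage-index I-map of Example 11.12 is sufficient: the next stage index
# is the current one plus one, regardless of the rest of the history.
kappa_stage = lambda eta: sum(1 for e in eta if e.startswith("y"))
assert is_sufficient(histories, {"u"}, {"y1", "y2"}, kappa_stage, extend)

# A map that keeps only floor(count of "y1" observations / 2) is NOT
# sufficient: the rounding discards exactly what is needed to update it.
kappa_even = lambda eta: sum(1 for e in eta if e == "y1") // 2
assert not is_sufficient(histories, {"u"}, {"y1", "y2"}, kappa_even, extend)
```

The failing case shows the ambiguity of Figure 11.4b directly: two histories with the same derived I-state lead, after the same (u, y), to different derived I-states.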
- Nondeterministic Information Spaces
This section defines the I-map κndet from Figure 11.3, which converts each history I-state into a subset of X that corresponds to all possible current states. Nature
is modeled nondeterministically, which means that there is no information about what actions nature will choose, other than that they will be chosen from Θ and Ψ. Assume that the state-action sensor mapping from Section 11.1.1 is used. Consider what inferences may be drawn from a history I-state, ηk = (η0, u˜k−1, y˜k). Since the
model does not involve probabilities, let η0 represent a set X1 ⊆ X. Let κndet(ηk)
be the minimal subset of X in which xk is known to lie given ηk. This subset is referred to as a nondeterministic I-state. To remind you that κndet(ηk) is a subset of X, it will now be denoted as Xk(ηk). It is important that Xk(ηk) be as small as possible while consistent with ηk.
Recall from (11.6) that for every observation yk, a set H(yk) ⊆ X of possible
values for xk can be inferred. This could serve as a crude estimate of the nondeterministic I-state. It is certainly known that Xk(ηk) ⊆ H(yk); otherwise, xk would not be consistent with the current sensor observation. If we carefully progress from the initial conditions while applying constraints due to the state transition equation, the appropriate subset of H(yk) will be obtained.
From the state transition function f, define a set-valued function F that yields a subset of X for every x ∈ X and u ∈ U as
F(x, u) = {x′ ∈ X | ∃θ ∈ Θ(x, u) for which x′ = f(x, u, θ)}. (11.28)
Note that both F and H are set-valued functions that eliminate the direct ap- pearance of nature actions. The effect of nature is taken into account in the set that is obtained when these functions are applied. This will be very convenient for computing the nondeterministic I-state.
An inductive process will now be described that results in computing the nonde- terministic I-state, Xk(ηk), for any stage k. The base case, k = 1, of the induction proceeds as
X1(η1) = X1(η0, y1) = X1 ∩ H(y1). (11.29)
The first part of the equation replaces η1 with (η0, y1), which is a longer way to write the history I-state. There are not yet any actions in the history. The second part applies set intersection to make consistent the two pieces of information: 1) The initial state lies in X1, which is the initial condition, and 2) the states in H(y1) are possible given the observation y1.
Now assume inductively that Xk(ηk) ⊆ X has been computed and the task is
to compute Xk+1(ηk+1). From (11.15), ηk+1 = (ηk, uk, yk+1). Thus, the only new pieces of information are that uk was applied and yk+1 was observed. These will be considered one at a time.
Consider computing Xk+1(ηk, uk). If xk was known, then after applying uk, the state could lie anywhere within F(xk, uk), using (11.28). Although xk is actually
not known, it is at least known that xk ∈ Xk(ηk). Therefore,
Xk+1(ηk, uk) = ⋃_{xk ∈ Xk(ηk)} F(xk, uk).   (11.30)

This can be considered as the set of all states that can be reached by starting from some state in Xk(ηk) and applying the action uk together with any nature action θk ∈ Θ(xk, uk). See Figure 11.5.
Figure 11.5: The first step in computing the nondeterministic I-state is to take the union of F(xk, uk) over all possible xk ∈ Xk(ηk).
The next step is to take into account the observation yk+1. This information alone indicates that xk+1 lies in H(yk+1). Therefore, an intersection is performed to obtain the nondeterministic I-state,
Xk+1(ηk+1) = Xk+1(ηk, uk, yk+1) = Xk+1(ηk, uk) ∩ H(yk+1). (11.31)
Thus, it has been shown how to compute Xk+1(ηk+1) from Xk(ηk). After start- ing with (11.29), the nondeterministic I-states at any stage can be computed by iterating (11.30) and (11.31) as many times as necessary.
Since the nondeterministic I-state is always a subset of X, the nondeterministic I-space, Indet = pow(X), is obtained (shown in Figure 11.3). If X is finite, then Indet is also finite, which was not the case with Ihist because the histories continued to grow with the number of stages. Thus, if the number of stages is unbounded or large in comparison to the size of X, then nondeterministic I-states seem preferable. It is also convenient that κndet is a sufficient I-map, as defined in Section 11.2.1.
This implies that a planning problem can be completely expressed in terms of Indet
without maintaining the histories. The goal region, XG, can be expressed directly
as a nondeterministic I-state. In this way, the planning task is to terminate in a nondeterministic I-state, Xk(ηk), for which Xk(ηk) ⊆ XG.
The sufficiency of κndet is obtained because (11.30) and (11.31) show that Xk+1(ηk+1) can be computed from Xk(ηk), uk, and yk+1. This implies that a derived information transition equation can be formed. The nondeterministic I-space can also be treated as “just another state space.” Although many history I-states may map to the same nondeterministic I-state, it has been assumed for decision-making purposes that the particular history is irrelevant once Xk(ηk) is given.
The following example is not very interesting in itself, but it is simple enough to illustrate the concepts.
Example 11.13 (Three-State Example) Let X = {0, 1, 2}, U = {−1, 0, 1}, and Θ(x, u) = {0, 1} for all x ∈ X and u ∈ U. The state transitions are given
by f(x, u, θ) = (x + u + θ) mod 3. Regarding sensing, Y = {0, 1, 2, 3, 4} and Ψ(x) = {0, 1, 2} for all x ∈ X. The sensor mapping is y = h(x, ψ) = x + ψ.
The history I-space appears very cumbersome for this example, which only involves three states. The nondeterministic I-space for this example is
Indet = {∅, {0}, {1}, {2}, {0, 1}, {1, 2}, {0, 2}, {0, 1, 2}}, (11.32)
which is the power set of X = {0, 1, 2}. Note, however, that the empty set, ∅, can usually be deleted from Indet.³ Suppose that the initial condition is X1 = {0, 2}
and that the initial state is x1 = 0. The initial state is unknown to the decision maker, but it is needed to ensure that valid observations are made in the example. Now consider the execution over a number of stages. Suppose that the first
observation is y1 = 2. Based on the sensor mapping, H(y1) = H(2) = {0, 1, 2},
which is not very helpful because H(2) = X. Applying (11.29) yields X1(η1) =
{0, 2}. Now suppose that the decision maker applies the action u1 = 1 and nature
applies θ1 = 1. Using f, this yields x2 = 2. The decision maker does not know θ1 and must therefore take into account any nature action that could have been applied. It uses (11.30) to infer that
X2(η1, u1) = F(2, 1) ∪ F(0, 1) = {0, 1} ∪ {1, 2} = {0, 1, 2}. (11.33)
Now suppose that y2 = 3. From the sensor mapping, H(3) = {1, 2}. Applying (11.31) yields
X2(η2) = X2(η1, u1) ∩ H(y2) = {0, 1, 2} ∩ {1, 2} = {1, 2}. (11.34)
This process may be repeated for as many stages as desired. A path is generated through Indet by visiting a sequence of nondeterministic I-states. If the observa-
tion yk = 4 is ever received, the state, xk, becomes immediately known because
H(4) = {2}.
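The updates (11.29)-(11.31) are easy to implement for the model of Example 11.13. The following Python sketch replays the stages above; the function names `predict` and `update` are chosen here for illustration, not taken from the text.

```python
# Sketch: nondeterministic I-state updates (11.29)-(11.31) applied to the
# three-state model of Example 11.13.
X = {0, 1, 2}

def F(x, u):
    """Forward projection (11.28): all states reachable under nature."""
    return {(x + u + theta) % 3 for theta in (0, 1)}

def H(y):
    """States consistent with observation y = x + psi, psi in {0, 1, 2}."""
    return {x for x in X if y - x in (0, 1, 2)}

def predict(Xk, u):
    """(11.30): union of F(x, u) over all x in the current I-state."""
    return set().union(*(F(x, u) for x in Xk))

def update(Xpred, y):
    """(11.31): intersect the prediction with the observation preimage."""
    return Xpred & H(y)

# Replaying the example: X1 = {0, 2}, y1 = 2, u1 = 1, y2 = 3.
X1 = {0, 2} & H(2)        # (11.29): H(2) = {0, 1, 2}, so X1 = {0, 2}
X2_pred = predict(X1, 1)  # (11.33): F(2, 1) ∪ F(0, 1) = {0, 1, 2}
X2 = update(X2_pred, 3)   # (11.34): {0, 1, 2} ∩ {1, 2} = {1, 2}
```

Iterating `predict` and `update` traces the same path through Indet as the hand calculation, and `H(4)` returning a singleton reproduces the observation that y = 4 reveals the state.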
- Probabilistic Information Spaces
This section defines the I-map κprob from Figure 11.3, which converts each history I-state into a probability distribution over X. A Markov, probabilistic model is assumed in the sense that the actions of nature only depend on the current state and action, as opposed to state or action histories. The set union and intersection of (11.30) and (11.31) are replaced in this section by marginalization and Bayes’ rule, respectively. In a sense, these are the probabilistic equivalents of union and intersection. It will be very helpful to compare the expressions from this section to those of Section 11.2.2.
3One notable exception is in the theory of nondeterministic finite automata, in which it is possible that all copies of the machine die and there is no possible current state [891].
Rather than write κprob(η), standard probability notation will be applied to obtain P(x|η). Most expressions in this section of the form P(xk|·) have an analogous expression in Section 11.2.2 of the form Xk(·). It is helpful to recognize the similarities.
The first step is to construct probabilistic versions of H and F. These are
P(xk|yk) and P(xk+1|xk, uk), respectively. The latter term was given in Section
10.1.1. To obtain P(xk|yk), recall from Section 11.1.1 that P(yk|xk) is easily derived from P(ψk|xk). To obtain P(xk|yk), Bayes’ rule is applied:
P(xk|yk) = P(yk|xk)P(xk) / P(yk) = P(yk|xk)P(xk) / ( Σ_{xk ∈ X} P(yk|xk)P(xk) ).   (11.35)
In the last step, P(yk) was rewritten using marginalization, (9.8). In this case xk appears as the sum index; therefore, the denominator is only a function of yk, as required. Bayes’ rule requires knowing the prior, P(xk). In the coming expressions, this will be replaced by a probabilistic I-state.
Now consider defining probabilistic I-states. Each is a probability distribution over X and is written as P(xk|ηk). The initial condition produces P(x1). As for the nondeterministic case, probabilistic I-states can be computed inductively. For the base case, the only new piece of information is y1. Thus, the probabilistic I-state, P(x1|η1), is P(x1|y1). This is computed by letting k = 1 in (11.35) to yield

P(x1|η1) = P(x1|y1) = P(y1|x1)P(x1) / ( Σ_{x1 ∈ X} P(y1|x1)P(x1) ).   (11.36)
Now consider the inductive step by assuming that P(xk|ηk) is given. The task is to determine P(xk+1|ηk+1), which is equivalent to P(xk+1|ηk, uk, yk+1). As in
Section 11.2.2, this will proceed in two parts by first considering the effect of uk, followed by yk+1. The first step is to determine P(xk+1|ηk, uk) from P(xk|ηk). First, note that

P(xk+1|ηk, xk, uk) = P(xk+1|xk, uk)   (11.37)

because ηk contains no additional information regarding the prediction of xk+1 once xk is given. Marginalization, (9.8), can be used to eliminate xk from P(xk+1|xk, uk). This must be eliminated because it is not given. Putting these steps together yields

P(xk+1|ηk, uk) = Σ_{xk ∈ X} P(xk+1|xk, uk, ηk)P(xk|ηk) = Σ_{xk ∈ X} P(xk+1|xk, uk)P(xk|ηk),   (11.38)

which expresses P(xk+1|ηk, uk) in terms of given quantities. Equation (11.38) can be considered as the probabilistic counterpart of (11.30).
The next step is to take into account the observation yk+1. This is accomplished by making a version of (11.35) that is conditioned on the information accumulated so far: ηk and uk. Also, k is replaced with k + 1. The result is
P(xk+1|yk+1, ηk, uk) = P(yk+1|xk+1, ηk, uk)P(xk+1|ηk, uk) / ( Σ_{xk+1 ∈ X} P(yk+1|xk+1, ηk, uk)P(xk+1|ηk, uk) ).   (11.39)

This can be considered as the probabilistic counterpart of (11.31). The left side of (11.39) is equivalent to P(xk+1|ηk+1), which is the probabilistic I-state for stage k + 1, as desired. There are two different kinds of terms on the right. The expression for P(xk+1|ηk, uk) is given in (11.38). Therefore, the only remaining term to calculate is P(yk+1|xk+1, ηk, uk). Note that
P(yk+1|xk+1, ηk, uk) = P(yk+1|xk+1) (11.40)
because the sensor mapping depends only on the state (and the probability model for the nature sensing action, which also depends only on the state). Since
P(yk+1|xk+1) is specified as part of the sensor model, we have now determined how to obtain P(xk+1|ηk+1) from P(xk|ηk), uk, and yk+1. Thus, Iprob is another I-space that can be treated as just another state space.
The probabilistic I-space Iprob (shown in Figure 11.3) is the set of all probability distributions over X. The update expressions, (11.38) and (11.39), establish that the I-map κprob is sufficient, which means that the planning problem can be expressed entirely in terms of Iprob, instead of maintaining histories. A goal region can be specified as constraints on the probabilities. For example, for some particular x ∈ X, the goal might be to reach any probabilistic I-state for which P(xk = x|ηk) > 1/2.
Figure 11.6: The probabilistic I-space for the three-state example is a 2-simplex embedded in R3. This simplex can be projected into R2 to yield the depicted triangular region in R2.
Example 11.14 (Three-State Example Revisited) Now return to Example 11.13, but this time use probabilistic models. For a probabilistic I-state, let pi denote the probability that the current state is i ∈ X. Any probabilistic I-state can be expressed as (p0, p1, p2) ∈ R3. This implies that the I-space can be nicely embedded in R3. By the axioms of probability (given in Section 9.1.2), p0 + p1 + p2 = 1, which can be interpreted as a plane equation in R3 that restricts Iprob to a 2D set. Also following the axioms of probability, for each i ∈ {0, 1, 2}, 0 ≤ pi ≤ 1. This means that Iprob is restricted to a triangular region in R3. The vertices of this triangular region are (0, 0, 1), (0, 1, 0), and (1, 0, 0); these correspond to the three different ways to have perfect state information. In a sense, the distance away from these points corresponds to the amount of uncertainty in the state. The uniform probability distribution (1/3, 1/3, 1/3) is equidistant from the three vertices. A projection of the triangular region into R2 is shown in Figure 11.6. The interpretation in this case is that p0 and p1 specify a point in R2, and p2 is automatically determined from p2 = 1 − p0 − p1.

The triangular region in R3 is an uncountably infinite set, even though the history I-space is countably infinite for a fixed initial condition. This may seem strange, but there is no mistake because for a fixed initial condition, it is generally impossible to reach all of the points in Iprob. If the initial condition can be any point in Iprob, then all of the probabilistic I-space is covered because I0 = Iprob, in which I0 is the initial condition space.
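The probabilistic updates (11.38) and (11.39) can likewise be replayed on the three-state model. Since the example does not specify nature's distributions, the sketch below assumes, purely for illustration, a uniform P(θ) over {0, 1} and a uniform P(ψ) over {0, 1, 2}; the function names are chosen here, not taken from the text.

```python
# Sketch: probabilistic I-state updates (11.38) and (11.39) on the three-state
# model, ASSUMING uniform nature distributions (not fixed by the book).
X = (0, 1, 2)

def p_trans(xn, x, u):
    """P(x_{k+1} | x_k, u_k) under uniform theta in {0, 1}."""
    return sum(0.5 for theta in (0, 1) if (x + u + theta) % 3 == xn)

def p_obs(y, x):
    """P(y_k | x_k) under y = x + psi with uniform psi in {0, 1, 2}."""
    return 1.0 / 3.0 if y - x in (0, 1, 2) else 0.0

def predict(p, u):
    """(11.38): marginalize x_k out of the transition model."""
    return [sum(p_trans(xn, x, u) * p[x] for x in X) for xn in X]

def update(p, y):
    """(11.39): Bayes' rule with the observation likelihood."""
    unnorm = [p_obs(y, x) * p[x] for x in X]
    z = sum(unnorm)
    return [w / z for w in unnorm]

p1 = update([0.5, 0.0, 0.5], 2)  # prior from X1 = {0, 2}, then y1 = 2
p2 = update(predict(p1, 1), 3)   # apply u1 = 1, then observe y2 = 3
# p2 assigns zero probability to state 0, matching X2 = {1, 2} in (11.34).
```

Under these assumed distributions, the support of each probabilistic I-state coincides with the nondeterministic I-state from Example 11.13, which illustrates the sense in which (11.38) and (11.39) are the probabilistic counterparts of union and intersection.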
- Limited-Memory Information Spaces
Limiting the amount of memory provides one way to reduce the sizes of history I-states. Except in special cases, this usually does not preserve the feasibility or optimality of the original problem. Nevertheless, such I-maps are very useful in practice when there appears to be no other way to reduce the size of the I-space. Furthermore, they occasionally do preserve the desired properties of feasibility, and sometimes even optimality.
Previous i stages Under this model, the history I-state is truncated. Any actions or observations received earlier than i stages ago are dropped from memory.
An I-map, κi, is defined as
κi(ηk) = (uk−i, . . . , uk−1, yk−i+1, . . . , yk), (11.41)
for any integer i > 0 and k > i. If k ≤ i, then the derived I-state is the full history I-state, (11.14). The advantage of this approach, if it leads to a solution, is that the length of the I-state no longer grows with the number of stages. If U and Y are finite, then the derived I-space is also finite. Note that κi is sufficient in the sense defined in Section 11.2.1 because enough history is passed from stage to stage to determine the derived I-states.
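The truncation in (11.41), and the reason κi is sufficient, can be sketched in a few lines of Python; the function names `kappa_i` and `derived_transition` are illustrative assumptions.

```python
# Hypothetical sketch of the previous-i-stages I-map (11.41): keep only the
# last i actions and the last i observations of the history I-state.
def kappa_i(actions, observations, i):
    """Truncate a history (u_1..u_{k-1}, y_1..y_k) to the last i stages."""
    return actions[-i:], observations[-i:]

# The map is sufficient: the truncated history at stage k + 1 follows from
# the truncated history at stage k plus (u_k, y_{k+1}) alone.
def derived_transition(der, u, y, i):
    us, ys = der
    return (us + (u,))[-i:], (ys + (y,))[-i:]

us, ys = ("u1", "u2", "u3"), ("y1", "y2", "y3", "y4")
assert kappa_i(us, ys, 2) == (("u2", "u3"), ("y3", "y4"))
# Extending the full history then truncating agrees with the derived transition,
# which is exactly the commutativity required for sufficiency:
assert kappa_i(us + ("u4",), ys + ("y5",), 2) == \
       derived_transition(kappa_i(us, ys, 2), "u4", "y5", 2)
```

The final assertion is the Figure 11.4a diagram for this particular I-map: both paths reach the same derived I-state.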
Sensor feedback An interesting I-map is obtained by removing all but the last sensor observation from the history I-state. This yields an I-map, κsf : Ihist → Y ,
which is defined as κsf(ηk) = yk. The model is referred to as sensor feedback. In this case, all decisions are made directly in terms of the current sensor observation. The derived I-space is Y, and a plan on the derived I-space is π : Y → U, which is called a sensor-feedback plan. In some literature, this may be referred to as a purely reactive plan. Many problems for which solutions exist in the history I-space cannot be solved using sensor feedback. Neglecting history prevents the complicated deductions that are often needed regarding the state. In some sense, sensor feedback causes short-sightedness that could unavoidably lead to repeating the same mistakes indefinitely. However, it may be worth determining whether such a sensor-feedback solution plan exists for some particular problem. Such plans tend to be simpler to implement in practice because the actions can be connected directly to the sensor output. Certainly, if a sensor-feedback solution plan exists for a problem, and feasibility is the only concern, then it is pointless to design and implement a plan in terms of the history I-space or some larger derived I-space. Note that this I-map is sufficient, even though it ignores the entire history.