Derived Information Spaces
The history I-space appears to be quite complicated. Every I-state corresponds to a history of actions and observations. Unfortunately, the length of the history I-state grows linearly with the number of stages. To overcome this difficulty, it is common to map history I-states to some simpler space. In many applications, the ability to perform this simplification is critical to finding a practical solution. In some cases, the simplification fully preserves the history I-space, meaning that completeness, and optimality if applicable, is not lost. In other cases, we are willing to tolerate a simplification that destroys much of the structure of the history I-space. This may be necessary to obtain a dramatic reduction in the size of the I-space.
- Information Mappings
Consider a function that maps the history I-space into a space that is simpler to manage. Formally, let κ : Ihist → Ider denote a function from a history I-space, Ihist, to a derived I-space, Ider. The function κ is called an information mapping, or I-map. The derived I-space may be any set; hence, there is great flexibility in defining an I-map.² Figure 11.3 illustrates the idea. The starting place is Ihist, and mappings are made to various derived I-spaces. Some generic mappings, κ1, κ2, and κ3, are shown, along with some very important kinds, κest, κndet, and κprob. The last two are the subjects of Sections 11.2.2 and 11.2.3, respectively. The other important I-map is κest, which uses the history to estimate the state; hence, the derived I-space is X (see Example 11.11). In general, an I-map can even map any derived I-space to another, yielding κ : Ider → I′der, for any I-spaces Ider and I′der. Note that any composition of I-maps yields an I-map. The derived I-spaces I2 and I3 from Figure 11.3 are obtained via compositions.

²Ideally, the mapping should be onto Ider; however, to facilitate some definitions, this will not be required.
Making smaller information-feedback plans The primary use of an I-map is to simplify the description of a plan. In Section 11.1.3, a plan was defined as a function on the history I-space, Ihist. Suppose that an I-map, κ, is introduced that maps from Ihist to Ider. A feedback plan on Ider is defined as π : Ider → U. To execute a plan defined on Ider, the derived I-state is computed at each stage k by applying κ to ηk to obtain κ(ηk) ∈ Ider. The action selected by π is π(κ(ηk)) ∈ U.

To understand the effect of using Ider instead of Ihist as the domain of π, consider the set of possible plans that can be represented over Ider. Let Πhist and Πder be the sets of all plans over Ihist and Ider, respectively. Any π ∈ Πder can be converted into an equivalent plan, π′ ∈ Πhist, as follows: For each η ∈ Ihist, define π′(η) = π(κ(η)).

It is not always possible, however, to construct a plan, π ∈ Πder, from some π′ ∈ Πhist. The problem is that there may exist some η1, η2 ∈ Ihist for which π′(η1) ≠ π′(η2) and κ(η1) = κ(η2). In words, this means that the plan in Πhist requires that two histories cause different actions, but in the derived I-space the histories cannot be distinguished. For a plan in Πder, both histories must yield the same action.
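The conversion π′(η) = π(κ(η)) can be made concrete with a minimal Python sketch. The names `lift` and `kappa_sf`, and the toy histories, are illustrative assumptions, not from the text; the I-map used here simply keeps the last observation, as in the sensor-feedback model discussed later in this section.

```python
# Hypothetical sketch: lifting a plan defined on a derived I-space back to the
# history I-space by composing it with the I-map, pi'(eta) = pi(kappa(eta)).
def lift(pi_der, kappa):
    """Convert a plan in Pi_der into the equivalent plan in Pi_hist."""
    return lambda eta: pi_der(kappa(eta))

# Toy derived plan: act on the last observation only (sensor feedback).
kappa_sf = lambda eta: eta[-1]
pi_der = {"y1": "left", "y2": "right"}.get
pi_hist = lift(pi_der, kappa_sf)

# Histories that kappa cannot distinguish necessarily receive the same action.
assert pi_hist(("u1", "y1")) == pi_hist(("u2", "y1")) == "left"
```

This makes the limitation visible: any two histories with the same image under κ are forced to share an action under the lifted plan.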
An I-map κ has the potential to collapse Ihist down to a smaller I-space by inducing a partition of Ihist. For each ηder ∈ Ider, let the preimage κ−1(ηder) be defined as
κ−1(ηder) = {η ∈ Ihist | ηder = κ(η)}. (11.26)
This yields the set of history I-states that map to ηder. The induced partition can intuitively be considered as the “resolution” at which the history I-space is characterized. If the sets in (11.26) are large, then the I-space is substantially reduced. The goal is to select κ to make the sets in the partition as large as possible; however, one must be careful to avoid collapsing the I-space so much that the problem can no longer be solved.
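For a finite toy problem, the partition induced by (11.26) can be computed directly. The following sketch is illustrative only; the helper `induced_partition` and the toy histories are hypothetical, and the I-map used (keep only the last observation) is the sensor-feedback map discussed later in this section.

```python
# Hypothetical sketch: computing the partition of a finite history I-space
# induced by an I-map kappa, via the preimages in (11.26).
from collections import defaultdict

def induced_partition(histories, kappa):
    """Group history I-states by their image under the I-map kappa."""
    preimages = defaultdict(set)
    for eta in histories:
        preimages[kappa(eta)].add(eta)  # eta lands in kappa^{-1}(kappa(eta))
    return preimages

# Toy histories as (action, observation) tuples; the I-map keeps only the
# last observation.
histories = {("u1", "y1"), ("u2", "y1"), ("u1", "y2")}
kappa_sf = lambda eta: eta[-1]
partition = induced_partition(histories, kappa_sf)
# Two histories collapse to the derived I-state "y1": the "resolution" at
# which the history I-space is characterized has been coarsened.
```

The larger the sets in the partition, the smaller the derived I-space, exactly as described above.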
Example 11.11 (State Estimation) In this example, the I-map is the classical approach that is conveniently taken in numerous applications. Suppose that a technique has been developed that uses the history I-state η ∈ Ihist to compute an estimate of the current state. In this case, the I-map is κest : Ihist → X. The derived I-space happens to be X in this case! This means that a plan is specified as π : X → U, which is just a state-feedback plan.

Consider the partition of Ihist that is induced by κest. For each x ∈ X, the set κ−1est(x), as defined in (11.26), is the set of all histories that lead to the same state estimate. A plan on X can no longer distinguish between various histories that led to the same state estimate. One implication is that the ability to encode the amount of uncertainty in the state estimate has been lost. For example, it might be wise to make the action depend on the covariance in the estimate of x; however, this is not possible because decisions are based only on the estimate itself.
Example 11.12 (Stage Indices) Consider an I-map, κstage, that returns only the current stage index. Thus, κstage(ηk) = k. The derived I-space is the set of stages, which is N. A feedback plan on the derived I-space is specified as π : N → U. This is equivalent to specifying a plan as an action sequence, (u1, u2, . . .), as in Section 2.3.2. Since the feedback is trivial, this is precisely the original case of planning without feedback, which is also referred to as an open-loop plan.
Constructing a derived information transition equation As presented so far, the full history I-state is needed to determine a derived I-state. It may be preferable, however, to discard histories and work entirely in the derived I-space. Without storing the histories on the machine or robot, a derived information transition equation needs to be developed. The important requirement in this case is as follows:
If ηk is replaced by κ(ηk), then κ(ηk+1) must be correctly determined using only κ(ηk), uk, and yk+1.
Whether this requirement can be met depends on the particular I-map. Another way to express the requirement is that if κ(ηk) is given, then the full history ηk does not contain any information that could further constrain κ(ηk+1). The information provided by κ is sufficient for determining the next derived I-state. This is similar to the concept of a sufficient statistic, which arises in decision
theory [89]. If the requirement is met, then κ is called a sufficient I-map. One peculiarity is that the sufficiency is relative to Ider, as opposed to being absolute in some sense. For example, any I-map that maps onto Ider = {0} is sufficient
because κ(ηk+1) is always known (it remains fixed at 0). Thus, the requirement for sufficiency depends strongly on the particular derived I-space.
For a sufficient I-map, a derived information transition equation is determined as

κ(ηk+1) = fIder(κ(ηk), uk, yk+1).   (11.27)
The implication is that Ider is the new I-space in which the problem “lives.” There is no reason for the decision maker to consider histories. This idea is crucial to the success of many planning algorithms. Sections 11.2.2 and 11.2.3 introduce nondeterministic I-spaces and probabilistic I-spaces, which are two of the most important derived I-spaces and are obtained from sufficient I-maps. The I-map κstage from Example 11.12 is also sufficient. The estimation I-map from Example 11.11 is usually not sufficient because some history is needed to provide a better estimate.

Figure 11.4: (a) For an I-map to be sufficient, the same result must be reached in the lower right, regardless of the path taken from the upper left. (b) The problem is that κ−1 images may contain many histories, which eventually map to multiple derived I-states.
The diagram in Figure 11.4a indicates the problem of obtaining a sufficient I-map. The top of the diagram shows the history I-state transitions before the I-map was introduced. The bottom of the diagram shows the attempted derived information transition equation, fIder. The requirement is that the derived I-state obtained in the lower right must be the same regardless of which path is followed from the upper left. Either fI can be applied to η, followed by κ, or κ can be applied to η, followed by some fIder. The problem with the existence of fIder is that κ is usually not invertible. The preimage κ−1(ηder) of some derived I-state ηder ∈ Ider yields a set of histories in Ihist. Applying fI to all of these yields a set of possible next-stage history I-states. Applying κ to these may yield a set of derived I-states because of the ambiguity introduced by κ−1. This chain of mappings is shown in Figure 11.4b. If a singleton is obtained under all circumstances, then this yields the required values of fIder. Otherwise, new uncertainty arises about the current derived I-state. This could be handled by defining an information space over the information space, but this nastiness will be avoided here.

Since I-maps can be defined from any derived I-space to another, the concepts presented in this section do not necessarily require Ihist as the starting point. For example, an I-map, κ : Ider → I′der, may be called sufficient with respect to Ider rather than with respect to Ihist.
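For a finite problem, the commutative-diagram requirement of Figure 11.4a can be checked by brute force: the next derived I-state must be a function of (κ(ηk), uk, yk+1) alone. The sketch below is a hypothetical illustration; `is_sufficient`, `extend`, the toy histories, and both toy I-maps are assumptions chosen here, not taken from the text.

```python
# Hypothetical brute-force test of sufficiency on a finite problem: for every
# (history, action, observation), the next derived I-state must be determined
# by (kappa(eta), u, y) alone, as required for (11.27) to exist.
def is_sufficient(histories, actions, observations, kappa, extend):
    """extend(eta, u, y) appends (u, y) to the history, mirroring (11.15)."""
    table = {}
    for eta in histories:
        for u in actions:
            for y in observations:
                key = (kappa(eta), u, y)
                nxt = kappa(extend(eta, u, y))
                if table.setdefault(key, nxt) != nxt:
                    return False  # two paths through Figure 11.4a disagree
    return True

histories = {("y1",), ("y2",), ("y1", "u", "y2")}  # toy histories
extend = lambda eta, u, y: eta + (u, y)

# The stage-index I-map of Example 11.12 is sufficient: the next stage index
# is the current one plus one, regardless of the rest of the history.
kappa_stage = lambda eta: sum(1 for e in eta if e.startswith("y"))
assert is_sufficient(histories, {"u"}, {"y1", "y2"}, kappa_stage, extend)

# A map that keeps only floor(count of "y1" observations / 2) is NOT
# sufficient: the rounding discards exactly what is needed to update it.
kappa_even = lambda eta: sum(1 for e in eta if e == "y1") // 2
assert not is_sufficient(histories, {"u"}, {"y1", "y2"}, kappa_even, extend)
```

The failing case shows the ambiguity of Figure 11.4b directly: two histories with the same derived I-state lead, after the same (u, y), to different derived I-states.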
- Nondeterministic Information Spaces
This section defines the I-map κndet from Figure 11.3, which converts each history I-state into a subset of X that corresponds to all possible current states. Nature
is modeled nondeterministically, which means that there is no information about what actions nature will choose, other than that they will be chosen from Θ and Ψ. Assume that the state-action sensor mapping from Section 11.1.1 is used. Consider what inferences may be drawn from a history I-state, ηk = (η0, u˜k−1, y˜k). Since the
model does not involve probabilities, let η0 represent a set X1 ⊆ X. Let κndet(ηk)
be the minimal subset of X in which xk is known to lie given ηk. This subset is referred to as a nondeterministic I-state. To remind you that κndet(ηk) is a subset of X, it will now be denoted as Xk(ηk). It is important that Xk(ηk) be as small as possible while consistent with ηk.
Recall from (11.6) that for every observation yk, a set H(yk) ⊆ X of possible
values for xk can be inferred. This could serve as a crude estimate of the nondeterministic I-state. It is certainly known that Xk(ηk) ⊆ H(yk); otherwise, xk would not be consistent with the current sensor observation. If we carefully progress from the initial conditions while applying constraints due to the state transition equation, the appropriate subset of H(yk) will be obtained.
From the state transition function f, define a set-valued function F that yields a subset of X for every x ∈ X and u ∈ U as
F(x, u) = {x′ ∈ X | ∃θ ∈ Θ(x, u) for which x′ = f(x, u, θ)}. (11.28)
Note that both F and H are set-valued functions that eliminate the direct ap- pearance of nature actions. The effect of nature is taken into account in the set that is obtained when these functions are applied. This will be very convenient for computing the nondeterministic I-state.
An inductive process will now be described that results in computing the nonde- terministic I-state, Xk(ηk), for any stage k. The base case, k = 1, of the induction proceeds as
X1(η1) = X1(η0, y1) = X1 ∩ H(y1). (11.29)
The first part of the equation replaces η1 with (η0, y1), which is a longer way to write the history I-state. There are not yet any actions in the history. The second part applies set intersection to make consistent the two pieces of information: 1) The initial state lies in X1, which is the initial condition, and 2) the states in H(y1) are possible given the observation y1.
Now assume inductively that Xk(ηk) ⊆ X has been computed and the task is
to compute Xk+1(ηk+1). From (11.15), ηk+1 = (ηk, uk, yk+1). Thus, the only new pieces of information are that uk was applied and yk+1 was observed. These will be considered one at a time.
Consider computing Xk+1(ηk, uk). If xk was known, then after applying uk, the state could lie anywhere within F(xk, uk), using (11.28). Although xk is actually
not known, it is at least known that xk ∈ Xk(ηk). Therefore,
Xk+1(ηk, uk) = ⋃_{xk ∈ Xk(ηk)} F(xk, uk).   (11.30)

This can be considered as the set of all states that can be reached by starting from some state in Xk(ηk) and applying the action uk together with any nature action θk ∈ Θ(xk, uk). See Figure 11.5.
Figure 11.5: The first step in computing the nondeterministic I-state is to take the union of F(xk, uk) over all possible xk ∈ Xk(ηk).
The next step is to take into account the observation yk+1. This information alone indicates that xk+1 lies in H(yk+1). Therefore, an intersection is performed to obtain the nondeterministic I-state,
Xk+1(ηk+1) = Xk+1(ηk, uk, yk+1) = Xk+1(ηk, uk) ∩ H(yk+1). (11.31)
Thus, it has been shown how to compute Xk+1(ηk+1) from Xk(ηk). After start- ing with (11.29), the nondeterministic I-states at any stage can be computed by iterating (11.30) and (11.31) as many times as necessary.
Since the nondeterministic I-state is always a subset of X, the nondeterministic I-space, Indet = pow(X), is obtained (shown in Figure 11.3). If X is finite, then Indet is also finite, which was not the case with Ihist because the histories continued to grow with the number of stages. Thus, if the number of stages is unbounded or large in comparison to the size of X, then nondeterministic I-states seem preferable. It is also convenient that κndet is a sufficient I-map, as defined in Section 11.2.1.
This implies that a planning problem can be completely expressed in terms of Indet
without maintaining the histories. The goal region, XG, can be expressed directly
as a nondeterministic I-state. In this way, the planning task is to terminate in a nondeterministic I-state, Xk(ηk), for which Xk(ηk) ⊆ XG.
The sufficiency of κndet is obtained because (11.30) and (11.31) show that Xk+1(ηk+1) can be computed from Xk(ηk), uk, and yk+1. This implies that a derived information transition equation can be formed. The nondeterministic I-space can also be treated as “just another state space.” Although many history I-states may map to the same nondeterministic I-state, it has been assumed for decision-making purposes that the particular history is irrelevant once Xk(ηk) is given.
The following example is not very interesting in itself, but it is simple enough to illustrate the concepts.
Example 11.13 (Three-State Example) Let X = {0, 1, 2}, U = {−1, 0, 1}, and Θ(x, u) = {0, 1} for all x ∈ X and u ∈ U. The state transitions are given
by f(x, u, θ) = (x + u + θ) mod 3. Regarding sensing, Y = {0, 1, 2, 3, 4} and Ψ(x) = {0, 1, 2} for all x ∈ X. The sensor mapping is y = h(x, ψ) = x + ψ.
The history I-space appears very cumbersome for this example, which only involves three states. The nondeterministic I-space for this example is
Indet = {∅, {0}, {1}, {2}, {0, 1}, {1, 2}, {0, 2}, {0, 1, 2}}, (11.32)
which is the power set of X = {0, 1, 2}. Note, however, that the empty set, ∅, can usually be deleted from Indet.³ Suppose that the initial condition is X1 = {0, 2}
and that the initial state is x1 = 0. The initial state is unknown to the decision maker, but it is needed to ensure that valid observations are made in the example. Now consider the execution over a number of stages. Suppose that the first
observation is y1 = 2. Based on the sensor mapping, H(y1) = H(2) = {0, 1, 2},
which is not very helpful because H(2) = X. Applying (11.29) yields X1(η1) =
{0, 2}. Now suppose that the decision maker applies the action u1 = 1 and nature
applies θ1 = 1. Using f, this yields x2 = 2. The decision maker does not know θ1 and must therefore take into account any nature action that could have been applied. It uses (11.30) to infer that
X2(η1, u1) = F(2, 1) ∪ F(0, 1) = {0, 1} ∪ {1, 2} = {0, 1, 2}. (11.33)
Now suppose that y2 = 3. From the sensor mapping, H(3) = {1, 2}. Applying (11.31) yields
X2(η2) = X2(η1, u1) ∩ H(y2) = {0, 1, 2} ∩ {1, 2} = {1, 2}. (11.34)
This process may be repeated for as many stages as desired. A path is generated through Indet by visiting a sequence of nondeterministic I-states. If the observa-
tion yk = 4 is ever received, the state, xk, becomes immediately known because
H(4) = {2}.
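The updates (11.29)-(11.31) are easy to implement for the model of Example 11.13. The following Python sketch replays the stages above; the function names `predict` and `update` are chosen here for illustration, not taken from the text.

```python
# Sketch: nondeterministic I-state updates (11.29)-(11.31) applied to the
# three-state model of Example 11.13.
X = {0, 1, 2}

def F(x, u):
    """Forward projection (11.28): all states reachable under nature."""
    return {(x + u + theta) % 3 for theta in (0, 1)}

def H(y):
    """States consistent with observation y = x + psi, psi in {0, 1, 2}."""
    return {x for x in X if y - x in (0, 1, 2)}

def predict(Xk, u):
    """(11.30): union of F(x, u) over all x in the current I-state."""
    return set().union(*(F(x, u) for x in Xk))

def update(Xpred, y):
    """(11.31): intersect the prediction with the observation preimage."""
    return Xpred & H(y)

# Replaying the example: X1 = {0, 2}, y1 = 2, u1 = 1, y2 = 3.
X1 = {0, 2} & H(2)        # (11.29): H(2) = {0, 1, 2}, so X1 = {0, 2}
X2_pred = predict(X1, 1)  # (11.33): F(2, 1) ∪ F(0, 1) = {0, 1, 2}
X2 = update(X2_pred, 3)   # (11.34): {0, 1, 2} ∩ {1, 2} = {1, 2}
```

Iterating `predict` and `update` traces the same path through Indet as the hand calculation, and `H(4)` returning a singleton reproduces the observation that y = 4 reveals the state.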
- Probabilistic Information Spaces
This section defines the I-map κprob from Figure 11.3, which converts each history I-state into a probability distribution over X. A Markov, probabilistic model is assumed in the sense that the actions of nature only depend on the current state and action, as opposed to state or action histories. The set union and intersection of (11.30) and (11.31) are replaced in this section by marginalization and Bayes’ rule, respectively. In a sense, these are the probabilistic equivalents of union and intersection. It will be very helpful to compare the expressions from this section to those of Section 11.2.2.
3One notable exception is in the theory of nondeterministic finite automata, in which it is possible that all copies of the machine die and there is no possible current state [891].
Rather than write κprob(η), standard probability notation will be applied to obtain P(x|η). Most expressions in this section of the form P(xk|·) have an analogous expression in Section 11.2.2 of the form Xk(·). It is helpful to recognize the similarities.
The first step is to construct probabilistic versions of H and F. These are
P(xk|yk) and P(xk+1|xk, uk), respectively. The latter term was given in Section
10.1.1. To obtain P(xk|yk), recall from Section 11.1.1 that P(yk|xk) is easily derived from P(ψk|xk). To obtain P(xk|yk), Bayes’ rule is applied:
P(xk|yk) = P(yk|xk)P(xk) / P(yk) = P(yk|xk)P(xk) / ( Σ_{xk ∈ X} P(yk|xk)P(xk) ).   (11.35)
In the last step, P(yk) was rewritten using marginalization, (9.8). In this case xk appears as the sum index; therefore, the denominator is only a function of yk, as required. Bayes’ rule requires knowing the prior, P(xk). In the coming expressions, this will be replaced by a probabilistic I-state.
Now consider defining probabilistic I-states. Each is a probability distribution over X and is written as P(xk|ηk). The initial condition produces P(x1). As for the nondeterministic case, probabilistic I-states can be computed inductively. For the base case, the only new piece of information is y1. Thus, the probabilistic I-state, P(x1|η1), is P(x1|y1). This is computed by letting k = 1 in (11.35) to yield

P(x1|η1) = P(x1|y1) = P(y1|x1)P(x1) / ( Σ_{x1 ∈ X} P(y1|x1)P(x1) ).   (11.36)
Now consider the inductive step by assuming that P(xk|ηk) is given. The task is to determine P(xk+1|ηk+1), which is equivalent to P(xk+1|ηk, uk, yk+1). As in
Section 11.2.2, this will proceed in two parts by first considering the effect of uk, followed by yk+1. The first step is to determine P(xk+1|ηk, uk) from P(xk|ηk). First, note that

P(xk+1|ηk, xk, uk) = P(xk+1|xk, uk)   (11.37)

because ηk contains no additional information regarding the prediction of xk+1 once xk is given. Marginalization, (9.8), can be used to eliminate xk from P(xk+1|xk, uk). This must be eliminated because it is not given. Putting these steps together yields

P(xk+1|ηk, uk) = Σ_{xk ∈ X} P(xk+1|xk, uk, ηk)P(xk|ηk) = Σ_{xk ∈ X} P(xk+1|xk, uk)P(xk|ηk),   (11.38)

which expresses P(xk+1|ηk, uk) in terms of given quantities. Equation (11.38) can be considered as the probabilistic counterpart of (11.30).
The next step is to take into account the observation yk+1. This is accomplished by making a version of (11.35) that is conditioned on the information accumulated so far: ηk and uk. Also, k is replaced with k + 1. The result is
P(xk+1|yk+1, ηk, uk) = P(yk+1|xk+1, ηk, uk)P(xk+1|ηk, uk) / ( Σ_{xk+1 ∈ X} P(yk+1|xk+1, ηk, uk)P(xk+1|ηk, uk) ).   (11.39)

This can be considered as the probabilistic counterpart of (11.31). The left side of (11.39) is equivalent to P(xk+1|ηk+1), which is the probabilistic I-state for stage k + 1, as desired. There are two different kinds of terms on the right. The expression for P(xk+1|ηk, uk) is given in (11.38). Therefore, the only remaining term to calculate is P(yk+1|xk+1, ηk, uk). Note that
P(yk+1|xk+1, ηk, uk) = P(yk+1|xk+1) (11.40)
because the sensor mapping depends only on the state (and the probability model for the nature sensing action, which also depends only on the state). Since
P(yk+1|xk+1) is specified as part of the sensor model, we have now determined how to obtain P(xk+1|ηk+1) from P(xk|ηk), uk, and yk+1. Thus, Iprob is another I-space that can be treated as just another state space.
The probabilistic I-space Iprob (shown in Figure 11.3) is the set of all probability distributions over X. The update expressions, (11.38) and (11.39), establish that the I-map κprob is sufficient, which means that the planning problem can be expressed entirely in terms of Iprob, instead of maintaining histories. A goal region can be specified as constraints on the probabilities. For example, for some particular x ∈ X, the goal might be to reach any probabilistic I-state for which P(xk = x|ηk) > 1/2.
Figure 11.6: The probabilistic I-space for the three-state example is a 2-simplex embedded in R3. This simplex can be projected into R2 to yield the depicted triangular region in R2.
Example 11.14 (Three-State Example Revisited) Now return to Example 11.13, but this time use probabilistic models. For a probabilistic I-state, let pi denote the probability that the current state is i ∈ X. Any probabilistic I-state can be expressed as (p0, p1, p2) ∈ R3. This implies that the I-space can be nicely embedded in R3. By the axioms of probability (given in Section 9.1.2), p0 + p1 + p2 = 1, which can be interpreted as a plane equation in R3 that restricts Iprob to a 2D set. Also following the axioms of probability, for each i ∈ {0, 1, 2}, 0 ≤ pi ≤ 1. This means that Iprob is restricted to a triangular region in R3. The vertices of this triangular region are (0, 0, 1), (0, 1, 0), and (1, 0, 0); these correspond to the three different ways to have perfect state information. In a sense, the distance away from these points corresponds to the amount of uncertainty in the state. The uniform probability distribution (1/3, 1/3, 1/3) is equidistant from the three vertices. A projection of the triangular region into R2 is shown in Figure 11.6. The interpretation in this case is that p0 and p1 specify a point in R2, and p2 is automatically determined from p2 = 1 − p0 − p1.

The triangular region in R3 is an uncountably infinite set, even though the history I-space is countably infinite for a fixed initial condition. This may seem strange, but there is no mistake because for a fixed initial condition, it is generally impossible to reach all of the points in Iprob. If the initial condition can be any point in Iprob, then all of the probabilistic I-space is covered because I0 = Iprob, in which I0 is the initial condition space.
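The probabilistic updates (11.38) and (11.39) can likewise be replayed on the three-state model. Since the example does not specify nature's distributions, the sketch below assumes, purely for illustration, a uniform P(θ) over {0, 1} and a uniform P(ψ) over {0, 1, 2}; the function names are chosen here, not taken from the text.

```python
# Sketch: probabilistic I-state updates (11.38) and (11.39) on the three-state
# model, ASSUMING uniform nature distributions (not fixed by the book).
X = (0, 1, 2)

def p_trans(xn, x, u):
    """P(x_{k+1} | x_k, u_k) under uniform theta in {0, 1}."""
    return sum(0.5 for theta in (0, 1) if (x + u + theta) % 3 == xn)

def p_obs(y, x):
    """P(y_k | x_k) under y = x + psi with uniform psi in {0, 1, 2}."""
    return 1.0 / 3.0 if y - x in (0, 1, 2) else 0.0

def predict(p, u):
    """(11.38): marginalize x_k out of the transition model."""
    return [sum(p_trans(xn, x, u) * p[x] for x in X) for xn in X]

def update(p, y):
    """(11.39): Bayes' rule with the observation likelihood."""
    unnorm = [p_obs(y, x) * p[x] for x in X]
    z = sum(unnorm)
    return [w / z for w in unnorm]

p1 = update([0.5, 0.0, 0.5], 2)  # prior from X1 = {0, 2}, then y1 = 2
p2 = update(predict(p1, 1), 3)   # apply u1 = 1, then observe y2 = 3
# p2 assigns zero probability to state 0, matching X2 = {1, 2} in (11.34).
```

Under these assumed distributions, the support of each probabilistic I-state coincides with the nondeterministic I-state from Example 11.13, which illustrates the sense in which (11.38) and (11.39) are the probabilistic counterparts of union and intersection.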
- Limited-Memory Information Spaces
Limiting the amount of memory provides one way to reduce the sizes of history I-states. Except in special cases, this usually does not preserve the feasibility or optimality of the original problem. Nevertheless, such I-maps are very useful in practice when there appears to be no other way to reduce the size of the I-space. Furthermore, they occasionally do preserve the desired properties of feasibility, and sometimes even optimality.
Previous i stages Under this model, the history I-state is truncated. Any actions or observations received earlier than i stages ago are dropped from memory.
An I-map, κi, is defined as
κi(ηk) = (uk−i, . . . , uk−1, yk−i+1, . . . , yk), (11.41)
for any integer i > 0 and k > i. If k ≤ i, then the derived I-state is the full history I-state, (11.14). The advantage of this approach, if it leads to a solution, is that the length of the I-state no longer grows with the number of stages. If U and Y are finite, then the derived I-space is also finite. Note that κi is sufficient in the sense defined in Section 11.2.1 because enough history is passed from stage to stage to determine the derived I-states.
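The truncation in (11.41), and the reason κi is sufficient, can be sketched in a few lines of Python; the function names `kappa_i` and `derived_transition` are illustrative assumptions.

```python
# Hypothetical sketch of the previous-i-stages I-map (11.41): keep only the
# last i actions and the last i observations of the history I-state.
def kappa_i(actions, observations, i):
    """Truncate a history (u_1..u_{k-1}, y_1..y_k) to the last i stages."""
    return actions[-i:], observations[-i:]

# The map is sufficient: the truncated history at stage k + 1 follows from
# the truncated history at stage k plus (u_k, y_{k+1}) alone.
def derived_transition(der, u, y, i):
    us, ys = der
    return (us + (u,))[-i:], (ys + (y,))[-i:]

us, ys = ("u1", "u2", "u3"), ("y1", "y2", "y3", "y4")
assert kappa_i(us, ys, 2) == (("u2", "u3"), ("y3", "y4"))
# Extending the full history then truncating agrees with the derived transition,
# which is exactly the commutativity required for sufficiency:
assert kappa_i(us + ("u4",), ys + ("y5",), 2) == \
       derived_transition(kappa_i(us, ys, 2), "u4", "y5", 2)
```

The final assertion is the Figure 11.4a diagram for this particular I-map: both paths reach the same derived I-state.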
Sensor feedback An interesting I-map is obtained by removing all but the last sensor observation from the history I-state. This yields an I-map, κsf : Ihist → Y ,
which is defined as κsf(ηk) = yk. The model is referred to as sensor feedback. In this case, all decisions are made directly in terms of the current sensor observation. The derived I-space is Y, and a plan on the derived I-space is π : Y → U, which is called a sensor-feedback plan. In some literature, this may be referred to as a purely reactive plan. Many problems for which solutions exist in the history I-space cannot be solved using sensor feedback. Neglecting history prevents the complicated deductions that are often needed regarding the state. In some sense, sensor feedback causes short-sightedness that could unavoidably lead to repeating the same mistakes indefinitely. However, it may be worth determining whether such a sensor-feedback solution plan exists for some particular problem. Such plans tend to be simpler to implement in practice because the actions can be connected directly to the sensor output. Certainly, if a sensor-feedback solution plan exists for a problem, and feasibility is the only concern, then it is pointless to design and implement a plan in terms of the history I-space or some larger derived I-space. Note that this I-map is sufficient, even though it ignores the entire history.