Examples for Discrete State Spaces
- Basic Nondeterministic Examples
First, we consider a simple example that uses the sign sensor of Example 11.3.
Example 11.15 (Using the Sign Sensor) This example is similar to Example 10.1, except that it involves sensing uncertainty instead of prediction uncertainty.
Let X = Z, U = {−1, 1, uT }, Y = {−1, 0, 1}, and y = h(x) = sgn x. For the state
transition equation, xk+1 = f(xk, uk) = xk + uk. No nature actions interfere with the state transition equation or the sensor mapping. Therefore, future history
I-states are predictable. The information transition equation is ηk+1 = fI(ηk, uk). Suppose that initially, η0 = X, which means that any initial state is possible. The
goal is to terminate at 0 ∈ X.
The general expression for a history I-state at stage k is
ηk = (X, u1, . . . , uk−1, y1, . . . , yk). (11.42)
A possible I-state is η5 = (X, −1, 1, 1, −1, 1, 1, 1, 1, 0). Using the nondeterministic I-space from Section 11.2.2, Indet = pow(X), which is uncountably infinite. By looking carefully at the problem, however, it can be seen that most of the nondeterministic I-states are not reachable. If yk = 0, it is known that xk = 0; hence, Xk(ηk) = {0}. If yk = 1, it will always be the case that Xk(ηk) = {1, 2, . . .} unless 0 is observed. If yk = −1, then Xk(ηk) = {. . . , −2, −1}. From this, a plan, π, can be specified over the three nondeterministic I-states mentioned above. For the first one, π(Xk(ηk)) = uT . For the other two, π(Xk(ηk)) = −1 and π(Xk(ηk)) = 1, respectively. Based on the sign, the plan tries to move toward 0. If different initial conditions are allowed, then more nondeterministic I-states can be reached, but this was not required as the problem was defined. Note that optimal-length solutions are produced by the plan.
The solution can even be implemented with sensor feedback because the action depends only on the current sensor value. Let π : Y → U be defined as
        −1   if y = 1
π(y) =    1   if y = −1
         uT   if y = 0.
(11.43)
This provides dramatic memory savings over defining a plan on Ihist.
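Since the plan depends only on the current sensor reading, it is simple to implement. The following is a minimal sketch (not from the original text); the names sgn, plan, execute, and the termination marker U_T are illustrative assumptions.

```python
# Sensor-feedback plan for Example 11.15 (illustrative sketch).
U_T = "terminate"  # hypothetical marker for the termination action u_T

def sgn(x):
    """The sign sensor y = h(x) of Example 11.3."""
    return (x > 0) - (x < 0)

def plan(y):
    """The feedback plan pi : Y -> U from (11.43)."""
    if y == 1:
        return -1
    if y == -1:
        return 1
    return U_T  # y == 0: the goal state 0 has been reached

def execute(x):
    """Simulate the plan from an arbitrary initial state x."""
    history = [x]
    while True:
        u = plan(sgn(x))
        if u == U_T:
            return history
        x += u  # state transition x_{k+1} = x_k + u_k
        history.append(x)

print(execute(4))   # [4, 3, 2, 1, 0]
print(execute(-3))  # [-3, -2, -1, 0]
```

As the text notes, the resulting executions are optimal in length: the state moves monotonically toward 0.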
The next example provides a simple illustration of solving a problem without ever knowing the current state. This leads to the goal recognizability problem [659] (see Section 12.5.1).
Example 11.16 (Goal Recognizability) Let X = Z, U = {−1, 1, uT }, and Y = Z. For the state transition equation, xk+1 = f(xk, uk) = xk + uk. Now suppose that a variant of Example 11.7 is used to model sensing: y = h(x, ψ) = x + ψ and Ψ = {−5, −4, . . . , 5}. Suppose that once again, η0 = X. In this case, it is impossible to guarantee that a goal, XG = {0}, is reached because of the goal recognizability problem. The disturbance in the sensor mapping does not allow precise enough state measurements to deduce whether the goal has been achieved. If the goal region, XG, is enlarged to {−5, −4, . . . , 5}, then the problem can be solved. Due to the disturbance, the nondeterministic I-state is always a subset of a consecutive sequence of 11 states. It is simple to derive a plan that moves this interval until the nondeterministic I-state becomes a subset of XG. When this occurs, then the plan applies uT . In solving this problem, the exact state never had to be known.
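To make the interval argument concrete, here is a minimal sketch (assumed code, not from the text) that maintains the nondeterministic I-state as an interval of consecutive integers; PSI, update, and solved are hypothetical names.

```python
# Interval filter for Example 11.16 (illustrative sketch).
PSI = 5            # disturbance bound: y = x + psi, psi in {-5, ..., 5}
GOAL = (-5, 5)     # enlarged goal region X_G = {-5, ..., 5}

def update(interval, u, y):
    """Shift the interval by the action, then intersect with the sensor set."""
    lo, hi = interval
    lo, hi = lo + u, hi + u                     # predict: x_{k+1} = x_k + u_k
    return max(lo, y - PSI), min(hi, y + PSI)   # correct with the next reading y

def solved(interval):
    """Once the I-state is a subset of X_G, the plan may apply u_T."""
    lo, hi = interval
    return GOAL[0] <= lo and hi <= GOAL[1]

print(update((15, 25), -1, 17))  # (14, 22): predict, then correct
print(solved((-4, 4)))           # True: apply u_T
```

A plan can repeatedly shift this interval toward the goal (for example, applying u = −1 while hi > 5) and terminate as soon as solved returns True; the interval never contains more than 11 states, matching the argument above.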
The problem shown in Figure 11.7 serves two purposes. First, it is an example of sensorless planning [321, 394], which means that there are no observations (see Sections 11.5.4 and 12.5.2). This is an interesting class of problems because it appears that no information can be gained regarding the state. Contrary to intuition, it turns out for this example and many others that plans can be designed that estimate the state. The second purpose is to illustrate how the I-space can be dramatically collapsed using the I-map concepts of Section 11.2.1. The standard nondeterministic I-space for this example contains 2¹⁹ I-states, but it can be mapped to a much smaller derived I-space that contains only a few elements.
Example 11.17 (Moving in an L-shaped Corridor) The state space X for the example shown in Figure 11.7 has 19 states, each of which corresponds to a location on one of the white tiles. For convenience, let each state be denoted by (i, j). There are 10 bottom states, denoted by (1, 1), (2, 1), . . ., (10, 1), and 10 left
states, denoted by (1, 1), (1, 2), . . ., (1, 10). Since (1, 1) is both a bottom state and a left state, it is called the corner state.
Figure 11.7: An example that involves 19 states. There are no sensor observations; however, actions can be chosen that enable the state to be estimated. The example provides an illustration of reducing the I-space via I-maps.
There are no sensor observations for this problem. However, nature interferes with the state transitions, which leads to a form of nondeterministic uncertainty. If an action is applied that tries to take one step, nature may cause two or three steps to be taken. This can be modeled as follows. Let
U = {(1, 0), (−1, 0), (0, 1), (0, −1)} (11.44)
and let Θ = {1, 2, 3}. The state transition equation is defined as f(x, u, θ) = x + θu whenever such motion is not blocked (by hitting a dead end). For example, if x = (5, 1), u = (−1, 0), and θ = 2, then the resulting next state is (5, 1) + 2(−1, 0) = (3, 1). If blocking is possible, the state changes as much as possible until it becomes blocked. Due to blocking, it is even possible that f(x, u, θ) = x.
Since there are no sensor observations, the history I-state at stage k is
ηk = (η0, u1, . . . , uk−1). (11.45)
Now use the nondeterministic I-space, Indet = pow(X). The initial state, x1 = (10, 1), is given, which means that the initial I-state, η0, is {(10, 1)}. The goal is to arrive at the I-state, {(1, 10)}, which means that the task is to design a plan that moves from the lower right to the upper left.
With perfect information, this would be trivial; however, without sensors
the uncertainty may grow very quickly. For example, after applying the action u1 = (−1, 0) from the initial state, the nondeterministic I-state becomes
{(7, 1), (8, 1), (9, 1)}. After u2 = (−1, 0) it becomes {(4, 1), . . . , (8, 1)}. A nice
feature of this problem, however, is that uncertainty can be reduced without sensing. Suppose that for 100 stages, we repeatedly apply uk = (−1, 0). What is the resulting I-state? As the corner state is approached, the uncertainty is reduced because the state cannot be further changed by nature. It is known that each action, uk = (−1, 0), decreases the x coordinate by at least one each time. Therefore, after nine or more stages, it is known that Xk(ηk) = {(1, 1)}. Once this is known, then the action (0, 1) can be applied. This will again increase uncertainty as the state moves through the set of left states. If (0, 1) is applied nine or more times, then it is known for certain that xk = (1, 10), which is the required goal state.
A successful plan has now been obtained: 1) Apply (−1, 0) for nine stages, 2)
then apply (0, 1) for nine stages. This plan could be defined over Indet; however, it is simpler to use the I-map κstage from Example 11.12 to define a plan as π : N → U. For k such that 1 ≤ k ≤ 9, π(k) = (−1, 0). For k such that 10 ≤ k ≤ 18,
π(k) = (0, 1). For k > 18, π(k) = uT . Note that the plan works even if the initial condition is any subset of X. From this point onward, assume that any subset may be given as the initial condition.
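The following sketch (illustrative, not from the text) verifies this reasoning by tracking the nondeterministic I-state under the open-loop plan; f and step are hypothetical helper names.

```python
# Nondeterministic I-state tracking for Example 11.17 (illustrative sketch).
BOTTOM = {(i, 1) for i in range(1, 11)}
LEFT = {(1, j) for j in range(1, 11)}
X = BOTTOM | LEFT    # the 19 states of Figure 11.7
THETA = (1, 2, 3)    # nature may cause one, two, or three steps

def f(x, u, theta):
    """Move theta steps along u, stopping early if blocked."""
    for _ in range(theta):
        nxt = (x[0] + u[0], x[1] + u[1])
        if nxt not in X:
            break
        x = nxt
    return x

def step(istate, u):
    """Nondeterministic I-state transition: all outcomes over Theta."""
    return {f(x, u, theta) for x in istate for theta in THETA}

istate = {(10, 1)}
for _ in range(9):
    istate = step(istate, (-1, 0))
assert istate == {(1, 1)}    # the corner state is known with certainty
for _ in range(9):
    istate = step(istate, (0, 1))
assert istate == {(1, 10)}   # the goal state is known with certainty
```

After one application of (−1, 0), the sketch produces {(7, 1), (8, 1), (9, 1)}, exactly as stated above.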
Some alternative plans will now be considered by making other derived I-spaces from Indet. Let κ3 be an I-map from Indet to a set I3 of three derived I-states. Let I3 = {g, l, a}, in which g denotes “goal,” l denotes “left,” and a denotes “any.”
The I-map, κ3, is

             g   if X(η) = {(1, 10)}
κ3(X(η)) =   l   if X(η) is a subset of the set of left states
             a   otherwise.
(11.46)
Based on the successful plan described so far, a solution on I3 is defined as π(g) = uT , π(l) = (0, 1), and π(a) = (−1, 0). This plan is simpler to represent than the one on N; however, there is one drawback. The I-map κ3 is not sufficient. This implies that more of the nondeterministic I-state needs to be maintained during execution. Otherwise, there is no way to know when certain transitions occur. For example, if (−1, 0) is applied from a, how can the robot determine whether l or a is reached in the next stage? This can be easily determined from the nondeterministic I-state.
To address this problem, consider a new I-map, κ19 : Indet → I19, which is sufficient. There are 19 derived I-states, which include g as defined previously, li for 1 ≤ i ≤ 9, and ai for 2 ≤ i ≤ 10. The I-map is defined as κ19(X(η)) = g if X(η) = {(1, 10)}. Otherwise, κ19(X(η)) = li for the largest value of i such that X(η) is a subset of {(1, i), . . . , (1, 10)}. If there is no such value for i, then κ19(X(η)) = ai, for the smallest value of i such that X(η) is a subset of {(1, 1), . . . , (1, 10), (2, 1), . . . , (i, 1)}. Now the plan is defined as π(g) = uT , π(li) = (0, 1), and π(ai) = (−1, 0). Although the plan is larger, the robot does not need to represent the full nondeterministic I-state during execution. The correct
transitions occur. For example, if uk = (−1, 0) is applied at a5, then a4 is obtained. If u = (−1, 0) is applied at a2, then l1 is obtained. From there, u = (0, 1) is applied
to yield l2. These actions can be repeated until eventually l9 and g are reached. The resulting plan, however, is not an improvement over the original open-loop one.
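As a concrete illustration, here is a minimal sketch (assumed code, not from the text) of computing κ19 and the plan over the 19 derived I-states; kappa19 and plan are hypothetical names.

```python
# Derived I-map kappa_19 and its plan for Example 11.17 (illustrative sketch).
LEFT = {(1, j) for j in range(1, 11)}
BOTTOM = {(i, 1) for i in range(1, 11)}

def kappa19(istate):
    """Map a nondeterministic I-state to one of the 19 derived I-states."""
    if istate == {(1, 10)}:
        return "g"
    for i in range(9, 0, -1):  # largest i bounding the left-column states
        if istate <= {(1, j) for j in range(i, 11)}:
            return "l%d" % i
    for i in range(2, 11):     # smallest i bounding the bottom states
        if istate <= LEFT | {(j, 1) for j in range(1, i + 1)}:
            return "a%d" % i
    raise ValueError("not a reachable nondeterministic I-state")

def plan(derived):
    """pi(g) = u_T, pi(l_i) = (0, 1), pi(a_i) = (-1, 0)."""
    if derived == "g":
        return "u_T"
    return (0, 1) if derived.startswith("l") else (-1, 0)

print(kappa19({(10, 1)}))       # a10: the initial condition
print(plan(kappa19({(1, 1)})))  # l1, so the plan applies (0, 1)
```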
Figure 11.8: (a) A nondeterministic finite automaton (NFA) is a state machine that reads an input string and decides whether to accept it. (b) A graphical depiction of an NFA.
- Nondeterministic Finite Automata
An interesting connection lies between the ideas of this chapter and the theory of finite automata, which is part of the theory of computation (see [462, 891]). In Section 2.1, it was mentioned that determining whether there exists some string that is accepted by a DFA is equivalent to a discrete feasible planning problem. If unpredictability is introduced into the model, then a nondeterministic finite automaton (NFA) is obtained, as depicted in Figure 11.8. This represents one of the simplest examples of nondeterminism in theoretical computer science. Such nondeterministic models serve as a powerful tool for defining models of computation and their associated complexity classes. It turns out that these models give rise to interesting examples of information spaces.
An NFA is typically described using a directed graph as shown in Figure 11.8b, and is considered as a special kind of finite state machine. Each vertex of the graph represents a state, and edges represent possible transitions. An input string of finite length is read by the machine. Typically, the input string is a binary sequence of 0’s and 1’s. The initial state is designated by an inward arrow that has no source vertex, as shown pointing into state a in Figure 11.8b. The machine starts in this state and reads the first symbol of the input string. Based on its value, it makes appropriate transitions. For a DFA, the next state must be specified for each of the two inputs 0 and 1 from each state. From a state in an NFA, there may be any number of outgoing edges (including zero) that represent the response to a single symbol. For example, there are two outgoing edges if 0 is read from state c (the arrow from c to b actually corresponds to two directed edges, one for 0 and the other for 1). There are also edges designated with a special ǫ symbol. If a state has an outgoing ǫ, the state may immediately transition along the edge
without reading another symbol. This may be iterated any number of times, for any outgoing ǫ edges that may be encountered, without reading the next input symbol. The nondeterminism arises from the fact that there are multiple choices for possible next states due to multiple edges for the same input and ǫ transitions. There is no sensor that indicates which state is actually chosen.
The interpretation often given in the theory of computation is that when there are multiple choices, the machine clones itself and one copy runs each choice. It is like having multiple universes in which each different possible action of nature is occurring simultaneously. If there are no outgoing edges for a certain combination of state and input, then the clone dies. Any states that are depicted with a double boundary, such as state a in Figure 11.8, are called accept states. When the input string ends, the NFA is said to accept the input string if there exists at least one alternate universe in which the final machine state is an accept state.
The formulation usually given for NFAs seems very close to Formulation 2.1 for discrete feasible planning. Here is a typical NFA formulation [891], which formalizes the ideas depicted in Figure 11.8:
Formulation 11.2 (Nondeterministic Finite Automaton)
- A finite state space X.
- A finite alphabet Σ which represents the possible input symbols. Let Σǫ = Σ ∪ {ǫ}.
- A transition function, δ : X × Σǫ → pow(X). For each state and symbol, a set of outgoing edges is specified by indicating the states that are reached.
- A start state x0 ∈ X.
- A set A ⊆ X of accept states.
Example 11.18 (Three-State NFA) The example in Figure 11.8 can be expressed using Formulation 11.2. The components are X = {a, b, c}, Σ = {0, 1}, Σǫ = {0, 1, ǫ}, x0 = a, and A = {a}. The transition function, δ, requires the specification of a set of states for every x ∈ X and symbol in Σǫ:
         0         1        ǫ
a        ∅        {c}      {b}
b       {a}        ∅        ∅
c      {b, c}     {b}       ∅
                                    (11.47)
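To make the machine's behavior concrete, here is a minimal sketch (not from the text) that simulates this NFA, including ǫ-closure; delta, eps_closure, and accepts are illustrative names.

```python
# Simulation of the three-state NFA of Figure 11.8 (illustrative sketch).
# delta encodes the transition function (11.47); "eps" stands for epsilon.
delta = {
    ("a", "0"): set(),      ("a", "1"): {"c"}, ("a", "eps"): {"b"},
    ("b", "0"): {"a"},      ("b", "1"): set(), ("b", "eps"): set(),
    ("c", "0"): {"b", "c"}, ("c", "1"): {"b"}, ("c", "eps"): set(),
}
ACCEPT = {"a"}

def eps_closure(states):
    """Add every state reachable through chains of epsilon edges."""
    stack, closure = list(states), set(states)
    while stack:
        for nxt in delta[(stack.pop(), "eps")]:
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

def accepts(string):
    """Track all clones at once; accept if any clone ends in an accept state."""
    current = eps_closure({"a"})  # start state plus its epsilon closure
    for symbol in string:
        current = eps_closure({s for x in current for s in delta[(x, symbol)]})
    return bool(current & ACCEPT)

print(accepts("100"))  # True: matches the execution traced in (11.52)
print(accepts("1"))    # False: only state c is reached
```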
Now consider reformulating the NFA and its acceptance of strings as a kind of planning problem. An input string can be considered as a plan that uses no form of feedback; it is a fixed sequence of actions. The feasible planning problem is to determine whether any string exists that is accepted by the NFA. Since there is
no feedback, there is no sensing model. The initial state is known, but subsequent states cannot be measured. The history I-state ηk at stage k reduces to ηk = ũk−1 = (u1, . . . , uk−1), the action history. The nondeterminism can be accounted for by defining nature actions that interfere with the state transitions. This results in the following formulation, which is described in terms of Formulation 11.2.
Formulation 11.3 (An NFA Planning Problem)
- A finite state space X.
- An action space U = Σ ∪ {uT }.
- A state transition function, F : X × U → pow(X). For each state and symbol, a set of outgoing edges is specified by indicating the states that are reached.
- An initial state x0 = x1.
- A set XG = A of goal states.
The history I-space Ihist is defined using

Ik = Ũk−1 (11.48)

for each k ∈ N and taking the union as defined in (11.19). Assume that the initial state of the NFA is always fixed; therefore, it does not appear in the definition of Ihist.
For expressing the planning task, it is best to use the nondeterministic I-space Indet = pow(X) from Section 11.2.2. Thus, each nondeterministic I-state, X(η) ∈ Indet, is the subset of X that corresponds to the possible current states of the machine. The initial condition could be any subset of X because ǫ transitions can occur from x1. Subsequent nondeterministic I-states follow directly from F. The task is to compute a plan of the form
π = (u1, u2, . . . , uK , uT ), (11.49)
which results in XK+1(ηK+1) ∈ Indet with XK+1(ηK+1) ∩ XG ≠ ∅. This means that
at least one possible state of the NFA must lie in XG after the termination action is applied. This condition is much weaker than a typical planning requirement. Using
worst-case analysis, a typical requirement would instead be that every possible NFA state lies in XG.
The problem given in Formulation 11.3 is not precisely a specialization of Formulation 11.1 because of the state transition function. For convenience, F was directly defined, instead of explicitly requiring that f be defined in terms of nature actions, Θ(x, u), which in this context depend on both x and u for an NFA. There is one other small issue regarding this formulation. In the planning problems considered in this book, it is always assumed that there is a current state.
For an NFA, it was already mentioned that if there are no outgoing edges for a certain input, then the clone of the machine dies. This means that potential current states cease to exist. It is even possible that every clone dies, which leaves no current state for the machine. This can be easily enabled by directly defining F; however, planning problems must always have a current state. To resolve this issue, we could augment X in Formulation 11.3 to include an extra dead state, which signifies the death of a clone when there are no outgoing edges. A dead state can never lie in XG, and once a transition to a dead state occurs, the state remains dead for all time. In this section, the state space will not be augmented in this way; however, it is important to note that the NFA formulation can easily be made consistent with Formulation 11.3.
The planning model can now be compared to the standard use of NFAs in the theory of computation. A language of an NFA is defined to be the set of all input strings that it accepts. The planning problem formulated here determines whether there exists a string (which is a plan that ends with termination actions) that is accepted by the NFA. Equivalently, a planning algorithm determines whether the language of an NFA is empty. Constructing the set of all successful plans is equivalent to determining the language of the NFA.
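Deciding language emptiness under this view is a breadth-first search over the nondeterministic I-space. The sketch below (illustrative, not from the text) returns a shortest accepted string for the NFA of Figure 11.8; F is assumed to be the ǫ-closed transition function of Formulation 11.3, and find_plan is a hypothetical name.

```python
# Language-emptiness test as planning over I_ndet (illustrative sketch).
from collections import deque

# F(x, u): epsilon-closed successor sets for the NFA of Figure 11.8.
F = {
    ("a", "0"): set(),      ("a", "1"): {"c"},
    ("b", "0"): {"a", "b"}, ("b", "1"): set(),
    ("c", "0"): {"b", "c"}, ("c", "1"): {"b"},
}
XG = {"a"}                    # the accept states A
ETA0 = frozenset({"a", "b"})  # initial I-state: epsilon closure of x1

def find_plan():
    """BFS over nondeterministic I-states; returns a shortest accepted string."""
    queue, visited = deque([(ETA0, "")]), {ETA0}
    while queue:
        istate, string = queue.popleft()
        if istate & XG:
            return string  # X(eta) intersects X_G: apply u_T here
        for u in ("0", "1"):
            nxt = frozenset(s for x in istate for s in F[(x, u)])
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, string + u))
    return None  # no reachable I-state meets X_G: the language is empty

print(repr(find_plan()))  # '' because the start state a is already accepting
```

Here the empty string is returned immediately because a lies in the initial I-state; for an NFA with empty language, None is returned.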
Example 11.19 (Planning for the Three-State NFA) The example in Figure 11.8 can be expressed using Formulation 11.3. The components are X = {a, b, c}, U = {0, 1, uT }, x1 = a, and XG = {a}. The function F(x, u) is defined as

         0         1
a        ∅        {c}
b      {a, b}      ∅
c      {b, c}     {b}
                              (11.50)

in which the ǫ transitions of the NFA have been absorbed into F.
The nondeterministic I-space is

Indet = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}, (11.51)
in which the initial condition is η0 = {a, b} because an ǫ transition occurs imme- diately from a. An example plan that solves the problem is (1, 0, 0, uT , . . .). This corresponds to sending an input string “100” through the NFA depicted in Figure
11.8. The sequence of nondeterministic I-states obtained during the execution of the plan is
{a, b} →(1) {c} →(0) {b, c} →(0) {a, b, c} →(uT) {a, b, c}. (11.52)
A basic theorem from the theory of finite automata states that for the set of strings accepted by an NFA, there exists a DFA (deterministic) that accepts the same set [891]. This is proved by constructing a DFA directly from the nondeterministic I-space. Each nondeterministic I-state can be considered as a state of a DFA. Thus, the DFA has 2ⁿ states, if the original NFA has n states. The state transitions of the DFA are derived directly from the transitions between nondeterministic I-states. When an input (or action) is given, then a transition occurs from one subset of X to another. A transition is made between the two corresponding states in the DFA. This construction is an interesting example of how the I-space is a new state space that arises when the states of the original state space are unknown. Even though the I-space is usually larger than the original state space, its states are always known. Therefore, the behavior appears the same as in the case of perfect state information. This idea is very general and may be applied to many problems beyond DFAs and NFAs; see Section 12.1.2.
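The construction can be stated compactly in code. Below is a minimal sketch (assumed, not from the text) that derives the reachable part of the DFA from the same ǫ-closed transition function F used above; subset_construction is a hypothetical name.

```python
# Subset construction: a DFA over I_ndet from an NFA (illustrative sketch).
F = {
    ("a", "0"): set(),      ("a", "1"): {"c"},
    ("b", "0"): {"a", "b"}, ("b", "1"): set(),
    ("c", "0"): {"b", "c"}, ("c", "1"): {"b"},
}

def subset_construction(start, symbols=("0", "1")):
    """Each reachable nondeterministic I-state becomes one DFA state."""
    dfa, visited, frontier = {}, {start}, [start]
    while frontier:
        istate = frontier.pop()
        for u in symbols:
            nxt = frozenset(s for x in istate for s in F[(x, u)])
            dfa[(istate, u)] = nxt
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(nxt)
    return dfa

dfa = subset_construction(frozenset({"a", "b"}))
for (istate, u), nxt in sorted(dfa.items(), key=str):
    print(sorted(istate), u, "->", sorted(nxt))
```

Running this from the initial I-state {a, b} yields six reachable DFA states, including the empty set, which plays the role of the dead state discussed above.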
- The Probabilistic Case: POMDPs
Example 11.14 generalizes nicely to the case of n states. In the operations research and artificial intelligence literature, such problems are generally referred to as partially observable Markov decision processes, or POMDPs (pronounced “pom dee peez”).
For the case of three states, the probabilistic I-space, Iprob, is a 2-simplex embedded in R3. In general, if |X| = n, then Iprob is an (n − 1)-simplex embedded in Rn. The coordinates of a point are expressed as (p0, p1, . . . , pn−1) ∈ Rn. By the axioms of probability, p0 + · · · + pn−1 = 1, which implies that Iprob is an (n − 1)-dimensional subset of Rn. The vertices of the simplex correspond to the n cases in which the state is known; hence, their coordinates are (0, 0, . . . , 0, 1), (0, 0, . . . , 0, 1, 0), . . ., (1, 0, . . . , 0). For convenience, the simplex can be projected into Rn−1 by specifying a point in Rn−1 for which p0 + · · · + pn−2 ≤ 1 and then choosing the final coordinate as pn−1 = 1 − (p0 + · · · + pn−2). Section 12.1.3 presents algorithms for planning for POMDPs.
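As a small illustration (assumed code, not from the text), the following represents probabilistic I-states for n = 3 and performs the projection described above; project and lift are hypothetical names.

```python
# Probabilistic I-states on the 2-simplex for n = 3 (illustrative sketch).
def project(p):
    """Drop the last coordinate; it is recoverable because sum(p) == 1."""
    assert abs(sum(p) - 1.0) < 1e-9 and all(q >= 0 for q in p)
    return p[:-1]

def lift(q):
    """Recover a full I-state in R^n from its projection in R^(n-1)."""
    assert sum(q) <= 1.0 + 1e-9 and all(v >= 0 for v in q)
    return q + (1.0 - sum(q),)

belief = (0.25, 0.5, 0.25)   # P(x = 0), P(x = 1), P(x = 2)
print(project(belief))        # (0.25, 0.5)
print(lift((0.25, 0.5)))      # (0.25, 0.5, 0.25)
```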