Environment Uncertainty and Mapping
After reading Section 12.2, you may have already wondered what happens if the map is not given. This leads to a fascinating set of problems that are fundamental to robotics. If the state represents configuration, then the I-space allows tasks to be solved without knowing the exact configuration. If, however, the state also represents the environment, then the I-space allows tasks to be solved without even having a complete representation of the environment! This is obviously very powerful because building a representation of a robot’s environment is very costly and subject to errors. Furthermore, it is likely to become quickly outdated.
- Grid Problems
To gain a clear understanding of the issues, it will once again be helpful to consider discrete problems. The discussion here is a continuation of Section 12.2.1. In that section, the state represented a position, p, and a direction, d. Now suppose that the state is represented as (p, d, e), in which e represents the particular environment that contains the robot. This will require defining a space of environments, which is rarely represented explicitly. It is often expressed as a constraint on the types of environments that can exist. For example, the set of environments could be defined as all connected 2D grid-planning problems. The set of simply connected grid-planning problems is even further constrained.
One question immediately arises: When are two maps of an environment equivalent? Recall the maps shown in Figures 12.2a and 12.3b. The map in Figure 12.3b appears the same for every 90-degree rotation; however, the map in Figure 12.2a appears to be different. Even if it appears different, it should still be the same environment, right? Imagine mapping a remote island without having a compass that indicates the direction to the north pole. An orientation (which way is up?) for the map can be chosen arbitrarily without any harm. If a map of the environment is made by “drawing” on R^2, it should seem that two maps are equivalent if a transformation in SE(2) (i.e., translation and rotation) can be applied to overlay one perfectly on top of the other.
When defining an environment space, it is important to clearly define what it means for two environments to be equivalent. For example, if we are required to build a map by exploration, is it required to also provide the exact translation and orientation? This may or may not be required, but it is important to specify this in the problem description. Thus, we will allow any possibility: If the maps only differ by a transformation in SE(2), they may or may not be defined as equivalent, depending on the application.
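Since the maps considered next live on a grid, the transformations in SE(2) reduce to translations and rotations by multiples of 90 degrees. As a rough sketch (the function names are invented for illustration), equivalence of two finite tile sets can be tested by comparing canonical forms:

```python
# Sketch: decide whether two grid maps (sets of white-tile coordinates) are
# equivalent up to translation and 90-degree rotation. This is the discrete
# analog of overlaying one map on the other with a transformation in SE(2).

def normalize(tiles):
    """Translate a tile set so its minimal coordinates sit at the origin."""
    min_i = min(i for i, j in tiles)
    min_j = min(j for i, j in tiles)
    return frozenset((i - min_i, j - min_j) for i, j in tiles)

def rotate90(tiles):
    """Rotate a tile set by 90 degrees about the origin: (i, j) -> (-j, i)."""
    return frozenset((-j, i) for i, j in tiles)

def equivalent(a, b):
    """True if map b matches map a under some translation + 90-degree rotation."""
    a = normalize(frozenset(a))
    t = frozenset(b)
    for _ in range(4):
        if normalize(t) == a:
            return True
        t = rotate90(t)
    return False

# An L-shaped map matches its rotated, translated copy...
L_map = {(0, 0), (1, 0), (0, 1)}
L_rot = {(5, 5), (5, 6), (4, 5)}   # same shape, moved and rotated
print(equivalent(L_map, L_rot))     # True
# ...but not a straight segment with the same number of tiles.
print(equivalent(L_map, {(0, 0), (1, 0), (2, 0)}))  # False
```

Whether reflections should also be allowed depends on the application, exactly as with the SE(2) question above.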
To consider some examples, it will be convenient to define some finite or infinite sets of environments. Suppose that planning on a 2D grid is once again considered. In this section, assume that each grid point p has integer coordinates (i, j) ∈ Z × Z,
as defined in Section 2.1. Let E denote a set of environments. Once again, there
are four possible directions for the robot to face; let D denote this set. The state space is
X = Z × Z × D × E. (12.23)
Assume in general that an environment, e ∈ E, is specified by indicating a subset of Z × Z that corresponds to the positions of all of the white tiles on which the robot can be placed. All other tiles are black, which means that they are obstacles. If any subset of Z × Z is allowed, then E = pow(Z × Z). This includes many useless maps, such as a checkerboard that spans the entire plane; this motivates some restrictions on E. For example, E can be restricted to be the subset of pow(Z × Z) that corresponds to all maps that include a white tile at the origin, (0, 0), and for which all other white tiles are reachable from it and lie within a bounded region.
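As a sketch of how such a restriction can be checked for a finite tile set (the function name is illustrative only), reachability from the origin reduces to a breadth-first search:

```python
from collections import deque

# Sketch: test whether a finite set of white tiles is a valid environment in
# the restricted sense above: it contains the origin (0, 0) and every white
# tile is reachable from the origin through 4-neighbor moves. Boundedness is
# automatic here because the set is finite.

def is_valid_environment(white):
    white = set(white)
    if (0, 0) not in white:
        return False
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        i, j = queue.popleft()
        for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if n in white and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == white   # all white tiles reachable from the origin

print(is_valid_environment({(0, 0), (1, 0), (2, 0)}))  # True
print(is_valid_environment({(0, 0), (2, 0)}))          # False: (2,0) unreachable
print(is_valid_environment({(1, 0), (2, 0)}))          # False: no tile at origin
```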
Examples will be given shortly, but first think about the kinds of problems that can be formulated:
- Map building: The task is to visit every reachable tile and construct a map. Depending on how E is defined, this may identify a particular environment in E or a set of environments that are consistent with the exploration. This may also be referred to as simultaneous localization and mapping, or SLAM, because constructing a complete map usually implies that the robot position and orientation are eventually known [483, 970]. Thus, the complete state, x ∈ X, as given in (12.23) is determined by the map-building process. For the grid problem considered here, this point is trivial, but the problem becomes more difficult for the case of probabilistic uncertainty in a continuous environment. See Section 12.3.5 for this case.
- Determining the environment: Imagine that a robot is placed into a building at random and then is switched on. The robot is told that it is in one of a fixed number of buildings (e.g., 10). It must move to determine which one. As the number of possible environments is increased, the problem appears to be more like map building. In fact, map building can be considered as a special case in which little or no constraints are initially imposed on the set of possible environments.

Figure 12.12: A set of possible 2D grid environments, e1 to e6. In each case, the “up” direction represents north and the white circle represents the origin, p = (0, 0).
- Navigation: In this case, a goal position is to be reached, even though the robot has no map. The location of the goal relative to the robot can be specified through a sensor. The robot is allowed to solve this problem without fully exploring the environment. Thus, the final nondeterministic I-state after solving the task could contain numerous possible environments. Only a part of the environment is needed to solve the problem.
- Searching: In this case, a goal state can only be identified when it is reached (or detected by a short-range sensor). There are no additional sensors to help in the search. The environment must be systematically explored, but the search may terminate early if the goal is found. A map does not necessarily have to be constructed. Searching can be extended to pursuit-evasion, which is covered in Section 12.4.
Simple examples of determining the environment and navigation will now be given.
Figure 12.13: Environments e7, e8, and e9, to be added to the set depicted in Figure 12.12. Each is essentially equivalent to an environment already given and generally does not affect the planning problem.
Example 12.4 (Determining the Environment) Suppose that the robot is told that it was placed into one of the environments shown in Figure 12.12. Let the initial position of the robot be (0, 0), which is shown as a white circle. Let the initial direction be east and the environment be e3. These facts are unknown to the robot. Use the same actions and state transition model as in Section 12.2.1. The current state space includes the environment, but the environment never changes. Only information regarding which environment the robot is in will change. The sensing model again only indicates whether the robot has changed its position from the application of the last action.
The initial condition is X, because any position, orientation, and environment are possible. Some nondeterministic I-states will now be determined. Let (u1, u2, u3) = (F, R, R). From this sequence of actions, the sensor observations (y2, y3, y4) report that the robot has not yet changed its position. The orientation was changed to west, but this is not known to the robot (it does, however, know that it is now pointing in the opposite direction with respect to its initial orientation). What can now be inferred? The robot has discovered that it is on a tile that is bounded on three sides by obstacles. This means that e1 and e6 are ruled out as possible environments. In the remaining four environments, the robot deduces that it must be on one of the end tiles: 1) the upper left of e2, 2) the upper right of e2, 3) the bottom of e3, 4) the rightmost of e3, 5) the top of e4, 6) the lower left of e5, or 7) the upper left of e5. It can also make strong inferences regarding its orientation. It even knows that the action u4 = R would cause it to move because all four directions cannot be blocked.
Apply (u4, u5) = (R, F). The robot should move two times, to arrive in the upper left of e3 facing north. In this case, any of e2, e3, e4, or e5 are still possible; however, it now knows that its position at stage 4 could not have been in the upper left of e5. If the robot is in e3, it knows that it must be in the upper left, but it still does not know its orientation (it could be north or west). The robot could also be in the lower left or lower right of e2.
Now let (u6, u7) = (R, F), which moves the robot twice. At this point, e4 and
e5 are ruled out, and the set of possible environments is {e2, e3} (one orientation
from e2 is also ruled out). If u8 = R is applied, then the sensor observation y9 reports that the robot does not move. This rules out e2. Finally, the robot can deduce that it is in the upper right of e3 facing south. It can also deduce that in its initial state it was in the lower left of e3 facing east. Thus, all of the uncertainty has been eliminated through the construction of the nondeterministic I-states.
Now consider adding the environments shown in Figure 12.13 to the set and starting the problem over again. Environment e7 is identical to e1, except that the origin is moved, and e8 is identical to e2, except that it is rotated by 180 degrees. In these two cases, there exist no inputs that enable the robot to distinguish between e1 and e7 or between e2 and e8. It is reasonable to declare these environments to be pairwise equivalent. The only distinction between them is the way that the map is drawn.
If the robot executes the same action sequence as given previously, then it will also not be able to distinguish e3 from e9. It is impossible for the robot to deduce whether there is a white tile somewhere that is not reachable. A general environment space may include such variations, and this will prevent the robot from knowing the precise environment. However, this usually presents no additional difficulty in solving a planning problem. Therefore, it might make sense to declare e3 and e9 to be equivalent. The fact that tasks can be achieved without knowing the precise environment is very important. In a sense, the environment is observed at some “resolution” that is sufficient for solving a problem; further details beyond that are unimportant. Since the robot can ignore unnecessary details, cheaper and more reliable systems can often be built.
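The filtering of nondeterministic I-states in Example 12.4 can be sketched in code. The two toy environments below are invented for illustration and are not those of Figure 12.12; the action model (turn, then attempt a forward move, sensing only whether the position changed) is a plausible reading of the model from Section 12.2.1:

```python
# Sketch of nondeterministic I-state filtering for the grid model. An action
# turns the robot (F: keep heading, R: turn right, L: turn left) and then
# attempts to move one tile forward; the sensor reports only whether the
# position changed. The environments are hypothetical 1x3 corridors.

DIRS = {'N': (0, 1), 'E': (1, 0), 'S': (0, -1), 'W': (-1, 0)}
RIGHT = {'N': 'E', 'E': 'S', 'S': 'W', 'W': 'N'}
LEFT = {v: k for k, v in RIGHT.items()}

def step(p, d, u, env):
    """Apply action u from (position, direction) in env; also report motion."""
    if u == 'R':
        d = RIGHT[d]
    elif u == 'L':
        d = LEFT[d]
    di, dj = DIRS[d]
    q = (p[0] + di, p[1] + dj)
    if q in env:
        return q, d, True
    return p, d, False   # blocked: orientation changes, position does not

envs = {
    'corridor_EW': frozenset({(0, 0), (1, 0), (2, 0)}),
    'corridor_NS': frozenset({(0, 0), (0, 1), (0, 2)}),
}

# Initially, any white tile, direction, and environment is possible.
istate = {(p, d, name) for name, env in envs.items() for p in env for d in DIRS}

# Apply an action history and keep only states consistent with the
# observed moved/not-moved outcomes.
for u, observed_move in [('F', False), ('R', True)]:
    istate = {(q, d2, name)
              for p, d, name in istate
              for q, d2, moved in [step(p, d, u, envs[name])]
              if moved == observed_move}

# Both environments remain possible, but most (position, direction)
# hypotheses have been eliminated.
print(sorted({name for _, _, name in istate}))
```

The same loop, run with the observations of Example 12.4 over the environments of Figure 12.12, would reproduce the deductions made in the text.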
Example 12.5 (Reaching a Goal State) Suppose once again that the set of environments shown in Figure 12.12 is given. This time, also assume that the position p = (0, 0) and orientation east are known. The environment is e4, but it is unknown to the robot. The task is to reach the position (2, 0), which means that the robot must move two tiles to the east. The plan (u1, u2) = (F, F) achieves the goal without providing much information about the environment. After u1 = F is applied, it is known that the environment is not e3; however, after this, no additional information is gathered regarding the environment because it is not relevant to solving the problem. If the goal had been to reach (2, 2), then more information would be obtained regarding the environment. For example, if the
plan is (F, L, R, L), then it is known that the environment is e6.
Algorithms for determining the environment To determine the environment (which includes the map-building problem), it is sufficient to reach and remember all of the tiles. If the robot must determine its environment from a small set of possibilities, an optimal worst-case plan can be precomputed. This can be computed on the nondeterministic I-space, Indet, by using value iteration or the nondeterministic version of Dijkstra’s algorithm from Section 10.2.3. When the robot is dropped into the environment, it applies the optimal plan to deduce its position, orientation, and environment. If the set of possible environments is too large (possibly infinite), then a lazy approach is most suitable. This includes the map-building problem, for which there may be few or no assumptions about the environment. A lazy approach to the map-building problem simply has to ensure that every tile is visited. One additional concern may be to minimize the number of reroute paths, which were mentioned in Section 12.2.1. A simple algorithm that solves the problem while avoiding excessive rerouting is depth-first search, from Section 2.2.2.
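A minimal sketch of this lazy, depth-first exploration, with a hypothetical world known only to the simulator:

```python
# Sketch: lazy map building by depth-first search. The robot discovers a
# tile's neighbors only when standing on it (modeling a short-range sensor);
# backtracking out of each recursive call models the reroute paths.

WORLD = {(0, 0), (1, 0), (2, 0), (2, 1), (0, 1)}   # white tiles, unknown to the robot

def neighbors(p):
    """Short-range sensing: which 4-neighbors of the current tile are white."""
    i, j = p
    return [q for q in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)) if q in WORLD]

def explore(p, visited):
    """Visit every reachable tile; the robot physically backtracks after each branch."""
    visited.add(p)
    for q in neighbors(p):
        if q not in visited:
            explore(q, visited)   # move to q, explore from there, reroute back to p
    return visited

mapped = explore((0, 0), set())
print(mapped == WORLD)   # True: every reachable white tile was visited
```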
Algorithms for navigation The navigation task is to reach a prescribed goal, even though no environment map is given. It is assumed that the goal is expressed in coordinates relative to the robot’s initial position and orientation (these are odometric coordinates). If the goal can only be identified when the robot is on the goal tile, then searching is required, which is covered next. As seen in Example 12.5, the robot is not required to learn the whole environment to solve a navigation problem. The search algorithms of Section 2.2 may be applied. For example, the A∗ method will find the optimal route to the goal, and a reasonable heuristic underestimate of the cost-to-go can be defined by assuming that all tiles are empty. Although such a method will work, the reroute costs are not being taken into account. Thus, the optimal path eventually computed by A∗ may be meaningless unless other robots will later use this information to reach the same goal in the same environment. For the unfortunate robot that went first, a substantial number of exploration steps might have been wasted because A∗ is not designed for exploration during execution. Even though the search algorithms in Section 2.2 assumed that the search graph was gradually revealed during execution, as opposed to being given in advance, they allow the current state in the search to jump around arbitrarily. In the current setting, this would require teleporting the robot to different parts of the environment. Section 12.3.2 covers a navigation algorithm that extends Dijkstra’s algorithm to work correctly when the costs are discovered during execution. It can be nicely applied to the grid-based navigation problem presented in this section, even when the environment is initially unknown.
Algorithms for maze searching A fascinating example of using an I-map to dramatically reduce the I-space was given a long time ago by Blum and Kozen [119]. Map building requires space that is linear in the number of tiles; however, it is possible to ensure that the environment has been systematically searched using much less space. For 2D grid environments, the searching problem can be solved without maintaining a complete map. It must systematically visit every tile; however, this does not imply that it must remember all of the places that it has visited. It is important only to ensure that the robot does not become trapped in an infinite loop before covering all tiles. It was shown in [119] that any maze can be searched using space that is only logarithmic in the number of
tiles. This implies that many different environments have the same representation in the machine. Essentially, an I-map was developed that severely collapses Indet down to a smaller derived I-space.
Assume that the robot motion model is the same as has been given so far in this section; however, no map of the environment is initially given. Whatever direction the robot is facing initially can be declared to be north without any harm. It is assumed that any planar 2D grid is possible; therefore, there are identical maps for each of the four orientations. The north direction of one of these maps might be mislabeled by arbitrarily declaring the initial direction to be north, but this is not critical for the coming approach. It is assumed that the robot is a finite automaton that carries a binary counter. The counter will be needed because it can store values that are arbitrarily large, which is not possible for the automaton alone.
To keep the robot from wandering around in circles forever, two important pieces of information need to be maintained:
- The latitude, which is the number of tiles in the north direction from the robot’s initial position.
- When a loop path is executed, it needs to know its orientation, which means whether the loop travels clockwise or counterclockwise.
Both of these can be computed from the history I-state, which takes the same form as in (12.12), except that in the current setting, X is given by (12.23) and E is the set of all bounded environments (bounded means that the white tiles can be contained in a large rectangle). From the history I-state, let ũ′_k denote the subsequence of the action history that corresponds to actions that produce motions. The latitude, l(ũ′_k), can be computed by counting the number of actions that produce motions in the north direction and subtracting those that produce motions in the south direction. The loop orientation can be determined by angular odometry (which is equivalent to having a compass in this problem [286]). Let the value r(ũ′_k) give the number of right turns in ũ′_k minus the number of left turns in ũ′_k. Note that making four rights yields a clockwise loop and r(ũ′_k) = 4. Making four lefts yields a counterclockwise loop and r(ũ′_k) = −4. In general, it can be shown that for any loop path that does not intersect itself, either r(ũ′_k) = 4, which means that it travels clockwise, or r(ũ′_k) = −4, which means that it travels counterclockwise.

It was stated that a finite automaton and a binary counter are needed. The counter is used to keep track of l(ũ′_k) as the robot moves. It turns out that an additional counter is not needed to measure the angular odometry because the robot can instead perform mod-3 arithmetic when counting right and left turns. If the result is r(ũ′_k) ≡ 1 (mod 3) after forming a loop, then the robot traveled clockwise (since 4 ≡ 1). If the result is r(ũ′_k) ≡ 2 (mod 3), then the robot traveled counterclockwise (since −4 ≡ 2). This observation avoids using an unlimited number of bits, contrary to the case of maintaining latitude. The construction so far can be viewed as part of an I-map that maps the history I-states into a much smaller derived I-space.
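The two counters can be sketched as follows (the function names are invented for illustration):

```python
# Sketch: the two quantities maintained by the searching robot. Latitude
# needs an unbounded counter; loop orientation needs only a mod-3 counter
# over turns, because a simple (non-self-intersecting) loop has net turning
# +4 (clockwise) or -4 (counterclockwise), and 4 = 1 mod 3, -4 = 2 mod 3.

def latitude(moves):
    """Net number of northward motions among actions that produced motion."""
    return sum(1 for m in moves if m == 'N') - sum(1 for m in moves if m == 'S')

def loop_orientation(turns):
    """Classify a completed simple loop from its turn sequence ('R' or 'L')."""
    r = 0
    for t in turns:
        r = (r + (1 if t == 'R' else -1)) % 3   # only the mod-3 value is stored
    if r == 1:
        return 'clockwise'         # net turning +4
    if r == 2:
        return 'counterclockwise'  # net turning -4
    return 'not a simple loop'

print(latitude(['N', 'N', 'E', 'S']))          # 1
print(loop_orientation(['R', 'R', 'R', 'R']))  # clockwise
print(loop_orientation(['L', 'L', 'L', 'L']))  # counterclockwise
```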
The plan will be described in terms of the example shown in Figure 12.14. For any environment, there are obstacles in the interior (this example has six), and there is an outer boundary. Using the latitude and orientation information, a unique point can be determined on the boundary of each obstacle and on the outer boundary. The unique point is defined as the westernmost vertex among the southernmost vertices of the obstacle. These are shown by small discs in Figure
12.15. By using the latitude and orientation information, the unique point can always be found (see Exercise 4).
To solve the problem, the robot moves to a boundary and traverses it by performing wall following. The robot can use its sensing information to move in a way that keeps the wall to its left. Assuming that the robot can always detect a unique point along the boundary, it can imagine that the obstacles are connected as shown in Figure 12.15. There is a fictitious thin obstacle that extends southward from each unique point. This connects the obstacles together in a way that appears to be an extension of the outer boundary. In other words, imagine that the obstacles are protruding from the walls, as opposed to “floating” in the interior. By refusing to cross these fictitious obstacles, the robot moves around the boundary of all obstacles in a single closed-loop path. The strategy so far does not ensure that every cell will be visited. Therefore, the modification shown in Figure 12.16 is needed to ensure that every tile is visited by zig-zag motions. It is interesting to compare the solution to the spanning-tree coverage planning approach in Section 7.6, which assumed a complete map was given and the goal was to optimize the distance traveled.
If there is some special object in the environment that can be detected when reached by the robot, then the given strategy is always guaranteed to find it, even though at the end, it does not even have a map!
The resulting approach can be considered as an information-feedback plan on the I-space. In this sense, Blum and Kozen were the “planner” that found a plan that solves any problem. Alternative plans do not need to be computed from the problem data because the plan can handle all possible environments without modification. This is the power of working directly with an I-space over the set of environments, as opposed to requiring state estimation.
- Stentz’s Algorithm (D∗)
Imagine exploring an unknown planet using a robotic vehicle. The robot moves along the rugged terrain while using a range scanner to make precise measurements of the ground in its vicinity. As the robot moves, it may discover that some parts were easier to traverse than it originally thought. In other cases, it might realize that some direction it was intending to go is impassable due to a large boulder or a ravine. If the goal is to arrive at some specified coordinates, this problem can be viewed as a navigation problem in an unknown environment. The resulting solution is a lazy approach, as discussed in Section 12.2.1.
This section presents Stentz’s algorithm [913], which has been used in many
Figure 12.14: An example that has six obstacles.
Figure 12.15: The obstacles are connected together by extending a thin obstacle downward from their unique points.
outdoor vehicle navigation applications, such as the vehicle shown in Figure 12.17. The algorithm can be considered as a dynamic version of the backward variant of Dijkstra’s algorithm. Thus, it maintains cost-to-go values, and the search grows outward from the goal, as opposed to cost-to-come values from xI in the version of Dijkstra’s algorithm in Section 2.3.3. The method applies to any optimal planning problem. In terms of the state transition graph, it is assumed that the costs of edge transitions are unknown (equivalently, each cost l(x, u) is unknown). In the navigation problem, a positive cost indicates the difficulty of traveling from state x to state x′ = f(x, u).
To work with a concrete problem, imagine that a planet surface is partitioned into a high-resolution grid. The state space is simply a bounded set of grid tiles; hence, X ⊆ Z × Z. Each grid tile is assigned a positive, real value, c(x), that indicates the difficulty of its traversal. The actions U(x) at each grid point can be chosen using standard grid neighbors (e.g., four-neighbors or eight-neighbors). This now defines a state transition graph over X. From any x′ ∈ X and u′ ∈ U(x′) such that x = f(x′, u′), the cost term is assigned using c as l(x′, u′) = c(x). This model is a generalization of the grid in Section 12.3.1, in which the tiles were
Figure 12.16: (a) A clockwise loop produced by wall following. (b) An alternative loop that visits all of the tiles in the interior.
Figure 12.17: The Automated Cross-Country Unmanned Vehicle (XUV) is equipped with laser radar and other sensors, and uses Stentz’s algorithm to navigate (courtesy of General Dynamics Robotic Systems).
either empty or occupied; here any positive real value is allowed. In the coming explanation, the costs may be more general than what is permitted by starting from c(x), and the state transition graph does not need to be derived from a grid. Some initial values are assigned arbitrarily for all l(x, u). For example, in the planetary exploration application, the cost of traversing a level, unobstructed surface may be uniformly assumed.
The task is to navigate to some goal state, xG. The method works by initially constructing a feedback plan, π, on a subset of X that includes both xI and xG. The plan, π, is computed by iteratively applying the procedure in Figure 12.18 until the optimal cost-to-go is known at xI . A priority queue, Q, is maintained as in Dijkstra’s algorithm; however, Stentz’s algorithm allows the costs of elements in Q to be modified due to information sensed during execution. Let Gbest(x) denote the lowest cost-to-go associated with x during the time it spends in Q. Assume that Q is sorted according to Gbest. Let Gcur(x) denote its current cost-to-go
value, which may actually be more than Gbest(x) if some cost updates caused it to increase. Suppose that some u ∈ U(x) can be applied to reach a state x′ = f(x, u).
Let Gvia(x, x′) denote the cost-to-go from x by traveling via x′,
Gvia(x, x′) = Gcur(x′) + l(x, u). (12.24)

If Gvia(x, x′) < Gcur(x), then it indicates that Gcur(x) could be reduced. If Gcur(x′) ≤ Gbest(x), then it is furthermore known that Gcur(x′) is optimal. If both of these conditions are met, then Gcur(x) is updated to Gvia(x, x′).
After the iterations of Figure 12.18 finish, the robot executes π, which generates a sequence of visited states. Let xk denote the current state during execution. If it is discovered that applying π(xk) = uk would incur a cost that does not match the cost l(xk, uk) in the current model, then the costs need to be updated. More generally, the robot may have to be able to update costs within a region around xk that corresponds to the sensor field of view. For the description below, assume that an update, l(xk, uk), is obtained for xk only (the more general case is handled similarly). First, l(xk, uk) is updated to the newly measured value. If xk happened to be dead (visited, but no longer in Q), then it is inserted again into Q, with cost Gcur(xk). The steps in Figure 12.18 are performed until Gcur(xk) ≤ Gbest(x) for all x ∈ Q. Following this, the plan execution continues until either the goal is reached or another cost mismatch is discovered. At any time during execution, the robot motions are optimal given the current information about the costs [913].
Figure 12.19 illustrates the execution of the algorithm. Figure 12.19a shows a synthetic terrain that was generated by a stochastic fractal. Darker gray values indicate higher cost. In the center, very costly terrain acts as a barrier, for which an escape route exists in the downward direction. The initial state is the middle of the left edge of the environment, and the goal state is the right edge. The robot initially plans a straight-line path and then incrementally updates the path in each step as it moves. In Figure 12.19b, the robot has encountered the costly center and begins to search for a way around. Finally, the goal is reached, as shown in Figure 12.19c. The executed path is actually the result of executing a series of optimal paths, each of which is based on the known information at the time a single action is applied.
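The execute-and-replan behavior can be sketched without the incremental bookkeeping that makes Stentz’s algorithm efficient: the naive version below simply re-runs backward Dijkstra whenever a cost mismatch is sensed. The grid, costs, and sensing rule are invented for illustration:

```python
import heapq

# Sketch of execute-and-replan navigation. This is NOT the incremental
# bookkeeping of Stentz's algorithm: whenever a cost mismatch is discovered,
# backward Dijkstra is re-run from scratch on the updated cost model. The
# executed trajectory is the same idea as in the text: a concatenation of
# prefixes of optimal paths under the information available at each step.

def backward_dijkstra(states, cost, goal):
    """Cost-to-go on a 4-neighbor grid, where entering tile x costs cost[x]."""
    G = {s: float('inf') for s in states}
    G[goal] = 0.0
    pq = [(0.0, goal)]
    while pq:
        g, x = heapq.heappop(pq)
        if g > G[x]:
            continue
        i, j = x
        for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if n in states and g + cost[x] < G[n]:   # reach goal from n via x
                G[n] = g + cost[x]
                heapq.heappush(pq, (G[n], n))
    return G

def navigate(states, true_cost, assumed_cost, start, goal):
    """Execute greedily; replan whenever a sensed cost differs from the model."""
    model = dict(assumed_cost)
    x, path = start, [start]
    G = backward_dijkstra(states, model, goal)
    while x != goal:
        i, j = x
        nbrs = [n for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))
                if n in states]
        nxt = min(nbrs, key=lambda n: model[n] + G[n])   # greedy w.r.t. cost-to-go
        if model[nxt] != true_cost[nxt]:                 # sensor reveals true cost
            model[nxt] = true_cost[nxt]
            G = backward_dijkstra(states, model, goal)   # naive full replan
            continue
        x = nxt
        path.append(x)
    return path

# Hypothetical 3x3 grid: the center tile is far costlier than assumed.
states = {(i, j) for i in range(3) for j in range(3)}
assumed = {s: 1.0 for s in states}
true = dict(assumed)
true[(1, 1)] = 100.0
route = navigate(states, true, assumed, (0, 1), (2, 1))
print((1, 1) not in route)   # True: the robot detours around the costly center
```

Stentz’s algorithm achieves the same executed behavior while repairing only the affected cost-to-go values, which matters when the grid is large.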
Interpretation in terms of I-spaces An alternative formulation will now be given to help understand the connection to I-spaces of a set of environments. The state space, as defined previously, could instead be defined as a configuration space, C = Z × Z. Let q ∈ C denote a configuration. Suppose that each possible environment corresponds to one way to assign costs to all of the edges in a configuration transition graph. The set E of all possible environments for this problem seems to be all possible ways to assign costs, l(q, u). The state space can now be defined as X = C × E, and for each state, x = (q, e) ∈ X, the configuration and complete set of costs are specified. Initially, it is guessed that the robot is in some particular e ∈ E. If a cost mismatch is discovered, this means that a different environment model is now assumed because a transition cost is different from what was expected. The costs should actually be written as l(x, u) = l(q, e, u), which indicates
STENTZ’S ALGORITHM
1. Remove x from Q, which is the state with the lowest Gbest(x) value.

2. If Gbest(x) < Gcur(x), then x has increased its value while on Q. If x can improve its cost by traveling via a neighboring state for which the optimal cost-to-go is known, it should do so. Thus, for every u ∈ U(x), test for x′ = f(x, u) whether Gvia(x, x′) < Gcur(x) and Gcur(x′) ≤ Gbest(x). If so, then update Gcur(x) := Gvia(x, x′) and π(x) := u.

3. This and the remaining steps are repeated for each x′ such that there exists u′ ∈ U(x′) for which x = f(x′, u′). If x′ is unvisited, then assign π(x′) := u′, and place x′ onto Q with cost Gvia(x′, x).

4. If the cost-to-go from x′ appears incorrect because π(x′) = u′ but Gvia(x′, x) ≠ Gcur(x′), then an update is needed. Place x′ onto Q with cost Gvia(x′, x).

5. If π(x′) ≠ u′ but Gvia(x′, x) < Gcur(x′), then from x′ it is better to travel via x than to use π(x′). If Gcur(x) = Gbest(x), then π(x′) := u′ and x′ is inserted into Q because the optimal cost-to-go for x is known. Otherwise, x (instead of x′) is inserted into Q with its current value, Gcur(x).

6. One final condition is needed to avoid generating cycles in π. If x′ is dead (visited, but no longer in Q), it may need to be inserted back into Q with cost Gcur(x′). This must be done if π(x′) ≠ u′, Gvia(x, x′) < Gcur(x), and Gcur(x) > Gbest(x).
Figure 12.18: Stentz’s algorithm, often called D∗ (pronounced “dee star”), is a variant of Dijkstra’s algorithm that dynamically updates cost values as the cost terms are learned during execution. The steps here are only one iteration of updating the costs after a removal of a state from Q.
the dependency of the costs on the particular environment is assumed.
A nondeterministic I-state corresponds to a set of possible cost assignments, along with their corresponding configurations. Since the method requires assigning costs that have not yet been observed, it takes a guess and assumes that one particular environment in the nondeterministic I-state is the correct one. As cost mismatches are discovered, it is realized that the previous guess lies outside of the updated nondeterministic I-state. Therefore, the guess is changed to incorporate the new cost information. As this process evolves, the nondeterministic I-state continues to shrink. Note, however, that in the end, the robot may solve the problem while being incorrect about the precise e ∈ E. Some tiles are never visited, and their true costs are therefore unknown. A default assumption about their costs was made to solve the problem; however, the true e ∈ E can only be
Figure 12.19: An example of executing Stentz’s algorithm (courtesy of Tony Stentz).
known if all tiles are visited. It is only true that the final assumed default values lie within the final nondeterministic I-state.
- Planning in Unknown Continuous Environments
We now move from discrete to continuous environments but continue to use nondeterministic uncertainty. First, several bug algorithms [504, 667, 505] are presented, which represent a family of motion plans that solve planning problems using ideas that are related in many ways to the maze exploration ideas of Section 12.3.1. In addition to bug algorithms, the concept of competitive ratios is also briefly covered.
The following model will be used for bug algorithms. Suppose that a point robot is placed into an unknown 2D environment that may contain any finite number of bounded obstacles. It is assumed that the boundary of each obstacle and the outer boundary (if it exists) are piecewise-analytic (here, analytic implies that each piece is smooth and switches its curvature sign only a finite number of times). Thus, the obstacles could be polygons, smooth curves, or some combination of curved and linear parts. The set E of possible environments is overwhelming, but it will be managed by avoiding its explicit construction. The robot configuration is characterized by its position and orientation.
There are two main sensors:2
- A goal sensor indicates the current Euclidean distance to the goal and the direction to the goal, expressed with respect to an absolute “north.”
- A local visibility sensor provides the exact shape of the boundary within a small distance from the robot. The robot must be in contact or almost in contact with the boundary to observe part of it; otherwise, the sensor provides no useful information.

2This is just one possible sensing model. Alternative combinations of sensors may be used, provided that they enable the required motions and decisions to be executed in the coming motion strategies.

Figure 12.20: An illustration of the Bug1 strategy.
The goal sensor essentially encodes the robot’s position in polar coordinates (the goal is the origin). Therefore, unique (x, y) coordinates can be assigned to any position visited by the robot. This enables it to incrementally trace out obstacle boundaries that it has already traversed. The local visibility sensor provides just enough information to allow wall-following motions; the range of the sensor is very short so that the robot cannot learn anything more about the structure of the environment.
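Since the goal sensor reports the distance and the absolute direction to the goal, the robot can recover unique planar coordinates with the goal as the origin. A minimal sketch, assuming bearings are measured clockwise from absolute north (the +y axis); the function name and the convention are illustrative, not from the text:

```python
import math

def sensor_to_xy(distance, bearing):
    """Convert a goal-sensor reading to (x, y) with the goal at (0, 0).

    `bearing` is the direction *to* the goal, measured clockwise from
    absolute north, so the robot sits at `distance` in the opposite
    direction from the goal.
    """
    back = bearing + math.pi           # direction from the goal back to the robot
    return (distance * math.sin(back), distance * math.cos(back))
```

For example, if the goal lies due north at distance 2, the robot is at (0, −2) in goal-centered coordinates. Tagging each boundary contact with such coordinates is what lets the robot recognize boundary points it has already traversed.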
Some strategies will now be considered for the robot. Each of these can be considered as an information-feedback plan on a nondeterministic I-space.
The Bug1 strategy A strategy called Bug1 was developed in [667] and is illustrated in Figure 12.20. The execution is as follows:
- Move toward the goal until an obstacle or the goal is encountered. If the goal is reached, then stop.
- Turn left and follow the entire perimeter of the contacted obstacle. Once the full perimeter has been visited, then return to the point at which the goal was closest, and go to Step 1.
Determining that the entire perimeter has been traversed may seem to require a pebble or marker; however, this can be inferred by finding the point at which the goal sensor reading repeats.
Figure 12.21: A bad example for Bug1. The perimeter of each obstacle is spanned one and a half times.
The worst case is conceptually simple to understand. The total distance traveled by the robot is no greater than

\[
d + \frac{3}{2} \sum_{i=1}^{M} p_i , \tag{12.25}
\]
in which d is the Euclidean distance from the initial position to the goal position, pi is the perimeter of the ith obstacle, and M is the number of obstacles. This means that the boundary of each obstacle is followed no more than 3/2 times. Figure 12.21 shows an example in which each obstacle is traversed 3/2 times. This bound relies on the fact that the robot can always recall the shortest path along the boundary to the point from which it needs to leave. This seems reasonable because the robot can infer its distance traveled along the boundary from the goal sensor. If this were not possible, then the 3/2 would have to be replaced by 2 because the robot could nearly traverse the full boundary twice in the worst case.
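The per-obstacle cost can be checked numerically. The sketch below assumes a hypothetical setup in which the obstacle boundary is discretized as a closed polyline: Bug1 spends one full perimeter circling the obstacle, plus the shorter arc back to the boundary point closest to the goal, which never exceeds 3/2 of the perimeter:

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def bug1_boundary_cost(boundary, hit_index, goal):
    """Distance traveled while handling one obstacle in Bug1: a full
    perimeter plus the shorter arc back to the closest-to-goal point."""
    n = len(boundary)
    per = sum(dist(boundary[i], boundary[(i + 1) % n]) for i in range(n))
    # The closest point to the goal, discovered while circling.
    leave = min(range(n), key=lambda i: dist(boundary[i], goal))
    # Arc length from the hit point to the leave point, going forward.
    arc, i = 0.0, hit_index
    while i != leave:
        arc += dist(boundary[i], boundary[(i + 1) % n])
        i = (i + 1) % n
    back = min(arc, per - arc)      # the robot recalls the shorter way around
    return per + back               # always <= 1.5 * per

# Unit square obstacle, hit at corner (0, 0), goal far to the right.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(bug1_boundary_cost(square, 0, (5.0, 0.5)))   # → 5.0 (perimeter 4 plus 1 back)
```

Summing this quantity over all obstacles encountered, plus the straight-line segments between them, stays within the bound in (12.25).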
The Bug2 strategy An alternative to Bug1 is the Bug2 strategy, which is illustrated in Figure 12.22. The robot always attempts to move along a line that connects the initial and goal positions. When the robot is on this line, the goal direction will be either the same as from the initial state or it will differ by π radians (if the robot is on the other side of the goal). The first step is the same as for Bug1. In the second step, the robot follows the perimeter only until the line is reached and it is able to move in the direction toward the goal. From there, it goes to Step 1. As expressed so far, it is possible that infinite cycles occur. Therefore, a small modification is needed. The robot remembers the distance to the goal from the last point at which it departed from the boundary, and departs from the boundary again only if the candidate point is closer to the goal. This is applied iteratively until the goal is reached or it is deemed to be impossible.
For the Bug2 strategy, the total distance traveled is no more than

\[
d + \frac{1}{2} \sum_{i=1}^{M} n_i p_i , \tag{12.26}
\]
Figure 12.22: An illustration of the Bug2 strategy.
Figure 12.23: A bad case for Bug2. Only part of the resulting path is shown. Points from which the robot can leave the boundary are indicated.
in which ni is the number of times the ith obstacle crosses the line segment between the initial position and the goal position. An example that illustrates the trouble caused by the crossings is shown in Figure 12.23.
Using range data The VisBug [666] and TangentBug [505, 592] strategies incorporate distance measurements made by a range or visibility sensor to improve the efficiency. The TangentBug strategy will be described here and is illustrated in Figure 12.24. Suppose that in addition to the sensors described previously, the robot is also equipped with a sensor that produces measurements as shown in Figure 12.25. The strategy is as follows:

- Move toward the goal, either through the interior of the space or by wall following, until it is realized that the robot is trapped in a local minimum or the goal is reached. This is similar to the gradient-descent motion of the potential-field planner of Section 5.4.3. If the goal is reached, then stop; otherwise, go to the next step.

Figure 12.24: An illustration of the VisBug strategy with unlimited radius.
- Execute motions along the boundary. First, pick a direction by comparing the previous heading to the goal direction. While moving along the boundary, keep track of two distances: df and dr. The distance df is the minimal distance from the goal, observed while traveling along the boundary. The distance dr is the length of the shortest path from the current position to the goal, assuming that the only obstacles are those visible by the range sensor. The robot stops following the boundary if dr < df. In this case, go to Step 1. If the robot loops around the entire obstacle without this condition occurring, then the algorithm reports that the goal is not reachable.
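The leave condition in the boundary-following step can be sketched as a simple scan, assuming a hypothetical interface in which each boundary sample supplies the current Euclidean goal distance and the range-sensor shortest-path estimate dr:

```python
def boundary_following(samples):
    """Each sample is (goal_dist, d_r): the current Euclidean distance to
    the goal and the shortest-path length through the currently visible
    free space.  Returns the index at which to leave the boundary, or
    None if the robot loops the whole obstacle (goal not reachable)."""
    d_f = float("inf")
    for k, (goal_dist, d_r) in enumerate(samples):
        d_f = min(d_f, goal_dist)    # best goal distance observed so far
        if d_r < d_f:                # a visible shortcut beats anything seen
            return k                 # leave the boundary here (go to Step 1)
    return None                      # full loop: report goal not reachable
```

The comparison dr < df is what prevents the robot from leaving the boundary prematurely: it only departs when the range sensor reveals a path strictly better than any goal distance witnessed so far.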
A one-parameter family of TangentBug algorithms can be made by setting a depth limit for the range sensor. As the maximum depth is decreased, the robot becomes more short-sighted and performance degrades. It is shown in [505] that the distance traveled is no greater than

\[
d + \sum_{i=1}^{M} p_i + \sum_{i=1}^{M} p_i m_i , \tag{12.27}
\]

in which mi is the number of local minima for the ith obstacle and d is the initial distance to the goal. The bound is taken over M obstacles, which are assumed to intersect a disc of radius d, centered at the goal (all others can be ignored). A variant of the TangentBug, called WedgeBug, was developed in [592] for planetary rovers that have a limited field of view.
Figure 12.25: The candidate motions with respect to the range sensor are the directions in which there is a discontinuity in the depth map. The distances from the robot to the small circles are used to select the desired motion.
Competitive ratios A popular way to evaluate algorithms that utilize different information has emerged from the algorithms community. The idea is to compute a competitive ratio, which places an on-line algorithm in competition with an algorithm that receives more information [674, 892]. The idea can generally be applied to plans. First a cost is formulated, such as the total distance that the robot travels to solve a navigation task. A competitive ratio can then be defined as

\[
\max_{e \in E} \; \frac{\text{Cost of executing the plan that does not know } e \text{ in advance}}{\text{Cost of executing the plan that knows } e \text{ in advance}} . \tag{12.28}
\]

The maximum is taken over all e ∈ E, which is usually an infinite set, as in the case of the bug algorithms. A competitive ratio for a navigation problem can be made by comparing the optimal distance to the total distance traveled by the robot during the execution of the on-line algorithm. Since E is infinite, many plans fail to produce a finite competitive ratio. The bug algorithms, while elegant, represent such an example. Imagine a goal that is very close, but a large obstacle boundary needs to be explored. An obstacle boundary can be made arbitrarily large while making the optimal distance to the goal very small. When evaluated in (12.28), the result over all environments is unbounded. In some contexts, the ratio may still be useful if expressed as a function of the representation. For example, if each e ∈ E is a polygon with n edges, then an O(√n) competitive ratio means that (12.28) is bounded over all n by c√n for some c ∈ R. For competitive ratio analysis in the context of bug algorithms, see [375].
A nice illustration of competitive ratio analysis and issues is provided by the lost-cow problem [60]. As shown in Figure 12.26a, a short-sighted cow is following along an infinite fence and wants to find the gate. This makes a convenient one-dimensional planning problem. If the location of the gate is given, then the cow can reach it by traveling directly. If the cow is told that the gate is exactly distance 1 away, then it can move one unit in one direction and return to try the other direction if the gate has not been found. The competitive ratio in this case (the set of environments corresponds to all gate placements) is 3. What if the cow is told only that the gate is at least distance 1 away? In this case, the best strategy is a spiral search, which is to zig-zag back and forth while iteratively doubling the distance traveled in each direction, as shown in Figure 12.26b. In other words: left one unit, right one unit, left two units, right two units, left four units, and so on. The competitive ratio for this strategy turns out to be 9, which is optimal. This approach resembles iterative deepening, which was covered in Section 2.2.2.

Figure 12.26: (a) A lost cow must find its way to the gate, but it does not know in which direction the gate lies. (b) If there is no bound on the distance to the gate, then a doubling spiral strategy works well, producing a competitive ratio of 9.
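The doubling strategy is easy to simulate. A minimal sketch, assuming the cow starts at the origin and takes its first one-unit excursion to the right; placing the gate just beyond a turning point on the unlucky side drives the ratio toward 9:

```python
def doubling_search_cost(gate):
    """Distance traveled by the doubling (spiral) strategy, starting with a
    one-unit excursion to the right, before reaching a gate at signed
    position `gate` (with |gate| >= 1)."""
    assert abs(gate) >= 1.0
    traveled, step, direction = 0.0, 1.0, +1
    while True:
        # Is the gate on this side and within reach of the current excursion?
        if direction * gate > 0 and abs(gate) <= step:
            return traveled + abs(gate)        # found during this excursion
        traveled += 2.0 * step                 # out to the turning point and back
        direction, step = -direction, 2.0 * step

g = 4.0 ** 8 + 1e-6                            # just beyond a right-side turning point
print(doubling_search_cost(g) / g)             # just under 9
```

The gate at distance 4^j + ε on the right is missed by the right excursions of lengths 1, 4, ..., 4^j, so the cow pays 2(1 + 2 + ... + 2·4^j) before the final approach; the ratio tends to 9 as j grows, matching the stated optimum.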
- Optimal Navigation Without a Geometric Model
This section presents gap navigation trees (GNTs) [943, 945], which are a data structure and associated planning algorithm for performing optimal navigation in the continuous environments that were considered in Section 12.3.3. It is assumed in this section that the robot is equipped with a gap sensor, as depicted in Figure 11.15 of Section 11.5.1. At every instant in time, the robot has available one action for each gap that is visible in the gap sensor. If an action is applied, then the robot moves toward the corresponding gap. This can be applied over continuous time, which enables the robot to “chase” a particular gap. The robot has no other sensing information: It has no compass and no ability to measure distances. Therefore, it is impossible to construct a map of the environment that contains metric information.
Assume that the robot is placed into an unknown but simply connected planar environment, X. The GNT can be extended to the case of multiply connected environments; however, in this case there are subtle issues with distinguishability, and it is only possible to guarantee optimality within a homotopy class of paths [944]. By analyzing the way that critical events occur in the gap sensor, a tree representation can be built that indicates how to move optimally in the environment, even though precise measurements cannot be taken. Since a gap sensor cannot even measure distances, it may seem unusual that the robot can move along shortest paths without receiving any distance (or metric) information. This will once again illustrate the power of I-spaces.

Figure 12.27: A gap-chasing action is applied, which moves the robot straight in the direction of the gap until the boundary is contacted. Once this occurs, a new part of the environment becomes visible.
The appearance of the environment relative to the position of the robot is encoded as a tree that indicates how the gaps change as the robot moves. It provides the robot with sufficient information to move to any part of the environment while traveling along the shortest path. It is important to understand that the tree does not correspond to some static map of the environment. It expresses how the environment appears relative to the robot and may therefore change as the robot moves in the environment.
The root of the tree represents the gap sensor. For each gap that currently appears in the sensor, an edge is connected to the root. Let these edges be called root edges. Each root edge corresponds to an action that can be applied by the robot. By selecting a root edge, the action moves the robot along a straight line toward that gap. Thus, there is a simple control model that enables the robot to move precisely toward a particular point along the boundary, ∂X, as shown in Figure 12.27.
Let V(x) be the visibility region, which is the set of all points in X that are visible from x. Let X \ V(x) be called the shadow region, which is the set of all points not visible from x. Let each connected component of the shadow region be called a shadow component. Every gap in the gap sensor corresponds to a line segment in X that touches ∂X in two places (for example, see Figure 11.15a). Each of these segments forms a boundary between the visibility region and a shadow component. If the robot would like to travel to this shadow component, the shortest way is to move directly to the gap. When moving toward a gap, the robot eventually reaches ∂X, at which point a new action must be selected.
Critical gap events As the robot moves, several important events can occur in the gap sensor:
- Disappear: A gap disappears because the robot crosses an inflection ray as shown in Figure 12.28. This means that some previous shadow component is now visible.

Figure 12.28: (a) The robot crosses a ray that extends from an inflectional tangent. (b) A gap appears or disappears from the gap sensor, depending on the direction.
- Appear: A gap appears because the robot crosses an inflection ray in the opposite direction. This means that a new shadow component exists, which represents a freshly hidden portion of the environment.
- Split: A gap splits into two gaps because the robot crosses a bitangent ray, as shown in Figure 12.29 (this was also shown in Figure 12.5). This means that one shadow component splits into two shadow components.
- Merge: Two gaps merge into one because the robot crosses a bitangent ray in the opposite direction. In this case, two shadow components merge into one.
This is a complete list of possible events, under a general position assumption that precludes environments that cause degeneracies, such as three gaps that merge into one or the appearance of a gap precisely where two other gaps split.
As each of these gap events occurs, it needs to be reflected in the tree. If a gap disappears, as shown in Figure 12.30, then the corresponding edge and vertex are simply removed. If a merge event occurs, then an intermediate vertex is inserted as shown in Figure 12.31. This indicates that if that gap is chased, it will split into the two original gaps. If a split occurs, as shown in Figure 12.32, then the intermediate vertex is removed. The appearance of a gap is an important case, which generates a primitive vertex in the tree, as shown in Figure 12.33. Note that a primitive vertex can never split because chasing it will result in its disappearance.
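The four tree updates can be sketched as operations on a simple tree structure. The class and method names below are hypothetical, and a real GNT must also track which gap in the sensor corresponds to each root edge; this is only the bookkeeping skeleton:

```python
class Vertex:
    def __init__(self, primitive=False):
        self.primitive = primitive   # appeared in view: can never split
        self.children = []           # gaps hidden behind this gap

class GNT:
    def __init__(self):
        self.root = Vertex()                  # the root represents the gap sensor

    def appear(self):
        """A gap appears: add a primitive vertex as a new root edge."""
        v = Vertex(primitive=True)
        self.root.children.append(v)
        return v

    def disappear(self, v):
        """A gap disappears: remove its (leaf) vertex and root edge."""
        assert not v.children
        self.root.children.remove(v)

    def merge(self, a, b):
        """Two gaps merge: insert an intermediate vertex above both."""
        m = Vertex()
        m.children = [a, b]
        self.root.children.remove(a)
        self.root.children.remove(b)
        self.root.children.append(m)
        return m

    def split(self, m):
        """A gap splits: remove the intermediate vertex, exposing its children."""
        self.root.children.remove(m)
        self.root.children.extend(m.children)
        return m.children

    def fully_explored(self):
        """All leaves primitive => the environment is completely learned."""
        def leaves(v):
            return [v] if not v.children else [w for c in v.children for w in leaves(c)]
        return all(v.primitive for v in leaves(self.root))
```

Note how a merge records exactly the information needed later: chasing the intermediate vertex’s gap will reproduce the split into its two children.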
A simple example will now be considered.
Example 12.6 (Gap Navigation Tree) Suppose that the robot does not know the environment in Figure 12.34. It moves from cells 1 to 7 in order and then returns to cell 1. The following sequence of trees occurs: T1, . . ., T7, T6′, . . ., T1′, as shown in Figure 12.35. The root vertex is shown as a solid black disc. Vertices that are not known to be primitive are shown as circles; primitive vertices are squares. Note that if any leaf vertex is a circle, then it means that the shadow region that is hidden by that gap has not been completely explored. Note that once the robot reaches cell 5, it has seen the whole environment. This occurs precisely when all leaf vertices are primitive. When the robot returns to the first region, the tree is larger because it knows that the region on the right is composed of two smaller regions. If all leaves are squares, this means that the environment has been completely explored.

Figure 12.29: (a) The robot crosses a ray that extends from a bitangent. (b) Gaps split or merge, depending on the direction.

Figure 12.30: If a gap disappears, it is simply removed from the GNT.

Figure 12.31: If two gaps merge, an intermediate vertex is inserted into the tree.
In the example, all of the interesting parts of the environment were explored. From this point onward, all leaf vertices will be primitive vertices because all possible splits have been discovered. In a sense, the environment has been completely learned, at the level of resolution possible with the gap sensor. A simple strategy for exploring the environment is to chase any gaps that themselves are nonprimitive leaf vertices or that have children that are nonprimitive leaf vertices. A leaf vertex in the tree can be chased by repeatedly applying actions that chase its corresponding gap in the gap sensor. This may cause the tree to incrementally change; however, there is no problem if the action is selected to chase whichever gap hides the desired leaf vertex, as shown in Figure 12.36. Every nonprimitive leaf vertex will either split or disappear. After all nonprimitive leaf vertices have been chased, all possible splits have been performed and only primitive leaves remain. In this case, the environment has been completely learned.
Figure 12.32: If two gaps split, the intermediate vertex is removed.

Figure 12.33: The appearance of a gap results in a primitive vertex, which is denoted by a square.
Using the GNTs for optimal navigation Since there is no precise map of the environment, it is impossible to express a goal state using coordinates in R2. However, a goal can be expressed in terms of the vertex that must be chased to make the state visible. For example, imagine showing the robot an object while it explores. At first, the object is visible, but a gap may appear that hides the object. After several merges, a vertex deep in the tree may correspond to the location from which the object is visible. The robot can navigate back to the object optimally by chasing the vertex that first hid the object by its appearance. Once this vertex and its corresponding gap disappear, the object becomes visible. At this time the robot can move straight toward the object (assuming an additional sensor that indicates the direction of the object). It was argued in [945] that when the robot chases a vertex in the GNT, it precisely follows the paths of the shortest-path roadmap, which was introduced in Section 6.2.4. Each pair of successive gap events corresponds to the traversal of a bitangent edge.

Figure 12.34: A simple environment for illustrating the gap navigation tree.
I-space interpretation In terms of an I-space over the set of environments, the GNT considers large sets of environments to be equivalent. This means that an I-map was constructed for which the derived I-space is the set of possible GNTs. Under this I-map, many environments correspond to the same GNT. Due to this, the robot can accomplish interesting tasks without requesting further information. For example, if two environments differ only by rotation or scale, the GNT representations are identical. Surprisingly, the robot does not even need to be concerned about whether the environment boundary is polygonal or curved. The only important concern is how the gap events occur. For example, the environments in Figure 12.37 all produce the same GNTs and are therefore indistinguishable to the robot. In the same way that the maze exploring algorithm of Section 12.3.1 did not need a complete map to locate an object, the GNT does not need one to perform optimal navigation.
- Probabilistic Localization and Mapping
The problems considered so far in Section 12.3 have avoided probabilistic modeling. Suppose here that probabilistic models exist for the state transitions and the observations. Many problems can be formulated by replacing the nondeterministic models in Section 12.3.1 by probabilistic models. This would lead to probabilistic I-states that represent distributions over a set of possible grids and a configuration within each grid. If the problem is left in its full generality, the I-space is enormous to the point that it seems hopeless to approach problems in the manner used so far. If optimality is not required, then in some special cases progress may be possible.
Figure 12.35: Building a representation of the environment in Figure 12.34 using the gap navigation tree. The sequence is followed from left to right. For convenience, the “R” or “L” inside of each vertex indicates whether the shadow component is to the right or left of the gap, respectively. This information is not needed by the algorithm, but it helps in understanding the representation.

The current problem is to construct a map of the environment while simultaneously localizing the robot with respect to the map. Recall Figure 1.7 from Section 1.2. This section covers a general framework that has been popular in mobile robotics in recent years (see the literature suggested at the end of the chapter). The discussion presented here can be considered as a generalization of the discussion from Section 12.2.3, which was only concerned with the localization portion of the current problem. Now the environment is not even known. The current problem can be interpreted as localization in a state space defined as
X = C × E, (12.29)
in which C is a configuration space and E is the environment space. A state, xk, is represented as xk = (qk, e) (there is no k subscript for e because the environment is assumed to be static). The history I-state provides the data to use in the process of determining the state. As for localization in Section 12.2, there are both passive and active versions of the problem. An incremental version of the active problem is sometimes called the next-best-view problem [66, 238, 793]. The difficulty is that the robot has opposing goals: 1) it tries to sense at places that will gain as much new data as possible, but 2) this minimization of redundancy can make it difficult to fuse all of the measurements into a global map. The passive problem will be described here; the methods can be used to provide information for solving the active problem.
Figure 12.36: Optimal navigation to a specified part of the environment is achieved by “chasing” the desired vertex in the GNT until it disappears. This will make a portion of the environment visible. In the example, the gap labeled “h” is chased.

Figure 12.37: These environments yield the same GNTs and are therefore equivalent at the resolution of the derived I-space. The robot cannot measure distances and does not even care whether walls are straight or curved; it is not relevant to the navigation task. Nevertheless, it executes optimal motions in terms of the Euclidean distance traveled.

Suppose that the robot is a point that translates and rotates in R2. According to Section 4.2, this yields C = R2 × S1, which represents SE(2). Let q ∈ C denote a configuration, which yields the position and orientation of the robot. Assume that configuration transitions are modeled probabilistically, which requires specifying a probability density, p(qk+1|qk, uk). This can be lifted to the state space to obtain p(xk+1|xk, uk) by assuming that the configuration transitions are independent of the environment (assuming no collisions ever occur). This replaces qk and qk+1 by xk and xk+1, respectively, in which xk = (qk, e) and xk+1 = (qk+1, e) for any e ∈ E. Suppose that observations are obtained from a depth sensor, which ideally would produce measurements like those shown in Figure 11.15b; however, the data are assumed to be noisy. The probabilistic model discussed in Section 12.2.3 can be used to define p(y|x). Now imagine that the robot moves to several parts of the environment, such as those shown in Figure 11.15a, and performs a sensor sweep in each place. If the configuration qk from which each sweep yk was performed is not known, how can the data sets be sewn together to build a correct, global map of the environment? This is trivial after considering the knowledge of the configurations, but without it the problem is like putting together pieces of a jigsaw puzzle. Thus, the important data in each stage form a vector, (yk, qk). If the sensor observations, yk, are not tagged with a configuration, qk, from which they are taken, then the jigsaw problem arises. If information is used to tightly constrain the possibilities for qk, then it becomes easier to put the pieces together. This intuition leads to the following approach.
The EM algorithm The problem is often solved in practice by applying the expectation-maximization (EM) algorithm [106]. In the general framework, there are three different spaces:

- A set of parameters, which are to be determined through some measurement and estimation process. In our problem, this represents E, because the main goal is to determine the environment.

- A set of data, which provide information that can be used to estimate the parameter. In the localization and mapping problem, this corresponds to the history I-space IK. Each history I-state ηK ∈ IK is ηK = (p(x), u˜K−1, y˜K), in which p(x) is a prior probability density over X.

- A set of hidden variables, which are unknown but need to be estimated to complete the process of determining the parameters. In the localization and mapping problem, this is the configuration space C.
Since both the parameters and the hidden variables are unknown, the choice between the two may seem arbitrary. It will turn out that expressions can be derived to nicely express the probability density for the hidden variables, but the parameters are much more complicated.
The EM algorithm involves an expectation step followed by a maximization step. The two steps are repeated as necessary until a solution with the desired accuracy is obtained. The method is guaranteed to converge under general conditions [269, 977, 978]. In practice, it appears to work well even under cases that are not theoretically guaranteed to converge [940].
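To make the alternation of E-step and M-step concrete, the sketch below runs EM on a far simpler problem than mapping: estimating the two means of a 1D Gaussian mixture, where the component labels play the role of the hidden variables. This toy example is illustrative only and is not the SLAM formulation described in the text:

```python
import math, random

def em_two_means(data, mu, iters=50):
    """EM for a two-component 1D Gaussian mixture with known unit
    variances and equal weights; only the two means are estimated."""
    mu1, mu2 = mu
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point.
        r = []
        for x in data:
            w1 = math.exp(-0.5 * (x - mu1) ** 2)
            w2 = math.exp(-0.5 * (x - mu2) ** 2)
            r.append(w1 / (w1 + w2))
        # M-step: responsibility-weighted means maximize the expected likelihood.
        s1 = sum(r)
        mu1 = sum(ri * x for ri, x in zip(r, data)) / s1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - s1)
    return mu1, mu2

random.seed(0)
data = [random.gauss(-3, 1) for _ in range(300)] + [random.gauss(3, 1) for _ in range(300)]
print(em_two_means(data, (-1.0, 1.0)))   # means near -3 and 3
```

In the mapping problem the structure is the same, but the E-step becomes a localization computation over configuration histories and the M-step selects a new environment estimate, which is why it is so much harder.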
From this point onward, let E, IK, and C denote the three spaces for the EM algorithm because they pertain directly to the problem. Suppose that a robot has moved in the environment for K − 1 stages, resulting in a final stage, K. At each stage, k ∈ {1, . . . , K}, an observation, yk, is made using its sensor. This could, for example, represent a set of distance measurements made by sonars or a range scanner. Furthermore, an action, uk, is applied for k = 1 to k = K. A prior probability density function, p(x), is initially assumed over X. This leads to the history I-state, ηk, as defined in (11.14).
Now imagine that K stages have been executed, and the task is to estimate e.
From each qk, a measurement, yk, of part of the environment is taken. The EM algorithm generates a sequence of improved estimates of e. In each execution of the two EM steps, a new estimate of e ∈ E is produced. Let eˆi denote this estimate after the ith iteration. Let q˜K denote the configuration history from stage 1 to stage K. The expectation step computes the expected likelihood of ηK given eˆi−1. This can be expressed as3

\[
Q(e, \hat{e}_{i-1}) = E\big[\, p(\eta_K, \tilde{q}_K \mid e) \,\big|\, \eta_K, \hat{e}_{i-1} \big]
= \int p(\eta_K, \tilde{q}_K \mid e) \, p(\tilde{q}_K \mid \eta_K, \hat{e}_{i-1}) \, d\tilde{q}_K , \tag{12.30}
\]

in which the expectation is taken over the configuration histories. Since ηK is given and the expectation removes q˜K, (12.30) is a function only of e and eˆi−1. The
term p(ηK , q˜K | e) can be expressed as
p(ηK , q˜K | e) = p(q˜K | ηK , e)p(ηK |e), (12.31)
in which p(ηK |e) is a prior density over the I-space, given nothing but the environment e. The factor p(q˜K | ηK, e) differs from the second factor of the integrand in (12.30) only by using e or eˆi−1. The main difficulty in evaluating (12.30) is to evaluate p(q˜K | ηK, eˆi−1) (or the version that uses e). This is essentially a localization problem with a given map, as considered in Section 12.2.3. The information up to stage k can be applied to yield the probabilistic I-state p(qk| ηk, eˆi−1) for each qk; however, this neglects the information from the remaining stages. This new information can be used to make inferences about old configurations. For example, based on current measurements and memory of the actions that were applied, we have better information regarding the configuration several stages ago. In [941] a method of computing p(qk| ηK, eˆi−1) is given that computes two terms: One is p(qk| ηk), and the other is a backward probabilistic I-state that starts at stage K and runs down to k + 1.
Note that once determined, (12.30) is a function only of e and eˆi−1. The maximization step involves selecting an eˆi that maximizes (12.30):

\[
\hat{e}_i = \operatorname*{argmax}_{e \in E} \, Q(e, \hat{e}_{i-1}) . \tag{12.32}
\]
This optimization is often too difficult, and convergence conditions exist if eˆi is chosen such that Q(eˆi, eˆi−1) > Q(eˆi−1, eˆi−1). Repeated iterations of the EM algorithm result in a kind of gradient ascent that arrives at a local maximum in E.
One important factor in the success of the method is in the representation of E. In the EM computations, one common approach is to use a set of landmarks, which were mentioned in Section 11.5.1. These are special places in the environment that can be identified by sensors, and if correctly classified, they dramatically improve localization. In [941], the landmarks are indicated by a user as the robot travels. Classification and positioning errors can both be modeled probabilistically and incorporated into the EM approach. Another idea that dramatically simplifies the representation of E is to approximate environments with a fine-resolution grid. Probabilities are associated with grid cells, which leads to a data structure called an occupancy grid [307, 685, 850]. In any case, E must be carefully defined to ensure that reasonable prior distributions can be made for p(e) to initialize the EM algorithm as the robot first moves.

3In practice, a logarithm is applied to p(ηK, q˜K | e) because densities that contain exponentials usually arise. Taking the logarithm makes the expressions simpler without affecting the result of the optimization. The log is not applied here because this level of detail is not covered.
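A common concrete form of the occupancy grid stores per-cell log-odds of occupancy, updated by Bayes’ rule as each range measurement passes through or terminates in a cell. The sketch below assumes a hypothetical inverse sensor model (the probabilities P_HIT and P_MISS are illustrative, not from the text):

```python
import math

def logodds(p):
    return math.log(p / (1.0 - p))

P_HIT, P_MISS = 0.7, 0.4   # hypothetical inverse sensor model

class OccupancyGrid:
    def __init__(self, w, h):
        # Log-odds of 0 corresponds to probability 0.5 (unknown).
        self.l = [[0.0] * w for _ in range(h)]

    def update(self, cells_passed, cell_hit):
        for (i, j) in cells_passed:       # beam traversed: less likely occupied
            self.l[i][j] += logodds(P_MISS)
        if cell_hit is not None:          # beam endpoint: more likely occupied
            i, j = cell_hit
            self.l[i][j] += logodds(P_HIT)

    def prob(self, i, j):
        """Recover the occupancy probability from the stored log-odds."""
        return 1.0 - 1.0 / (1.0 + math.exp(self.l[i][j]))

g = OccupancyGrid(4, 4)
g.update([(0, 0), (0, 1)], (0, 2))   # one beam: passes two cells, hits a third
```

Storing log-odds makes the Bayes update a simple addition per cell, which is one reason this representation scales to fine-resolution grids.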