14.7 Gradient-Based Trajectory Optimization
This section provides a brief overview of a complementary problem to motion planning. Suppose that an algorithm in this chapter returns a feasible action trajectory. How can the solution be improved? Trajectory optimization refers to the problem of perturbing the trajectory while satisfying all constraints so that its quality can be improved. For example, it may be desirable to shorten a trajectory computed by an RRT, to remove some of the arbitrary changes in actions due to randomization. Trajectory optimization is considered complementary to motion planning because it usually requires an initial guess, which could be provided by a planning algorithm. Trajectory optimization can be considered as a kind of BVP, but one that improves an initial guess, as opposed to determining trajectories from scratch.
The optimization issue also exists for paths computed by sampling-based algorithms for the Piano Mover’s Problem; however, without differential constraints, it is much simpler to shorten paths. The plan-and-transform method of Section 14.6.2 can be applied, and the LPM just connects pairs of configurations along the shortest path in C. In the presence of differential constraints, the BVP must be faced.
In the most general setting, it is very difficult to improve trajectories. There are numerous methods from the optimization literature; see [98, 151, 664] for overviews. The purpose of this section is to encourage further study by briefly mentioning the various kinds of methods that have been developed, instead of explaining them in detail. The methods fall under the area of nonlinear programming (NLP) (or nonlinear optimization), as opposed to linear programming, which was used to find randomized security strategies in Section 9.3. The optimization actually occurs in a space of possible trajectories, each of which is a function of time. Therefore, the calculus of variations, which was used in Section 13.4.1, becomes relevant to characterize extrema. The functional Φ from that setting becomes the cost
functional L in the current setting. The system x˙ = f(x, u) forms an additional
set of constraints that must be satisfied, but u can be selected in the optimization.
To enable numerical computation methods, a family of trajectories is specified in terms of a parameter space. The optimization can then be viewed as an incremental search in the parameter space while satisfying all constraints. The direction
of motion in each step is determined by computing the gradient of a cost functional with respect to the parameters while constrained to move in a direction tangent to the constraints. Hence, much of nonlinear programming can be considered as an application of Newton’s method or gradient descent. As in standard optimization, second-order derivatives of the cost functional can be used to indicate when the search should terminate. The numerical issues associated with these methods are quite involved; several NLP software packages, such as the NAG Fortran Library or packages within Matlab, are available.
Nonlinear optimal control theory can be considered as a variant of NLP. The dynamic programming recurrence becomes a differential equation in the continuous-time setting, and Hamilton’s equations (13.198) generalize to Pontryagin’s minimum principle. These are covered in Section 15.2. The extra variables that arise in the minimum principle can be considered as Lagrange multipliers of a constrained
optimization, in which x˙ = f(x, u) is the constraint. The differential equations
arising from dynamic programming or the minimum principle are difficult to solve analytically; therefore, in most cases, numerical techniques are used. The case of numerical dynamic programming was covered in Section 14.5.
Shooting methods constitute the simplest family of trajectory optimization methods. As a simple example, suppose that an action trajectory u˜ : [0, tF ] → R has been computed of the form

u(t) = w1 + w2 t,                                              (14.47)
in which w1 and w2 are some fixed parameters. Consider perturbing w1 and w2 by some small amount and applying the integration in (14.1). If f satisfies Lipschitz conditions, then a small perturbation should produce a small change in x˜. The resulting new trajectory can be evaluated by a cost functional to determine whether it is an improvement. It might, for example, have lower maximum curvature. Rather than picking a perturbation at random, the gradient of the cost functional with respect to the parameters can be computed. A small step in the parameter space along the negative gradient direction should reduce the cost. It is very likely, however, that perturbing w1 and w2 will move the final state x(tF ). Usually, a termination condition, such as x(tF ) = xG, must be enforced as a constraint in the optimization. This removes degrees of freedom from the optimization; therefore, more trajectory parameters are often needed.
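The perturbation loop just described can be sketched numerically. The following is a minimal illustration, not from the original text, for a 1D double integrator q̈ = u starting at rest, using the parameterization (14.47), forward Euler integration, an integrated squared-effort cost, and a penalty term as a soft version of the terminal constraint x(tF) = xG; the step size, penalty weight, and iteration count are arbitrary choices for this toy problem.

```python
import numpy as np

def rollout(w, tF=1.0, n=50):
    """Euler-integrate q'' = u from rest with u(t) = w[0] + w[1]*t."""
    dt = tF / n
    q, qd, effort = 0.0, 0.0, 0.0
    for i in range(n):
        u = w[0] + w[1] * (i * dt)
        effort += u * u * dt          # accumulate integrated squared effort
        q += qd * dt                  # forward Euler step for the state
        qd += u * dt
    return q, qd, effort

def cost(w, xG=(1.0, 0.0), penalty=1000.0):
    """Effort plus a soft terminal constraint x(tF) = xG."""
    q, qd, effort = rollout(w)
    return effort + penalty * ((q - xG[0]) ** 2 + (qd - xG[1]) ** 2)

def gradient(w, h=1e-5):
    """Central finite-difference gradient of the cost w.r.t. (w1, w2)."""
    g = np.zeros(2)
    for j in range(2):
        wp, wm = w.copy(), w.copy()
        wp[j] += h
        wm[j] -= h
        g[j] = (cost(wp) - cost(wm)) / (2.0 * h)
    return g

w = np.zeros(2)                       # initial guess for (w1, w2)
for _ in range(6000):
    w -= 5e-4 * gradient(w)           # small step along the negative gradient
```

For this boundary pair the iterates approach the minimum-effort parameters, roughly (w1, w2) ≈ (6, −12), and the rollout ends close to the goal; a line search or a second-order step would converge far faster than this fixed step size.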
Suppose more generally that a motion planning algorithm computes an action sequence based on the discrete-time model. Each action in the sequence remains constant for duration ∆t. The time duration of each action can instead be defined as a parameter to be perturbed. Each action variable ui over each interval could also be perturbed using (14.47) with the initial condition that w1 = ui and w2 = 0. The dimension of the search has increased, but there are more degrees of freedom. In some formulations, the parameters may appear as implicit constraints; in this case, a BVP must be solved in each iteration. The minimum principle is often applied in this case [98]. More details on formulating and solving the trajectory optimization problem via shooting appear in [151].
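As an illustration of treating the durations as the perturbed parameters, the sketch below (an invented example, not from the text) takes a piecewise-constant action sequence for a double integrator, say u = (+1, 0, −1) as a hypothetical planner output, and shortens the total time while a penalty term softly enforces the terminal condition x(tF) = (1, 0); SciPy's L-BFGS-B handles the bound di ≥ 0. The penalty weight and initial durations are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize

U = (1.0, 0.0, -1.0)   # action sequence from a (hypothetical) planner

def simulate(durations):
    """Integrate the double integrator through piecewise-constant actions."""
    q, qd, T = 0.0, 0.0, 0.0
    for u, d in zip(U, durations):
        q += qd * d + 0.5 * u * d * d   # exact update for constant u
        qd += u * d
        T += d
    return q, qd, T

def cost(durations, penalty=100.0):
    """Total time plus a soft terminal constraint x(tF) = (1, 0)."""
    q, qd, T = simulate(durations)
    return T + penalty * ((q - 1.0) ** 2 + qd ** 2)

d0 = np.array([0.8, 0.45, 0.8])        # feasible initial guess, T = 2.05
res = minimize(cost, d0, method="L-BFGS-B", bounds=[(0.0, None)] * 3)
```

Here the optimizer drives the coasting duration toward zero, recovering approximately the bang-bang solution d ≈ (1, 0, 1); if the terminal condition were imposed as a hard constraint instead of a penalty, the BVP structure mentioned above would reappear.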
Several difficulties are encountered when applying the shooting technique to trajectory optimization among obstacles. Each perturbation requires integration and collision-checking. For problems involving vehicles, the integrations can sometimes be avoided by exploiting symmetries [197]. For example, a path for the Dubins car can be perturbed by changing a steering angle over a short amount of time, and the rest of the trajectory can simply be transformed using a matrix of SE(2). A critical problem is that following the negative gradient may suggest shortening the path in a way that causes a collision. The problem can be alleviated by breaking the trajectory into segments, as in the plan-and-transform approach; however, this yields more optimizations. Another possible solution is to invent a penalty function for the obstacles; however, this is difficult due to local minima problems and the difficulty of precisely representing the boundary of Xobs.
Another difficulty with shooting is that a small change in the action near the
starting time may lead to large changes in the states at later times. One way to alleviate this problem is by multiple shooting (as opposed to single shooting, which has been described so far). In this case, the trajectory is initially broken into segments. These could correspond to the time boundaries imposed by a sequence of motion primitives; imagine perturbing each motion primitive separately. Extra constraints are then needed to indicate that all of the trajectory pieces must remain connected. The multiple shooting method can be generalized to a family of methods called transcription or collocation (see the Further Reading section for references). These methods again split the trajectory into segments, but each connection constraint relates more points along the trajectory than just the segment endpoints. One version of transcription uses implicit constraints, which require using another BVP solver, and another version uses parametric constraints, which dramatically increases the dimension of the search. The latter case is still useful in practice by employing fast, sparse-matrix computation methods.
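A minimal multiple-shooting sketch (again an invented double-integrator example, with arbitrary penalty weights) splits the horizon into two segments, adds the midpoint state as a free decision variable, and penalizes the connection defect between the segments along with the terminal error:

```python
import numpy as np
from scipy.optimize import minimize

def segment(x, w, t0, t1, n=50):
    """Euler-integrate q'' = u over [t0, t1] with u(t) = w[0] + w[1]*t."""
    dt = (t1 - t0) / n
    q, qd = float(x[0]), float(x[1])
    effort = 0.0
    for i in range(n):
        u = w[0] + w[1] * (t0 + i * dt)
        effort += u * u * dt
        q += qd * dt
        qd += u * dt
    return np.array([q, qd]), effort

def objective(z, xG=np.array([1.0, 0.0]), penalty=1000.0):
    """z = (w for segment 1, w for segment 2, free midpoint state)."""
    end1, e1 = segment(np.zeros(2), z[0:2], 0.0, 0.5)  # from the known start
    end2, e2 = segment(z[4:6], z[2:4], 0.5, 1.0)       # from the free midpoint
    defect = end1 - z[4:6]          # segments must remain connected
    return e1 + e2 + penalty * (defect @ defect + (end2 - xG) @ (end2 - xG))

z = minimize(objective, np.zeros(6), method="BFGS").x
```

Because each segment is integrated from its own initial state, an action change early in the trajectory no longer propagates through the entire horizon; the soft defect penalty stands in for the hard connection constraints of a full transcription method.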
One of the main difficulties with trajectory optimization methods is that they can become stuck in a local minimum in the space of trajectories. This means that their behavior depends strongly on the initial guess. It is generally impossible for them to find a trajectory that is not homotopic to the initial trajectory. They cannot recover from an initial guess in a bad homotopy class. If Xobs is complicated, then this issue becomes increasingly important. In many cases, variational techniques might not even find an optimal solution within a single homotopy class. Multiple local minima may exist if the closure of Xfree contains positive curvature. If it does not, the space is called nonpositively curved (NPC) or CAT(0), which is a property that can be derived directly from the metric on X [139]. For these spaces, the locally optimal trajectory with respect to the metric is always the best within its homotopy class.
Further Reading
The characterization and computation of reachable sets has been growing in interest [100, 102, 706, 707, 916, 955]. One motivation for studying reachability is verification, which ensures that a control system behaves as desired under all possible disturbances.
This can actually be modeled as a game against nature, in which nature attempts to bring the system into an undesirable state (e.g., crashing an airplane). For recent progress on characterizing Xric, see [355]. The triangularization argument for completeness appears in a similar context in [292]. The precise rate of convergence, expressed in terms of dispersion and Lipschitz conditions, for resolution-complete sampling-based motion planning methods under differential constraints is covered in [196]. For the computational complexity of control problems, see [114, 766]. For further reading on motion primitives in the context of planning, see [360, 362, 363, 393, 787, 794, 848]. For further reading on dynamical simulation and numerical integration, see [331, 440, 863].
Section 14.4.1 was based on [288, 290, 441]. For more works on kinodynamic planning, see [203, 237, 289, 356, 360, 611, 780, 999]. Section 14.4.2 was inspired by [73]. Section 14.4.3 was drawn from [611]. For more work on RRTs under differential constraints, see [138, 199, 224, 324, 360, 393, 509, 949]. For other works on nonholonomic
planning, see the survey [596] and [67, 277, 334, 335, 354, 357, 482, 579, 633, 672]. Combinatorial approaches to nonholonomic planning have appeared in [13, 128, 347].
Section 14.5 was developed by adapting value iteration to motion planning problems. For general convergence theorems for value iteration with interpolation, see [168, 292, 400, 565, 567]. In [168], global constraints on the phase space are actually considered. The use of these techniques and the development of Dijkstra-like variants are covered in [607]. Related work exists in artificial intelligence [722] and control theory [946].
Decoupled approaches to planning, as covered in Section 14.6, are very common in robotics literature. For material related to the plan-and-transform method, see [333, 596, 859]. For more on decoupled trajectory planning and time scaling, see [353, 456, 457, 843, 876, 877, 880, 881], and see [104, 120, 121, 785, 879, 894, 878] for particular
emphasis on time-optimal trajectories.
For more on gradient-based techniques in general, see [98] and references therein. Classical texts on the subject are [151, 664]. Gradient-based approaches to path deformation in the context of nonholonomic planning appear in [197, 343, 575].
The techniques presented in this chapter are useful in other fields beyond robotics. For aerospace applications of motion planning, see [86, 202, 436, 437, 786]. Motion planning problems and techniques have been gaining interest in computer graphics, particularly for generating animations of virtual humans (or digital actors); works in this area include [35, 86, 393, 498, 544, 554, 557, 591, 617, 649, 712, 802, 980]. In many
of these works, motion capture is a popular way to generate a database of recorded
motions that serves as a set of motion primitives in the planning approach.
Exercises
- Characterize Xric for the case of a point mass in W = R2, with each coordinate modeled as a double integrator. Assume that u1 = 1 and u2 may take any value
in [−1, 1]. Determine Xric for:
- A point obstacle at (0, 0) in W.
- A segment from (0, −1) to (0, 1) in W.
Characterize the solutions in terms of the phase variables q1(0), q2(0), q˙1(0), and
q˙2(0).
- Extending the double integrator:
- Develop a lattice for the triple integrator q(3) = u that extends naturally from the double-integrator lattice.
- Describe how to develop a lattice for higher order integrators q(n) for n > 3.
- Make a figure similar to Figure 14.6b, but for three stages of the Reeds-Shepp car.
- Determine expressions for the upper and lower boundaries of the time-limited reachable sets shown in Figure 14.14. Express them as parabolas, with q˙ as a function of q.
- A reachability graph can be made by “rolling” a polyhedron in the plane. For example, suppose a solid, regular tetrahedron is placed on a planar surface. Assuming high friction, the tetrahedron can be flipped in one of four directions by pushing on the top. Construct the three-stage reachability graph for this problem.
- Construct a four-stage reachability graph similar to the one shown in Figure 14.6b, but for the case of a differential drive robot modeled by (13.17). Use the three actions (1, 0), (0, 1), and (1, 1). Draw the graph in the plane and indicate the configuration coordinates of each vertex.
- Section 14.2.2 explained how resolution-complete algorithms exist for planning under differential constraints. Suppose that in addition to continuous state variables, there are discrete modes, as introduced in Section 7.3, to form a hybrid system. Explain how resolution-complete planning algorithms can be developed for this case. Extend the argument shown in Figure 14.7.
Implementations
- Compare the performance and accuracy of Euler integration to fourth-order Runge-Kutta on trajectories generated for a single, double, and triple integrator. For accuracy, compare the results to solutions obtained analytically. Provide recommendations of which one to use under various conditions.
- Improve Figure 14.13 by making a plot of the actual trajectories, which are parabolic in most cases.
- In Figure 14.13, the state trajectory segments are longer as |x˙ | increases. Develop a lattice that tries to keep all segments as close to the same length as possible by
reducing ∆t as |x˙ | increases. Implement and experiment with different schemes
and report on the results.
- Develop an implementation for computing approximately time-optimal state trajectories for a point mass in a 2D polygonal world. The robot dynamics can be modeled as two independent double integrators. Search the double-integrator lattice in X = R4 to solve the problem. Animate the computed solutions.
- Experiment with RDT methods applied to a spacecraft that is modeled as a 3D rigid body with thrusters. Develop software that computes collision-free trajectories for the robot. Carefully study the issues associated with choosing the metric on X.
- Solve the problem of optimally bringing the Dubins car to a goal region in a polygonal world by using value iteration with interpolation.
- Select and implement a planning algorithm that computes pushing trajectories for a differential drive robot that pushes a box in a polygonal environment. This was given as an example of a nonholonomic system in Section 13.1.3. To use the appropriate constraints on U , see [671].
- Select and implement a planning algorithm that computes trajectories for parking a car while pulling a single trailer, using (13.19). Make an obstacle region in W
that corresponds to a tight parking space and vary the amount of clearance. Also, experiment with driving the vehicle through an obstacle course.
- Generate a 3D rendering of reachability graphs for the airplane model in (13.20). Assume that in each stage there are nine possible actions, based on combinations of flying to the right, left, or straight and decreasing, increasing, or maintaining altitude.
- Implement the dynamic programming algorithm shown in Figure 14.27 for the two-link manipulator model given in Example 13.13.
- Implement the bang-bang algorithm shown in Figure 14.28 for the two-link manipulator model given in Example 13.13.
- For the Dubins car (or another system), experiment with generating a search graph based on Figure 14.7 by alternating between various step sizes. Plot, in the plane, the vertices and state trajectories associated with the edges of the graph. Experiment with different schemes for generating a resolution-complete search graph in a rectangular region and compare the results.
- Use value iteration with interpolation to compute the optimal cost-to-go for the Reeds-Shepp car. Plot level sets of the cost-to-go, which indicate the time-limited reachable sets. Compare the result to Figure 14.4.