Multiple Decision Makers
Differential models can be extended to model the interaction of multiple decision makers. This leads to continuous-time extensions of sequential decision making, from Formulation 10.1, and sequential games, from Formulation 10.4. A differen- tial version of the state transition equation can be made for these extensions.
- Differential Decision Making
To make a differential game against nature that extends Formulation 10.1 to con- tinuous time, suppose that nature actions θ(t) are chosen from Θ. A differential model can be defined as
x˙ = f(x, u, θ). (13.199)
The state space X and action space U are used in the same way as throughout this chapter. The difference only comes in the state transition equation. State- dependent nature action spaces may also be used.
As observed repeatedly throughout Part III, nature can be modeled nondeter- ministically or probabilistically. In the nondeterministic case, (13.199) is equiva- lent to a differential inclusion [53]:
x˙ ∈ {x˙ ′ | ∃θ ∈ Θ such that x˙ ′ = f(x, u, θ)}. (13.200)
Possible future values for x˙ can be computed using forward projections. Reachable sets, which will be introduced in Section 14.2.1, can be defined that characterize the evolution of future possible states over time. Plans constructed under this model usually use worst-case analysis.
Example 13.15 (Nondeterministic Forward Projection) As a simple ex- ample of using (13.199), consider expressing the uncertainty model used in the preimage planning framework of Section 12.5.1.
At each time t 0, nature chooses some θ Θ(t). The state transition equation is
≥ ∈
x˙ = u + θ. (13.201)
The cone shown in Figure 12.45 is just the nondeterministic forward projection under the application of a constant u ∈ U. .
In the probabilistic case, restrictions must be carefully placed on the nature action trajectory (e.g., a Weiner process [910]). Under such conditions, (13.199) be- comes a stochastic differential equation. Planning in this case becomes continuous- time stochastic control [567], and the task is to optimize the expected cost.
Example 13.16 (A Simple Car and Nature) Uncertainty can be introduced into any of the models of this chapter. For example, recall the simple car, (13.15).
Suppose that nature interferes with the steering action so that it is not precisely known in which direction the car will drive. Let Θ = [−θmax, θmax], in which
θmax (0, π/2) represents the maximum amount of steering angle error that can be caused by nature. The simple-car model can be modified to account for this error as
∈
x˙ = us cos θ y˙ = us sin θ
θ˙ = us tan(u
L φ
- γ),
(13.202)
in which the domain of tan must be extended to R or other suitable restrictions must be imposed. At each time t, a nature action12 γ ∈ Θ causes the true heading
of the car to be perturbed from the commanded direction uφ. Under nondetermin- istic uncertainty, the maximum amount that the car deviates from the commanded direction must be determined by the planning algorithm. A probability density function p(γ) can be assigned to obtain a probabilistic model. When integrated
over time, (13.202) yields probability density functions over future car configura- tions [1004]. .
In a similar way, parameters that account for nature can be introduced virtually anywhere in the models of this chapter. Some errors may be systematic, which reflect mistakes or simplifications made in the modeling process. These correspond to a constant nature action applied at the outset. In this case, nature is not allowed to vary its action over time. Other errors could correspond to noise, which is expected to yield different nature actions over time.
12The notation γ is used instead of θ to avoid conflicting with the car orientation variable θ
in this particular example.
- Differential Game Theory
The extension of sequential game theory to the continuous-time case is called differential game theory (or dynamic game theory [59]), a subject introduced by Isaacs [477]. All of the variants considered in Sections 9.3, 9.4, 10.5 are possible:
- There may be any number of players.
- The game may be zero-sum or nonzero-sum.
- The state may or may not be known. If the state is unknown, then interesting I-spaces arise, similar to those of Section 11.7.
- Nature can interfere with the game.
- Different equilibrium concepts, such as saddle points and Nash equilibria, can be defined.
See [59] for a thorough overview of differential games. Two players, P1 and P2, can be engaged in a differential game in which each has a continuous set of actions. Let U and V denote the action spaces of P1 and P2, respectively. A state transition equation can be defined as
x˙ = f(x, u, v), (13.203)
in which x is the state, u U, and v V .
∈ ∈
Linear differential games are an important family of games because many tech- niques from optimal control theory can be extended to solve them [59].
Example 13.17 (Linear Differential Games) The linear system model (13.37) can be extended to incorporate two players. Let X = Rn be a phase space. Let U = Rm_1 and V = R_m_2 be an action spaces for m1, m2 n. A **_linear differential game **is expressed as
≤
x˙ = Ax + Bu + Cv, (13.204)
in which A, B, and C are constant, real-valued matrices of dimensions n × n, n m1, and n m2, respectively. The particular solution to such games depends on the cost functional and desired equilibrium concept. For the case of a quadratic cost, closed-form solutions exist. These extend techniques that are developed for linear systems with one decision maker; see Section 15.2.2 and [59].
× ×
The original work of Isaacs [477] contains many interesting examples of pursuit- evasion differential games. One of the most famous is described next.
Example 13.18 (Homicidal Chauffeur) In the homicidal chauffeur game, the pursuer is a Dubins car and the evader is a point robot that can translate
in any direction. Both exist in the same world, W = R2. The speeds of the car
and robot are s1 and s2, respectively. It is assumed that s1 > s2 , which means that the pursuer moves faster than the evader. The transition equation is given
| | | |
by extending (13.15) to include two state variables that account for the robot position:
x˙ 1 = s1 cos θ1
y˙1 = s1 sin θ1
x˙ 2 = s2 cos v
y˙2 = s2 sin v (13.205)
θ˙1
s1
= tan uφ.
L
The state space is X is R4 S1, and the action spaces are U = [ φmax, φmax] and
× −
V = [0, 2π).
The task is to determine whether the pursuer can come within some prescribed distance ǫ of the evader:
(x1 − x2) + (y1 − y2) < ǫ . (13.206)
2 2 2
If this occurs, then the pursuer wins; otherwise, the evader wins. The solution depends on the L, s1, s2, ǫ, and the initial state. Even though the pursuer moves faster, the evader may escape because it does not have a limited turning radius. For given values of L, s1, s2, and ǫ, the state space X can be partitioned into two regions that correspond to whether the pursuer or evader wins [59, 477]. To gain some intuition about how this partition may appear, imagine the motions that a bullfighter must make to avoid a fast, charging bull (yes, bulls behave very much
like a fast Dubins car when provoked). .
Another interesting pursuit-evasion game arises in the case of one car attempt- ing to intercept another [694].
Example 13.19 (A Game of Two Cars) Imagine that there are two simple cars that move in the same world, = R2. Each has a transition equation given by (13.15). The state transition equation for the game is
W
x˙ 1 = us cos θ1
y˙1 = us sin θ1
x˙ 2 = vs cos θ2
y˙2 = vs sin θ2 (13.207)
θ˙1
us
= tan uφ L1
θ˙2
vs
= tan vφ.
L2
The pursuit-evasion game becomes very interesting if both players are restricted to be Dubins cars. .
Further Reading
This chapter was synthesized from numerous sources. Many important, related subjects were omitted. For some mechanics of bodies in contact and manipulation in general, see [681]. Three-dimensional vehicle models were avoided because they are complicated
by SO(3); see [433]. For computational issues associated with simulating dynamical systems, see [247, 863].
For further reading on velocity constraints on the C-space, see [596, 725] and Sec- tions 15.3 to 15.5. For more problems involving rolling spheres, see [527] and references therein. The rolling-ball problem is sometimes referred to as the Chaplygin ball. A non- holonomic manipulator constructed from rolling-ball joints was developed and analyzed in [729]. The kinematics of curved bodies in contact was studied in [632, 716]. For mo- tion planning in this context, see [101, 103, 223, 676]. Other interesting nonholonomic
systems include the snakeboard [473, 629], roller racer [556], rollerblader [214], Trikke [213], and examples in [112] (e.g., the Chaplygin sled).
Phase space representations are a basic part of differential equations, physics, and control theory; see [44, 192].
Further reading in mechanics is somewhat complicated by two different levels of treatment. Classical mechanics texts do not base the subject on differential geometry, which results in cumbersome formulations and unusual terminology (e.g., generalized coordinates). Modern mechanics texts overcome this problem by cleanly formulating everything in terms of geodesics on Riemannian manifolds; however, this may be more difficult to absorb for readers without background in differential geometry. An excellent source for modern mechanics is [39]. One of the most famous texts for classical mechanics is [397]. For an on-line book that covers the calculus of variations, including constrained Lagrangians, see [790]. The constrained Lagrangian presentation is based on Chapter 3 of [789], Section 2.4 of [397], and parts of [405]. Integral constraints on the Lagrangian are covered in [790], in addition to algebraic and differential constraints. Lagrangian mechanics under inequality constraints is considered in [789]. The presentation of the Hamiltonian in Section 13.4.4 is based on Chapter 7 of [397] and Section 15 of [39]. For advanced, modern treatments of mechanics in the language of affine connections and Christoffel symbols, see [3, 156, 677]. Another source, which is also heavily illustrated, is [359]. For further reading on robot dynamics, see [30, 204, 725, 856, 907, 994]. For dynamics of automobiles, see [389].
For further reading on differential game theory, primary sources are [59, 423, 477]; see also [34, 57, 783, 985, 991, 992, 993, 997]. Lower bounds for the algorithmic complexity of pursuit-evasion differential games are presented in [821].
Exercises
- Let C = R4. There are two Pfaffian constraints, q˙1 +q˙2 +q˙3 +q˙4 = 0 and q˙2 −q˙4 = 0. Determine the appropriate number of action variables and express the differential constraints in the form q˙ = f (q, u).
- Introduce a phase space and convert 2y¨ − 10y˙2 + 5y = 0 into the form x˙ = f (x).
- Introduce a phase space and convert y(4) + y = 0 into the form x˙ = f (x).
- Derive the configuration transition equation (13.19) for a car pulling trailers.
- Use the main idea of Section 13.2.4 to develop a smooth-steering extension of the car pulling trailers, (13.19).
Figure 13.14: A double pendulum.
- Suppose that two identical differential-drive robots are connected together at their centers with a rigid bar of length d. The robots are attached at each end of the rod, and each attachment forms a revolute joint. There are four wheels to control; however, some combinations of wheel rotations cause skidding. Assuming that skidding is not allowed, develop a motion model of the form q˙ = f (q, u), in which
C and U are chosen to reflect the true degrees of freedom.
- Extend the lunar lander model to a general rigid body with a thruster that does not apply forces through the center of mass.
- Develop a model for a 3D rotating rigid body fired out of a canon at a specified angle above level ground under gravity. Suppose that thrusters are placed on the body, enabling it to be controlled before it impacts the ground. Develop general phase transition equations.
- Add gravity with respect to q2 in Example 13.12 and derive the new state transition equation using the Euler-Lagrange equation.
- Use the constrained Lagrangian to derive the equations of motion of the pendulum in Example 13.8.
- Define a phase space, and determine an equation of the form x˙ double pendulum shown in Figure 13.14.
= f (x) for the
- Extend Example 13.13 to obtain the dynamics of a three-link manipulator. The third link, A3, is attached to the other two by a revolute joint. The new parameters
are θ3, d2, ℓ3, m3, and I3.
- Solve Example 13.14 by parameterizing the sphere with standard spherical coor- dinates and using the unconstrained Lagrangian. Verify that the same answer is obtained.
- Convert the equations in (13.161) into phase space form, to obtain the phase transition equation in the form x˙ = f (x, u). Express the right side of the equation in terms of the basic parameters, such as mass, moment of inertia, and lengths.
- Define the Hamiltonian for a free-floating 2D rigid body under gravity and develop Hamilton’s equations.
Implementations
- Make a 3D spacecraft (rigid-body) simulator that allows any number of binary thrusters to be placed in any position and orientation.
- Make a simulator for the two-link manipulator in Example 13.13.