Nonzero-Sum Games
This section parallels the development of Section 9.3, except that the more general case of nonzero-sum games is considered. This enables games with any desired degree of conflict to be modeled. Some decisions may even benefit all players. One of the main applications of the subject is in economics, where it helps to explain the behavior of businesses in competition.
The saddle-point solution will be replaced by the Nash equilibrium, which again is based on eliminating regret. Since the players do not necessarily oppose each other, it is possible to model a game that involves any number of players. For nonzero-sum games, new difficulties arise, such as the nonuniqueness of Nash equilibria and the fact that computing randomized Nash equilibria does not generally fit into linear programming.
- Two-Player Games
To help make the connection to Section 9.3 smoother, two-player games will be considered first. This case is also easier to understand because the notation is simpler. The ideas are then extended without difficulty from two players to many players. The game is formulated as follows.
Formulation 9.8 (A Two-Player Nonzero-Sum Game)
- The same components as in Formulation 9.7, except the cost function.
- A function, L1 : U × V → R ∪ {∞}, called the cost function for P1.
- A function, L2 : U × V → R ∪ {∞}, called the cost function for P2.
The only difference with respect to Formulation 9.7 is that now there are two independent cost functions, L1 and L2, one for each player. Each player would like to minimize its cost. There is no maximization in this problem; that appeared in zero-sum games only because P2 had interests opposing those of P1. A zero-sum game can be modeled under Formulation 9.8 by setting L1 = L and L2 = −L.
Paralleling Section 9.3, first consider applying deterministic strategies to solve the game. As before, one possibility is that a player can apply its security strategy. To accomplish this, it does not even need to look at the cost function of the other player. It seems somewhat inappropriate, however, to neglect the consideration of both cost functions when making a decision. In most cases, the security strategy results in regret, which makes it inappropriate for nonzero-sum games.
A strategy that avoids regret will now be given. A pair (u∗, v∗) of actions is defined to be a Nash equilibrium if

L1(u∗, v∗) = min_{u∈U} { L1(u, v∗) }   (9.66)

and

L2(u∗, v∗) = min_{v∈V} { L2(u∗, v) }.  (9.67)
These expressions imply that neither P1 nor P2 has regret. Equation (9.66) indicates that P1 is satisfied with its action, u∗, given the action, v∗, chosen by P2. P1 cannot reduce its cost any further by changing its action. Likewise, (9.67) indicates that P2 is satisfied with its action v∗.
The game in Formulation 9.8 can be completely represented using two cost matrices. Let A and B denote the cost matrices for P1 and P2, respectively. Recall that Figure 9.2 showed a pattern for detecting a saddle point. A Nash equilibrium can be detected as shown in Figure 9.5. Think about the relationship between the two. If A = −B, then B can be negated and superimposed on top of A. This will yield the pattern in Figure 9.2 (each ≥ becomes ≤ because of negation). The values L∗a and L∗b coincide in this case. This observation implies that if A = −B, then the Nash equilibrium is actually the same concept as a saddle point. It applies, however, to much more general games.

Figure 9.5: A Nash equilibrium can be detected in a pair of matrices by finding some (i, j) such that L∗a = L1(i, j) is the lowest among all elements in column j of A, and L∗b = L2(i, j) is the lowest among all elements in row i of B. Compare this with Figure 9.2.
Example 9.17 (A Deterministic Nash Equilibrium) Consider the game specified by the cost matrices A and B:
A (cost matrix for P1; rows correspond to u ∈ U, columns to v ∈ V):

| 9 | 4 | 7 |
|---|---|---|
| 6 | −1 | 5 |
| 1 | 4 | 2 |

B (cost matrix for P2):

| 2 | 1 | 6 |
|---|---|---|
| 5 | 0 | 2 |
| 2 | 2 | 5 |

(9.68)
By applying (9.66) and (9.67), or by using the patterns in Figure 9.5, it can be seen that u = 3 and v = 1 is a Nash equilibrium. The resulting costs are L1 = 1 and L2 = 2. Another Nash equilibrium appears at u = 2 and v = 2. This yields costs L1 = −1 and L2 = 0, which is better for both players.
For zero-sum games, the existence of multiple saddle points did not cause any problem; however, for nonzero-sum games, serious difficulties arise. In the example shown here, one Nash equilibrium is clearly better than the other for both players. Therefore, it may seem reasonable that a rational DM would choose the better one. The issue of multiple Nash equilibria will be discussed next. .
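The pattern in Figure 9.5 translates directly into a short search: an entry (i, j) is a Nash equilibrium exactly when A[i][j] is minimal in its column and B[i][j] is minimal in its row. The following sketch is not from the text (function names are ours, and actions are 0-indexed, so the equilibria of Example 9.17 appear as (2, 0) and (1, 1)):

```python
def nash_equilibria(A, B):
    """Return all deterministic Nash equilibria (i, j) of the cost
    matrices A (for P1) and B (for P2), per (9.66) and (9.67)."""
    eq = []
    for i in range(len(A)):
        for j in range(len(A[0])):
            col_min = min(A[r][j] for r in range(len(A)))     # best P1 can do given j
            row_min = min(B[i][c] for c in range(len(B[0])))  # best P2 can do given i
            if A[i][j] == col_min and B[i][j] == row_min:
                eq.append((i, j))
    return eq

# Cost matrices from Example 9.17:
A = [[9, 4, 7], [6, -1, 5], [1, 4, 2]]
B = [[2, 1, 6], [5, 0, 2], [2, 2, 5]]
print(nash_equilibria(A, B))  # → [(1, 1), (2, 0)]
```

These are exactly u = 2, v = 2 and u = 3, v = 1 in the 1-indexed notation of the text.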
- Dealing with multiple Nash equilibria
Example 9.17 was somewhat disheartening due to the existence of multiple Nash equilibria. In general, there could be any number of equilibria. How can each player know which one to play? If they each choose a different one, they are not guaranteed to fall into another equilibrium as in the case of saddle points of zero-sum games. Many of the equilibria can be eliminated by using Pareto optimality, which was explained in Section 9.1.1 and also appeared in Section 7.7.2 as a way to optimally coordinate multiple robots. The idea is to formulate the selection as a multi-objective optimization problem, which fits into Formulation 9.2.
Consider two-dimensional vectors of the form (xi, yi), in which xi and yi represent the costs L1 and L2 obtained under the implementation of a Nash equilibrium denoted by πi. For two different equilibria π1 and π2, the cost vectors (x1, y1) and (x2, y2) are obtained. In Example 9.17, these were (1, 2) and (−1, 0). In general, π1 is said to be better than π2 if x1 ≤ x2, y1 ≤ y2, and at least one of the inequalities is strict. In Example 9.17, the equilibrium that produces (−1, 0) is clearly better than the one that produces (1, 2) because both players benefit.
The definition of “better” induces a partial ordering on the space of Nash equilibria. It is only partial because some vectors are incomparable. Consider, for
example, (−1, 1) and (1, −1). The first one is preferable to P1, and the second is
preferred by P2. In game theory, the Nash equilibria that are minimal with respect to this partial ordering are called admissible. They could alternatively be called Pareto optimal.
The best situation is when a game has one Nash equilibrium. If there are multiple Nash equilibria, then there is some hope that only one of them is admis- sible. In this case, it is hoped that the rational players are intelligent enough to figure out that any nonadmissible equilibria should be discarded. Unfortunately, there are many games that have multiple admissible Nash equilibria. In this case, analysis of the game indicates that the players must communicate or collaborate in some way to eliminate the possibility of regret. Otherwise, regret is unavoid- able in the worst case. It is also possible that there are no Nash equilibria, but, fortunately, by allowing randomized strategies, a randomized Nash equilibrium is always guaranteed to exist. This will be covered after the following two examples.
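The admissibility test can be mechanized: a cost vector survives only if no other equilibrium's vector is better in the partial-order sense (componentwise ≤ with at least one strict inequality). A minimal sketch, with names of our own choosing:

```python
def better(a, b):
    """True if cost vector a is better than b: a <= b componentwise,
    with at least one strictly smaller component."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def admissible(vectors):
    """Pareto-minimal (admissible) cost vectors among the given ones."""
    return [v for v in vectors if not any(better(u, v) for u in vectors)]

print(admissible([(1, 2), (-1, 0)]))   # Example 9.17 → [(-1, 0)]
print(admissible([(-1, 1), (1, -1)]))  # incomparable → both survive
```

The second call illustrates why the ordering is only partial: neither vector dominates the other, so both equilibria remain admissible.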
Example 9.18 (The Battle of the Sexes) Consider a game specified by the cost matrices A and B:

A (cost matrix for P1; rows correspond to u ∈ U, columns to v ∈ V):

| −2 | 0 |
|---|---|
| 0 | −1 |

B (cost matrix for P2):

| −1 | 0 |
|---|---|
| 0 | −2 |

(9.69)
This is a famous game called the “Battle of the Sexes.” Suppose that a man and a woman have a relationship, and they each have different preferences on how to spend the evening. The man prefers to go shopping, and the woman prefers to watch a football match. The game involves selecting one of these two activities. The best case for either one is to do what they prefer while still remaining together. The worst case is to select different activities, which separates the couple. This game is somewhat unrealistic because in most situations some cooperation between them is expected.
Both u = v = 1 and u = v = 2 are Nash equilibria, which yield cost vectors (−2, −1) and (−1, −2), respectively. Neither solution is better than the other; therefore, they are both admissible. There is no way to avoid the possibility of regret unless the players cooperate (you probably already knew this). .
The following is one of the most famous nonzero-sum games.
Example 9.19 (The Prisoner’s Dilemma) The following game is very simple to express, yet it illustrates many interesting issues. Imagine that a heinous crime has been committed by two people. The authorities know they are guilty, but they do not have enough evidence to convict them. Therefore, they develop a plan to try to trick the suspects. Each suspect (or player) is placed in an isolated prison cell and given two choices. Each player can cooperate with the authorities, u = 1 or v = 1, or refuse, u = 2 or v = 2. By cooperating, the player admits guilt and turns over evidence to the authorities. By refusing, the player claims innocence and refuses to help the authorities.
The cost Li represents the number of years that the player will be sentenced to prison. The cost matrices are assigned as
A (cost matrix for P1; rows correspond to u ∈ U, columns to v ∈ V):

| 8 | 0 |
|---|---|
| 30 | 2 |

B (cost matrix for P2):

| 8 | 30 |
|---|---|
| 0 | 2 |

(9.70)
The motivation is that both players receive 8 years if they both cooperate, which is the sentence for being convicted of the crime and being rewarded for cooperating with the authorities. If they both refuse, then they receive 2 years because the authorities have insufficient evidence for a longer term. The interesting cases occur if one refuses and the other cooperates. The one who refuses is in big trouble because the evidence provided by the other will be used against him. The one who cooperates gets to go free (the cost is 0); however, the other is convicted on the evidence and spends 30 years in prison.
What should the players do? What would you do? If they could make a coordinated decision, then it seems that a good choice would be for both to refuse, which results in costs (2, 2). In this case, however, there would be regret because each player would think that he had a chance to go free (receiving cost 0 by cooperating while the other refuses). If they were to play the game a second time, they might be inclined to change their decisions.
The Nash equilibrium for this problem is for both of them to cooperate, which results in (8, 8). Thus, they pay a price for not being able to communicate and
coordinate their strategy. This solution is also a security strategy for the players, because it achieves the lowest cost using worst-case analysis. .
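The claim that the equilibrium coincides with the security strategies can be checked numerically. The sketch below is our own (0-indexed, so both players cooperating is the pair (0, 0)); it computes P1's security strategy as the row of A with the smallest worst-case cost and P2's as the column of B with the smallest worst-case cost:

```python
A = [[8, 0], [30, 2]]  # cost matrix for P1 from (9.70)
B = [[8, 30], [0, 2]]  # cost matrix for P2

# Security strategies: each player minimizes its worst-case (maximum) cost.
u_sec = min(range(2), key=lambda i: max(A[i]))
v_sec = min(range(2), key=lambda j: max(B[i][j] for i in range(2)))
print(u_sec, v_sec)  # → 0 0, i.e., both cooperate

# The same pair satisfies the Nash conditions: A[0][0] is minimal in its
# column and B[0][0] is minimal in its row.
assert A[0][0] == min(A[i][0] for i in range(2))
assert B[0][0] == min(B[0])
```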
- Randomized Nash equilibria
What happens if a game has no Nash equilibrium over the space of deterministic strategies? Once again the problem can be alleviated by expanding the strategy space to include randomized strategies. In Section 9.3.3 it was explained that every zero-sum game under Formulation 9.7 has a randomized saddle point on the space of randomized strategies. It was shown by Nash that every nonzero-sum game under Formulation 9.8 has a randomized Nash equilibrium [730]. This is a nice result; however, there are a couple of concerns. There may still exist other admissible equilibria, which means that there is no reliable way to avoid regret unless the players collaborate. The other concern is that randomized Nash equilibria unfortunately cannot be computed using the linear programming approach of Section 9.3.3. The required optimization is instead a form of nonlinear programming [94, 664, 731], which does not necessarily admit a nice, combinatorial solution.
Recall the definition of randomized strategies from Section 9.3.3. For a pair (w, z) of randomized strategies, the expected costs, L̄1(w, z) and L̄2(w, z), can be computed using (9.57). A pair (w∗, z∗) of strategies is said to be a randomized Nash equilibrium if

L̄1(w∗, z∗) = min_{w∈W} { L̄1(w, z∗) }   (9.71)

and

L̄2(w∗, z∗) = min_{z∈Z} { L̄2(w∗, z) }.  (9.72)
In game-theory literature, this is usually referred to as a mixed Nash equilibrium. Note that (9.71) and (9.72) are just generalizations of (9.66) and (9.67) from the space of deterministic strategies to the space of randomized strategies.
Using the cost matrices A and B, the Nash equilibrium conditions can be written as

w∗Az∗ = min_{w∈W} { wAz∗ }   (9.73)

and

w∗Bz∗ = min_{z∈Z} { w∗Bz }.  (9.74)
Unfortunately, the computation of randomized Nash equilibria is considerably more challenging than computing saddle points. The main difficulty is that Nash equilibria are not necessarily security strategies. By using security strategies, it is possible to decouple the decisions of the players into separate linear programming problems, as was seen in Example 9.16. For the randomized Nash equilibrium, the optimization between the players remains coupled. The resulting optimization is often referred to as the linear complementarity problem. This can be formulated as a nonlinear programming problem [664, 731], which means that it is a nonlinear optimization that involves both equality and inequality constraints on the domain (in this particular case, a bilinear programming problem is obtained [59]).
Example 9.20 (Finding a Randomized Nash Equilibrium) To get an idea of the kind of optimization that is required, recall Example 9.18. A randomized Nash equilibrium that is distinct from the two deterministic equilibria can be found. Using the cost matrices from Example 9.18, the expected cost for P1 given randomized strategies w and z is

L̄1(w, z) = wAz
         = (w1  w2) [ −2   0 ] [ z1 ]
                    [  0  −1 ] [ z2 ]
         = −2w1z1 − w2z2
         = −3w1z1 + w1 + z1 − 1,   (9.75)
in which the final step uses the fact that w2 = 1 − w1 and z2 = 1 − z1. Similarly, the expected cost for P2 is
L̄2(w, z) = wBz
         = (w1  w2) [ −1   0 ] [ z1 ]
                    [  0  −2 ] [ z2 ]
         = −w1z1 − 2w2z2
         = −3w1z1 + 2w1 + 2z1 − 2.   (9.76)
If z is fixed, then the final equation in (9.75) is linear in w; likewise, if w is fixed, then (9.76) is linear in z. In the case of computing saddle points for zero-sum games, we were allowed to make this assumption; however, it is not possible here. We must choose (w∗, z∗) to simultaneously optimize (9.75) while z = z∗ and (9.76) while w = w∗.
It turns out that this problem is simple enough to solve with calculus. Using the classical optimization method of taking derivatives, a candidate solution can be found by computing

∂L̄1(w1, z1)/∂w1 = 1 − 3z1   (9.77)

and

∂L̄2(w1, z1)/∂z1 = 2 − 3w1.  (9.78)

Extrema occur when both of these simultaneously become 0. Solving 1 − 3z1 = 0 and 2 − 3w1 = 0 yields w1∗ = 2/3 and z1∗ = 1/3, which gives the randomized Nash equilibrium w∗ = (2/3, 1/3) and z∗ = (1/3, 2/3). The deterministic Nash equilibria are not detected by this method because they occur on the boundary of W and Z, where the derivative is not defined. .
The computation method in Example 9.20 did not appear too difficult because there were only two actions per player, and half of the matrix costs were 0. In gen- eral, two complicated equations must be solved simultaneously. The expressions, however, are always second-degree polynomials. Furthermore, they each become linear with respect to the other variables if w or z is held fixed.
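For any 2 × 2 game, the interior solution found in Example 9.20 follows from an indifference argument: w1 is chosen so that the coefficient of z1 in L̄2 vanishes, and z1 so that the coefficient of w1 in L̄1 vanishes. The sketch below is ours and assumes such an interior equilibrium exists (i.e., the denominators are nonzero):

```python
def mixed_2x2(A, B):
    """Interior randomized Nash equilibrium of a 2x2 game, assuming one
    exists.  Returns (w1, z1): each player's probability of action 1.
    Solves dL2/dz1 = 0 for w1 and dL1/dw1 = 0 for z1."""
    w1 = (B[1][1] - B[1][0]) / ((B[0][0] - B[0][1]) - (B[1][0] - B[1][1]))
    z1 = (A[1][1] - A[0][1]) / ((A[0][0] - A[1][0]) - (A[0][1] - A[1][1]))
    return w1, z1

# Battle of the Sexes (Example 9.18):
A = [[-2, 0], [0, -1]]
B = [[-1, 0], [0, -2]]
print(mixed_2x2(A, B))  # → (2/3, 1/3), matching Example 9.20
```

Note the coupling discussed above: w1 is determined by B (the other player's costs), and z1 by A.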
Summary of possible solutions The solution possibilities to remember for a nonzero-sum game under Formulation 9.8 are as follows.
- There may be multiple, admissible (deterministic) Nash equilibria.
- There may be no (deterministic) Nash equilibria.
- There is always at least one randomized Nash equilibrium.
- More Than Two Players
The ideas of Section 9.4.1 easily generalize to any number of players. The main difficulty is that complicated notation makes the concepts appear more difficult. Keep in mind, however, that there are no fundamental differences. A nonzero-sum game with n players is formulated as follows.
Formulation 9.9 (An n-Player Nonzero-Sum Game)
- A set of n players, P1, P2, . . ., Pn.
- For each player Pi, a finite, nonempty set Ui called the action space for Pi. For convenience, assume that each Ui is a set of consecutive integers from 1 to |Ui|. Each ui ∈ Ui is referred to as an action of Pi.
- For each player Pi, a function, Li : U1 × U2 × · · · × Un → R ∪ {∞}, called the cost function for Pi.
A matrix formulation of the costs is no longer possible because there are too many dimensions. For example, if n = 3 and |Ui| = 2 for each player, then Li(u1, u2, u3) is specified by a 2 × 2 × 2 cube of 8 entries. Three of these cubes are needed to specify the game. Thus, it may be helpful to just think of Li as a multivariate function and avoid using matrices.4
The Nash equilibrium idea generalizes by requiring that each Pi experiences no regret, given the actions chosen by the other n − 1 players. Formally, a set (u1∗, . . . , un∗) of actions is said to be a (deterministic) Nash equilibrium if

Li(u1∗, . . . , ui∗, . . . , un∗) = min_{ui∈Ui} { Li(u1∗, . . . , u(i−1)∗, ui, u(i+1)∗, . . . , un∗) }   (9.79)

for every i ∈ {1, . . . , n}.
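Condition (9.79) can be checked mechanically: enumerate every joint action and test, for each player, whether any unilateral deviation lowers that player's cost. A sketch of our own (0-indexed; costs are supplied as a map from joint actions to cost vectors):

```python
from itertools import product

def nash_equilibria_n(costs, sizes):
    """All deterministic Nash equilibria per (9.79).  costs[u] is the
    tuple (L1(u), ..., Ln(u)) for joint action u; sizes[i] is |Ui|."""
    eq = []
    for u in product(*(range(m) for m in sizes)):
        stable = True
        for i in range(len(sizes)):
            for a in range(sizes[i]):            # Pi's unilateral deviations
                v = u[:i] + (a,) + u[i + 1:]
                if costs[v][i] < costs[u][i]:    # Pi would regret playing u
                    stable = False
        if stable:
            eq.append(u)
    return eq

# Sanity check on the Prisoner's Dilemma (Example 9.19) as a 2-player case:
costs = {(0, 0): (8, 8), (0, 1): (0, 30), (1, 0): (30, 0), (1, 1): (2, 2)}
print(nash_equilibria_n(costs, [2, 2]))  # → [(0, 0)]
```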
For n > 2, any of the situations summarized at the end of Section 9.4.1 can occur. There may be no deterministic Nash equilibria or multiple Nash equilib- ria. The definition of an admissible Nash equilibrium is extended by defining the notion of better over n-dimensional cost vectors. Once again, the minimal vectors
4If you enjoy working with tensors, these could be used to capture n-player cost functions [107].
with respect to the resulting partial ordering are considered admissible (or Pareto optimal). Unfortunately, multiple admissible Nash equilibria may still exist.
It turns out that for any game under Formulation 9.9, there exists a randomized Nash equilibrium. Let zi denote a randomized strategy for Pi. The expected cost for each Pi can be expressed as

L̄i(z1, z2, . . . , zn) = Σ_{i1=1}^{m1} Σ_{i2=1}^{m2} · · · Σ_{in=1}^{mn} Li(i1, i2, . . . , in) z1_{i1} z2_{i2} · · · zn_{in},   (9.80)

in which mj = |Uj| and zj_{ij} denotes the ij-th component of zj.
Let Zi denote the space of randomized strategies for Pi. An assignment, (z1∗, . . . , zn∗), of randomized strategies to all of the players is called a randomized Nash equilibrium if

L̄i(z1∗, . . . , zi∗, . . . , zn∗) = min_{zi∈Zi} { L̄i(z1∗, . . . , z(i−1)∗, zi, z(i+1)∗, . . . , zn∗) }   (9.81)

for all i ∈ {1, . . . , n}.
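Equation (9.80) is just a multilinear expectation: sum over all joint actions, weighting each cost by the product of the players' action probabilities. A sketch with names of our own; for n = 2 it agrees with the bilinear form wAz used earlier:

```python
from itertools import product

def expected_cost(L, strategies, i):
    """Evaluate (9.80): expected cost for player i (0-indexed) when each
    player j plays the probability vector strategies[j].  L[u] is the
    cost vector for joint action u."""
    total = 0.0
    for u in product(*(range(len(z)) for z in strategies)):
        prob = 1.0
        for j, a in enumerate(u):
            prob *= strategies[j][a]
        total += prob * L[u][i]
    return total

# Battle of the Sexes costs, at the mixed equilibrium from Example 9.20:
L = {(0, 0): (-2, -1), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (-1, -2)}
w, z = [2/3, 1/3], [1/3, 2/3]
print(expected_cost(L, [w, z], 0))  # → -2/3, matching wAz from (9.75)
```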
As might be expected, computing a randomized Nash equilibrium for n > 2 is even more challenging than for n = 2. The method of Example 9.20 can be generalized to n-player games; however, the expressions become even more complicated. There are n equations, each of which appears linear if the randomized strategies are fixed for the other n − 1 players. The result is a collection of degree-n polynomials over which n optimization problems must be solved simultaneously.
Example 9.21 (A Three-Player Nonzero-Sum Game) Suppose there are three players, P1, P2, and P3, each of which has two actions, 1 and 2. A deterministic strategy is specified by a vector such as (1, 2, 1), which indicates u1 = 1, u2 = 2, and u3 = 1.
Now some costs will be defined. For convenience, let
L(i, j, k) = (L1(i, j, k), L2(i, j, k), L3(i, j, k))   (9.82)
for each i, j, k ∈ {1, 2}. Let the costs be
L(1, 1, 1) = (1, 1, −2) L(1, 1, 2) = (−4, 3, 1)
L(1, 2, 1) = (2, −4, 2) L(1, 2, 2) = (−5, −5, 10) (9.83)
L(2, 1, 1) = (3, −2, −1) L(2, 1, 2) = (−6, −6, 12)
L(2, 2, 1) = (2, 2, −4) L(2, 2, 2) = (−2, 3, −1).
There are two deterministic Nash equilibria, which yield the costs (2, −4, 2) and (3, −2, −1). These can be verified using (9.79). Each player is satisfied with the outcome given the actions chosen by the other players. Unfortunately, both Nash equilibria are admissible. Therefore, some collaboration would be needed between the players to ensure that no regret will occur. .