11.7 Information Spaces in Game Theory

This section unifies the sequential game theory concepts from Section 10.5 with the I-space concepts from this chapter. Considerable attention is devoted to the modeling of information in game theory. The problem is complicated by the fact that each player has its own frame of reference, and hence its own I-space. Game solution concepts, such as saddle points or Nash equilibria, depend critically on the information available to each player as it makes its decisions. Paralleling Section 10.5, the current section first covers I-states in game trees, followed by I-states for games on state spaces. The presentation in this section is confined to the case in which the state space and stages are finite. The formulation of I-spaces extends naturally to countably infinite or continuous state spaces, action spaces, and stages [59].

11.7.1 Information States in Game Trees

Recall from Section 10.5.1 that an important part of formulating a sequential game in a game tree is specifying the information model. This was described in Step 4 of Formulation 10.3. Three information models were considered in Section 10.5.1: alternating play, stage-by-stage, and open loop. These and many other information models can be described using I-spaces.

From Section 11.1, it should be clear that an I-space is always defined with respect to a state space. Even though Section 10.5.1 did not formally introduce a

P1 P1

P2 P2

P1 P1

P2 P2

4 2 0

0 1 0 3 2 2 3 1 2 4 1 3 2

4 2 0

0 1 0 3 2 2 3 1 2 4 1 3 2

(a) Alternating play (b) Stage-by-stage

P1 P1

P2 P2

P1 P1

P2 P2

4 2 0

0 1 0 3 2 2 3 1 2 4 1 3 2

4 2 0

0 1 0 3 2 2 3 1 2 4 1 3 2

(c) Open loop (d) Something else

Figure 11.30: Several different information models are illustrated for the game in Figure 10.13.

state space, it is not difficult to define one. Let the state space X be N, the set of all vertices in the game tree. Assume that two players are engaged in a sequential zero-sum game. Using notation from Section 10.5.1, N1 and N2 are the decision

vertices of P1 and P2, respectively. Consider the nondeterministic I-space Indet

over N. Let η denote a nondeterministic I-state; thus, each η ndet is a subset of N.

∈ I

There are now many possible ways in which the players can be confused while making their decisions. For example, if some η contains vertices from both N1 and N2, the player does not know whether it is even its turn to make a decision. If η additionally contains some leaf vertices, the game may be finished without a player even being aware of it. Most game tree formulations avoid these strange situations. It is usually assumed that the players at least know when it is their turn to make a decision. It is also usually assumed that they know the stage of the game. This eliminates many sets from Indet.
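To make this concrete, here is a minimal Python sketch (the vertex names and groupings are hypothetical, not taken from Figure 11.30) that represents each information model as a collection of vertex subsets and checks that the subsets partition a player's decision vertices:

```python
# Minimal sketch: nondeterministic I-states as subsets of game-tree vertices.
# The vertex names and groupings below are illustrative only.

def is_valid_information_model(decision_vertices, i_states):
    """Check that the proposed I-states form a partition of the player's
    decision vertices (every vertex is circled exactly once)."""
    seen = set()
    for eta in i_states:
        if not eta:                       # empty I-states are not allowed
            return False
        if not eta <= decision_vertices:  # only this player's vertices may be circled
            return False
        if eta & seen:                    # I-states must be disjoint
            return False
        seen |= eta
    return seen == decision_vertices      # every decision vertex is covered

# Hypothetical tree: P1 chooses among four actions at the root, leading to
# P2's decision vertices b1, ..., b4 at the second level.
N2 = {"b1", "b2", "b3", "b4"}

# Stage-by-stage-like model: P2 knows the stage but not which branch P1 chose.
I2_stage_by_stage = [{"b1", "b2", "b3", "b4"}]

# Alternating-play-like model: P2 knows the exact decision vertex.
I2_alternating = [{"b1"}, {"b2"}, {"b3"}, {"b4"}]

assert is_valid_information_model(N2, I2_stage_by_stage)
assert is_valid_information_model(N2, I2_alternating)
```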

While playing the game, each player has its own nondeterministic I-state because the players may hide their decisions from each other. Let η1 and η2 denote the nondeterministic I-states for P1 and P2, respectively. For each player, many sets in Indet are eliminated. Some are removed to avoid the confusions mentioned above. We also impose the constraint that ηi ⊆ Ni for i = 1 and i = 2. We only care about the I-state of a player when it is that player's turn to make a decision. Thus, the nondeterministic I-state should tell us which decision vertices in Ni are possible as Pi faces a decision. Let I1 and I2 represent the nondeterministic I-spaces for P1 and P2, respectively, with all impossible I-states eliminated.

The I-spaces I1 and I2 are usually defined directly on the game tree by circling vertices that belong to the same I-state. They form a partition of the vertices in each level of the tree (except the leaves). In fact, Ii even forms a partition of Ni for each player. Figure 11.30 shows four information models specified in this way for the example in Figure 10.13. The first three correspond directly to the models allowed in Section 10.5.1. In the alternating-play model, each player always knows the decision vertex. This corresponds to a case of perfect state information. In the stage-by-stage model, P1 always knows the decision vertex; P2 knows the decision vertex from which P1 made its last decision, but it does not know which branch was chosen. The open-loop model represents the case that has the poorest information. Only P1 knows its decision vertex at the beginning of the game. After that, there is no information about the actions chosen. In fact, the players cannot even remember their own previous actions. Figure 11.30d shows an information model that does not fit into any of the three previous ones. In this model, very strange behavior results. If P1 and P2 initially choose right branches, then the resulting decision vertex is known; however, if P2 instead chooses the left branch, then P1 will forget which action it applied (as if the action of P2 caused P1 to have amnesia!). Here is a single-stage example:

Example 11.26 (An Unusual Information Model) Figure 11.31 shows a game that does not fit any of the information models in Section 10.5.1. It is actually a variant of the game considered before in Figure 10.12. The game is a kind of hybrid that partly looks like the alternating-play model and partly like the stage-by-stage model. This particular problem can be solved in the usual way, from the bottom up. A value is computed for each of the nondeterministic I-states, for the level in which P2 makes a decision. The left I-state has value 5, which corresponds to P1 choosing 1 and P2 responding with 3. The right I-state has value 4, which results from the deterministic saddle point in a 2 × 3 matrix game played between P1 and P2. The overall game has a deterministic saddle point in which P1 chooses 3 and P2 chooses 3. This results in a value of 4 for the game.


Figure 11.31: A single-stage game that has an information model unlike those in Section 10.5.1.
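The value 4 of the right I-state above comes from a deterministic saddle point of a small matrix game. The following sketch finds such a saddle point under this book's convention that P1 (taken here as the row player) minimizes cost while P2 (the column player) maximizes it; the 2 × 3 cost matrix below is made up for illustration and is not the one from Figure 11.31.

```python
def deterministic_saddle_point(cost):
    """Return (i, j, value) for an entry that neither player can improve by a
    unilateral deviation, or None if no deterministic saddle point exists.
    Convention: P1 picks the row and minimizes; P2 picks the column and maximizes."""
    rows, cols = len(cost), len(cost[0])
    for i in range(rows):
        for j in range(cols):
            v = cost[i][j]
            best_for_p1 = all(v <= cost[ip][j] for ip in range(rows))  # min of its column
            best_for_p2 = all(v >= cost[i][jp] for jp in range(cols))  # max of its row
            if best_for_p1 and best_for_p2:
                return i, j, v
    return None

# Hypothetical 2 x 3 cost matrix standing in for the right I-state's subgame.
cost = [[4, 3, 2],
        [5, 1, 0]]
print(deterministic_saddle_point(cost))   # -> (0, 0, 4)
```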

Plans are now defined directly as functions on the I-spaces. A (deterministic) plan for P1 is defined as a function π1 on I1 that yields an action u ∈ U(η1) for each η1 ∈ I1, in which U(η1) is the set of actions that can be inferred from the I-state η1; assume that this set is the same for all decision vertices in η1. Similarly, a (deterministic) plan for P2 is defined as a function π2 on I2 that yields an action v ∈ V(η2) for each η2 ∈ I2.

There are generally two alternative ways to define a randomized plan in terms of I-spaces. The first choice is to define a globally randomized plan, which is a probability distribution over the set of all deterministic plans. During execution, this means that an entire deterministic plan will be sampled in advance according to the probability distribution. An alternative is to sample actions as they are needed at each I-state. This is defined as follows. For the randomized case, let W(η1) and Z(η2) denote the sets of all probability distributions over U(η1) and V(η2), respectively. A locally randomized plan for P1 is defined as a function that yields some w ∈ W(η1) for each η1 ∈ I1. Likewise, a locally randomized plan for P2 is a function that maps from I2 into Z(η2). Locally randomized plans expressed as functions of I-states are often called behavioral strategies in game theory literature.
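As a small sketch (the I-state labels, actions, and probabilities below are hypothetical), a deterministic plan can be stored as a map from I-states to actions, while a locally randomized (behavioral) plan maps each I-state to a distribution that is sampled only when that I-state is actually reached:

```python
import random

# Hypothetical I-states for P1 and the actions available at each.
U = {"eta_a": ["L", "R"], "eta_b": ["L", "R"]}

# Deterministic plan: one action per I-state.
pi1 = {"eta_a": "L", "eta_b": "R"}

# Locally randomized (behavioral) plan: one distribution per I-state.
w1 = {"eta_a": {"L": 0.25, "R": 0.75},
      "eta_b": {"L": 0.50, "R": 0.50}}

def act_deterministic(plan, eta):
    return plan[eta]

def act_locally_randomized(plan, eta):
    actions, probs = zip(*plan[eta].items())
    return random.choices(actions, weights=probs)[0]   # sample only when needed

print(act_deterministic(pi1, "eta_a"))        # -> 'L'
print(act_locally_randomized(w1, "eta_b"))    # -> 'L' or 'R'
```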

A randomized saddle point on the space of locally randomized plans does not exist for all sequential games [59]. This is unfortunate because this form of randomization seems most natural for the way decisions are made during execution. At least for the stage-by-stage model, a randomized saddle point always exists on the space of locally randomized plans. For the open-loop model, randomized saddle points are only guaranteed to exist using a globally randomized plan (this was actually done in Section 10.5.1). To help understand the problem, suppose that the game tree is a balanced, binary tree with k stages (hence, 2k levels). For each player, there are 2^k possible deterministic plans. This means that 2^k − 1 probability values may be assigned independently (the last one is constrained to force them to sum to 1) to define a globally randomized plan over the space of deterministic plans. Defining a locally randomized plan, there are k I-states for each player, one for each stage. At each stage, a probability distribution is defined over the action set, which contains only two elements. Thus, each of these distributions has only one independent parameter. A randomized plan is specified in this way using k − 1 independent parameters. Since k − 1 is much less than 2^k − 1, there are many globally randomized plans that cannot be expressed as a locally randomized plan. Unfortunately, in some games the locally randomized representation removes the randomized saddle point.
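A quick numerical check of the comparison above, using the expressions as stated (a minimal sketch):

```python
# Independent parameters needed to specify one player's randomized plan in a
# balanced binary game tree with k stages, per the expressions in the text:
# 2^k - 1 (globally randomized) versus k - 1 (locally randomized).
for k in [2, 3, 5, 10]:
    global_params = 2**k - 1
    local_params = k - 1
    print(f"k = {k:2d}: global = {global_params:5d}, local = {local_params:2d}")
```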

This strange result arises mainly because players can forget information over time. A player with perfect recall remembers its own actions and also never forgets any information that it previously knew. It was shown by Kuhn that the space of all globally randomized plans is equivalent to the space of all locally randomized plans if and only if the players have perfect memory [562]. Thus, by sticking to games in which all players have perfect recall, a randomized saddle point always exists in the space of locally randomized plans. The result of Kuhn even holds for the more general case of the existence of randomized Nash equilibria on the space of locally randomized plans.

The nondeterministic I-states can be used in game trees that involve more players. Accordingly, deterministic, globally randomized, and locally randomized plans can be defined. The result of Kuhn applies to any number of players, which ensures the existence of a randomized Nash equilibrium on the space of locally randomized strategies if (and only if) the players have perfect recall. It is generally preferable to exploit this fact and decompose the game tree into smaller matrix games, as described in Section 10.5.1. It turns out that the precise condition that allows this is that the game must be ladder-nested [59]. This means that there are decision vertices, other than the root, at which 1) the player that must make a decision knows it is at that vertex (the nondeterministic I-state is a singleton set), and 2) the nondeterministic I-state will not leave the subtree rooted at that vertex (vertices outside of the subtree cannot be circled when drawing the game tree). In this case, the game tree can be decomposed at these special decision vertices and replaced with the game value(s). Unfortunately, there is still the nuisance of multiple Nash equilibria.

It may seem odd that nondeterministic I-states were defined without being derived from a history I-space. Without much difficulty, it is possible to define a sensing model that leads to the nondeterministic I-states used in this section. In many cases, the I-state can be expressed using only a subset of the action histories. Let u˜k and v˜k denote the action histories of P1 and P2, respectively. The history I-state for the alternating-play model at stage k is (u˜k−1, v˜k−1) for P1 and (u˜k, v˜k−1) for P2. The history I-state for the stage-by-stage model is (u˜k−1, v˜k−1) for both players. The nondeterministic I-states used in this section can be derived from these histories. For other models, such as the one in Figure 11.31, a sensing model is additionally needed because only partial information regarding some actions appears. This leads into the formulation covered in the next section, which involves both sensing models and a state space.

11.7.2 Information Spaces for Games on State Spaces

I-space concepts can also be incorporated into sequential games that are played over state spaces. The resulting formulation naturally extends Formulation 11.1 of Section 11.1 to multiple players. Rather than starting with two players and generalizing later, the full generality of having n players is assumed up front. The focus in this section is primarily on characterizing I-spaces for such games, rather than solving them. Solution approaches depend heavily on the particular information models; therefore, they will not be covered here.

As in Section 11.7.1, each player has its own frame of reference and therefore its own I-space. The I-state for each player indicates its information regarding a common game state. This is the same state as introduced in Section 10.5; however, each player may have different observations and may not know the actions of others. Therefore, the I-state is different for each decision maker. In the case of perfect state sensing, these I-spaces all collapse to X.

Suppose that there are n players. As presented in Section 10.5, each player has its own action space, U^i; however, here it is not allowed to depend on x, because the state may generally be unknown. It can depend, however, on the I-state. If nature actions may interfere with the state transition equation, then (10.120) is used (if there are two players); otherwise, (10.121) is used, which leads to predictable future states if the actions of all of the players are given. A single nature action, θ ∈ Θ(x, u^1, u^2, . . . , u^n), is used to model the effect of nature across all players when uncertainty in prediction exists.

Any of the sensor models from Section 11.1.1 may be defined in the case of multiple players. Each player has its own observation space Y^i and sensor mapping h^i. For each player, nature may interfere with observations through nature sensing actions selected from Ψ^i(x). A state-nature sensor mapping appears as y^i = h^i(x, ψ^i); state sensor mappings and history-based sensor mappings may also be defined.
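A minimal sketch of per-player sensor mappings over a shared state (the state representation and observation rules here are entirely hypothetical): both players observe the same x, but through different mappings h^i and different nature sensing actions ψ^i.

```python
import random

# Shared game state: positions of two tokens on a line (hypothetical model).
x = {"p1_pos": 2, "p2_pos": 7}

def h1(x, psi1):
    """P1 observes the distance between the tokens, corrupted by nature."""
    return abs(x["p1_pos"] - x["p2_pos"]) + psi1

def h2(x):
    """P2 only observes which half of the line P1 occupies (a state sensor mapping)."""
    return "left" if x["p1_pos"] < 5 else "right"

psi1 = random.choice([-1, 0, 1])   # nature sensing action drawn from Psi^1(x)
y1 = h1(x, psi1)                   # y^1 = h^1(x, psi^1)
y2 = h2(x)                         # y^2 = h^2(x)
print(y1, y2)
```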

Consider how the game appears to a single player at stage k. What information might be available for making a decision? Each player produces the following in the most general case: 1) an initial condition, η^i_0; 2) an action history, ũ^i_{k−1}; and 3) an observation history, ỹ^i_k. It must be specified whether one player knows the previous actions that have been applied by other players. It might even be possible for one player to receive the observations of other players. If Pi receives all of this information, its history I-state at stage k is

η^i_k = (η^i_0, ũ^1_{k−1}, ũ^2_{k−1}, . . . , ũ^n_{k−1}, ỹ^1_k, ỹ^2_k, . . . , ỹ^n_k). (11.84)

In most situations, however, η^i_k only includes a subset of the histories from (11.84). A typical situation is

η^i_k = (η^i_0, ũ^i_{k−1}, ỹ^i_k), (11.85)

which means that Pi knows only its own actions and observations. Another possibility is that all players know all actions that have been applied, but they do not receive the observations of other players. This results in

η^i_k = (η^i_0, ũ^1_{k−1}, ũ^2_{k−1}, . . . , ũ^n_{k−1}, ỹ^i_k). (11.86)

Of course, many special cases may be defined by generalizing many of the examples in this chapter. For example, an intriguing sensorless game may be defined in which the history I-state consists only of actions. This could yield

η^i_k = (η^i_0, ũ^1_{k−1}, ũ^2_{k−1}, . . . , ũ^n_{k−1}), (11.87)

or even a more secretive game in which the actions of other players are not known:

η^i_k = (η^i_0, ũ^i_{k−1}). (11.88)

Once the I-state has been decided upon, a history I-space I^i_hist for each player is defined as the set of all history I-states. In general, I-maps and derived I-spaces can be defined to yield alternative simplifications of each history I-space.
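A sketch of how the variants (11.84)–(11.88) differ only in which histories are retained (the class and helper names below are hypothetical, chosen just for illustration):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class HistoryIState:
    """History I-state of player i at stage k, in the spirit of (11.84)-(11.88)."""
    eta0: Any                                                          # initial condition eta^i_0
    actions: Dict[int, List[Any]] = field(default_factory=dict)       # u-tilde^j_{k-1} per player j
    observations: Dict[int, List[Any]] = field(default_factory=dict)  # y-tilde^j_k per player j

def own_history_i_state(i, eta0, all_actions, all_observations):
    """Variant (11.85): player i keeps only its own actions and observations."""
    return HistoryIState(eta0,
                         {i: list(all_actions[i])},
                         {i: list(all_observations[i])})

def shared_action_i_state(i, eta0, all_actions, all_observations):
    """Variant (11.86): all action histories are known, but only own observations."""
    return HistoryIState(eta0,
                         {j: list(a) for j, a in all_actions.items()},
                         {i: list(all_observations[i])})
```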

Assuming all spaces are finite, the concepts given so far can be organized into a sequential game formulation that is the imperfect state information counterpart of Formulation 10.4:

Formulation 11.4 (Sequential Game with I-Spaces)

    1. A set of n players, P1, P2, . . ., Pn.
    2. A nonempty, finite state space X.
    3. For each Pi, a finite action space U^i. We also allow a more general definition, in which the set of available choices depends on the history I-state; this can be written as U^i(η^i).
    4. A finite nature action space Θ(x, u^1, . . . , u^n) for each x ∈ X and u^i ∈ U^i for each i such that 1 ≤ i ≤ n.
    5. A state transition function f that produces a state, f(x, u^1, . . . , u^n, θ), for every x ∈ X, θ ∈ Θ(x, u^1, . . . , u^n), and u^i ∈ U^i for each i such that 1 ≤ i ≤ n.
    6. For each Pi, a finite observation space Y^i.
    7. For each Pi, a finite nature sensing action space Ψ^i(x) for each x ∈ X.
    8. For each Pi, a sensor mapping h^i which produces an observation, y^i = h^i(x, ψ^i), for each x ∈ X and ψ^i ∈ Ψ^i(x). This definition assumes a state-nature sensor mapping. A state sensor mapping or history-based sensor mapping, as defined in Section 11.1.1, may alternatively be used.
    9. A set of K stages, each denoted by k, which begins at k = 1 and ends at k = K. Let F = K + 1.
    10. For each Pi, an initial condition η^i_0, which is an element of an initial condition space I^i_0.
    11. For each Pi, a history I-space I^i_hist, which is the set of all history I-states, formed from action and observation histories, and may include the histories of other players.
    12. For each Pi, let L^i denote a stage-additive cost functional,

L^i(x̃_F, ũ^1_K, . . . , ũ^n_K) = Σ_{k=1}^{K} l(x_k, u^1_k, . . . , u^n_k) + l_F(x_F). (11.89)

Extensions exist for cases in which one or more of the spaces are continuous; see [59]. It is also not difficult to add goal sets and termination conditions and allow the stages to run indefinitely.
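As a purely illustrative sketch (none of these names come from an existing library), the ingredients of Formulation 11.4 can be collected into a single container of callables:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence

@dataclass
class SequentialGameWithIStates:
    """Containers mirroring Formulation 11.4 (illustrative field names only)."""
    players: List[str]                                   # P1, ..., Pn
    X: List[Any]                                         # finite state space
    U: Dict[str, Callable[[Any], List[Any]]]             # U^i(eta^i): actions per history I-state
    Theta: Callable[[Any, Sequence[Any]], List[Any]]     # nature actions Theta(x, u^1, ..., u^n)
    f: Callable[[Any, Sequence[Any], Any], Any]          # state transition f(x, u^1, ..., u^n, theta)
    Y: Dict[str, List[Any]]                              # observation spaces Y^i
    Psi: Dict[str, Callable[[Any], List[Any]]]           # nature sensing actions Psi^i(x)
    h: Dict[str, Callable[[Any, Any], Any]]              # sensor mappings y^i = h^i(x, psi^i)
    K: int                                               # number of stages
    eta0: Dict[str, Any]                                 # initial conditions eta^i_0
    L: Dict[str, Callable[[List[Any], Dict[str, List[Any]]], float]]  # cost functionals L^i
```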

An interesting specialization of Formulation 11.4 is when all players have identical cost functions. This is not equivalent to having a single player because the players have different I-states. For example, a task may be for several robots to search for a treasure, but they have limited communication between them. This results in different I-states. They would all like to cooperate, but they are unable to do so without knowing the state. Such problems fall under the subject of team theory [225, 450, 530].

Figure 11.32: In the Battleship game, each player places several ships on a grid. The other player must guess the locations of ships by asking whether a particular tile is occupied.

As for the games considered in Formulation 10.4, each player has its own plan. Since the players do not necessarily know the state, the decisions are based on the I-state. The definitions of a deterministic plan, a globally randomized plan, and a locally randomized plan are essentially the same as in Section 11.7.1. The only difference is that more general I-spaces are defined in the current setting. Various kinds of solution concepts, such as saddle points and Nash equilibria, can be defined for the general game in Formulation 11.4. The existence of locally randomized saddle points and Nash equilibria depends in general on the particular information model [59].

Example 11.27 (Battleship Game) Many interesting I-spaces arise from classical board games. A brief illustration is provided here from Battleship, which is a sequential game under the alternating-turn model. Two players, P1 and P2, each have a collection of battleships, which they arrange secretly on a 10 × 10 grid; see Figure 11.32.

A state is the specification of the exact location of all ships on each player’s grid. The state space yields the set of all possible ship locations for both players. Each player always knows the location of its own ships. Once they are placed on the grid, they are never allowed to move.

The players take turns guessing a single grid tile, expressed as a row and column, that they suspect contains a ship. The possible observations are “hit” and “miss,” depending on whether a ship was at that location. In each turn, a single guess is made, and the players continue taking turns until one player has observed a hit for every tile that was occupied by a ship.

This is an interesting game because once a “hit” is discovered, it is clear that a player should search for other hits in the vicinity because there are going to be several contiguous tiles covered by the same ship. The only problem is that the precise ship position and orientation are unknown. A good player essentially uses the nondeterministic I-state to improve the chances that a hit will occur next.
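A small sketch of the nondeterministic I-state idea in this example, simplified to a single ship on a hypothetical 5 × 5 grid: the I-state is the set of ship placements that remain consistent with the hits and misses observed so far.

```python
from itertools import product

def placements(n, length):
    """All placements of one ship of the given length on an n x n grid."""
    cells = []
    for r, c in product(range(n), range(n)):
        if c + length <= n:
            cells.append(frozenset((r, c + i) for i in range(length)))  # horizontal
        if r + length <= n:
            cells.append(frozenset((r + i, c) for i in range(length)))  # vertical
    return set(cells)

def update_i_state(i_state, guess, hit):
    """Keep only the placements consistent with the observation at 'guess'."""
    return {p for p in i_state if (guess in p) == hit}

# Start with full ignorance: a 5 x 5 grid and one ship of length 3 (hypothetical sizes).
eta = placements(5, 3)
eta = update_i_state(eta, (2, 2), hit=True)    # a hit at (2, 2)
eta = update_i_state(eta, (2, 3), hit=False)   # a miss at (2, 3)
print(len(eta), "placements remain consistent")
```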

Example 11.28 (The Princess and the Monster) This is a classic example from game theory that involves no sensing. A princess and a monster move about in a 2D environment. A simple motion model is assumed; for example, they take single steps on a grid. The princess is trying not to be discovered by the monster, and the game is played in complete darkness. The game ends when the monster and the princess are on the same grid point. There is no form of feedback that can be used during the game; however, it is possible to construct nondeterministic I-states for the players. For most environments, it is impossible for the monster to be guaranteed to win; however, for some environments it is guaranteed to succeed. This example can be considered as a special kind of pursuit-evasion game. A continuous-time pursuit-evasion game that involves I-spaces is covered in Section 12.4.
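A minimal sketch of the monster's nondeterministic I-state in this sensorless setting (the grid, motion model, and starting set below are hypothetical): the set of positions the princess might occupy is forward projected through all of her possible moves and then pruned by the fact that the game has not ended yet.

```python
def forward_project(possible, free_cells):
    """All positions the princess could reach in one step (she may also stay put)."""
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    nxt = set()
    for (r, c) in possible:
        for dr, dc in moves:
            q = (r + dr, c + dc)
            if q in free_cells:
                nxt.add(q)
    return nxt

def update_monster_i_state(possible, free_cells, monster_pos):
    """One stage of the monster's nondeterministic I-state in the dark."""
    possible = forward_project(possible, free_cells)
    possible.discard(monster_pos)   # the game did not end, so she is not here
    return possible

# Hypothetical 4 x 4 grid with no obstacles.
free_cells = {(r, c) for r in range(4) for c in range(4)}
possible = set(free_cells)          # initially, the princess could be anywhere
possible = update_monster_i_state(possible, free_cells, monster_pos=(0, 0))
print(len(possible), "possible princess positions")
```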

Further Reading

The basic concept of an information space can be traced back to work of Kuhn [562] in the context of game trees. There, the nondeterministic I-state is referred to as an information set. After spreading throughout game theory, the concept was also borrowed into stochastic control theory (see [95, 564]). The term information space is used extensively in [59] in the context of sequential and differential game theory. For further reading on I-spaces in game theory, see [59, 759]. In artificial intelligence literature, I-states are referred to as belief states and are particularly important in the study of POMDPs; see the literature suggested at the end of Chapter 12. The observability problem in control theory also results in I-spaces [192, 308, 478, 912], in which observers are used to reconstruct the current state from the history I-state. In robotics literature, they have been called hyperstates [396] and knowledge states [315]. Concepts closely related to I-spaces also appear as perceptual equivalence classes in [287] and also appear in the information invariants framework of Donald [286]. I-spaces were proposed as a general way to represent planning under sensing uncertainty in [68, 604, 605]. For further reading on sensors in general, see [352].

The Kalman filter is covered in great detail in numerous other texts; see, for example, [226, 564, 912]. The original reference is [500]. For more on particle filters, see [45, 293, 350, 536].

Exercises

    1. Forward projections in Indet:
      1. Starting from a nondeterministic I-state, Xk(ηk), and applying an action uk, derive an expression for the nondeterministic one-stage forward projection by extending the presentation in Section 10.1.2.
      2. Determine an expression for the two-stage forward projection starting from Xk(ηk) and applying uk and uk+1.

    2. Forward projections in Iprob:
      1. Starting from a probabilistic I-state, P (xk|ηk), and applying an action uk, derive an expression for the probabilistic one-stage forward projection.
      2. Determine an expression for the two-stage forward projection starting from P (xk|ηk) and applying uk and uk+1.

    3. Determine the strong and weak backprojections on Ihist for a given history I-state, ηk. These should give sets of possible ηk−1 ∈ Ihist.

    4. At the end of Section 11.3.2, it was mentioned that an equivalent DFA can be constructed from an NFA.
      1. Give an explicit DFA that accepts the same set of strings as the NFA in Figure 11.8b.
      2. Express the problem of determining whether the NFA in Figure 11.8b accepts any strings as a planning problem using Formulation 2.1.
    5. This problem involves computing probabilistic I-states for Example 11.14. Let the initial I-state be

P (x1) = [1/3 1/3 1/3], (11.90)

in which the ith entry in the vector indicates P (x1 = i + 1). Let U = {0, 1}. For each action, a state transition matrix can be specified, which gives the probabilities P (xk+1|xk, uk). For u = 0, let P (xk+1|xk, uk = 0) be

[ 4/5   1/5   0    ]
[ 1/10  4/5   1/10 ]     (11.91)
[ 0     1/5   4/5  ]

The jth entry of the ith row yields P (xk+1 = i | xk = j, uk = 0). For u = 1, let P (xk+1 | xk, uk = 1) be

[ 1/10  4/5   1/10 ]
[ 0     1/5   4/5  ]     (11.92)
[ 0     0     1    ]

The sensing model is specified by three vectors:

P (yk|xk = 0) = [4/5 1/5], (11.93)

P (yk|xk = 1) = [1/2 1/2], (11.94)

and

P (yk|xk = 2) = [1/5 4/5], (11.95)

in which the ith component yields P (yk = i | xk). Suppose that k = 3 and the history I-state obtained so far is

(η0, u1, u2, y1, y2, y3) = (η0, 1, 0, 1, 0, 0). (11.96)

The task is to compute the probabilistic I-state. Starting from P (x1), compute the following distributions: P (x1|η1), P (x2|η1, u1), P (x2|η2), P (x3|η2, u2), P (x3|η3).

Figure 11.33: (a) A topological graph in which a point moves (note that two vertices are vertically aligned). (b) An exercise that is a variant of Example 11.17.

    6. Explain why it is not possible to reach every nondeterministic I-state from every other one for Example 11.7. Give an example of a nondeterministic I-state that cannot be reached from the initial I-state. Completely characterize the reachability of nondeterministic I-states from all possible initial conditions.
    7. In the same spirit as Example 11.21, consider a point moving on the topological graph shown in Figure 11.33. Fully characterize the connectivity of Indet (you may exploit symmetries to simplify the answer).

    8. Design an I-map for Example 11.17 that is not necessarily sufficient but leads to a solution plan defined over only three derived I-states.
    9. Consider the discrete problem in Figure 11.33b, using the same sensing and motion model as in Example 11.17.
      1. Develop a sufficient I-map and a solution plan that uses as few derived I-states as possible.
      2. Develop an I-map that is not necessarily sufficient, and a solution plan that uses as few derived I-states as possible.
    10. Suppose that there are two I-maps, κ1 : I1 → I2 and κ2 : I2 → I3, and it is given that κ1 is sufficient with respect to I1, and κ2 is sufficient with respect to I2. Determine whether the I-map κ2 ◦ κ1 is sufficient with respect to I1, and prove your claim.

    11. Propose a solution to Example 11.16 that uses fewer nondeterministic I-states.
    12. Suppose that a point robot moves in R2 and receives observations from three homing beacons that are not collinear and originate from known locations. Assume that the robot can calibrate the three observations on S1.
      1. Prove that the robot can always recover its position in R2.
      2. What can the robot infer if there are only two beacons?
    13. Nondeterministic I-state problems:
      1. Prove that the nondeterministic I-states for Example 11.23 are always a single connected region whose boundary is composed only of circular arcs and line segments.
      2. Design an algorithm for efficiently computing the nondeterministic I-states from stage to stage.
    14. Design an algorithm that takes as input a simply connected rectilinear region (i.e., described by a polygon that has all right angles) and a goal state, and designs a sequence of tray tilts that guarantees the ball will come to rest at the goal. Example 11.24 provides an illustration.
    15. Extend the game-theoretic formulation of history I-spaces from Section 11.7.2 to continuous time.
    16. Consider the “where did I come from?” problem.
      1. Derive an expression for X1(ηk).
      2. Derive an expression for P (x1|ηk).
    17. In the game of Example 11.27, could there exist a point in the game at which one player has not yet observed every possible “hit” yet it knows the state of the game (i.e., the exact location of all ships)? Explain.
    18. When playing blackjack in casinos, many card-counting strategies involve remembering simple statistics of the cards, rather than the entire history of cards seen so far. Define a game of blackjack and card counting as an example of history I-states and an I-map that dramatically reduces the size of the I-space, and an information-feedback plan.

Implementations

    1. Implement the Kalman filter for the case of a robot moving in the plane. Show the confidence ellipsoids obtained during execution. Be careful of numerical issues (see [564]).
    2. Implement probabilistic I-state computations for a point robot moving in a 2D polygonal environment. Compare the efficiency and accuracy of grid-based ap- proximations to particle filtering.
    3. Design and implement an algorithm that uses nondeterministic I-states to play a good game of Battleship, as explained in Example 11.27.
