9.2 A Game Against Nature

9.2.1 Modeling Nature

For the first time in this book, uncertainty will be directly modeled. There are two DMs:

Robot: This is the name given to the primary DM throughout the book. So far, there has been only one DM. Now that there are two, the name is more important because it will be used to distinguish the DMs from each other.

Nature: This DM is a mysterious force that is unpredictable to the robot. It has its own set of actions, and it can choose them in a way that interferes with the achievements of the robot. Nature can be considered as a synthetic DM that is constructed for the purposes of modeling uncertainty in the decision-making or planning process.

Imagine that the robot and nature each make a decision. Each has a set of actions to choose from. Suppose that the cost depends on which actions are chosen by each. The cost still represents the effect of the outcome on the robot; however, the robot must now take into account the influence of nature on the cost. Since nature is unpredictable, the robot must formulate a model of its behavior. Assume that the robot has a set, U, of actions, as before. It is now assumed that nature also has a set of actions. This is referred to as the nature action space and is denoted by Θ. A nature action is denoted as θ ∈ Θ. It now seems appropriate to call U the robot action space; however, for convenience, it will often be referred to as the action space, in which the robot is implied.

This leads to the following formulation, which extends Formulation 9.1.

Formulation 9.3 (A Game Against Nature)

        1. A nonempty set U called the (robot) action space. Each u ∈ U is referred to as an action.

        2. A nonempty set Θ called the nature action space. Each θ ∈ Θ is referred to as a nature action.

        3. A function L : U × Θ → R ∪ {∞}, called the cost function.

The cost function, L, now depends on u ∈ U and θ ∈ Θ. If U and Θ are finite, then it is convenient to specify L as a |U| × |Θ| matrix called the cost matrix.

Example 9.8 (A Simple Game Against Nature) Suppose that U and Θ each contain three actions. This results in nine possible outcomes, which can be specified by the following cost matrix:

                Θ
           1   −1    0
    U     −1    2   −2
           2   −1    1
The robot action, u ∈ U, selects a row, and the nature action, θ ∈ Θ, selects a column. The resulting cost, L(u, θ), is given by the corresponding matrix entry.

In Formulation 9.3, it appears that both DMs act at the same time; nature does not know the robot action before deciding. In many contexts, nature may know the robot action. In this case, a different nature action space can be defined for every u ∈ U. This generalizes Formulation 9.3 to obtain:

Formulation 9.4 (Nature Knows the Robot Action)

  1. A nonempty set U called the action space. Each u ∈ U is referred to as an action.

  2. For each u ∈ U, a nonempty set Θ(u) called the nature action space.
  3. A function L : U × Θ → R ∪ {∞}, called the cost function.

If the robot chooses an action u ∈ U, then nature chooses from Θ(u).

9.2.2 Nondeterministic vs. Probabilistic Models

What is the best decision for the robot, given that it is engaged in a game against nature? This depends on what information the robot has regarding how nature chooses its actions. It will always be assumed that the robot does not know the precise nature action to be chosen; otherwise, it is pointless to define nature. Two alternative models that the robot can use for nature will be considered. From the robot’s perspective, the possible models are

Nondeterministic: I have no idea what nature will do.

Probabilistic: I have been observing nature and gathering statistics.

Under both models, it is assumed that the robot knows Θ in Formulation 9.3 or Θ(u) for all u ∈ U in Formulation 9.4. The nondeterministic and probabilistic terminology are borrowed from Erdmann [313]. In some literature, the term possibilistic is used instead of nondeterministic. This is an excellent term, but it is unfortunately too similar to probabilistic in English.

Assume first that Formulation 9.3 is used and that U and Θ are finite. Under the nondeterministic model, there is no additional information. One reasonable approach in this case is to make a decision by assuming the worst. It can even be imagined that nature knows what action the robot will take, and it will spitefully choose a nature action that drives the cost as high as possible. This pessimistic view is sometimes humorously referred to as Murphy’s Law (“If anything can go wrong, it will.”) [111] or Sod’s Law. In this case, the best action, u∗ ∈ U, is selected as

u∗ = argmin_{u ∈ U} { max_{θ ∈ Θ} L(u, θ) }.        (9.14)

The action u∗ is the lowest cost choice using worst-case analysis. This is sometimes referred to as a minimax solution because of the min and max in (9.14). If U or Θ is infinite, then the min or max may not exist and should be replaced by inf or sup, respectively.

Worst-case analysis may seem too pessimistic in some applications. Perhaps the assumption that all actions in Θ are equally likely may be preferable. This can be handled as a special case of the probabilistic model, which is described next.

Under the probabilistic model, it is assumed that the robot has gathered enough data to reliably estimate P(θ) (or p(θ) if Θ is continuous). In this case, it is imagined that nature applies a randomized strategy, as defined in Section 9.1.3. It is assumed that the applied nature actions have been observed over many trials, and in the future they will continue to be chosen in the same manner, as predicted by the distribution P(θ). Instead of worst-case analysis, expected-case analysis is used. This optimizes the average cost to be received over numerous independent trials. In this case, the best action, u∗ ∈ U, is

u∗ = argmin_{u ∈ U} E_θ[L(u, θ)],        (9.15)

in which E_θ indicates that the expectation is taken according to the probability distribution (or density) over θ. Since Θ and P(θ) together form a probability space, L(u, θ) can be considered as a random variable for each value of u (it assigns a real value to each element of the sample space).³ Using P(θ), the expectation in (9.15) can be expressed as

E_θ[L(u, θ)] = Σ_{θ ∈ Θ} L(u, θ) P(θ).        (9.16)

Example 9.9 (Nondeterministic vs. Probabilistic) Return to Example 9.8. Let U = {u1, u2, u3} represent the robot actions, and let Θ = {θ1, θ2, θ3} represent the nature actions.

Under the nondeterministic model of nature, u∗ = u1, which results in L(u∗, θ) = 1 in the worst case using (9.14). Under the probabilistic model, let P(θ1) = 1/5, P(θ2) = 1/5, and P(θ3) = 3/5. To find the optimal action, (9.15) can be used. This involves computing the expected cost for each action:

E_θ[L(u1, θ)] = (1)1/5 + (−1)1/5 + (0)3/5 = 0
E_θ[L(u2, θ)] = (−1)1/5 + (2)1/5 + (−2)3/5 = −1        (9.17)
E_θ[L(u3, θ)] = (2)1/5 + (−1)1/5 + (1)3/5 = 4/5.

The best action is u∗ = u2, which produces the lowest expected cost, −1.

If the probability distribution had instead been P = [1/10 4/5 1/10], then u∗ = u1 would have been obtained. Hence the best decision depends on P(θ); if this information is statistically valid, then it enables more informed decisions to be made. If such information is not available, then the nondeterministic model may be more suitable.

³Alternatively, a random variable may be defined over U × Θ, and conditional expectation would be taken, in which u is given.

It is possible, however, to assign P(θ) as a uniform distribution in the absence of data. This means that all nature actions are equally likely; however, conclusions based on this are dangerous; see Section 9.5.
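The expected costs in (9.17) are easy to check numerically. The following sketch reproduces Example 9.9 under the probabilistic model; the matrix and distribution are exactly those given above.

```python
# Verifying the expected costs (9.17) of Example 9.9; a minimal sketch.
L = [[1, -1, 0], [-1, 2, -2], [2, -1, 1]]
P = [1/5, 1/5, 3/5]                       # P(theta1), P(theta2), P(theta3)

def expected_costs(L, P):
    """E_theta[L(u, theta)] for each action u (one entry per row)."""
    return [sum(l * p for l, p in zip(row, P)) for row in L]

exp = expected_costs(L, P)
print(exp)                                # roughly [0.0, -1.0, 0.8]
print(min(range(len(L)), key=lambda i: exp[i]))   # 1 -> u2 is optimal
```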

In Formulation 9.4, the nature action space Θ(u) depends on u ∈ U, the robot action. Under the nondeterministic model, (9.14) simply becomes

u∗ = argmin_{u ∈ U} { max_{θ ∈ Θ(u)} L(u, θ) }.        (9.18)

Unfortunately, these problems do not have a nice matrix representation because the size of Θ(u) can vary for different u ∈ U. In the probabilistic case, P(θ) is replaced by a conditional probability distribution P(θ|u). Estimating this distribution requires observing numerous independent trials for each possible u ∈ U. The behavior of nature can now depend on the robot action; however, nature is still characterized by a randomized strategy. It does not adapt its strategy across multiple trials. The expectation in (9.16) now becomes

E_θ[L(u, θ)] = Σ_{θ ∈ Θ(u)} L(u, θ) P(θ|u),        (9.19)

which replaces P(θ) by P(θ|u).
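When Θ(u) varies with u, a dictionary keyed by action is a natural representation. The sketch below evaluates (9.18) and (9.19) on made-up sets and numbers, purely for illustration.

```python
# A sketch of (9.18) and (9.19) when nature's choices depend on the
# robot action. All sets, costs, and probabilities here are invented.
L = {('u1', 'a'): 1, ('u1', 'b'): 3,
     ('u2', 'c'): 0, ('u2', 'd'): 5, ('u2', 'e'): 2}
Theta = {'u1': ['a', 'b'], 'u2': ['c', 'd', 'e']}               # Theta(u)
P = {'u1': {'a': 0.5, 'b': 0.5},
     'u2': {'c': 0.7, 'd': 0.1, 'e': 0.2}}                      # P(theta|u)

worst = {u: max(L[u, th] for th in Theta[u]) for u in Theta}    # (9.18)
expect = {u: sum(L[u, th] * P[u][th] for th in Theta[u]) for u in Theta}  # (9.19)
print(min(worst, key=worst.get))    # 'u1' (worst-case costs: u1 -> 3, u2 -> 5)
print(min(expect, key=expect.get))  # 'u2' (expected costs: u1 -> 2.0, u2 -> 0.9)
```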

Regret It is important to note that the models presented here are not the only accepted ways to make good decisions. In game theory, the key idea is to minimize “regret.” This is the feeling you get after making a bad decision and wishing that you could change it after the game is finished. Suppose that after you choose some u ∈ U, you are told which θ ∈ Θ was applied by nature. The regret is the amount of cost that you could have saved by picking a different action, given the nature action that was applied.

For each combination of u ∈ U and θ ∈ Θ, the regret, T, is defined as

T(u, θ) = max_{u′ ∈ U} { L(u, θ) − L(u′, θ) }.        (9.20)

For Formulation 9.3, if U and Θ are finite, then a |U| × |Θ| regret matrix can be defined.

Suppose that minimizing regret is the primary concern, as opposed to the actual cost received. Under the nondeterministic model, the action that minimizes the worst-case regret is

u∗ = argmin_{u ∈ U} { max_{θ ∈ Θ} T(u, θ) }.        (9.21)

In the probabilistic model, the action that minimizes the expected regret is

u∗ = argmin_{u ∈ U} E_θ[T(u, θ)].        (9.22)

The only difference with respect to (9.14) and (9.15) is that L has been replaced by T . In Section 9.3.2, regret will be discussed in more detail because it forms the basis of optimality concepts in game theory.

Example 9.10 (Regret Matrix) The regret matrix for Example 9.8 is

                Θ
           2    0    2
    U      0    3    0
           3    0    3

Using the nondeterministic model, u∗ = u1, which results in a worst-case regret of 2 using (9.21). Under the probabilistic model, let P(θ1) = P(θ2) = P(θ3) = 1/3. In this case, u∗ = u2, which yields the optimal expected regret, calculated as 1 using (9.22).
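The regret matrix follows mechanically from (9.20): subtract each column's minimum cost from every entry in that column. The sketch below derives the matrix of Example 9.10 from the cost matrix of Example 9.8 and checks both decision rules.

```python
# Building the regret matrix (9.20) from Example 9.8 and reproducing
# the choices of Example 9.10; a minimal sketch.
L = [[1, -1, 0], [-1, 2, -2], [2, -1, 1]]
m, n = len(L), len(L[0])

col_min = [min(L[i][j] for i in range(m)) for j in range(n)]
T = [[L[i][j] - col_min[j] for j in range(n)] for i in range(m)]
print(T)                                  # [[2, 0, 2], [0, 3, 0], [3, 0, 3]]

worst = [max(row) for row in T]           # (9.21): worst-case regrets [2, 3, 3]
print(min(range(m), key=lambda i: worst[i]))   # 0 -> u1, worst-case regret 2

P = [1/3, 1/3, 1/3]
exp = [sum(t * p for t, p in zip(row, P)) for row in T]  # (9.22)
print(min(range(m), key=lambda i: exp[i]))     # 1 -> u2, expected regret 1
```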

9.2.3 Making Use of Observations

Formulations 9.3 and 9.4 do not allow the robot to receive any information (other than L) prior to making its decision. Now suppose that the robot has a sensor that it can check just prior to choosing the best action. This sensor provides an observation or measurement that contains information about which nature action might be chosen. In some contexts, the nature action can be imagined as a kind of state that has already been selected. The observation then provides information about this. For example, nature might select the current temperature in Bangkok. An observation could correspond to a thermometer in Bangkok that takes a reading.

Formulating the problem Let Y denote the observation space, which is the set of all possible observations, y ∈ Y. For convenience, suppose that Y, U, and Θ are all discrete. It will be assumed as part of the model that some constraints on θ are known once y is given. Under the nondeterministic model a set Y(θ) ⊆ Y is specified for every θ ∈ Θ. The set Y(θ) indicates the possible observations, given that the nature action is θ. Under the probabilistic model a conditional probability distribution, P(y|θ), is specified. Examples of sensing models will be given in Section 9.2.4. Many others appear in Sections 11.1.1 and 11.5.1, although they are expressed with respect to a state space X that reduces to Θ in this section. As before, the probabilistic case also requires a prior distribution, P(θ), to be given. This results in the following formulation.

Formulation 9.5 (A Game Against Nature with an Observation)

        1. A finite, nonempty set U called the action space. Each u ∈ U is referred to as an action.

        2. A finite, nonempty set Θ called the nature action space.
        3. A finite, nonempty set Y called the observation space.
        4. A set Y(θ) ⊆ Y or probability distribution P(y|θ) specified for every θ ∈ Θ. This indicates which observations are possible or probable, respectively, if θ is the nature action. In the probabilistic case a prior, P(θ), must also be specified.
        5. A function L : U × Θ → R ∪ {∞}, called the cost function.

Consider solving Formulation 9.5. A strategy is now more complicated than simply specifying an action because we want to completely characterize the behavior of the robot before the observation has been received. This is accomplished by defining a strategy as a function, π : Y → U. For each possible observation, y ∈ Y, the strategy provides an action. We now want to search the space of possible strategies to find the one that makes the best decisions over all possible observations. In this section, Y is actually a special case of an information space, which is the main topic of Chapters 11 and 12. Eventually, a strategy (or plan) will be conditioned on an information state, which generalizes an observation.

Optimal strategies Now consider finding the optimal strategy, denoted by π∗, under the nondeterministic model. The sets Y(θ) for each θ ∈ Θ must be used to determine which nature actions are possible for each observation, y ∈ Y. Let Θ(y) denote this, which is obtained as

Θ(y) = {θ ∈ Θ | y ∈ Y (θ)}. (9.23)

The optimal strategy, π∗, is defined by setting

π∗(y) = argmin_{u ∈ U} { max_{θ ∈ Θ(y)} L(u, θ) },        (9.24)

for each y ∈ Y. Compare this to (9.14), in which the maximum was taken over all Θ. The advantage of having the observation, y, is that the set is restricted to Θ(y) ⊆ Θ.
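A small sketch makes the two steps (9.23) and (9.24) concrete: filter the nature actions consistent with y, then take the worst-case minimum over that restricted set. The sets and costs below are invented for illustration.

```python
# A sketch of (9.23) and (9.24): restrict nature actions to those
# consistent with the observation y, then choose by worst-case cost.
Theta = ['t1', 't2', 't3']
Y_of = {'t1': {'y1', 'y2'}, 't2': {'y2'}, 't3': {'y1'}}   # Y(theta)
L = {('u1', 't1'): 1, ('u1', 't2'): -1, ('u1', 't3'): 0,
     ('u2', 't1'): -1, ('u2', 't2'): 2, ('u2', 't3'): -2}

def strategy(y, U=('u1', 'u2')):
    theta_y = [th for th in Theta if y in Y_of[th]]                # (9.23)
    return min(U, key=lambda u: max(L[u, th] for th in theta_y))   # (9.24)

print(strategy('y1'), strategy('y2'))   # u2 u1
```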

Under the probabilistic model, an operation analogous to (9.23) must be performed. This involves computing P(θ|y) from P(y|θ) to determine the information that y contains regarding θ. Using Bayes’ rule, (9.9), with marginalization on the denominator, the result is

P(θ|y) = P(y|θ) P(θ) / Σ_{θ′ ∈ Θ} P(y|θ′) P(θ′).        (9.25)

To see the connection between the nondeterministic and probabilistic cases, define a probability distribution, P(y|θ), that is nonzero only if y ∈ Y(θ) and use a uniform distribution for P(θ). In this case, (9.25) assigns nonzero probability to precisely the elements of Θ(y) as given in (9.23). Thus, (9.25) is just the probabilistic version of (9.23). The optimal strategy, π∗, is specified for each y ∈ Y as

π∗(y) = argmin_{u ∈ U} E_θ[L(u, θ) | y] = argmin_{u ∈ U} { Σ_{θ ∈ Θ} L(u, θ) P(θ|y) }.        (9.26)

This differs from (9.15) and (9.16) by replacing P(θ) with P(θ|y). For each u, the expectation in (9.26) is called the conditional Bayes’ risk. The optimal strategy, π∗, always selects the action that minimizes this risk. Note that P(θ|y) in (9.26) can be expressed using (9.25), in which the denominator represents P(y) and does not depend on u; therefore, it does not affect the optimization. Due to this, P(y|θ)P(θ) can be used in the place of P(θ|y) in (9.26), and the same π∗ will be obtained. If the spaces are continuous, then probability densities are used in the place of all probability distributions, and the method otherwise remains the same.

Nature acts twice A convenient, alternative formulation can be given by allowing nature to act twice:

  1. First, a nature action, θ ∈ Θ, is chosen but is unknown to the robot.
  2. Following this, a nature observation action is chosen to interfere with the robot’s ability to sense θ.

Let ψ denote a nature observation action, which is chosen from a nature observation action space, Ψ(θ). A sensor mapping, h, can now be defined that yields y = h(θ, ψ) for each θ ∈ Θ and ψ ∈ Ψ(θ). Thus, for each of the two kinds of nature actions, θ ∈ Θ and ψ ∈ Ψ(θ), an observation, y = h(θ, ψ), is given. This yields an alternative way to express Formulation 9.5:

Formulation 9.6 (Nature Interferes with the Observation)

  1. A nonempty, finite set U called the action space.
  2. A nonempty, finite set Θ called the nature action space.
  3. A nonempty, finite set Y called the observation space.
  4. For each θ ∈ Θ, a nonempty set Ψ(θ) called the nature observation action space.

  5. A sensor mapping h : Θ × Ψ → Y.
  6. A function L : U × Θ → R ∪ {∞} called the cost function.

This nicely unifies the nondeterministic and probabilistic models with a single function h. To express a nondeterministic model, it is assumed that any ψ ∈ Ψ(θ) is possible. Using h,

Θ(y) = {θ ∈ Θ | ∃ψ ∈ Ψ(θ) such that y = h(θ, ψ)}. (9.27)

For a probabilistic model, a distribution P(ψ|θ) is specified (often, this may reduce to P(ψ)). Suppose that when the domain of h is restricted to some θ ∈ Θ, it forms an injective mapping from Ψ(θ) to Y. In other words, every nature observation action leads to a unique observation, assuming θ is fixed. Using P(ψ|θ) and h, P(y|θ) is derived as

P(y|θ) = { P(ψ|θ)   for the unique ψ such that y = h(θ, ψ);
         { 0         if no such ψ exists.                         (9.28)

If the injective assumption is lifted, then P(ψ|θ) is replaced by a sum over all ψ for which y = h(θ, ψ). In Formulation 9.6, the only difference between the nondeterministic and probabilistic models is the characterization of ψ, which represents a kind of measurement interference. A strategy still takes the form π : Y → U. A hybrid model is even possible in which one nature action is modeled nondeterministically and the other probabilistically.

Receiving multiple observations Another extension of Formulation 9.5 is to allow multiple observations, y1, y2, . . ., yn, before making a decision. Each yi is assumed to belong to an observation space, Yi. A strategy, π, now depends on all observations:

π : Y1 × Y2 × · · · × Yn → U. (9.29)

Under the nondeterministic model, Yi(θ) is specified for each i and θ ∈ Θ. The set Θ(y) is replaced by

Θ(y1) ∩ Θ(y2) ∩ · · · ∩ Θ(yn)        (9.30)

in (9.24) to obtain the optimal action, π∗(y1, . . . , yn).

Under the probabilistic model, P(yi|θ) is specified instead. It is often assumed that the observations are conditionally independent given θ. This means for any yi, θ, and yj such that i ≠ j, P(yi|θ, yj) = P(yi|θ). The condition P(θ|y) in (9.26) is replaced by P(θ|y1, . . . , yn). Applying Bayes’ rule, and using the conditional independence of the yi’s given θ, yields

P(θ|y1, . . . , yn) = P(y1|θ) P(y2|θ) · · · P(yn|θ) P(θ) / P(y1, . . . , yn).        (9.31)

The denominator can be treated as a constant factor that does not affect the optimization. Therefore, it does not need to be explicitly computed unless the optimal expected cost is needed in addition to the optimal action.

Conditional independence allows a dramatic simplification that avoids the full specification of P(y|θ). Sometimes the conditional independence assumption is used when it is incorrect, just to exploit this simplification. Therefore, a method that uses conditional independence of observations is often called naive Bayes.

9.2.4 Examples of Optimal Decision Making

The framework presented so far characterizes statistical decision theory, which covers a broad range of applications and research issues. Virtually any context in which a decision must be made automatically, by a machine or a person following specified rules, is a candidate for using these concepts. In Chapters 10 through 12, this decision problem will be repeatedly embedded into complicated planning problems. Planning will be viewed as a sequential decision-making process that iteratively modifies states in a state space. Most often, each decision step will be simpler than what usually arises in common applications of decision theory. This is because planning problems are complicated by many other factors. If the decision step in a particular application is already too hard to solve, then an extension to planning appears hopeless.

It is nevertheless important to recognize the challenges in general that arise when modeling and solving decision problems under the framework of this section. Some examples are presented here to help illustrate its enormous power and scope.

Pattern classification

An active field over the past several decades in computer vision and machine learning has been pattern classification [271, 295, 711]. The general problem involves using a set of data to perform classifications. For example, in computer vision, the data correspond to information extracted from an image. These indicate observed features of an object that are used by a vision system to try to classify the object (e.g., “I am looking at a bowl of Vietnamese noodle soup”).

The presentation here represents a highly idealized version of pattern classification. We will assume that all of the appropriate model details, including the required probability distributions, are available. In some contexts, these can be obtained by gathering statistics over large data sets. In many applications, however, obtaining such data is expensive or inaccessible, and classification techniques must be developed in lieu of good information. Some problems are even unsupervised, which means that the set of possible classes must also be discovered automatically. Due to issues such as these, pattern classification remains a challenging research field.

The general model is that nature first determines the class, then observations are obtained regarding the class, and finally the robot action attempts to guess the correct class based on the observations. The problem fits under Formulation 9.5. Let Θ denote a finite set of classes. Since the robot must guess the class, U = Θ. A simple cost function is defined to measure the mismatch between u and θ:

L(u, θ) = { 0   if u = θ (correct classification);
          { 1   if u ≠ θ (incorrect classification).        (9.32)

The nondeterministic model yields a cost of 1 if it is possible that a classification error can be made using action u. Under the probabilistic model, the expectation of (9.32) gives the probability that a classification error will be made given an action u.

The next part of the formulation considers information that is used to make the classification decision. Let Y denote a feature space, in which each y ∈ Y is called a feature or feature vector (often y ∈ R^n). The feature in this context is just an observation, as given in Formulation 9.5. The best classifier or classification rule is a strategy π : Y → U that provides the smallest classification error in the worst case or expected case, depending on the model.

A Bayesian classifier The probabilistic approach is most common in pattern classification. This results in a Bayesian classifier. Here it is assumed that P(y|θ) and P(θ) are given. The distribution of features for a given class is indicated by P(y|θ). The overall frequency of class occurrences is given by P(θ). If large, pre-classified data sets are available, then these distributions can be reliably learned. The feature space is often continuous, which results in a density p(y|θ), even though P(θ) remains a discrete probability distribution. An optimal classifier, π∗, is designed according to (9.26). It performs classification by receiving a feature vector, y, and then declaring that the class is u = π∗(y). The expected cost using (9.32) is the probability of error.

Example 9.11 (Optical Character Recognition) An example of classification is given by a simplified optical character recognition (OCR) problem. Suppose that a camera creates a digital image of a page of text. Segmentation is first performed to determine the location of each letter. Following this, the individual letters must be classified correctly. Let Θ = {A, B, C, D, E, F, G, H}, which would ordinarily include all of the letters of the alphabet.

Suppose that there are three different image processing algorithms:

Shape extractor: This returns s = 0 if the letter is composed of straight edges only, and s = 1 if it contains at least one curve.

End counter: This returns e, the number of segment ends. For example, O has none and X has four.

Hole counter: This returns h, the number of holes enclosed by the character. For example, X has none and O has one.

The feature vector is y = (s, e, h). The values that should be reported under ideal conditions are shown in Figure 9.1. These indicate Θ(s), Θ(e), and Θ(h). The intersection of these yields Θ(y) for any combination of s, e, and h.

Imagine doing classification under the nondeterministic model, with the as- sumption that the features always provide correct information. For y = (0, 2, 1), the only possible letter is A. For y = (1, 0, 2), the only letter is B. If each (s, e, h) is consistent with only one or no letters, then a perfect classifier can be constructed. Unfortunately, (0, 3, 0) is consistent with both E and F. In the worst case, the cost of using (9.32) is 1.

Shape   0:  A, E, F, H
        1:  B, C, D, G

Ends    0:  B, D
        1:  (none)
        2:  A, C, G
        3:  E, F
        4:  H

Holes   0:  C, E, F, G, H
        1:  A, D
        2:  B

Figure 9.1: A mapping from letters to feature values.
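The nondeterministic classifier described above is just a set intersection over the rows of Figure 9.1. The following sketch encodes the figure directly and reproduces the cases discussed in the example.

```python
# A sketch of the nondeterministic classifier for Example 9.11: intersect
# the letter sets from Figure 9.1 consistent with each observed feature.
shape = {0: set('AEFH'), 1: set('BCDG')}
ends  = {0: set('BD'), 1: set(), 2: set('ACG'), 3: set('EF'), 4: set('H')}
holes = {0: set('CEFGH'), 1: set('AD'), 2: set('B')}

def candidates(s, e, h):
    """Theta(y) for the feature vector y = (s, e, h)."""
    return shape[s] & ends[e] & holes[h]

print(candidates(0, 2, 1))   # {'A'}
print(candidates(1, 0, 2))   # {'B'}
print(candidates(0, 3, 0))   # {'E', 'F'} -- ambiguous without more features
```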

One way to fix this is to introduce a new feature. Suppose that an image processing algorithm is used to detect corners. These are places at which two segments meet at a right (90-degree) angle. Let c denote the number of corners, and let the new feature vector be y = (s, e, h, c). The new algorithm nicely distinguishes E from F, for which c = 2 and c = 1, respectively. Now all letters can be correctly classified without errors.

Of course, in practice, the image processing algorithms occasionally make mistakes. A Bayesian classifier can be designed to maximize the probability of success. Assume conditional independence of the observations, which means that the classifier can be considered naive. Suppose that the four image processing algorithms are run over a training data set and the results are recorded. In each case, the correct classification is determined by hand to obtain probabilities P(s|θ), P(e|θ), P(h|θ), and P(c|θ). For example, suppose that the hole counter receives the letter A as input. After running the algorithm over many occurrences of A in text, it may be determined that P(h = 1 | θ = A) = 0.9, which is the correct answer. With smaller probabilities, perhaps P(h = 0 | θ = A) = 0.09 and P(h = 2 | θ = A) = 0.01. Assuming that the output of each image processing algorithm is independent given the input letter, a joint probability can be assigned as

P(y|θ) = P(s, e, h, c | θ) = P(s|θ) P(e|θ) P(h|θ) P(c|θ).        (9.33)

The value of the prior P(θ) can be obtained by running the classifier over large amounts of hand-classified text and recording the relative numbers of occurrences of each letter. It is interesting to note that some context-specific information can be incorporated. If the text is known to be written in Spanish, then P(θ) should be different from that for text written in English. Tailoring P(θ) to the type of text that will appear improves the performance of the resulting classifier.

The classifier makes its decisions by choosing the action that minimizes the probability of error. This error is proportional to

Σ_{θ ∈ Θ} L(u, θ) P(s|θ) P(e|θ) P(h|θ) P(c|θ) P(θ),        (9.34)

by neglecting the constant P(y) in the denominator of Bayes’ rule in (9.26).
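In code, (9.34) amounts to summing the naive Bayes numerator over every class other than the declared one, since L(u, θ) vanishes when u = θ. A minimal sketch, in which the per-feature models and prior are stand-ins for the quantities that would be learned from pre-classified text:

```python
# A sketch of the score (9.34) for the naive Bayes OCR classifier.
# models[i](feature, theta) plays the role of P(feature_i | theta);
# these, and the prior, would come from training data in practice.
def error_score(u, y, Theta, prior, models):
    """Unnormalized probability of error if class u is declared for
    feature vector y = (s, e, h, c)."""
    total = 0.0
    for theta in Theta:
        if theta == u:           # L(u, theta) = 0 for a correct guess
            continue
        p = prior[theta]
        for feature, model in zip(y, models):
            p *= model(feature, theta)
        total += p
    return total

# The optimal guess minimizes this score over all classes:
# u_star = min(Theta, key=lambda u: error_score(u, y, Theta, prior, models))
```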

Parameter estimation

Another important application of the decision-making framework of this section is parameter estimation [89, 268]. In this case, nature selects a parameter, θ ∈ Θ, and Θ represents a parameter space. Through one or more independent trials, some observations are obtained. Each observation should ideally be a direct measurement of θ, but imperfections in the measurement process distort the observation. Usually, Θ ⊆ Y, and in many cases, Y = Θ. The robot action is to guess the parameter that was chosen by nature. Hence, U = Θ. In most applications, all of the spaces are continuous subsets of R^n. The cost function is designed to increase as the error, ∥u − θ∥, becomes larger.

Example 9.12 (Parameter Estimation) Suppose that U = Y = Θ = R. Nature therefore chooses a real-valued parameter, which is estimated. The cost of making a mistake is

L(u, θ) = (u − θ)².        (9.35)

Suppose that a Bayesian approach is taken. The prior probability density p(θ) is given as uniform over an interval [a, b] ⊂ R. An observation is received, but it is noisy. The noise can be modeled as a second action of nature, as described in Section 9.2.3. This leads to a density p(y|θ). Suppose that the noise is modeled with a Gaussian, which results in

p(y|θ) = (1/√(2πσ²)) e^{−(y−θ)²/(2σ²)},        (9.36)

in which the mean is θ and the standard deviation is σ.

The optimal parameter estimate based on y is obtained by selecting u ∈ R to minimize

∫_{−∞}^{∞} L(u, θ) p(θ|y) dθ,        (9.37)

in which

p(θ|y) = p(y|θ) p(θ) / p(y),        (9.38)

by Bayes’ rule. The term p(y) does not depend on θ, and it can therefore be ignored in the optimization. Using the prior density, p(θ) = 0 outside of [a, b]; hence, the domain of integration can be restricted to [a, b]. The value p(θ) = 1/(b − a) inside [a, b] is also a constant that can be ignored in the optimization. Using (9.36), this means that u is selected to optimize

∫_a^b L(u, θ) p(y|θ) dθ,        (9.39)

which can be expressed in terms of the standard error function, erf(x) (the integral, from 0 to a given value, of a Gaussian density).

If a sequence, y1, . . . , yk, of independent observations is obtained, then (9.39) is replaced by

∫_a^b L(u, θ) p(y1|θ) · · · p(yk|θ) dθ.        (9.40)
