Friday, June 27, 2008

Bayesian Thinking

Since this winter, I haven't had a lot of time to work on ChessVortex. But today I revisited, for the umpteen-thousandth time, the Wikipedia page for Bayesian Inference. Since I will probably use Bayesian Inference for some ChessVortex studies I may contrive (and since it is a cornerstone of the Glicko rating system), I decided it might make a good topic for a blog entry.

Now, every time I work with Bayes's theorem, I have to re-read the entire introduction of the Wikipedia article to help me remember exactly what Bayes's theorem means. Today, I invent a mnemonic that I hope will stick. First, the theorem:

P(H|E) = P(E|H)*P(H)/P(E)

This, of course, is the mathematical formula. Let's forget the formula for now and instead think about what it describes in terms of natural language, avoiding numbers, math, and equations at all costs (for a while, at least). For this exercise, I will use an example.

First, let's say we decide to take a hike through the desert. We know that water (our source of Hydration out in the desert) is scarce. We'll give a name to this knowledge: "prior knowledge". In other words, our prior knowledge is that, at any arbitrary time while walking through the desert, we have a low chance (or Probability) to observe a source of Hydration. To put it yet another way, we can say that our Probability to observe Hydration in the desert is low.

Second, we know that Elephants, like sources of Hydration, can also be found in the desert, but we don't expect to see them very often. That is, the Probability to observe Elephants in the desert is low. This is also prior knowledge.

Third, let's imagine some other prior knowledge we have, namely that, if we observe a source of Hydration in the desert, we know that an Elephant must not be far away. We can re-phrase this by saying that the Probability to observe an Elephant increases given the observation of a source of Hydration.

Now, we can use all of this prior knowledge while in the desert and on the quest for some water. Namely, Bayes's theorem says that we know to look hard for a source of Hydration when we see an Elephant. Why? Because (1) we know Hydration is scarce, (2) we know Elephants are even more scarce, but (3) we know that Hydration means Elephants. Since Hydration means Elephants, then conversely we can conclude that Elephants mean Hydration. That's Bayes's theorem.

For completeness, let's turn it into the formula above:
  • Probability of Hydration: P(H)
  • Probability of Elephants: P(E)
  • Probability of Elephants given Hydration: P(E|H)
  • Probability of Hydration given Elephants: P(H|E)
Here is Bayes's theorem again:

P(H|E) = P(E|H)*P(H)/P(E)

One way to think about equations is to examine what will happen if you change the values of the terms in them. For example, if the number of Elephants in the desert decreases, then the probability of seeing Elephants, P(E), decreases. If P(E) decreases, P(H|E) must increase if all other terms stay the same. In other words, as the probability of seeing Elephants decreases, the probability of finding Hydration given the observation of an Elephant, P(H|E), must increase. Such analysis can give equations like Bayes's theorem an intuitive feeling.
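
To see this numerically, here is a small Python sketch that holds P(H) and P(E|H) fixed and shrinks P(E). The values are made up purely for illustration, and the helper name posterior is my own. One caveat the sketch makes explicit: P(E) cannot fall below P(E|H)*P(H), because P(H|E) is a probability and cannot exceed 1.

```python
def posterior(p_e_given_h, p_h, p_e):
    """Bayes's theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    joint = p_e_given_h * p_h  # probability of seeing both E and H
    if p_e < joint:
        raise ValueError("P(E) cannot be smaller than P(E|H) * P(H)")
    return joint / p_e

# Illustrative values only (assumptions, not measurements).
p_h = 0.1
p_e_given_h = 0.5
p_e_values = (0.4, 0.2, 0.1)

# Shrink P(E) while everything else stays the same.
results = [posterior(p_e_given_h, p_h, p_e) for p_e in p_e_values]
for p_e, p_h_given_e in zip(p_e_values, results):
    print(f"P(E) = {p_e:.2f} -> P(H|E) = {p_h_given_e:.3f}")
```

Each time P(E) is halved, P(H|E) doubles, which is just what the formula says should happen when P(E) sits alone in the denominator.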

Now, if we know the numbers for P(H), P(E), and P(E|H), then we can plug those in and get P(H|E). For example, let's say we expect only a 1% (P(H)=0.01) chance of seeing Hydration and a 0.5% (P(E)=0.005) chance of seeing Elephants at any given time in the desert. Let's also assume that 40% (P(E|H)=0.40) of the sources of Hydration in a desert also have an Elephant near. Then the probability that a source of Hydration is near when we observe an Elephant is

P(H|E) = P(E|H)*P(H)/P(E) = 0.40*0.01/0.005 = 0.8 = 80%

So, when we see an Elephant out in the desert, our expectation that Hydration is nearby jumps to a whopping 80%. This makes sense, doesn't it? It's like saying that we expect to see a restaurant near a hotel or fire in proximity to smoke. Bayes's theorem puts numbers to all of these types of intuitive guesses we make every day.
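
For anyone who would rather let a computer do the arithmetic, here is a minimal Python sketch of the same calculation. The function and variable names are my own; the numbers are the ones from the example above.

```python
def posterior(p_e_given_h, p_h, p_e):
    """Bayes's theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

p_h = 0.01          # P(H): chance of seeing Hydration
p_e = 0.005         # P(E): chance of seeing an Elephant
p_e_given_h = 0.40  # P(E|H): chance of an Elephant near a source of Hydration

p_h_given_e = posterior(p_e_given_h, p_h, p_e)
print(f"P(H|E) = {p_h_given_e:.2f}")
```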

Advanced Bayes: The Negative Test

Now we are going to rely a tiny bit more on math, so this is the Advanced section.

To make Bayes's theorem more useful, we need to know how to use it when we get a negative test. First, let's define our Elephant test as "looking around for an Elephant", which we do every time we are thirsty. Let's imagine that we get thirsty and then look around for an Elephant but we don't see one, resulting in a negative test. How can we combine the results of this test with our prior knowledge to estimate our chances for finding Hydration nearby?

Before we do this, we need to be able to define probabilities in terms of their complements. For example, the complement to one's seeing a source of Hydration is one's not seeing a source of Hydration. In terms of probabilities, the complement probability is one minus the probability. So, the Probability to not observe Hydration in the desert (denoted P(H')) is one minus the probability to observe Hydration in the desert:

P(H') = 1 - P(H) = 1 - 0.01 = 0.99 = 99%

If we see sources of Hydration in 1% of the places we stand in the desert, then we expect to not see sources of Hydration in the other 99% of the places we stand. The complement probability is just that simple.

But what is the complement of a conditional probability, like P(E|H)? Remember that P(E|H) is the Probability of seeing an Elephant given that we have observed a source of Hydration. The complement would be the Probability of not seeing an Elephant given that we have observed a source of Hydration. This probability is simply:

P(E'|H) = 1 - P(E|H) = 1 - 0.4 = 0.6 = 60%

This says that since 40% of the sources of Hydration have Elephants around them, 60% won't--it's pretty straightforward.
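
Both complements are one-line calculations. A minimal Python sketch, with variable names of my own choosing:

```python
p_h = 0.01          # P(H)
p_e_given_h = 0.40  # P(E|H)

p_not_h = 1 - p_h                  # P(H'): not seeing Hydration
p_not_e_given_h = 1 - p_e_given_h  # P(E'|H): no Elephant, given Hydration

print(f"P(H')   = {p_not_h:.2f}")
print(f"P(E'|H) = {p_not_e_given_h:.2f}")
```

Note that the complement flips only the event in front of the bar; the condition behind it (H) stays the same. In particular, 1 - P(E|H) is P(E'|H), not P(E|H').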

What about the probability of being near a source of Hydration given that we don't observe an Elephant (P(H|E'))? This is a bit more complicated, but mathematically we get:

P(H|E') = {P(H) - P(H)*P(E|H)}/{1 - P(E)}

To help understand why this equation takes the form that it does, notice that the left side, P(H|E'), is the value we want to calculate and the right side is put into terms we already know: P(H), P(E|H), and P(E). To put it succinctly:

This equation is the simplest equation that puts the value we want to calculate into terms we already know.

This idea explains almost everything about the choices we make for the terms we put equations into.

That equation is not terribly difficult to derive, but I'll press on to show how to use it and give a quick derivation at the end just for the curious (and for my own future reference).

Now, let's pretend that we are in the desert and get thirsty, look around for an Elephant, and don't see one. What is our new Probability for observing Hydration nearby?

P(H|E') = {0.01 - 0.01*0.4}/{1 - 0.005} ≈ 0.006 = 0.6%

So our observation that we can't spot an Elephant drops our expectation of seeing a source of Hydration from a 1% prior probability to a 0.6% posterior probability. In other words, the effect on our estimation of the probability for the presence of Hydration is fairly minimal, mostly because we didn't really expect to see an Elephant in the desert anyway given how rare they are in general (more rare than even water).
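
The negative test is just as easy to compute as the positive one. Here is a short Python sketch of the formula for P(H|E'); the function name posterior_negative is hypothetical and used only for illustration.

```python
def posterior_negative(p_h, p_e_given_h, p_e):
    """P(H|E') = (P(H) - P(H)*P(E|H)) / (1 - P(E))."""
    return (p_h - p_h * p_e_given_h) / (1 - p_e)

# Same numbers as in the example above.
p_h_given_not_e = posterior_negative(p_h=0.01, p_e_given_h=0.40, p_e=0.005)
print(f"P(H|E') = {p_h_given_not_e:.4f}")
```

The exact value is 0.006/0.995, slightly above 0.006, because the denominator 1 - P(E) is so close to 1 when Elephants are rare.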

In my next posting, I'll go one step further and show how to use Bayes's theorem without any direct knowledge of how Probable it is to find Elephants in the desert (no direct knowledge of P(E)). This turns out to be a little more useful in the real world, as I hope to demonstrate soon.

Here's that derivation I was talking about for the curious (notice I begin with a statement of Bayes's theorem with E' substituted for E):

P(H|E') = P(E'|H)*P(H) / P(E')
P(H|E') = {1 - P(E|H)} * P(H) / P(E')
P(H|E') = {P(H) - P(H)*P(E|H)} / {1 - P(E)}

Though not used in the above derivation, of special note are the following identities, which allow for an alternative derivation:

P(x ∩ y) = P(x|y)*P(y) = P(y|x)*P(x)
P(x ∩ y') = P(x) - P(x ∩ y)

Here's the alternative derivation (beginning with the first identity above):

P(H|E') = P(H ∩ E') / P(E')
P(H|E') = {P(H) - P(H ∩ E)} / P(E')
P(H|E') = {P(H) - P(H)*P(E|H)} / P(E')
P(H|E') = {P(H) - P(H)*P(E|H)} / {1 - P(E)}
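
As a sanity check on both derivations, here is a small Python sketch that computes P(H|E') by each route with the numbers from the example and confirms they agree. It also checks that the two posteriors average back to the prior, P(H) = P(H|E)*P(E) + P(H|E')*P(E'), which they must. The variable names are my own.

```python
import math

p_h, p_e, p_e_given_h = 0.01, 0.005, 0.40

# First identity: P(H ∩ E) = P(E|H) * P(H)
p_h_and_e = p_e_given_h * p_h
# Second identity: P(H ∩ E') = P(H) - P(H ∩ E)
p_h_and_not_e = p_h - p_h_and_e

# Route 1: Bayes's theorem with E' substituted for E.
route_1 = (1 - p_e_given_h) * p_h / (1 - p_e)
# Route 2: conditional probability via the two identities.
route_2 = p_h_and_not_e / (1 - p_e)
print("routes agree:", math.isclose(route_1, route_2))

# Weighting each posterior by how often its test result occurs
# should recover the prior P(H).
p_h_given_e = p_h_and_e / p_e
total = p_h_given_e * p_e + route_2 * (1 - p_e)
print("recovers prior:", math.isclose(total, p_h))
```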