Thomas Bayes was an English minister and mathematician who became famous after his death, when a colleague published his solution to the “inverse probability” problem, described below.

Given that you have an urn with 10 black balls and 20 white ones, what’s the probability that by picking randomly you’ll get a white ball? This is a conditional probability problem: the contents of the urn are known, and we ask about the outcome of a draw.

Now what if we get another urn? You don’t know which balls were put in it, but as I go along picking balls and showing them to you, what can be said about the number of black and white balls in the urn? That’s inverse probability, although it’s an old term. Today it falls under inferential, or Bayesian, statistics.

## Conditional Probability

Before proceeding it’s important that you understand the basic probability principles. Our starting point is the conditional probability rule:

`P(A|B) = P(A∩B) / P(B)`

In words: the probability of event A given that event B happened is equal to the probability of the intersection of the two events divided by the probability of B. Let’s use an example to illustrate the concept.

Suppose you have a school with 100 students. Of those, 55 study math (event A), 25 study physics (event B), and 20 study both, as illustrated in the Venn diagram below:

Now what’s the probability that a student picked at random studies math GIVEN that we know he studies physics? Using the definition:

```
P(A|B) = P(A∩B) / P(B)
P(A|B) = 20 / 25
P(A|B) = 0.8
```

So the probability is 80%. Here’s what is going on: we are dividing the number of students who study both math and physics by the total number of students who study physics. This gives us the probability that a student picked at random, and known to study physics, also studies math.
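The same calculation can be sketched in a few lines of Python. The variable names are only illustrative; the counts are the ones from the school example.

```python
# Counts taken from the school example above.
total_students = 100
math_students = 55      # event A
physics_students = 25   # event B
both_students = 20      # A ∩ B

# Turn counts into probabilities.
p_b = physics_students / total_students
p_a_and_b = both_students / total_students

# Definition of conditional probability: P(A|B) = P(A∩B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(round(p_a_given_b, 2))  # 0.8
```

Note that the total number of students cancels out, which is why dividing 20 by 25 directly gives the same answer.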

## Bayes’ Theorem

Remember that:

P(A|B) = probability that student studies math GIVEN he studies physics

P(B|A) = probability that student studies physics GIVEN he studies math

We just calculated P(A|B). What if, from that result, we wanted to calculate the inverse probability, that is, P(B|A)?

That’s where Bayes’ Theorem comes into play. Bayes basically rearranged the conditional probability definition to make it easier to calculate those inverse probabilities.

Conditional probability definition (A given B):

`1. P(A|B) = P(A∩B) / P(B)`

Isolate P(A∩B) in 1:

`2. P(A∩B) = P(A|B) P(B)`

Conditional probability definition (B given A):

`3. P(B|A) = P(A∩B) / P(A)`

Now plug 2 into 3 to get **Bayes’ Theorem**:

```
Bayes' Theorem
P(B|A) = P(A|B) P(B) / P(A)
and
P(A|B) = P(B|A) P(A) / P(B)
```

Here’s one way to think about it: when you multiply P(B|A) by P(A), you are finding the probability of the intersection of the two events. Dividing that result by P(B) then gives the conditional probability P(A|B). For instance, in our example above P(B|A) is the probability that a student studies physics given that the student studies math, which is 20/55. Multiplying that by the number of students who study math, 55, gives 20, which is the number of students who study both subjects. Dividing that number by 25, the number of physics students, gives 0.8, which is the conditional probability of studying math given that the student studies physics.

And here’s how to apply Bayes’ Theorem to calculate the inverse probability. We know that P(A|B) = 0.8, so:

```
P(B|A) = P(A|B) P(B) / P(A)
P(B|A) = (0.8)(0.25) / 0.55 = 20/55 = 0.36
```
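As a quick check, the rearranged formula can be wrapped in a small helper. The function name `bayes` and its parameter names are hypothetical, chosen just for this sketch.

```python
def bayes(p_a_given_b: float, p_b: float, p_a: float) -> float:
    """Return P(B|A) = P(A|B) * P(B) / P(A)."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_given_b * p_b / p_a


# School example: A = studies math, B = studies physics.
p_physics_given_math = bayes(p_a_given_b=0.8, p_b=0.25, p_a=0.55)

print(round(p_physics_given_math, 4))  # 0.3636
```

The result matches the direct count: 20 of the 55 math students also study physics.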

**Example**: Suppose you are a hospital manager, and you are considering the use of a new method to diagnose a rare form of bowel syndrome. You know that only 0.1% of the population suffers from that disease. You also know that if a person has the disease, the test has a 99% chance of turning out positive. If the person doesn’t have the disease, the test has a 98% chance of turning out negative.

**Question**: How feasible is this diagnostic method? That is, given that a test turned out positive, what are the chances of the person really having the disease?

**Solution**:

Let’s say that event DIS is having the disease, and event POS is getting a positive test. To solve the problem we want to find P(DIS|POS). The description of the problem tells us that:

```
P(DIS) = 0.001 (population with the disease)
P(DIS') = 0.999 (population without the disease, so complement of event DIS)
P(POS|DIS) = 0.99 (positive test given patient has disease)
P(POS|DIS') = 0.02 (positive test given patient doesn't have disease)
P(POS'|DIS) = 0.01 (negative test given patient has disease)
P(POS'|DIS') = 0.98 (negative test given patient doesn't have the disease)
```

Bayes’ theorem is the following:

`P(DIS|POS) = P(POS|DIS) P(DIS) / P(POS)`

So we are just missing P(POS).

```
P(POS) = P(DIS)P(POS|DIS) + P(DIS')P(POS|DIS')
P(POS) = 0.001 * 0.99 + 0.999 * 0.02
P(POS) = 0.02097
```

Now we can apply Bayes’ theorem:

`P(DIS|POS) = 0.99 * 0.001 / 0.02097 = 0.0472103`

In other words, given the test turned out to be positive, the person only has a chance of 4.7% of actually having the disease. Clearly this is not a feasible method for diagnosing the rare disease.
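The whole calculation can be reproduced with a short Python sketch. The variable names are illustrative; the three input numbers come straight from the problem statement, and the false-positive rate is derived from the 98% chance of a negative test for a healthy person.

```python
# Inputs from the problem statement.
p_dis = 0.001               # P(DIS): prevalence of the disease
p_pos_given_dis = 0.99      # P(POS|DIS): positive test when the disease is present
p_neg_given_healthy = 0.98  # P(POS'|DIS'): negative test when the disease is absent

# False-positive rate is the complement of the line above.
p_pos_given_healthy = 1 - p_neg_given_healthy  # P(POS|DIS')

# Law of total probability: P(POS) over the diseased and healthy groups.
p_pos = p_dis * p_pos_given_dis + (1 - p_dis) * p_pos_given_healthy

# Bayes' theorem: P(DIS|POS) = P(POS|DIS) P(DIS) / P(POS)
p_dis_given_pos = p_pos_given_dis * p_dis / p_pos

print(round(p_pos, 5))            # 0.02097
print(round(p_dis_given_pos, 4))  # 0.0472
```

The posterior is small because the healthy group is so much larger than the diseased group that its false positives outnumber the true positives.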

**Important Note**: As long as you know the basic conditional probability rule you don’t really need to know Bayes’ theorem to solve any problem. After all, his theorem only rearranges the original rule. For example, you could have solved the problem above using the conditional probability rule:

`P(DIS|POS) = P(DIS ∩ POS) / P(POS)`

You would just need to remember that you can find the intersection of POS and DIS by using this:

`P(DIS ∩ POS) = P(POS|DIS) P(DIS)`

So why is Bayes’ theorem important if we don’t need it? Well, you don’t need it for problems like the above one. However, there are many classes of problems that can be understood and solved much more easily applying Bayes’ theorem. I’ll talk about it next.

## Prior and Posterior Probability

Prior probability is the probability you attribute to a certain event without further knowledge about it. Once you acquire more information you might be able to revise your probability calculations, thus getting a posterior probability.

This kind of calculation is called statistical inference, and Bayes’ theorem provides a very simple and practical framework for it.

In its basic form we could say that P(A) is your prior probability for event A, and after you acquire knowledge that event B also happened your posterior probability of event A becomes P(A|B) = P(B|A)P(A)/P(B) (Bayes’ rule).

We can also use the formula with odds. Odds represent a slightly different application of probability. Say we are rolling a die, and you win if we roll either a 1 or a 2. Clearly your probability of winning is 2/6 or 1/3. However, your odds are 2/4 or 1/2. That is, you’ll win one time for every two times you lose.

**General rule**: if the probability of an event is a/b, the odds in favor of it are a / (b-a). Conversely, if the odds are c/d, the probability of the event is c / (c+d).
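Here is a minimal sketch of the two conversions, using Python’s `fractions.Fraction` to keep the arithmetic exact. The function names are hypothetical, made up for this illustration.

```python
from fractions import Fraction


def probability_to_odds(p: Fraction) -> Fraction:
    """Convert a probability a/b into odds a/(b-a)."""
    if not 0 <= p < 1:
        raise ValueError("probability must be in [0, 1)")
    return p / (1 - p)


def odds_to_probability(odds: Fraction) -> Fraction:
    """Convert odds c/d into a probability c/(c+d)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Die example: you win on a 1 or a 2, so the probability is 2/6.
die_odds = probability_to_odds(Fraction(2, 6))

print(die_odds)                       # 1/2
print(odds_to_probability(die_odds))  # 1/3
```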

Now suppose we are interested in event A again. We have a prior probability P(A), and then event B happens. We could find the posterior probability by applying Bayes’ theorem in the odds form.

`P(A|B) / P(A'|B) = [P(B|A)P(A) / P(B)] / [P(B|A')P(A') / P(B)]`

As you can see we can cancel P(B) on the right side of the equation, getting:

`P(A|B) / P(A'|B) = [P(B|A)P(A)] / [P(B|A')P(A')]`

This can be read as: Posterior odds = Prior odds * Bayes factor, where the prior odds are P(A) / P(A') and the Bayes factor is P(B|A) / P(B|A').
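To see that the odds form agrees with the ordinary form, here is a small sketch that reruns the disease-test example from earlier through it. The function name `posterior_odds` and its parameters are hypothetical.

```python
def posterior_odds(prior: float, likelihood_a: float, likelihood_not_a: float) -> float:
    """Posterior odds of A after observing B.

    prior            -- P(A)
    likelihood_a     -- P(B|A)
    likelihood_not_a -- P(B|A')
    """
    prior_odds = prior / (1 - prior)
    bayes_factor = likelihood_a / likelihood_not_a
    return prior_odds * bayes_factor


# Disease example: A = has the disease, B = positive test.
odds = posterior_odds(prior=0.001, likelihood_a=0.99, likelihood_not_a=0.02)

# Convert the odds back to a probability.
posterior = odds / (1 + odds)

print(round(posterior, 4))  # 0.0472
```

P(B) never has to be computed here, which is the practical advantage of working with odds.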

Here’s an example from the book “Understanding Probability” by Henk Tijms:

**Example**: “It’s believed that a treasure will be in a certain sea area with probability p = 0.4. A search in that area will detect the wreck with probability d = 0.9 if it’s there. What’s the posterior probability of the treasure being in the area when a search didn’t find anything?”

**Solution**: The author uses the odds form to solve the problem.

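Posterior odds = (0.4)(0.1) / [(0.6)(1)] = 1/15, where 0.1 = 1 - d is the probability that the search misses a treasure that is there, and 1 is the probability of finding nothing when the treasure is not there.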

And to transform it back to a probability we simply do 1 / (1+15) = 1/16
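The treasure calculation can be checked with exact fractions. This is only a sketch with made-up variable names; like the solution above, it assumes that a search never reports a find when the treasure is not in the area.

```python
from fractions import Fraction

p_in_area = Fraction(4, 10)          # prior probability p that the treasure is in the area
p_detect_if_there = Fraction(9, 10)  # detection probability d

p_miss_if_there = 1 - p_detect_if_there  # search finds nothing although the treasure is there
p_miss_if_absent = Fraction(1)           # assumption: no false detections

# Posterior odds after an unsuccessful search.
odds_after_search = (p_in_area * p_miss_if_there) / ((1 - p_in_area) * p_miss_if_absent)

# Convert odds c/d back to a probability c/(c+d).
p_after_search = odds_after_search / (1 + odds_after_search)

print(odds_after_search)  # 1/15
print(p_after_search)     # 1/16
```

So an unsuccessful search lowers the probability that the treasure is in the area from 0.4 to 1/16, or about 6%.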