The example says that 'about 1 in a 1000 people have this blood type'. The assumption here is that this proportion applies universally - not just to the 10,000 adults mentioned later in the example as being possible suspects. So it is fair to assert that the probability of a randomly selected innocent person having the matching blood type is about 1 in 1000, i.e. P(E | H) = 1/1000.
In fact P(E) is precisely what is calculated in the denominator of the Bayes equation, i.e. it is equal to
P(E|H)*P(H) + P(E|not H)*P(not H) = 10,999/10,000,000, which is slightly more than 1/1000.
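The arithmetic behind that denominator can be checked with exact fractions. This is only a sketch using the figures quoted in the example; the variable names are illustrative and do not come from the book.

```python
from fractions import Fraction

# Figures from the example; names are illustrative assumptions.
p_not_h = Fraction(1, 10_000)      # prior that Fred is guilty (not H)
p_h = 1 - p_not_h                  # prior that Fred is innocent (H)
p_e_given_h = Fraction(1, 1000)    # blood match if Fred is innocent
p_e_given_not_h = Fraction(1)      # blood match is certain if Fred is guilty

# Law of total probability: the denominator of the Bayes equation.
p_e = p_e_given_h * p_h + p_e_given_not_h * p_not_h

print(p_e)                         # exact value of P(E)
print(p_e > Fraction(1, 1000))     # True: slightly more than 1/1000
```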
First, thanks for the book - quite interesting and new (for me at least).
But your reasoning seems wrong:
H - hypo "Fred is innocent" and h = "not H"
The statement "about 1 in a 1000 people have this blood type" is the UNCONDITIONAL probability (confidence) P(E) = 1/1000.
Then Bayes says (for hypo "h"): P(h|E) = P(E|h)*P(h)/P(E).
If the population is 10,000 and there is no other evidence against Fred,
then "prior" P(h) = 1/10,000.
Apparently P(E|h) = 1 ("Fred WAS present at the crime scene") and then we have:
P(h|E) = 1*(1/10,000)/(1/1000) = 0.1 and P(H|E) = 0.9
To calculate P(E|H) we note P(E) = P(E|h)*P(h) + P(E|H)*P(H).
Prior P(H) = 9,999/10,000 (anybody at the scene, except Fred) and after some math:
P(E|H) = (P(E) - P(E|h)*P(h))/P(H) = (1/1000 - 1*1/10,000)/(9,999/10,000), i.e.
P(E|H) = 1/1,111.
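For readers who want to reproduce these figures, here is a small sketch under this comment's assumption that P(E) = 1/1000 is a fixed, unconditional quantity. The variable names are made up for illustration.

```python
from fractions import Fraction

# Assumption of this comment: P(E) is fixed at 1/1000 regardless of H.
p_e = Fraction(1, 1000)
p_guilty = Fraction(1, 10_000)     # P(h): prior that Fred is guilty
p_innocent = 1 - p_guilty          # P(H): prior that Fred is innocent
p_e_given_guilty = Fraction(1)     # P(E|h): match is certain if guilty

# Bayes for the hypothesis h.
posterior_guilty = p_e_given_guilty * p_guilty / p_e
posterior_innocent = 1 - posterior_guilty

# Solve P(E) = P(E|h)*P(h) + P(E|H)*P(H) for P(E|H).
p_e_given_innocent = (p_e - p_e_given_guilty * p_guilty) / p_innocent

print(posterior_guilty, posterior_innocent, p_e_given_innocent)
```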
And the reasoning: if we know for sure that Fred (with his blood type) WAS NOT at the crime scene, the value of P(E|H) must be LESS than P(E), as he has to be excluded both from the number of suspects with the matching blood type (now 9 instead of 10) and from the population (9,999 instead of 10,000).
The confusion which you are picking up on in this example concerns the 'population', together with the background knowledge.
Neither of these is made explicit in the example (we will fix this in a future edition)
There are actually two populations being referred to: When we refer to the 1 in a 1000 having the matching blood type we are really talking about the 'world population' rather than the 10,000 adult males who were in the town (and are the only ones we assume are possible suspects).
Now we know that one of the 10,000 (the guilty person) has the matching blood type.
The question is how many of the other 9,999 innocent people will also have that blood type.
Now if you assume that the 1 in 1000 match is somehow 'exact' for all samples of 10,000 people then we know that 10 people from the 10,000 will match. So that would mean exactly 9 innocent people of the 10,000 match. On that basis you could conclude - as you do - that P(H|E) = 0.9.
HOWEVER, because the 1 in 1000 blood match type refers to the entire population - and not uniformly to each sample of 10,000 - we have to assume that the probability of any one of the 9,999 innocent people having the same blood type is still 1/1000, i.e. P(E|H) = 1/1000.
When we are talking about P(E | H) we are actually using the 10,000 population rather than the world population so technically taking P(E | H) = 1/1000 is an approximation. We should actually have used
P(E | H) = 9/9999
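To make the difference between the two readings concrete, the sketch below computes the posterior probability of guilt twice: once with P(E|H) = 1/1000 (the world-population rate applied to each innocent person), and once assuming exactly 9 of the 9,999 innocent people match. The function name and its parameters are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def posterior_guilty(p_e_given_innocent: Fraction, n_suspects: int) -> Fraction:
    """P(guilty | E) for a uniform prior over n_suspects and P(E | guilty) = 1."""
    p_guilty = Fraction(1, n_suspects)
    p_innocent = 1 - p_guilty
    # Denominator of Bayes: total probability of observing the match.
    p_e = p_guilty + p_e_given_innocent * p_innocent
    return p_guilty / p_e

# Reading 1: each innocent person matches with the world rate 1/1000.
world_rate = posterior_guilty(Fraction(1, 1000), 10_000)

# Reading 2: exactly 9 of the 9,999 innocent people match.
exact_count = posterior_guilty(Fraction(9, 9999), 10_000)

print(world_rate, float(world_rate))
print(exact_count, float(exact_count))
```

Under these assumptions the two readings give close but not identical answers, which is why the choice of population matters in principle even though the numerical effect is small here.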
I'm sorry to say, but I used that 10,000 figure just to follow your example. You can easily generalize the case to the entire Earth (or Universe :)) population, let's say N:
Provided there is no other evidence against Fred, he is guilty (hypo "h") with prior probability P(h) = 1/N, i.e. P(H) = (N-1)/N. If N -> inf. then Fred is practically innocent (if there is NO other evidence, i.e. no prior knowledge).
- P(E|h)=1, of course (he was at the scene with his blood type);
- P(E) = 1/1000 (or whatever, can be ANY figure!) - UNCONDITIONAL probability. It goes without saying that N shall be > 1000!
Then P(E|H) = (P(E) - P(E|h)*P(h))/P(H) = P(E) * [(N - 1/P(E))/(N-1)].
Apparently the factor in the brackets = (N - 1/P(E))/(N-1) < 1 and, respectively, P(E|H) < P(E).
Therefore the example in your book suggests N -> infinity and P(E|H) = P(E). In the case of a "limited" population of N people who CAN have access to the site:
- if P(E) =1/1000 then P(E|H) = 1/1000*(N-1000)/(N-1).
- if N=10,000 then 1/1000*9,000/9,999=1/1,111 as before
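The dependence on N can be sketched in a few lines. The helper below simply rearranges P(E) = P(E|h)*P(h) + P(E|H)*P(H) with P(h) = 1/N and P(E|h) = 1, treating P(E) as fixed as this comment assumes. The function name is made up for illustration.

```python
from fractions import Fraction

def match_prob_if_innocent(p_e: Fraction, n: int) -> Fraction:
    """P(E|H) implied by a fixed P(E), prior P(h) = 1/n and P(E|h) = 1."""
    p_guilty = Fraction(1, n)
    return (p_e - p_guilty) / (1 - p_guilty)

p_e = Fraction(1, 1000)
for n in (10_000, 1_000_000, 1_000_000_000):
    value = match_prob_if_innocent(p_e, n)
    # The value stays below P(E) and approaches it as n grows.
    print(n, value, float(value))
```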
Just as the prior P(h) (= 1/10,000) is conditioned on the background knowledge of there being only 10,000 possible suspects, so must P(E) be conditioned on the same background knowledge.
Hence it is WRONG to assume P(E)=1/1000.
In fact P(E) = P(E|H)*P(H) + P(E|h)*P(h) and this is exactly the denominator used on page 124, namely it works out as 10,999/10,000,000 = 0.0011 (approximately)
i.e. P(E) is greater than 1/1000
Once again - I do NOT make any suggestions on the exact figures:
P(E) = probability of the statement "about 1 in a XXX people have this blood type" shall NOT depend on whether Fred is guilty or innocent, i.e. it can be any figure: e.g. it can be 1/4 on average (as if the 4 blood types known years ago were equally distributed among the entire Earth population).
In your reasoning we have exactly this:
- if Fred is innocent, then "about 1 in a 1000 people have this blood type"
- and what if Fred is guilty? Does it change overall [worldwide/citywide/villagewide] statistics?
If I follow your logic, I have to revise the scientific (or assumed to be scientific/statistic/objective - choose the name you prefer) PRIOR knowledge that "about 1 in a 1000 people have this blood type" and make it dependent on the innocence of Fred.
I.e. I have to assume P(E | given that in this particular city exactly 10,000 men are tested AND Fred is innocent) = 1/1000 - then you are right. But I'm doubtful whether that was the meaning in your example.