Suppose we are presented with a choice of three closed envelopes. One envelope contains a prize; the other two are empty. After we have selected an envelope, it is revealed that one of the envelopes that we had not selected is empty. We are now permitted to choose again. What should we do? Stick with our initial selection? Randomly choose between the two remaining envelopes? Or pick the remaining envelope, that is, not the one that we selected initially and not the one that has been opened?
This is a famous problem, which is sometimes known as the “Monty Hall Problem” (after the host of a game show that featured a similar game).
As it turns out, the last strategy (always switch to the remaining envelope) is the most beneficial. The problem appears to be paradoxical because the additional information that is revealed (that an envelope we did not select is empty) does not seem to be useful in any way. How can this information affect the probability that our initial guess was correct?
The argument goes as follows. Our initial selection is correct with probability p = 1/3 (because one envelope among the original three contains the prize). If we stick with our original choice, then we should therefore have a 33 percent chance of winning. On the other hand, if in our second choice we choose randomly from the remaining options (meaning that we are as likely to pick the initially chosen envelope as the remaining one), then we will select the correct envelope with probability p = 1/2 (because now one out of two envelopes contains the prize). A random choice is therefore better than staying put!
But this is still not the best strategy. Remember that our initial choice only had a p = 1/3 probability of being correct; in other words, it has probability q = 2/3 of being wrong. The additional information (the opening of an empty envelope) does not change this probability, but it removes all alternatives except one. Since our original choice is wrong with probability q = 2/3 and since now there is only one other envelope remaining, switching to this remaining envelope should lead to a win with probability 2/3, or roughly 67 percent!
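The argument can also be checked exactly, without any randomness, by enumerating every equally likely combination of prize location and first pick. The following is only a sketch under stated assumptions: the function name win_probability is made up for this illustration, and it assumes that the envelope being opened is always empty and never our own pick, and that it is chosen at random when two such envelopes are available.

```python
from fractions import Fraction
from itertools import product

def win_probability(strategy):
    """Exact win probability for 'stick', 'choose', or 'switch'.

    Hypothetical helper for illustration. Assumes the opened envelope is
    always empty and never our first pick.
    """
    total = Fraction(0)
    # Prize location and first pick are each uniform over envelopes 0, 1, 2.
    for prize, first in product(range(3), repeat=2):
        # Envelopes that may be opened: neither our pick nor the prize.
        openable = [e for e in range(3) if e != first and e != prize]
        for opened in openable:
            weight = Fraction(1, 9) / len(openable)
            remaining = [e for e in range(3) if e != opened]
            if strategy == 'stick':
                if first == prize:
                    total += weight
            elif strategy == 'switch':
                other = [e for e in remaining if e != first][0]
                if other == prize:
                    total += weight
            elif strategy == 'choose':
                # Pick uniformly among the two closed envelopes.
                total += weight * Fraction(remaining.count(prize), len(remaining))
            else:
                raise ValueError("unknown strategy: %r" % strategy)
    return total

for name in ('stick', 'choose', 'switch'):
    print(name, win_probability(name))
# prints:
# stick 1/3
# choose 1/2
# switch 2/3
```

Using Fraction rather than floating-point numbers keeps the results exact, so they can be compared directly with the values of p and q in the argument above.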
I don’t know about you, but although the argument above seems compelling, I still find it hard to accept. This is one of those cases where I had to “see it to believe it.” The program in the following listing helped me do exactly that.
import sys
import random as rnd

strategy = sys.argv[1] # must be 'stick', 'choose', or 'switch'

wins = 0
for trial in range( 1000 ):
    # The prize is always in envelope 0 ... but we don't know that!
    envelopes = [0, 1, 2]

    first_choice = rnd.choice( envelopes )

    if first_choice == 0:
        envelopes = [0, rnd.choice( [1,2] ) ] # Randomly retain 1 or 2
    else:
        envelopes = [0, first_choice] # Retain winner and first choice

    if strategy == 'stick':
        second_choice = first_choice
    elif strategy == 'choose':
        second_choice = rnd.choice( envelopes )
    elif strategy == 'switch':
        envelopes.remove( first_choice )
        second_choice = envelopes[0]

    # Remember that the prize is in envelope 0
    if second_choice == 0:
        wins += 1

print( wins )

The program reads our strategy from the command line: the possible choices are stick, choose, and switch. It then performs a thousand trials of the game. The “prize” is always in envelope 0, but we don’t know that. We count the game as a win only if our second choice equals envelope 0.
The results from running this program are consistent with the argument given previously: stick wins in roughly one third of all trials, choose wins about half the time, but switch amazingly wins in roughly two thirds of all cases.
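Because each run is random, the win counts will not hit the predicted values exactly. As a rough guide to how much scatter to expect, the sketch below wraps the same game logic in a function so that all three strategies can be compared in one run, and prints the binomial standard deviation of the win count next to each result. The names simulate and expected_spread, the seed parameter, and the larger trial count are assumptions made for this illustration; they are not part of the listing above.

```python
import math
import random

def simulate(strategy, trials=1000, seed=None):
    """Play the envelope game `trials` times and return the number of wins."""
    rng = random.Random(seed)  # private generator, so runs can be repeated
    wins = 0
    for _ in range(trials):
        # As in the listing above, the prize is always in envelope 0.
        first_choice = rng.choice([0, 1, 2])
        if first_choice == 0:
            envelopes = [0, rng.choice([1, 2])]  # randomly retain 1 or 2
        else:
            envelopes = [0, first_choice]        # retain winner and first choice
        if strategy == 'stick':
            second_choice = first_choice
        elif strategy == 'choose':
            second_choice = rng.choice(envelopes)
        elif strategy == 'switch':
            second_choice = [e for e in envelopes if e != first_choice][0]
        else:
            raise ValueError("unknown strategy: %r" % strategy)
        if second_choice == 0:
            wins += 1
    return wins

def expected_spread(p, trials):
    """Standard deviation of the win count for win probability p."""
    return math.sqrt(trials * p * (1 - p))

trials = 100000
for name, p in (('stick', 1 / 3), ('choose', 1 / 2), ('switch', 2 / 3)):
    wins = simulate(name, trials=trials, seed=1)
    print(name, wins, "expected about", round(p * trials),
          "+/-", round(expected_spread(p, trials)))
```

For the thousand trials used in the listing, the standard deviation for stick and switch comes to about 15 wins, so a result such as 320 or 345 wins for stick is entirely compatible with p = 1/3.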