An Introduction to Mathematica and the “Birthday Problem”

Note to teacher : This unit is intended for high school seniors in AP Statistics. Its intent is to introduce students both to Mathematica and to the Monte Carlo method of solving probability problems, in particular, the “Birthday Problem.” Using Mathematica’s random-number-generating capability along with its ability to work with lists, students can use the computer to simulate the birthdays of any number of people and find approximate but reliable solutions to the “Birthday Problem.” The unit will also serve as an introduction to programming in Mathematica for those students who are not yet familiar with this powerful piece of software. In this unit, students will be asked to copy and evaluate Mathematica code and analyze the output. They will also be asked to modify code and create their own code in order to obtain certain desired results. Questions have been inserted throughout the unit as an aid to assessment. It is recommended that students do not proceed with the unit if they are unable to answer any of the questions, since this may mean that the students are not fully understanding either some concepts of probability or the Mathematica code necessary in the simulation.

NCTM Standards: This unit incorporates several of the goals from the NCTM Statistics Standard. In particular, students will be guided in designing and conducting a “probability experiment” whose results will then be interpreted. Students will also be asked to draw inferences from probability graphs generated by computer simulation.

The Birthday Problem

You may be sitting in a classroom right now with about 25 other students. Do you think that any of you have the same birthdays? It probably seems unlikely, since there are 365 days in the year and only around 25 of you. Write down your guess (as a percentage) about the likelihood of there being a match.

Let's use Mathematica to simulate this problem and get an idea of what the probability is that there is at least one pair of matching birthdays in the class. Here's the plan. We'll let Mathematica create a list of 25 random integers between 1 and 365, inclusive. Each number will represent the birthday of one of the 25 people in the room. We'll assume that people are born on all days of the year with equal likelihood. That is, being born on, say, November 12 is just as likely as being born on, say, February 3. Why do think we restrict the random numbers to integers from 1 to 365?

The following is the Mathematica code for creating such a list. You can try it out by typing it in to a Mathematica notebook and evaluating it by pressing the Shift and Enter keys together.

Table[Random[Integer,{1,365}],{25}]

Were there any matching birthdays? Explain. Can you think of any ways of simulating birthdays without using a computer? See if you can write the code to create a list of 7 random REAL numbers between 2 and 5.

An easier way to check for matches is to have Mathematica sort the list after it creates it. Go back to your first input and sort it as follows. (Don’t forget the square brackets.)

Sort[Table[Random[Integer,{1,365}],{25}]]

An even better way to determine whether or not there is a match is to have Mathematica check the list for us. That way, we won't even have to inspect it. We can do this using the Union function. Type and evaluate each of these inputs.

Union[{3,2,6,1,8}, {6,3,8,1.6,12,9}]

Union[{5,6,6,14,34,77,89,89,89,103,235}]

Notice that when you take the Union of one set, any elements of the set that are repeated are eliminated. Therefore, if the Union of our list is different than the list itself, we must have had at least one match! To test if they are different, we can use the If function. Notice that the If function can take three arguments: the test condition, what to do if the condition tests true, and what to do if it tests false. Try the next input and explain the output. (Note: “=!=” means “not equal.”)

If[{1,2,3,3,4} =!= Union{1,2,3,3,4}, Print["Match!"], Print["No Match"]]

See if you can write an If statement that outputs “Even Number” if an integer is even and “Odd Number” if the integer is odd. You may wish to use the EvenQ function. Note: information on any function can be found by looking it up in the Help menu (top right) or by typing a question mark (“?”) and then the name of the function in a new cell. For example:

?EvenQ

EvenQ[expr] gives True if expr is an even

integer, and False otherwise.

For our experiment, we would like to test whether or not any of the 25 random integers in our list is duplicated. To make our programming a little less cumbersome, let's give our list a name, “datelist.” Then we'll use that list in an If statement. You can simply click on your first input and insert “datelist =” at the beginning of the line.

datelist = Table[Random[Integer, {1,365}],{25}]

Now for our If statement:

If[datelist =!= Union[datelist],

Print["Match!"], Print["Sorry, no match this time."]]

Note: Mathematica generally ignores carriage returns, extra spaces, tabs, etc. Therefore, you should include them as necessary in order to make your code more readable. Evaluate the cell containing the If statement ten times, keeping track of how many times a match is made and record your results. Each time you evaluate that cell, you are really performing a trial. What does each trial represent? Based on this little experiment, what is the approximate probability of their being a match?

So far we've found a way to simulate the birthdays of 25 people and decide if any of them is the same. Of course, doing a mere ten trials is not sufficient to determine the probability with great confidence. This would take hundreds, perhaps thousands, of trials. It would become very tiresome to evaluate that cell hundreds of times and keep a tally on paper. Therefore, let's get Mathematica to do it! To do this we will need a Do loop. Let's see how one works first. Try the next line and explain the output.

Do[Print[i," ", i^2," ", i^3], {i,1,3}]

1 1 1

2 4 8

3 9 27

The “i” in the above cell is called an iterator; it starts at 1 and stops at 3. The Print statement is called three times (while i <= 3). The quotation marks only serve to insert spaces between results. See if you can write a Do statement that will print the integers and the decimal approximations of their square roots from 4 to 16. For this you will want to use the Sqrt function and the N function, which gives decimal approximations of numbers. Note: It is also possible to accomplish this using the Table function.

Now let's build our Do loop. Besides creating and testing many lists, we would also like Mathematica to keep track of how many of those lists contain matches:

c=0;

Do[{datelist[i]=Sort[Table[Random[Integer, {1,365}],{25}]]

If[Union[datelist[i]] =!= datelist[i], c=c+1]

}, {1000}

]

Let’s take a closer look at the above code. “datelist[i]” is a subscripted variable as in xor x. Rather than having a certain numerical value, though, each datelist[i] is its own list of twenty five numbers. That is, datelist[1], datelist[2], datelist[3], ... , datelist[1000] each represent a set of 25 integers from 1 to 365. Each datelist[i] is checked for repeated numbers, and if at least one pair of numbers is repeated, c is increased by one. c, therefore, serves as a counter, keeping track of “the number of groups of 25 people with at least one pair of matching birthdays.” Note that c is set equal to zero in the beginning. The braces in the above code group together all the operations that will be carried out in the Do loop, in this case, 1000 times.

If you tried evaluating this code, you probably noticed that there was no output. That’s because we haven’t yet asked Mathematica to tell us anything. It knows how many of the thousand lists contain matches, but we must ask it to share that information. Let’s have it output the portion of lists that had at least one match:

N[c/1000]

What is the approximate probability that in any given set of 25 people there will be at least two people born on the same day of the year? Explain your answer. In what way could the Mathematica code be modified in order to better approximate the probability?

Note: Using Mathematica to perform excessive numbers of trials is acceptable as long as computer time is not at a premium! Every time you evaluate that cell, Mathematica will approximate the desired probability for 25 people. See if you can modify the code to find the comparable probabilities for only two people. Record your result and compare it to what you would predict based on your knowledge of probability. Hint: What’s the probability of two birthdays not matching?

Do you think there is a certain number of people beyond which there is absolute certainty that there will be at least one match? Explain your reasoning.

Try changing the number of people several times between (2 and 100 people) and record your results. What are the maximum and minimum values for the probability? What appears to be happening to the probability when the number of people gets large? Is this what you would expect?

It is possible to define a function in Mathematica that takes as its argument the number of people and returns the probability of there being a matching birthday in a group of that many people. Before we do that, though, let’s learn about how to create and work with functions in Mathematica. First of all, square brackets are always used, not parentheses, and the variables can be entire words. For reasons which will not be explained here, the independent variable is followed by an underscore, and “:=” is used rather than “=” in defining the function. In the example below, a function called “func” of the independent variable x is defined to be 2x+ 5x + 1. Then func is evaluated at x = 1, 0, and var1 + var2. Note that unless you’ve told Mathematica the values of var1 and var2 beforehand, it just substitutes their sum in for x.

func[x_]:= 2x^2 + 5x + 1

func[1]

func[0]

func[var1+var2]

8

1

2

1 + 5 (var1 + var2) + 2 (var1 + var2)

See if you can write a function called “logsum” that takes two variables, “num1” and “num2” and adds the sum of their natural logs, returning a decimal approximation. In defining your function you will want to use Mathematica’s built in Log and N functions. Then use logsum to find the sum of the logs of 2 and 3. Compare that result with what Mathematica gives for the log of 6. Surprised? Record your results.

Now let’s write a function called “prob” that takes one variable, “people” and returns the corresponding probability:

prob[people_]:= ( c=0;

Do[ { datelist[i] = Sort[Table[Random[Integer, {1,365}],{people}]]

If[Union[datelist[i]] =!= datelist[i], c = c+1] }, {1000}]; N[ c/1000] )

Notice that if we want a function to several operation (such as initializing a variable, carrying out a Do loop, and returning a number), we need to group those operations with parentheses. To find the probability of at least one duplicate birthday in a group of, say, 30 people, all we need to do now is to find prob(30):

prob[30]

0.715

Let’s incorporate our “func” function into a Do loop so that we can evaluate it at several x values all at once:

Do[ Print[func[x]], {x,-2,5} ]

-2

1

8

19

34

53

76

We could have also used the Table function:

Table[func[x], {x,-2,5}]

{-1, -2, 1, 8, 19, 34, 53, 76}

For the purposes of our “Birthday Problem,” we would like to include our “test” function into a Table so that we can evaluate “test” at many different values. Table is preferable to Do in this case since it returns lists, which are easily manipulated in Mathematica. We’ll call our list “data.” We will also include the number of people in the output:

data = Table[{people, prob[people]}, {people,15,30}]

{{15, 0.246}, {16, 0.273}, {17, 0.291},

{18, 0.371}, {19, 0.376}, {20, 0.433},

{21, 0.447}, {22, 0.458}, {23, 0.527},

{24, 0.524}, {25, 0.547}, {26, 0.581},

{27, 0.626}, {28, 0.626}, {29, 0.67},

{30, 0.713}}

Notice that Mathematica outputs a “list of lists.” What is the approximate probability when there are 17 people? About how many people do you need to have in a room in order to have about a 50-50 chance of there being a matching birthday?

We can make the output move visually appealing with the TableForm function:

TableForm[data]

15 0.246

16 0.273

17 0.291

18 0.371

19 0.376

20 0.433

21 0.447

22 0.458

23 0.527

24 0.524

25 0.547

26 0.581

27 0.626

28 0.626

29 0.67

30 0.713

You may have noticed that Mathematica lists the probability for 24 people as slightly less than that for 23. This obviously cannot be true. Did Mathematica make a miscalculation? Explain this phenomenon.

We can also ask Mathematica to display these results graphically:

ListPlot[data]

Notice that built-in Mathematica functions always start with a capital letter, and if they are made up of two English words, they are both capitalized with no spaces in between. We can change the appearance of the graph as follows:

ListPlot[data, PlotRange->{{0,60},{0,1}},

AxesLabel->{" Number of People", "Probability"}]

“PlotRange” and “AxesLabel” are examples of options. To find out what options a function can take you can type “??FunctionName”.

It may seem by looking at the graph that the probability is a linear function of the number of people. Explain why this can’t be so, at least over a large range of people.

See if you can make your own graph that displays the probabilities for 2 through 60 people. You should get something like this:

Does this graph make sense? What appears to be happening to the probability as the number of people gets very large? Explain.

With the “PlotJoined” option we can “connect the dots.”

ListPlot[data, PlotRange->{{0,60},{0,1}}, PlotJoined->True,

AxesLabel->{" Number of People", "Probability"}]

Compare the previous graph with the following graph. See if you can explain why the following graph is much smoother. Hint: It has nothing to do with the options for ListPlot. Rather, it came about from a slight modification of the “prob” function.

Note: Creating the data necessary to produce the above required about an hour of computer time on my computer. (I used 10,000 trials for each of 59 different-sized groups of people, groups of 2 up to groups of 60).

Now that you’re a master at programming probability simulations in Mathematica, see if you can write the code needed to solve the “Birth Month Problem”--the probability of there being at least two people in a room who were born in the same month--using the Monte Carlo Method. Graph probability as a function of people. How many people are required for a match to be a certainty? What about for the probability to be at least 50%?