Phillies Attendance Analysis
Matt Groves

Introduction

The 2009 Major League Baseball season saw the Philadelphia Phillies make the playoffs for the third straight year. They successfully defended their National League pennant and returned to the World Series one year after winning baseball’s ultimate title for the first time since 1980. Coming into the season as the World “bleeping” Champions, with virtually the same team intact and primed to make another run, the Phillies were going to be a hot ticket in Philadelphia. The ball club was not going to have trouble getting fans into the stands, but which aspects of each game affect the attendance on a given day?

During the off-season, teams set up promotional giveaways to attract fans to the ballpark. Common sense says that over a 162-game season a team will struggle to draw big crowds on a daily basis, and promotions, in theory, should help bring fans in. Bill Veeck, a former owner of the Chicago White Sox who is considered the inventor of game-day promotions, believed that every day at the ballpark should be like Mardi Gras in order to enhance each fan’s experience. With that in mind, are the Phillies getting a positive return on their investment, with promotional giveaways drawing bigger crowds? Do other variables affect attendance, such as the Phillies’ winning percentage, the opponent’s winning percentage, whether the game is played at night or during the day, whether it is played on a weekday or a weekend, whether it falls in the first or second half of the season, the importance of the game, and the Phillies’ winning streak heading into the game? Because certain variables did not change during the 2009 Phillies season, factors like ballpark quality, tailgating options, and demographic data can be held constant. Economic factors such as ticket price, average income, and population are treated as constant in this study as well.

Prior Research

Early research on this topic goes back to 1989, when Hansen and Gauthier identified three categories into which most of the research falls. The first category is economic factors such as income, population characteristics, and ticket prices. Next are demographic factors such as the race, sex, and age of each ball club’s target audience. The last category is the attractiveness of the game, using factors like opponent quality, the presence of star athletes, and promotions. The drawback of the economic and demographic factors is that teams cannot easily control or act on them to boost attendance.

More recently, researchers have shifted their focus to factors that management can control or manipulate to influence attendance. This research emphasizes the controllable aspects that make a game more attractive, while still including non-controllable factors in the study to ensure that the model is valid. Boyd and Krehbiel (1999) were the first to study the effect of promotion timing on MLB attendance. Using data for six teams over four seasons, they compared promotions run at weekend games with those run at weekday games, and promotions at night games with those at day games. Their results showed that timing influences a promotion’s effectiveness, meaning management can make a difference with promotions. They found that the greatest impact on attendance occurred when a promotion was run during a day game on a weekday. The other main point of the research was that the results varied by team, so teams should not rely on aggregate data as their only source of information about the effectiveness of promotions.

In 2000, McDonald and Rascher studied whether there are diminishing marginal returns to additional promotions. To measure this, they ran a regression and tested whether the coefficient on the promotion variable is equal to one; a rejection in the direction below one implies diminishing returns. Using forty independent variables, they examined data from twenty-nine MLB teams during the 1996 season and found that promotions increased attendance by 14 percent on average. They did find evidence of diminishing marginal returns, but the returns were not negative for any team. Boyd and Krehbiel did not find any trace of diminishing returns (differences within the samples could be the reason for this). In 2003 they argued that diminishing marginal returns could be due to less expensive or less popular promotions. The fact that promotions may be added to games that are already highly attractive, and would draw a large crowd regardless of the promotion, cannot be ignored here. One would expect that if a game is highly attractive, then the promotion would have less of an impact on attendance.
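The diminishing-returns test described above is, in generic form, a one-sided test of whether a coefficient equals one against the alternative that it is below one. The sketch below is only an illustration of that kind of test, not McDonald and Rascher’s actual specification: it assumes a coefficient estimate and its standard error are already in hand, uses a standard normal critical value (a large-sample assumption), and the function name is hypothetical.

```python
import math


def diminishing_returns_test(coef, std_err, alpha=0.05):
    """One-sided test of H0: coef = 1 against H1: coef < 1.

    Uses the standard normal distribution for the p-value, so it is only
    appropriate for reasonably large samples.
    Returns (t statistic, lower-tail p-value, reject H0?).
    """
    t_stat = (coef - 1.0) / std_err
    # Lower-tail probability P(Z <= t_stat) = 0.5 * erfc(-t_stat / sqrt(2)).
    p_value = 0.5 * math.erfc(-t_stat / math.sqrt(2.0))
    return t_stat, p_value, p_value < alpha
```

For example, an estimate of 0.8 with a standard error of 0.1 gives t = -2.0 and a one-sided p-value of about .023, so the hypothesis of constant returns would be rejected at the 5% level in favor of diminishing returns.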

McDonald and Rascher also introduced the idea that different promotions do not have an equal effect on attendance. They separated promotions into price and nonprice promotions and used the monetary value of a promotion to reflect its attractiveness. Monetary value may not be the only factor in attractiveness, however; it is hard to put a value on a promotion that lets your ten-year-old meet his favorite player or run the bases. Boyd and Krehbiel tried to answer this question by creating categories of promotions. Using multiple regression analysis and three categories of promotions (price discounts, special events, and giveaways), they found that different promotions increased attendance by different amounts. Their results for six teams in the 1999 season showed that a price discount increased attendance by 1,347, a giveaway by 6,207, and a special event by 5,563. The analysis also showed that game attractiveness must be considered along with promotions: they found statistical significance for games played against rivals or on a weekend. The results also showed that running a promotion at a highly attractive game (called stacking) still increased attendance, but the promotion added comparatively little on top of the game’s existing draw. The research has yielded a general consensus that promotions do have a positive impact on attendance and are most effective when run on weekdays and for day games.

Method

For my research I collected data from all 81 of the Phillies’ home games of the 2009 season. I gathered all of my information from the team website, looking up the box score for each game. Attendance is the dependent variable, while the independent variables I chose for the regression are the Phillies’ winning percentage at the time of the game, the opponent’s winning percentage at the time of the game, whether the game was played on a weekend, whether it was a day or night game, the importance of the game (also known as game attractiveness), the Phillies’ win streak going into the game, whether the game was played before or after the All-Star Break, and whether the game had a promotion. Because the sample covers a single season, ticket price, population, stadium quality, tailgating options, and city demographics are all held constant. Here is some information on each independent variable.

1) Phillies Winning Percentage (phillieswp) – The to-date winning percentage of the team heading into each game. For example, if the Phillies have a record of 20-20 heading into the game, their winning percentage for that game is entered as .500.

2) Opponent’s Winning Percentage (opponentwp) – Same as phillieswp, but for the opponent in that day’s game.

3) Day of the Week (weekendgame) – Dummy variable indicating whether the game is played on a weekend or a weekday. Weekend games (Friday, Saturday, or Sunday) are coded with a 1 and weekday games are coded with a 0. For example, a game on a Wednesday is coded as 0.

4) Time of the Day (daygame) – Dummy variable based on the start time of each game. Day games are coded with a 1 and night games are coded with a 0. Day games are those starting between 12 noon and 4 p.m., while night games are those starting between 5 p.m. and 12 midnight.

5) Importance (importance) – The importance of that day’s game, also referred to as game attractiveness. It is graded on a 1-to-5 scale, with 1 meaning no importance and 5 meaning extremely important. Earlier-season games are coded as less important than games played later in the season. Games against division rivals are coded as more important than other games, and games against division leaders and interleague games are also given high importance. Games in which the Phillies can clinch a playoff spot and/or the division title are given extreme importance. The rule followed is that, on average, games increase in importance as the season progresses. For example, a game played against the New York Mets is given a higher importance than a game against the Arizona Diamondbacks, and a game against the Mets in August or September is given a higher importance than a game against the Mets played in April or May.

6) Phillies Win Streak (winstreak) – The win streak the team has going into that day’s game. If the team enters the game on a losing streak, winstreak is coded as 0.

7) Pre or Post All-Star Break (postallstar) – Dummy variable showing which part of the season the game is played in. All games played in the first half of the season, before the All-Star Break, are coded with a 0; all games played in the second half of the season, after the All-Star Break, are coded with a 1. For example, a game played in May is coded 0 and a game played in August is coded 1.

8) Promotions (promotion) – Dummy variable that measures whether that day’s game had a promotional giveaway. Games with a promotion are coded with a 1 and games without a promotion are coded with a 0.
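Most of the coding rules above are mechanical enough to write down directly. The following Python sketch shows one way to derive the coded variables from a game’s date, start time, and the team’s prior results. The function names are hypothetical, the 5 p.m. day/night cutoff is an assumption (the rules above do not assign starts between 4 and 5 p.m.), and the All-Star Break date is passed in rather than hard-coded. Importance is left out because it is a judgment-based rating rather than a formula.

```python
from datetime import date, time


def winning_pct(wins: int, losses: int) -> float:
    """To-date winning percentage; Opening Day (no games played) is coded .000."""
    games = wins + losses
    return 0.0 if games == 0 else wins / games


def weekend_game(game_date: date) -> int:
    """1 for Friday, Saturday, or Sunday games, else 0 (weekday(): Mon=0 ... Sun=6)."""
    return 1 if game_date.weekday() >= 4 else 0


def day_game(start_time: time) -> int:
    """1 if the game starts before 5 p.m. (assumed cutoff), else 0."""
    return 1 if start_time.hour < 17 else 0


def win_streak(prior_results: list) -> int:
    """Consecutive wins entering the game; a losing streak (or no games) is coded 0.

    prior_results holds 'W'/'L' strings in chronological order.
    """
    streak = 0
    for result in reversed(prior_results):
        if result != "W":
            break
        streak += 1
    return streak


def post_all_star(game_date: date, break_date: date) -> int:
    """1 if the game is played after the All-Star Break, else 0."""
    return 1 if game_date > break_date else 0
```

Keeping each rule in its own small function makes it easy to check a handful of games by hand against the box scores before coding all 81.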

I selected these variables to be as consistent with the previous research as possible. The winning percentage variables phillieswp and opponentwp are recorded slightly differently, however. The previous research uses the prior year’s winning percentages for the first series, while I used the to-date winning percentage throughout, so Opening Day is coded as .000 for both teams. Most of the variables are quantitative; they are measured on a numerical scale. Phillieswp, opponentwp, weekendgame, daygame, winstreak, and postallstar are all based on objective data that can be looked up. The qualitative variables, used to describe certain types of information, are importance and promotion. Importance involves some judgment, but it follows the coding rules described above, while promotion is based on whether or not the day’s game had a promotion.

Model

The equation for the regression is as follows:

attendance = constant term + phillieswp*X1 + opponentwp*X2 + importance*X3 + winstreak*X4 + weekendgame*X5 + daygame*X6 + postallstar*X7 + promotion*X8 + error term

where X1 through X8 are the coefficients estimated by the regression.
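The equation above is an ordinary least squares regression with eight regressors plus a constant. As a minimal sketch, assuming the 81 games are stored as a list of dictionaries keyed by the variable names used in this paper, it could be estimated with NumPy as follows. The helper name is hypothetical, no game data is included, and the results reported in this paper may have been produced with different software.

```python
import numpy as np

# Regressors in the same order as the attendance equation.
REGRESSORS = ["phillieswp", "opponentwp", "importance", "winstreak",
              "weekendgame", "daygame", "postallstar", "promotion"]


def fit_attendance(rows):
    """Ordinary least squares fit of attendance on the eight regressors.

    rows: list of dicts holding 'attendance' plus every name in REGRESSORS.
    Returns (coefficients, r_squared), where coefficients maps 'constant'
    and each regressor name to its estimate.
    """
    # Design matrix: a column of ones for the constant, then each regressor.
    X = np.array([[1.0] + [float(r[name]) for name in REGRESSORS] for r in rows])
    y = np.array([float(r["attendance"]) for r in rows])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    r_squared = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    coefficients = dict(zip(["constant"] + REGRESSORS, beta))
    return coefficients, float(r_squared)
```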

The regression produced the following results:

As the results show, three of the variables are statistically significant in a two-sided test and one is significant in a one-sided test. The null hypotheses that the coefficients on opponentwp, importance, and weekendgame equal 0 can be rejected at the 95% confidence level. Opponentwp has a t-value of 3.81, greater than 2, and a p-value (the probability of obtaining a t-value at least that large in absolute value if the null hypothesis were true) of .000, less than .05. Importance has a t-value of 3.53, greater than 2, and a p-value of .001, less than .05. We fail to reject the null hypothesis that the coefficient on promotion equals 0 at the 95% confidence level, but we can reject it at the 90% confidence level. The t-value of promotion is 1.894, less than 2, and its p-value is .057, greater than .05 but less than .10, and much closer to .05 than to .10. Therefore, based on this data, promotions during the 2009 Philadelphia Phillies season had a statistically significant effect on attendance at the 10% level (and at the 5% level in a one-sided test).
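The significance decisions above follow from each t-value. The sketch below, using only Python’s standard library, computes a two-sided p-value with a standard normal approximation and checks it against a chosen significance level. The normal approximation is an assumption: with 81 games and 9 estimated parameters the regression has 72 residual degrees of freedom, so exact p-values from the t distribution are slightly larger, and borderline cases like promotion are best read from the regression output itself. Function names are hypothetical.

```python
import math


def two_sided_p(t_stat):
    """Two-sided p-value from the standard normal approximation to the t distribution."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def significant_at(t_stat, alpha, one_sided=False):
    """True if the t statistic is significant at level alpha (two-sided by default).

    The one-sided version halves the p-value, which assumes the estimate
    has the sign predicted by the alternative hypothesis.
    """
    p = two_sided_p(t_stat)
    if one_sided:
        p /= 2.0
    return p < alpha
```

With t = 1.894 this puts the two-sided p-value between .05 and .10, while halving it for a one-sided test brings it below .05, matching the pattern described for promotion.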

Based on the results, and because winning percentage is coded as a proportion (for example, .500), a one-unit increase in the Phillies’ winning percentage (from .000 to 1.000) would increase attendance by 2,133 people, while the same increase in the opponent’s winning percentage would increase attendance by 5,456 people. In more practical terms, a 10-percentage-point increase (say, from .500 to .600) corresponds to roughly 213 and 546 additional fans, respectively. If the importance of the game increases by 1, attendance increases by 1,037 people. A 1-game increase in the win streak increases attendance by 104 people, and a game played on a weekend brings 1,399 more people to the ballpark. Day games and games played during the second half of the season actually have a negative estimated effect on attendance: according to the regression, a day game decreases attendance by 178 people, while a game played after the All-Star Break decreases attendance by 490 people. The R-squared value is .4154 and the adjusted R-squared is .3504, meaning the model explains only about 42 percent of the game-to-game variation in attendance (35 percent after adjusting for the number of regressors), so the regression line is a rough approximation of the real data points.
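Because phillieswp and opponentwp are coded as proportions (a 20-20 record is .500), a coefficient in the linear model is the change in attendance for a change of 1.000 in winning percentage. The sketch below converts a coefficient into the effect of a smaller change, assuming 5,456 is the raw coefficient on opponentwp, and recomputes the adjusted R-squared from R-squared = .4154, n = 81 games, and k = 8 regressors. Function names are hypothetical.

```python
def attendance_change(coef, change_in_x):
    """Predicted change in attendance for a given change in one regressor (linear model)."""
    return coef * change_in_x


def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R-squared: R-squared penalized for the number of regressors."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Opponent winning percentage going from .500 to .600 is a change of 0.10.
print(round(attendance_change(5456, 0.10)))          # 546
print(round(adjusted_r_squared(0.4154, 81, 8), 4))   # 0.3504
```

The second line reproduces the reported adjusted R-squared of .3504 exactly, which is a useful consistency check on the sample size and number of regressors.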

We can see the relationship between the opponent’s winning percentage and attendance in the following scatter plot:

The graph shows that, for the most part, the higher the opponent’s winning percentage, the higher the attendance. There is a strong cluster of high attendance right around the .500 mark. The outliers are early-season games, when attendance is less likely to be affected by the opponent; attendance during the opening series is likely to be high regardless of the opponent’s quality. Based on the graph, games against an opponent with a better-than-.500 winning percentage tend to draw a high attendance figure.

The following graph shows the relationship between importance of the game and attendance:

The graph shows that the more important games, those coded with a 4 or 5, have higher attendance on average. The games coded with a 2 or 3 include some that drew a large crowd but also some with low attendance; one game coded with a 2 has an attendance figure of around 33,000 people. Based on the graph, importance isn’t the only factor in attracting fans to the ballpark, but important games do tend to have high attendance.

The following graph shows the relationship between weekend games and attendance:

The graph shows that weekend games attract more fans to the ballpark. All games played on a Friday, Saturday, or Sunday have attendance figures between 43,000 and 46,000 people. Some games played on weekdays still drew big crowds, but others had fewer than 43,000 fans in attendance. Weekend attendance has a low standard deviation: the range of the data is much smaller than the range for weekday games. Based on the graph, the day of the week is not the only factor affecting attendance, but games played on weekends will have high attendance.
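The comparison of spread between weekend and weekday games can be made concrete with summary statistics. The following sketch assumes each game is stored as a dictionary with an "attendance" key plus the coded variables (the function name is hypothetical, and no game data is included); it reports the count, mean, standard deviation, and range of attendance for each value of a 0/1 variable.

```python
import statistics
from collections import defaultdict


def summarize_by_dummy(rows, dummy):
    """Count, mean, standard deviation, and range of attendance per value of a 0/1 variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dummy]].append(row["attendance"])
    summary = {}
    for value, attendance in sorted(groups.items()):
        summary[value] = {
            "n": len(attendance),
            "mean": statistics.mean(attendance),
            # statistics.stdev needs at least two observations.
            "stdev": statistics.stdev(attendance) if len(attendance) > 1 else 0.0,
            "range": max(attendance) - min(attendance),
        }
    return summary
```

Calling it with "weekendgame" or "promotion" as the dummy gives the numbers behind the weekend and promotion comparisons.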

The relationship between promotions and attendance is shown on the following graph:

The graph shows that games with promotions have high attendance figures. All games with a promotion have an attendance figure between 40,000 and 46,000 people. Games without a promotion include some with high attendance but also some with low attendance. As with game importance and weekend games, promotions are not the only factor that affects attendance, but a game with a promotion will most likely draw more fans to the ballpark.

The regression above shows the effects of the independent variables on attendance in levels (numbers of fans). To see the effects in percentage terms, we can instead regress the log of attendance on the logs of the same variables.

The equation for the regression of the logs is as follows:

logattendance = constant term + logphillieswp*X1 + logopponentwp*X2 + logimportance*X3 + logwinstreak*X4 + error term
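One practical detail of the log form: the logarithm of zero is undefined, so a game with a zero in any logged variable cannot enter this regression as coded. That includes Opening Day, where both winning percentages are coded .000, and any game the Phillies enter with a win streak of 0. Statistical packages commonly treat such values as missing and drop the game. The sketch below (natural logs, hypothetical function name) makes that filtering explicit.

```python
import math

# Variables that enter the log-log regression (the dummy variables are left out).
LOGGED = ["attendance", "phillieswp", "opponentwp", "importance", "winstreak"]


def log_transform(rows):
    """Natural logs of the LOGGED variables, dropping any game with a value <= 0.

    math.log(0) raises ValueError, so games with a .000 winning percentage
    or a win streak of 0 cannot be included as coded.
    """
    logged = []
    for row in rows:
        if all(row[name] > 0 for name in LOGGED):
            logged.append({"log" + name: math.log(row[name]) for name in LOGGED})
    return logged
```

Comparing len(log_transform(rows)) with the original 81 games shows how many games the log regression can actually use.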

The regression produced the following results:

The promotion, daygame, weekendgame, and postallstar variables are dropped in the log form because they are dummy/binary variables, and a value of 0 has no logarithm. In a log-log regression each coefficient is an elasticity, the percentage change in attendance associated with a 1% change in that variable. The regression shows that a 1% increase in the Phillies’ winning percentage will increase attendance by 0.29%, a 1% increase in the opponent’s winning percentage will increase attendance by 0.08%, and a 1% increase in the importance of the game will increase attendance by 0.05%. Win streak actually has a negative, though very small, impact on attendance in the log form: a 1% increase in the team’s win streak will decrease attendance by 0.0007%. Only the coefficient on the opponent’s winning percentage is significant at the 95% confidence level. The t-value of logopponentwp is 2.43, greater than 2, and its p-value is .020, less than .05.
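In the log-log form each slope coefficient b is an elasticity. The sketch below computes the exact percentage change in attendance implied by an elasticity b for a given percentage change in a regressor; for small changes this is close to b times that change, which is why a coefficient b reads as "a 1% change goes with about a b% change in attendance." The value of b is an input here, not a figure read from the regression output, and the function name is hypothetical.

```python
def pct_change_in_attendance(elasticity, pct_change_in_x):
    """Percent change in attendance implied by a log-log coefficient.

    Exact form: (1 + dx)**b - 1, with dx expressed as a fraction.
    For small dx this is approximately b * dx.
    """
    return ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0) * 100.0
```

For instance, pct_change_in_attendance(0.5, 21.0) returns 10.0 (up to floating-point rounding), because 1.21 raised to the 0.5 power is 1.1.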

The relationship between logattendance and logopponentwp is shown on the following scatter plot:

The graph shows that a larger logarithm of the opponent’s winning percentage is generally associated with a higher logarithm of attendance.

The summary of the data is shown on the following chart:

The summary of the data of the logged variables is shown on the following chart: