Andrew Oxfeld

CPSC 445

Homework 1

Sports Data Mining

Sports are famous for keeping lots of statistics. Baseball players are judged by statistics such as batting average, runs batted in, and earned run average. Football players are ranked by touchdowns, rushing yards, and interceptions. Basketball players are evaluated by points, blocks, and fouls. These statistics are used for nearly everything, including ranking how good a player is, betting on who will win a game, creating fantasy sports games, and providing material for announcers to use during a game. All of these applications of sports statistics rely on data mining techniques.

It is often important in sports to know how “good” a player or a team is. For example, in college football, the BCS system uses a complex formula to rank teams and to decide which two teams make it to the championship game. There is also a multitude of awards for players to win. One example of data mining in this field is the use of a Bayesian classifier to predict the winners of the Cy Young pitching award in baseball. The resulting model was 80% accurate for starting pitchers. In the future, such models could be used to replace the voting process for picking award winners, or at least to highlight cases of biased decision making.
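To make the idea of a Bayesian classifier concrete, here is a minimal sketch of a Gaussian naive Bayes classifier written in plain Python. It is not the model from the Cy Young study; the choice of features (wins, earned run average, strikeouts), the function names, and every number in the data are invented for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # The small constant keeps every variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)

# Invented pitcher seasons: (wins, earned run average, strikeouts).
seasons = [
    (21, 2.40, 250), (19, 2.65, 230), (22, 2.90, 210),
    (12, 4.10, 140), (10, 4.50, 120), (14, 3.80, 160),
]
won_award = [True, True, True, False, False, False]

model = fit_gaussian_nb(seasons, won_award)
prediction = predict(model, (20, 2.70, 240))
```

The classifier treats each statistic as independent given the class, which is a strong simplification, but it shows how past award winners can be turned into a model that scores a new season.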

Teams need to know which players to sign, and how much they should pay them. A player could use data mining to demand a pay raise by using a model to show that their level of pay is not commensurate with their performance. A team could use data mining to guide its draft picks, instead of the current process of scouting players by hand. A fantasy sports website, during the draft phase, might use data mining to assign a dollar value to a player.
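As a rough sketch of how a model could show that pay is not commensurate with performance, the code below fits a least-squares line relating one performance statistic to salary and then compares a player's actual salary with the salary the line predicts. The single-statistic model, the function names, and all of the numbers are assumptions made for illustration; a real valuation would use many more features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pay_gap(slope, intercept, stat, salary):
    """Predicted salary minus actual salary; positive means underpaid relative to the fit."""
    return slope * stat + intercept - salary

# Invented data: points per game and salary in millions of dollars.
points_per_game = [8.0, 12.0, 16.0, 20.0, 24.0]
salaries = [2.0, 4.0, 6.0, 8.0, 10.0]

slope, intercept = fit_line(points_per_game, salaries)
# Hypothetical player: 22 points per game on a 6.5 million dollar salary.
gap = pay_gap(slope, intercept, stat=22.0, salary=6.5)
```

A positive gap is the kind of evidence a player might bring to a contract negotiation, and the same predicted value could serve as a draft price on a fantasy sports website.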

While people have been predicting and betting on sports games for a long time, traditionally most predictions were based on home-town loyalties, on simple statistics such as win-loss records, or on gut instinct. However, data mining techniques are now used to produce much more sophisticated methods of predicting the result of a sports game. Formulas can take into account how individual players have matched up against each other in the past, how performance has varied at different stadiums (including the concept of home team advantage), and psychological aspects such as momentum. Predicting game results is especially important in sports video games, which often include a season mode; since the game cannot expect the user to play every game of the season, many of the games must be simulated.
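One simple way to simulate games, sketched below, is to give each team a numeric rating, convert the rating difference into a win probability with an Elo-style logistic formula, and then draw a random winner for each game on the schedule. This is only one possible approach, not a description of how any particular video game works; the ratings, the size of the home advantage, and the team names are all invented.

```python
import random

def win_probability(home_rating, away_rating, home_advantage=50.0):
    """Elo-style logistic probability that the home team wins."""
    diff = home_rating + home_advantage - away_rating
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

def simulate_season(ratings, schedule, seed=0):
    """Simulate each (home, away) game once and return win totals per team."""
    rng = random.Random(seed)
    wins = {team: 0 for team in ratings}
    for home, away in schedule:
        p_home = win_probability(ratings[home], ratings[away])
        winner = home if rng.random() < p_home else away
        wins[winner] += 1
    return wins

# Invented ratings and a schedule where every team hosts every other team once.
ratings = {"Hawks": 1600.0, "Bears": 1500.0, "Owls": 1400.0}
schedule = [(h, a) for h in ratings for a in ratings if h != a]

season_wins = simulate_season(ratings, schedule, seed=42)
```

Matchup history and momentum, which the paragraph above mentions, could be folded in by adjusting the ratings before each game; this sketch leaves them out to stay short.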

Data mining can also be used to help a team decide how best to beat the other team. In baseball, a batter could use data mining to figure out the best place in the ballpark to hit the ball. In football, data mining can be used to figure out which plays the other team is most vulnerable to. In soccer, data mining can be used to help a striker decide which part of the goal to shoot at.

The creation of a fantasy sports website also involves substantial use of data mining. As mentioned previously, data mining can be used to estimate the value of a player, which can be used in the draft both to guide users in deciding who to draft and to charge them an appropriate amount of virtual money for drafting the player. Data mining techniques also must be used to decide who wins a “game”. Because each team in a fantasy sport consists of players from different real teams, the winner of a game is usually based on performance numbers from each player in their most recent game. However, this isn’t always fair, because on a given day some players may be playing against easier opponents and others against harder opponents. Players may also be penalized if they happen not to play in a given game. Data mining can be used to produce a more sophisticated model to decide who wins a game.
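As an illustration of what a fairer scoring model might look like, the sketch below scales each player's raw points by how many points the opposing defense usually allows compared with the league average. The scaling rule, the function names, and the numbers are assumptions for illustration only, not a description of how any real fantasy site scores games.

```python
def adjusted_points(raw_points, opponent_allowed_avg, league_allowed_avg):
    """Scale raw points by how generous the opponent's defense usually is."""
    return raw_points * league_allowed_avg / opponent_allowed_avg

def team_score(performances, league_allowed_avg):
    """Sum adjusted points over (raw_points, opponent_allowed_avg) pairs."""
    return sum(
        adjusted_points(raw, allowed, league_allowed_avg)
        for raw, allowed in performances
    )

# Invented numbers: both players scored 20 points against different defenses.
league_avg = 20.0
easy_matchup = adjusted_points(20.0, 25.0, league_avg)  # opponent allows a lot
hard_matchup = adjusted_points(20.0, 16.0, league_avg)  # opponent allows little
total = team_score([(20.0, 25.0), (20.0, 16.0)], league_avg)
```

Under this rule, 20 points against a weak defense count for less than 20 points against a strong one, which addresses the easier-opponent problem described above. Handling players who miss a game would need a separate rule.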

In conclusion, data mining has increasingly transformed the way sports are played, and the way that fans interact with sports. However, we have only begun to scratch the surface of how data mining can be used, and it is certain that the use of data mining in sports will continue to increase.

Sources: