Sophomore Slumpware

Predicting Album Sales with Back Propagation Neural Networks

Prepared by Matthew Wirtala

for Yu Hen Hu

ECE 539, Fall 2005

12/21/05

Table of Contents

Introduction

Overview of Project

Critical Acclaim

Previous Album Releases

Hype Level

Classification

Implementation

The Neural Network

Preprocessing of Data

Results

Discussion of Results

Introduction

The music industry has faced a great deal of turmoil over the past decade. Industry revenue has dropped by 4-8% annually over the past five years, with some sources putting the overall decline in sales during this span at upwards of 30%. The Recording Industry Association of America (RIAA) has famously blamed the decline on internet file sharing: starting with Napster users in 1999, around 17,000 people have been sued by the RIAA over copyright infringement. Though Napster was shut down in 2001, replacement services were continuously available, with applications such as KaZaA, Morpheus, and LimeWire offering users the same capabilities, much to the RIAA's chagrin.

Many argue that the music industry itself is to blame for the decline in album sales, claiming that file-sharing services in fact help users discover new music that they are then more likely to buy than if they had been unable to sample it for free. The product, they argue, is the problem: if record labels focused on long-term artistic development rather than flash-in-the-pan prefab pop stars, perhaps more people would buy these (hypothetically) "better" records. One example used to back up these claims is Radiohead's 2000 album Kid A, which was widely available on file-sharing networks months before its official release. In spite of this, the album sold remarkably well, charting at number 1 on Billboard and going platinum. This was especially notable considering that Radiohead had never entered the top 20 in the US before and that Kid A was not as heavily promoted as many albums, suggesting much of its sales came from buzz created by file sharing.

Most recently, the iPod has become all but ubiquitous, with sales growing from about 400,000 units in 2002 to over 20,000,000 in 2005. Accompanying this is the spread of pay-for-download music sites such as the iTunes Music Store and eMusic, with iTunes having sold over 500,000,000 songs since its launch in 2003. The music industry has entered a reluctant partnership with these services, realizing that MP3 technology has become entrenched and that models of music distribution must eventually adapt to these global developments. Still, the RIAA consistently butts heads with companies such as Apple, the creator of iTunes, over how profits from these downloads should be divided. Furthermore, companies such as Sony have recently come under fire for including software on their CDs to prevent digital distribution of the music contained therein. In Sony's case, rootkit software present on some CDs automatically installed itself on users' computers, with no corresponding uninstaller and in a manner invisible to the user and to anti-virus software. These tactics show that the industry remains very resistant to current trends in how music is heard and distributed.

Whether the decline in album sales is due to file sharing or a drop-off in product quality, one thing is certain: the industry could use tools to better manage its resources. With over 30,000 albums released each year, which albums should receive significant marketing pushes? Which albums are more likely to be profitable? Which may be overlooked, and which may not sell well no matter how they are pushed? For this project, artificial neural networks were implemented to test whether sales figures could be predicted from a handful of easily gathered statistics.

Overview of Project

This project attempted to develop a neural network configuration that could predict the approximate order of magnitude of sales an album could expect, based on several factors: the critical acclaim the album receives, the number of albums in the artist's back catalogue, and the general level of hype surrounding the artist and album. Each of these factors is discussed in turn below.

Popular rock music criticism has been around since the 1960s and '70s, when magazines such as Rolling Stone and Creem came into being. Rock fans could count on magazines to provide analysis of the latest batch of albums, often accompanied by a numerical score of 0 to 5 stars that gave the critic a simple way to show how strongly he or she endorsed a specific work. This model has been used in various configurations ever since. For this project, I chose to use critical ratings from four different sources: www.pitchforkmedia.com, www.metacritic.com, www.allmusic.com, and Rolling Stone.

Critical Acclaim

Pitchforkmedia.com has become a major force in online music reporting since its creation in 1995. Over 1,000,000 different people currently visit the site every month, and its reviews can often break bands to a much larger audience. While its focus is generally on independent releases, it frequently covers larger mainstream releases as well. Pitchfork was selected for this project due to its status as the premier online music review site. Because the site came of age alongside the advent of digital media and file sharing, the correlation between Pitchfork's critical sensibilities and the success of an artist seemed worth analyzing. Pitchfork rates albums on a scale of 0.0 to 10.0, and this rating was used in the feature vector.

Metacritic is another online review site. Rather than providing in-house reviews like Pitchfork, Metacritic compiles scores from a wide variety of online and print publications and assigns an averaged "metascore" that indicates the overall critical consensus on a work. This is an excellent measurement to consider, as it aggregates many sources that would have been difficult to gather individually by hand for the albums considered. Metacritic's scores are assigned on a 100-point scale.

Allmusic is less an online review site and more an online database of albums. It is incredibly comprehensive in its coverage of artists and musical styles, and one is hard pressed to find a release that is not featured in some way on the site. The Allmusic website is run by All Music Guide (AMG), which is responsible for a number of musical database projects, including the similarly comprehensive listening stations found in Barnes & Noble bookstores. Allmusic's ratings are assigned on a 5-star scale, with half-star scores also possible.

Rolling Stone has been the largest music publication since shortly after its inception in 1967. Though currently facing a decline in circulation, it remains highly influential, featuring extensive coverage of enormous mainstream groups as well as up-and-coming acts. Its rating system, like Allmusic's, is the standard 5-star scale.

Previous Album Releases

This straightforward metric was included primarily for two reasons. First, an artist with a significant back catalogue of releases can be considered more established, with a larger fan base willing to purchase their albums regardless of critical acclaim or hype. Second is the dreaded "sophomore slump": an artist's second album is frequently a barometer of their staying power. While media hype may allow a group to sell many copies of their debut, a fickle public may have moved on to new trends by the release of the second album, resulting in a steep decline in sales. This feature was listed as the number of prior releases, so a debut release is assigned a value of 0, incremented by one for each subsequent release in the artist's back catalogue.

Hype Level

The final metric considered is the level of media attention received by an artist and release. While by far the least scientific measurement made in the study, this score reflects how aware the public is of an artist. To estimate it, searches for each artist were performed on Rolling Stone's and Spin's websites, and the number of search results pertaining to the artist was added to the feature vector. The assumption was that the more press coverage an artist receives, the higher their album sales would be.

Classification

It was determined that attempting to train a neural network to predict an exact number of album sales would be prohibitively difficult. Furthermore, calculating classification error would be impossible, as the number of individual sales classes would likely be as large as the sample space. For this reason, albums were put into three separate classes. Class 1 featured albums that had sold under 500,000 copies in the U.S. These albums are frequently by independent acts with a small but dedicated fanbase, often hoping to release a "breakthrough" album and achieve widespread mainstream success. Class 2 contained albums that had sold between 500,000 and 1,000,000 copies; this level of sales earns an artist recognition from the RIAA in the form of a Gold Record award and is considered moderately successful. Class 3 featured albums that had sold over 1,000,000 copies, a distinction earning a Platinum Record award.
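As a minimal sketch of the class assignment described above (the function name is my own, not part of the original implementation):

```python
def sales_class(units_sold):
    """Map U.S. album sales to the three classes used in this study:
    1 = under 500,000 copies, 2 = 500,000 to 1,000,000 (Gold),
    3 = over 1,000,000 (Platinum)."""
    if units_sold < 500_000:
        return 1
    elif units_sold < 1_000_000:
        return 2
    else:
        return 3
```

Boundary cases (exactly 500,000 or 1,000,000 copies) are assigned to the higher class here; the original report does not specify how ties were handled.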

A total of 60 albums were considered, with feature data gathered from all sources listed above. Twenty albums were gathered for each class. A sampling was considered that featured a wide array of feature combinations: Platinum albums that had been panned critically, separate releases by an artist that fell into two different classes, and critically acclaimed albums that nevertheless failed to attain even Gold status. This was done to ensure the robustness of the neural network prediction algorithm. A full list of all albums considered, along with their feature vectors and classification labels, is contained in the appendix.

Implementation

The Neural Network

Professor Hu’s bp.m back-propagation neural network program was used for training and testing. Several configurations were implemented to find the best classification rate, which was measured using 3-way cross-validation: the 60-sample dataset was randomized and split into three equally sized partitions, each containing 6 samples from one class and 7 samples from each of the other two. The network was then trained on two of the partitions and tested on the third, for all three combinations of the partitions. This division of the data was intended to fully test the robustness of the network implementation.
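The partitioning scheme can be sketched as a stratified split (a Python sketch under my own assumptions; the original work used MATLAB, and this is not the bp.m code). Offsetting the starting fold by class makes each partition short exactly one sample of one class, matching the 6/7/7 split described above:

```python
import random

def stratified_three_way_split(labels, seed=0):
    """Split sample indices into three partitions for 3-way cross-validation.

    With 20 samples per class, each partition receives 6 samples of one
    class and 7 samples of each of the other two (20 samples total)."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[], [], []]
    for c, cls in enumerate(sorted(by_class)):
        members = by_class[cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            # Offset by c so each fold is short one sample of a different class.
            folds[(j + c) % 3].append(i)
    return folds
```

Training then uses two folds and tests on the third, cycling through all three combinations.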

Preprocessing of Data

Due to the large variation in input data ranges, all data were normalized to fall between -5 and 5. The classes were then separated, and the mean and standard deviation of each feature within each class were calculated to determine whether any feature corresponded more strongly to a specific classification. Surprisingly, there appeared to be a slight inverse correlation between critical acclaim and record sales, with critical ratings declining slightly for albums that sold more. This could be read as weak evidence that major labels are putting out inferior products these days; however, many of the higher-rated albums were released on smaller record labels, so their lower sales may owe more to reduced marketing budgets than to the albums themselves being poorer. Magazine coverage tended to increase with record sales, though it is unclear whether coverage was offered to artists because they sold well, whether the coverage sparked the high sales in the first place, or some combination of the two.
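The normalization step amounts to a linear rescaling of each feature into the [-5, 5] range; a sketch (again in Python rather than the MATLAB used originally, and handling the constant-feature edge case my own way):

```python
def normalize_feature(values, lo=-5.0, hi=5.0):
    """Linearly rescale a list of raw feature values to the range [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant feature carries no information; map it to the midpoint.
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]
```

For example, raw magazine-coverage counts of 0, 50, and 100 would map to -5.0, 0.0, and 5.0 respectively.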

Results

Many configurations were tried to determine the optimal separation of the album classes. Initially, configurations with a hidden layer containing as many neurons as features (7) were tested. These configurations failed to separate the data, resulting in very low classification rates, as the entire test set was assigned a single label. Smaller and larger numbers of neurons were also tried, along with different learning and momentum rates.

The training error was consistently around 9.5% after convergence. A sample plot of training error vs. epochs is presented in Figure 1.

Figure 1: Training error vs. training epochs for a sample neural network configuration

After many trials, classification was gradually improved by adding a second hidden layer of 8 neurons. Increasing the learning rate from the default value of 0.1 to about 0.266 and reducing the momentum from 0.8 to about 0.007 also improved classification, with a rate of 60% achieved after a great deal of tweaking. The following table summarizes the ‘milestone’ configuration changes that improved classification rates during repeated trials.

Table 1: MLP configurations and their corresponding classification rate

# Hidden Layers   Neurons (Layer 1)   Neurons (Layer 2)   alpha   momentum   Classification Rate
       1                  7                  ---           0.5      0.1             35%
       1                  5                  ---           0.1      0.8             40%
       2                  7                   4            0.5      0.3             50%
       2                  7                   8            0.24     0.005           55%
       2                  7                   8            0.266    0.007           60%
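The alpha and momentum values in Table 1 enter the weight update in the standard way for back propagation with momentum. The following is a sketch of that update rule only, not the internals of bp.m, which are not reproduced here:

```python
def momentum_update(weight, gradient, prev_delta, alpha, momentum):
    """One gradient-descent-with-momentum weight update:
    delta = -alpha * gradient + momentum * prev_delta."""
    delta = -alpha * gradient + momentum * prev_delta
    return weight + delta, delta
```

With the best configuration found (alpha = 0.266, momentum = 0.007), the momentum term contributes very little, so each update is dominated by the current gradient.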

Furthermore, at its highest levels of classification, the resulting confusion matrices indicated that while classification of albums in classes 1 and 3 was 80-100% accurate, the network was rarely able to properly classify albums in class 2. The following sample confusion matrix displays this anomaly.

Cmat =

    4  0  2
    2  0  5
    1  0  6

Figure 2: Sample confusion matrix showing the difficulty in predicting class label 2 (Crate = 50%)
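The classification rate quoted with Figure 2 is simply the sum of the diagonal of the confusion matrix divided by the total number of test samples; as a quick sketch:

```python
def classification_rate(cmat):
    """Fraction of correctly classified samples: trace(Cmat) / total samples."""
    correct = sum(cmat[i][i] for i in range(len(cmat)))
    total = sum(sum(row) for row in cmat)
    return correct / total
```

For the matrix in Figure 2 this gives (4 + 0 + 6) / 20 = 0.5, i.e. the 50% rate shown; the all-zero middle column is what reveals the network's failure on class 2.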

Discussion of Results

A classification rate of 60% shows that the neural network can correctly predict some class labels, while others prove too difficult. The similarity in the means of many of the features supports this. At its best, the network predicts albums in classes 1 and 3 with an accuracy of 80-90%. Albums in class 2, however, are consistently misclassified into classes 1 or 3, suggesting that their features are too similar to those of the other classes.

For future work, the class labels could be reduced to two: albums that have sold below and above 500,000 copies, or below and above 1,000,000 copies. This may yield more accurate predictions. Furthermore, support vector machines with appropriate kernels may be able to separate the feature vectors more accurately.