Business Understanding:
For our project, we examined Bitcoin, a cryptocurrency first introduced in 2008. Bitcoin is a currency created through the "mining" of individual bitcoins, which caps the supply of Bitcoins available to be used as a form of electronic currency. Mining consists of solving cryptographic puzzles, which effectively makes the currency a limited resource because only so much of it can be mined at any given point in time with current technology. Bitcoin has recently been on an enormous rise (increasing by hundreds of percentage points in a matter of months) and is being discussed in newspapers, social media, and academia everywhere. Its U.S. dollar value is supported largely by the services that currently accept Bitcoin as a form of payment. However, we believe the cryptocurrency market is a bubble, and that the value of the industry is being fueled by public opinion and investment rather than by the currency gaining any real value through commodity trade. In the past six months, Bitcoin has grown from being worth under $1,000 per coin to over $10,000 per coin. Our goal was to explore text analytics using Twitter, BBC, and USA Today sentiment analysis specifically related to Bitcoin, and to compare those trends over time against the movement in Bitcoin's market value. The stock data was pulled from a free Kaggle S&P 500 dataset. We also gathered five other entities (M&T Bank, PNC, Netflix, Fiserv, and the cryptocurrency Ethereum) to test the effect of public sentiment on well-established companies with comparable market capitalizations. Essentially, we cross-referenced each company's sentiment analysis scores against its stock data to infer the relationships between the companies. We hypothesize that Bitcoin's value is more volatile and more swayed by social media activity than that of reliable stocks found on the S&P 500.
Data Understanding:
Our data covered every S&P 500 company, with daily valuations over five years, and cryptocurrencies over the past year. We quickly noticed that the established companies had stable daily opens, closes, and volumes over the entire period, while the cryptocurrencies, whose values were measured in the billions, would lose 30% of their value overnight only to regain it plus 15% the next day. This was immediately alarming, and we decided the best way to compare these values was to look at what a company was tweeting about and how many people responded to it. The idea was that the higher the response rate, the more people would see it, the more they would invest, and the more the value would be driven up.
Once we began collecting the Twitter data to test this theory, we noticed another trend: while a well-established company would have mostly neutral comments with a good mix of positive and negative sentiments, cryptocurrency sentiments never appeared to be negative, even when a company was reporting on something negative in its industry.
The news data was collected to see whether it would influence the valuation of a company in the same way as the Twitter sentiment; it was intended to capture a more critical, journalistic effect on the market.
Data Preparation:
To perform sentiment analysis and compare the effects social media has on standard companies versus cryptocurrencies, we had to start with a basic Twitter mining process to gather Tweets. We followed each company's official Twitter handle and used the RapidMiner software to collect their Tweets between February 11th and September 22nd, as this was the period for which we had stock data for each company. We also mined Tweets from before February 11th to use as a training data set for classifying each company's Tweets. Due to the limitations of Twitter's data collection, we had to go this route and only use the companies' own Tweets, which were a combination of their status updates and Tweets from others that the company had retweeted. Each company's test data (i.e., Tweets from the February 11th to September 22nd window) was saved into its own CSV; the same was done with the pre-February 11th training data. The training data was run through Aylien to create a model that would accurately identify the positive and negative opinion of Tweets. From the Aylien output, we kept only items with confidence scores above 90%. We then sorted through the remaining Tweets by hand and checked whether the assigned sentiment was accurate, to develop the most accurate training set possible.
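As an illustration of the confidence cutoff, the snippet below is a minimal Python sketch of that filtering step, assuming the Aylien labels were exported to a CSV with hypothetical columns named text, polarity, and polarity_confidence (in the actual project this filtering was done inside RapidMiner and by hand):

    import pandas as pd

    # Hypothetical CSV of Aylien-labelled training Tweets with columns
    # "text", "polarity", and "polarity_confidence"
    labelled = pd.read_csv("mtb_training_tweets_aylien.csv")

    # Keep only labels Aylien assigned with at least 90% confidence,
    # mirroring the cutoff used in the report
    high_conf = labelled[labelled["polarity_confidence"] >= 0.90].copy()

    # The surviving rows were then checked by hand; here we just save them for review
    high_conf.to_csv("mtb_training_tweets_filtered.csv", index=False)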
Simultaneously with the Twitter analysis, we worked to collect news articles on each company to see if the same conclusions could be drawn when viewing sentiment through that lens. Finding articles for the six different companies seemed simple at first glance but ended up being more complicated than we initially expected. First, we had to find news sources that allowed us to view an unlimited number of articles, because web crawling was often interrupted when it ran into a security monitor or encryption scheme. Another issue revolved around the Wall Street Journal: despite a teammate having a subscription, the crawling was still blocked by the site's basic authorization. Once we had found news sources that worked (USA Today and BBC), we searched for articles on the companies we chose. This involved setting search parameters that filtered by company name and reference, by date and time on some websites, and sometimes even by specific article titles. The main problem came from the seed page: if it did not contain the correct elements for recognizing a URL, RapidMiner would not read the HTML links buried in the page. Given the correct seed page, the rest of the process was straightforward, simply saving each HTML link to an appropriately labeled folder, since there were six different companies and two news sources in use. For the companies with fewer articles overall, we were able to manually go through and date each article, while also removing any extraneous or non-useful content; for example, if RapidMiner pulled an article that mentioned Bitcoin but did not focus on it, that article was excluded at this point.
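For the crawling itself, the following Python sketch shows the general idea of pulling article links off a seed or search page and keeping only those that mention the target company; the seed URL, output folder, and filtering rule are hypothetical stand-ins for the RapidMiner web-crawling setup we actually used:

    import os
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    SEED_URL = "https://www.bbc.co.uk/search?q=bitcoin"  # hypothetical seed/search page
    COMPANY = "bitcoin"

    resp = requests.get(SEED_URL, timeout=30)
    soup = BeautifulSoup(resp.text, "html.parser")

    # Keep only links whose URL or anchor text mentions the target company,
    # so unrelated navigation links are not crawled
    article_links = []
    for a in soup.find_all("a", href=True):
        full_url = urljoin(SEED_URL, a["href"])
        if COMPANY in full_url.lower() or COMPANY in a.get_text().lower():
            article_links.append(full_url)

    # Each surviving link is fetched and saved into a per-company folder
    os.makedirs("articles", exist_ok=True)
    for i, link in enumerate(article_links):
        page = requests.get(link, timeout=30)
        with open(f"articles/bitcoin_{i}.html", "w", encoding="utf-8") as f:
            f.write(page.text)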
Modeling:
Once we had classified our training data through Aylien, we were able to begin building a model from those results that would eventually be applied to the testing data. For each company, we used the same barebones training process but tinkered with each operator and used varying cross validation setups to achieve as high an accuracy metric as possible. A "Read CSV" operator always came first to bring in the company's training data (that is, the pre-February 11th Tweets for that company, already labelled with sentiment by Aylien), followed by a "Process Documents from Data" operator where all the tokenization, stemming, and other text pre-processing occurred. Figure 1 shows the inside of this operator from M&T Bank's process. For other companies, we utilized different pruning techniques, different n-gram sizes, different length filters, and other minor changes depending on what improved that model's accuracy. This operator was always followed by a "Set Role" operator to set the polarity attribute as the label, and then the "Weight by Information Gain Ratio" and "Select by Weights" operators. Finally, we had the "Cross Validation" operator, inside of which sat one learner along with the "Apply Model" and "Performance" operators. The parameters inside any one of these operators differed across companies, as we wanted models that did a good job for each of them rather than one large model that performed less accurately. For example, our M&T Bank process (pictured below) used absolute pruning above and below specific values, a k value of 1000 in the "Select by Weights" operator, k-fold cross validation with 200 folds, and a k-NN learner inside the cross validation. By comparison, Naive Bayes was sometimes the strongest learner (Bitcoin and PNC), while a decision tree produced better accuracy at other times (Ethereum, Fiserv, and Netflix). Additionally, the models varied in the number of cross validation folds, the number of features selected by weight, the pruning and text-processing options, and more. Figure 2 shows the inside of our training process's Cross Validation operator, for easier visualization of how each model was put together at this point.
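Since the pipeline itself lives in RapidMiner's GUI, the following is only a rough Python (scikit-learn) sketch of the same training steps, tokenization and pruning, information-gain-style feature selection, and cross-validated k-NN, with a hypothetical input file and illustrative parameter values:

    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical filtered training file from the data preparation step
    train = pd.read_csv("mtb_training_tweets_filtered.csv")

    pipeline = Pipeline([
        # Tokenization, stop-word removal, and pruning of rare/frequent terms,
        # standing in for the "Process Documents from Data" operator
        ("tfidf", TfidfVectorizer(stop_words="english", min_df=2, max_df=0.95,
                                  ngram_range=(1, 2))),
        # Information-gain-style feature weighting and selection, analogous to
        # "Weight by Information Gain Ratio" + "Select by Weights"; k=1000 mirrors
        # the M&T Bank setting and assumes the vocabulary has at least 1,000 terms
        ("select", SelectKBest(mutual_info_classif, k=1000)),
        # Learner inside the Cross Validation operator: k-NN for M&T Bank,
        # Naive Bayes or a decision tree for the other companies
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ])

    # 10 folds here for brevity; the actual M&T Bank process used 200 folds
    scores = cross_val_score(pipeline, train["text"], train["polarity"],
                             cv=10, scoring="accuracy")
    print("Mean cross-validated accuracy:", scores.mean())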
Another important item to note was that most Tweets were classified as "neutral" rather than "positive" or "negative". This made model selection more important, as some models had higher accuracy simply because they classified every single Tweet as neutral. We knew it was more important to have a model with good recall rates than to focus only on overall model accuracy, so there was a balancing act in trying to make an "optimal" choice. Ultimately, even while trying to accurately predict the minority classes, we were still able to achieve overall model accuracy of roughly 85% or better for every company; for most, it was above 90%. Below is the lowest accuracy attained among the six companies, from M&T Bank. This output also makes clear how much we emphasized class accuracy. We made sure the model was able to properly predict at least 50% of the positive and negative Tweets, and it cost us some overall accuracy to do so. However, this model did more of what we wanted and would attempt to predict classes rather than just assume "neutral" for every Tweet.
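Continuing the previous sketch (reusing its hypothetical pipeline and training frame), the check we cared about can be expressed as a per-class recall report rather than a single accuracy figure:

    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn.model_selection import cross_val_predict

    # Per-class recall matters more here than raw accuracy, because a model that
    # labels every Tweet "neutral" can still score well on an imbalanced set
    predicted = cross_val_predict(pipeline, train["text"], train["polarity"], cv=10)
    print(confusion_matrix(train["polarity"], predicted))
    print(classification_report(train["polarity"], predicted))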
The modeling of our news data was rather simple once the data preparation was finished. It consisted of running the process below on each company and on the data from each news source, for a total of twelve identical processes that each wrote their own CSV; however, we had Aylien and the lexicon write to the same CSV for quicker analysis at the end. There were still quite a few steps in the process. We used an updated lexicon that included 75 additional words found in the articles about Bitcoin, which we ran on just Bitcoin and Ethereum, and the regular lexicon on the other four companies for a better analysis. The first "Process Documents from Files" operator included word-based tokenization, removal of stop words, and filtering by token length. After that, we generated an aggregation and joined attributes to get a SumPos and SumNeg count for each article. The overall sentiment for the lexicon-based approach was simply SumPos - SumNeg. For the Aylien operator, we passed the articles in and Aylien returned its own sentiment analysis.
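The lexicon score itself is easy to state in code; the sketch below assumes hypothetical plain-text files holding the positive and negative word lists and reproduces the SumPos - SumNeg calculation:

    import re

    # Hypothetical lexicon files: one word per line
    with open("lexicon_positive.txt") as f:
        positive = {w.strip().lower() for w in f if w.strip()}
    with open("lexicon_negative.txt") as f:
        negative = {w.strip().lower() for w in f if w.strip()}

    def lexicon_sentiment(article_text: str) -> int:
        # Word-based tokenization; stop words simply never match either lexicon
        tokens = re.findall(r"[a-z']+", article_text.lower())
        sum_pos = sum(1 for t in tokens if t in positive)   # SumPos
        sum_neg = sum(1 for t in tokens if t in negative)   # SumNeg
        # Overall lexicon sentiment, as in the report, is SumPos - SumNeg
        return sum_pos - sum_neg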
Once the training side of the process was working properly and had strong enough accuracy in each class, we began running the models on the test data to truly dig into the sentiment analysis portion of the project. Since we had six unique training processes, we matched them with six unique testing processes (again, one per company). The testing half was essentially identical across companies: a "Read CSV" operator to bring in the test data, a "Process Documents from Data" operator (copied from the training side to ensure an exact match), a "Set Role" operator to label Tweets based on polarity, and an "Apply Model" operator to run the test data through the training model produced above it. The complete process, with training and testing together, can be seen in Figure 3 (the top row is the training process and the bottom row is testing).
At the end of the training and testing process, a "Write CSV" operator created new CSV files holding all the useful sentiment data on each company. With those CSV files we built numerous correlation models, such as linear and polynomial regressions, for each company to compare public opinion against its closing stock values. We used closing values because they show whether the opinions expressed during the day affected the stock price. We also used dimension reduction techniques to find relationships between our companies and other S&P 500 companies and to determine which relationships were significant.
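As a concrete example of the correlation step, the sketch below regresses a daily closing price on an aggregated daily sentiment score using statsmodels; the merged input file and its column names are hypothetical:

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical merged file with one row per trading day:
    # "date", "close", and an aggregated daily "sentiment" score
    daily = pd.read_csv("bitcoin_sentiment_vs_close.csv", parse_dates=["date"])

    # Simple linear regression of closing price on daily sentiment;
    # the coefficient and t-statistic are what we compared across companies
    linear = sm.OLS(daily["close"], sm.add_constant(daily[["sentiment"]])).fit()
    print(linear.summary())

    # A squared term gives the polynomial variant in the same framework
    daily["sentiment_sq"] = daily["sentiment"] ** 2
    poly = sm.OLS(daily["close"],
                  sm.add_constant(daily[["sentiment", "sentiment_sq"]])).fit()
    print(poly.summary())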
Evaluation:
We found that, for the financial and technology companies, there was no significant relationship between public opinion and closing values (no model had a t-statistic above 1). There was, however, a significant relationship between the financial/technology companies and the healthcare and information technology industries; some of the t-statistics were well into the 20s, with coefficients in the dozens.
The cryptocurrencies showed a very different pattern. We found not only that a relationship existed, but that there was a strong linear relationship between public opinion and closing values. This relationship implies that cryptocurrency performance is volatile enough that what people say during the day is reflected in how the currency closes. The relationship is somewhat visible in the time series data but is best seen in a bar chart comparing the support and confidence scores.
For Bitcoin, the support score was 4.02, roughly a 4-to-1 relationship, and the confidence score was 4.5, i.e., 4.5 standard deviations away from the observation being a false positive. From these scores we conclude that public opinion is an indicator of cryptocurrency value.
While the relationship between Bitcoin and Twitter is clear, we did not find any relationship between Ethereum's value and its Twitter analysis. What we noticed, however, was that over our time series Ethereum was a much smaller and relatively unknown currency. It was not until the recent cryptocurrency market spike that the Twitter feed for Ethereum began to skyrocket (retweets increased 700% in the month after our time series ends). We suspect that with more recent data the relationship would be much stronger, but with our older data sets it is inconclusive.
Perhaps one of the most interesting findings was the strength of the relationship between Twitter sentiment and the news sentiment we gathered. Initially there seemed to be no relationship between the two, but upon closer inspection there is a strong inverse relationship between news sentiment and Bitcoin's own Twitter sentiment. We found that when the sentiment in the news was negative, Bitcoin's response would coincide with a positive Twitter story that had, on average, 27 retweets (a -27.35 coefficient). We believe this further validates our point that Bitcoin is a bubble, not simply because of a market frenzy, but because Bitcoin itself is spinning a narrative that seems to push up its valuation.
Deployment:
In practice, these techniques should be tracked over time to see if the sentiment of Twitter users continues to influence cryptocurrency values. If our initial bubble belief holds, sentiment and values should also be tracked as the bubble bursts to see whether a specific event marks the beginning of the crash.
In all models, investors should check not only whether the company in question is gaining value from public sentiment, but also whether the company is spinning negative narratives into positive ones that people agree with, as measured by the number of retweets.
Should a distinct sentiment signature mark the beginning of a market crash, it could be factored into stock prediction models to spot market bubbles not only as they form but also as they start to crash. This could help protect future investors from potentially devastating financial mistakes.
Contributions:
Before getting into everyone's contributions, we would like to highlight some of the changes or improvements we would have made with more time, as well as some recommendations for anyone who wants to explore this topic or build on our groundwork in the future. Regarding data collection, due to the time-based limitations on what can be taken from Twitter, as mentioned above, we were only able to use the companies' own Tweets. If we were to continue pursuing this, we would set up processes to scrape and save the public's Tweets about the companies each day, rather than just the companies' own status updates. This would likely give us a wider and better representation of public opinion. Using retweets as a weight on the companies' own Tweets is a sensible method, but moving forward we would have a better picture of overall sentiment, especially negative sentiment, if we could sample the entire public. This is because a company is highly unlikely to post a negative Tweet about itself, so we were really only able to capture how much people agree with the company via retweets. Another change we would like to have tried is the use of a financial lexicon in place of the Aylien sentiment labelling we went with here. It would be an interesting comparison between the two avenues, and it could possibly improve our accuracy in training (and ultimately in testing as well). As suggested, we would like to try the Loughran-McDonald word list as an alternate lexicon. For anyone building on this groundwork, we would suggest including other variables as controls in the regression model, such as the previous day's price, retweet count, and perhaps variables outside of both stock value and sentiment, to check for confounding effects. We would also run more than just a linear regression if necessary.