Nicholas Hainsey
10/13/2014
Background:
Introduction:
Neural networks and machine learning algorithms are being developed to solve problems that conventional algorithms struggle with, such as pattern, image, and speech recognition. To do so, computer systems are being designed to work more like a human brain than a traditional computer. The human neocortex can take in any type of sensory data, make inferences from it, predict outcomes, and react based on those predictions. If we mimic this in building computer systems, it is possible we can make a computer that recognizes speech or image patterns and does the same. Essentially, the goal is to create a computer that can learn from the data it is trained on, remember it, and make inferences about new data.
Neural Networks
As stated, the goal of neural networks is to solve problems like pattern recognition more efficiently. A simple example of such a problem is given in Neural Networks for Pattern Recognition [1]. Say we want to determine the classification of some image. We could simply store each possible image with its corresponding classification. This may work for incredibly tiny images, but it quickly becomes unmanageable. Take, for example, images that are 256 x 256 pixels: each image consists of 65,536 pixels, each one a color represented by an 8-bit number. The total number of possible images then comes out to 2^(8 x 256 x 256), or roughly 10^158,000 images. It is simply impossible to store them all. In comparison, a neural network might be trained on only a few thousand pictures. When presented with a new image, the network would then infer its classification based on which known images it most resembles.
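The counting argument above can be checked directly. A quick sketch, assuming one 8-bit value per pixel as in the example:

```python
import math

# Number of distinct 256x256 images with one 8-bit value per pixel.
pixels = 256 * 256            # 65,536 pixels per image
values_per_pixel = 2 ** 8     # 256 possible values for an 8-bit pixel

# Total images = 256^65536 = 2^(8 * 65536); far too large to compute
# directly as an integer string, so report its order of magnitude.
digits = pixels * math.log10(values_per_pixel)
print(f"~10^{digits:.0f} possible images")   # ~10^157826
```

The exact exponent is about 157,826, which the text rounds to roughly 10^158,000; a training set of a few thousand images is vanishingly small by comparison.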
An example of a neural network performing pattern recognition similar to this can be seen in an article in Expert Systems with Applications [2]. The authors describe designing a multi-layered perceptron (MLP), a type of neural network, to perform iris recognition, a common form of biometric recognition. Using a simple MLP implementation, they tested several different ways of partitioning their data, which consisted of images of eyes. Depending on how they partitioned the data and organized the MLP, they achieved up to 93.33% accuracy, which shows that data preprocessing can be essential to creating an accurate neural network. Their accuracy exceeded that of four earlier attempts and approached results reported by another group in 2008. Also working on iris recognition, Xu et al. used a different type of neural network, called an intersecting cortical model, and tested its accuracy under different circumstances [3]. In their tests, this network achieved 98% accuracy for iris recognition.
Iris recognition isn’t the only task at which neural networks can excel. A 2009 article in Neural Computing and Applications shows that a self-adaptive radial basis function neural network for facial recognition could outperform the facial recognition methods in use at that time [4]. The network was tested separately on two different facial recognition databases: one with small variation in the angle and scaling of the images (ORL), and one with large variation (UMIST). On both databases, their proposed method achieved error rates better than those of other facial recognition methods, approaching the best reported error rates for facial recognition.
All of the articles so far have dealt with image recognition; however, as noted, that is not the only type of pattern recognition neural networks can be used for. What we are particularly interested in is using neural networks for predicting time-series data. In a 2011 article in Expert Systems with Applications, a neural network model was used to forecast the occurrence of seismic events such as earthquakes [5]. In the first case study, a neural network was trained only on time-series magnitude data, and its output was the magnitude for the next day. The accuracy of this method was 80.55% for all seismic events but only 58.02% for major seismic events. The second case study trained a neural network on seismic electric signals (SES), which occur before earthquakes, as well as on the time between SES and earthquakes. After reconstructing the missing SES data in the time series, they could predict seismic event magnitudes with 84% accuracy when predicting magnitude alone. When predicting both the magnitude and the time lag to the seismic events, they were 83.56% accurate on the magnitude and 92.96% accurate on the time lag.
Another example of neural networks being used for time series appears in a 2010 article in Solar Energy [6]. This group also used an MLP to find patterns in data, with the goal of predicting daily solar radiation on a horizontal plane, information useful to solar electricity providers. When trained on a time series of solar radiation data, the MLP performed as well as or better than the other common models they tested, based on mean square error, even before deseasonalization.
This is particularly surprising given the results of another article, from the European Journal of Operational Research, which found that “neural networks are not able to capture seasonal or trend variations effectively with the unpreprocessed raw data and either detrending or deseasonalization can dramatically reduce forecasting errors” [7]. The authors compared a feed-forward neural network trained on seasonal data with a trend against the same network trained on versions of that data that had been deseasonalized, detrended, or both. In every test, the network using the original data performed far worse than the other three in terms of root mean square error, and this held for three separate levels of noise. This lends more evidence that when building a neural network we should focus on setting up the data correctly beforehand.
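To illustrate the kind of preprocessing this article advocates, here is a minimal sketch of differencing-based detrending and seasonal differencing. The data and the pooling of steps are hypothetical, not the authors’ exact procedure:

```python
# Illustrative detrending/deseasonalizing by differencing (not the
# procedure from [7], just the general idea).
def detrend(series):
    """First difference: removes a linear trend from the series."""
    return [b - a for a, b in zip(series, series[1:])]

def deseasonalize(series, period):
    """Seasonal difference: removes a fixed-period cycle."""
    return [series[i] - series[i - period] for i in range(period, len(series))]

# Example series: linear trend plus a period-4 seasonal component.
raw = [i + [0, 5, 0, -5][i % 4] for i in range(12)]
stationary = deseasonalize(detrend(raw), 4)
print(stationary)  # [0, 0, 0, 0, 0, 0, 0] -- trend and season removed
```

Once both components are removed, the residual series is stationary, which is the property many forecasting methods assume.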
HTM
The specific type of neural network we are interested in using is called hierarchical temporal memory (HTM). Jeffrey Hawkins and Numenta, Inc. currently hold a patent [8] for a “Trainable hierarchical memory system and method,” as well as a number of other patents [9][10] on the structure and function of this system. HTMs aim to replicate the neocortex and the way it functions by copying the hierarchical structure of its neurons. Each layer of the hierarchy learns information from the layer below it, all the way down to the input layer. The information is then passed up the hierarchy to the smaller layers above until it reaches the top layer, where output is determined [11]. HTMs are specifically designed to work with sensory data that is constantly being read in; because of this, they could be a great method for predicting time-series data as well as for other pattern recognition problems. Numenta has released the source code for their implementation of an HTM, called the Numenta Platform for Intelligent Computing (NuPIC) [12].
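The bottom-up flow through progressively smaller layers can be pictured with a toy sketch. This is only a conceptual illustration of hierarchical pooling, not Numenta’s cortical learning algorithm:

```python
# Toy illustration of hierarchical feed-forward flow (conceptual only;
# an actual HTM learns patterns, it does not simply take maxima).
def layer(inputs, pool_size):
    """Each node summarizes a group of lower-level inputs into one value."""
    return [max(inputs[i:i + pool_size])
            for i in range(0, len(inputs), pool_size)]

signal = [0, 1, 0, 0, 1, 1, 0, 0]   # bottom (input) layer
level1 = layer(signal, 2)           # [1, 0, 1, 0]
level2 = layer(level1, 2)           # [1, 1]
top    = layer(level2, 2)           # [1] -- single output at the top
print(top)
```

Each level is half the size of the one below it, so information is progressively condensed on its way up, mirroring the shrinking-layer structure described above.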
For example, a 2009 paper in the South African Journal of Science describes Numenta’s software being used for land-use classification [13]. NuPIC was trained on satellite images of different types of land use: built-up surface, irrigated land, fallow land, and different plant species. Once the HTM had learned what each type of land use looked like, it was tested with new satellite images to see whether it could determine which type of land it was viewing. In the end, the HTM’s worst accuracy on any land type was 81.33%, while it classified another land type correctly 100% of the time. The HTM showed an overall accuracy of 90.4%, a classification 80% better than randomly classifying the images. In 2011, the same group published another article in the African Journal of Agricultural Research in which they redid this experiment [14]. They reworked the experiment because, the first time, each training and test image contained only one land-use pattern. This time they used the same land classifications, but the images were less restrictive, allowing more than one land pattern in each, and they added other factors to try to increase accuracy. In the end they produced results similar to before: some land types were determined with 100% accuracy and others with accuracies as low as 87.4%, for an overall accuracy of 96%.
Another article, from Neurocomputing, compares different hierarchical temporal memory models and hidden Markov models for sign language recognition [15]. The article discusses training both types of model on input data about hand signs: the position, velocity, and acceleration of the hand in 3D space; the roll, pitch, and yaw of the wrist; and bend coefficients for each finger. Once trained, the systems were given new hand sign data to determine how accurately each could predict the sign. The hidden Markov models finished with an accuracy of 88%, while NuPIC had an accuracy of 61%. However, when the authors modified the HTM and partitioned the input space into multiple regions, they produced HTMs with accuracies greater than 61%, and even greater than 88% in some cases.
In 2008, Nathan C. Schey presented his honors thesis at Ohio State University on using NuPIC for song identification [16]. Schey began by using a piano-roll graph of a MIDI file as the data read into the HTM. In this first attempt, the HTM had an accuracy of only 47%; he found this was because the HTM used Gaussian distance while the data was represented in binary. After changing his representation scheme for the songs and creating a larger, more robust HTM, he achieved a song prediction accuracy of 100%. Since the HTM only had to learn 5 songs, he then enlarged the data set to 40 songs and still achieved 100% prediction accuracy. The songs analyzed were stored in the MIDI format, which is generally simpler than other audio file formats; however, the 100% accuracy still lends weight to the idea that HTMs can be powerful mechanisms for computerized pattern recognition.
Time Series Prediction:
Though we have given a few examples of neural networks used for pattern recognition, the area we would really like to focus on is time-series prediction. A literature review of various time-series prediction algorithms by Kumara et al. goes into some detail on using neural networks for time-series analysis [17]. They state that with the invention of back-propagation algorithms (algorithms that let information flow both forward and backward in a network) in 1986, applications of neural networks to time-series problems began to produce successful results, eventually showing the capability to outperform statistical forecasting methods like regression analysis and Box-Jenkins forecasting. Georg Dorffner also provides an overview of neural networks used for time-series processing [18]. He explains that in most cases the data must be preprocessed, depending on the problem and dataset, before it can be used in the neural network, giving deseasonalizing and detrending as examples; his reasoning is that many forecasting methods require stationarity of the data. These and the other papers we mentioned concerning time-series prediction with neural networks [5][6][7] give an idea of what must be done to use neural networks for time-series forecasting, and of what we can do with HTMs given such a problem.
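The usual way a feed-forward network such as an MLP is applied to forecasting, as described in overviews like Dorffner’s, is to slide a fixed-width window over the series and train on (window, next value) pairs. A minimal sketch of that data preparation step, with made-up numbers:

```python
def make_windows(series, width):
    """Turn a time series into (input window, next value) training pairs,
    the standard supervised setup for neural-network forecasting."""
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]

pairs = make_windows([10, 20, 30, 40, 50], 3)
print(pairs)  # [([10, 20, 30], 40), ([20, 30, 40], 50)]
```

Each window becomes one input vector for the network and the following value becomes its training target; forecasting further ahead is done by feeding predictions back into the window.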
My Proposed Project:
Neural networks can be applied with relatively good accuracy to a multitude of pattern recognition problems, including time-series prediction. As said before, time-series prediction is the problem that interests us most. While we were unable to find any articles using an HTM for time-series prediction, we have found examples of it online. The HTM community on GitHub has created some tutorials on the matter [19], and Numenta themselves have created commercial products that use such a method. One product they offer is Grok [20]. Grok monitors real-time data about a user’s Amazon Web Services environment and learns its standard patterns. Once it has learned these patterns, it alerts the user when it receives data that is anomalous compared to what it expects; for example, if CPU usage spikes when it is expected to stay low and stable, this is flagged as anomalous.
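The CPU-spike example can be made concrete with a simple statistical stand-in: flag a new reading as anomalous when it falls far outside the recent distribution. This is only an illustration of the alerting idea; Grok’s actual anomaly score comes from its HTM model, not from a z-score:

```python
import statistics

def is_anomalous(history, value, threshold=3.0):
    """Flag `value` if it lies more than `threshold` standard deviations
    from the mean of recent history. (A simple stand-in for the idea of
    anomaly flagging; not Numenta's CLA-based anomaly score.)"""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # guard against zero spread
    return abs(value - mean) / stdev > threshold

cpu = [10, 12, 11, 13, 12, 11, 10, 12]   # hypothetical CPU usage (%)
print(is_anomalous(cpu, 12))  # False: within the normal range
print(is_anomalous(cpu, 95))  # True: sudden spike is flagged
```

An HTM-based detector improves on this by learning temporal patterns, so it can also flag values that are individually normal but arrive at the wrong time.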
NuPIC:
NuPIC is open source software for building HTMs that implement Numenta’s cortical learning algorithm (CLA).
Specialized Graphical Interface:
It is in our interest now to take the Numenta Platform for Intelligent Computing and build a user interface for it that would allow an HTM to be easily applied to a time-series problem. As it stands, NuPIC is run entirely with Python scripts from the terminal or command prompt. Our main goal in this project is to create a graphical user interface for a specific problem that will allow us to build an HTM and run it on a given dataset. Examples of what such an interface might look like can be seen in video tutorials of NuPIC and in a canceled project called OpenHTM [21]. From the articles we reviewed, it is apparent that a main factor in creating accurate neural networks is data preprocessing. As such, our GUI should provide a view of the user’s data and allow them to apply various preprocessing steps to it. It should also provide a view of the system and what the HTM is predicting from the given input data. In addition, it should allow the user to tweak NuPIC’s customizable parameters and see the result.
Generalized Graphical Interface:
Once I finish the graphical interface for the specified problem, the next goal would be to extend it for more general use. Ideally, the interface could analyze any type of time-series data with an HTM. However, this may be a difficult task, as some data may require specialized preprocessing for the HTM to produce accurate results, as was seen in a previous article [7]. This generalized graphical interface is a stretch goal in case the specialized graphical interface is completed ahead of time.
Resources Needed:
As I would be creating a GUI for Python code (unless Java proves to be simpler), I would need the use of a PC, as well as whatever Python libraries are required to run NuPIC. A development environment for Python would be useful, though any text editor could work.
Timeline:
November: Get NuPIC tutorials working
December: Get NuPIC working with custom data sets
January: Start building GUI around NuPIC once I get it working with our datasets.
February: Flesh out GUI, add graphical representations of HTM
March: Extend GUI to a more generalized time-series problem.
Works Cited
[1] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford, New York: Oxford University Press, 1995.
[2] F. N. Sibai, H. I. Hosani, R. M. Naqbi, S. Dhanhani and S. Shehhi, "Iris recognition using artificial neural networks," Expert Systems with Applications, vol. 38, no. 5, pp. 5940–5946, 2011.
[3] X. Guang-zhu, Z. Zai-Feng and M. Yi-de, "An image segmentation based method for iris feature," The Journal of China Universities of Posts and Telecommunications, vol. 15, no. 1, 2008.
[4] J. K. Sing, S. Thakur, D. K. Basu, M. Nasipuri and M. Kundu, "High-speed face recognition using self-adaptive radial basis," Neural Computing and Applications, no. 18, pp. 979-990, 2009.
[5] M. Moustra, M. Avraamides and C. Christodoulou, "Artificial neural networks for earthquake prediction using time series magnitude data or Seismic Electric Signals," Expert Systems with Applications, vol. 38, no. 12, pp. 15032–15039, 2011.
[6] C. Paoli, C. Voyant, M. Muselli and M.-L. Nivet, "Forecasting of preprocessed daily solar radiation time series using neural networks," Solar Energy, vol. 84, no. 12, pp. 2146–2160, 2010.
[7] G. Zhang and M. Qi, "Neural network forecasting for seasonal and trend time series," European Journal of Operational Research, vol. 160, no. 2, pp. 501–514, 2005.
[8] D. George and J. Hawkins, "Trainable hierarchical memory system and method," United States Patent US20070005531 A1, 4 Jan 2007.
[9] J. Hawkins and D. George, "Directed behavior using a hierarchical temporal memory based system," United States Patent US20070192268 A1, 16 Aug 2007.
[10] S. Ahmad, J. Hawkins, F. Astier and D. George, "Extensible hierarchical temporal memory based system," United States Patent US20070276774 A1, 29 Nov 2007.