In this week’s readings, we focused on the lack of robustness in neural networks. One paper, “Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images,” showed that neural nets can assign high-confidence labels to images that are unrecognizable to humans. The other, “Understanding Deep Learning Requires Rethinking Generalization,” showed that deep neural networks can fit completely random labels. Although I’m not surprised by the results of these papers, I am left wondering why we still care about neural networks.
Although neural networks (especially deep neural networks) have produced excellent results in image-processing tasks, they are almost entirely opaque to humans. Further, there is no theory of why they work (for example, why n layers work better than n+1 layers). Especially with these two prominent papers exposing their failures, why are we still so obsessed with neural networks, and how can we make sure they are at least somewhat interpretable?
The one thing that strikes me is that neither of these papers gives a tangible explanation of why these phenomena occur. Some people have suggested that the networks are “memorizing features,” so that when the labels are scrambled they simply memorize the training examples over time. However, the intuition behind the unrecognizable-images result remains unknown. Perhaps the network is treating some of the “noise” patterns as features; no one has discussed this (a rough sketch of how such an image can even be constructed follows below).
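To make the “noise as features” speculation concrete, here is a minimal sketch (not the paper’s exact method, which also used evolutionary algorithms) of how an unrecognizable, high-confidence image can be produced: start from random noise and do gradient ascent on one class’s score. It assumes PyTorch and torchvision are available; the model choice, class index, and hyperparameters are illustrative guesses.

```python
import torch
import torchvision.models as models

# Pretrained ImageNet classifier (any off-the-shelf model would do for this sketch).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

target_class = 954                                   # arbitrary ImageNet class index
img = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from pure noise

optimizer = torch.optim.Adam([img], lr=0.05)
for step in range(200):
    optimizer.zero_grad()
    logits = model(img)
    loss = -logits[0, target_class]                  # ascend the target class's score
    loss.backward()
    optimizer.step()

confidence = torch.softmax(model(img), dim=1)[0, target_class].item()
print(f"confidence in target class: {confidence:.3f}")
# The confidence is often very high even though the image still looks like
# noise to a human -- the optimization exploits whatever "features" the
# network responds to, recognizable or not.
```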
Another issue is the lack of reproducibility of neural networks. I have personally never heard of someone successfully replicating a neural-network paper with the same error rates, confidence, etc. Even if there is some randomization (especially in the initialization), the algorithms and methods in the papers should be reproducible, at least to some degree. Another reason could be that the training and test sets are not always chosen the same way. If that is the case, then there should be a fixed, published train/test split for each of the “classic” datasets (a sketch of this kind of bookkeeping follows).
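As a minimal sketch of what that bookkeeping could look like, the snippet below fixes the sources of randomness and pins the train/test split to a file. It assumes PyTorch; the dataset size and split fraction are placeholders, not values from either paper.

```python
import random
import numpy as np
import torch

SEED = 0
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True   # trade speed for determinism
torch.backends.cudnn.benchmark = False

# Pin the split itself, not just the seed, so everyone trains and tests on the
# same examples (a stand-in for a published split file accompanying the paper).
n_examples = 10_000
indices = np.random.RandomState(SEED).permutation(n_examples)
train_idx, test_idx = indices[:8_000], indices[8_000:]
np.save("train_idx.npy", train_idx)
np.save("test_idx.npy", test_idx)
```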
A final question I had was whether there is a tradeoff between capacity (the size of the neural network) and robustness (its susceptibility to being fooled by noise, etc.). I would guess that the bigger the network is, in terms of layers and nodes, the more difficult it is to understand, and therefore the more easily it is fooled and the harder it is to figure out why. Perhaps a simple test would be to prune the network as much as possible, by decreasing the number of layers and/or nodes, and check whether the fooling behavior persists (a rough version of this test is sketched below). Again, neither paper mentioned pruning as a tangible strategy for starting to figure out why these networks are difficult to understand and easy to fool.
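Here is a back-of-the-envelope version of that pruning test: zero out the smallest-magnitude weights and re-check accuracy and confidence on clean versus fooling inputs. This assumes PyTorch; `model`, `evaluate`, and the data loaders are hypothetical placeholders, and magnitude pruning is just one simple way to shrink capacity.

```python
import torch

def magnitude_prune(model: torch.nn.Module, fraction: float = 0.5) -> None:
    """Zero out the `fraction` of weights with the smallest absolute value."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
                w = module.weight
                k = int(fraction * w.numel())
                if k == 0:
                    continue
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).float())

# Hypothetical usage: compare behavior before and after pruning.
# evaluate(model, clean_loader); evaluate(model, fooling_loader)
# magnitude_prune(model, fraction=0.5)
# evaluate(model, clean_loader); evaluate(model, fooling_loader)
```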