In this blog post I am going to summarize the paper titled – “Topology of Learning in Artificial Neural Networks“. This paper gave me a new view a new insights about how we can analyze the way neural network learns!
Quoting from the paper – “Understanding how neural networks learn remains one of the central challenges in machine learning research.” Isn’t it surprising that how you merely start with some numbers from a distribution and end up classifying objects.
In this paper, the authors have trained a simple neural network for MNIST dataset classification and they have studied how the weights have evolved.
Although we know that initializing weights zero is not a good strategy because of various reasons(one simple reason is that we want to maintain asymmetry in neural network), in this paper the authors have initialized the weights to zero because it is easy to monitor the evolution of weights.
The key idea behind this paper is that the structure which shall arise during the training of the neural network, should be possible to study via topology (a subject in mathematics which allows us to study shapes and structures).
The authors use something known as Mapper Algorithm to capture captures the topological structure of the original point cloud in the high-dimensional feature space to construct a graph.
Without wasting more time let us check out the results from the paper –
This graph denotes the weights in the output layer which evolved during the training of neural network. If we closely observe then we will be able to see that after a couple of nodes away from the start, the neural network was prediction 7 and 9 as the same digits but then they diverged! Same thing happened with 7 and 1, they also diverge from same parent branch.
Quoting from the paper “Interestingly, the discrimination abilities of the model are correlated with the branchings.”
If we observe the above graph, it clearly suggests that as the branching increases in the hidden layers, the accuracy of the neural network increases.
So what next? I believe that this is a beautify idea and it needs more exploration. I think some simple experimentations can be trying to reproduce the paper with different datasets and different neural network architectures. For example, convnet with fashion MNIST to begin with.
Coming from a mathematics background myself, I found this paper and the idea very beautiful. Let me know your opinions and thoughts in the comments, I am looking forward to it.
Keep Learning! 🙂