We have seen neural networks which take an image, say of a car, and output the log likelihood of various classes. We can also have a neural network (a ConvNet) which predicts whether an image is of a cat or not –
However, in these tasks it is we who find the object of interest (the object which the neural network has to detect or classify), i.e. we give the neural network the right object.
The challenge: how do we find the objects to detect/classify in the following picture?
pic source – my Instagram! (yes I love photography)
Where to look? Attention answers this question.
Implicit Attention in Neural Networks –
Deep learning networks naturally tend to respond to some parts of the data more than others. This is called Implicit Attention.
Implicit attention is great: combined with RNNs it gives amazing results in fields such as machine translation. Check out this paper.
Explicit Attention in Neural Networks –
We need a mechanism of explicit attention because it helps with –
- Computational efficiency (if you can explicitly limit yourself to a subset of the data, you don't have to process all of it)
- Sequential processing of static data (it turns static data into a sequence)
- Interpretability (it is easier to see what the network is looking at)
Neural Attention Models –
The neural network produces its output as usual; however, it also produces an extra set of outputs which are used to train the attention model. The attention model then outputs a “glimpse” (a lookup key), which we combine with the input and pass back to the neural network.
The whole system is recurrent even when the neural network inside it is not.
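To make that loop concrete, here is a toy numpy sketch of the recurrence (the function names and the random “where to look next” policy are placeholders of my own, not a real trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_glimpse(image, center, size=8):
    # Crop a size x size patch; the top-left corner is clipped so the
    # patch always stays inside the image.
    r = int(np.clip(center[0], 0, image.shape[0] - size))
    c = int(np.clip(center[1], 0, image.shape[1] - size))
    return image[r:r + size, c:c + size]

def core_network(glimpse, state):
    # Stand-in for the real net: it returns the task output, a new internal
    # state, and the extra attention outputs that say where to look next.
    h = np.tanh(glimpse.mean() + state)
    next_attn = rng.integers(0, 24, size=2)  # toy attention policy
    return h, h, next_attn

image = rng.random((32, 32))
state, attn = 0.0, np.array([12, 12])
for _ in range(4):  # this outer loop is what makes the whole system recurrent
    g = extract_glimpse(image, attn)
    output, state, attn = core_network(g, state)
```

Even with a feedforward `core_network`, feeding each step's attention output back in makes the glimpse–process–glimpse cycle recurrent.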
Many cognitive researchers describe attention as a spotlight – the light which brings objects into focus.
The idea: from a glance at an image, can we get the right areas of interest?
Attention models generally work by defining a probability distribution Pr(g | a) over glimpses g of the data, given a set of attention outputs a from the network:
Simplest case: assign a probability to each of a set of discrete glimpses!
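In this discrete case the distribution can be as simple as a softmax over one score per candidate glimpse. A minimal sketch (my own toy numbers):

```python
import numpy as np

def glimpse_distribution(a):
    # Softmax turns the network's attention outputs a (one score per
    # candidate glimpse) into the distribution Pr(g | a).
    e = np.exp(a - a.max())  # subtract max for numerical stability
    return e / e.sum()

a = np.array([2.0, 0.5, -1.0])       # scores for three candidate glimpses
p = glimpse_distribution(a)
g = int(np.argmax(p))                # take the most likely glimpse, or sample from p
```

Because choosing `g` is a discrete decision, gradients can't flow through it directly – which is exactly why the hard/soft split below matters.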
Note – Foveal Attention! Google it for now; I will write a post on it shortly.
Soft Attention –
Hard Attention: a fixed-size window which moves across the image and is trained using RL (the discrete choice of where to look is non-differentiable, so we can't backprop through it).
Soft Attention is differentiable, so we can train it end to end with backprop. It is easier than using RL but more expensive to compute.
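The trick behind soft attention is to replace the discrete choice with an expectation under the attention distribution – a weighted sum, which is differentiable. A toy sketch with made-up glimpse vectors:

```python
import numpy as np

def soft_attention(glimpses, a):
    # Rather than sampling one glimpse (hard attention), take the
    # expectation under the attention distribution: a weighted sum
    # over ALL glimpses, which gradients can flow through.
    w = np.exp(a - a.max())
    w = w / w.sum()
    return (w[:, None] * glimpses).sum(axis=0), w

glimpses = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.5, 0.5]])
read, w = soft_attention(glimpses, np.array([3.0, 0.0, 0.0]))
```

This is also why soft attention is more expensive: every glimpse must be processed at every step, instead of just the one that was sampled.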
“Attention in a location” – handwriting synthesis
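In Graves' handwriting synthesis, location-based attention is (roughly) a mixture of Gaussians sliding along the character sequence; the weight on character u is a sum of Gaussian bumps centred at the kappa parameters. A single-component toy version:

```python
import numpy as np

def gaussian_window(alpha, beta, kappa, seq_len):
    # Weight for character position u: sum_k alpha_k * exp(-beta_k * (kappa_k - u)^2).
    # The kappa values move monotonically along the text as the pen writes.
    u = np.arange(seq_len)
    return (alpha[:, None]
            * np.exp(-beta[:, None] * (kappa[:, None] - u) ** 2)).sum(axis=0)

phi = gaussian_window(np.array([1.0]), np.array([0.5]), np.array([3.0]), seq_len=8)
focus = int(np.argmax(phi))  # attention is centred on character 3
```

The network emits alpha, beta, and kappa at each timestep, so it decides how far to slide its spotlight along the text as it writes.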
Associative Attention – Instead of attending by position we attend by content. Associative attention, when combined with LSTMs, gives us powerful neural networks.
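Content-based addressing can be sketched as a similarity lookup: compare a query key against every stored item and read out a weighted sum. A minimal numpy version (dot-product similarity is my choice here; cosine similarity is also common):

```python
import numpy as np

def associative_read(memory, key):
    # Attend by content: the similarity (dot product) between the query key
    # and each memory row gives the attention weights; the read vector is
    # the weighted sum of the rows.
    scores = memory @ key
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return w @ memory, w

memory = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
read, w = associative_read(memory, np.array([10.0, 0.0]))  # query resembles row 0
```

Note there is no position index anywhere: the row that wins is the one whose *content* matches the key.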
Introspective Attention – Selectively attend to the network's own internal state or memory. With internal attention we can do selective writing as well as selective reading, allowing the network to iteratively modify its state. Neural Turing Machines!
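Selective writing can be sketched in the NTM style: each memory row is erased and added to in proportion to its attention weight, so the network edits only where it attends. A toy sketch with a one-hot weighting:

```python
import numpy as np

def selective_write(memory, w, erase, add):
    # NTM-style write: row i is first partially erased (scaled by w[i] * erase)
    # and then has w[i] * add mixed in, so unattended rows are left untouched.
    memory = memory * (1 - np.outer(w, erase))
    return memory + np.outer(w, add)

memory = np.zeros((3, 2))
w = np.array([0.0, 1.0, 0.0])            # attend entirely to row 1
memory = selective_write(memory, w,
                         erase=np.zeros(2),
                         add=np.array([1.0, 2.0]))
```

With soft (non-one-hot) weights the same update blends the edit across rows, and the whole read–modify–write loop stays differentiable.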