Attention and Memory in Deep Learning

We have seen neural networks that take an image, say of a car, and output the log-likelihood of various classes. We can also have a neural network (a ConvNet) that predicts whether an image is of a cat or not –


However, how do we find the object of interest (the object the neural network has to detect or classify), i.e. how do we point the network at the right object?

The challenge: how do we find the objects to detect or classify in the following picture?


pic source – my Instagram! (yes I love photography)

Where to look? Attention answers this question.

Implicit Attention in Neural Networks –

Deep learning networks naturally respond to some parts of the data more than others. This is called implicit attention.

Implicit attention is powerful: combined with RNNs it gives amazing results in, say, machine translation. Check out this paper.

Explicit Attention in Neural Networks –

We also need a mechanism of explicit attention, because it helps with –

  • Computational efficiency (if you can explicitly limit yourself to a subset of the data, you won’t have to process all of it)
  • Scalability
  • Sequential processing of static data (turning static data into a sequence of glimpses)
  • Easier interpretation (you can see where the network is looking)

Neural Attention Models –


The neural network produces its output as usual, but it also produces an extra set of outputs that are used to drive the attention model. The attention model then produces a “glimpse” (a lookup into the data) as output, which we combine with the input and pass back into the neural network.

The whole system is recurrent even when the neural network inside it is not.
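To make the loop concrete, here is a toy sketch of that recurrent glimpse cycle in NumPy. Everything here (the `network` function, the glimpse size, the way attention parameters map to a window location) is invented for illustration, not any particular published model:

```python
import numpy as np

rng = np.random.default_rng(0)

def network(glimpse, state):
    """Toy core network: mixes the glimpse into a recurrent state and
    emits both a task output and extra outputs that drive attention."""
    state = np.tanh(state + glimpse)      # update internal state
    output = state.sum()                  # stand-in for the task output
    attn_params = state[:2]               # extra outputs for the attention model
    return output, attn_params, state

def take_glimpse(image, attn_params, size=3):
    """Crop a small window from the image at the attended location."""
    h, w = image.shape
    r = int((np.tanh(attn_params[0]) + 1) / 2 * (h - size))
    c = int((np.tanh(attn_params[1]) + 1) / 2 * (w - size))
    return image[r:r + size, c:c + size].ravel()

image = rng.random((8, 8))
state = np.zeros(9)
glimpse = take_glimpse(image, np.zeros(2))   # initial glimpse at a default location

for step in range(4):   # this feedback loop is what makes the whole system recurrent
    output, attn_params, state = network(glimpse, state)
    glimpse = take_glimpse(image, attn_params)
```

The inner network here is a plain feedforward function; the recurrence comes entirely from feeding each step’s glimpse back in, which is the point of the sentence above.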

Glimpse Distribution

Many cognitive researchers describe attention as a spotlight: the light that brings objects into focus.

The idea: from a single glance at an image, can we pick out the right areas of interest?

Attention models generally work by defining a probability distribution over glimpses g of the data and some set of attention outputs a from the network:

Pr(g | a)

Simplest case: assign a probability to each glimpse in a discrete set of glimpses!
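In that simplest case, Pr(g | a) can just be a softmax over one attention logit per candidate glimpse. A minimal sketch (the logits are made-up numbers):

```python
import numpy as np

def glimpse_distribution(a):
    """Turn attention outputs a into Pr(g | a) over K discrete glimpses
    via a numerically stabilised softmax."""
    z = a - a.max()
    p = np.exp(z)
    return p / p.sum()

a = np.array([2.0, 0.5, 0.1, -1.0])   # one attention logit per candidate glimpse
probs = glimpse_distribution(a)
g = np.random.default_rng(0).choice(len(a), p=probs)   # sample a glimpse index
```

Sampling `g` is what makes this version non-differentiable, which is exactly why the hard-attention variant below needs RL to train.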

Note – Foveal Attention! Google it for now, I will write a post on it shortly.

Soft Attention –

Hard attention: a fixed-size window that moves across the image, trained using RL (the glimpse is a discrete sample, so backprop cannot flow through it).

Soft attention is a differentiable alternative that we can train end to end with backprop. It is easier to train than RL, but more expensive to compute, since every part of the input contributes to the output.
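The differentiable trick is to replace the sampled glimpse with an expectation: a weighted average over all candidate glimpses. A small sketch with made-up values and scores:

```python
import numpy as np

def soft_attention(values, scores):
    """Soft attention: a differentiable expectation over all glimpses.
    values: (K, D) candidate glimpse vectors; scores: (K,) attention logits."""
    w = np.exp(scores - scores.max())
    w = w / w.sum()          # attention weights, sum to 1
    return w @ values        # weighted average -- every value contributes

values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
context = soft_attention(values, np.array([2.0, 0.0, 0.0]))
```

Because every one of the K values enters the sum, gradients reach all of them, but you also pay for processing all of them; that is the cost trade-off mentioned above.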

“Attend by location” – e.g. handwriting synthesis, where the network positions a soft window over the input text as it writes.
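In handwriting synthesis this location-based window is a mixture of Gaussians over character positions. A toy one-component sketch (the one-hot text and the parameter values are illustrative only):

```python
import numpy as np

def gaussian_window(chars, alpha, beta, kappa):
    """Location-based soft window: a mixture of Gaussians over character
    positions u decides where in the text the pen currently is."""
    u = np.arange(chars.shape[0])   # character positions 0..N-1
    phi = (alpha[:, None] * np.exp(-beta[:, None] * (kappa[:, None] - u) ** 2)).sum(0)
    return phi @ chars              # soft blend of character vectors

chars = np.eye(5)                   # toy one-hot encoding of a 5-character string
window = gaussian_window(chars,
                         alpha=np.ones(1),      # mixture weight
                         beta=np.ones(1),       # window width (inverse)
                         kappa=np.array([2.0])) # window centre: character 2
```

Because `kappa` moves monotonically as the network writes, the window slides along the text by location rather than by content.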

Associative attention – instead of attending by position, we attend by content. Associative attention combined with LSTMs gives us powerful neural networks.
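Attending by content usually means comparing a key vector against every stored item and weighting by similarity. A minimal cosine-similarity sketch (the memory contents and the sharpening factor `beta` are invented for the example):

```python
import numpy as np

def content_attention(memory, key, beta=5.0):
    """Attend by content: score each memory row by cosine similarity to
    the key, sharpen with beta, normalise, and read out a blend."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + 1e-8)
    w = np.exp(beta * sims)
    w = w / w.sum()          # attention weights over memory rows
    return w @ memory        # read-out: blend of the best-matching rows

memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.9, 0.1, 0.0]])
read = content_attention(memory, key=np.array([1.0, 0.0, 0.0]))
```

The read-out is dominated by the rows that look like the key, regardless of where in memory they sit, which is the whole point of content-based addressing.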

Introspective attention – selectively attend to the neural network’s internal state or memory. With internal memory we can do selective writing as well as selective reading, allowing the network to iteratively modify its own state. Neural Turing Machines!
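Selective writing can be sketched in the NTM style: each memory row is erased and updated in proportion to its attention weight. The weights and vectors below are arbitrary example values:

```python
import numpy as np

def ntm_write(memory, w, erase, add):
    """NTM-style selective write: row i is erased and updated in
    proportion to its attention weight w[i]."""
    memory = memory * (1 - np.outer(w, erase))   # selective erase
    return memory + np.outer(w, add)             # selective add

memory = np.zeros((4, 3))
w = np.array([0.7, 0.2, 0.1, 0.0])   # attention weights over memory rows
memory = ntm_write(memory, w,
                   erase=np.zeros(3),
                   add=np.array([1.0, 2.0, 3.0]))
```

Because the write is a smooth blend rather than a single-slot assignment, the whole read–modify–write loop stays differentiable and trainable with backprop.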
