How A Recurrent Neural Network Works
Recurrent Neural Network
A recurrent neural network (RNN) is a type of neural network that can process sequential data, such as text, audio, or time series.
Here's how it works: first, the RNN takes in some input data, which could be a word in a sentence, a sound wave from an audio recording, or a measurement from a sensor at a specific time. Then, the RNN processes this input and generates an output, which could be a predicted next word in a sentence, a generated audio waveform, or a predicted sensor measurement.
But here's the cool part: the RNN also has a "memory" that it can use to remember important information from the input it has seen so far. This allows the RNN to make better predictions because it can take into account not only the current input, but also the context of the inputs it has seen before.
For example, if the RNN is processing a sentence, it can use its memory to remember the previous words in the sentence, which can help it predict the next word more accurately. Or if the RNN is processing audio data, it can use its memory to remember the previous sound waves, which can help it generate more realistic-sounding audio.
The Architecture of an RNN
The architecture of an RNN is similar to that of a traditional neural network, with an input layer, hidden layers, and an output layer. The key difference is that an RNN also has connections that loop back from the output of the hidden layers to their own input at the next time step.
When the RNN receives some input data, it processes the data in the input layer and then passes it through the hidden layers. As the data passes through the hidden layers, the RNN uses the looping connections to incorporate information from the previous outputs of the hidden layers into the current inputs. This allows the RNN to build up a "memory" of the inputs it has seen so far, which can help it make better predictions.
After the data has passed through the hidden layers, the RNN generates an output in the output layer, which could be a predicted next word in a sentence, a generated audio waveform, or a predicted sensor measurement. The hidden state, rather than the final output, is then carried forward into the hidden layers at the next time step, which lets the RNN fold what it has just seen into its memory and make better predictions on the next step.
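To make this loop concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass (all names and sizes are illustrative placeholders, not from any particular library): the hidden state h is the network's memory, updated from both the current input and its own previous value.

import numpy as np

# Illustrative sizes.
input_dim, hidden_dim, seq_len = 4, 8, 5

rng = np.random.default_rng(0)
W_x = 0.1 * rng.standard_normal((hidden_dim, input_dim))   # input-to-hidden weights
W_h = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))  # hidden-to-hidden weights (the loop)
b = np.zeros(hidden_dim)

x_seq = rng.standard_normal((seq_len, input_dim))  # one input vector per time step
h = np.zeros(hidden_dim)                           # the memory starts empty

for x_t in x_seq:
    # Each new hidden state mixes the current input with the previous hidden state.
    h = np.tanh(W_x @ x_t + W_h @ h + b)

# h now summarizes the entire sequence and could feed an output layer.
print(h)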
Difference Between A Normal Neural Network And An RNN
The main difference between a traditional neural network and an RNN is that a traditional neural network processes each input independently, while an RNN processes inputs sequentially and maintains a "memory" of the data it has seen so far. This allows an RNN to use the order and context of the input data to make better predictions.
Here is an example of how you might define a traditional neural network using Keras:
import tensorflow as tf

input_length = 20  # placeholder: number of features per example

model = tf.keras.Sequential()

# Add an input layer. Inputs will have shape (batch_size, input_length);
# the batch dimension is implicit, so it is not part of input_shape.
model.add(tf.keras.layers.InputLayer(input_shape=(input_length,)))

# Add a dense layer with 32 units and a ReLU activation function.
model.add(tf.keras.layers.Dense(units=32, activation='relu'))

# Add a dense layer with 10 units and a softmax activation function.
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))
This code defines a Sequential model with two dense layers. The input layer specifies the shape of the input data, and the dense layers process the input data and generate an output. The first dense layer has 32 units and uses the relu activation function, which outputs zero for negative values and passes positive values through unchanged. The second dense layer has 10 units and uses the softmax activation function, which ensures that the output of the layer is a probability distribution over the 10 classes.
Here is an example of how you might define an RNN using Keras:
import tensorflow as tf

input_length = 20  # placeholder: time steps per sequence
input_dim = 8      # placeholder: features per time step

model = tf.keras.Sequential()

# Add an input layer. Inputs will have shape (batch_size, input_length, input_dim);
# the batch dimension is implicit, so it is not part of input_shape.
model.add(tf.keras.layers.InputLayer(input_shape=(input_length, input_dim)))

# Add a simple RNN layer with 32 units.
model.add(tf.keras.layers.SimpleRNN(units=32))

# Add a dense layer with 10 units and a softmax activation function.
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))
This code defines a Sequential model with a SimpleRNN layer and a dense layer. The input tensors are three-dimensional, (batch_size, input_length, input_dim), although input_shape only lists the last two dimensions because Keras treats the batch dimension as implicit. The SimpleRNN layer processes this input data and generates an output, which is then passed to the dense layer to map it to the desired number of classes.
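As a quick sanity check of these shapes, here is a hedged sketch that compiles the RNN model above and fits it on random data; the dimensions, optimizer, and loss are placeholder choices for illustration, not recommendations:

import numpy as np
import tensorflow as tf

input_length, input_dim, num_classes = 20, 8, 10

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(input_length, input_dim)),
    tf.keras.layers.SimpleRNN(units=32),
    tf.keras.layers.Dense(units=num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# 64 random sequences with integer class labels, just to exercise the shapes.
x = np.random.rand(64, input_length, input_dim).astype('float32')
y = np.random.randint(0, num_classes, size=(64,))
model.fit(x, y, epochs=1, batch_size=16)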
Overall, the main difference between a traditional neural network and an RNN is the way they process input data. A traditional neural network processes each input independently, while an RNN processes inputs sequentially and maintains a memory of the data it has seen so far. This allows an RNN to make better use of the context and order of the input data and produce more accurate predictions.
Different Variants of RNN
- Vanilla RNN: This is the simplest type of RNN, with a single recurrent layer whose fixed-size hidden state serves as its memory. The output at each time step is determined by the current input and the previous hidden state.
- Long Short-Term Memory (LSTM): This type of RNN is designed to handle long-term dependencies in the data by introducing "memory cells" that can store information for long periods of time. LSTMs also have gates that control the flow of information into and out of the memory cells, allowing them to retain or forget information as needed (see the sketch after this list).
- Gated Recurrent Unit (GRU): This type of RNN is similar to an LSTM, but it has fewer parameters and is often faster to train. Like an LSTM, a GRU has gates that control the flow of information into and out of the hidden state, but it does not have separate memory cells.
- Bidirectional RNN: This type of RNN processes the input data in two directions, from the beginning to the end and from the end to the beginning. This allows the RNN to incorporate information from later in the sequence as well as earlier, which is useful for tasks where the whole input is available at once, such as labeling or classifying a complete sentence.
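The gating described in the LSTM bullet above can be written out directly. The NumPy sketch below is illustrative only (weight names and sizes are invented for the example, and biases are omitted); it shows a single LSTM step, where gates decide what to forget, what to write, and what to expose:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: x is the current input, h and c the previous hidden and cell states.
input_dim, hidden_dim = 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal(input_dim)
h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)

# One weight matrix per gate, acting on the concatenated [h, x] vector.
W_f, W_i, W_o, W_c = (0.1 * rng.standard_normal((hidden_dim, hidden_dim + input_dim)) for _ in range(4))
hx = np.concatenate([h, x])

f = sigmoid(W_f @ hx)              # forget gate: how much of the old cell to keep
i = sigmoid(W_i @ hx)              # input gate: how much new information to write
o = sigmoid(W_o @ hx)              # output gate: how much of the cell to expose
c = f * c + i * np.tanh(W_c @ hx)  # update the memory cell
h = o * np.tanh(c)                 # new hidden state read out from the cell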
Each of these types of RNN has its own strengths and weaknesses, and they suit different tasks. A vanilla RNN may be adequate for short sequences, while an LSTM or GRU usually handles longer sequences better because their gates help mitigate vanishing gradients. A bidirectional RNN is most useful when context from both directions matters and the entire sequence is available before a prediction is needed.
Difference Between LSTM and GRU
The main difference between an LSTM and a GRU is how they route information through time. An LSTM maintains a separate memory cell alongside its hidden state, with input, forget, and output gates controlling what gets written to, kept in, and read out of that cell. A GRU merges the memory cell and hidden state into a single state, and uses just two gates (an update gate and a reset gate) to control how that state is revised, which is why it has fewer parameters.
An example of how you might define an LSTM using Keras:
import tensorflow as tf

input_length = 20  # placeholder: time steps per sequence
input_dim = 8      # placeholder: features per time step

model = tf.keras.Sequential()

# Add an input layer. Inputs will have shape (batch_size, input_length, input_dim);
# the batch dimension is implicit, so it is not part of input_shape.
model.add(tf.keras.layers.InputLayer(input_shape=(input_length, input_dim)))

# Add an LSTM layer with 32 units.
model.add(tf.keras.layers.LSTM(units=32))

# Add a dense layer with 10 units and a softmax activation function.
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))
An example of how you might define a GRU using Keras:
import tensorflow as tf

input_length = 20  # placeholder: time steps per sequence
input_dim = 8      # placeholder: features per time step

model = tf.keras.Sequential()

# Add an input layer. Inputs will have shape (batch_size, input_length, input_dim);
# the batch dimension is implicit, so it is not part of input_shape.
model.add(tf.keras.layers.InputLayer(input_shape=(input_length, input_dim)))

# Add a GRU layer with 32 units.
model.add(tf.keras.layers.GRU(units=32))

# Add a dense layer with 10 units and a softmax activation function.
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))
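One way to see the "fewer parameters" claim concretely is to count them. This sketch (layer sizes are arbitrary placeholders) builds one LSTM and one GRU on the same input and prints their parameter counts; since an LSTM computes four gated transformations per step and a GRU three, the GRU should report roughly three quarters as many parameters:

import tensorflow as tf

input_length, input_dim, units = 20, 8, 32

lstm = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(input_length, input_dim)),
    tf.keras.layers.LSTM(units=units),
])
gru = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(input_length, input_dim)),
    tf.keras.layers.GRU(units=units),
])

print("LSTM parameters:", lstm.count_params())
print("GRU parameters:", gru.count_params())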
When to Use Bidirectional RNN
A bidirectional RNN is a type of RNN that processes the input data in two directions: from the beginning to the end and from the end to the beginning. This allows the RNN to incorporate information from later in the sequence as well as earlier, which is useful whenever the whole sequence is available before a prediction has to be made.
In contrast, a standard (unidirectional) LSTM processes the input data in a single direction and maintains a memory of what it has seen. This allows the LSTM to handle long-term dependencies in the data, but it does not have access to information from the future.
Whether to use a bidirectional RNN or a unidirectional LSTM depends on the characteristics of the input data and the task you are trying to solve. If the complete sequence is available up front and context from both sides matters, a bidirectional RNN is a good choice. For example, if you are building a model to fill in a missing word in a sentence, a bidirectional RNN can take into account the words that come before and after the gap to make a more accurate prediction.
On the other hand, if future inputs simply do not exist at prediction time, a unidirectional model is the right choice. For example, if you are building a model to forecast stock prices, the model can only condition on past measurements; an LSTM can use its memory to capture long-term trends and patterns in that history and make more accurate predictions.
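In Keras, a bidirectional model can be sketched with the Bidirectional wrapper, which runs one copy of the wrapped layer forward over the sequence and another backward, then concatenates their outputs (the sizes here are placeholders):

import tensorflow as tf

input_length, input_dim = 20, 8

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(input_length, input_dim)),
    # Forward and backward LSTMs, each with 32 units; outputs are concatenated to 64.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=32)),
    tf.keras.layers.Dense(units=10, activation='softmax'),
])
model.summary()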
Author: Sadman Kabir Soumik