RNNs

Recurrent Neural Networks (RNNs)

  • A class of artificial neural networks designed to recognize patterns in sequences of data.
  • RNNs have connections that form directed cycles, allowing them to maintain an internal memory or “state” as they process sequential information.

Recurrent Neural Networks (RNNs)

  • RNNs process inputs sequentially, one element at a time.
  • RNNs maintain a memory of previous inputs through their hidden state.
  • This makes them well suited for tasks involving sequential data such as text, time series, or speech (see the sketch after this list).
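As a minimal sketch (plain NumPy, not the actual Keras implementation; all sizes and weights below are arbitrary placeholders), the core recurrence of a simple RNN updates a hidden state h from each input element in turn:

import numpy as np

# toy dimensions - arbitrary placeholders for illustration
input_dim, hidden_dim, timesteps = 8, 4, 10

rng = np.random.default_rng(0)
W_x = rng.normal(size=(input_dim, hidden_dim))    # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))   # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

x = rng.normal(size=(timesteps, input_dim))       # one input sequence
h = np.zeros(hidden_dim)                          # initial hidden state ("memory")

for x_t in x:
    # the new state depends on the current input AND the previous state
    h = np.tanh(x_t @ W_x + h @ W_h + b)

print(h)  # the final state summarizes the whole sequence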

RNN layers in Keras

SimpleRNN - Basic RNN implementation with the simplest formulation (unlike LSTM and GRU, it takes no recurrent_activation argument):

tf.keras.layers.SimpleRNN(64, activation="tanh",
                          return_sequences=False, return_state=False)

LSTM (Long Short-Term Memory) - Addresses the vanishing gradient problem in standard RNNs:

tf.keras.layers.LSTM(64, activation="tanh", recurrent_activation="sigmoid",
                     return_sequences=False, return_state=False)

GRU (Gated Recurrent Unit) - Simplified version of LSTM with fewer parameters:

tf.keras.layers.GRU(64, activation="tanh", recurrent_activation="sigmoid",
                    return_sequences=False, return_state=False)

RNN layers in Keras

Bidirectional - Wrapper that can be applied to any RNN layer to process sequences in both directions:

tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))
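The return_sequences and return_state flags shown above control what a recurrent layer emits. As a brief sketch of the usual pattern (shapes in the comments assume padded sequences of length 100 with 32 features per step, chosen arbitrarily), stacking recurrent layers requires every layer except the last to return the full sequence:

import tensorflow as tf

inputs = tf.keras.Input(shape=(100, 32))            # (batch, timesteps, features)

# return_sequences=True -> one hidden state per timestep: (batch, 100, 64)
x = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)

# return_sequences=False (the default) -> only the last hidden state: (batch, 64)
x = tf.keras.layers.LSTM(64)(x)

outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
model.summary()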

Activation functions

The tanh (hyperbolic tangent) activation function squashes its input into the range (-1, 1). It is the default cell activation in recurrent layers such as SimpleRNN, LSTM, and GRU.

The sigmoid activation function transforms input values into the range (0, 1), so its output can be read as a probability. It is commonly used as the output activation in binary classification problems and as the gate activation (recurrent_activation) in LSTM and GRU.
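Both functions have simple closed forms; a quick NumPy check (input values chosen arbitrarily):

import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # approx. [0.119 0.5   0.881]
print(np.tanh(x))   # approx. [-0.964  0.     0.964], output in (-1, 1)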

Case study

The IMDB dataset contains 50,000 movie reviews from IMDB, labeled by sentiment (positive/negative) and split evenly into 25,000 reviews for training and 25,000 for testing.

Reviews have been preprocessed, and each review is encoded as a list of word indexes (integers).

For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer “3” encodes the 3rd most frequent word in the data.

This allows for quick filtering operations such as: “only consider the top 10,000 most common words, but eliminate the top 20 most common words”.

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=10000)
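The filtering described above maps directly onto load_data arguments: num_words keeps only the most frequent words, and skip_top drops the very most frequent ones (removed words are replaced by an out-of-vocabulary marker). A minimal sketch, including decoding a review back to text - note that by default the stored indices are offset by 3, because indices 0-2 are reserved for padding/start/unknown markers:

import tensorflow as tf

# top 10,000 words, but drop the 20 most common ones
(x_train, y_train), _ = tf.keras.datasets.imdb.load_data(num_words=10000, skip_top=20)

# decode the first review back to words
word_index = tf.keras.datasets.imdb.get_word_index()                # word -> frequency rank
index_word = {rank + 3: word for word, rank in word_index.items()}  # undo the offset

print(" ".join(index_word.get(i, "?") for i in x_train[0])[:200])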

Case study – solution

import tensorflow as tf

def main():
    # load the IMDB dataset (movie reviews with sentiment labels)
    # this dataset contains 50,000 movie reviews split for training and testing
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=10000)

    # Convert sequences to the same length through padding
    max_length = 256
    x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_length)
    x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_length)

    # Build a simple model for text classification
    model = tf.keras.Sequential([
        # embedding layer - converts integer indices to dense vectors
        tf.keras.layers.Embedding(input_dim=10000, output_dim=128),

        # bidirectional LSTM - reads the review left-to-right and right-to-left;
        # with return_sequences=False (the default) it already outputs a flat
        # (batch, 128) vector, so no separate Flatten layer is needed
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),

        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        
        # output layer for binary classification (positive/negative sentiment)
        tf.keras.layers.Dense(1, activation="sigmoid")
    ])

    # compile the model
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy", "precision", "recall"]
    )

    # train the model
    model.fit(
        x_train, y_train,
        epochs=5,
        batch_size=128,
        validation_split=0.2
    )

    # evaluate the model on test data
    print("********** model evaluation *******")
    model.evaluate(x_test, y_test)

if __name__ == "__main__":
    main()
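Once trained, the sigmoid output can be read as the probability that a review is positive. A short usage sketch that could be appended inside main() after evaluation (the 0.5 threshold is the usual convention, not something the model enforces):

    # predict on a few padded test reviews; outputs are probabilities in (0, 1)
    probs = model.predict(x_test[:5])
    for p in probs:
        print("positive" if p[0] > 0.5 else "negative", f"(p={p[0]:.3f})")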