Neural Networks – Overview
Computational models inspired by the human brain's structure and function
Interconnected nodes (neurons) organized in layers that process and transform input data to produce outputs
Structure: an input layer, one or more hidden layers, and an output layer
Each node receives inputs, applies weights, adds a bias, and passes the result through an activation function
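For example, a single node computes f(w·x + b). A minimal sketch in NumPy (the inputs, weights, and bias here are made-up illustration values):
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # inputs from the previous layer
w = np.array([0.4, 0.1, -0.6])   # one weight per input
b = 0.2                          # bias
z = np.dot(w, x) + b             # weighted sum plus bias
output = max(0.0, z)             # activation function (here ReLU, covered below)
print(output)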
Neural Networks – Learning Process
Forward propagation: Input data flows through the network, generating predictions
Loss calculation: The difference between predictions and actual values is measured
Backpropagation: Errors are propagated backward to adjust weights
Gradient descent: Weights are updated to minimize the loss
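To make the four steps concrete, here is a minimal sketch of one training step for a single linear neuron with squared-error loss (the data, initial weights, and learning rate are made-up illustration values):
import numpy as np

x, y_true = np.array([1.0, 2.0]), 1.5         # one training example
w, b, lr = np.array([0.1, -0.3]), 0.0, 0.01   # initial weights, bias, learning rate

y_pred = np.dot(w, x) + b            # 1. forward propagation
loss = (y_pred - y_true) ** 2        # 2. loss calculation
grad_w = 2 * (y_pred - y_true) * x   # 3. backpropagation (chain rule)
grad_b = 2 * (y_pred - y_true)
w = w - lr * grad_w                  # 4. gradient descent: update weights
b = b - lr * grad_b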
Neural Networks – Some terminology
ReLU (Rectified Linear Unit) – activation function defined as f(x) = max(0, x), meaning it outputs x when x is positive and 0 when x is negative or zero. Helps prevent the vanishing gradient problem.
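A short sketch of ReLU in NumPy:
import numpy as np

def relu(x):
    # elementwise: keep positive values, zero out the rest
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))  # [0. 0. 3.5]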
Neural Networks – Some terminology
Adam (Adaptive Moment Estimation) – a popular optimization algorithm for training neural networks; it combines concepts from two other optimizers:
Momentum: Accelerates learning in relevant directions by accumulating past gradients
RMSprop: Adapts learning rates based on recent gradient magnitudes
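A minimal sketch of a single Adam update for one parameter array, written in NumPy to show how the two ideas combine (the hyperparameter defaults are the commonly published ones; all names here are illustrative):
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # momentum: running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop: running average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction (m and v start at zero)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w, m, v

In Keras this is simply optimizer='adam' in model.compile, as shown later in this section.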
Neural Networks – Some terminology
Softmax function – exponentiates each output and then normalizes them so they can be interpreted as probabilities (values between 0 and 1 that sum to 1). This makes the model’s predictions more interpretable - each output value now represents the probability that the input belongs to the corresponding class.
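A small sketch of softmax in NumPy (subtracting the maximum is a standard numerical-stability trick and does not change the result):
import numpy as np

def softmax(z):
    # exponentiate (shifted by the max for numerical stability)
    e = np.exp(z - np.max(z))
    # normalize so the outputs sum to 1
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.659 0.242 0.099]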
Neural Networks – Some terminology
Epoch – one complete pass through the entire training dataset; multiple epochs are typically needed for a network to learn effectively. Too few epochs can lead to underfitting; too many can lead to overfitting.
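Rather than hand-tuning the epoch count, a common approach is early stopping on a validation split; a sketch using a standard Keras callback (assuming a compiled model and training data as in the example later in this section; the patience value is an arbitrary choice):
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
model.fit(x_train, y_train, epochs=50, validation_split=0.1, callbacks=[early_stop])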
Neural Networks – advantages
Can model complex non-linear relationships through multiple layers and activation functions
Can model complex decision boundaries
Learn feature representations automatically; work well with unstructured data like images and text
Perform better as dataset size increases (many traditional algorithms plateau in performance)
Neural Networks – disadvantages
Training neural networks typically requires significantly more computational resources than traditional machine learning algorithms
Neural networks generally need large amounts of training data to perform well and avoid overfitting
Finding optimal architectures and hyperparameter settings can be time-consuming and resource-intensive
Mac users: if dataset downloads fail with an SSL certificate error, go to your Applications folder, find the Python folder, and double-click Install Certificates.command
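(That command installs the SSL root certificates Python needs to download data.) The Keras examples below use the MNIST handwritten-digit dataset; a sketch of loading and scaling it, since that step is assumed but not shown:
import tensorflow as tf

# download (on first use) and load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# scale pixel values from the 0-255 range to 0-1
x_train, x_test = x_train / 255.0, x_test / 255.0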
Neural Network – keras
We will create a Sequential model (a linear stack of layers) using tf.keras.models.Sequential
model = tf.keras.models.Sequential([
    # transform the input from a 2D array of 28×28 pixels into a 1D array of 784 pixels
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # fully connected (Dense) layer with 128 neurons using the 'relu' activation function
    tf.keras.layers.Dense(128, activation='relu'),
    # a Dropout layer -- randomly sets 20% of the inputs to 0 during training;
    # helps prevent overfitting (forces the network not to rely too heavily
    # on any particular neuron)
    tf.keras.layers.Dropout(0.2),
    # Dense layer with 10 neurons, corresponding to the output classes (digits 0-9)
    tf.keras.layers.Dense(10)
])
Neural Network – keras
We can now compile our model. We will define an optimizer (adam is a good choice) and a loss function (SparseCategoricalCrossentropy for integer labels).
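A sketch of the compile call; from_logits=True reflects that the final Dense layer above outputs raw scores (logits) rather than probabilities:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])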
We then fit the model on the training data for 10 epochs.
model.fit(x_train, y_train, epochs=10)
We evaluate the model with the test data:
model.evaluate(x_test, y_test)
Neural Network – keras
We will create a new model that builds on the previously defined network by appending a Softmax activation function as a new final layer. This added Softmax layer converts the model's raw outputs (logits) into probabilities that sum to 1 across all classes.
probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])
# class probabilities for the first 5 test data points
print(probability_model(x_test[:5]))
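To turn those probabilities into predicted digit labels, take the index of the largest probability in each row (a small sketch, assuming NumPy is available):
import numpy as np

# predicted class (digit) for each of the first 5 test images
print(np.argmax(probability_model(x_test[:5]), axis=1))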