Introduction to Neural Networks

Neural Networks

  • computational models inspired by the human brain’s structure and function

  • interconnected nodes (neurons) organized in layers that process and transform input data to produce outputs

  • Structure: an input layer, one or more hidden layers, and an output layer

  • Each node receives inputs, applies weights, adds a bias, and passes the result through an activation function

Neural Networks – Learning Process

  • Forward propagation: Input data flows through the network, generating predictions
  • Loss calculation: The difference between predictions and actual values is measured
  • Backpropagation: Errors are propagated backward to adjust weights
  • Gradient descent: Weights are updated to minimize the loss
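
A minimal sketch of this loop (not from the slides) for a single linear neuron with a mean-squared-error loss, using plain NumPy; the toy data, learning rate, and number of iterations are illustrative assumptions:

import numpy as np

# toy data: learn y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

w, b = 0.0, 0.0   # weight and bias
lr = 0.1          # learning rate

for step in range(100):
    y_pred = w * x + b                        # forward propagation
    loss = np.mean((y_pred - y) ** 2)         # loss calculation (MSE)
    grad_w = np.mean(2 * (y_pred - y) * x)    # backpropagation via the chain rule
    grad_b = np.mean(2 * (y_pred - y))
    w -= lr * grad_w                          # gradient descent update
    b -= lr * grad_b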

Neural Networks – Some terminology

  • ReLU (Rectified Linear Unit) – activation function defined as f(x) = max(0, x), meaning it outputs x when x is positive and 0 when x is negative or zero. Helps prevent the vanishing gradient problem.
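
A one-line NumPy version of this definition (a sketch, not part of the original slides):

import numpy as np

def relu(x):
    # outputs x where x > 0, otherwise 0
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]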

Neural Networks – Some terminology

  • Adam (Adaptive Moment Estimation) – popular optimization algorithm for training neural networks; it combines concepts from two other optimizers (see the sketch after this list):
    • Momentum: Accelerates learning in relevant directions by accumulating past gradients
    • RMSprop: Adapts learning rates based on recent gradient magnitudes
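
In Keras, Adam can be selected by name (as done later in these notes) or constructed explicitly when you want to set its parameters; the value below is simply Keras's documented default learning rate:

import tensorflow as tf

adam = tf.keras.optimizers.Adam(learning_rate=0.001)   # default learning rate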

Neural Networks – Some terminology

  • Softmax function – exponentiates each output and then normalizes them so they can be interpreted as probabilities (values between 0 and 1 that sum to 1). This makes the model’s predictions more interpretable - each output value now represents the probability that the input belongs to the corresponding class.
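
A small NumPy illustration of the exponentiate-then-normalize step (a sketch, not from the slides):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtracting the max improves numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # ~[0.66 0.24 0.10], sums to 1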

Neural Networks – Some terminology

  • Epoch – one complete pass through the entire training dataset; multiple epochs are typically needed for a network to learn effectively. Too few epochs can lead to underfitting; too many can lead to overfitting.

Neural Networks – advantages

  • Can model complex non-linear relationships through multiple layers and activation functions
  • Can model complex decision boundaries
  • Learn feature representations automatically; work well with unstructured data like images and text
  • Perform better as dataset size increases (many traditional algorithms plateau in performance)

Neural Networks – disadvantages

  • Training neural networks typically requires significantly more computational resources
  • Neural networks generally need large amounts of training data to perform well and avoid overfitting
  • Finding optimal architectures and hyperparameter settings can be time-consuming and resource-intensive

Neural Networks in sklearn
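
scikit-learn offers a basic multilayer perceptron via sklearn.neural_network.MLPClassifier; a minimal sketch on the small built-in digits dataset (the dataset choice and hyperparameters here are illustrative assumptions, not from the slides):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)   # 8x8 digit images, already flattened
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))      # accuracy on the test split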


TensorFlow

To install TensorFlow with pip:

/path/to/python3 -m pip install tensorflow

You might need to downgrade numpy:

/path/to/python3 -m pip install --upgrade numpy==1.26.4
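
To verify the installation (optional):

/path/to/python3 -c "import tensorflow as tf; print(tf.__version__)"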

MNIST dataset

  • 70,000 grayscale images of handwritten digits (0-9)
  • 60,000 data points for training
  • 10,000 for testing
  • derived from a larger dataset collected by the National Institute of Standards and Technology (NIST); the "M" stands for Modified
  • Each image is 28x28 pixels, representing a grayscale image with pixel values ranging from 0 (black) to 255 (white)
  • Each image in the dataset is associated with a label representing the digit it depicts (0-9)

MNIST dataset

We will be using TensorFlow’s Keras API to load the dataset:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

Mac users: if the download fails with an SSL certificate error, go to your Applications folder, find the Python folder and double-click on Install Certificates.command
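
The pixel values load as integers in [0, 255]; the standard Keras quickstart scales them to floats in [0, 1] before training, which generally makes optimization easier:

x_train, x_test = x_train / 255.0, x_test / 255.0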

Neural Network – keras

We will create a Sequential model (a linear stack of layers) using tf.keras.models.Sequential:

model = tf.keras.models.Sequential([
  # transform the input from a 2D array of 28×28 pixels into a 1D array of 784 pixels
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  
  # fully connected (Dense) layer with 128 neurons using 'relu' activation function
  tf.keras.layers.Dense(128, activation='relu'),
  
  # a Dropout layer -- randomly sets 20% of the inputs to 0 during training
  # helps prevent overfitting  (forces the network not to rely too heavily 
  # on any particular neuron)
  tf.keras.layers.Dropout(0.2),
  
  # Dense layer with 10 neurons, corresponding to the output classes (digits 0-9)
  tf.keras.layers.Dense(10)
  ])
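
To check the resulting layer shapes and parameter counts (optional):

model.summary()
# Dense(128): 784*128 + 128 = 100,480 parameters; Dense(10): 128*10 + 10 = 1,290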

Neural Network – keras

We can now compile our model. We define an optimizer (adam is a good choice) and a loss function (Sparse Categorical Crossentropy for integer labels). Because the final Dense layer has no activation, the model outputs raw logits, so we pass from_logits=True.

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer="adam",
              loss=loss_fn,
              metrics=["accuracy"])

Neural Network – keras

We will fit the model on the training data for 10 epochs.

model.fit(x_train, y_train, epochs=10)

We evaluate the model with the test data:

model.evaluate(x_test, y_test)
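
evaluate returns the loss followed by the compiled metrics, so the values can also be captured directly:

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print("Test accuracy:", test_acc)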

Neural Network – keras

We will create a new model that builds on the previously defined network by appending a Softmax activation as a new final layer. This added Softmax layer converts the raw outputs (logits) into probabilities that sum to 1 across all classes.

probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
  ])

# get class probabilities for the first 5 test data points
print(probability_model(x_test[:5]))
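
To turn these probabilities into predicted digit labels, take the argmax across the class axis (a small follow-up, not in the original slides):

import numpy as np

probs = probability_model(x_test[:5]).numpy()
print(np.argmax(probs, axis=1))   # predicted digits
print(y_test[:5])                 # true labels for comparison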

Another case study

Fashion-MNIST

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
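
Fashion-MNIST shares MNIST's format (70,000 grayscale 28x28 images, a 60,000/10,000 train/test split) but labels 10 clothing categories instead of digits, so the pipeline above can be reused as-is; a sketch assuming the same architecture and the loss_fn defined earlier:

x_train, x_test = x_train / 255.0, x_test / 255.0   # same scaling as before

# same architecture as the MNIST model above
fashion_model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
fashion_model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
fashion_model.fit(x_train, y_train, epochs=10)
fashion_model.evaluate(x_test, y_test)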