Neural Networks – Overview
Computational models inspired by the human brain's structure and function
Interconnected nodes (neurons) organized in layers that process and transform input data to produce outputs
Structure: an input layer, one or more hidden layers, and an output layer
Each node receives inputs, applies weights, adds a bias, and passes the result through an activation function
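For example, a single node computes f(w·x + b). A minimal sketch in NumPy (the inputs, weights, and bias here are made-up illustration values):
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # inputs from the previous layer
w = np.array([0.4, 0.1, -0.6])   # one weight per input
b = 0.2                          # bias
z = np.dot(w, x) + b             # weighted sum plus bias
output = max(0.0, z)             # activation function (here ReLU, covered below)
print(output)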
Neural Networks – Learning Process
Forward propagation: Input data flows through the network, generating predictions
Loss calculation: The difference between predictions and actual values is measured
Backpropagation: Errors are propagated backward to adjust weights
Gradient descent: Weights are updated to minimize the loss
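To make the four steps concrete, here is a minimal sketch of one training step for a single linear neuron with squared-error loss (the data, initial weights, and learning rate are made-up illustration values):
import numpy as np

x, y_true = np.array([1.0, 2.0]), 1.5         # one training example
w, b, lr = np.array([0.1, -0.3]), 0.0, 0.01   # initial weights, bias, learning rate

y_pred = np.dot(w, x) + b            # 1. forward propagation
loss = (y_pred - y_true) ** 2        # 2. loss calculation
grad_w = 2 * (y_pred - y_true) * x   # 3. backpropagation (chain rule)
grad_b = 2 * (y_pred - y_true)
w = w - lr * grad_w                  # 4. gradient descent: update weights
b = b - lr * grad_b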
Neural Networks – Some terminology
ReLU (Rectified Linear Unit) – activation function defined as f(x) = max(0, x), meaning it outputs x when x is positive and 0 when x is negative or zero. Helps prevent the vanishing gradient problem.
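A short sketch of ReLU in NumPy:
import numpy as np

def relu(x):
    # elementwise: keep positive values, zero out the rest
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))  # [0. 0. 3.5]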
Neural Networks – Some terminology
Adam (Adaptive Moment Estimation) – a popular optimization algorithm for training neural networks; it combines concepts from two other optimizers:
Momentum: Accelerates learning in relevant directions by accumulating past gradients
RMSprop: Adapts learning rates based on recent gradient magnitudes
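A minimal sketch of a single Adam update for one parameter array, written in NumPy to show how the two ideas combine (the hyperparameter defaults are the commonly published ones; all names here are illustrative):
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # momentum: running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop: running average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction (m and v start at zero)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w, m, v

In Keras this is simply optimizer='adam' in model.compile, as shown later in this section.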
Neural Networks – Some terminology
Softmax function – exponentiates each output and then normalizes them so they can be interpreted as probabilities (values between 0 and 1 that sum to 1). This makes the model’s predictions more interpretable - each output value now represents the probability that the input belongs to the corresponding class.
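A small sketch of softmax in NumPy (subtracting the maximum is a standard numerical-stability trick and does not change the result):
import numpy as np

def softmax(z):
    # exponentiate (shifted by the max for numerical stability)
    e = np.exp(z - np.max(z))
    # normalize so the outputs sum to 1
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.659 0.242 0.099]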
Neural Networks – Some terminology
Epoch – one complete pass through the entire training dataset; multiple epochs are typically needed for a network to learn effectively. Too few epochs can lead to underfitting; too many can lead to overfitting.
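Rather than hand-tuning the epoch count, a common approach is early stopping on a validation split; a sketch using a standard Keras callback (assuming a compiled model and training data as in the example later in this section; the patience value is an arbitrary choice):
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
model.fit(x_train, y_train, epochs=50, validation_split=0.1, callbacks=[early_stop])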
Neural Networks – advantages
Can model complex non-linear relationships through multiple layers and activation functions
Can model complex decision boundaries
Learn feature representations automatically; work well with unstructured data like images and text
Perform better as dataset size increases (many traditional algorithms plateau in performance)
Neural Networks – disadvantages
Training neural networks typically requires significantly more computational resources than traditional machine learning algorithms
Neural networks generally need large amounts of training data to perform well and avoid overfitting
Finding optimal architectures and hyperparameter settings can be time-consuming and resource-intensive
Mac users: if dataset downloads fail with an SSL certificate error, go to your Applications folder, find the Python folder, and double-click Install Certificates.command
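(That command installs the SSL root certificates Python needs to download data.) The Keras examples below use the MNIST handwritten-digit dataset; a sketch of loading and scaling it, since that step is assumed but not shown:
import tensorflow as tf

# download (on first use) and load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# scale pixel values from the 0-255 range to 0-1
x_train, x_test = x_train / 255.0, x_test / 255.0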
Neural Network – keras
We will create a Sequential model (a linear stack of layers) using tf.keras.models.Sequential
model = tf.keras.models.Sequential([
    # transform the input from a 2D array of 28×28 pixels into a 1D array of 784 pixels
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # fully connected (Dense) layer with 128 neurons using the 'relu' activation function
    tf.keras.layers.Dense(128, activation='relu'),
    # a Dropout layer -- randomly sets 20% of the inputs to 0 during training;
    # helps prevent overfitting (forces the network not to rely too heavily
    # on any particular neuron)
    tf.keras.layers.Dropout(0.2),
    # Dense layer with 10 neurons, corresponding to the output classes (digits 0-9)
    tf.keras.layers.Dense(10)
])
Neural Network – keras
We can now compile our model. We will define an optimizer (adam is a good choice) and a loss function (SparseCategoricalCrossentropy for integer labels).
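A sketch of the compile call; from_logits=True reflects that the final Dense layer above outputs raw scores (logits) rather than probabilities:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])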
We then fit the model on the training data for 10 epochs.
model.fit(x_train, y_train, epochs=10)
We evaluate the model with the test data:
model.evaluate(x_test, y_test)
Neural Network – keras
We will create a new model that builds on the previously defined network by appending a Softmax activation function as a new final layer. This added Softmax layer converts the model's raw outputs (logits) into probabilities that sum to 1 across all classes.
probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])
# class probabilities for the first 5 test data points
print(probability_model(x_test[:5]))
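To turn those probabilities into predicted digit labels, take the index of the largest probability in each row (a small sketch, assuming NumPy is available):
import numpy as np

# predicted class (digit) for each of the first 5 test images
print(np.argmax(probability_model(x_test[:5]), axis=1))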