Building a Simple Neural Network for Digit Classification with Keras and MNIST

Introduction to Neural Networks for Digit Classification
Understanding the MNIST Dataset
Setting Up the Environment
Loading and Exploring the MNIST Dataset
Data Preprocessing: Reshaping and One-Hot Encoding
Building the Neural Network Model with Keras
- Model Architecture
- Layers Explanation
Compiling the Model: Loss Function and Optimizer
Training the Model: Fitting and Validation
Evaluating the Model: Accuracy and Predictions
Visualizing the Neural Network Structure
Optimizing Training: Utilizing GPUs
Conclusion

Introduction to Neural Networks for Digit Classification

Neural networks have revolutionized the way machines interpret and analyze data, especially in the realm of image recognition. Digit classification, where the goal is to accurately identify handwritten digits, serves as a quintessential example for beginners to grasp the fundamentals of neural networks. By leveraging popular libraries like Keras and datasets like MNIST, building an effective digit classifier becomes both accessible and educational.

Understanding the MNIST Dataset

The MNIST (Modified National Institute of Standards and Technology) dataset is a cornerstone in the machine learning community. It comprises 70,000 grayscale images of handwritten digits (0-9), each sized at 28×28 pixels. The dataset is split into 60,000 training images and 10,000 testing images, making it ideal for training and validating machine learning models.

Key Features of MNIST:

Size: 70,000 images (60k training, 10k testing)
Image Dimensions: 28×28 pixels
Classes: 10 (digits 0 through 9)
Grayscale: Each pixel value ranges from 0 (black) to 255 (white)

Setting Up the Environment

Before diving into the model-building process, it’s essential to set up the development environment. Ensure you have Python installed, and consider using environments like Anaconda for managing packages and dependencies seamlessly.

Required Libraries:

NumPy: For numerical operations
Matplotlib: For data visualization
Keras: For building and training neural networks
Scikit-learn: For preprocessing utilities
Pandas: For data manipulation

Installation Commands:

pip install numpy matplotlib tensorflow keras scikit-learn pandas

TensorFlow is included because Keras needs a backend to run, and the GPU section of this guide imports it directly.

Loading and Exploring the MNIST Dataset

Keras simplifies the process of accessing the MNIST dataset through its datasets module. Here’s how you can load and inspect the data:

import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist

# Load MNIST handwritten digit data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print(X_train.shape)  # Output: (60000, 28, 28)
print(X_test.shape)   # Output: (10000, 28, 28)

Output:

(60000, 28, 28)
(10000, 28, 28)

This reveals that there are 60,000 training images and 10,000 testing images, each with dimensions 28×28 pixels.
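
Beyond the shapes, it can help to check the pixel range and how evenly the digits are distributed. The sketch below does this with a hypothetical helper, summarize_dataset (not part of Keras), run on a small random stand-in array so that it works without downloading anything. Passing X_train and y_train instead should report the corresponding figures for MNIST.

```python
import numpy as np

def summarize_dataset(images, labels, num_classes=10):
    """Return basic statistics for an image array and its integer labels.

    summarize_dataset is a hypothetical helper, not part of Keras.
    """
    return {
        "shape": images.shape,
        "dtype": str(images.dtype),
        "min": int(images.min()),
        "max": int(images.max()),
        # Number of samples per class, indexed by digit
        "class_counts": np.bincount(labels, minlength=num_classes).tolist(),
    }

# Synthetic stand-in with the same layout as MNIST (uint8, 28x28)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=100)

summary = summarize_dataset(fake_images, fake_labels)
print(summary["shape"], summary["dtype"])
print(summary["class_counts"])
```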

Visualizing Sample Images

To get a feel for the data, let’s visualize a sample image:

img = X_train[250].reshape(28,28)
plt.imshow(img, cmap="gray")
plt.title(f"Sample Digit Label: {y_train[250]}")
plt.show()

Output:

Note: The actual image will display a handwritten digit corresponding to the label.

Data Preprocessing: Reshaping and One-Hot Encoding

Data preprocessing is a crucial step in machine learning workflows. For neural networks, it’s essential to format the data appropriately and encode the labels.

Reshaping the Data

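Neural networks expect input in a consistent shape and train more reliably on small floating-point values. Here we keep each image as a 28×28 array, convert the pixels to float32, and scale them from the 0 to 255 range down to 0 to 1. Flattening each image into a 1D array of 784 elements is left to the Flatten layer at the start of the model.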

from keras.utils import to_categorical

# Reshape and normalize the image data
X_train = X_train.reshape(60000, 28, 28).astype('float32') / 255
X_test = X_test.reshape(10000, 28, 28).astype('float32') / 255
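
To make the effect of these steps concrete, here is a minimal NumPy sketch on a synthetic batch (random values standing in for real images, which is an assumption of this example). It shows the scaling to the 0 to 1 range and the 28×28 to 784 flattening that the model's Flatten layer performs later.

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images (uint8, values 0-255)
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Scale to [0, 1] as float32, as in the preprocessing step
scaled = batch.astype("float32") / 255

# What flattening does to each sample: 28 x 28 -> 784
flat = scaled.reshape(len(scaled), -1)

print(scaled.dtype, scaled.min(), scaled.max())
print(flat.shape)  # (4, 784)
```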

One-Hot Encoding the Labels

One-hot encoding transforms categorical labels into a binary matrix, which is more suitable for training.

# One-hot encode the labels
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

print(y_train.shape)  # Output: (60000, 10)
print(y_test.shape)   # Output: (10000, 10)

Output:

(60000, 10)
(10000, 10)

This indicates that each label is now represented as a 10-dimensional binary vector.
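
For intuition, the same encoding can be sketched in plain NumPy. The one_hot function below is a hypothetical helper written for this illustration and is not how to_categorical is implemented internally; it only reproduces the same kind of output on a few example labels.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Encode integer labels as rows of a binary matrix (hypothetical helper)."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([5, 0, 4, 1, 9])
encoded = one_hot(labels)

print(encoded.shape)  # (5, 10)
print(encoded[0])     # 1.0 at index 5, zeros elsewhere

# argmax along the class axis recovers the original integer labels
decoded = np.argmax(encoded, axis=1)
print(decoded)
```

The argmax step at the end is the same operation used later in this guide to turn predicted probabilities back into digit labels.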

Building the Neural Network Model with Keras

Keras offers a user-friendly API to construct and train neural networks. We’ll build a simple Sequential model comprising multiple dense layers.

Model Architecture

Here’s a high-level overview of the model structure:

Flatten Layer: Converts the 2D image data into a 1D array.
Dense Layer 1: 100 neurons with sigmoid activation.
Dense Layer 2: 144 neurons with sigmoid activation.
Output Layer: 10 neurons (one for each digit) with softmax activation.

Layers Explanation

Flatten Layer: Transforms the input data from a 2D matrix (28×28) to a 1D vector (784) to feed into the dense layers.
Dense Layers: These are fully connected layers where each neuron receives input from all neurons in the preceding layer. Activation functions introduce non-linearity:
- Sigmoid Activation: Outputs values between 0 and 1. It is used in the hidden layers here for simplicity, although ReLU is the more common choice for hidden layers in modern networks.
- Softmax Activation: Converts the final layer outputs into probability distributions over the 10 classes.
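
The difference between the two activations is easier to see in code. The following is a small NumPy sketch, not the Keras implementation: sigmoid treats every value independently, while softmax normalizes each row so that it sums to 1.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function; squashes each value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Row-wise softmax; each row becomes a probability distribution."""
    # Subtracting the row maximum avoids overflow and leaves the result unchanged
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])

print(sigmoid(scores))  # each value is transformed independently
print(softmax(scores))  # each row sums to 1
```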

Building the Model:

from keras.models import Sequential
from keras.layers import Dense, Flatten

# Create a simple Neural Network model
model = Sequential()
model.add(Flatten(input_shape=(28,28)))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(144, activation='sigmoid'))
model.add(Dense(10, activation='softmax'))
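
To see what this stack computes, the forward pass can be sketched in NumPy. The weights below are random and untrained (an assumption made purely to illustrate the shapes), so the outputs are meaningless as predictions; the point is how a batch of 28×28 images becomes a batch of 10 class probabilities.

```python
import numpy as np

rng = np.random.default_rng(42)
layer_sizes = [784, 100, 144, 10]

# Random, untrained weights and zero biases, only to illustrate the shapes
weights = [rng.normal(0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(images):
    """Forward pass of a Flatten -> Dense -> Dense -> Dense(softmax) stack."""
    a = images.reshape(len(images), -1)            # Flatten: (N, 28, 28) -> (N, 784)
    for W, b in zip(weights[:-1], biases[:-1]):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))     # Dense layer with sigmoid
    logits = a @ weights[-1] + biases[-1]          # final Dense layer
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)    # softmax over the 10 classes

probs = forward(rng.random((3, 28, 28)))
print(probs.shape)  # (3, 10)
```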

Model Summary

To visualize the model structure and parameters:

model.summary()

Output:

Model: "sequential_1"
_______________________________________________________________
Layer (type)                 Output Shape              Param #   
===============================================================
flatten_1 (Flatten)          (None, 784)               0         
_______________________________________________________________
dense_3 (Dense)              (None, 100)               78500     
_______________________________________________________________
dense_4 (Dense)              (None, 144)               14544     
_______________________________________________________________
dense_5 (Dense)              (None, 10)                1450      
===============================================================
Total params: 94,494
Trainable params: 94,494
Non-trainable params: 0
_______________________________________________________________

This summary provides insights into each layer’s output shape and the number of parameters to be trained.
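
The parameter counts can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A short sketch of that arithmetic, using a hypothetical helper named dense_params:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [784, 100, 144, 10]  # flattened input, two hidden layers, output
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(per_layer)       # [78500, 14544, 1450]
print(sum(per_layer))  # 94494
```

The Flatten layer contributes no parameters because it only rearranges the input.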

Compiling the Model: Loss Function and Optimizer

Before training, the model needs to be compiled with a specified loss function and optimizer.

Loss Function: Measures how well the model’s predictions match the actual labels.
- Categorical Crossentropy: Suitable for multi-class classification problems.
Optimizer: Updates the model’s weights to minimize the loss function.
- Adam Optimizer: An adaptive variant of stochastic gradient descent that tracks running averages of the gradients and their squares to set a separate step size for each weight.
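
As an illustration of what the loss measures, here is a minimal NumPy sketch of categorical crossentropy on made-up predictions. The clipping constant eps is an assumed value for this example, and the function is a simplified stand-in rather than the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    # Clip to avoid log(0); eps is an assumed value for this sketch
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
confident = np.array([[0.05, 0.90, 0.05],
                      [0.80, 0.10, 0.10]])
unsure = np.array([[0.40, 0.30, 0.30],
                   [0.30, 0.40, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident)  # small: high probability on the correct class
print(loss_unsure)     # larger: probability spread over wrong classes
```

Only the probability assigned to the correct class enters the sum, so the loss falls as the model becomes more confident in the right answer.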

Compiling the Model:

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
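
Passing optimizer='adam' selects Adam with its default settings. The sketch below shows one textbook Adam update in NumPy, applied to a toy one-parameter problem. The default values mirror those documented for Keras (learning rate 0.001, beta values 0.9 and 0.999, epsilon 1e-7), but the function itself, adam_step, is a hypothetical illustration and the library's internals may differ in detail.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a parameter array w at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimize f(w) = w**2, whose gradient is 2 * w
w = np.array([1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 4):
    w, m, v = adam_step(w, 2 * w, m, v, t)
    print(t, w)
```

On the first step the update is close to the learning rate itself regardless of the gradient's magnitude, which is one reason Adam is relatively insensitive to how the loss is scaled.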

Training the Model: Fitting and Validation

Training involves feeding the model with data and allowing it to learn patterns through multiple epochs.

Epochs: Number of times the entire dataset is passed through the network.
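Validation Data: Data held out from training and evaluated after each epoch. This guide passes the test set as validation data for simplicity, so the reported validation accuracy is also the test accuracy.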

Training Process:

history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))

Sample Output:

Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.9009 - acc: 0.7596 - val_loss: 0.3685 - val_acc: 0.8892
...
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2251 - acc: 0.9314 - val_loss: 0.2267 - val_acc: 0.9303

By the end of 10 epochs, the model reaches approximately 93% accuracy on both the training and validation data. The two figures being so close (0.9314 versus 0.9303) suggests the model is not overfitting.
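
The 1875 at the start of each progress line is the number of batches per epoch. Since fit was called without a batch_size, Keras uses its default of 32, and 60,000 samples divided into batches of 32 gives 1875 steps. A small sketch of that arithmetic, with a hypothetical helper name:

```python
import math

def steps_per_epoch(num_samples, batch_size=32):
    """Batches needed to cover the dataset once; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(60000))       # 1875, matching the training log
print(steps_per_epoch(60000, 128))  # 469
```

A larger batch size means fewer weight updates per epoch, which usually makes each epoch faster but can change how quickly the loss falls.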

Evaluating the Model: Accuracy and Predictions

After training, it’s crucial to assess the model’s performance and make predictions on new data.

Making Predictions

# Predicting on the test set
predictions = model.predict(X_test)

Each prediction consists of a probability distribution over the 10 classes. To determine the predicted class:

import numpy as np

# Convert probabilities to class labels
predicted_classes = np.argmax(predictions, axis=1)
true_classes = np.argmax(y_test, axis=1)

Visualizing Predictions

Let’s visualize some test images along with their predicted and true labels:

import matplotlib.pyplot as plt

def plot_prediction(index):
    img = X_test[index].reshape(28,28)
    plt.imshow(img, cmap="gray")
    plt.title(f"True Label: {true_classes[index]} | Predicted: {predicted_classes[index]}")
    plt.show()

# Plotting the first test image
plot_prediction(0)

# Plotting another sample
plot_prediction(25)

Output:

Displays the image with titles indicating both the true label and the model’s prediction.

Model Accuracy

The model reached about 93% accuracy on the validation set (val_acc of 0.9303 after the tenth epoch), which shows it generalizes reasonably well to images it was not trained on. For improved performance, further tuning and more complex architectures can be explored.
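
Accuracy can also be computed directly from the class labels, and a confusion matrix shows which digits get mistaken for which. The sketch below uses a small made-up set of labels so that it runs on its own; the helper functions are hypothetical, and with the trained model you would pass true_classes and predicted_classes instead.

```python
import numpy as np

def accuracy(true_classes, predicted_classes):
    """Fraction of samples whose predicted class matches the true class."""
    return float(np.mean(np.asarray(true_classes) == np.asarray(predicted_classes)))

def confusion_matrix(true_classes, predicted_classes, num_classes=10):
    """Rows are true digits, columns are predicted digits."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats
    np.add.at(matrix, (true_classes, predicted_classes), 1)
    return matrix

# Small made-up example standing in for real model output
true_demo = np.array([7, 2, 1, 0, 4, 1, 4, 9])
pred_demo = np.array([7, 2, 1, 0, 4, 1, 9, 9])

print(accuracy(true_demo, pred_demo))  # 0.875
cm = confusion_matrix(true_demo, pred_demo)
print(cm[4, 9])                        # 1: one "4" was predicted as "9"
```

The diagonal of the matrix holds the correct predictions, so large off-diagonal entries point to the digit pairs the model confuses most.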

Visualizing the Neural Network Structure

Understanding the architecture of a neural network can aid in comprehending how data flows and transformations occur. Below is a visual representation of the constructed neural network:

import matplotlib.pyplot as plt

def draw_neural_net(ax, left, right, bottom, top, layer_sizes):
    '''
    Draws a neural network diagram.
    
    :param ax: Matplotlib Axes object
    :param left: Left boundary
    :param right: Right boundary
    :param bottom: Bottom boundary
    :param top: Top boundary
    :param layer_sizes: List containing the number of neurons in each layer
    '''
    n_layers = len(layer_sizes)
    v_spacing = (top - bottom)/float(max(layer_sizes))
    h_spacing = (right - left)/float(n_layers - 1)
    
    # Draw neurons
    for n, layer_size in enumerate(layer_sizes):
        layer_top = v_spacing*(layer_size - 1)/2. + (top + bottom)/2.
        for m in range(layer_size):
            circle = plt.Circle((n*h_spacing + left, layer_top - m*v_spacing), v_spacing/4.,
                                color='w', ec='k', zorder=4)
            ax.add_artist(circle)
    
    # Draw edges
    for n, (layer_size_a, layer_size_b) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        layer_top_a = v_spacing*(layer_size_a - 1)/2. + (top + bottom)/2.
        layer_top_b = v_spacing*(layer_size_b - 1)/2. + (top + bottom)/2.
        for m in range(layer_size_a):
            for o in range(layer_size_b):
                line = plt.Line2D([n*h_spacing + left, (n + 1)*h_spacing + left],
                                  [layer_top_a - m*v_spacing, layer_top_b - o*v_spacing], c='k')
                ax.add_artist(line)

# Drawing the neural network
fig = plt.figure(figsize=(12, 12))
ax = fig.gca()
ax.axis('off')
draw_neural_net(ax, .1, .9, .1, .9, [784, 100, 144, 10])
plt.show()

Output:

This diagram illustrates the flow from input neurons (784) through hidden layers to the output neurons (10).
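
Drawing every connection of the full network means a very large number of lines, which can make the plot slow to render and hard to read. One option, sketched here with two hypothetical helpers, is to count the edges first and cap each layer at a small number of neurons for display purposes. The cap of 12 is an arbitrary choice for this example.

```python
def count_edges(layer_sizes):
    """Number of connection lines a fully connected diagram has to draw."""
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

def cap_layer_sizes(layer_sizes, max_neurons=12):
    """Limit each layer to max_neurons so the diagram stays readable."""
    return [min(size, max_neurons) for size in layer_sizes]

full = [784, 100, 144, 10]
small = cap_layer_sizes(full)

print(count_edges(full))   # 94240
print(small)               # [12, 12, 12, 10]
print(count_edges(small))  # 408
```

The capped list can then be passed as the layer_sizes argument of draw_neural_net, with the understanding that the diagram shows the network's layout rather than its true width.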

Optimizing Training: Utilizing GPUs

Training neural networks, especially deep ones, can be computationally intensive and time-consuming. Leveraging Graphics Processing Units (GPUs) can significantly accelerate the training process. Here’s how you can utilize GPUs with Keras:

Ensure GPU Compatibility:
- Install NVIDIA CUDA Toolkit and cuDNN.
- Verify that your GPU is compatible with TensorFlow (the backend for Keras).
Install GPU-Compatible TensorFlow:

pip install tensorflow

Note: The separate tensorflow-gpu package is deprecated. The standard tensorflow package includes GPU support on supported platforms.

Configure TensorFlow to Use GPU:

TensorFlow automatically detects and utilizes GPUs. However, you can explicitly specify GPU settings if needed.

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Restrict TensorFlow to only use the first GPU
        tf.config.set_visible_devices(gpus[0], 'GPU')
        tf.config.experimental.set_memory_growth(gpus[0], True)
    except RuntimeError as e:
        print(e)

Benefits of Using GPUs:

Parallel Processing: GPUs can handle multiple calculations simultaneously, ideal for matrix operations in neural networks.
Faster Training: Models train significantly quicker, allowing for more experimentation and faster iterations.

Note: Optimizing GPU usage may require additional configurations based on specific system setups.

Conclusion

Building a simple neural network for digit classification using Keras and the MNIST dataset is an excellent introduction to the world of machine learning and deep learning. By following this guide, you’ve learned how to:

Understand and preprocess the MNIST dataset.
Construct a neural network model with Keras.
Train and evaluate the model’s performance.
Visualize the network architecture.
Optimize training using GPUs.

While the model discussed achieves respectable accuracy, there’s ample room for improvement. Exploring more complex architectures, experimenting with different activation functions, or implementing regularization techniques can lead to enhanced performance. As you continue your machine learning journey, building upon these fundamentals will empower you to tackle more intricate and impactful projects.

Happy Coding!
