Building a Simple Neural Network for Digit Classification with Keras and MNIST
Table of Contents
- Introduction to Neural Networks for Digit Classification
- Understanding the MNIST Dataset
- Setting Up the Environment
- Loading and Exploring the MNIST Dataset
- Data Preprocessing: Reshaping and One-Hot Encoding
- Building the Neural Network Model with Keras
- Compiling the Model: Loss Function and Optimizer
- Training the Model: Fitting and Validation
- Evaluating the Model: Accuracy and Predictions
- Visualizing the Neural Network Structure
- Optimizing Training: Utilizing GPUs
- Conclusion
Introduction to Neural Networks for Digit Classification
Neural networks have revolutionized the way machines interpret and analyze data, especially in the realm of image recognition. Digit classification, where the goal is to accurately identify handwritten digits, serves as a quintessential example for beginners to grasp the fundamentals of neural networks. By leveraging popular libraries like Keras and datasets like MNIST, building an effective digit classifier becomes both accessible and educational.
Understanding the MNIST Dataset
The MNIST (Modified National Institute of Standards and Technology) dataset is a cornerstone in the machine learning community. It comprises 70,000 grayscale images of handwritten digits (0-9), each sized at 28×28 pixels. The dataset is split into 60,000 training images and 10,000 testing images, making it ideal for training and validating machine learning models.
Key Features of MNIST:
- Size: 70,000 images (60k training, 10k testing)
- Image Dimensions: 28×28 pixels
- Classes: 10 (digits 0 through 9)
- Grayscale: Each pixel value ranges from 0 (black) to 255 (white)
Setting Up the Environment
Before diving into the model-building process, it’s essential to set up the development environment. Ensure you have Python installed, and consider using environments like Anaconda for managing packages and dependencies seamlessly.
Required Libraries:
- NumPy: For numerical operations
- Matplotlib: For data visualization
- Keras: For building and training neural networks
- Scikit-learn: For preprocessing utilities
- Pandas: For data manipulation
Installation Command (TensorFlow is included because Keras needs a backend to run):
    pip install numpy matplotlib tensorflow keras scikit-learn pandas
Loading and Exploring the MNIST Dataset
Keras simplifies the process of accessing the MNIST dataset through its datasets module. Here’s how you can load and inspect the data:
    import numpy as np
    import matplotlib.pyplot as plt
    from keras.datasets import mnist

    # Load MNIST handwritten digit data
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    print(X_train.shape)  # Output: (60000, 28, 28)
    print(X_test.shape)   # Output: (10000, 28, 28)
Output:
    (60000, 28, 28)
    (10000, 28, 28)
This reveals that there are 60,000 training images and 10,000 testing images, each with dimensions 28×28 pixels.
Visualizing Sample Images
To get a feel for the data, let’s visualize a sample image:
    img = X_train[250].reshape(28, 28)
    plt.imshow(img, cmap="gray")
    plt.title(f"Sample Digit Label: {y_train[250]}")
    plt.show()
Output:
A grayscale image of the handwritten digit stored at index 250, with its label shown in the title.
Data Preprocessing: Reshaping and One-Hot Encoding
Data preprocessing is a crucial step in machine learning workflows. For neural networks, it’s essential to format the data appropriately and encode the labels.
Reshaping the Data
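Dense layers expect each sample as a flat vector, so every 28×28 image must eventually become a 1D array of 784 elements. In this tutorial the Flatten layer inside the model performs that step, so here we keep the (28, 28) image shape and only convert the pixels to float32 and scale them from the 0-255 range to the 0-1 range.
    from keras.utils import to_categorical

    # Convert to float32 and normalize the image data to [0, 1]
    # (the reshape keeps the existing (N, 28, 28) shape; flattening happens in the model)
    X_train = X_train.reshape(60000, 28, 28).astype('float32') / 255
    X_test = X_test.reshape(10000, 28, 28).astype('float32') / 255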
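As a rough illustration of what normalization and flattening do to the array, here is a minimal NumPy sketch. It uses a small batch of random pixels as a stand-in for MNIST (an assumption made so the snippet runs without downloading the dataset), and the names fake_images, scaled, and flattened are illustrative only.

```python
import numpy as np

# Synthetic stand-in for a small batch of MNIST images (assumption: random
# uint8 pixels, not real digits), shaped (batch, height, width).
rng = np.random.default_rng(seed=0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Scale pixel intensities from [0, 255] to [0.0, 1.0].
scaled = fake_images.astype("float32") / 255

# Flatten each 28x28 image into a 784-element vector; this mirrors what a
# Flatten layer does to each sample inside the model.
flattened = scaled.reshape(len(scaled), -1)

print(scaled.dtype, float(scaled.min()), float(scaled.max()))
print(flattened.shape)
```

Scaling changes only the value range and flattening changes only the shape, so neither step adds or removes any pixel information.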
One-Hot Encoding the Labels
One-hot encoding transforms categorical labels into a binary matrix, which is more suitable for training.
    # One-hot encode the labels
    y_train = to_categorical(y_train, num_classes=10)
    y_test = to_categorical(y_test, num_classes=10)

    print(y_train.shape)  # Output: (60000, 10)
    print(y_test.shape)   # Output: (10000, 10)
Output:
    (60000, 10)
    (10000, 10)
This indicates that each label is now represented as a 10-dimensional binary vector.
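To see what one-hot encoding does under the hood, the following is a minimal NumPy sketch of the same idea. The one_hot helper and the sample labels are hypothetical and written for illustration; this is not the Keras implementation of to_categorical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set the column matching each label to 1 for its row.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

sample_labels = [5, 0, 4]  # hypothetical labels for illustration
encoded = one_hot(sample_labels, num_classes=10)

print(encoded.shape)           # (3, 10)
print(encoded[0])              # 1.0 at index 5, zeros elsewhere
print(encoded.argmax(axis=1))  # recovers the original labels
```

Taking the argmax of each row reverses the encoding, which is the same operation used later to turn predicted probabilities back into digit labels.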
Building the Neural Network Model with Keras
Keras offers a user-friendly API to construct and train neural networks. We’ll build a simple Sequential model comprising multiple dense layers.
Model Architecture
Here’s a high-level overview of the model structure:
- Flatten Layer: Converts the 2D image data into a 1D array.
- Dense Layer 1: 100 neurons with sigmoid activation.
- Dense Layer 2: 144 neurons with sigmoid activation.
- Output Layer: 10 neurons (one for each digit) with softmax activation.
Layers Explanation
- Flatten Layer: Transforms the input data from a 2D matrix (28×28) to a 1D vector (784) to feed into the dense layers.
- Dense Layers: These are fully connected layers where each neuron receives input from all neurons in the preceding layer. Activation functions introduce non-linearity:
- Sigmoid Activation: Outputs values between 0 and 1. It is a natural fit for binary outputs but is less common in the hidden layers of modern networks, where ReLU is usually preferred.
- Softmax Activation: Converts the final layer outputs into probability distributions over the 10 classes.
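To make the two activation functions concrete, here is a minimal NumPy sketch of both. The function names and the sample logits are illustrative assumptions, not the Keras internals.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function; outputs lie between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])  # hypothetical raw scores for 3 classes
probs = softmax(logits)

print(sigmoid(np.array([-2.0, 0.0, 2.0])))
print(probs, probs.sum())
```

Sigmoid squashes each value independently, while softmax normalizes a whole row so that its entries sum to 1 and can be read as class probabilities.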
Building the Model:
    from keras.models import Sequential
    from keras.layers import Dense, Flatten

    # Create a simple Neural Network model
    model = Sequential()
    model.add(Flatten(input_shape=(28, 28)))
    model.add(Dense(100, activation='sigmoid'))
    model.add(Dense(144, activation='sigmoid'))
    model.add(Dense(10, activation='softmax'))
Model Summary
To visualize the model structure and parameters:
    model.summary()
Output:
    Model: "sequential_1"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    flatten_1 (Flatten)          (None, 784)               0
    _________________________________________________________________
    dense_3 (Dense)              (None, 100)               78500
    _________________________________________________________________
    dense_4 (Dense)              (None, 144)               14544
    _________________________________________________________________
    dense_5 (Dense)              (None, 10)                1450
    =================================================================
    Total params: 94,494
    Trainable params: 94,494
    Non-trainable params: 0
    _________________________________________________________________
This summary provides insights into each layer’s output shape and the number of parameters to be trained.
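The parameter counts can be reproduced by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below does that arithmetic for this architecture; dense_params is a hypothetical helper written for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Flattened input, two hidden layers, output layer.
layer_sizes = [784, 100, 144, 10]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [78500, 14544, 1450]
print(total)      # 94494
```

The three layer counts add up to the 94,494 total reported by the summary, and the Flatten layer contributes nothing because it only reshapes its input.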
Compiling the Model: Loss Function and Optimizer
Before training, the model needs to be compiled with a specified loss function and optimizer.
- Loss Function: Measures how well the model’s predictions match the actual labels.
- Categorical Crossentropy: Suitable for multi-class classification problems.
- Optimizer: Updates the model’s weights to minimize the loss function.
- Adam Optimizer: An efficient stochastic gradient descent method that adapts the learning rate.
Compiling the Model:
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
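For intuition about what categorical crossentropy measures, here is a minimal NumPy sketch that computes it for two hypothetical sets of predictions. The function, the eps clipping value, and the sample arrays are assumptions for illustration and are not taken from the Keras source.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    # Clip to avoid log(0); eps is an assumed safety margin.
    y_pred = np.clip(y_pred, eps, 1.0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(per_sample.mean())

# Two one-hot labels over three classes (hypothetical values).
y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
confident = np.array([[0.05, 0.90, 0.05], [0.80, 0.10, 0.10]])
unsure = np.array([[0.30, 0.40, 0.30], [0.40, 0.30, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Both sets of predictions pick the correct class, but the confident one receives a lower loss, which is why minimizing this loss pushes probability mass toward the true label.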
Training the Model: Fitting and Validation
Training involves feeding the model with data and allowing it to learn patterns through multiple epochs.
- Epochs: Number of times the entire dataset is passed through the network.
- Validation Data: Used to evaluate the model’s performance on unseen data after each epoch.
Training Process:
    history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
Sample Output:
    Epoch 1/10
    1875/1875 [==============================] - 4s 2ms/step - loss: 0.9009 - acc: 0.7596 - val_loss: 0.3685 - val_acc: 0.8892
    ...
    Epoch 10/10
    1875/1875 [==============================] - 4s 2ms/step - loss: 0.2251 - acc: 0.9314 - val_loss: 0.2267 - val_acc: 0.9303
By the end of 10 epochs, the model reaches approximately 93% accuracy on both the training data and the validation data (here, the test set). The close match between the two figures suggests that the model is not overfitting.
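The 1875 in the progress bar is the number of batches processed per epoch. Assuming the default batch size of 32 that model.fit uses when batch_size is not passed, the figure follows from a short calculation, sketched here with a hypothetical helper.

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Number of batches per epoch; the final batch may be smaller."""
    return math.ceil(n_samples / batch_size)

default_steps = steps_per_epoch(60000)       # default batch size of 32
larger_batch_steps = steps_per_epoch(60000, 128)

print(default_steps)       # 1875
print(larger_batch_steps)  # 469
print(default_steps * 10)  # total weight updates over 10 epochs: 18750
```

A larger batch size means fewer weight updates per epoch, which usually shortens each epoch but can change how quickly the loss falls.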
Evaluating the Model: Accuracy and Predictions
After training, it’s crucial to assess the model’s performance and make predictions on new data.
Making Predictions
    # Predicting on the test set
    predictions = model.predict(X_test)
Each prediction consists of a probability distribution over the 10 classes. To determine the predicted class:
    import numpy as np

    # Convert probabilities to class labels
    predicted_classes = np.argmax(predictions, axis=1)
    true_classes = np.argmax(y_test, axis=1)
Visualizing Predictions
Let’s visualize some test images along with their predicted and true labels:
    import matplotlib.pyplot as plt

    def plot_prediction(index):
        img = X_test[index].reshape(28, 28)
        plt.imshow(img, cmap="gray")
        plt.title(f"True Label: {true_classes[index]} | Predicted: {predicted_classes[index]}")
        plt.show()

    # Plotting the first test image
    plot_prediction(0)

    # Plotting another sample
    plot_prediction(25)
Output:
Displays the image with titles indicating both the true label and the model’s prediction.
Model Accuracy
The model reached about 93% accuracy on the 10,000 test images, which also served as the validation set during training. This shows that it generalizes to digits it was not trained on. For improved performance, further tuning and more complex architectures can be explored.
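Accuracy can also be computed directly from class labels, and a confusion matrix shows which digits get mistaken for which. The sketch below is a minimal NumPy version run on small hypothetical label arrays (true_demo and pred_demo), not on the model's real predictions; the same functions could be applied to arrays of true and predicted classes.

```python
import numpy as np

def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the two label arrays agree."""
    return float(np.mean(np.asarray(true_labels) == np.asarray(predicted_labels)))

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix

# Hypothetical labels for a 3-class toy problem.
true_demo = np.array([0, 1, 2, 2, 1, 0])
pred_demo = np.array([0, 1, 2, 1, 1, 0])

acc_demo = accuracy(true_demo, pred_demo)
cm_demo = confusion_matrix(true_demo, pred_demo, num_classes=3)
print(acc_demo)
print(cm_demo)
```

Correct predictions sit on the diagonal of the matrix, so any off-diagonal entry points to a specific pair of classes the model confuses.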
Visualizing the Neural Network Structure
Understanding the architecture of a neural network can aid in comprehending how data flows and transformations occur. Below is a visual representation of the constructed neural network:
    import matplotlib.pyplot as plt

    def draw_neural_net(ax, left, right, bottom, top, layer_sizes):
        '''
        Draws a neural network diagram.

        :param ax: Matplotlib Axes object
        :param left: Left boundary
        :param right: Right boundary
        :param bottom: Bottom boundary
        :param top: Top boundary
        :param layer_sizes: List containing the number of neurons in each layer
        '''
        n_layers = len(layer_sizes)
        v_spacing = (top - bottom)/float(max(layer_sizes))
        h_spacing = (right - left)/float(n_layers - 1)

        # Draw neurons
        for n, layer_size in enumerate(layer_sizes):
            layer_top = v_spacing*(layer_size - 1)/2. + (top + bottom)/2.
            for m in range(layer_size):
                circle = plt.Circle((n*h_spacing + left, layer_top - m*v_spacing), v_spacing/4.,
                                    color='w', ec='k', zorder=4)
                ax.add_artist(circle)

        # Draw edges
        for n, (layer_size_a, layer_size_b) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
            layer_top_a = v_spacing*(layer_size_a - 1)/2. + (top + bottom)/2.
            layer_top_b = v_spacing*(layer_size_b - 1)/2. + (top + bottom)/2.
            for m in range(layer_size_a):
                for o in range(layer_size_b):
                    line = plt.Line2D([n*h_spacing + left, (n + 1)*h_spacing + left],
                                      [layer_top_a - m*v_spacing, layer_top_b - o*v_spacing], c='k')
                    ax.add_artist(line)

    # Drawing the neural network
    fig = plt.figure(figsize=(12, 12))
    ax = fig.gca()
    ax.axis('off')
    draw_neural_net(ax, .1, .9, .1, .9, [784, 100, 144, 10])
    plt.show()
Output:

This diagram illustrates the flow from input neurons (784) through hidden layers to the output neurons (10).
Optimizing Training: Utilizing GPUs
Training neural networks, especially deep ones, can be computationally intensive and time-consuming. Leveraging Graphics Processing Units (GPUs) can significantly accelerate the training process. Here’s how you can utilize GPUs with Keras:
- Ensure GPU Compatibility:
- Install NVIDIA CUDA Toolkit and cuDNN.
- Verify that your GPU is compatible with TensorFlow (the backend for Keras).
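- Install TensorFlow with GPU Support:
On current TensorFlow 2.x releases, GPU support ships in the standard package, and the separate tensorflow-gpu package is deprecated.
    pip install tensorflow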
- Configure TensorFlow to Use GPU:
TensorFlow automatically detects and utilizes GPUs. However, you can explicitly specify GPU settings if needed.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        try:
            # Restrict TensorFlow to only use the first GPU
            tf.config.set_visible_devices(gpus[0], 'GPU')
            tf.config.experimental.set_memory_growth(gpus[0], True)
        except RuntimeError as e:
            print(e)
Benefits of Using GPUs:
- Parallel Processing: GPUs can handle multiple calculations simultaneously, ideal for matrix operations in neural networks.
- Faster Training: Models train significantly quicker, allowing for more experimentation and faster iterations.
Note: Optimizing GPU usage may require additional configurations based on specific system setups.
Conclusion
Building a simple neural network for digit classification using Keras and the MNIST dataset is an excellent introduction to the world of machine learning and deep learning. By following this guide, you’ve learned how to:
- Understand and preprocess the MNIST dataset.
- Construct a neural network model with Keras.
- Train and evaluate the model’s performance.
- Visualize the network architecture.
- Optimize training using GPUs.
While the model discussed achieves respectable accuracy, there’s ample room for improvement. Exploring more complex architectures, experimenting with different activation functions, or implementing regularization techniques can lead to enhanced performance. As you continue your machine learning journey, building upon these fundamentals will empower you to tackle more intricate and impactful projects.
Happy Coding!