Using MLP for Image Classification: Implementation, Results, and Limitations
The article demonstrates how a simple fully‑connected MLP can be trained on a small 64×64×3 cat‑vs‑non‑cat dataset, achieving perfect training accuracy but only 78 % test accuracy, and explains that parameter explosion, vanishing gradients, and lack of spatial invariance limit MLPs, motivating the shift to CNNs.
This article, written by Tencent MIG backend engineer Zheng Shanyou, explores the feasibility of using a Multi‑Layer Perceptron (MLP) to perform image classification in the era before convolutional neural networks (CNN). It serves as a conceptual bridge to a forthcoming CNN tutorial.
Data preparation: Three H5 files are used – train_catvnoncat.h5 (209 images, 64×64×3), test_catvnoncat.h5 (50 images, 64×64×3) and my_cat_misu.h5 (a single cat image). Each file contains two datasets: *_set_x (image tensors) and *_set_y (binary labels, 1 for cat, 0 otherwise).
Model architecture: The network is fully connected with the following layer dimensions:

layers_dims = [12288, 20, 7, 5, 1]

where 12288 = 64 × 64 × 3 is the flattened input size. The hidden layers contain 20, 7 and 5 neurons respectively, and the output layer has a single sigmoid neuron.
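The article does not show the forward-pass code itself, but the architecture it describes (ReLU-style hidden layers feeding a single sigmoid output) can be sketched roughly as follows. The initialization scale, `rng`, and function names here are illustrative assumptions, not the author's implementation:

```python
import numpy as np

layers_dims = [12288, 20, 7, 5, 1]  # flattened 64x64x3 input -> 1 sigmoid unit

rng = np.random.default_rng(0)
params = {}
for l in range(1, len(layers_dims)):
    # Small random weights and zero biases; the article's exact init is not shown
    params[f"W{l}"] = rng.standard_normal((layers_dims[l], layers_dims[l - 1])) * 0.01
    params[f"b{l}"] = np.zeros((layers_dims[l], 1))

def forward(X, params, L=len(layers_dims) - 1):
    """ReLU hidden layers, sigmoid output: P(cat) for each column of X."""
    A = X
    for l in range(1, L):
        A = np.maximum(0, params[f"W{l}"] @ A + params[f"b{l}"])  # ReLU
    Z = params[f"W{L}"] @ A + params[f"b{L}"]
    return 1.0 / (1.0 + np.exp(-Z))  # sigmoid

X = rng.random((12288, 4))  # four flattened 64x64x3 images, columns = examples
probs = forward(X, params)
print(probs.shape)  # (1, 4)
```

Each column of `X` is one flattened image, so a batch of m images is a 12288 × m matrix and the output is a 1 × m row of cat probabilities.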
Training configuration:
Number of iterations: 10,000
Learning rate: 0.0075
The model is trained on the 209 training images, then evaluated on the 50‑image test set and on the single “my cat” image.
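Before training, the (209, 64, 64, 3) and (50, 64, 64, 3) image tensors must be flattened into 12288-dimensional columns and scaled to [0, 1]. A minimal sketch of that preprocessing, with stand-in zero arrays and assumed variable names in place of the real H5 data:

```python
import numpy as np

# Stand-ins with the shapes the article states for train_set_x / test_set_x
train_x_orig = np.zeros((209, 64, 64, 3), dtype=np.uint8)
test_x_orig = np.zeros((50, 64, 64, 3), dtype=np.uint8)

# Flatten each image to a 12288-vector (columns = examples) and normalize pixels
train_x = train_x_orig.reshape(train_x_orig.shape[0], -1).T / 255.0
test_x = test_x_orig.reshape(test_x_orig.shape[0], -1).T / 255.0

print(train_x.shape, test_x.shape)  # (12288, 209) (12288, 50)
```

The transpose puts examples in columns, matching the 12288-wide input layer of layers_dims.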
Results:
Training set accuracy: 100 % (the model perfectly memorizes the training data).
Test set accuracy: 78 % (significant over‑fitting).
Single‑image inference correctly identifies the cat.
Visualization code displays correctly classified images normally, dimmed images for true negatives, and red‑tinted images for misclassifications. Accuracy is shown as a title on each figure.
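The article's plotting code is not reproduced here, but the per-image styling it describes (unchanged for correct cats, dimmed for true negatives, red-tinted for misclassifications) might look like this sketch; the function name, the 0.4 dim factor, and the 0.5 red boost are all assumptions:

```python
import numpy as np

def style_image(img, label, pred):
    """Return a display copy of an RGB image: unchanged if a correctly
    classified cat, dimmed if a correctly rejected non-cat, red-tinted
    if misclassified. Factors here are illustrative, not the article's."""
    out = img.astype(float) / 255.0
    if pred != label:
        out[..., 0] = np.clip(out[..., 0] + 0.5, 0.0, 1.0)  # boost red channel
    elif label == 0:
        out *= 0.4  # dim true negatives
    return out

img = np.full((64, 64, 3), 128, dtype=np.uint8)  # uniform mid-gray test image
dimmed = style_image(img, label=0, pred=0)
tinted = style_image(img, label=1, pred=0)
```

The styled arrays can then be handed to matplotlib's imshow, with the accuracy string passed as the figure title as the article describes.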
Analysis of MLP limitations:
Parameter explosion: a fully connected layer taking a 1000 × 1000 RGB image (3 × 10⁶ inputs) into a hidden layer of comparable size would require on the order of 10¹² weights, making training computationally infeasible.
Gradient vanishing in deep fully‑connected networks hampers training.
Lack of spatial invariance: an MLP cannot handle small translations, rotations, or scalings of images without explicit data augmentation.
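The back-of-the-envelope arithmetic behind the parameter-explosion point, assuming (for illustration) a hidden layer of one million neurons:

```python
# A single dense layer from a 1000x1000 RGB image to a large hidden layer
inputs = 1000 * 1000 * 3   # flattened pixels: 3,000,000
hidden = 1_000_000         # assumed hidden-layer width, for illustration
weights = inputs * hidden  # weight count of just this one layer
print(f"{weights:.0e}")    # 3e+12
```

Three trillion weights in one layer, before any biases or deeper layers, is far beyond what can be stored or trained practically, which is exactly the cost that convolutional weight sharing avoids.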
These drawbacks motivate the transition to convolutional neural networks (CNN) and more advanced architectures, which address both computational efficiency and spatial invariance.
Code excerpt (saving images to H5):
import h5py
import matplotlib.pyplot as plt
import numpy as np

def save_imgs_to_h5file(h5_fname, x_label, y_label, img_paths_list, img_label_list):
    # Pre-allocate integer arrays for the 64x64x3 images and binary labels
    data_imgs = np.zeros((len(img_paths_list), 64, 64, 3), dtype='int')
    label_imgs = np.zeros((len(img_paths_list), 1), dtype='int')
    for i in range(len(img_paths_list)):
        data_imgs[i] = np.array(plt.imread(img_paths_list[i]))
        label_imgs[i] = np.array(img_label_list[i])
    # Write both datasets into a single H5 file
    f = h5py.File(h5_fname, 'w')
    f.create_dataset(x_label, data=data_imgs)
    f.create_dataset(y_label, data=label_imgs)
    f.close()
    return data_imgs, label_imgs

The article concludes that solving the listed problems (parameter explosion, feature extraction without enlarging the dataset, and robustness to geometric transformations) requires CNNs, setting the stage for the next tutorial.