Artificial Intelligence · 15 min read

Using MLP for Image Classification: Implementation, Results, and Limitations

This article demonstrates how a simple fully‑connected MLP trained on a small 64×64×3 cat‑vs‑non‑cat dataset reaches perfect training accuracy but only 78 % test accuracy, and explains how parameter explosion, vanishing gradients, and the lack of spatial invariance limit MLPs, motivating the shift to CNNs.


This article, written by Tencent MIG backend engineer Zheng Shanyou, explores the feasibility of using a Multi‑Layer Perceptron (MLP) to perform image classification in the era before convolutional neural networks (CNN). It serves as a conceptual bridge to a forthcoming CNN tutorial.

Data preparation: Three H5 files are used: train_catvnoncat.h5 (209 training images, 64×64×3), test_catvnoncat.h5 (50 test images, 64×64×3), and my_cat_misu.h5 (a single cat image). Each file contains two datasets: *_set_x (image tensors) and *_set_y (binary labels, 1 for cat, 0 otherwise).
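Loading these files back with h5py is straightforward. The sketch below reads one dataset pair and reshapes each image into the flattened column vector the MLP expects; the flattening and 1/255 scaling are assumptions about the article's preprocessing, which the excerpt does not show:

```python
import h5py
import numpy as np

def load_catvnoncat(h5_fname, x_label, y_label):
    # Read the image tensor and label datasets from the H5 file.
    with h5py.File(h5_fname, "r") as f:
        x = np.array(f[x_label])  # shape: (m, 64, 64, 3), uint8 pixels
        y = np.array(f[y_label])  # binary labels: 1 = cat, 0 = non-cat
    # Flatten each image to a 12288-dim column and scale pixels to [0, 1].
    x_flat = x.reshape(x.shape[0], -1).T / 255.0  # shape: (12288, m)
    return x_flat, y.reshape(1, -1)
```

The transpose puts examples in columns, the common convention when weight matrices multiply activations from the left.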

Model architecture: The network is fully connected with the following layer dimensions:

layers_dims = [12288, 20, 7, 5, 1]

where 12288 = 64 × 64 × 3 is the flattened input size. The hidden layers contain 20, 7 and 5 neurons respectively, and the output layer has a single sigmoid neuron.
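These layer dimensions translate directly into weight and bias shapes: layer l holds a weight matrix of shape (layers_dims[l], layers_dims[l-1]) and a bias column of shape (layers_dims[l], 1). A minimal initialization sketch (the scaling scheme is an assumption; the article does not show its own initialization code):

```python
import numpy as np

def init_params(layers_dims, seed=1):
    # One weight matrix and bias vector per fully connected layer,
    # scaled by 1/sqrt(fan-in) to keep early activations well-behaved.
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layers_dims)):
        params[f"W{l}"] = (rng.standard_normal((layers_dims[l], layers_dims[l - 1]))
                           / np.sqrt(layers_dims[l - 1]))
        params[f"b{l}"] = np.zeros((layers_dims[l], 1))
    return params

params = init_params([12288, 20, 7, 5, 1])
```

Counting every entry gives 12288·20 + 20 + 20·7 + 7 + 7·5 + 5 + 5·1 + 1 = 245,973 parameters, almost all of them in the first layer.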

Training configuration:

Number of iterations: 10,000

Learning rate: 0.0075

The model is trained on the 209 training images, then evaluated on the 50‑image test set and on the single “my cat” image.
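Put together, training amounts to plain batch gradient descent with ReLU hidden layers and a sigmoid output unit. The self-contained sketch below follows that setup under stated assumptions; the article's own training code is not reproduced here, so the function names and initialization details are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params, L):
    # ReLU hidden layers, sigmoid output; cache values for backprop.
    cache = {"A0": X}
    A = X
    for l in range(1, L + 1):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]
        A = sigmoid(Z) if l == L else relu(Z)
        cache[f"Z{l}"], cache[f"A{l}"] = Z, A
    return A, cache

def train(X, Y, layers_dims, num_iterations=10000, learning_rate=0.0075, seed=1):
    rng = np.random.default_rng(seed)
    L = len(layers_dims) - 1
    params = {}
    for l in range(1, L + 1):
        params[f"W{l}"] = (rng.standard_normal((layers_dims[l], layers_dims[l - 1]))
                           / np.sqrt(layers_dims[l - 1]))
        params[f"b{l}"] = np.zeros((layers_dims[l], 1))
    m = X.shape[1]
    for _ in range(num_iterations):
        AL, cache = forward(X, params, L)
        # With binary cross-entropy, the output-layer error is simply AL - Y.
        dZ = AL - Y
        grads = {}
        for l in range(L, 0, -1):
            grads[f"dW{l}"] = dZ @ cache[f"A{l-1}"].T / m
            grads[f"db{l}"] = dZ.sum(axis=1, keepdims=True) / m
            if l > 1:
                # Propagate through the ReLU of the previous layer.
                dZ = (params[f"W{l}"].T @ dZ) * (cache[f"Z{l-1}"] > 0)
        for l in range(1, L + 1):
            params[f"W{l}"] -= learning_rate * grads[f"dW{l}"]
            params[f"b{l}"] -= learning_rate * grads[f"db{l}"]
    return params
```

With the article's settings this would be called as `train(X, Y, [12288, 20, 7, 5, 1])`, matching the 10,000 iterations and 0.0075 learning rate above.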

Results:

Training set accuracy: 100 % (the model perfectly memorizes the training data).

Test set accuracy: 78 % (significant over‑fitting).

Single‑image inference correctly identifies the cat.

Visualization code displays correctly classified images normally, dimmed images for true negatives, and red‑tinted images for misclassifications. Accuracy is shown as a title on each figure.
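One possible shape for that visualization helper, assuming matplotlib; the dimming and tinting factors here are illustrative, not the article's exact values:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt

def show_predictions(images, labels, preds, accuracy, cols=10):
    # images: (m, 64, 64, 3) uint8; labels/preds: (m,) binary arrays.
    m = len(images)
    rows = int(np.ceil(m / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for i, ax in enumerate(np.ravel(axes)):
        ax.axis("off")
        if i >= m:
            continue
        img = images[i].astype(float) / 255.0
        if preds[i] != labels[i]:
            # Red-tint misclassified images.
            img[..., 0] = np.clip(img[..., 0] + 0.5, 0, 1)
        elif labels[i] == 0:
            # Dim correctly rejected non-cat images.
            img *= 0.4
        ax.imshow(img)
    fig.suptitle(f"Accuracy: {accuracy:.0%}")
    return fig
```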

Analysis of MLP limitations:

Parameter explosion: fully connecting a 1000×1000‑pixel image (10⁶ input units) to a hidden layer of comparable size requires on the order of 10¹² weights, making computation infeasible.

Gradient vanishing in deep fully‑connected networks hampers training.

Lack of spatial invariance: an MLP cannot generalize across small translations, rotations, or scalings of an image without explicit data augmentation.
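The parameter-explosion point is easy to verify with a little arithmetic. The 10⁶-neuron hidden layer below is the standard illustration, not a specific architecture from the article:

```python
def fc_param_count(layers_dims):
    # Weights plus biases for a chain of fully connected layers.
    return sum(layers_dims[l] * layers_dims[l - 1] + layers_dims[l]
               for l in range(1, len(layers_dims)))

# The article's small 64x64x3 network stays manageable (~246k parameters):
small = fc_param_count([64 * 64 * 3, 20, 7, 5, 1])

# A single hidden layer of 10**6 neurons on a 1000x1000x3 image explodes
# to roughly 3x10**12 weights:
big = fc_param_count([1000 * 1000 * 3, 10**6])
```

A CNN sidesteps this because its convolution kernels are shared across all spatial positions, so parameter count no longer scales with image area.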

These drawbacks motivate the transition to convolutional neural networks (CNN) and more advanced architectures, which address both computational efficiency and spatial invariance.

Code excerpt (saving images to H5):

import h5py
import numpy as np
import matplotlib.pyplot as plt

def save_imgs_to_h5file(h5_fname, x_label, y_label, img_paths_list, img_label_list):
    # Pre-allocate arrays for the image tensors and their binary labels.
    data_imgs = np.zeros((len(img_paths_list), 64, 64, 3), dtype='uint8')
    label_imgs = np.zeros((len(img_paths_list), 1), dtype='int')
    for i in range(len(img_paths_list)):
        # plt.imread returns a uint8 array for JPEG inputs.
        data_imgs[i] = np.array(plt.imread(img_paths_list[i]))
        label_imgs[i] = np.array(img_label_list[i])
    # Write both datasets to a single H5 file.
    with h5py.File(h5_fname, 'w') as f:
        f.create_dataset(x_label, data=data_imgs)
        f.create_dataset(y_label, data=label_imgs)
    return data_imgs, label_imgs

The article concludes that solving the listed problems—parameter explosion, feature extraction without enlarging the dataset, and robustness to geometric transformations—requires CNNs, setting the stage for the next tutorial.

Tags: image classification, Python, deep learning, neural networks, model evaluation, MLP, h5py
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
