
Captcha Generation and Recognition Using a Convolutional Neural Network – Project Overview and Implementation

This article presents a complete Python implementation for generating captcha images, loading and preprocessing data, defining a three‑layer convolutional neural network, and training and evaluating the model with TensorBoard, achieving over 99% training accuracy and 93% test accuracy.

Python Programming Learning Circle

Mar 25, 2020

The project code is divided into three parts: Generate_Captcha (generates training, validation, and test captcha images and reads image data and labels), cnn_model (the convolutional neural network), and driver (model training and evaluation).

1. Configuration

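class Config(object):
    width = 160  # captcha image width in pixels
    height = 60  # captcha image height in pixels
    char_num = 4  # number of characters per captcha
    characters = range(10)  # digits 0 to 9
    test_folder = 'test'    # test-set folder (the next two follow the same pattern)
    train_folder = 'train'
    validation_folder = 'validation'
    tensorboard_folder = 'tensorboard'  # TensorBoard log directory
    generate_num = (5000, 500, 500)  # number of training, validation, and test images
    alpha = 1e-3  # learning rate
    Epoch = 100  # number of training epochs
    batch_size = 64     # batch size
    keep_prob = 0.5     # dropout keep probability
    print_per_batch = 20    # print metrics every N batches
    save_per_batch = 20     # write to TensorBoard every N batches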

2. Generate Captcha (class Generate)

The Generate class provides check_path(), which makes sure the required folders exist, and gen_captcha(), which creates the captcha images and stores them in the appropriate directories.
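The two steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: random_label() and the render hook are hypothetical names, and the sketch writes empty placeholder files so that it runs without an imaging library.

```python
import os
import random
import tempfile


def check_path(folders):
    """Create each folder if it does not exist yet."""
    for folder in folders:
        os.makedirs(folder, exist_ok=True)


def random_label(char_num=4, characters=range(10)):
    """Draw char_num characters at random and join them into a label string."""
    pool = [str(c) for c in characters]
    return ''.join(random.choice(pool) for _ in range(char_num))


def gen_captcha(folder, count, render=None, char_num=4):
    """Create up to `count` captcha files in `folder`, named after their label.

    `render(label, path)` is a hypothetical hook that draws the image.
    When it is None, an empty placeholder file is written instead.
    Repeated labels overwrite each other, so fewer files may result.
    """
    check_path([folder])
    labels = set()
    for _ in range(count):
        label = random_label(char_num)
        path = os.path.join(folder, label + '.png')
        if render is None:
            with open(path, 'wb'):
                pass  # placeholder only, no image data
        else:
            render(label, path)
        labels.add(label)
    return sorted(labels)


# Demo: write a few placeholder files into a temporary directory.
demo_folder = os.path.join(tempfile.mkdtemp(), 'train')
demo_labels = gen_captcha(demo_folder, count=20)
print(len(demo_labels), 'files written to', demo_folder)
```

To produce real images, the render hook could wrap an image generator, for example the third-party captcha package (an assumption, the article does not name the library it uses).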

[Figure: example captcha image]

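3. Data Loading (class ReadData)

read_data() returns the image data as a numpy.array together with the labels, which are taken from the file names. label2vec() converts a label string into a vector of concatenated one-hot encodings, one 10-element block per character. Example conversion: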

label = '1327'
label_vec = [0,1,0,0,0,0,0,0,0,0,
            0,0,0,1,0,0,0,0,0,0,
            0,0,1,0,0,0,0,0,0,0,
            0,0,0,0,0,0,0,1,0,0]
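The conversion above can be reproduced with a few lines of Python. The sketch below uses plain lists instead of numpy arrays to stay dependency-free, and vec2label() is a hypothetical inverse helper that is not mentioned in the project description.

```python
def label2vec(label, characters='0123456789'):
    """Concatenate one one-hot block per character of `label`."""
    size = len(characters)
    vec = [0] * (len(label) * size)
    for position, char in enumerate(label):
        vec[position * size + characters.index(char)] = 1
    return vec


def vec2label(vec, characters='0123456789'):
    """Hypothetical inverse: take the arg-max of every block."""
    size = len(characters)
    blocks = [vec[i:i + size] for i in range(0, len(vec), size)]
    return ''.join(characters[block.index(max(block))] for block in blocks)


vec = label2vec('1327')
print([i for i, v in enumerate(vec) if v])  # [1, 13, 22, 37]
print(vec2label(vec))                       # 1327
```

The same arg-max per block can be applied to a model's 40 output scores to turn a prediction back into a four-digit string.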

load_data() loads all images from a folder and returns the image array, the labels, and the image count.
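A simplified sketch of the folder scan is shown below. To avoid depending on an imaging library it returns file paths rather than decoded image arrays. The function name load_data_sketch and the .png extension are assumptions.

```python
import os
import tempfile


def load_data_sketch(folder, extension='.png'):
    """Return (paths, labels, count) for every captcha file in `folder`.

    The label is the file name without its extension.
    """
    names = sorted(n for n in os.listdir(folder) if n.endswith(extension))
    paths = [os.path.join(folder, n) for n in names]
    labels = [os.path.splitext(n)[0] for n in names]
    return paths, labels, len(paths)


# Demo with two placeholder captcha files and one unrelated file.
demo_dir = tempfile.mkdtemp()
for name in ('1327.png', '0042.png', 'notes.txt'):
    with open(os.path.join(demo_dir, name), 'wb'):
        pass
paths, labels, count = load_data_sketch(demo_dir)
print(labels, count)  # ['0042', '1327'] 2
```

In the full pipeline each path would be decoded into an array and each label passed through label2vec().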

4. Model Definition (cnn_model)

The network uses three convolutional layers, each with a filter size of 5. To mitigate over-fitting, a dropout layer follows every convolutional layer. The final layers reshape the feature maps into a matrix suitable for classification against the 40-element label vector (4 characters × 10 digits).
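To see how the reshape size comes about, the sketch below traces the feature-map shapes through three convolution stages. Several details are assumptions that the article does not state: 'SAME' padding (so the 5×5 filters do not shrink the maps), a 2×2 max-pool with stride 2 after each convolution, and filter counts of 32, 64, and 64.

```python
import math


def conv_pool_shapes(height, width, filters, pool=2):
    """Trace (height, width, channels) after each conv + max-pool stage.

    With 'SAME' padding the convolution keeps height and width, so only
    the pooling step (assumed here) halves them, rounding up.
    """
    shapes = []
    for out_channels in filters:
        height = math.ceil(height / pool)
        width = math.ceil(width / pool)
        shapes.append((height, width, out_channels))
    return shapes


# 60 x 160 input as in Config; the filter counts are hypothetical.
shapes = conv_pool_shapes(60, 160, filters=(32, 64, 64))
h, w, c = shapes[-1]
flat_size = h * w * c       # length of the flattened feature vector
output_units = 4 * 10       # char_num * number of digit classes
print(shapes)               # [(30, 80, 32), (15, 40, 64), (8, 20, 64)]
print(flat_size, output_units)  # 10240 40
```

Under these assumptions the last feature map is flattened into 10,240 values before the dense layers map it to the 40 outputs.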

[Figure: model architecture]

5. Training & Evaluation

next_batch() provides an iterator that yields data in batches.

feed_data() feeds a batch to the model. The inputs are:

x: image array
y: image labels
keep_prob: dropout keep probability

evaluate() assesses the model on the validation and test sets.

run_model() orchestrates the full training-and-evaluation loop.
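A batch iterator of the kind described for next_batch() might look like the sketch below. It works on plain Python sequences. The shuffling and the seed parameter are assumptions, since the article does not say whether the data is shuffled.

```python
import random


def next_batch(x, y, batch_size=64, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) pairs that cover the data set once."""
    indices = list(range(len(x)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chosen = indices[start:start + batch_size]
        yield [x[i] for i in chosen], [y[i] for i in chosen]


# Demo with stand-in data: 10,000 samples, batch size 64.
x = list(range(10000))
y = [value % 10 for value in x]
batches = list(next_batch(x, y, batch_size=64, seed=0))
print(len(batches), len(batches[-1][0]))  # 157 16
```

With 10,000 training images and a batch size of 64, one epoch is 157 steps, which agrees with the training log, where epoch 51 begins near step 7,850.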

6. Current Results

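After about 4,000 training iterations the model reaches over 99% accuracy on the training set and about 93% on the test set. The gap between the two indicates slight over-fitting. Training runs on a CPU and takes roughly four hours per full run. The log below comes from a run with 10,000 training, 1,000 validation, and 1,000 test images, twice the generate_num values shown in the configuration.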

Images for train ：10000, for validation : 1000, for test : 1000
Epoch : 1
Step     0, train_acc:   7.42%, train_loss:  1.43, val_acc:   9.85%, val_loss:  1.40, improved:*
Step    20, train_acc:  12.50%, train_loss:  0.46, val_acc:  10.35%, val_loss:  0.46, improved:*
... (omitted intermediate steps) ...
Epoch : 51
Step  7860, train_acc: 100.00%, train_loss:  0.01, val_acc:  92.37%, val_loss:  0.08, improved: 
Step  7880, train_acc:  99.61%, train_loss:  0.01, val_acc:  92.28%, val_loss:  0.08, improved: 
Step  7900, train_acc: 100.00%, train_loss:  0.01, val_acc:  92.42%, val_loss:  0.08, improved: 
Step  7920, train_acc: 100.00%, train_loss:  0.00, val_acc:  92.83%, val_loss:  0.08, improved: 
No improvement for over 1000 steps, auto-stopping....
Test accuracy:  93.00%, loss:  0.08
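The "improved" marker and the automatic stop in the log can be modeled with a small tracker like the one below. It is a sketch under assumptions: the class name is hypothetical, and it treats validation accuracy as the tracked metric, although the project may track something else. The patience of 1,000 steps is taken from the log message.

```python
class EarlyStopping:
    """Track the best validation accuracy and signal when to stop."""

    def __init__(self, patience=1000):
        self.patience = patience      # allowed steps without improvement
        self.best_acc = 0.0
        self.last_improved = 0

    def update(self, step, val_acc):
        """Record a validation result; return '*' if it is a new best."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.last_improved = step
            return '*'
        return ''

    def should_stop(self, step):
        """True once more than `patience` steps passed without a new best."""
        return step - self.last_improved > self.patience


stopper = EarlyStopping(patience=1000)
marks = [stopper.update(step, acc)
         for step, acc in [(0, 0.0985), (20, 0.1035), (40, 0.1000)]]
print(marks)                      # ['*', '*', '']
print(stopper.should_stop(1020))  # False
print(stopper.should_stop(1040))  # True
```

In a training loop, update() would be called whenever validation metrics are computed, and the loop would break as soon as should_stop() returns True.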

7. TensorBoard

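Before each training run, the TensorBoard log directory is cleared so that curves from earlier runs do not clutter the visualization. TensorBoard then shows the accuracy and loss curves for the current run.

[Figure: accuracy and loss curves in TensorBoard]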
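Clearing the log directory needs only the standard library. The helper below is a minimal sketch, and reset_log_dir is a hypothetical name. In the project the path would be Config.tensorboard_folder.

```python
import os
import shutil
import tempfile


def reset_log_dir(path):
    """Delete `path` with everything in it, then recreate it empty."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path)


# Demo: a directory holding a stale event file is emptied.
log_dir = os.path.join(tempfile.mkdtemp(), 'tensorboard')
os.makedirs(log_dir)
with open(os.path.join(log_dir, 'events.old'), 'wb'):
    pass
reset_log_dir(log_dir)
print(os.listdir(log_dir))  # []
```

Because rmtree deletes everything below the path, it is worth making sure the configured folder holds nothing but logs.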
