Captcha Generation and Recognition Using a Convolutional Neural Network – Project Overview and Implementation
This article presents a complete Python implementation that generates captcha images, loads and preprocesses the data, defines a three‑layer convolutional neural network, and trains and evaluates the model, with TensorBoard used for visualization. The model reaches over 99% training accuracy and 93% test accuracy.
The project code is divided into three parts: Generate_Captcha (generates training, validation, and test captcha images and reads image data and labels), cnn_model (the convolutional neural network), and driver (model training and evaluation).
1. Configuration
<code>class Config(object):
    width = 160  # captcha image width
    height = 60  # captcha image height
    char_num = 4  # number of characters per captcha
    characters = range(10)  # digits 0-9
    test_folder = 'test'  # test set folder (likewise for the two below)
    train_folder = 'train'
    validation_folder = 'validation'
    tensorboard_folder = 'tensorboard'  # TensorBoard log path
    generate_num = (5000, 500, 500)  # number of training, validation, and test images
    alpha = 1e-3  # learning rate
    Epoch = 100  # number of training epochs
    batch_size = 64  # batch size
    keep_prob = 0.5  # dropout keep probability
    print_per_batch = 20  # print results every N batches
    save_per_batch = 20  # write to TensorBoard every N batches</code>
2. Generate Captcha (class Generate)
The Generate class provides check_path(), which ensures the required folders exist, and gen_captcha(), which creates captcha images and stores them in the appropriate directories.
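The folder check and label selection could look roughly like the sketch below. It is not the article's code: the function bodies, the `random_label` helper, and the default arguments are assumptions made for illustration, and the image rendering step is left out because the article does not show which imaging library it uses.

```python
import os
import random


def check_path(folders):
    """Create each folder if it does not already exist."""
    for folder in folders:
        # exist_ok=True makes repeated calls harmless
        os.makedirs(folder, exist_ok=True)


def random_label(char_num=4, characters=range(10)):
    """Return a random label string, e.g. four digits (hypothetical helper)."""
    choices = list(characters)
    return ''.join(str(random.choice(choices)) for _ in range(char_num))
```

Because the labels are later read back from the file names, a generator along these lines would typically save each image under a name derived from its label.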
Example captcha image:
3. Data Loading (class ReadData)
read_data() returns a numpy.array of images and their labels, which are taken from the file names. label2vec() converts a label string into a vector that concatenates one 10‑element one‑hot block per character. Example conversion:
<code>label = '1327'
label_vec = [0,1,0,0,0,0,0,0,0,0,
0,0,0,1,0,0,0,0,0,0,
0,0,1,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,1,0,0]</code>
load_data() loads all images from a folder and returns the image array, labels, and count.
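A minimal sketch of the conversion shown in the example, assuming digit-only labels and a plain Python list as output (the article's version may return a numpy array instead; the parameter names here are illustrative):

```python
def label2vec(label, char_num=4, num_classes=10):
    """Encode a digit string as char_num concatenated one-hot blocks."""
    if len(label) != char_num:
        raise ValueError('label must contain exactly char_num characters')
    vec = [0] * (char_num * num_classes)
    for position, char in enumerate(label):
        # Block `position` covers indices [position*10, position*10 + 9]
        vec[position * num_classes + int(char)] = 1
    return vec
```

For the label '1327' this sets indices 1, 13, 22, and 37, which matches the vector printed above.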
4. Model Definition (cnn_model)
The network uses three convolutional layers, each with a filter size of 5. To mitigate over‑fitting, a dropout layer follows every convolution. The final layers reshape the feature maps into a matrix suitable for classification.
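To illustrate the final reshape step, the sketch below splits a flat output of 40 scores into a 4 x 10 matrix and picks the highest score in each row. This is only an assumption about how the output is interpreted: the helper names, the use of plain lists instead of tensors, and the per-character accuracy measure are all illustrative and are not taken from the article's cnn_model.

```python
def reshape_scores(scores, char_num=4, num_classes=10):
    """Split a flat score list into char_num rows of num_classes scores."""
    if len(scores) != char_num * num_classes:
        raise ValueError('expected char_num * num_classes scores')
    return [scores[i * num_classes:(i + 1) * num_classes]
            for i in range(char_num)]


def predict_chars(scores, char_num=4, num_classes=10):
    """Return the index of the highest score in each row (an argmax per character)."""
    rows = reshape_scores(scores, char_num, num_classes)
    return [row.index(max(row)) for row in rows]


def char_accuracy(predicted, expected):
    """Fraction of character positions where the prediction matches."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)
```

In a TensorFlow model the same idea would normally be expressed with a tensor reshape followed by an argmax over the last axis.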
Model architecture illustration:
5. Training & Evaluation
next_batch() provides an iterator that yields data in batches. feed_data() feeds a batch to the model. The inputs are:
x : image array
y : image labels
keep_prob : dropout keep probability
evaluate() assesses the model on validation and test sets. run_model() orchestrates the full training‑and‑evaluation loop.
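A batch iterator and feed helper in the spirit of next_batch() and feed_data() might be sketched as follows. The signatures, the `seed` parameter, and the use of string keys are assumptions: in the actual TensorFlow code the dictionary keys would be the model's placeholders rather than strings.

```python
import random


def next_batch(images, labels, batch_size=64, shuffle=True, seed=None):
    """Yield (image_batch, label_batch) pairs covering the data once."""
    indices = list(range(len(images)))
    if shuffle:
        # A dedicated Random instance keeps the shuffle reproducible per seed
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chosen = indices[start:start + batch_size]
        yield [images[i] for i in chosen], [labels[i] for i in chosen]


def feed_data(x_batch, y_batch, keep_prob):
    """Bundle one batch with the dropout keep probability (stand-in for a feed dict)."""
    return {'x': x_batch, 'y': y_batch, 'keep_prob': keep_prob}
```

During evaluation, keep_prob would usually be set to 1.0 so that dropout is disabled, while training uses the configured value of 0.5.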
6. Current Results
After about 4,000 training iterations the model reaches >99% accuracy on the training set and ~93% on the test set, though slight over‑fitting is observed. Training runs on a CPU and takes roughly four hours per full run.
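The log for this run ends with an automatic stop after more than 1000 steps without improvement on the validation set. A minimal sketch of such a patience check is shown below; the `EarlyStopper` class, its method names, and the choice of validation accuracy as the tracked metric are assumptions, not the article's driver code.

```python
class EarlyStopper:
    """Track the best validation accuracy and signal when to stop."""

    def __init__(self, patience_steps=1000):
        self.patience_steps = patience_steps
        self.best_acc = 0.0
        self.last_improved_step = 0

    def update(self, step, val_acc):
        """Record a validation result; return True if it is a new best."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.last_improved_step = step
            return True
        return False

    def should_stop(self, step):
        """True once more than patience_steps have passed without a new best."""
        return step - self.last_improved_step > self.patience_steps
```

A training loop could call update() at every evaluation step, print an asterisk when it returns True, and break out of the loop when should_stop() returns True.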
<code>Images for train :10000, for validation : 1000, for test : 1000
Epoch : 1
Step 0, train_acc: 7.42%, train_loss: 1.43, val_acc: 9.85%, val_loss: 1.40, improved:*
Step 20, train_acc: 12.50%, train_loss: 0.46, val_acc: 10.35%, val_loss: 0.46, improved:*
... (omitted intermediate steps) ...
Epoch : 51
Step 7860, train_acc: 100.00%, train_loss: 0.01, val_acc: 92.37%, val_loss: 0.08, improved:
Step 7880, train_acc: 99.61%, train_loss: 0.01, val_acc: 92.28%, val_loss: 0.08, improved:
Step 7900, train_acc: 100.00%, train_loss: 0.01, val_acc: 92.42%, val_loss: 0.08, improved:
Step 7920, train_acc: 100.00%, train_loss: 0.00, val_acc: 92.83%, val_loss: 0.08, improved:
No improvement for over 1000 steps, auto-stopping....
Test accuracy: 93.00%, loss: 0.08</code>
7. TensorBoard
Before each training run the TensorBoard log directory is cleared to keep the visualizations tidy. Accuracy and loss curves are displayed as follows: