Automatic PDF Slide Transcription Using Deep Learning OCR

This article demonstrates how to automatically convert PDF slide decks into editable markdown text by converting each page to an image and then applying a deep-learning OCR pipeline (CTPN for text detection and CRNN for text recognition), with Python code examples.

Python Programming Learning Circle

Jul 3, 2021

Many people need to turn PDF slides into editable text, but traditional tools are cumbersome. This article describes a project by Lucas Soares, a senior machine learning engineer at K1 Digital, who uses OCR to automatically transcribe PDF slides.

The workflow consists of three steps: converting each PDF page to an image, detecting and recognizing text in the images, and displaying example outputs.

First, the PDF is converted to PNG images using the pdf2image library:

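from pathlib import Path

# pdf2image wraps Poppler, which must be installed separately.
from pdf2image import convert_from_path

pdf_path = "path/to/file/intro_RL_Lecture1.pdf"

# Save the pages where the OCR step looks for them (./input_images/).
input_dir = Path("./input_images")
input_dir.mkdir(exist_ok=True)

# convert_from_path returns one PIL image per PDF page.
images = convert_from_path(pdf_path)
for i, image in enumerate(images):
    fname = input_dir / f"image{i}.png"
    image.save(fname, "PNG")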
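
OCR quality tends to depend on how many pixels each character receives, so it can help to estimate the rendered image size before converting. The sketch below is illustrative and not part of the original project: `page_pixels` is a hypothetical helper, and it assumes the page size is known in PDF points (72 points per inch). `convert_from_path` accepts a `dpi` argument whose default is 200.

```python
def page_pixels(width_pt: float, height_pt: float, dpi: int = 200) -> tuple:
    """Estimate the rendered size, in pixels, of a PDF page.

    PDF page sizes are expressed in points, with 72 points per inch.
    """
    width_px = round(width_pt / 72 * dpi)
    height_px = round(height_pt / 72 * dpi)
    return width_px, height_px


# A 10 in x 7.5 in slide (720 x 540 points), assumed here for illustration.
print(page_pixels(720, 540))           # (2000, 1500)
print(page_pixels(720, 540, dpi=300))  # (3000, 2250)
```

If small text on the slides is recognized poorly, passing a higher value such as `convert_from_path(pdf_path, dpi=300)` is one thing to try, at the cost of larger images and slower processing.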

Next, the OCR pipeline from the ocr.pytorch repository is used. The CTPN (Connectionist Text Proposal Network) model detects text regions, and the CRNN (Convolutional Recurrent Neural Network) model recognizes the characters in each region. The script processes each image, saves a copy annotated with the detected boxes, and writes the recognized text to a .txt file.

# adapted from this source: https://github.com/courao/ocr.pytorch
# The two % lines are IPython magics and only work inside a Jupyter notebook.
%load_ext autoreload
%autoreload 2
import os
import pathlib
import shutil
from glob import glob

import numpy as np
from PIL import Image

from ocr import ocr  # OCR entry point from the ocr.pytorch repository

def single_pic_proc(image_file):
    image = np.array(Image.open(image_file).convert('RGB'))
    result, image_framed = ocr(image)
    return result, image_framed

image_files = glob('./input_images/*.*')
result_dir = './output_images_with_boxes/'

if os.path.exists(result_dir):
    shutil.rmtree(result_dir)
os.mkdir(result_dir)

for image_file in sorted(image_files):
    # Detect the text regions and recognize the text inside them.
    result, image_framed = single_pic_proc(image_file)
    path = pathlib.Path(image_file)
    output_file = os.path.join(result_dir, path.name)
    txt_file = os.path.join(result_dir, path.stem + '.txt')
    # Save the page with the detected boxes drawn on it.
    Image.fromarray(image_framed).save(output_file)
    # Write one recognized string per detected region.
    with open(txt_file, 'w') as txt_f:
        for key in result:
            txt_f.write(result[key][1] + '\n')
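
One detail worth noting: `sorted()` compares file names as strings, so `image10.png` sorts before `image2.png`. That does not change the per-page outputs, but it matters whenever pages need to be handled in slide order. A minimal sketch of a natural-sort key follows; `natural_key` is a hypothetical helper and not part of ocr.pytorch.

```python
import re


def natural_key(name: str) -> list:
    """Split a name into text and integer chunks so numbers compare numerically."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


files = ["image10.png", "image2.png", "image0.png", "image1.png"]
print(sorted(files))                   # ['image0.png', 'image1.png', 'image10.png', 'image2.png']
print(sorted(files, key=natural_key))  # ['image0.png', 'image1.png', 'image2.png', 'image10.png']
```

With this helper, the processing loop could iterate over `sorted(image_files, key=natural_key)` instead.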

To visualize the results, OpenCV can load an output image, resize it, and display it:

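import pathlib

import cv2 as cv

output_dir = pathlib.Path("./output_images_with_boxes")
# Load the annotated image that the OCR step wrote for one page.
image = cv.imread(str(output_dir / "image7.png"))
# cv.resize expects (width, height). These values keep the original size;
# scale both of them to shrink or enlarge the preview.
size_reshaped = (int(image.shape[1]), int(image.shape[0]))
image = cv.resize(image, size_reshaped)
cv.imshow("image", image)
cv.waitKey(0)  # wait for a key press before closing the window
cv.destroyAllWindows()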

Finally, the transcribed text can be read from the generated .txt file:

filename = f"{output_dir}/image7.txt"
with open(filename, "r") as text:
    for line in text.readlines():
        print(line.strip("\n"))
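
To move from per-page text files toward a single editable document, the lines for each slide can be joined under one heading per page. The sketch below is an assumption about how that might look, not code from the original project: `pages_to_markdown` and the `## Slide N` heading format are hypothetical, and the sample lines stand in for the contents of the generated .txt files.

```python
def pages_to_markdown(pages: list) -> str:
    """Join per-slide text lines into one Markdown string.

    `pages` holds one list of recognized lines per slide, in slide order.
    """
    sections = []
    for number, lines in enumerate(pages, start=1):
        # Drop empty lines and surrounding whitespace.
        body = "\n".join(line.strip() for line in lines if line.strip())
        sections.append(f"## Slide {number}\n\n{body}")
    return "\n\n".join(sections) + "\n"


# Sample data for illustration only (not real OCR output).
pages = [
    ["Introduction to RL", "Lecture 1"],
    ["Agent and environment", "", "Reward signal"],
]
print(pages_to_markdown(pages))
```

In practice, each inner list would come from reading one of the .txt files, and the returned string could be written to a single .md file.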

According to the author, the resulting tool can transcribe a variety of documents, from handwritten notes to incidental text in photos, providing a self-contained OCR solution that does not rely on external OCR software.

