Python 3 Practical Projects: PDF/Word Conversion, Image Processing, and OCR Tools
This tutorial presents seven Python 3 utilities: PDF-to-Word conversion, image-to-PDF and image-to-Word conversion, image compression, Gaussian-blur filtering, image-to-Excel export, and OCR. Each section names the libraries it relies on and gives a short, complete function that automates one everyday file-format or image-processing task.
1. PDF to Word
Using pdfplumber to extract text from a PDF and python-docx to create a Word document.
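Extracted text usually arrives as one long string with hard line breaks, which makes for a single unwieldy Word paragraph. The sketch below shows one way to split it first. It assumes paragraph breaks appear as blank lines in the extracted text, which is not guaranteed for every PDF, and `split_paragraphs` is a hypothetical helper, not part of pdfplumber or python-docx.

```python
import re

def split_paragraphs(text):
    """Split text on blank lines and re-join wrapped lines (hypothetical helper)."""
    blocks = re.split(r"\n\s*\n", text)
    # Collapse internal line breaks and runs of spaces into single spaces
    return [" ".join(block.split()) for block in blocks if block.strip()]

sample = "First line of a paragraph\nwrapped onto a second line.\n\nSecond paragraph."
paragraphs = split_paragraphs(sample)
for paragraph in paragraphs:
    print(paragraph)
```

Each returned string could then be passed to `doc.add_paragraph` on its own.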
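import pdfplumber
import docx
def pdf_to_word(pdf_path, word_path):
    # Read every page; extract_text() can yield None for pages without a text layer
    with pdfplumber.open(pdf_path) as pdf:
        pages = [page.extract_text() or "" for page in pdf.pages]
    # Write one paragraph per page so page boundaries survive in the Word file
    doc = docx.Document()
    for page_text in pages:
        doc.add_paragraph(page_text)
    doc.save(word_path)
2. Image to PDF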
Utilizing the Pillow library to combine multiple images of equal size into a single PDF file.
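The description assumes equally sized inputs. When that does not hold, one option is to normalize the images before writing the PDF. The following is a minimal sketch under that assumption: `normalize_pages` is a hypothetical helper, the synthetic images stand in for real files, and the PDF is written to memory rather than to disk.

```python
import io
from PIL import Image

def normalize_pages(images, size=None):
    """Convert images to RGB and a common size (hypothetical helper).

    resize() ignores the aspect ratio, so differently shaped inputs get stretched.
    """
    size = size or images[0].size
    return [image.convert("RGB").resize(size) for image in images]

# Two synthetic inputs with different modes and sizes
inputs = [
    Image.new("RGBA", (200, 100), (255, 0, 0, 255)),
    Image.new("L", (120, 80), 128),
]
pages = normalize_pages(inputs)

# Write a two-page PDF into an in-memory buffer
buffer = io.BytesIO()
pages[0].save(buffer, format="PDF", save_all=True, append_images=pages[1:])
pdf_bytes = buffer.getvalue()
print(len(pages), pages[1].size, len(pdf_bytes))
```

If distortion matters, padding each image onto a blank canvas of the target size would be an alternative to `resize`.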
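from PIL import Image
def image_to_pdf(image_paths, pdf_path):
    # Convert to RGB: some Pillow versions cannot write modes such as RGBA to PDF
    images = [Image.open(image_path).convert("RGB") for image_path in image_paths]
    # The first image becomes page 1; the rest are appended as further pages
    images[0].save(pdf_path, save_all=True, append_images=images[1:])
3. Image to Word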
Inserting an image into a Word document with python-docx.
import docx
def image_to_word(image_path, word_path):
    doc = docx.Document()
    # add_picture inserts the image at its native size unless width or height is given
    doc.add_picture(image_path)
    doc.save(word_path)
4. Image Compression
Compressing images by re-saving them with Pillow's optimize and quality options.
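To see how much the settings matter, the sketch below encodes the same picture as JPEG at two quality levels and again after downscaling, then compares byte counts. It is an illustration only: `jpeg_size` is a hypothetical helper, and the random-noise image is a synthetic worst case rather than a typical photo.

```python
import io
import os
from PIL import Image

def jpeg_size(image, quality):
    """Return the size in bytes of the image encoded as an in-memory JPEG."""
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG", optimize=True, quality=quality)
    return len(buffer.getvalue())

# Synthetic 256x256 test image filled with random noise (hard to compress)
image = Image.frombytes("RGB", (256, 256), os.urandom(256 * 256 * 3))

high = jpeg_size(image, quality=90)
low = jpeg_size(image, quality=30)

# thumbnail() shrinks the copy in place and keeps the aspect ratio
small = image.copy()
small.thumbnail((128, 128))
low_small = jpeg_size(small, quality=30)

print(high, low, low_small)
```

Lowering `quality` trades visual fidelity for size, while downscaling reduces the number of pixels to encode; the two can be combined.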
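from PIL import Image
def compress_image(image_path, compressed_path, quality=50):
    image = Image.open(image_path)
    # quality (lower means smaller files) applies to lossy formats such as JPEG;
    # optimize asks the encoder for an extra pass to shrink the output
    image.save(compressed_path, optimize=True, quality=quality)
5. Image Filtering (Gaussian Blur)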
Applying Gaussian blur with OpenCV to smooth an image and reduce noise. Note that a Gaussian blur softens edges along with the noise; it is not an edge-preserving filter.
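A Gaussian blur replaces each pixel with a weighted average of its neighbours, so noise and sharp edges are smoothed alike. The NumPy sketch below makes that concrete on a single row of pixels. The `gaussian_kernel_1d` helper and the sigma of 1.0 are assumptions for illustration; OpenCV chooses its own weights when sigma is 0, so these numbers are not its exact kernel.

```python
import numpy as np

def gaussian_kernel_1d(size=5, sigma=1.0):
    """Return normalized 1-D Gaussian weights (illustrative, not OpenCV's exact table)."""
    offsets = np.arange(size) - (size - 1) / 2
    weights = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    return weights / weights.sum()

kernel = gaussian_kernel_1d()

# A hard edge: five dark pixels followed by five bright ones
row = np.array([0.0] * 5 + [255.0] * 5)

# mode="same" keeps the length; np.convolve zero-pads, so the last pixels darken too
blurred = np.convolve(row, kernel, mode="same")

print(np.round(kernel, 3))
print(np.round(blurred, 1))
```

The pixels on either side of the edge end up with intermediate values, which is the softening described above.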
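import cv2
def gaussian_blur(image_path, result_path):
    img = cv2.imread(image_path)
    # cv2.imread returns None instead of raising when the file cannot be read
    if img is None:
        raise FileNotFoundError(image_path)
    # 5x5 kernel; sigma 0 lets OpenCV derive the standard deviation from the kernel size
    blur = cv2.GaussianBlur(img, (5, 5), 0)
    cv2.imwrite(result_path, blur)
6. Image to Excel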
Converting image pixel data into an Excel spreadsheet with Pillow and pandas (writing .xlsx files with pandas also requires openpyxl).
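The key step is mapping Pillow's flat pixel sequence back to row and column positions. `getdata()` returns pixels in row-major order, so flat index k belongs to row k // width and column k % width. The sketch below checks this on a tiny synthetic image whose pixels encode their own coordinates; the 3x2 size and the encoding are assumptions chosen to make the result easy to verify.

```python
import pandas as pd
from PIL import Image

# 3 pixels wide, 2 pixels high; each pixel stores its own x in R and y in G
image = Image.new("RGB", (3, 2))
for y in range(2):
    for x in range(3):
        image.putpixel((x, y), (x, y, 0))

width, height = image.size
pixels = pd.DataFrame(list(image.getdata()), columns=["R", "G", "B"])

# Row-major order: flat index k maps to row k // width and column k % width (1-based here)
pixels["row"] = pixels.index // width + 1
pixels["col"] = pixels.index % width + 1

print(pixels)
```

Since R holds x and G holds y, every line of the printed table should show `col` equal to R + 1 and `row` equal to G + 1.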
import pandas as pd
from PIL import Image
def image_to_excel(image_paths, excel_path):
    frames = []
    for i, image_path in enumerate(image_paths, start=1):
        # Force three channels so every pixel is an (R, G, B) tuple
        image = Image.open(image_path).convert("RGB")
        width, height = image.size
        arr = pd.DataFrame(list(image.getdata()), columns=['R', 'G', 'B'])
        # getdata() is row-major: label each pixel with its image number, row, and column
        arr['image'] = i
        arr['row'] = [r for r in range(1, height + 1) for _ in range(width)]
        arr['col'] = list(range(1, width + 1)) * height
        frames.append(arr.set_index(['image', 'row', 'col']))
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    df = pd.concat(frames)
    # One spreadsheet row per pixel, so keep inputs small (a sheet holds 1,048,576 rows)
    df.to_excel(excel_path)
7. Text Recognition (OCR)
Extracting text from images using Pillow and pytesseract. pytesseract is a wrapper around the Tesseract OCR engine, which must be installed separately.
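OCR results tend to improve on clean, high-contrast input, so a preprocessing pass is often worth trying. The sketch below uses only Pillow. `prepare_for_ocr` is a hypothetical helper, the threshold of 128 is a guess that would need tuning per document, and the synthetic page stands in for a real scan.

```python
from PIL import Image, ImageOps

def prepare_for_ocr(image, threshold=128):
    """Grayscale, stretch contrast, and binarize (hypothetical helper)."""
    gray = ImageOps.grayscale(image)
    gray = ImageOps.autocontrast(gray)
    # Map every pixel to pure black or pure white
    return gray.point(lambda value: 255 if value >= threshold else 0)

# Synthetic example: a light gray page with a dark gray block standing in for text
page = Image.new("RGB", (40, 20), (200, 200, 200))
page.paste((60, 60, 60), (10, 5, 30, 15))

clean = prepare_for_ocr(page)
print(clean.mode, clean.getpixel((0, 0)), clean.getpixel((20, 10)))
```

The returned image can be passed to `pytesseract.image_to_string` in place of the original.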
import pytesseract
from PIL import Image
def recognize_text(image_path):
    image = Image.open(image_path)
    # image_to_string runs Tesseract on the image and returns the recognized text
    text = pytesseract.image_to_string(image)
    return text
These seven tools demonstrate how Python can efficiently handle everyday document and image processing needs.
Test Development Learning Exchange