
Python Automation for Excel, Word, PDF, and PowerPoint

This tutorial shows how to use Python libraries such as pandas, python-docx, PyPDF2, and python-pptx to read, write, merge, and manipulate Excel, Word, PDF, and PowerPoint files, providing ready‑to‑run code snippets for each common automation scenario.

Test Development Learning Exchange

This guide demonstrates how to use Python libraries to automate common tasks for Excel, Word, PDF, and PowerPoint files.

Excel Automation

1. Read an Excel file and display the first rows:

import pandas as pd
df = pd.read_excel('example.xlsx')
print(df.head())

2. Write a DataFrame to a new Excel file:

df.to_excel('output.xlsx', index=False)
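Several DataFrames can also share one workbook, one sheet each, through `pd.ExcelWriter`. The sketch below uses hypothetical sample data and sheet names, and assumes the `openpyxl` engine is installed for `.xlsx` files. Reading the file back with `sheet_name=None` returns a dict of DataFrames keyed by sheet name.

```python
import pandas as pd

# Hypothetical sample data for illustration
sales = pd.DataFrame({'Region': ['North', 'South'], 'Sales': [120, 95]})
costs = pd.DataFrame({'Region': ['North', 'South'], 'Cost': [80, 70]})

# One ExcelWriter, several sheets; the file is written when the with-block exits
with pd.ExcelWriter('report.xlsx') as writer:
    sales.to_excel(writer, sheet_name='Sales', index=False)
    costs.to_excel(writer, sheet_name='Costs', index=False)

# sheet_name=None loads every sheet into a dict of DataFrames keyed by sheet name
sheets = pd.read_excel('report.xlsx', sheet_name=None)
print(list(sheets))
```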

3. Merge multiple Excel files into one:

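import glob
import pandas as pd
OUTPUT_FILE = 'merged.xlsx'
all_dfs = []
for file in glob.glob("*.xlsx"):
    # Skip the output (so a re-run doesn't merge it into itself) and Excel's "~$" lock files
    if file == OUTPUT_FILE or file.startswith('~$'):
        continue
    all_dfs.append(pd.read_excel(file))
combined_df = pd.concat(all_dfs, ignore_index=True)
combined_df.to_excel(OUTPUT_FILE, index=False)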
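When merged rows need to be traced back to their workbook, a column holding the file name can be added inside the loop. This is a minimal sketch: the `monthly` folder, the two sample workbooks it creates, and the `source_file` column name are all hypothetical. Keeping the inputs in their own folder also keeps the merged output out of the glob pattern.

```python
import glob
import os
import pandas as pd

# Hypothetical input folder with two small workbooks so the sketch runs on its own
os.makedirs('monthly', exist_ok=True)
pd.DataFrame({'Item': ['pen'], 'Qty': [3]}).to_excel('monthly/jan.xlsx', index=False)
pd.DataFrame({'Item': ['ink'], 'Qty': [5]}).to_excel('monthly/feb.xlsx', index=False)

frames = []
for path in sorted(glob.glob(os.path.join('monthly', '*.xlsx'))):
    df = pd.read_excel(path)
    # Record which workbook each row came from
    df['source_file'] = os.path.basename(path)
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
# Writing outside the input folder means a re-run never merges its own output
combined.to_excel('merged_with_source.xlsx', index=False)
print(combined)
```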

Word Automation

4. Read a Word document and print its paragraphs:

from docx import Document
doc = Document('example.docx')
for para in doc.paragraphs:
    print(para.text)

5. Create a new Word document and add a paragraph:

from docx import Document
doc = Document()
doc.add_paragraph('Hello World!')
doc.save('hello.docx')

11. Insert an image into a Word document:

from docx import Document
from docx.shared import Inches
doc = Document()
doc.add_picture('image.png', width=Inches(1.25))
doc.save('document.docx')

15. Protect a Word document (read‑only):

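from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
doc = Document()
doc.add_paragraph('This is a protected document.')
# python-docx has no protection API; write <w:documentProtection> into settings.xml (no password set)
protection = OxmlElement('w:documentProtection')
protection.set(qn('w:edit'), 'readOnly')
protection.set(qn('w:enforcement'), '1')
doc.settings.element.append(protection)
doc.core_properties.content_status = 'Final'  # metadata only; this alone locks nothing
doc.save('protected.docx')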

19. Create a table in a Word document:

from docx import Document
doc = Document()
table = doc.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Name'
hdr_cells[1].text = 'Age'
hdr_cells[2].text = 'City'
doc.save('table.docx')

PDF Automation

6. Read text from the first page of a PDF:

import PyPDF2
# The with-block closes the file; extract while it is still open
with open('example.pdf', 'rb') as pdf_file:
    read_pdf = PyPDF2.PdfReader(pdf_file)
    print(read_pdf.pages[0].extract_text())

7. Convert an entire PDF to a plain‑text file:

from PyPDF2 import PdfReader
reader = PdfReader("example.pdf")  # a path lets PyPDF2 open and close the file itself
# Join pages with a newline so text from adjacent pages doesn't run together
text = "\n".join(page.extract_text() for page in reader.pages)
with open("output.txt", "w", encoding="utf-8") as text_file:
    text_file.write(text)

12. Merge multiple PDF files into one:

from PyPDF2 import PdfWriter, PdfReader
pdf_writer = PdfWriter()
for filename in ['file1.pdf', 'file2.pdf']:
    pdf_reader = PdfReader(filename)
    for page in pdf_reader.pages:
        pdf_writer.add_page(page)
with open("merged.pdf", "wb") as out:
    pdf_writer.write(out)

16. Add a watermark to a PDF:

from PyPDF2 import PdfReader, PdfWriter
from reportlab.pdfgen import canvas
import io

def add_watermark(input_pdf_path, output_pdf_path, watermark_text):
    pdf_writer = PdfWriter()
    pdf_reader = PdfReader(input_pdf_path)
    for page in pdf_reader.pages:
        # Size the overlay to this page instead of assuming US Letter
        width = float(page.mediabox.width)
        height = float(page.mediabox.height)
        packet = io.BytesIO()
        can = canvas.Canvas(packet, pagesize=(width, height))
        can.setFont("Helvetica", 60)
        can.setFillColorRGB(0.5, 0.5, 0.5)
        can.setFillAlpha(0.3)  # semi-transparent, so the page content stays readable
        # Draw the text diagonally, centred on the page, so it stays within the page
        can.translate(width / 2, height / 2)
        can.rotate(45)
        can.drawCentredString(0, 0, watermark_text)
        can.save()
        packet.seek(0)
        page.merge_page(PdfReader(packet).pages[0])
        pdf_writer.add_page(page)
    with open(output_pdf_path, "wb") as output_stream:
        pdf_writer.write(output_stream)

add_watermark('original.pdf', 'watermarked.pdf', 'CONFIDENTIAL')

20. Extract images from a PDF and save them as PNG files:

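import io
from PyPDF2 import PdfReader
from PIL import Image
reader = PdfReader("example.pdf")
for page_num, page in enumerate(reader.pages):
    # Number images per page so several images on one page don't overwrite each other
    for img_num, image_file_object in enumerate(page.images):
        # .data holds the encoded image (often JPEG); Pillow re-encodes it as PNG
        img = Image.open(io.BytesIO(image_file_object.data))
        if img.mode == 'CMYK':
            img = img.convert('RGB')  # PNG has no CMYK mode
        img.save(f'image_{page_num}_{img_num}.png')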

PowerPoint Automation

8. Read text from all slides in a PPTX file:

from pptx import Presentation
prs = Presentation('example.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if hasattr(shape, "text"):
            print(shape.text)

9. Create a new slide with a title:

from pptx import Presentation
prs = Presentation()
slide_layout = prs.slide_layouts[1]
slide = prs.slides.add_slide(slide_layout)
title = slide.shapes.title
title.text = "Hello, PowerPoint!"
prs.save('test.pptx')

13. Batch replace text in a PPTX presentation:

from pptx import Presentation
prs = Presentation('input.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        # has_text_frame is False for pictures, charts and tables, so table cells are not covered
        if shape.has_text_frame:
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    # Only matches text that sits inside a single run
                    run.text = run.text.replace('old_text', 'new_text')
prs.save('output.pptx')

17. Automatically generate a table‑of‑contents slide:

from pptx import Presentation
prs = Presentation('example.pptx')
# Collect (slide number, title) for every existing slide with a non-empty title
entries = []
for number, slide in enumerate(prs.slides, start=1):
    title_shape = slide.shapes.title  # None if the slide has no title placeholder
    if title_shape is not None and title_shape.text:
        entries.append((number, title_shape.text))
# Layout 1 is "Title and Content" in the default template; custom templates may differ
toc_slide = prs.slides.add_slide(prs.slide_layouts[1])
toc_slide.shapes.title.text = "Table of Contents"
body = toc_slide.shapes.placeholders[1].text_frame
for i, (number, title) in enumerate(entries):
    # The body starts with one empty paragraph; reuse it for the first entry
    p = body.paragraphs[0] if i == 0 else body.add_paragraph()
    p.text = f"{title} (slide {number})"
# The TOC slide is appended last; python-pptx has no public API for reordering slides
prs.save('toc_presentation.pptx')

Additional Features

10. Filter rows in an Excel DataFrame:

filtered_df = df[df['Column Name'] == 'Some Value']
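Several conditions can be combined in one filter. A sketch on a hypothetical DataFrame: each comparison needs its own parentheses when joined with `&` (and) or `|` (or), `isin()` matches any value in a list, and `DataFrame.query()` expresses the same kind of filter as a string.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'East'],
    'Sales': [120, 95, 40, 70],
})

# Combine conditions with & / |, wrapping each comparison in parentheses
big_north = df[(df['Region'] == 'North') & (df['Sales'] > 50)]
# isin() keeps rows whose value appears in the list
north_or_east = df[df['Region'].isin(['North', 'East'])]
# query() writes the first filter as a string expression
same_as_big_north = df.query("Region == 'North' and Sales > 50")
print(big_north)
```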

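14. Build a pivot table with pandas and write it to Excel:

pivot_table = pd.pivot_table(df, values='Sales', index=['Category'], aggfunc='sum')
pivot_table.to_excel('pivot.xlsx')  # a static table, not an interactive Excel PivotTable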
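`pivot_table` can also spread a second field across the columns and add totals. A sketch with hypothetical sales records: `columns='Region'` creates one column per region, `fill_value=0` fills category/region combinations that have no rows, and `margins=True` adds an `All` row and column with the totals.

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    'Category': ['Books', 'Books', 'Games', 'Games', 'Books'],
    'Region': ['North', 'South', 'North', 'North', 'North'],
    'Sales': [10, 20, 30, 5, 15],
})

# Rows = Category, columns = Region, cells = summed Sales
pivot = pd.pivot_table(df, values='Sales', index='Category', columns='Region',
                       aggfunc='sum', fill_value=0, margins=True)
print(pivot)
pivot.to_excel('pivot_by_region.xlsx')
```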

18. Generate a bar chart from a pandas DataFrame and save it as an image:

import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})
df.plot(kind='bar')
plt.savefig('chart.png')