
Video Background Replacement Using RobustVideoMatting and Python

This tutorial explains how to use the open‑source RobustVideoMatting project with Python, PyTorch, and OpenCV to perform human portrait segmentation and replace video backgrounds, covering repository setup, model loading, custom segmentation functions, and full video compositing steps.

Rare Earth Juejin Tech Community

Many video chat applications allow background replacement by extracting the human portrait and swapping the background; this tutorial shows how to achieve a similar effect without a green screen using image‑segmentation techniques.
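Conceptually, the replacement is plain alpha compositing: the model predicts a matte `pha` in [0, 1] per pixel, and the output blends foreground and background as `fgr * pha + bgr * (1 - pha)`, the same formula used in the video loop later in this tutorial. A minimal NumPy sketch of the blend:

```python
import numpy as np

# Alpha compositing: each output pixel blends foreground and background
# by the matte value pha in [0, 1].
fgr = np.array([[[1.0, 0.0, 0.0]]])   # a 1x1 pure-red foreground pixel
bgr = np.array([[[0.0, 1.0, 0.0]]])   # a 1x1 pure-green background pixel
pha = np.array([[[0.25]]])            # 25% foreground opacity

com = fgr * pha + bgr * (1 - pha)
print(com)  # red fades toward green: [[[0.25, 0.75, 0.0]]]
```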

Project setup: clone the RobustVideoMatting repository, install the required Python packages, and download one of the pre-trained models (e.g., rvm_mobilenetv3.pth or rvm_resnet50.pth).

git clone https://github.com/PeterL1n/RobustVideoMatting.git
cd RobustVideoMatting
pip install -r requirements_inference.txt

After placing the model file in the project directory, create a Python script that loads the model and calls convert_video to generate a green-screen video (com.mp4) and its alpha mask (pha.mp4).

import torch
from model import MattingNetwork
from inference import convert_video

model = MattingNetwork('mobilenetv3').eval().cuda()  # or "resnet50"
model.load_state_dict(torch.load('rvm_mobilenetv3.pth'))

convert_video(
    model,
    input_source='input.mp4',      # video file or image-sequence folder
    output_type='video',
    output_composition='com.mp4',  # foreground composited on green
    output_alpha='pha.mp4',        # grayscale alpha matte
    output_video_mbps=4,           # output bitrate in Mbps
    downsample_ratio=None,         # None = auto-select per resolution
    seq_chunk=12,                  # frames processed in parallel
)

Image-level segmentation function: because the original project does not expose a convenient single-image API, a custom human_segment function runs the model on one image and returns a Pillow RGBA image whose alpha channel holds the predicted matte.

import cv2
import torch
from PIL import Image
from torchvision import transforms
from model import MattingNetwork

device = "cuda" if torch.cuda.is_available() else "cpu"
segmentor = MattingNetwork('resnet50').eval().to(device)
segmentor.load_state_dict(torch.load('rvm_resnet50.pth', map_location=device))

def human_segment(model, image):
    # Scale to [0, 1] and add a batch dimension
    src = (transforms.PILToTensor()(image) / 255.)[None].to(device)
    with torch.no_grad():
        fgr, pha, *rec = model(src)
        # Stack the RGB source with the alpha matte into an RGBA array
        segmented = torch.cat([src.cpu(), pha.cpu()], dim=1).squeeze(0).permute(1, 2, 0).numpy()
        segmented = cv2.normalize(segmented, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
        return Image.fromarray(segmented)

human_segment(segmentor, Image.open('xscn.jpg')).show()

Video-level segmentation: using a DataLoader to read frames, the model processes each frame while carrying its recurrent states (rec) from one frame to the next, composites the foreground onto a green background, and writes the result with VideoWriter.

from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from inference_utils import VideoReader, VideoWriter

reader = VideoReader('input.mp4', transform=ToTensor())
writer = VideoWriter('output.mp4', frame_rate=30)

bgr = torch.tensor([.47, 1, .6]).view(3, 1, 1).to(device)  # green-screen color
rec = [None] * 4           # recurrent states carried across frames
downsample_ratio = 0.25    # tune for the input resolution

with torch.no_grad():
    for src in DataLoader(reader):
        fgr, pha, *rec = segmentor(src.to(device), *rec, downsample_ratio)
        com = fgr * pha + bgr * (1 - pha)  # composite onto green
        writer.write(com)

A simple frame-by-frame loop can also be used: read each frame with OpenCV, convert it to a Pillow image, run human_segment, convert back to BGR, and display the result.

import numpy as np

capture = cv2.VideoCapture('input.mp4')
while True:
    ret, frame = capture.read()
    if not ret:
        break
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = human_segment(segmentor, Image.fromarray(image))
    # human_segment returns RGBA, so drop the alpha when converting back
    result = cv2.cvtColor(np.array(result), cv2.COLOR_RGBA2BGR)
    cv2.imshow('result', result)
    cv2.waitKey(10)
capture.release()
cv2.destroyAllWindows()

Background-swap workflow: the full pipeline consists of (1) reading human and background video frames, (2) segmenting each human frame, (3) compositing the segmented foreground onto the new background, and (4) writing the composited frames to an output video. Steps 1 and 2 are covered above; steps 3 and 4 are implemented below.

A helper function change_background pastes a segmented RGBA image onto a background image, using the image's alpha channel as the paste mask.

from PIL import Image

def change_background(image, background):
    w, h = image.size
    background = background.resize((w, h))
    background.paste(image, (0, 0), image)
    return background
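To see what the helper does, here is a tiny self-contained sketch; the foreground pattern, sizes, and colors are made up purely for illustration, and the helper is repeated so the snippet runs on its own:

```python
from PIL import Image

def change_background(image, background):
    # Same helper as above: resize the background to the foreground's
    # size, then paste using the foreground's alpha channel as the mask.
    w, h = image.size
    background = background.resize((w, h))
    background.paste(image, (0, 0), image)
    return background

# A 4x4 RGBA foreground: opaque red on top, fully transparent below
fg = Image.new('RGBA', (4, 4), (255, 0, 0, 255))
for x in range(4):
    for y in range(2, 4):
        fg.putpixel((x, y), (0, 0, 0, 0))

bg = Image.new('RGB', (8, 8), (0, 0, 255))  # solid blue, larger on purpose

out = change_background(fg, bg)
print(out.getpixel((0, 0)))  # (255, 0, 0): opaque foreground wins
print(out.getpixel((0, 3)))  # (0, 0, 255): background shows through
```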

The final video writer uses OpenCV to combine the segmented foreground with the chosen background video, handling differing video lengths by stopping when the shorter stream ends.

import numpy as np

# Read human and background videos
capture = cv2.VideoCapture('input.mp4')
capture_background = cv2.VideoCapture('background.mp4')
fps = capture.get(cv2.CAP_PROP_FPS)
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
size = (width, height)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output.mp4', fourcc, fps, size)

while True:
    ret1, frame1 = capture.read()
    ret2, frame2 = capture_background.read()
    if not ret1 or not ret2:  # stop when the shorter video ends
        break
    image = cv2.cvtColor(frame1, cv2.COLOR_BGR2RGB)
    segmented = human_segment(segmentor, Image.fromarray(image))
    background = Image.fromarray(cv2.cvtColor(frame2, cv2.COLOR_BGR2RGB))
    changed = change_background(segmented, background)
    changed = cv2.cvtColor(np.array(changed), cv2.COLOR_RGB2BGR)
    out.write(changed)

capture.release()
capture_background.release()
out.release()

The loop naturally stops at the end of the shorter video, and the script can be further refined (e.g., by reducing the number of color-space conversions).
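One possible refinement along those lines, sketched here as an assumption rather than part of the original script, is to blend directly on the BGR NumPy arrays with the float matte and skip the PIL round-trip entirely (composite_bgr and its arguments are hypothetical names):

```python
import numpy as np

def composite_bgr(frame, background, alpha):
    """Blend two BGR uint8 frames with a float matte in [0, 1].

    frame, background: HxWx3 uint8 arrays straight from cv2 (BGR order).
    alpha: HxWx1 float32 matte, e.g. pha from the model moved to CPU.
    Working on the raw arrays avoids the PIL round-trips used above.
    """
    blended = frame.astype(np.float32) * alpha + \
              background.astype(np.float32) * (1 - alpha)
    return blended.astype(np.uint8)

# Tiny demo with 1x1 "frames"
frame = np.array([[[255, 0, 0]]], dtype=np.uint8)
background = np.array([[[0, 0, 255]]], dtype=np.uint8)
alpha = np.array([[[0.5]]], dtype=np.float32)
print(composite_bgr(frame, background, alpha))  # [[[127   0 127]]]
```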
