Video Background Replacement Using RobustVideoMatting and Python
This tutorial explains how to use the open‑source RobustVideoMatting project with Python, PyTorch, and OpenCV to perform human portrait segmentation and replace video backgrounds, covering repository setup, model loading, custom segmentation functions, and full video compositing steps.
Many video chat applications allow background replacement by extracting the human portrait and swapping the background; this tutorial shows how to achieve a similar effect without a green screen using image‑segmentation techniques.
Project setup: clone the RobustVideoMatting repository, install the required Python packages, and download one of the pre-trained models (e.g., rvm_mobilenetv3.pth or rvm_resnet50.pth).
git clone https://github.com/PeterL1n/RobustVideoMatting.git
cd RobustVideoMatting
pip install -r requirements_inference.txt

After placing the model file in the project directory, create a Python script that loads the model and calls convert_video to generate a green-screen composite video (com.mp4) and its alpha matte (pha.mp4).
import torch
from model import MattingNetwork
from inference import convert_video
model = MattingNetwork('mobilenetv3').eval().cuda() # or "resnet50"
model.load_state_dict(torch.load('rvm_mobilenetv3.pth'))
convert_video(
    model,
    input_source='input.mp4',
    output_type='video',
    output_composition='com.mp4',
    output_alpha='pha.mp4',
    output_video_mbps=4,
    downsample_ratio=None,  # None lets the model choose automatically
    seq_chunk=12,           # frames processed per batch
)

Image-level segmentation function: because the original project does not expose a convenient single-image API, a custom human_segment function is defined to run the model on one image and return a Pillow image whose alpha channel carries the predicted matte.
import cv2
import torch
from PIL import Image
from torchvision import transforms
from model import MattingNetwork
device = "cuda" if torch.cuda.is_available() else "cpu"
segmentor = MattingNetwork('resnet50').eval().to(device)
segmentor.load_state_dict(torch.load('rvm_resnet50.pth', map_location=device))
def human_segment(model, image):
    # Normalize to [0, 1] and add a batch dimension.
    src = (transforms.PILToTensor()(image) / 255.)[None].to(device)
    with torch.no_grad():
        fgr, pha, *rec = model(src)
    # Stack the RGB source with the predicted alpha matte into an RGBA image.
    segmented = torch.cat([src.cpu(), pha.cpu()], dim=1).squeeze(0).permute(1, 2, 0).numpy()
    segmented = cv2.normalize(segmented, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
    return Image.fromarray(segmented)
human_segment(segmentor, Image.open('xscn.jpg')).show()

Video-level segmentation: using a DataLoader to read frames, the model processes each frame, composites it onto a green background, and writes the result with VideoWriter.
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from inference_utils import VideoReader, VideoWriter
reader = VideoReader('input.mp4', transform=ToTensor())
writer = VideoWriter('output.mp4', frame_rate=30)
bgr = torch.tensor([.47, 1, .6]).view(3, 1, 1).cuda()  # green-screen color
rec = [None] * 4          # recurrent states, carried across frames
downsample_ratio = 0.25   # lower values speed up high-resolution inputs
with torch.no_grad():
    for src in DataLoader(reader):
        fgr, pha, *rec = model(src.cuda(), *rec, downsample_ratio)
        com = fgr * pha + bgr * (1 - pha)  # blend onto the green background
        writer.write(com)

A simple frame-by-frame loop can also be used: read each frame with OpenCV, convert it to a Pillow image, run human_segment, convert back to BGR, and display the result.
import numpy as np

capture = cv2.VideoCapture('input.mp4')
while True:
    ret, frame = capture.read()
    if not ret:
        break
    # OpenCV delivers BGR frames; Pillow expects RGB.
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = human_segment(segmentor, Image.fromarray(image))
    # human_segment returns an RGBA image, so drop the alpha while swapping channels.
    result = cv2.cvtColor(np.array(result), cv2.COLOR_RGBA2BGR)
    cv2.imshow('result', result)
    cv2.waitKey(10)
capture.release()
cv2.destroyAllWindows()

Background-swap workflow: the full pipeline consists of (1) reading human and background video frames, (2) segmenting each human frame, (3) compositing the segmented foreground onto the new background, and (4) writing the composited frames to an output video. Steps 1 and 2 are already covered; steps 3 and 4 are implemented below.
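Step (3) is plain alpha compositing: out = fg · α + bg · (1 − α), evaluated per pixel. Using the RGBA output of human_segment, the blend can also be written directly in NumPy; composite_numpy below is a hypothetical helper for illustration, not part of the original project.

```python
import numpy as np

def composite_numpy(segmented_rgba, background_rgb):
    """Alpha-composite an RGBA foreground (uint8) onto an RGB background
    of the same height and width: out = fg * alpha + bg * (1 - alpha)."""
    fg = segmented_rgba[..., :3].astype(np.float32)
    alpha = segmented_rgba[..., 3:4].astype(np.float32) / 255.0  # HxWx1 matte
    bg = background_rgb.astype(np.float32)
    out = fg * alpha + bg * (1.0 - alpha)
    return out.astype(np.uint8)
```

With a fully opaque matte the output equals the foreground, with a fully transparent one it equals the background, and intermediate alpha values blend the two linearly.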
A helper function change_background pastes the segmented RGBA image onto a background image, using its alpha channel as the paste mask.
from PIL import Image

def change_background(image, background):
    w, h = image.size
    background = background.resize((w, h))
    # Passing the RGBA image as the third argument uses its alpha channel as the mask.
    background.paste(image, (0, 0), image)
    return background

The final video writer uses OpenCV to combine the segmented foreground with the chosen background video, handling differing video lengths by stopping when the shorter stream ends.
# Read human and background videos
capture = cv2.VideoCapture('input.mp4')
capture_background = cv2.VideoCapture('background.mp4')
fps = capture.get(cv2.CAP_PROP_FPS)
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
size = (width, height)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output.mp4', fourcc, fps, size)
while True:
    ret1, frame1 = capture.read()
    ret2, frame2 = capture_background.read()
    if not ret1 or not ret2:
        break
    image = cv2.cvtColor(frame1, cv2.COLOR_BGR2RGB)
    segmented = human_segment(segmentor, Image.fromarray(image))
    background = Image.fromarray(cv2.cvtColor(frame2, cv2.COLOR_BGR2RGB))
    changed = change_background(segmented, background)
    changed = cv2.cvtColor(np.array(changed), cv2.COLOR_RGB2BGR)
    out.write(changed)
capture.release()
capture_background.release()
out.release()

The loop naturally stops at the length of the shorter video, and the pipeline can be refined further (e.g., by avoiding the repeated color-space and Pillow conversions).