Video Background Replacement Using RobustVideoMatting and Python
This tutorial explains how to use the open‑source RobustVideoMatting project to perform human portrait segmentation and replace video backgrounds, covering environment setup, model loading, custom image‑and‑video matting functions, and final video composition with OpenCV.
Many video chat applications allow users to change the background by extracting the human portrait and replacing the non‑human area, often using simple image replacement; however, with image‑segmentation techniques you can achieve more sophisticated effects similar to movie visual effects.
The article introduces the RobustVideoMatting project (https://github.com/PeterL1n/RobustVideoMatting) and shows how to clone the repository, install dependencies, and download a pre‑trained model (either rvm_mobilenetv3.pth or rvm_resnet50.pth).
git clone https://github.com/PeterL1n/RobustVideoMatting.git
cd RobustVideoMatting
pip install -r requirements_inference.txt
After setting up, a Python script is created to load the model and run video matting:
import torch
from model import MattingNetwork
from inference import convert_video
# Choose mobilenetv3 or resnet50
model = MattingNetwork('mobilenetv3').eval().cuda() # or "resnet50"
model.load_state_dict(torch.load('rvm_mobilenetv3.pth'))
convert_video(
    model,
    input_source='input.mp4',
    output_type='video',
    output_composition='com.mp4',
    output_alpha='pha.mp4',
    output_video_mbps=4,
    downsample_ratio=None,
    seq_chunk=12,
)
The script produces com.mp4 (a green‑screen composition video) and pha.mp4 (the alpha matte). To perform custom image matting, the article defines a human_segment function that runs the model on a single image and returns a Pillow image with the segmented result:
import cv2
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

from model import MattingNetwork

device = "cuda" if torch.cuda.is_available() else "cpu"
segmentor = MattingNetwork('resnet50').eval().to(device)
segmentor.load_state_dict(torch.load('rvm_resnet50.pth', map_location=device))
def human_segment(model, image):
    # PIL image -> NCHW float tensor in [0, 1]
    src = (transforms.PILToTensor()(image) / 255.)[None].to(device)
    with torch.no_grad():
        fgr, pha, *rec = model(src)
    # append the alpha matte as a fourth channel (RGBA), then back to HWC
    segmented = torch.cat([src.cpu(), pha.cpu()], dim=1).squeeze(0).permute(1, 2, 0).numpy()
    segmented = cv2.normalize(segmented, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
    return Image.fromarray(segmented)
human_segment(segmentor, Image.open('xscn.jpg')).show()
For video matting, a loop reads frames, applies human_segment, and displays the result. An example using OpenCV is provided:
capture = cv2.VideoCapture("input.mp4")
while True:
    ret, frame = capture.read()
    if not ret:
        break
    # OpenCV reads frames as BGR; Pillow expects RGB
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = human_segment(segmentor, Image.fromarray(image))
    # human_segment returns an RGBA image, so convert with the RGBA flag
    result = cv2.cvtColor(np.array(result), cv2.COLOR_RGBA2BGRA)
    cv2.imshow("result", result)
    cv2.waitKey(10)
capture.release()
cv2.destroyAllWindows()
The article then outlines the four steps for video background replacement: read foreground and background frames, perform matting on each foreground frame, composite the segmented foreground onto the new background, and write the composed frames to an output video.
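The compositing step boils down to per-pixel alpha blending of two frames. A minimal numpy sketch (composite_frame is a hypothetical name for illustration, not part of the article or RobustVideoMatting):

```python
import numpy as np

def composite_frame(foreground, background, alpha):
    """Blend two HxWx3 uint8 frames using an HxWx1 alpha matte in [0, 1]."""
    fg = foreground.astype(np.float32)
    bg = background.astype(np.float32)
    # alpha broadcasts across the 3 color channels
    return (alpha * fg + (1.0 - alpha) * bg).astype(np.uint8)

# toy 1x2 frames: alpha=1 keeps the foreground, alpha=0 shows the background
fg = np.array([[[255, 0, 0], [255, 0, 0]]], dtype=np.uint8)
bg = np.array([[[0, 255, 0], [0, 255, 0]]], dtype=np.uint8)
alpha = np.array([[[1.0], [0.0]]], dtype=np.float32)
print(composite_frame(fg, bg, alpha))  # left pixel stays red, right becomes green
```

The article performs the same blend through Pillow's paste machinery, described next.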
A helper function change_background blends a segmented PNG image with a background image:
from PIL import Image

def change_background(image, background):
    # resize the background to the foreground size, then paste the
    # foreground using its own alpha channel as the paste mask
    w, h = image.size
    background = background.resize((w, h))
    background.paste(image, (0, 0), image)
    return background
Finally, the full video writing pipeline combines the foreground segmentation with a background video using OpenCV, handling differing video lengths and showing progress with tqdm.
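Before wiring it into the pipeline, change_background can be sanity-checked on a toy RGBA image with a solid-color background (the helper is repeated here so the snippet runs standalone):

```python
from PIL import Image

def change_background(image, background):
    # resize the background to the foreground size, then paste the
    # foreground using its own alpha channel as the paste mask
    w, h = image.size
    background = background.resize((w, h))
    background.paste(image, (0, 0), image)
    return background

# 2x1 "segmented" image: left pixel opaque red, right pixel fully transparent
fg = Image.new('RGBA', (2, 1))
fg.putpixel((0, 0), (255, 0, 0, 255))
fg.putpixel((1, 0), (0, 0, 0, 0))

bg = Image.new('RGB', (10, 10), (0, 255, 0))  # solid green, resized to 2x1 inside the helper
out = change_background(fg, bg)
print(out.getpixel((0, 0)))  # (255, 0, 0) — opaque foreground wins
print(out.getpixel((1, 0)))  # (0, 255, 0) — background shows through
```

Pillow uses the RGBA image's alpha band as the paste mask, which is exactly why human_segment attaches the matte as a fourth channel.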
# Read foreground and background videos
from tqdm import tqdm

capture = cv2.VideoCapture("input.mp4")
capture_background = cv2.VideoCapture('background.mp4')

fps = capture.get(cv2.CAP_PROP_FPS)
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
size = (width, height)

fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output.mp4', fourcc, fps, size)

# stop at the end of the shorter video
frames = int(min(capture.get(cv2.CAP_PROP_FRAME_COUNT),
                 capture_background.get(cv2.CAP_PROP_FRAME_COUNT)))
bar = tqdm(total=frames)

while True:
    ret1, frame1 = capture.read()
    ret2, frame2 = capture_background.read()
    if not ret1 or not ret2:
        break
    image = cv2.cvtColor(frame1, cv2.COLOR_BGR2RGB)
    segmented = human_segment(segmentor, Image.fromarray(image))
    background = Image.fromarray(cv2.cvtColor(frame2, cv2.COLOR_BGR2RGB))
    changed = change_background(segmented, background)
    # change_background returns an RGB image, so a plain RGB->BGR conversion suffices
    changed = cv2.cvtColor(np.array(changed), cv2.COLOR_RGB2BGR)
    out.write(changed)
    bar.update(1)

bar.close()
capture.release()
capture_background.release()
out.release()
This complete workflow enables developers to replace video backgrounds without a green screen, leveraging deep‑learning‑based portrait matting and standard Python libraries.