
Processing 2,500 Short Series in 3 Days: Cutting Costs by 60% with a Balanced AWS Architecture

This case study details how a media platform processed 2,500 short‑series episodes in three days using a serverless AWS pipeline that merged and transcoded videos while reducing total cost by 60% compared with a MediaConvert‑only solution.

Amazon Cloud Developers

Background and Requirements

The client needed to process 2,500 existing short series, each containing 80‑120 HD episodes of 1‑3 minutes (200‑400 MB per episode). For each series, all episodes had to be merged into a single video and transcoded to a uniform 2.5 Mbps bitrate. Additional requirements were merging SRT/ASS subtitles, sustaining 200‑600 series per day, and finishing all work before the third‑party download links expired.

Technical Challenges

High concurrency: processing many series simultaneously without interference.

Resource scheduling: variable processing time per series required efficient compute allocation.

Cost control: media processing is compute‑intensive, demanding a low‑cost solution.

Fault tolerance: network downloads and video processing could fail, needing robust retry mechanisms.

Observability: real‑time progress monitoring and alerting were essential.

Solution Comparison

Solution 1 – AWS Step Functions + AWS Elemental MediaConvert

Pros: fully managed, with no infrastructure to maintain; MediaConvert offers broadcast‑grade quality and advanced features (HDR, Dolby).

Cons: high per‑minute cost ($0.015‑$0.030/min); additional Lambda and S3 storage costs; data is transferred twice (download → S3 → MediaConvert); limited concurrency that can only be raised through a support ticket; and an estimated total cost above $10,000 for 2,500 series.

Solution 2 – Amazon SQS + Auto Scaling Group + Amazon EC2

Pros: flexible instance selection (e.g., c5.2xlarge), cheaper compute than MediaConvert, and direct download from the source without an intermediate S3 upload.

Cons: complex scaling logic, scaling latency (2‑5 min for instance start‑up), conservative down‑scaling, instance lifecycle management, and higher operational overhead.

Solution 3 – AWS Lambda + AWS Batch + AWS Fargate (Chosen)

Service Overview:

AWS Lambda – serverless function that reads metadata from S3, checks DynamoDB for completed tasks, and submits jobs to Batch.

AWS Batch – manages job queues and automatically provisions Fargate compute environments.

AWS Fargate – runs containerized FFmpeg workloads on a per‑second billing model.

Amazon DynamoDB – tracks task state (submitted, running, completed, failed).

Amazon S3 – stores input metadata, raw files (temporary), and final outputs.

Amazon CloudWatch + SNS – monitoring, dashboards, and alert notifications.
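
The scheduling role that Lambda plays here can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a DynamoDB table keyed by `series_id` with a `status` attribute, and the table, queue, and job‑definition names are hypothetical. The Batch and DynamoDB clients are passed in as arguments so the logic can be exercised without AWS credentials; inside Lambda they would typically be `boto3.client("batch")` and `boto3.client("dynamodb")`.

```python
import re

# Hypothetical resource names, chosen for illustration only.
TABLE_NAME = "series-tasks"
JOB_QUEUE = "series-merge-queue"
JOB_DEFINITION = "series-merge-job"


def is_completed(dynamodb, series_id):
    """Return True if DynamoDB already records this series as completed."""
    resp = dynamodb.get_item(
        TableName=TABLE_NAME,
        Key={"series_id": {"S": series_id}},
    )
    item = resp.get("Item")
    return bool(item) and item.get("status", {}).get("S") == "completed"


def submit_pending(batch, dynamodb, series_ids):
    """Submit one Batch job per series that is not yet completed."""
    submitted = []
    for series_id in series_ids:
        if is_completed(dynamodb, series_id):
            continue  # skip work that already finished
        # Batch job names allow letters, digits, hyphens and underscores.
        job_name = "merge-" + re.sub(r"[^A-Za-z0-9_-]", "-", series_id)
        resp = batch.submit_job(
            jobName=job_name[:128],
            jobQueue=JOB_QUEUE,
            jobDefinition=JOB_DEFINITION,
            containerOverrides={
                "environment": [{"name": "SERIES_ID", "value": series_id}]
            },
        )
        # Record the submission so later runs can see the task state.
        dynamodb.put_item(
            TableName=TABLE_NAME,
            Item={
                "series_id": {"S": series_id},
                "status": {"S": "submitted"},
                "job_id": {"S": resp["jobId"]},
            },
        )
        submitted.append(series_id)
    return submitted
```

Checking DynamoDB before each submission makes the function safe to re‑run: a second invocation over the same metadata only submits series that have not completed.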

Advantages:

Lightweight task scheduling with no servers to manage.

Pay‑as‑you‑go pricing (Lambda per invocation, Fargate per second).

Fast start‑up: containers start in 30‑60 s, versus 2‑5 min for EC2 instances.

Automatic scaling: up to 100 concurrent tasks without manual Auto Scaling policies.

Low cost per series: 2.5 h × $0.46 per task‑hour ≈ $1.15, which comes to $2,875 of compute for 2,500 series, plus modest storage.

Cost Comparison: the MediaConvert design was estimated at more than $10,000 in total, while Solution 3 came to $2,875 in compute, a reduction of more than 60 %.
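
The compute figures above can be reproduced with a few lines of arithmetic. The sketch uses only numbers stated in this article ($0.46 per task‑hour, 2.5 hours per series, 2,500 series, and $10,000 as the lower bound for the MediaConvert design); the variable names are illustrative.

```python
HOURLY_RATE_USD = 0.46           # Fargate cost per task-hour, as stated above
HOURS_PER_SERIES = 2.5
SERIES_COUNT = 2500
MEDIACONVERT_FLOOR_USD = 10_000  # the article gives ">$10,000", so a lower bound

cost_per_series = HOURLY_RATE_USD * HOURS_PER_SERIES
compute_total = cost_per_series * SERIES_COUNT
# Saving measured against the MediaConvert lower bound, compute only.
saving = 1 - compute_total / MEDIACONVERT_FLOOR_USD

print(f"per series: ${cost_per_series:.2f}")
print(f"compute total: ${compute_total:,.0f}")
print(f"saving vs. MediaConvert floor: {saving:.0%}")
```

Because the MediaConvert estimate is only a lower bound and storage is excluded here, the resulting percentage should be read as indicative; it is consistent with the article's headline figure of a reduction of at least 60 %.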

Architecture Details

Data Flow:

Metadata JSON files for each series are stored in Amazon S3.

Lambda reads the metadata, checks DynamoDB for already‑processed series, and submits a Batch job.

Batch places the job in a queue; the Fargate compute environment pulls the job.

The container downloads all episode files (multi‑threaded, resumable) and merges them with ffmpeg -f concat -safe 0 -i filelist.txt -c copy output.mp4. It then probes the bitrate and transcodes, using the fast preset, only when the bitrate exceeds 4 Mbps.

Subtitles are merged (SRT concatenation, ASS event re‑timestamp).

Resulting video and subtitles are uploaded to S3 via multipart upload; DynamoDB is updated to “completed”.
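
The SRT part of the subtitle step can be sketched as follows: each episode's cues are shifted by the total duration of the episodes before it, then renumbered. This is an illustrative sketch rather than the project's code; `merge_srt` and its input (a list of SRT text and duration pairs) are assumptions, and it expects well‑formed `HH:MM:SS,mmm` timestamps.

```python
import re

TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Add offset_ms to one HH:MM:SS,mmm timestamp."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def merge_srt(episodes):
    """Merge (srt_text, duration_ms) pairs into one SRT document."""
    cues, offset_ms = [], 0
    for srt_text, duration_ms in episodes:
        # Cues are separated by blank lines.
        for block in re.split(r"\n\s*\n", srt_text.strip()):
            lines = block.splitlines()
            if len(lines) < 2 or "-->" not in lines[1]:
                continue  # skip malformed or empty cues
            timing = TIMESTAMP.sub(
                lambda m: shift_timestamp(m, offset_ms), lines[1]
            )
            cues.append((timing, lines[2:]))
        # The next episode starts where this one ends in the merged video.
        offset_ms += duration_ms
    out = []
    for index, (timing, text) in enumerate(cues, start=1):
        out.append("\n".join([str(index), timing, *text]))
    return "\n\n".join(out) + "\n"
```

The per‑episode durations would need to come from the video files themselves (for example, from an ffprobe call per episode) so that subtitle offsets match the merged video. ASS files need the same shift applied to the start and end times of each event.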

Batch Job Definition:

Compute environment: AWS Fargate, On‑Demand, max 800 vCPU (100 tasks × 8 vCPU).

Resources per task: 8 vCPU, 32 GB RAM, 100 GB temporary storage.

Retry policy: 1 automatic retry.

Timeout: 8 hours per task.
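
The settings listed above map onto the parameters of the Batch `RegisterJobDefinition` call roughly as shown below. This is a sketch under assumptions: the job‑definition name is hypothetical, and the image URI and execution role are placeholders supplied by the caller. The 800 vCPU ceiling is not part of the job definition; it is set on the compute environment.

```python
def build_job_definition(image_uri, execution_role_arn):
    """Return keyword arguments for the Batch RegisterJobDefinition call."""
    return {
        "jobDefinitionName": "series-merge-job",  # hypothetical name
        "type": "container",
        "platformCapabilities": ["FARGATE"],
        "containerProperties": {
            "image": image_uri,
            "executionRoleArn": execution_role_arn,
            "resourceRequirements": [
                {"type": "VCPU", "value": "8"},
                {"type": "MEMORY", "value": "32768"},  # MiB, i.e. 32 GB
            ],
            # Temporary disk for downloaded episodes and the merged output.
            "ephemeralStorage": {"sizeInGiB": 100},
        },
        # attempts counts the first run, so 2 means one automatic retry.
        "retryStrategy": {"attempts": 2},
        # 8 hours per attempt, expressed in seconds.
        "timeout": {"attemptDurationSeconds": 8 * 60 * 60},
    }
```

With boto3 the dictionary could be passed as `boto3.client("batch").register_job_definition(**build_job_definition(image_uri, role_arn))`.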

Container Image: Amazon Linux 2023 base with Python 3.11 and a statically compiled FFmpeg binary.

Cost Optimization Strategies

Pay‑as‑you‑go: only pay for Lambda invocations and Fargate runtime.

Smart transcoding: skip transcoding when source bitrate ≤4 Mbps.

Storage tiering: raw data (30 TB) is kept in S3 Standard and moved to Glacier after 30 days; final outputs are moved to Standard‑IA after 7 days.

Estimated total cost for 2,500 series: compute $2,875 + storage $345 / month + negligible Lambda, DynamoDB, CloudWatch fees.
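
The smart‑transcoding rule from the list above can be sketched as a small decision function. The 4 Mbps threshold, the 2.5 Mbps target, and the fast preset come from this article; the choice of libx264 and audio pass‑through are assumptions made for illustration, as are the function names.

```python
import subprocess

SKIP_THRESHOLD_BPS = 4_000_000   # skip transcoding at or below 4 Mbps
TARGET_BITRATE = "2500k"         # uniform 2.5 Mbps target


def probe_bitrate(path):
    """Return the container-level bitrate in bits per second via ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=bit_rate",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    # ffprobe may print "N/A" for some files; int() then raises ValueError.
    return int(result.stdout.strip())


def transcode_command(src, dst, bitrate_bps):
    """Return the ffmpeg argv to run, or None when the source can be kept."""
    if bitrate_bps <= SKIP_THRESHOLD_BPS:
        return None
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", "libx264",      # assumption: H.264 output
            "-preset", "fast",
            "-b:v", TARGET_BITRATE,
            "-c:a", "copy",         # assumption: audio is passed through
            dst]
```

A caller would run `transcode_command(path, out, probe_bitrate(path))` and, when the result is `None`, upload the merged file unchanged. Since transcoding accounts for most of the processing time, every skipped transcode saves most of that series' Fargate runtime.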

Performance Tuning

Benchmarks showed the following breakdown of processing time: download 10‑20 %, merge <1 %, transcoding 70‑90 %, upload <0.1 %. Optimizations included opening more parallel HTTP connections for downloads, using -preset fast in FFmpeg, and using multipart upload to S3.

Observability and Fault Tolerance

CloudWatch dashboards track RUNNABLE and RUNNING task counts, failure counts, completed tasks per hour, and average processing time.

Alarms trigger when tasks stay in the RUNNABLE state for more than 30 min, when the failure rate exceeds 5 %, or when a task runs for more than 6 h.

Batch provides one automatic retry; the application adds download retries and records errors in DynamoDB.

A manual retry script scans DynamoDB for failed entries and resubmits them.
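
The application‑level download retries mentioned above could follow a pattern like the one below: exponential backoff with jitter, re‑raising after the last attempt so the caller can record the error. The helper name, its defaults, and the choice of `OSError` as the retryable exception are assumptions, not details from the original system.

```python
import random
import time


def retry(operation, attempts=4, base_delay=1.0, max_delay=30.0,
          retry_on=(OSError,), sleep=time.sleep):
    """Call operation(), retrying on the given exceptions with backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller record the failure
            # Double the delay ceiling on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads retries from many concurrent tasks.
            sleep(random.uniform(0, delay))
```

A download wrapped as `retry(lambda: fetch(url))`, where `fetch` is whatever download function the container uses, would absorb transient network errors, while a persistent failure still surfaces and can be written to DynamoDB and picked up by Batch's own retry.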

Operational Practices

Daily monitoring of CloudWatch dashboards and SNS alerts.

Progress scripts run hourly to summarize task states.

Scaling plan: start with 1 concurrent task, ramp to 50 for stress testing, then to 100 for production; Fargate quota can be increased on demand.
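
An hourly progress summary of the kind described above can be as simple as counting task states. The sketch below assumes the status strings listed in the service overview (submitted, running, completed, failed) have already been read from DynamoDB into a list; the function name and output fields are illustrative.

```python
from collections import Counter

STATES = ("submitted", "running", "completed", "failed")


def summarize(statuses, total_series):
    """Count tasks per state and report how much of the backlog is done."""
    counts = Counter(statuses)
    summary = {state: counts.get(state, 0) for state in STATES}
    # Series with no recognized status yet have not been picked up.
    summary["not_started"] = total_series - sum(summary.values())
    summary["percent_complete"] = (
        round(100 * summary["completed"] / total_series, 1)
        if total_series else 0.0
    )
    return summary
```

For example, `summarize(["completed", "completed", "failed", "running"], 10)` reports two completed, one failed, one running, six not started, and 20.0 percent complete.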

Results

In three days (72 hours) the system processed 2,500 series (≈250,000 episodes, 75 TB input, 20 TB output) at an average of 2.5 hours per series, achieving 800‑1,000 series per day, meeting the link‑expiry deadline, and reducing total cost by 60 % compared with the initial MediaConvert design.
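
As a sanity check, the throughput figures are consistent with the stated concurrency. The sketch below uses only numbers from this article and assumes all 100 task slots stay busy, which is an idealization.

```python
SERIES_COUNT = 2500
HOURS_PER_SERIES = 2.5
CONCURRENT_TASKS = 100
DEADLINE_HOURS = 72

task_hours = SERIES_COUNT * HOURS_PER_SERIES        # total work to do
wall_clock_hours = task_hours / CONCURRENT_TASKS    # with every slot busy
series_per_day = CONCURRENT_TASKS * 24 / HOURS_PER_SERIES

print(f"total task-hours: {task_hours:.0f}")
print(f"wall-clock hours at full concurrency: {wall_clock_hours:.1f}")
print(f"series per day at full concurrency: {series_per_day:.0f}")
print(f"fits in deadline: {wall_clock_hours <= DEADLINE_HOURS}")
```

At full concurrency the work needs 62.5 wall‑clock hours, which leaves some headroom inside the 72‑hour window for ramp‑up and retries.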

Key Lessons

Choose services that match the problem; a simple FFmpeg + Fargate stack outperformed a heavyweight MediaConvert setup.

Containerization provides environment parity, easier testing, and version control.

Observability must be built from day 1; dashboards and alerts enable rapid issue detection.

Cost optimization should be considered in architecture, not as an afterthought.

Start with a small scale, validate, then expand progressively.

Infrastructure‑as‑code (e.g., AWS CDK) ensures reproducible environments and traceable changes.
