Python Script for Parsing and Downloading Tencent Video URLs
This article explains how to use a Python program to parse Tencent video URLs, retrieve the cached TS segments via HTTP requests, download them concurrently with multiprocessing, and finally merge the segments into a playable MP4 file, including environment setup and full source code.
Runtime Environment
IDE: PyCharm
Version: Python 3.6
OS: Windows
Goal and Approach
Goal: Parse a Tencent video URL and download the video it points to.
Approach: Take the video page URL, pass it to a third-party VIP video parsing service to obtain the actual streaming URL, request the playlist of cached TS segments, download the segments, and finally merge them into an MP4 file that plays normally.
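The playlist and download steps of this approach can be sketched in isolation. The following is a minimal illustration, not the article's code: the function names segment_urls and download_all, the fetch parameter, and the example.com URLs are all hypothetical, and it uses a thread pool from the standard library instead of the multiprocessing pool used in the full script, since downloads are I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urljoin


def segment_urls(playlist_text, playlist_url):
    """Return absolute URLs for every segment line in an M3U8 playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are URIs.
        if line and not line.startswith('#'):
            urls.append(urljoin(playlist_url, line))
    return urls


def download_all(urls, fetch, workers=8):
    """Fetch every URL concurrently and return the payloads in playlist order.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns bytes, e.g.
    lambda u: requests.get(u).content
    """
    results = [None] * len(urls)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {executor.submit(fetch, u): i for i, u in enumerate(urls)}
        done = 0
        for future in as_completed(futures):
            results[futures[future]] = future.result()
            done += 1  # counted in the calling thread, so the total is accurate
            print('Progress: %d/%d' % (done, len(urls)))
    return results


if __name__ == '__main__':
    # Hypothetical playlist; a stand-in fetch avoids any network access.
    sample = ('#EXTM3U\n#EXTINF:10.0,\nseg000.ts\n'
              '#EXTINF:10.0,\nseg001.ts\n#EXT-X-ENDLIST\n')
    urls = segment_urls(sample, 'https://example.com/video/index.m3u8')
    payloads = download_all(urls, lambda u: u.encode(), workers=2)
    print(urls, len(payloads))
```

Resolving each segment line with urljoin avoids slicing a fixed number of characters off the playlist URL, and counting completed downloads in the caller keeps the progress figure consistent regardless of how many workers run.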
Full Code
<code>import os</code>
<code>import re</code>
<code>import shutil</code>
<code>import requests</code>
<code>from urllib.request import urlretrieve</code>
<code>from pyquery import PyQuery as pq</code>
<code>from multiprocessing import Pool</code>
<code>class video_down():</code>
<code>    def __init__(self, url):</code>
<code>        # Build the request URL for the 全民解析 parsing service</code>
<code>        self.api = 'https://jx.618g.com'</code>
<code>        self.get_url = 'https://jx.618g.com/?url=' + url</code>
<code>        # Set a User-Agent so the request looks like it comes from a browser</code>
<code>        self.head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}</code>
<code>        # Intended thread count (unused; the pool size is set in pool())</code>
<code>        self.thread_num = 32</code>
<code>        # Number of segments processed so far</code>
<code>        self.i = 0</code>
<code>        # Fetch the parsing-service page</code>
<code>        html = self.get_page(self.get_url)</code>
<code>        if html:</code>
<code>            # Parse the page</code>
<code>            self.parse_page(html)</code>
<code>    def get_page(self, get_url):</code>
<code>        try:</code>
<code>            print('Requesting target page...', get_url)</code>
<code>            response = requests.get(get_url, headers=self.head)</code>
<code>            if response.status_code == 200:</code>
<code>                print('Target page received.\nPreparing to parse...')</code>
<code>                self.head['referer'] = get_url</code>
<code>                return response.text</code>
<code>        except Exception:</code>
<code>            print('Failed to request the target page; check the error and retry')</code>
<code>        return None</code>
<code>    def parse_page(self, html):</code>
<code>        print('Parsing target information...')</code>
<code>        doc = pq(html)</code>
<code>        self.title = doc('head title').text()</code>
<code>        print(self.title)</code>
<code>        # Drop the 14-character prefix of the player src to get the first playlist URL</code>
<code>        url = doc('#player').attr('src')[14:]</code>
<code>        html = self.get_m3u8_1(url).strip()</code>
<code>        # Replace the last 10 characters of the URL with the second playlist path</code>
<code>        self.url = url[:-10] + html</code>
<code>        print(self.url)</code>
<code>        print('Parsing complete; fetching the cached TS playlist...')</code>
<code>        self.get_m3u8_2(self.url)</code>
<code>    def get_m3u8_1(self, url):</code>
<code>        try:</code>
<code>            response = requests.get(url, headers=self.head)</code>
<code>            html = response.text</code>
<code>            print('TS playlist fetched; preparing to extract information')</code>
<code>            # The last 20 characters hold the path of the second playlist</code>
<code>            return html[-20:]</code>
<code>        except Exception:</code>
<code>            print('Cache file request error 1; check the error')</code>
<code>    def get_m3u8_2(self, url):</code>
<code>        try:</code>
<code>            response = requests.get(url, headers=self.head)</code>
<code>            html = response.text</code>
<code>            print('TS playlist fetched; preparing to extract information')</code>
<code>            self.parse_ts_2(html)</code>
<code>        except Exception:</code>
<code>            print('Cache file request error 2; check the error')</code>
<code>    def parse_ts_2(self, html):</code>
<code>        # Capture each segment name; the dot before "ts" is escaped to match literally</code>
<code>        pattern = re.compile(r'(.*?)\.ts')</code>
<code>        self.ts_lists = re.findall(pattern, html)</code>
<code>        print('Extraction complete.\nPreparing to download...')</code>
<code>        self.pool()</code>
<code>    def pool(self):</code>
<code>        print('%d files need to be downloaded' % len(self.ts_lists))</code>
<code>        self.ts_url = self.url[:-10]</code>
<code>        os.makedirs(self.title, exist_ok=True)</code>
<code>        print('Downloading... this may take a while, please be patient')</code>
<code>        pool = Pool(16)</code>
<code>        # Count finished tasks in the parent process; a counter incremented</code>
<code>        # inside the workers would only change each worker's own copy</code>
<code>        for _ in pool.imap_unordered(self.save_ts, self.ts_lists):</code>
<code>            self.i += 1</code>
<code>            print('Progress: %d/%d' % (self.i, len(self.ts_lists)))</code>
<code>        pool.close()</code>
<code>        pool.join()</code>
<code>        print('Download complete')</code>
<code>        self.ts_to_mp4()</code>
<code>    def ts_to_mp4(self):</code>
<code>        print('Converting TS files to MP4...')</code>
<code>        # Windows only: concatenate the segments in binary mode;</code>
<code>        # the quotes keep titles that contain spaces intact</code>
<code>        cmd = 'copy /b "{0}\\*.ts" "{0}.mp4"'.format(self.title)</code>
<code>        os.system(cmd)</code>
<code>        filename = self.title + '.mp4'</code>
<code>        if os.path.isfile(filename):</code>
<code>            print('Conversion complete. Enjoy the video!')</code>
<code>            # Remove the directory of downloaded segments</code>
<code>            shutil.rmtree(self.title)</code>
<code>    def save_ts(self, ts_list):</code>
<code>        try:</code>
<code>            ts_urls = self.ts_url + '{}.ts'.format(ts_list)</code>
<code>            urlretrieve(url=ts_urls, filename=os.path.join(self.title, '{}.ts'.format(ts_list)))</code>
<code>        except Exception:</code>
<code>            print('Error while saving file')</code>
<code>if __name__ == '__main__':</code>
<code>    url = 'https://v.qq.com/x/cover/r6ri9qkcu66dna8.html'</code>
<code>    url1 = 'https://v.qq.com/x/cover/5c58griiqftvq00.html'</code>
<code>    url2 = 'https://v.qq.com/x/cover/lcpwn26degwm7t3/z0027injhcq.html'</code>
<code>    url3 = 'https://v.qq.com/x/cover/33bfp8mmgakf0gi.html'</code>
<code>    video_down(url2)</code>
Video Cache TS Files
The downloaded TS files are short video fragments that must be merged into a single file for normal playback; the script concatenates them in binary mode with the Windows copy /b command and saves the result with an .mp4 extension.
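The same merge can be done in Python, which works outside Windows and makes the segment order explicit instead of leaving it to how the shell expands *.ts. The sketch below is illustrative only: merge_segments and natural_key are hypothetical names, and it assumes the segment file names contain the sequence number. Like copy /b, it only concatenates bytes, so the output is still a transport stream stored under an .mp4 name rather than a remuxed MP4 container.

```python
import os
import re
import shutil


def natural_key(name):
    """Sort key that compares digit runs as integers, so 'seg10' follows 'seg2'."""
    # re.split with a capturing group alternates text, digits, text, ...
    parts = re.split(r'(\d+)', name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def merge_segments(segment_dir, output_path):
    """Concatenate every .ts file in segment_dir into output_path, in order."""
    names = sorted(
        (n for n in os.listdir(segment_dir) if n.endswith('.ts')),
        key=natural_key,
    )
    with open(output_path, 'wb') as merged:
        for name in names:
            with open(os.path.join(segment_dir, name), 'rb') as segment:
                # Stream the bytes across without loading a whole file in memory
                shutil.copyfileobj(segment, merged)
    return names
```

Calling merge_segments('My Video', 'My Video.mp4') would return the list of segment names in the order they were written, which is useful for checking that no segment is missing before the directory is deleted.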
Result
After execution, the TS segments are combined into an MP4 file, which can be played normally.