
Downloading Music from NetEase Cloud Music Using Python Web Scraping

This article explains how to use Python's requests and lxml libraries to scrape NetEase Cloud Music's song list, extract song IDs, construct download URLs, and save the audio files locally, while also discussing common obstacles such as anti‑scraping measures and network errors.


Preface

Recently I wanted to download a few songs to my USB drive, but most music sites require a VIP subscription for popular tracks. As a programmer on a tight budget, I decided to explore free methods and settled on using a Python web crawler.

Implementation Steps

Identify where the music is stored – on the website's server.

Send HTTP requests to retrieve the page content.

Extract the music file links from the returned HTML.

Download the music files.

Specific Implementation

1. Import the third‑party library for sending HTTP requests.

<code>import requests  # sending network requests</code>

Installation command:

<code>pip install requests</code>

2. Import the library for parsing HTML.

<code>from lxml import etree  # data parsing library</code>

Installation command:

<code>pip install lxml</code>

3. Define the NetEase music list URL.

<code>url = 'https://music.163.com/#/discover/toplist?id=3778678'</code>
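One detail worth knowing about this URL: everything after the <code>#</code> is a URL fragment, which browsers keep client-side and <code>requests</code> never sends to the server. The standard library makes this easy to see:

```python
from urllib.parse import urlsplit

url = 'https://music.163.com/#/discover/toplist?id=3778678'
parts = urlsplit(url)

# The fragment (everything after '#') stays on the client; an HTTP
# request only carries the scheme, host, path, and query.
print(parts.fragment)  # '/discover/toplist?id=3778678'
print(parts.path)      # '/'
```

A commonly used workaround is to drop the <code>/#</code> so the playlist query actually reaches the server, i.e. request <code>https://music.163.com/discover/toplist?id=3778678</code> instead.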

4. Send a request to obtain the page data.

<code>response = requests.get(url=url)  # request page data</code>

5. Parse the returned HTML.

<code>html = etree.HTML(response.text)  # parse page data</code>

6. Extract all song link elements (the <code>a</code> tags whose <code>href</code> contains <code>song?</code>).

<code>id_list = html.xpath('//a[contains(@href,"song?")]')  # all song ID collection</code>
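The behavior of this XPath can be checked against a small in-memory snippet. The markup below is a simplified stand-in for the playlist page, not NetEase's actual HTML:

```python
from lxml import etree

# Simplified stand-in for the playlist markup, not the real page.
snippet = '''
<ul>
  <li><a href="/song?id=186016">Song A</a></li>
  <li><a href="/album?id=99">Album link (should be skipped)</a></li>
  <li><a href="/song?id=186017">Song B</a></li>
</ul>
'''
html = etree.HTML(snippet)
# Only anchors whose href contains "song?" are matched.
links = html.xpath('//a[contains(@href,"song?")]')
for a in links:
    print(a.xpath('./@href')[0], '->', a.xpath('./text()')[0])
# /song?id=186016 -> Song A
# /song?id=186017 -> Song B
```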

7. Download each song.

<code>base_url = 'http://music.163.com/song/media/outer/url?id='  # download URL prefix
for data in id_list:
    href = data.xpath('./@href')[0]
    music_id = href.split('=')[1]  # music ID
    music_url = base_url + music_id  # full download URL
    music_name = data.xpath('./text()')[0]  # song name
    music = requests.get(url=music_url)
    with open('./music/%s.mp3' % music_name, 'wb') as file:
        file.write(music.content)
        print('<%s> download successful' % music_name)
</code>
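The loop above assumes the <code>./music</code> directory already exists and that song titles are legal file names, neither of which is guaranteed. A more defensive sketch of the same download step follows; the browser-like <code>User-Agent</code> and <code>Referer</code> values are illustrative assumptions (a common hedge against basic request filtering), not anything the site documents:

```python
import os
import re
import requests

BASE_URL = 'http://music.163.com/song/media/outer/url?id='

def sanitize_filename(name: str) -> str:
    """Replace characters that are illegal in file names on most systems."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()

def download_song(music_id: str, music_name: str, out_dir: str = './music') -> bool:
    os.makedirs(out_dir, exist_ok=True)  # create ./music if it is missing
    headers = {
        'User-Agent': 'Mozilla/5.0',          # illustrative browser-like UA
        'Referer': 'https://music.163.com/',  # some endpoints check the referer
    }
    resp = requests.get(BASE_URL + music_id, headers=headers, timeout=10)
    # On failure the endpoint returns a small JSON body instead of audio bytes.
    if resp.content[:1] == b'{':
        print('<%s> blocked by server: %s' % (music_name, resp.text))
        return False
    path = os.path.join(out_dir, sanitize_filename(music_name) + '.mp3')
    with open(path, 'wb') as file:
        file.write(resp.content)
    print('<%s> download successful' % music_name)
    return True
```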

Encountered Issues

The original method was learned from a video released half a year ago, but the site now returns errors. The extracted id_list no longer contains plain IDs but obfuscated code, indicating an anti‑scraping mechanism. Additionally, direct download URLs sometimes return a "network too crowded" error (code -460).

<code>base_url = 'http://music.163.com/song/media/outer/url?id='
music_id = '1804320463.mp3'
music_url = base_url + music_id
music = requests.get(url=music_url)
print(music.text)
</code>

The response is a JSON error body (the message translates to "The network is too crowded, please try again later!"):

<code>{"msg":"网络太拥挤,请稍候再试!","code":-460,"message":"网络太拥挤,请稍候再试!"}</code>

Even though the same URL plays in a browser, the programmatic request fails, suggesting the server validates something about the request (headers, cookies, or similar) before serving the file.
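Since the failure mode is a JSON body arriving where audio bytes are expected, a crawler can classify the response before writing it to disk. This is a small helper sketch, not part of the original script:

```python
import json

def error_code(raw: bytes):
    """Return the server's error code if raw is a JSON error body, else None."""
    if not raw.lstrip().startswith(b'{'):
        return None  # looks like binary audio data, not a JSON error
    try:
        return json.loads(raw).get('code')
    except (ValueError, UnicodeDecodeError):
        return None

# Abbreviated version of the error body quoted above.
body = b'{"msg":"\\u7f51\\u7edc\\u592a\\u62e5\\u6324","code":-460}'
print(error_code(body))               # -460
print(error_code(b'ID3\x04\x00...')) # None (MP3 files begin with an ID3 tag)
```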

Conclusion

Websites increasingly employ sophisticated anti‑scraping measures, making simple crawlers unreliable. This article shares my experience learning Python web scraping for music download and seeks advice on how to overcome the current obstacles to retrieve the audio files successfully.

Tags: Python, Tutorial, Web Scraping, NetEase, Requests, lxml, music-download
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
