
Beat Detection: Concepts, Algorithms, and Applications

The article explains musical beat detection fundamentals, detailing traditional onset‑strength and dynamic‑programming algorithms (as in librosa), compares time‑domain and spectral methods, showcases deep‑learning advances, and describes practical applications such as audio visualisation, rhythm games, and QQ Music’s Super‑DJ automatic remix pipeline.

Tencent Music Tech Team

In music, a beat is the basic temporal unit that defines the regular pattern of strong and weak pulses. When listening to a song, involuntary movements such as head‑bobbing, foot‑tapping, or clapping correspond to these beat positions.

Typical applications include audio visualization (e.g., switching video scenes on the beat), rhythm games such as Beat Master, which rely on beatmaps, and music stylization (e.g., QQ Music’s “Super DJ” feature).

For beat detection, the open-source library librosa provides librosa.beat.beat_track, which implements a dynamic-programming approach (Ellis, 2007). The process consists of three main steps:

Measure onset strength.

Estimate the global tempo from the autocorrelation of the onset-strength envelope.

Select peaks in the onset strength that are consistent with the estimated tempo.

Onset detection is a crucial sub‑task because both beat and tempo estimation rely on accurate identification of note onsets. Onsets typically occur at moments of sudden changes in energy, pitch, or timbre. Two common strategies are:

Time‑domain analysis: compute an energy envelope of the waveform and locate abrupt increases. This works well for percussive or plucked instruments but struggles with polyphonic textures.

Frequency‑domain analysis: examine short‑time spectral energy changes, which can be more robust for complex mixes.
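The time-domain strategy can be sketched in plain NumPy (frame size, hop, and threshold here are illustrative choices): compute a frame-wise RMS energy envelope, differentiate it, and flag frames where energy rises sharply:

```python
import numpy as np

def energy_onsets(y, frame=512, hop=256, thresh=0.1):
    """Time-domain onset detection: frame-wise RMS energy envelope,
    first difference, half-wave rectification, simple thresholding."""
    n_frames = 1 + (len(y) - frame) // hop
    env = np.array([np.sqrt(np.mean(y[i * hop:i * hop + frame] ** 2))
                    for i in range(n_frames)])
    diff = np.maximum(0.0, np.diff(env))      # keep only energy rises
    return np.flatnonzero(diff > thresh) + 1  # frames where energy jumps

# Demo: 1 s of silence at 8 kHz, with a 50 ms 440 Hz burst at 0.5 s.
sr = 8000
y = np.zeros(sr)
y[4000:4400] = np.sin(2 * np.pi * 440 * np.arange(400) / sr)
onsets = energy_onsets(y)  # frame indices where energy jumps
```

The detector fires at the frames covering the burst’s entry, which is exactly the behavior that suits percussive material and fails on smooth string attacks.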

The typical DSP workflow for beat detection is illustrated in the figure below:

A comparison of the time‑domain energy‑envelope and short‑time spectral methods is summarized in the table:

| Method | Principle | Suitable Scenarios | Unsuitable Scenarios |
| --- | --- | --- | --- |
| Time‑domain energy envelope | 1. Energy envelope → 2. Differential of envelope → 3. Peak picking for onsets | Strong onset energy (percussion, plucked instruments) | String instruments / dense mixes |
| Short‑time spectral | 1. Short‑time spectrogram → 2. Differential spectrogram → 3. Onset envelope → 4. Peak picking | Strong onset energy and relatively simple mixes | Complex polyphonic mixes |
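The short-time spectral method in the table can be sketched as a spectral-flux onset envelope, a common frequency-domain variant (FFT size and hop below are illustrative):

```python
import numpy as np

def spectral_flux(y, n_fft=512, hop=256):
    """Spectral-flux onset envelope: windowed magnitude STFT,
    frame-to-frame difference, half-wave rectification, then
    summation over frequency bins."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    S = np.abs(np.array([np.fft.rfft(win * y[i * hop:i * hop + n_fft])
                         for i in range(n_frames)]))
    # Only energy *increases* count as onset evidence.
    return np.sum(np.maximum(0.0, np.diff(S, axis=0)), axis=1)

# Demo: 1 s of silence at 8 kHz, with a 50 ms 440 Hz burst at 0.5 s.
sr = 8000
y = np.zeros(sr)
y[4000:4400] = np.sin(2 * np.pi * 440 * np.arange(400) / sr)
flux = spectral_flux(y)  # peaks where new spectral energy appears
```

Because flux accumulates per-bin increases, a new note entering the mix registers even when total energy barely changes, which is why spectral methods cope better with moderately complex mixes.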

The dynamic-programming formulation used by librosa (Ellis, 2007) defines:

C(t): the objective function — the cumulative score of the best beat sequence ending at time t.
O(t): the onset strength at time t, from onset detection.
F(Δt, τ_p): a consistency penalty comparing a candidate beat interval Δt with the global tempo period τ_p; Ellis uses F(Δt, τ_p) = −(log(Δt/τ_p))².

These combine into the recurrence C(t) = O(t) + max_τ { α·F(t − τ, τ_p) + C(τ) }, which is solved by dynamic programming; backtracking from the highest-scoring frame yields the optimal beat sequence. The method performs well when the tempo is stable but degrades when the tempo varies, because the third step relies on the global tempo estimated in step two.
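As a sketch, an Ellis-style dynamic program with penalty F(Δt) = −(log(Δt/τ_p))² can be implemented over a synthetic onset envelope; the tempo period `period` (in frames) and weight `alpha` are assumed given, and all names are illustrative:

```python
import numpy as np

def dp_beat_track(onset_env, period, alpha=100.0):
    """Simplified Ellis-style DP beat tracker:
    C[t] = O[t] + max_tau (alpha * F(t - tau) + C[tau]),
    F(d) = -(log(d / period))**2, then backtracking."""
    n = len(onset_env)
    C = onset_env.astype(float)          # base case: score = onset strength
    back = -np.ones(n, dtype=int)        # backpointers (-1 = sequence start)
    for t in range(n):
        lo = max(0, t - 2 * period)      # search window: 0.5x to 2x period
        hi = t - period // 2
        if hi <= lo:
            continue
        taus = np.arange(lo, hi)
        F = -(np.log((t - taus) / period)) ** 2
        scores = alpha * F + C[taus]
        best = int(np.argmax(scores))
        if scores[best] > 0:             # only extend worthwhile sequences
            C[t] = onset_env[t] + scores[best]
            back[t] = taus[best]
    beats = [int(np.argmax(C))]          # backtrack from best-scoring frame
    while back[beats[-1]] >= 0:
        beats.append(int(back[beats[-1]]))
    return beats[::-1]

# Demo: impulse train with an onset every 10 frames.
env = np.zeros(100)
env[::10] = 1.0
beats = dp_beat_track(env, period=10)
```

On this ideal envelope the tracker recovers every tenth frame; with a wrong `period` it would skip or double beats, which mirrors the tempo-dependence limitation noted above.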

Empirical results on three audio types (piano, violin, vocal‑dominant) show strong performance on percussive piano tracks, weaker performance on violin, and poor results on vocal‑heavy music, as illustrated by the following figures:

To address the limitations of traditional DSP methods, recent deep‑learning approaches have achieved superior beat and downbeat detection. A typical network architecture is shown below, demonstrating high accuracy for both beat and downbeat estimation:

The Super‑DJ feature in QQ Music automatically transforms a regular song into an electronic‑dance version. The pipeline consists of:

Extract MIR features (BPM, beat, downbeat, chord, time signature, chorus timestamps).

Define mixing rules and select loop samples based on the extracted features.

Apply time‑stretching to the original track, then generate loop tracks in real time according to the mixing templates.

Blend generated loops with the original audio and apply global modulation to produce the final EDM output.

The article concludes by summarizing the beat‑detection solutions developed by the QQ Music basic development team as part of the MIRnest system, and invites readers to try the Super‑DJ feature.

Reference:

Ellis, Daniel P. W. “Beat tracking by dynamic programming.” Journal of New Music Research 36.1 (2007): 51‑60.
