Decoding Strategies for Generative Models: Top‑k, Top‑p, Contrastive Search, Beam Search, and Sampling
The article explains how generative models use deterministic methods like greedy and beam search and stochastic techniques such as top‑k, top‑p, contrastive search and sampling, describing their mechanisms, temperature control, repetition penalties, and practical trade‑offs for balancing fluency, diversity and coherence.
Generative models use two main categories of decoding methods: deterministic (e.g., greedy search and beam search) and stochastic (e.g., sampling, top‑k, top‑p, contrastive search). Deterministic methods often produce less natural text, while stochastic methods introduce randomness to improve diversity and fluency.
Top‑k sampling: At each decoding step the model keeps only the k highest‑probability tokens, renormalizes their probabilities, and samples the next token from this reduced set.
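As an illustration, the top‑k selection step can be sketched in a few lines of NumPy. This is a minimal sketch of the mechanism, not the implementation used by any particular library, and the function name `top_k_sample` is our own:

```python
import numpy as np

def top_k_sample(logits, k, rng):
    """Keep the k highest-scoring tokens, renormalize, and sample one."""
    top_indices = np.argsort(logits)[-k:]          # ids of the k most probable tokens
    top_logits = logits[top_indices]
    probs = np.exp(top_logits - top_logits.max())  # softmax over the kept tokens only
    probs /= probs.sum()
    return rng.choice(top_indices, p=probs)

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
next_token = top_k_sample(logits, k=3, rng=rng)
# next_token is always one of the 3 most probable tokens (ids 0, 1, or 2)
```

Because every token outside the top k is discarded before sampling, low‑probability "tail" tokens can never be chosen, regardless of how the randomness falls.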
Top‑p (nucleus) sampling: The model sorts tokens by descending probability, accumulates them until the cumulative probability exceeds a threshold p, and then samples the next token from this dynamically sized set.
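The construction of the nucleus can be sketched as follows (again a minimal illustration under our own naming, not any library's implementation):

```python
import numpy as np

def nucleus_set(probs, p):
    """Return the smallest prefix of tokens (by probability) whose cumulative mass exceeds p."""
    order = np.argsort(probs)[::-1]               # token ids, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1   # first prefix whose mass exceeds p
    return order[:cutoff]

probs = np.array([0.5, 0.3, 0.1, 0.06, 0.04])
print(nucleus_set(probs, p=0.85))  # tokens 0, 1 and 2 together cover 0.9 > 0.85
```

Unlike top‑k, the number of candidate tokens adapts to the shape of the distribution: a peaked distribution yields a small nucleus, a flat one a large nucleus.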
Temperature controls randomness: a higher temperature yields a flatter distribution and more diverse output, while a lower temperature makes the distribution sharper and more deterministic.
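Concretely, temperature divides the logits before the softmax; this small sketch (our own helper, for illustration) shows the flattening and sharpening effect:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    probs = np.exp(z)
    return probs / probs.sum()

logits = [2.0, 1.0, 0.0]
hot = softmax_with_temperature(logits, temperature=2.0)   # flatter: mass spread out
cold = softmax_with_temperature(logits, temperature=0.5)  # sharper: mass concentrated
# cold[0] > hot[0]: the top token gains probability mass as temperature drops
```

As the temperature approaches 0, essentially all mass concentrates on the single highest‑logit token, which is why very low temperatures behave almost like greedy decoding.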
Contrastive search: Combines model confidence with a degeneration penalty based on the cosine similarity between the candidate token's representation and the representations of previous context tokens. The penalty discourages repetition; when the penalty weight α is zero, contrastive search reduces to greedy decoding.
Code example for contrastive search:
```python
output = model.generate(
    input_ids,
    penalty_alpha=0.6,  # α in contrastive search
    top_k=4,            # k in contrastive search
    max_length=512
)
```

Beam search: Keeps the num_beams most likely candidate sequences at each step, expands each of them, and finally returns the highest‑probability sequence. It reduces the risk of missing a high‑probability sequence that greedy search would discard, but it can still produce repeated fragments.
An n‑gram repetition penalty can be applied to beam search to prevent duplicate n‑grams:
```python
beam_output = model.generate(
    input_ids,
    max_length=50,
    num_beams=5,
    no_repeat_ngram_size=2,  # prevent repeated 2‑grams
    early_stopping=True
)
```

Sampling (do_sample=True): Makes generation nondeterministic. Lowering the temperature sharpens the distribution; as the temperature approaches 0, sampling collapses back toward greedy decoding and inherits its repetition issues.
Example of activating sampling without top‑k:
```python
sample_output = model.generate(
    input_ids,
    do_sample=True,
    max_length=50,
    top_k=0
)
```

Example with temperature control:
```python
sample_output = model.generate(
    input_ids,
    do_sample=True,
    max_length=50,
    top_k=0,
    temperature=0.7
)
```

Combining top‑k and top‑p (and returning multiple sequences):
```python
sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=50,
    top_k=50,
    top_p=0.95,
    num_return_sequences=3
)
```

The article discusses practical trade‑offs: choosing appropriate decoding methods, randomness parameters, and temperature values based on the task and desired output characteristics. It also cites research indicating that high‑quality human language does not strictly follow maximum‑probability rules, highlighting the importance of incorporating randomness and creativity into generation.
Baidu Geek Talk