
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

This paper investigates selection bias in large language models for multiple‑choice tasks, proposes metrics to quantify symbol‑content binding, introduces Reweighting Symbol‑Content Binding (RSCB) and Point‑wise Intelligent Feedback (PIF) methods, and demonstrates their effectiveness in reducing bias and improving accuracy, including a real‑world Tencent advertising feature‑evaluation deployment.


The paper "Strengthened Symbol Binding Makes Large Language Models Reliable Multiple‑Choice Selectors", from Tencent Advertising's feature‑mining team, has been accepted at ACL 2024. It addresses the severe selection bias that large language models (LLMs) exhibit when answering multiple‑choice questions.

The authors first illustrate the bias: models tend to favor certain option symbols (e.g., "B") regardless of the actual content, as shown in Figure 1. They adopt the "Answer‑moving Attack", which measures how accuracy changes when the correct answer is moved to a different symbol, and define a new metric, PPA (Proportion of Plurality Agreement), to evaluate whether the model consistently selects the same content across all permutations of the option symbols.
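The PPA idea can be sketched concretely: run the model over every permutation of the option contents, map each chosen symbol back to the content it carried, and measure how often the model agrees with its own plurality choice. This is a minimal sketch, not the paper's implementation; `predict` is a hypothetical stub standing in for a model call that returns the index of the chosen symbol.

```python
from itertools import permutations
from collections import Counter

def ppa(question, options, predict):
    """Proportion of Plurality Agreement (sketch).

    Runs the model on every permutation of the option contents and
    measures how often it selects the same *content* regardless of
    which symbol that content sits behind. `predict(question, opts)`
    is a hypothetical model-call stub returning the chosen symbol's
    index into `opts`.
    """
    chosen = []
    for perm in permutations(options):
        idx = predict(question, list(perm))  # model picks a symbol slot
        chosen.append(perm[idx])             # map symbol back to content
    plurality_count = Counter(chosen).most_common(1)[0][1]
    return plurality_count / len(chosen)     # 1.0 = perfectly consistent
```

A model that always answers "B" no matter what content sits there scores only 1/n (pure symbol bias), while a model that tracks content scores 1.0.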

Experiments reveal that selection bias persists after supervised fine‑tuning (SFT) and varies with model size and dataset distribution. To mitigate the bias, the paper hypothesizes that strengthening the model’s Multiple Choice Symbol Binding (MCSB) capability will reduce bias.

Two training strategies are explored: (1) Symbol‑Content Binding (SCB), where the model predicts both the option symbol and its content, and (2) Reweighting Symbol‑Content Binding (RSCB), which applies a focal‑loss‑style weighting to prioritize the symbol token during training. The loss function is:

$$\text{Loss} = -\sum_{i}\alpha_i\, w_{\text{symbol}}\,\log p_{\text{symbol}}(i) \;-\; \beta\sum_{j}\log p_{\text{content}}(j)$$
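Under our reading of the loss above, RSCB is ordinary token-level cross-entropy over the "symbol + content" target, with the symbol token up-weighted by a focal-loss-style factor so hard-to-bind symbols dominate the gradient. The sketch below is an illustrative re-implementation under that assumption, not the paper's exact code; the function name and the `alpha`/`gamma` defaults are our own.

```python
import math

def rscb_loss(token_logprobs, is_symbol, alpha=2.0, gamma=2.0):
    """Focal-style reweighted cross-entropy (illustrative sketch).

    `token_logprobs[i]` is log p(target token i) under the model;
    `is_symbol[i]` flags the option-symbol token. The symbol token's
    loss term is scaled by alpha * (1 - p)^gamma, so a confidently
    predicted symbol contributes little while a poorly bound symbol
    is emphasized; content tokens keep plain cross-entropy weight.
    """
    loss = 0.0
    for lp, sym in zip(token_logprobs, is_symbol):
        p = math.exp(lp)
        weight = alpha * (1.0 - p) ** gamma if sym else 1.0
        loss += -weight * lp
    return loss
```

The focal factor is what makes this "reweighting" rather than a fixed constant: the symbol's weight adapts to how badly the model currently predicts it.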

To improve further, the authors introduce Point‑wise Intelligent Feedback (PIF): they construct negative samples by randomly pairing incorrect contents with option symbols and assigning them low scores, then combine the cross‑entropy loss on positive samples with a tailored point‑wise loss on the negatives.
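The data-construction side of PIF can be sketched as follows. This is a minimal illustration under our reading of the paper, with hypothetical field names and text template: the positive keeps the correct symbol-content pair with a high score, and each negative deliberately binds a symbol to a wrong content with a low score, so the model is penalized point-wise for endorsing mismatched bindings.

```python
import random

def build_pif_samples(question, symbols, contents, answer_idx, k=1, seed=0):
    """Construct PIF training samples (illustrative sketch).

    Returns one positive (correct symbol + correct content, score 1.0)
    and k negatives, each pairing a randomly chosen symbol with a
    randomly chosen *wrong* content (score 0.0). The "Answer: X. text"
    template and the score field are assumptions for illustration.
    """
    rng = random.Random(seed)
    positive = {
        "text": f"{question} Answer: {symbols[answer_idx]}. {contents[answer_idx]}",
        "score": 1.0,
    }
    wrong_contents = [c for i, c in enumerate(contents) if i != answer_idx]
    negatives = []
    for content in rng.sample(wrong_contents, k):
        symbol = rng.choice(symbols)  # symbol deliberately mismatched
        negatives.append({"text": f"{question} Answer: {symbol}. {content}",
                          "score": 0.0})
    return [positive] + negatives
```

Because negatives are sampled rather than enumerated, this stays far cheaper than training on all symbol permutations of every question, which is the low-overhead property the paper highlights.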

Extensive experiments on datasets such as MMLU and CSQA demonstrate that higher MCSB (measured by PPA) correlates with lower selection bias, confirming the hypothesis. The RSCB + PIF combination consistently reduces bias and improves accuracy across model sizes.

Real‑world deployment in Tencent Advertising’s feature‑accuracy evaluation shows that the improved LLM can reliably assess feature performance, providing accurate reasoning and correct symbol‑content matching, thereby enhancing monitoring and iteration of advertising features.

The paper concludes that strengthening symbol‑content binding is an effective avenue to alleviate selection bias in LLMs, with PIF offering a practical, low‑overhead solution compared to exhaustive permutation training.

Tags: reinforcement learning, selection bias, multiple choice, pointwise feedback, symbol binding
Tencent Advertising Technology
Written by

Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
