
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

This paper investigates selection bias in large language models for multiple‑choice tasks, proposes metrics to quantify symbol‑content binding, introduces Reweighting Symbol‑Content Binding (RSCB) and Point‑wise Intelligent Feedback (PIF) methods, and demonstrates their effectiveness in reducing bias and improving accuracy, including a real‑world Tencent advertising feature‑evaluation deployment.


The paper "Strengthened Symbol Binding Makes Large Language Models Reliable Multiple‑Choice Selectors", from Tencent Advertising's feature‑mining team, has been accepted at ACL 2024. It addresses the severe selection bias that large language models (LLMs) exhibit when answering multiple‑choice questions.

The authors first illustrate the bias: models tend to favor certain option symbols (e.g., "B") regardless of the actual content, as shown in Figure 1. They adopt the "Answer‑moving Attack", which measures how accuracy changes when the correct answer is moved to a different symbol, and define a new metric, PPA (Proportion of Plurality Agreement), to evaluate whether the model consistently selects the same content across all permutations of the option symbols.
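The PPA idea can be sketched concretely: run the model over every permutation of the option contents, map each chosen symbol back to the content it carried, and measure how often the model agrees with its own plurality choice. This is a minimal sketch, not the paper's implementation; `predict` is a hypothetical stub standing in for a model call that returns the index of the chosen symbol.

```python
from itertools import permutations
from collections import Counter

def ppa(question, options, predict):
    """Proportion of Plurality Agreement (sketch).

    Runs the model on every permutation of the option contents and
    measures how often it selects the same *content* regardless of
    which symbol that content sits behind. `predict(question, opts)`
    is a hypothetical model-call stub returning the chosen symbol's
    index into `opts`.
    """
    chosen = []
    for perm in permutations(options):
        idx = predict(question, list(perm))  # model picks a symbol slot
        chosen.append(perm[idx])             # map symbol back to content
    plurality_count = Counter(chosen).most_common(1)[0][1]
    return plurality_count / len(chosen)     # 1.0 = perfectly consistent
```

A model that always answers "B" no matter what content sits there scores only 1/n (pure symbol bias), while a model that tracks content scores 1.0.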

Experiments reveal that selection bias persists after supervised fine‑tuning (SFT) and varies with model size and dataset distribution. To mitigate the bias, the paper hypothesizes that strengthening the model’s Multiple Choice Symbol Binding (MCSB) capability will reduce bias.

Two training strategies are explored: (1) Symbol‑Content Binding (SCB), where the model predicts both the option symbol and its content, and (2) Reweighting Symbol‑Content Binding (RSCB), which applies a focal‑loss‑style weighting to prioritize the symbol token during training. The loss function is:

$$\text{Loss} = -\sum_{i}\alpha_i\, w_{\text{symbol}}\,\log p_{\text{symbol}}(i) \;-\; \beta\sum_{j}\log p_{\text{content}}(j)$$
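Under our reading of the loss above, RSCB is ordinary token-level cross-entropy over the "symbol + content" target, with the symbol token up-weighted by a focal-loss-style factor so hard-to-bind symbols dominate the gradient. The sketch below is an illustrative re-implementation under that assumption, not the paper's exact code; the function name and the `alpha`/`gamma` defaults are our own.

```python
import math

def rscb_loss(token_logprobs, is_symbol, alpha=2.0, gamma=2.0):
    """Focal-style reweighted cross-entropy (illustrative sketch).

    `token_logprobs[i]` is log p(target token i) under the model;
    `is_symbol[i]` flags the option-symbol token. The symbol token's
    loss term is scaled by alpha * (1 - p)^gamma, so a confidently
    predicted symbol contributes little while a poorly bound symbol
    is emphasized; content tokens keep plain cross-entropy weight.
    """
    loss = 0.0
    for lp, sym in zip(token_logprobs, is_symbol):
        p = math.exp(lp)
        weight = alpha * (1.0 - p) ** gamma if sym else 1.0
        loss += -weight * lp
    return loss
```

The focal factor is what makes this "reweighting" rather than a fixed constant: the symbol's weight adapts to how badly the model currently predicts it.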

To improve further, the authors introduce Point‑wise Intelligent Feedback (PIF): they construct negative samples by randomly pairing incorrect contents with option symbols and assigning them low scores, then combine the cross‑entropy loss on positive samples with a tailored point‑wise loss on the negatives.
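The data-construction side of PIF can be sketched as follows. This is a minimal illustration under our reading of the paper, with hypothetical field names and text template: the positive keeps the correct symbol-content pair with a high score, and each negative deliberately binds a symbol to a wrong content with a low score, so the model is penalized point-wise for endorsing mismatched bindings.

```python
import random

def build_pif_samples(question, symbols, contents, answer_idx, k=1, seed=0):
    """Construct PIF training samples (illustrative sketch).

    Returns one positive (correct symbol + correct content, score 1.0)
    and k negatives, each pairing a randomly chosen symbol with a
    randomly chosen *wrong* content (score 0.0). The "Answer: X. text"
    template and the score field are assumptions for illustration.
    """
    rng = random.Random(seed)
    positive = {
        "text": f"{question} Answer: {symbols[answer_idx]}. {contents[answer_idx]}",
        "score": 1.0,
    }
    wrong_contents = [c for i, c in enumerate(contents) if i != answer_idx]
    negatives = []
    for content in rng.sample(wrong_contents, k):
        symbol = rng.choice(symbols)  # symbol deliberately mismatched
        negatives.append({"text": f"{question} Answer: {symbol}. {content}",
                          "score": 0.0})
    return [positive] + negatives
```

Because negatives are sampled rather than enumerated, this stays far cheaper than training on all symbol permutations of every question, which is the low-overhead property the paper highlights.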

Extensive experiments on datasets such as MMLU and CSQA demonstrate that higher MCSB (measured by PPA) correlates with lower selection bias, confirming the hypothesis. The RSCB + PIF combination consistently reduces bias and improves accuracy across model sizes.

Real‑world deployment in Tencent Advertising’s feature‑accuracy evaluation shows that the improved LLM can reliably assess feature performance, providing accurate reasoning and correct symbol‑content matching, thereby enhancing monitoring and iteration of advertising features.

The paper concludes that strengthening symbol‑content binding is an effective avenue to alleviate selection bias in LLMs, with PIF offering a practical, low‑overhead solution compared to exhaustive permutation training.

Tags: reinforcement learning, selection bias, multiple choice, pointwise feedback, symbol binding
Tencent Advertising Technology
Written by

Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
