Highlights of Meituan's ACL 2024 Papers: Speculative Decoding, Graph‑Structured Decoding, DolphCoder, and Instruction Fine‑tuning

This article reviews four ACL 2024 papers by Meituan's research team, covering training-cost reduction, speculative decoding, code-generation optimization, and instruction fine-tuning, and announces a live sharing session at the conference.

Meituan Technology Team

Aug 8, 2024

Overview

The Meituan technology team has selected four of its papers accepted at ACL 2024 and analyzes each in detail. The topics span training-cost optimization, speculative decoding techniques, code-generation improvements, and a closer investigation of how instruction fine-tuning (IFT) works. The article also invites readers to a live-streamed session on August 12 at 17:00 and to Meituan's booth (No. 11) at the conference.

1. Speculative Decoding via Early‑exiting for Faster LLM Inference with Thompson Sampling Control Mechanism

Problem: Large language model (LLM) inference incurs high computational cost, limiting practical deployment.

Method: The authors introduce Early-Exiting Speculative Decoding (EESD). An early-exit structure placed after the first N layers of the LLM generates draft tokens, and a self-distillation method is integrated to improve the quality of these drafts. A controller based on Thompson sampling automatically decides how many draft tokens to produce in each round. The original LLM then verifies all draft tokens in a single forward pass, which guarantees that the final output matches standard autoregressive decoding.
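The draft-then-verify loop can be sketched in Python. This is a minimal illustration under stated assumptions, not EESD's implementation: the toy callables `toy_target` and `toy_draft` stand in for the full LLM and the early-exit draft head, the `ThompsonDraftController` class and its `threshold` parameter are hypothetical, and the controller uses a generic Beta-Bernoulli Thompson-sampling rule (keep drafting while a sampled acceptance rate stays above the threshold), which may differ from the paper's exact mechanism. Verification is greedy, so the output is identical to plain greedy decoding with the target model.

```python
import random
from typing import Callable, List

# Toy stand-in for a model: maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


class ThompsonDraftController:
    """Hypothetical Beta-Bernoulli Thompson-sampling controller for draft length."""

    def __init__(self, threshold: float = 0.5, seed: int = 0) -> None:
        self.alpha = 1.0  # pseudo-count of accepted draft tokens
        self.beta = 1.0   # pseudo-count of rejected draft tokens
        self.threshold = threshold
        self.rng = random.Random(seed)

    def keep_drafting(self) -> bool:
        # Sample a plausible acceptance rate; continue only if it is high enough.
        theta = self.rng.betavariate(self.alpha, self.beta)
        return theta >= self.threshold

    def update(self, accepted: int, rejected: int) -> None:
        self.alpha += accepted
        self.beta += rejected


def greedy_generate(target: Model, prompt: List[int], max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_generate(target: Model, draft: Model, prompt: List[int], max_new: int,
                         controller: ThompsonDraftController, max_draft: int = 8) -> List[int]:
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1) Draft tokens until the controller says stop (or a length cap is hit).
        proposal: List[int] = []
        while len(proposal) < max_draft and len(out) + len(proposal) < goal:
            proposal.append(draft(out + proposal))
            if not controller.keep_drafting():
                break
        # 2) Verify against the target model. A real system scores all drafted
        #    positions in one forward pass; this toy calls `target` per position.
        accepted = 0
        for tok in proposal:
            if tok != target(out):
                break
            out.append(tok)
            accepted += 1
        controller.update(accepted, 1 if accepted < len(proposal) else 0)
        # 3) The target always contributes one token: the correction for the first
        #    rejected draft, or a bonus token if every draft was accepted.
        if len(out) < goal:
            out.append(target(out))
    return out


def toy_target(seq: List[int]) -> int:
    return (sum(seq) * 31 + len(seq)) % 50


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except at every fifth position.
    guess = toy_target(seq)
    return (guess + 1) % 50 if len(seq) % 5 == 0 else guess


if __name__ == "__main__":
    prompt = [3, 1, 4]
    fast = speculative_generate(toy_target, toy_draft, prompt, 20, ThompsonDraftController())
    assert fast == greedy_generate(toy_target, prompt, 20)  # lossless
    print(fast)
```

Because every drafted token is checked against the target model's own choice, a weak draft head costs only speed, never correctness; when acceptance is low, the controller's Beta posterior shifts toward smaller sampled rates and therefore shorter drafts.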

Results: Experiments on 13-billion- and 70-billion-parameter models show that EESD generates tokens markedly faster than prior methods while remaining lossless, confirming its effectiveness.

2. Graph‑Structured Speculative Decoding

Problem: Conventional speculative decoding relies on a single hypothesis from a draft model, limiting the potential speedup.
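A standard back-of-the-envelope model shows why a single draft chain caps the gain. Assuming, as a simplification, that each draft token is accepted independently with probability `alpha` and that `gamma` tokens are drafted per round, the expected number of tokens produced per target-model verification pass is (1 - alpha^(gamma+1)) / (1 - alpha), which never exceeds 1 / (1 - alpha) however long the draft grows. The function name below is illustrative.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification pass for one draft chain of
    length gamma, assuming i.i.d. per-token acceptance probability alpha."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    # Accepted prefix length plus the one token the target model always adds.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    for gamma in (1, 4, 16, 64):
        print(gamma, round(expected_tokens_per_pass(0.8, gamma), 3))
    # The values approach 1 / (1 - 0.8) = 5 as gamma grows.
```

Under this simplified model, one early rejection discards the rest of a single chain, which is the limitation that drafting several alternative hypotheses aims to relax.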

Method: The authors generate multiple draft hypotheses and organize them in a directed acyclic graph (DAG). Because many hypotheses share common token sequences, the DAG merges these recurring sequences so that they can be predicted and combined efficiently, which substantially reduces the computational load on the draft model. The approach is called Graph-structured Speculative Decoding (GSD).
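To see how merging shared token sequences shrinks the draft structure, here is a small, hypothetical Python sketch; it is a generic data-structure illustration, not GSD's algorithm. It stores draft hypotheses in a prefix trie, then merges structurally identical sub-trees (shared suffixes) into single nodes to obtain a DAG, and counts token edges at each stage. The function names and the toy hypotheses are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# A trie node maps each next token to its child node.
Trie = Dict[str, "Trie"]


def build_trie(hypotheses: List[List[str]]) -> Trie:
    """Insert every hypothesis; shared prefixes reuse existing nodes."""
    root: Trie = {}
    for hyp in hypotheses:
        node = root
        for tok in hyp:
            node = node.setdefault(tok, {})
    return root


def count_trie_edges(node: Trie) -> int:
    return sum(1 + count_trie_edges(child) for child in node.values())


def trie_to_dag(root: Trie) -> Tuple[int, Dict[int, Dict[str, int]]]:
    """Merge structurally identical sub-trees (shared suffixes) into one node.

    Returns (root_id, dag) where dag[node_id] maps token -> child node_id.
    """
    sig_to_id: Dict[tuple, int] = {}
    dag: Dict[int, Dict[str, int]] = {}

    def visit(node: Trie) -> int:
        edges = {tok: visit(child) for tok, child in node.items()}
        signature = tuple(sorted(edges.items()))
        if signature not in sig_to_id:
            node_id = len(sig_to_id)
            sig_to_id[signature] = node_id
            dag[node_id] = edges
        return sig_to_id[signature]

    return visit(root), dag


if __name__ == "__main__":
    hyps = [["the", "cat", "sat"], ["a", "cat", "sat"], ["the", "dog", "sat"]]
    trie = build_trie(hyps)
    root_id, dag = trie_to_dag(trie)
    print(sum(len(h) for h in hyps), count_trie_edges(trie),
          sum(len(e) for e in dag.values()))  # 9 8 6
```

For the three toy hypotheses, 9 listed tokens become 8 trie edges (the prefix "the" is shared) and 6 DAG edges (the suffix "cat sat" and the final "sat" are shared as well), while every original hypothesis is still a path from the root.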

Results: Applied to several LLMs, including a 70-billion-parameter LLaMA-2 model, GSD achieves a 1.73× to 1.96× generation speedup, substantially surpassing standard speculative decoding.

3. DolphCoder: Echo‑Locating Code Large Language Models with Diverse and Multi‑Objective Instruction Tuning

Problem: Existing Code LLMs achieve strong performance, yet further gains are needed for code generation tasks.

Method: The paper proposes DolphCoder, a diverse-instruction code model with self-evaluation. It is trained on diverse instruction targets, with several varied responses per instruction, together with a code-evaluation objective, so that the model learns both to produce varied yet correct solutions and to judge whether a given solution is correct.
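One way to picture the two training objectives is as two kinds of supervised examples built from the same instruction. The sketch below is hypothetical and simplified, not DolphCoder's pipeline: the `Example` type, the `EVAL_TEMPLATE` prompt, and the use of a small executable check to label correctness (and to keep only passing responses as generation targets) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    prompt: str
    target: str


# Hypothetical template for the code-evaluation objective.
EVAL_TEMPLATE = ("{instruction}\n\nCandidate solution:\n{code}\n\n"
                 "Is this solution correct? Answer yes or no.")


def build_multi_objective_data(instruction: str, responses: List[str],
                               passes_tests: Callable[[str], bool]) -> List[Example]:
    data: List[Example] = []
    for code in responses:
        ok = passes_tests(code)
        # Objective 1 (generation): each correct response becomes a target,
        # so one instruction yields several diverse solutions.
        if ok:
            data.append(Example(prompt=instruction, target=code))
        # Objective 2 (evaluation): the model judges a candidate's correctness.
        data.append(Example(prompt=EVAL_TEMPLATE.format(instruction=instruction, code=code),
                            target="yes" if ok else "no"))
    return data


def make_add_checker() -> Callable[[str], bool]:
    """Toy test harness: run the candidate and check that add(2, 3) == 5."""
    def check(code: str) -> bool:
        namespace: dict = {}
        try:
            exec(code, namespace)  # only acceptable for trusted toy snippets
            return namespace["add"](2, 3) == 5
        except Exception:
            return False
    return check


if __name__ == "__main__":
    responses = [
        "def add(a, b):\n    return a + b",
        "def add(a, b):\n    return sum([a, b])",
        "def add(a, b):\n    return a - b",
    ]
    data = build_multi_objective_data("Write a function add(a, b).", responses,
                                      make_add_checker())
    for ex in data:
        print(repr(ex.target))
```

Mixing both example types in one training set is what lets a single model practice generating and judging code; a real pipeline would run untrusted code in a sandbox rather than with `exec`.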

Results: DolphCoder outperforms baselines on the HumanEval and MBPP benchmarks. The authors highlight two findings: (1) augmenting training data with more diverse responses that follow distinct reasoning paths improves the model's coding ability, and (2) improving the model's ability to evaluate the correctness of code solutions also improves its ability to write code.

4. Learning or Self‑aligning? Rethinking Instruction Fine‑tuning

Problem: Instruction fine-tuning (IFT) is a core step in adapting LLMs, but its underlying mechanism is unclear: does it inject new knowledge, or does it merely align knowledge the model already has?

Method: The authors design a knowledge‑perturbation analysis framework that separates behavior‑pattern changes from additional knowledge injection. By perturbing internal knowledge representations before and after IFT, they assess the contribution of each factor.

Findings: Experiments reveal that attempting to learn extra knowledge via IFT often yields no benefit or even harms performance. Maintaining internal knowledge consistency before and after fine‑tuning is crucial for successful IFT. The study concludes that IFT primarily works by self‑aligning the model’s existing knowledge rather than by learning new information.
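As a rough illustration of what "internal knowledge consistency" could mean operationally, the hypothetical sketch below compares a model's answers to a fixed set of knowledge probes before and after fine-tuning and reports the fraction that stay the same. The probe questions, the answer normalization, and the metric itself are assumptions for illustration and are not the paper's exact measurement; the dictionaries stand in for querying real models.

```python
from typing import Callable, Sequence

Answerer = Callable[[str], str]  # toy stand-in: question -> model's answer


def normalize(answer: str) -> str:
    # Case- and whitespace-insensitive comparison.
    return " ".join(answer.strip().lower().split())


def knowledge_consistency(probes: Sequence[str], before: Answerer, after: Answerer) -> float:
    """Fraction of probe questions answered identically before and after IFT."""
    if not probes:
        raise ValueError("need at least one probe question")
    same = sum(normalize(before(q)) == normalize(after(q)) for q in probes)
    return same / len(probes)


if __name__ == "__main__":
    base = {"capital of France?": "Paris", "2 + 2?": "4", "color of the sky?": "blue"}
    tuned = {"capital of France?": "paris", "2 + 2?": "5", "color of the sky?": " Blue "}
    score = knowledge_consistency(list(base), base.__getitem__, tuned.__getitem__)
    print(f"{score:.3f}")  # 0.667
```

A drop in such a score after fine-tuning would signal that IFT overwrote facts the base model already held, which is the kind of inconsistency the findings associate with less successful IFT.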

Event Announcement

Meituan will host a booth (No. 11) at the ACL 2024 venue and stream a live paper‑reading session on August 12 at 17:00. Attendees are encouraged to reserve a spot and engage with the authors and technical experts.
