LLM Breakthroughs at EMNLP 2025: Embedding Compression, Complex Instructions, Knowledge Scaling

At EMNLP 2025 in Suzhou, Taobao's booth showcases four AI papers: a novel embedding compression framework, an automatic iterative refinement method for complex instruction generation, a knowledge infusion scaling law for large language models, and a video caption optimization approach for text‑to‑video generation.

Large Language Models · embedding compression · instruction generation

0 likes · 7 min read

Kuaishou Tech

Oct 26, 2023 · Artificial Intelligence

SHARK: Efficient Embedding Compression for Large-Scale Recommendation Models

The paper introduces SHARK, a two‑component framework that combines a fast permutation method based on Taylor expansion, used to prune embedding tables, with a frequency‑aware quantization scheme that applies mixed precision to embeddings. It achieves up to 70% memory reduction and a 30% QPS improvement in industrial short‑video and e‑commerce recommendation systems.

Efficiency · Model Pruning · embedding compression

0 likes · 8 min read

Alimama Tech

Jan 19, 2022 · Artificial Intelligence

Advances in Alibaba Search Advertising Estimation: Model Deepening, Interaction, and System Efficiency (2021 Review)

The 2021 review of Alimama's search advertising estimation platform covers three areas: model deepening (hash‑based embedding compression, adaptive dynamic parameters, and graph neural networks); model interaction via a multi‑stage cascade with ranking distillation and oracle bias; and system efficiency gains from HPC training, mixed precision, multi‑hash embeddings, and fp16 quantization, which deliver a roughly thirty‑fold speed‑up.

Ad Tech · CTR · CVR

0 likes · 34 min read