Tagged articles

126 articles

May 26, 2026 · Artificial Intelligence

How On-Policy Distillation (OPD) Solves Core Challenges in Large-Model Post-Training

The article explains how On-Policy Distillation (OPD) combines on‑policy sampling with dense teacher feedback via reverse KL to address low signal density, distribution shift, and capability interference in large‑model post‑training, and compares implementations by Qwen3, GLM‑5, MiMo‑V2 and DeepSeek‑V4.

Knowledge Distillation · Model Compression · OPD

0 likes · 20 min read

Baobao Algorithm Notes

May 22, 2026 · Artificial Intelligence

How LiteScale Cuts Wait Times in Large‑Model Post‑Training with Gradient Accumulation

The article examines the bottleneck of synchronous rollout in large‑model post‑training, proposes an asynchronous design using gradient accumulation and a global micro‑batch count to preserve loss equivalence, and introduces LogitsExpress for efficient top‑K knowledge‑distillation communication, all implemented in the lightweight LiteScale framework.

Knowledge Distillation · asynchronous rollout · distributed training

0 likes · 16 min read

AIWalker

May 19, 2026 · Artificial Intelligence

How EUPE’s Three‑Stage Distillation Lets an 86M Model Run Classification, Segmentation and VLM on iPhone in 62 ms (SOTA)

EUPE introduces a three‑stage “scale‑then‑shrink” distillation pipeline that first trains a large proxy model to absorb heterogeneous expert knowledge and then compresses it into an 86M encoder, achieving state‑of‑the‑art performance on image classification, dense prediction and vision‑language tasks on an iPhone with only 62 ms latency.

EUPE · Edge AI · Knowledge Distillation

0 likes · 16 min read

AIWalker

May 19, 2026 · Artificial Intelligence

Why Attention Transfer Fails for DINOv2 and Other Modern ViTs: Architecture Mismatch Revealed

A large-scale benchmark of 20 pretrained ViT teachers across 11 families shows that attention copy and distillation improve some models but hurt others—especially DINOv2, CLIP, and BEiTv2—due to architecture mismatches, and adding the teachers' native components to students restores the lost performance.

Architecture Compatibility · Attention Transfer · Knowledge Distillation

0 likes · 13 min read

Machine Learning Algorithms & Natural Language Processing

May 14, 2026 · Artificial Intelligence

Turning Multi‑Teacher Conflict into Dynamic Constraints: Robust Reasoning Alignment for Multimodal LLMs (ICML 2026)

APO (Autonomous Preference Optimization) converts the drift and conflict among multiple teacher multimodal LLMs into dynamic negative constraints while treating consensus as a positive preference, enabling robust concept alignment and superior diagnostic accuracy on the CXR‑MAX benchmark, as demonstrated by extensive ICML‑2026 experiments.

APO · ICML 2026 · Knowledge Distillation

0 likes · 11 min read

Machine Heart

May 13, 2026 · Artificial Intelligence

Turning Multi-Teacher Conflict into Dynamic Constraints for Precise Multimodal Model Alignment (ICML 2026)

The paper introduces APO, a novel autonomous preference optimization framework that converts concept drift among multiple teacher multimodal LLMs into dynamic negative constraints and treats consensus as a positive preference, achieving robust concept alignment and surpassing strong teachers on a high‑risk medical X‑ray benchmark.

APO · CXR-MAX · ICML 2026

0 likes · 11 min read

Liangxu Linux

May 12, 2026 · Artificial Intelligence

How to Deploy Trained Neural Networks on Arduino and Raspberry Pi

Deploying large AI models to tiny embedded devices like Arduino and Raspberry Pi requires aggressive model slimming through quantization, pruning, and distillation, careful selection of runtimes such as TensorFlow Lite, and addressing power, latency, and debugging challenges to achieve real‑time inference.

Arduino · Embedded AI · Knowledge Distillation

0 likes · 7 min read

Baidu Maps Tech Team

Apr 27, 2026 · Artificial Intelligence

How an End-to-End Geolocation Search System Bridges Recall and Ranking

This article details an end-to-end geolocation search solution that unifies pre‑training, recall, and ranking with shared representations, knowledge distillation, and S2 spatial encoding to handle massive POI data, multi‑factor relevance, and real‑time response constraints.

Knowledge Distillation · S2 geometry · end-to-end retrieval

0 likes · 20 min read

CodeTrend

Apr 24, 2026 · Artificial Intelligence

How Large Language Models Acquire Tool‑Calling Ability: SFT, RLHF & LoRA Explained

The article explains why pretrained LLMs cannot call tools, then breaks down the three‑stage training pipeline—Supervised Fine‑Tuning, Reinforcement Learning from Human Feedback, and knowledge distillation—showing how each step teaches models to read tool schemas, decide when to invoke a tool, generate JSON calls, and finally transfer the capability to smaller models with LoRA.

AI training · Function Calling · Knowledge Distillation

0 likes · 19 min read

Frontend AI Walk

Apr 21, 2026 · Artificial Intelligence

How to Distill Any Expert into an AI Skill: Elon Musk SOP Guide

This article walks you through a complete knowledge‑distillation workflow that turns Elon Musk’s decision‑making logic into a reusable AI skill, covering source collection, Obsidian setup, a six‑step prompting chain, adding personal commentary, and packaging the result for manual or automated AI use.

AI workflow · Claude · Elon Musk

0 likes · 21 min read

Architect's Ambition

Apr 20, 2026 · Artificial Intelligence

How to Turn GitHub‑Trending AI Skills into Real‑World Agents with Knowledge Distillation

The article explains why generic AI is insufficient, defines a Skill as the minimal unit of specialized AI, and details a three‑layer knowledge‑distillation methodology—knowledge, logic, style—to build practical person‑ and book‑based AI Skills, illustrated with a complete Wang Yangming Skill implementation and common pitfalls.

AI Skill · Knowledge Distillation · agent development

0 likes · 12 min read

Architect's Journey

Apr 16, 2026 · Industry Insights

What Really Scares Us When AI Starts “Distilling” Employees?

The article examines the hype around AI tools that turn employees' chats, documents, and emails into a compact “.skill” file, arguing that this so‑called “distillation” is merely knowledge capture, while true value—judgment, responsibility, and intuition—remains uncapturable and the anxiety surrounding it is deliberately manufactured.

AI · Digital Twin · Knowledge Distillation

0 likes · 6 min read

Geek Labs

Apr 9, 2026 · Artificial Intelligence

Digital Survival Guide: Cyber Immortality and Anti‑Distillation

The article reviews three open‑source GitHub projects—nuwa.skill, yourself‑skill, and anti‑distill—that aim to distill a person’s thinking into AI‑driven digital twins, explore practical examples, and discuss how such tools can preserve personal knowledge while preventing corporate exploitation.

AI Skill · Knowledge Distillation · Open Source

0 likes · 5 min read

AI Software Product Manager

Apr 7, 2026 · Industry Insights

When AI‑Powered ‘Skills’ Replace Human Expertise: Risks and Realities

The article examines how open‑source projects that distill a colleague’s knowledge into AI‑driven “skill” files raise profound concerns about talent displacement, loss of apprenticeship pathways, and the unchecked power of automated decision‑making in the tech industry.

AI · Knowledge Distillation · automation risk

0 likes · 14 min read

PaperAgent

Apr 6, 2026 · Artificial Intelligence

Can LLMs Self‑Improve After Deployment? Inside Microsoft’s Online Experiential Learning

Microsoft’s Online Experiential Learning framework lets large language models continuously self‑evolve after deployment by extracting experience from user interactions and consolidating it into model parameters, eliminating the need for human labels, reward models, or server‑side environment access, and demonstrating scalable gains across tasks and model sizes.

AI research · Knowledge Distillation · LLM

0 likes · 9 min read

AI Insight Log

Apr 6, 2026 · Artificial Intelligence

When a Colleague Is Cloned into an AI, the Anti‑Distillation Skill Strikes Back on GitHub

The article examines two open‑source Claude Code projects—colleague‑skill, which captures a departing teammate's knowledge into an AI‑driven Skill, and anti‑distill, which sanitizes that Skill to protect personal expertise—detailing their motivations, implementation, and community reaction.

AI Skill · Claude · GitHub

0 likes · 7 min read

AIWalker

Mar 22, 2026 · Artificial Intelligence

Can a Single Vision Model Replace Multiple Specialized Networks? Nvidia’s New Aggregated Foundation Model

Nvidia’s latest aggregated vision foundation model consolidates detection, segmentation, and other visual tasks into one network, eliminating the complexity and resource waste of multi‑model stacks; the article explains the challenges of resolution balance and teacher distribution, outlines three model generations (RADIOv2.5, C‑RADIOv3, C‑RADIOv4), and details the novel multi‑teacher distillation techniques that boost performance across benchmarks.

Knowledge Distillation · Model Aggregation · NVIDIA

0 likes · 6 min read

AI Agent Research Hub

Mar 10, 2026 · Artificial Intelligence

How Knowledge Distillation Lets Neural Networks Grow Physical Symmetry Without Hard PINN Constraints

The paper introduces Ψ‑NN, a knowledge‑distillation framework that automatically discovers physics‑consistent network structures for PINNs, eliminating the need for manually imposed loss‑function constraints and achieving faster convergence, higher accuracy, and transferable architectures across PDE problems.

Hierarchical Clustering · Knowledge Distillation · Network Structure Discovery

0 likes · 26 min read

AIWalker

Mar 3, 2026 · Artificial Intelligence

How NanoSD Cuts 90% Parameters to Enable Real‑Time Photo Editing on Mobile

NanoSD distills Stable Diffusion 1.5 into a 130 M‑parameter model that runs inference in 20 ms on a Qualcomm SM8750 NPU, using hardware‑aware module pruning, module‑level knowledge distillation, and Bayesian optimization to achieve Pareto‑optimal quality‑efficiency trade‑offs for on‑device image restoration.

Bayesian Optimization · Knowledge Distillation · Model Compression

0 likes · 14 min read

PaperAgent

Mar 1, 2026 · Artificial Intelligence

How On-Policy Context Distillation Enables LLMs to Retain Experience Forever

On-Policy Context Distillation (OPCD) compresses transient in‑context knowledge into LLM parameters, allowing models to permanently retain problem‑solving experience without ground‑truth labels; the article details the OPCD framework, training steps, teacher‑student configurations, and experimental results on math, games, and system‑prompt tasks, highlighting its advantages over traditional context distillation.

Artificial Intelligence · Knowledge Distillation · LLM

0 likes · 8 min read

Bighead's Algorithm Notes

Feb 20, 2026 · Artificial Intelligence

How Time Distillation Empowers Large Language Models for Time‑Series Forecasting (T‑LLM)

The paper introduces T‑LLM, a time‑distillation framework that transfers predictive behavior from a lightweight teacher model to a general‑purpose LLM, enabling accurate multivariate time‑series forecasting across full‑sample, few‑shot, and zero‑shot settings while eliminating the need for large‑scale pre‑training.

Knowledge Distillation · T-LLM · few-shot learning

0 likes · 18 min read

Ximalaya Technology Team

Feb 11, 2026 · Artificial Intelligence

How Ximalaya Used Generative AI to Revolutionize Audio Recommendations

This article details Ximalaya's journey from traditional multi‑stage recommendation pipelines to generative AI‑driven models, covering business challenges, architectural and model differences, phased deployments, knowledge distillation, semantic ID encoding, decoder‑only strategies, extensive offline and online evaluations, and future research directions.

Encoder-Decoder · Knowledge Distillation · Recommendation Systems

0 likes · 24 min read

PaperAgent

Jan 17, 2026 · Artificial Intelligence

How Qwen3‑VL Embedding and Reranker Set New SOTA in Multimodal Retrieval

The article analyzes the Qwen3‑VL‑Embedding and Qwen3‑VL‑Reranker models, detailing their unified vector space, multi‑stage training pipeline, Matryoshka representation learning, quantization techniques, massive synthetic data generation, and benchmark results that push multimodal retrieval performance to a new state‑of‑the‑art.

Embedding · Knowledge Distillation · Large Language Model

0 likes · 7 min read

HyperAI Super Neural

Jan 3, 2026 · Artificial Intelligence

Clone a Voice in 5 seconds with One‑Step Generation: Inside Chatterbox‑Turbo’s High‑Fidelity TTS

Resemble AI’s open‑source Chatterbox‑Turbo reduces TTS generation from ten steps to one, enabling high‑sample‑rate, lossless voice cloning from a 5‑10 second reference while supporting emotional control, paralinguistic tags, and embedded watermarking for real‑time applications across chatbots, games, podcasts, and education.

Chatterbox‑Turbo · Knowledge Distillation · Real-time Inference

0 likes · 7 min read

Network Intelligence Research Center (NIRC)

Dec 30, 2025 · Artificial Intelligence

Bridging Tokenizer Gaps: Cross-Tokenizer Knowledge Distillation at AAAI 2026

This paper introduces SeDi, a semantics‑ and distribution‑aware cross‑tokenizer knowledge distillation framework that aligns teacher and student token spaces via bipartite graph components and top‑K re‑encoding, achieving state‑of‑the‑art performance and lower exposure bias on multiple LLM benchmarks.

AI research · Knowledge Distillation · Language Models

0 likes · 10 min read

360 Smart Cloud

Dec 3, 2025 · Artificial Intelligence

How Model Distillation Enhances LLM Performance on the TLM Platform

This article explains the TLM large‑model development platform and details how knowledge distillation—using soft labels, temperature scaling, and combined loss functions—compresses teacher models into efficient student models, with practical steps and evaluation on the platform.

AI · Knowledge Distillation · LLM

0 likes · 5 min read

AsiaInfo Technology: New Tech Exploration

Nov 28, 2025 · Artificial Intelligence

Boosting 5G Complaint Intent Detection with Large-Model-Enhanced Few-Shot Learning

This paper presents a collaborative framework where a large language model generates high‑quality synthetic samples to augment a lightweight model, dramatically improving few‑shot user‑complaint intent recognition in 5G networks, achieving a 21% boost for rare categories and a 9% overall accuracy gain.

Knowledge Distillation · Large Language Model · complaint intent detection

0 likes · 27 min read

Alibaba Cloud Big Data AI Platform

Nov 4, 2025 · Artificial Intelligence

How Alibaba Cloud’s PAI Powers Cutting‑Edge LLM Research at EMNLP 2025

EMNLP 2025 in Suzhou will feature Alibaba Cloud’s AI platform PAI presenting four accepted papers on knowledge distillation, small‑model reasoning, distilled reasoning models, and an automated RAG benchmark framework, alongside exhibition demos, networking events, and recruitment opportunities for AI talent.

AI Platform · EMNLP 2025 · Knowledge Distillation

0 likes · 10 min read

DataFunTalk

Oct 30, 2025 · Artificial Intelligence

How On-Policy Distillation Cuts LLM Training Cost by 90%

Thinking Machines Lab introduces On-Policy Distillation, a post‑training technique that matches reinforcement‑learning performance while reducing compute cost by up to tenfold, and demonstrates its effectiveness through extensive experiments on reasoning, personalization, and catastrophic‑forgetting mitigation.

Knowledge Distillation · model efficiency · on-policy distillation

0 likes · 15 min read

Bighead's Algorithm Notes

Oct 25, 2025 · Artificial Intelligence

Time Series Paper Digest: Extreme Event Prediction, Multimodal Fusion & Anomaly Detection

This article summarizes four recent arXiv papers on time‑series forecasting, covering a hierarchical knowledge‑distillation framework for extreme events, a graph‑enhanced multimodal fusion network, an interpretable unsupervised anomaly detector, and an adaptive masking loss that improves prediction accuracy.

Anomaly Detection · Knowledge Distillation · Time-series

0 likes · 10 min read

AI2ML AI to Machine Learning

Oct 13, 2025 · Artificial Intelligence

How Large‑and‑Small Language Model Collaboration Is Shaping the Future

The article argues that combining large, high‑capacity models with lightweight, fine‑tuned small models can cut costs, lower latency, enable specialized vertical tasks, and shift development from chasing ever‑bigger models toward optimal system architectures, outlining key techniques such as state‑space models, knowledge distillation, and staged fine‑tuning.

AI Architecture · Efficiency · Knowledge Distillation

0 likes · 3 min read

Huawei Cloud Developer Alliance

Oct 13, 2025 · Artificial Intelligence

How Quantization, Pruning, and Distillation Shrink AI Models for Edge Devices

This article explains the principles, key methods, and practical effects of model quantization, pruning, and knowledge distillation, comparing their advantages and disadvantages, and showing how combining these techniques enables compact, high‑performance AI models on resource‑constrained devices.

Edge AI · Knowledge Distillation · Model Compression

0 likes · 7 min read

Amap Tech

Sep 2, 2025 · Artificial Intelligence

How Pos2Distill Eliminates Positional Bias in Large Language Models

This article introduces Pos2Distill, a novel knowledge‑distillation framework that transfers capabilities from advantageous to disadvantaged positions in large language models, effectively mitigating positional bias and improving performance on long‑text retrieval and in‑context reasoning tasks.

Knowledge Distillation · in-context reasoning · large language models

0 likes · 10 min read

Data Party THU

Aug 17, 2025 · Artificial Intelligence

Why Do Large Language Models Hallucinate? Unpacking the Probabilistic Roots and Fixes

Large language models often generate confident but false statements—a phenomenon called hallucination—because they predict the next token based on statistical patterns rather than factual understanding, and this article explains the underlying mechanisms and practical mitigation strategies.

Knowledge Distillation · LLM · RLHF

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Jul 23, 2025 · Artificial Intelligence

Unlock Efficient LLMs: How Alibaba’s PAI EasyDistill Powers Model Post‑Training

This article explains how Alibaba Cloud's AI platform PAI leverages the EasyDistill framework for post‑training model optimization, covering knowledge distillation concepts, data synthesis techniques, basic and advanced distillation training, the DistilQwen model family, real‑world customer cases, and step‑by‑step practical demos.

AI Platform · EasyDistill · Knowledge Distillation

0 likes · 12 min read

Kuaishou Tech

Jul 21, 2025 · Artificial Intelligence

Can AI Models Think on Demand? Inside KAT‑V1 AutoThink’s Dynamic Reasoning

The article introduces KAT‑V1 AutoThink, a dual‑mode large language model that automatically switches between thinking and non‑thinking modes based on problem difficulty, details its novel training paradigm, reinforcement‑learning enhancements, performance benchmarks against leading open‑source models, and provides open‑source resources for further research.

Knowledge Distillation · Large Language Model · auto-think

0 likes · 14 min read

AI Frontier Lectures

Jul 17, 2025 · Artificial Intelligence

Top 8 Tencent Youtu Papers Accepted at ICCV 2025: Innovations in AI and Vision

The 20th ICCV conference announced 8 papers from Tencent Youtu Lab covering stylized face recognition, AI‑generated image detection, heterogeneous knowledge distillation, multi‑conditional diffusion, multimodal LLM distillation, palmprint recognition, low‑light vision, and oracle bone script decipherment, each pushing the frontier of computer vision and AI research.

Artificial Intelligence · ICCV 2025 · Knowledge Distillation

0 likes · 17 min read

Alibaba Cloud Big Data AI Platform

Jun 13, 2025 · Artificial Intelligence

How EasyDistill Cuts LLM Costs: Mastering DistilQwen-ThoughtX on Alibaba Cloud

EasyDistill, an open-source framework from Alibaba Cloud PAI, streamlines knowledge distillation for large language models, introducing the DistilQwen-ThoughtX series with variable-length chain-of-thought reasoning, and provides comprehensive best-practice guidance for training, fine-tuning, evaluation, compression, and deployment via the PAI-ModelGallery.

AI inference · Knowledge Distillation · LLM

0 likes · 12 min read

AI Frontier Lectures

Jun 9, 2025 · Artificial Intelligence

AI Research Highlights: Robo-DM, DeepKD, LLM Security, and Reasoning Innovations

This roundup presents recent AI breakthroughs, including Robo‑DM’s efficient robot dataset management, DeepKD’s decoupled knowledge‑distillation trainer, a novel informed white‑box attack exposing weaknesses in LLM alignment defenses, the RePPL hallucination detector, Self‑GIVE’s associative reasoning framework, and LLM‑driven RL ensemble methods.

AI · Knowledge Distillation · Reasoning

0 likes · 15 min read

AIWalker

Jun 3, 2025 · Artificial Intelligence

DeepKD: Double‑Layer Decoupling and Adaptive Denoising Set New ImageNet SOTA

DeepKD introduces a double‑layer decoupling framework and a dynamic top‑K mask that adaptively denoises low‑confidence logits, addressing conflicts between target and non‑target knowledge flows; extensive experiments on CIFAR‑100, ImageNet‑1K, and MS‑COCO demonstrate consistent accuracy gains and state‑of‑the‑art performance.

GSNR · Knowledge Distillation · Model Compression

0 likes · 23 min read

Alibaba Cloud Big Data AI Platform

May 28, 2025 · Artificial Intelligence

How EasyDistill Simplifies LLM Knowledge Distillation for Faster, Smaller Models

EasyDistill, an open‑source toolkit from Alibaba Cloud AI Platform, streamlines knowledge distillation of large language models by offering modular data synthesis, black‑box and white‑box training, reinforcement‑learning and preference‑optimization techniques, enabling the creation of compact, high‑performance DistilQwen models and accompanying datasets.

DistilQwen · EasyDistill · Knowledge Distillation

0 likes · 17 min read

Alimama Tech

Apr 23, 2025 · Artificial Intelligence

Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning

The paper introduces an explainable LLM framework (ELLM‑rele) that uses chain‑of‑thought reasoning and a multi‑dimensional knowledge distillation pipeline to compress large‑model relevance judgments into lightweight student models, achieving superior offline relevance scores and online click‑through and conversion improvements in Taobao’s search advertising.

Knowledge Distillation · LLM · chain-of-thought

0 likes · 17 min read

Alibaba Cloud Big Data AI Platform

Mar 29, 2025 · Artificial Intelligence

How DistilQwen2.5‑R1 Boosts Small‑Model Reasoning with Innovative Knowledge Distillation

The article introduces the DistilQwen2.5‑R1 series, which leverages a novel knowledge‑distillation pipeline—including CoT data evaluation, improvement, and validation—to transfer deep reasoning abilities from large models like DeepSeek‑R1 to compact models, achieving superior performance across math, code, and scientific benchmarks and providing open‑source checkpoints and deployment guides for practical use.

AI inference · Knowledge Distillation · Model Compression

0 likes · 17 min read

Tencent Cloud Developer

Mar 25, 2025 · Artificial Intelligence

Knowledge Distillation in Diffusion Models: Techniques and Applications

The article explains how knowledge distillation transfers capabilities from large to smaller diffusion models, covering hard and soft labels, temperature scaling, and contrasting it with data distillation, while detailing techniques such as consistency models, progressive distillation, adversarial distillation, and adversarial post‑training for model compression and step reduction.

Knowledge Distillation · Model Compression · adversarial post-training

0 likes · 19 min read

Network Intelligence Research Center (NIRC)

Mar 10, 2025 · Artificial Intelligence

Revisiting Knowledge Distillation for Autoregressive Language Models

The article analyzes why larger teacher models can hurt student performance in autoregressive language model distillation, reveals that different tokens require distinct teaching modes, proposes an Adaptive Token‑wise Knowledge Distillation (ATKD) method, and shows through extensive experiments that ATKD consistently improves accuracy by about 3% and enhances generalization across model sizes.

Knowledge Distillation · Model Compression · adaptive teaching

0 likes · 9 min read

IT Architects Alliance

Feb 26, 2025 · Artificial Intelligence

DeepSeek Large Model: Core Architecture, Key Technologies, and Training Strategies

The article provides an in‑depth overview of DeepSeek’s large language model, detailing its mixture‑of‑experts and Transformer foundations, novel attention mechanisms, load‑balancing, multi‑token prediction, FP8 mixed‑precision training, and various training regimes such as knowledge distillation and reinforcement learning.

DeepSeek · FP8 · Knowledge Distillation

0 likes · 18 min read

Alibaba Cloud Big Data AI Platform

Feb 25, 2025 · Artificial Intelligence

How DistilQwen2.5 Boosts LLM Efficiency with Dual‑Stage Knowledge Distillation

This article introduces DistilQwen2.5, a lightweight LLM series built on Qwen2.5 that uses a novel two‑layer distillation framework, instruction‑data optimization, and parameter‑fusion techniques to achieve higher performance while drastically reducing computational cost and deployment overhead.

Efficient Inference · Knowledge Distillation · LLM

0 likes · 26 min read

Architecture Digest

Feb 25, 2025 · Artificial Intelligence

DeepSeek Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

DeepSeek’s distillation technology combines data and model distillation to transfer knowledge from large teacher models to compact student models, detailing its definitions, principles, key innovations, architecture, training methods, performance gains, and challenges, especially in multimodal contexts.

AI research · DeepSeek · Knowledge Distillation

0 likes · 16 min read

AsiaInfo Technology: New Tech Exploration

Feb 24, 2025 · Artificial Intelligence

Can Multi‑Teacher Distillation Overcome Catastrophic Forgetting in Continual Learning?

This paper proposes a multi‑teacher distillation framework for continual learning that combines active data rehearsal with feature‑decoupled distillation, demonstrating superior performance on PASCAL VOC and COCO benchmarks while mitigating catastrophic forgetting and balancing stability‑plasticity trade‑offs.

AI · Catastrophic Forgetting · Knowledge Distillation

0 likes · 12 min read

Su San Talks Tech

Feb 23, 2025 · Artificial Intelligence

How DeepSeek’s Distillation Breaks AI Model Limits: Core Principles & Performance

This article explores DeepSeek’s cutting‑edge distillation technology, detailing its definition, underlying principles, innovative data‑model fusion, architecture choices, training strategies, performance gains over large language models, and the remaining challenges in knowledge transfer and multimodal data processing.

DeepSeek · Knowledge Distillation · Model Compression

0 likes · 16 min read

Architects' Tech Alliance

Feb 18, 2025 · Artificial Intelligence

How to Distill DeepSeek LLMs into Lightweight Models for Local Deployment

This article explains DeepSeek's knowledge‑distillation approach for compressing large language models into small, efficient student models, details step‑by‑step local deployment requirements, performance optimizations, and highlights the cost, privacy, and application benefits of running the distilled model on‑premise.

AI inference · DeepSeek · Knowledge Distillation

0 likes · 10 min read

JD Tech Talk

Feb 13, 2025 · Artificial Intelligence

DeepSeek R1: Concept Overview, Training Principles, and Practical Implementations

This article introduces the DeepSeek family of models, explains the concepts of online search and deep reasoning, details the two‑phase training pipeline with data augmentation and reinforcement learning, and showcases practical experiments and deployment examples for the R1 and distilled variants.

DeepSeek · Knowledge Distillation · LLM

0 likes · 10 min read

JD Cloud Developers

Feb 13, 2025 · Artificial Intelligence

Unlocking DeepSeek R1: Concepts, Training Secrets, and Real-World Experiments

This article demystifies DeepSeek R1 by explaining key concepts such as online search integration and the R1 model, detailing its two‑phase training pipeline and core techniques like iterative data enhancement, and showcasing practical reproductions, benchmark tests, and deployment examples for AI developers.

DeepSeek · Knowledge Distillation · Large Language Model

0 likes · 12 min read

Architects' Tech Alliance

Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Training Efficiency · AI safety · DeepSeek-V3

0 likes · 7 min read

Cognitive Technology Team

Feb 7, 2025 · Artificial Intelligence

Knowledge Distillation: Concepts, Techniques, Applications, and Future Directions

This article explains knowledge distillation—a technique introduced by Geoffrey Hinton that transfers knowledge from large teacher models to compact student models—covering its core concepts, loss functions, various distillation strategies, notable applications in edge computing, federated learning, continual learning, and emerging research directions.

Knowledge Distillation · Model Compression · continual learning

0 likes · 7 min read

Architect's Alchemy Furnace

Feb 6, 2025 · Artificial Intelligence

How Knowledge Distillation Powers Efficient Large‑Model Deployment

This article explains how knowledge distillation enables massive AI models to be compressed and deployed efficiently, covering its principles, classification dimensions, implementation steps, innovative practices at DeepSeek, real‑world applications, and future research directions.

Artificial Intelligence · DeepSeek · Knowledge Distillation

0 likes · 11 min read

DataFunTalk

Jan 26, 2025 · Artificial Intelligence

58.com’s LingXi Large Language Model Platform: Development, Deployment, and Performance Optimizations

Since the launch of ChatGPT, 58.com has built a Model‑as‑a‑Service platform called LingXi that trains and serves domain‑specific large language models, supports over a hundred internal scenarios with daily inference exceeding ten million calls, and continuously improves performance through quantization, GPU optimization, model miniaturization, and advanced AI applications such as interview assistants, voice agents, and RAG‑enabled agents.

AI Platform · AI applications · Inference Optimization

0 likes · 9 min read

Kuaishou Tech

Jan 24, 2025 · Artificial Intelligence

KwaiCoder-23BA4-v1: An Efficient Large Code Generation Model via Pruning, Knowledge Distillation, and Granular Upcycling

KwaiCoder-23BA4-v1 is a 23B wide MoE code‑completion model that achieves state‑of‑the‑art performance on HumanEval, BigCodeBench and Fill‑in‑Middle benchmarks by using high‑quality data, a cost‑effective training pipeline that combines model pruning, knowledge distillation and fine‑grained merging, and extensive ablation studies.

AI · Knowledge Distillation · Large Language Model

0 likes · 10 min read

AIWalker

Jan 18, 2025 · Artificial Intelligence

SnapGen Generates 1024px Images in 1.4 s with Lightweight On‑Device Architecture

SnapGen is a 379 M‑parameter text‑to‑image diffusion model that produces 1024 px images on mobile devices in about 1.4 seconds, using a compact U‑Net design, multi‑stage knowledge distillation, step distillation, and optimized training tricks to outperform much larger models on standard benchmarks.

Knowledge Distillation · Model Compression · SnapGen

0 likes · 22 min read

AIWalker

Jan 12, 2025 · Artificial Intelligence

SnapGen Generates 1024px Images in 1.4 s with Lightweight On‑Device Architecture

SnapGen introduces a compact 379M‑parameter diffusion model that produces 1024‑pixel text‑to‑image results in about 1.4 seconds on a mobile device, achieving competitive FID scores and outperforming much larger models through a series of architecture refinements, advanced training tricks, and multi‑level knowledge distillation.

Knowledge Distillation · Model Compression · SnapGen

0 likes · 23 min read

AIWalker

Jan 10, 2025 · Artificial Intelligence

How a Simplified Transformer Enables Lightweight CLIP Training on a Single RTX3090

This paper presents SiCLIP, a framework that simplifies the Transformer architecture, combines weight‑sharing, multi‑stage knowledge distillation, and a novel pair‑matching loss with synthetic captions to train a competitive CLIP model using only one RTX3090 GPU and 1 TB of storage, achieving state‑of‑the‑art data‑size‑parameter‑accuracy trade‑offs.

CLIP · Knowledge Distillation · Lightweight Training

0 likes · 19 min read

Baobao Algorithm Notes

Nov 19, 2024 · Artificial Intelligence

Demystifying OpenRLHF Loss Functions: From GPTLM to KTO and Beyond

This article walks through the various loss functions used in OpenRLHF—including GPTLMLoss, KDLoss, DPOLoss, KTOLoss, and reward model losses—explaining their mathematical foundations, implementation details, and practical considerations for RLHF training.

DPO · KTO · Knowledge Distillation

0 likes · 23 min read

Alibaba Cloud Big Data AI Platform

Nov 8, 2024 · Artificial Intelligence

How TAPIR Boosts Small LLMs with Task‑Aware Curriculum Planning

The paper introduces TAPIR, a task‑aware curriculum planning framework that distills instruction‑following abilities from black‑box LLM teachers into smaller student models by filtering difficult prompts, resampling tasks, enhancing response styles, and iteratively optimizing across multiple training rounds, achieving superior performance on benchmark evaluations.

Curriculum Learning · Instruction Tuning · Knowledge Distillation

0 likes · 10 min read

Alibaba Cloud Big Data AI Platform

Nov 5, 2024 · Artificial Intelligence

How DistilQwen2 Boosts LLM Performance with Knowledge Distillation

This article introduces DistilQwen2, a lightweight language model derived from Qwen2 via knowledge distillation, detailing its data collection, instruction‑data optimization, training strategies, extensive benchmark evaluations, and practical deployment guides for developers and enterprises.

AI · Instruction Tuning · Knowledge Distillation

0 likes · 21 min read

DataFunSummit

Sep 23, 2024 · Artificial Intelligence

TransLLM: A Framework for Cross‑Language Transfer of Conversational Large Language Models

This article presents TransLLM, a cross‑language migration framework that enables high‑quality conversational LLMs to be transferred to low‑resource languages by preserving advanced capabilities through Recovery KD, LoRA‑based continual pre‑training, and a translation‑thinking‑chain, with extensive experiments showing superior performance and safety over ChatGPT and GPT‑4.

Knowledge Distillation · LoRA · Safety

0 likes · 22 min read

Xiaohongshu Tech REDtech

Jun 20, 2024 · Artificial Intelligence

Xiaohongshu 2024 Large Model Frontier Paper Sharing Live Event

On June 27, 2024, Xiaohongshu’s technical team will livestream a two‑hour session across WeChat Channels, Bilibili, Douyin and Xiaohongshu, showcasing six top‑conference papers on large‑model advances—including early‑stopping and fine‑grained self‑consistency, novel evaluation methods, negative‑sample‑assisted distillation, and LLM‑based note recommendation—followed by a Q&A and recruitment briefing.

AI research · Knowledge Distillation · Recommendation Systems

0 likes · 12 min read

AntTech

Mar 11, 2024 · Artificial Intelligence

Can Small Language Models be Good Reasoners in Recommender Systems?

This article presents SLIM, a knowledge‑distillation framework that transfers the reasoning abilities of large language models to compact models for sequential recommendation, enhancing item representation, user profiling, and bias mitigation while achieving comparable performance with far lower computational resources.

AI · Efficiency · Knowledge Distillation

0 likes · 12 min read

NewBeeNLP

Feb 12, 2024 · Artificial Intelligence

Beyond Dual‑Tower: Advanced Distillation and Interaction Techniques for Recommendation Systems

This article reviews recent advances that enhance dual‑tower recommendation models by injecting interaction information through various knowledge‑distillation strategies and interaction‑enhanced architectures, summarizing methods such as PFD, ENDX, TRMD, VIRT, Distilled‑DualEncoder, ERNIE‑Search, ColBERT, IntTower and MVKE.

AI research · Knowledge Distillation · dual-tower

0 likes · 13 min read

Tencent Cloud Developer

Jan 23, 2024 · Information Security

Metis: Understanding and Enhancing In-Network Regular Expressions

Metis combines deterministic finite automata conversion, byte‑level RNN training, and knowledge‑distilled random‑forest models to replace traditional regex matching on resource‑constrained network devices, delivering comparable accuracy while achieving up to 74× higher throughput and significant resource savings in DDoS protection and P4 forwarding.

Anomaly Detection · In‑network computing · Knowledge Distillation

0 likes · 9 min read

Tencent Architect

Jan 16, 2024 · Artificial Intelligence

Metis: AI‑Driven In‑Network Regular Expression Enhancement for High‑Performance Traffic Inspection

The article introduces Metis, an AI‑based solution that replaces traditional regular‑expression matching for network traffic inspection, offering faster, more accurate detection, a compact model deployable on resource‑constrained P4 switches, and significant performance and cost benefits for cloud gateway security.

AI · Knowledge Distillation · P4

0 likes · 9 min read

Xiaohongshu Tech REDtech

Jan 12, 2024 · Artificial Intelligence

Negative Sample Assisted Distillation for Large Language Models

The AAAI‑2024 paper introduces a Negative Sample Assisted Distillation framework—comprising Negative Assistance Training, Negative Calibration Enhancement, and Adaptive Self‑Consistency—that leverages both correct and incorrect reasoning examples to train a compact LLaMA‑7B student, achieving up to 75.75% accuracy gains over fine‑tuning on MATH and improving out‑of‑domain benchmarks.

Knowledge Distillation · LLM · chain-of-thought

0 likes · 13 min read

DeWu Technology

Dec 20, 2023 · Artificial Intelligence

Coarse Ranking in Recommenders: Key Strategies, Metrics & Optimizations

This article systematically reviews the coarse‑ranking stage of recommendation systems, comparing it with recall and fine‑ranking, defining evaluation metrics, detailing sample design, presenting two technical routes, and exploring optimization directions such as dual‑tower models, knowledge distillation, lightweight fully‑connected layers, multi‑objective and multi‑scenario modeling, followed by practical case studies and results.

Knowledge Distillation · coarse ranking · dual-tower

0 likes · 22 min read

Baidu Tech Salon

Oct 25, 2023 · Artificial Intelligence

Intelligent Question Answering Technology in Baidu Search: Development, Modeling, and Retrieval‑Enhanced Generation

The article surveys Baidu Search’s intelligent question‑answering system, tracing its evolution from feature‑engineered retrieval to large pre‑trained and generative models, and detailing hierarchical readers, multi‑teacher distillation, retrieval‑enhanced generation, and instruction decomposition as key techniques for delivering fast, accurate, citation‑rich answers.

Baidu Search · Knowledge Distillation · Retrieval-Augmented Generation

0 likes · 18 min read

Baidu Geek Talk

Oct 25, 2023 · Artificial Intelligence

How Baidu Search Is Transforming Machine Question Answering with Large‑Scale AI Models

This article reviews the evolution of machine question answering, from early feature‑engineered systems to modern large‑language‑model‑driven retrieval‑augmented generation, outlines Baidu Search’s current Retriever‑Reader architecture, discusses challenges such as semantic complexity, latency and answer quality, and presents solutions including hierarchical DocMRC modeling, multi‑teacher knowledge distillation, and instruction decomposition for efficient, high‑quality answers.

Baidu · Knowledge Distillation · Retrieval-Augmented Generation

0 likes · 18 min read

Rare Earth Juejin Tech Community

Sep 22, 2023 · Artificial Intelligence

An Introduction to Knowledge Distillation for Model Compression

This article explains the AI model‑compression technique of knowledge distillation, describing how a large teacher network transfers its soft predictions to a lightweight student network using temperature‑scaled softmax, enabling deployment on resource‑constrained devices.

Artificial Intelligence · Knowledge Distillation · Model Compression

0 likes · 13 min read

Alibaba Cloud Big Data AI Platform

Jul 12, 2023 · Artificial Intelligence

How ConaCLIP Boosts Lightweight Text-Image Retrieval with Dual‑Encoder Distillation

ConaCLIP introduces a fully‑connected knowledge interaction graph to distill large dual‑encoder models into compact ones, enhancing text‑image retrieval accuracy and efficiency on edge devices, with extensive experiments and supervision strategies demonstrating significant gains over existing baselines.

AI · ConaCLIP · Dual Encoder

0 likes · 9 min read

Xiaohongshu Tech REDtech

Jun 20, 2023 · Artificial Intelligence

Open-Vocabulary Object Attribute Recognition with OvarNet: A Unified Framework for Detection and Attribute Classification

At CVPR 2023 the Xiaohongshu team presented OvarNet, a unified one‑stage Faster‑RCNN model built on CLIP that uses prompt learning and knowledge distillation to jointly detect objects and recognize open‑vocabulary attributes, achieving state‑of‑the‑art results on VAW, MS‑COCO, LSA and OVAD datasets.

Knowledge Distillation · Multimodal Learning · attribute recognition

0 likes · 12 min read

DataFunTalk

Apr 26, 2023 · Artificial Intelligence

Serializing Advertising Placement with User Algorithms at Alibaba Health

Alibaba Health’s user algorithm leverages multi‑channel serialized ad placement, using vector‑based three‑tower models, knowledge distillation, and ROI‑oriented optimizations to sequence user touchpoints, improve conversion rates, and enhance model accuracy across diverse marketing channels.

Advertising · Knowledge Distillation · Machine Learning

0 likes · 15 min read

DataFunSummit

Apr 20, 2023 · Artificial Intelligence

Mengzi Lightweight Model Technology System and Advances in Small‑Scale and Retrieval‑Augmented Pretraining

This presentation introduces the Mengzi lightweight model technology stack, covering large‑scale pre‑training, motivations for lightweight models, detailed techniques such as knowledge and sequence‑relation enhancement, training optimization, model compression, retrieval‑augmented pre‑training, multimodal extensions, open‑source releases, and real‑world applications.

Knowledge Distillation · Multimodal · large language models

0 likes · 23 min read

Meituan Technology Team

Apr 13, 2023 · Artificial Intelligence

Peak-First Regularization for Low-Latency Streaming Speech Recognition

The paper presents a low‑latency streaming speech‑recognition solution that reframes latency reduction as a knowledge‑distillation task, using a simple peak‑first regularization term to shift CTC output probabilities leftward and achieve up to 200 ms average latency reduction without harming word error rate.

CTC · Knowledge Distillation · Latency Reduction

0 likes · 21 min read

DataFunSummit

Feb 3, 2023 · Artificial Intelligence

Interactive BERT for Relevance in Health E‑commerce Search

This article presents an in‑depth exploration of an interactive BERT‑based relevance model for health e‑commerce search, detailing the business context, query and product feature extraction, domain‑specific sample generation, model architecture enhancements, offline and online performance gains, and practical deployment through knowledge distillation.

AI · BERT · Knowledge Distillation

0 likes · 14 min read

DataFunTalk

Jan 11, 2023 · Artificial Intelligence

Exploring Interactive BERT for Relevance in Health E‑commerce Search

This article presents a comprehensive overview of Alibaba Health's interactive BERT approach for improving relevance in health e‑commerce search, covering business background, model design, domain‑specific data construction, knowledge‑distilled twin‑tower deployment, experimental results, and a detailed Q&A session.

AI · BERT · Knowledge Distillation

0 likes · 14 min read

DataFunTalk

Dec 7, 2022 · Artificial Intelligence

Entire Space Delayed Feedback with Cross‑Task Knowledge Distillation (ESDC) for Multi‑Task E‑commerce Recommendation

This article presents Xiaomi’s e‑commerce recommendation research, addressing four key challenges—sample selection bias, data sparsity, delayed feedback, and knowledge inconsistency—by introducing the Entire Space Delayed Feedback with Cross‑Task Knowledge Distillation (ESDC) model, which combines causal inference, cross‑task distillation, twin networks, and uncertainty weighting to improve CVR prediction and achieve a 15% GMV lift over the baseline.

AI · CVR · Delayed Feedback

0 likes · 11 min read

Zuoyebang Tech Team

Sep 15, 2022 · Artificial Intelligence

How We Replaced BERT with a Lightweight TextCNN to Slash GPU Costs

This article describes the production challenges of using BERT for large‑scale text classification at Zuoyebang, explores lightweight alternatives such as knowledge distillation, pruning and quantization, and details a teacher‑student‑active‑learning pipeline that trains a TextCNN model to match BERT performance while dramatically reducing GPU consumption and improving throughput.

BERT · Knowledge Distillation · Model Compression

0 likes · 13 min read

DataFunSummit

Sep 4, 2022 · Artificial Intelligence

Sparse Features in Machine Learning: Challenges, NVIDIA Ampere Structured Sparsity, Knowledge Distillation, and GAN Model Compression

This talk explores the challenges and opportunities of leveraging sparsity in machine learning models, covering fine‑grained and coarse‑grained sparsity, NVIDIA Ampere’s 2:4 structured sparsity, knowledge‑distillation techniques for converting unstructured to structured sparsity, and model compression strategies for generative adversarial networks.

GPU Acceleration · GaN · Knowledge Distillation

0 likes · 14 min read

DataFunSummit

Aug 14, 2022 · Artificial Intelligence

Optimizing Pre‑Ranking in Meituan Search: Knowledge Distillation and Neural Architecture Search

This article describes Meituan Search's pre‑ranking (coarse‑ranking) system evolution and presents two major optimization strategies—leveraging knowledge distillation to align coarse‑ranking with fine‑ranking and employing neural architecture search to jointly improve effectiveness and latency—demonstrating significant offline and online performance gains.

Knowledge Distillation · Machine Learning · Neural Architecture Search

0 likes · 17 min read

Meituan Technology Team

Aug 11, 2022 · Artificial Intelligence

Optimizing Pre‑Ranking in Meituan Search: Knowledge Distillation and Neural Architecture Search

Meituan’s search team upgraded its pre‑ranking layer from simple linear models to end‑to‑end neural networks, boosting effectiveness by applying three knowledge‑distillation techniques—including result‑list, score, and contrastive representation transfer—and by using latency‑aware neural architecture search to automatically select features and network structures, achieving significant recall and CTR gains without added latency.

Knowledge Distillation · Neural Architecture Search · efficiency optimization

0 likes · 19 min read

Alimama Tech

Jul 27, 2022 · Artificial Intelligence

CACS: Cascade Architecture for Creative Selection in Advertising

The Cascade Architecture for Creative Selection (CACS) reorders the advertising pipeline by placing a dual‑tower creative‑selection module ahead of ranking, using soft‑label list‑wise distillation and adaptive dropout to jointly optimize creatives and ads; it incurs a 5% latency increase but delivers significant CTR and RPM gains in Taobao’s search ads.

Knowledge Distillation · ad ranking · adaptive dropout

0 likes · 17 min read

NetEase Smart Enterprise Tech+

Jun 2, 2022 · Artificial Intelligence

How Knowledge Distillation Shrinks Deep Neural Networks Without Losing Accuracy

Knowledge Distillation, a teacher‑student model compression technique, enables large, high‑performing deep neural networks to transfer their learned representations to smaller models, achieving comparable accuracy with faster inference, reduced resource consumption, and broader applicability in computer‑vision tasks.

AI · FitNet · Knowledge Distillation

0 likes · 14 min read

Alimama Tech

May 25, 2022 · Artificial Intelligence

UKD: Debiasing Conversion Rate Estimation via Uncertainty-regularized Knowledge Distillation

The paper introduces UKD, an uncertainty‑regularized knowledge‑distillation framework that uses a click‑adaptive teacher to generate pseudo‑conversion labels for unclicked impressions and trains a student model with uncertainty‑weighted loss, thereby mitigating sample‑selection bias and achieving up to 3.4% CVR improvement and 4.3% CPA reduction on large‑scale advertising datasets.

CVR debiasing · Knowledge Distillation · advertising algorithms

0 likes · 20 min read

DaTaobao Tech

May 18, 2022 · Artificial Intelligence

Deep Ranking Optimization for E-commerce Recommendation

The 2021 Taobao New‑Product team boosted e‑commerce recommendation by redesigning the coarse‑ranking stage with a dual‑tower DSSM, low‑cost feature‑crossing, NOVA attention and multi‑task distillation from a fine‑ranking teacher, delivering up to +30‰ GAUC gain and 3‑5% online CTR and click improvements.

Knowledge Distillation · Machine Learning · deep ranking

0 likes · 17 min read

DataFunTalk

Apr 22, 2022 · Artificial Intelligence

Inference Optimization Techniques and GPU Parallel Acceleration for Tencent Intelligent Dialogue Models

This article presents a comprehensive overview of inference optimization methods—including model pruning, quantization, knowledge distillation, caching, instruction‑set acceleration, and operator fusion—and details a GPU‑centric parallel acceleration methodology with CUDA basics, performance‑analysis tools, theoretical limits, and practical case studies, all illustrated with real‑world examples from Tencent's intelligent dialogue products.

Caching · GPU Acceleration · Knowledge Distillation

0 likes · 18 min read

Tencent Cloud Developer

Mar 3, 2022 · Artificial Intelligence

Model Distillation for Query-Document Matching: Techniques and Optimizations

We applied knowledge distillation to a video query‑document BERT matcher, compressing the 12‑layer teacher into production‑ready 1‑layer ALBERT and tiny TextCNN students using combined soft, hard, and relevance losses plus AutoML‑tuned hyper‑parameters, achieving sub‑5 ms latency and up to 2.4% AUC improvement over the original model.

ALBERT · AutoML · BERT

0 likes · 12 min read

DataFunTalk

Feb 20, 2022 · Artificial Intelligence

Distilled Reinforcement Learning Framework for Recommendation (DRL-Rec): Design, Modules, and Experimental Evaluation

This article presents DRL-Rec, a distilled reinforcement learning framework for recommendation that integrates an exploring‑filtering module and confidence‑guided distillation to compress RL‑based recommenders while improving accuracy, and reports significant offline and online performance gains on a large‑scale system.

Knowledge Distillation · online experiments · reinforcement learning

0 likes · 16 min read

DataFunTalk

Dec 24, 2021 · Artificial Intelligence

Large-Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This article reviews three consecutive works from Alibaba DAMO Academy on compressing and distilling large pretrained language models—AdaBERT, L2A, and Meta‑KD—detailing their motivations, neural‑architecture‑search‑based designs, loss formulations, experimental results, and insights from a Q&A session.

AI · Knowledge Distillation · Model Compression

0 likes · 10 min read

DataFunSummit

Dec 21, 2021 · Artificial Intelligence

Large‑Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This talk presents Alibaba DAMO Academy’s recent work on compressing large pretrained language models, covering task‑adaptive AdaBERT, data‑augmented L2A, and meta‑knowledge distillation Meta‑KD, describing their motivations, architectures, NAS‑based search, loss designs, and experimental results across multiple NLP tasks.

Knowledge Distillation · Model Compression · NLP

0 likes · 13 min read

Meituan Technology Team

Dec 2, 2021 · Artificial Intelligence

Pretraining Techniques for Search Advertising Relevance at Meituan

Meituan improves search‑ad relevance by applying pre‑trained BERT models enhanced with data‑augmented samples, multi‑task learning, keyword extraction and two‑stage knowledge distillation, producing a lightweight distilled model that, when fused with traditional relevance signals, boosts CTR, lowers Badcase@5 and raises NDCG while preserving revenue.

BERT · Knowledge Distillation · Machine Learning

0 likes · 30 min read

Alimama Tech

Sep 15, 2021 · Artificial Intelligence

Combining Knowledge Distillation, Exposure Forecasting, and Pacing to Guarantee Brand Exposure on Alibaba's Advertising Platform

Alibaba's advertising platform combines knowledge distillation to score traffic, exposure forecasting via GBDT, and PID-based pacing to guarantee contracted impression volumes while improving CTR/CVR, handling delayed exposure and traffic selection, achieving near‑perfect delivery in large promotions.

Alibaba · Knowledge Distillation · exposure forecasting

0 likes · 17 min read

Baidu Geek Talk

Sep 8, 2021 · Artificial Intelligence

How PP‑OCRv2 Boosts OCR Speed and Accuracy with Five Key Innovations

The article provides a comprehensive technical overview of PaddleOCR's PP‑OCRv2, detailing its five major algorithmic enhancements, performance improvements over previous versions, historical milestones, core capabilities, and links to the open‑source repositories for developers interested in state‑of‑the‑art OCR solutions.

Knowledge Distillation · OCR · PP-OCRv2

0 likes · 10 min read

DataFunSummit

Jun 5, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure‑Preserving Methods

This article reviews BERT’s architecture, analyzes the storage and compute costs of each layer, and systematically presents compression methods—including quantization, pruning, knowledge distillation (Distilled BiLSTM and MobileBERT), and structure‑preserving techniques—aimed at enabling efficient deployment on resource‑constrained mobile devices.

BERT · Knowledge Distillation · Mobile Deployment

0 likes · 15 min read
