Tagged articles

126 articles

Page 2 of 2

Jun 3, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure-Preserving Methods

This article examines the internal structure of BERT and systematically presents various model‑compression strategies—including quantization, pruning, knowledge distillation, and structure‑preserving techniques—highlighting their impact on storage, computational cost, and inference speed for deployment on resource‑constrained mobile devices.

BERT · Knowledge Distillation · Model Compression

0 likes · 16 min read

AntTech

Mar 21, 2021 · Artificial Intelligence

Hubble Intelligent Audience Platform: Three‑Generation Algorithm Evolution for Mobile Marketing

The article describes the Hubble Intelligent Audience Platform’s three‑generation algorithmic evolution—starting from a DSSM‑based model, moving to an asynchronous GNN plus lightweight learning architecture, and finally integrating incremental learning with meta‑weighting—to improve audience expansion for mobile marketing campaigns.

AI · Graph Neural Network · Knowledge Distillation

0 likes · 14 min read

Amap Tech

Mar 5, 2021 · Artificial Intelligence

AI Applications in Mobility: Route Planning, ETA Prediction, Dynamic Event Mining, and Global Scheduling

The article surveys Amap’s AI‑driven mobility solutions—from personalized, multi‑objective route planning using Cell‑Based Routing and bias‑aware sorting, through spatio‑temporal ETA prediction and lightweight BERT‑based traffic‑event mining, to rapid POI freshness updates and a future global scheduling system that coordinates vehicles and signals via multi‑agent reinforcement learning.

AI · Knowledge Distillation · Route Planning

0 likes · 14 min read

360 Smart Cloud

Mar 4, 2021 · Artificial Intelligence

Optimizing BERT Online Service Deployment at 360 Search

This article describes the challenges of deploying a large BERT model as an online service for 360 Search and details engineering optimizations—including framework selection, model quantization, knowledge distillation, stream scheduling, caching, and dynamic sequence handling—that dramatically improve latency, throughput, and resource utilization.

BERT · FP16 quantization · GPU optimization

0 likes · 12 min read

360 Tech Engineering

Mar 1, 2021 · Artificial Intelligence

Deploying BERT as an Online Service: Challenges and Optimizations at 360 Search

This article details the engineering challenges of serving a large BERT model in real‑time for 360 Search and describes a series of optimizations—including TensorRT‑based kernel fusion, model quantization, knowledge distillation, multi‑stream execution, caching, and dynamic sequence handling—that together achieve low latency, high throughput, and stable deployment on GPU clusters.

BERT · GPU · Knowledge Distillation

0 likes · 10 min read

DataFunTalk

Feb 27, 2021 · Artificial Intelligence

Optimizing Coarse Ranking Models for Short Video Recommendation: From GBDT to Dual‑Tower DNN and Cascading

This article details the practical upgrades of iQIYI's short‑video recommendation coarse‑ranking pipeline, moving from a GBDT model to a dual‑tower DNN, applying knowledge distillation, embedding compression, inference optimizations, and finally a cascade architecture to align with the fine‑ranking model while reducing resource consumption.

Knowledge Distillation · cascading model · coarse ranking

0 likes · 12 min read

iQIYI Technical Product Team

Feb 26, 2021 · Artificial Intelligence

Optimization of Coarse Ranking Models for Short‑Video Recommendation at iQIYI

iQIYI’s short‑video recommendation team replaced a GBDT coarse‑ranking model with a lightweight dual‑tower DNN, applied knowledge distillation, sparse‑aware embedding optimization, and inference merging, then introduced a cascade MMOE architecture, achieving comparable accuracy with half the memory, ~19 ms latency reduction, and measurable gains in watch time, CTR and engagement.

Knowledge Distillation · cascade model · coarse ranking

0 likes · 15 min read

58 Tech

Jan 15, 2021 · Artificial Intelligence

Exploring Text Pre‑training Models for Dialogue Classification in Information Security: From TextCNN to RoBERTa and Knowledge Distillation

This article presents a systematic exploration of text pre‑training models for dialogue classification in information‑security scenarios, comparing baseline TextCNN, an enhanced TextCNN_role, RoBERTa with domain‑adaptive pre‑training, and a distilled mini‑model, and discusses their performance, trade‑offs, and future directions.

Dialog Modeling · Knowledge Distillation · NLP

0 likes · 13 min read

DataFunTalk

Jan 15, 2021 · Artificial Intelligence

Zhihu Search Text Relevance Evolution and BERT Knowledge Distillation Practices

This talk by Zhihu search algorithm engineer Shen Zhan details the evolution of text relevance models from TF‑IDF/BM25 to deep semantic matching and BERT, explains the challenges of deploying BERT at scale, and describes practical knowledge‑distillation techniques that improve both online latency and offline storage while maintaining search quality.

BERT · Knowledge Distillation · Machine Learning

0 likes · 14 min read

DataFunTalk

Jan 10, 2021 · Artificial Intelligence

Didi's Machine Translation System: Architecture, Techniques, and WMT2020 Competition Experience

This article presents a comprehensive overview of Didi's machine translation platform, covering its evolution from statistical to neural models, the Transformer architecture with relative position and larger FFN, data preparation, training strategies such as back‑translation and knowledge distillation, deployment optimizations with TensorRT, and the team's successful participation in the WMT2020 news translation task.

BLEU · Knowledge Distillation · TensorRT

0 likes · 14 min read

Sohu Tech Products

Jan 6, 2021 · Artificial Intelligence

Overview of Main Model Compression and Acceleration Techniques: Structural Optimization, Pruning, Quantization, and Knowledge Distillation

This article reviews four mainstream model compression and acceleration methods (structural optimization, pruning, quantization, and knowledge distillation), explaining their principles, implementations, and performance, and presents practical examples such as DistilBERT, TinyBERT, and FastBERT with comparative results.

AI · Knowledge Distillation · Model Compression

0 likes · 14 min read

DataFunTalk

Dec 25, 2020 · Artificial Intelligence

Exploring Pretraining Model Optimization and Deployment Challenges in NLP

This article reviews the evolution of pretraining models in NLP, discusses the practical challenges of deploying large models such as inference latency, knowledge integration, and task adaptation, and presents Xiaomi’s optimization techniques including knowledge distillation, low‑precision inference, operator fusion, and multi‑granularity segmentation for dialogue systems.

BERT · Dialogue Systems · Inference Optimization

0 likes · 15 min read

Didi Tech

Oct 27, 2020 · Artificial Intelligence

Didi's Machine Translation System: Architecture, Techniques, and WMT2020 Competition Experience

Didi's machine translation system combines a Transformer‑big architecture with relative position representations, enlarged feed‑forward networks, iterative back‑translation, knowledge distillation, and domain fine‑tuning, and is optimized with TensorRT for speed; it achieved a BLEU score of 36.6 and third place in the WMT2020 Chinese‑to‑English news task.

BLEU · Knowledge Distillation · TensorRT

0 likes · 15 min read

Didi Tech

Oct 21, 2020 · Artificial Intelligence

Deep Model Compression Techniques for Intelligent Automotive Cockpits

The article reviews deep‑model compression methods—ADMM‑based structured pruning, low‑bit quantization, and teacher‑student knowledge distillation—and their automated AutoCompress workflow, demonstrating how these techniques shrink neural networks enough to run real‑time driver‑monitoring and other intelligent cockpit functions on resource‑limited automotive hardware while preserving accuracy.

ADMM · Edge AI · Knowledge Distillation

0 likes · 16 min read

Alibaba Cloud Developer

Jul 16, 2020 · Artificial Intelligence

How BERT‑to‑TextCNN Knowledge Distillation Boosts Spam Opinion Detection

This article examines how large pretrained BERT models can be compressed via knowledge distillation into a lightweight TextCNN classifier for efficient spam opinion detection, detailing traditional distillation methods, several practical schemes, experimental results, and the advantages of the approach.

BERT · Knowledge Distillation · Model Compression

0 likes · 9 min read

Meituan Technology Team

Jul 9, 2020 · Artificial Intelligence

Optimizing Meituan Search Ranking with BERT: Methods and Practices

The Meituan Search team boosted ranking relevance by training a domain‑specific BERT, applying data augmentation, brand‑sample optimization, knowledge‑graph fusion, multi‑task and pairwise fine‑tuning, joint end‑to‑end training with LambdaLoss ranking models, and compressing the model for low‑latency inference, delivering up to +925 BP offline accuracy gains and measurable CTR and NDCG improvements in production.

BERT · Knowledge Distillation · Machine Learning

0 likes · 34 min read

AntTech

Jun 9, 2020 · Artificial Intelligence

Deep Learning Model Compression and Acceleration Techniques for Mobile AI

This article reviews the motivations, challenges, and a comprehensive set of algorithmic, framework, and hardware methods—including structural optimization, quantization, pruning, and knowledge distillation—to compress and accelerate deep learning models for deployment on mobile devices, highlighting benefits such as reduced server load, lower latency, improved reliability, and enhanced privacy.

Knowledge Distillation · Model Compression · mobile AI

0 likes · 17 min read

DataFunTalk

May 26, 2020 · Artificial Intelligence

Knowledge Distillation Techniques for Recommendation Systems: Methods, Scenarios, and Practical Insights

This article reviews how knowledge distillation—using a large teacher model to guide a smaller student model—can be applied across the recall, coarse‑ranking, and fine‑ranking stages of recommendation systems, detailing logits‑based and feature‑based approaches, joint and two‑stage training, and point‑wise, pair‑wise, and list‑wise loss designs.

Knowledge Distillation · Machine Learning · Model Compression

0 likes · 31 min read

Tencent Tech

Feb 27, 2020 · Artificial Intelligence

How to Speed Up Deep Learning Models: Cutting-Edge Acceleration Techniques

Deep learning models often suffer from slow training and deployment due to their size, but a range of advanced acceleration methods—including model architecture optimization, pruning, quantization, knowledge distillation, and distributed training techniques—can dramatically improve speed and efficiency while maintaining performance.

Knowledge Distillation · deep learning · distributed training

0 likes · 14 min read

Qunar Tech Salon

Feb 27, 2020 · Artificial Intelligence

iQIYI Dual‑DNN Ranking Model with Online Knowledge Distillation

This article describes iQIYI’s dual‑DNN ranking architecture that combines a high‑capacity teacher network with a lightweight student network via online knowledge distillation, addressing the trade‑off between model effectiveness and inference efficiency in large‑scale recommendation systems.

CTR prediction · Knowledge Distillation · Online Learning

0 likes · 12 min read

DataFunTalk

Feb 22, 2020 · Artificial Intelligence

Double DNN Ranking Model with Online Knowledge Distillation for Real‑Time Recommendation at iQIYI

The article introduces iQIYI's double‑DNN ranking architecture that combines a high‑performance teacher network with a lightweight student network through online knowledge distillation, detailing the evolution of deep learning‑based ranking models, the motivation for model upgrades, training pipelines, and experimental results that demonstrate significant latency reduction and ROI improvement.

Knowledge Distillation · Online Learning · Ranking Models

0 likes · 13 min read

iQIYI Technical Product Team

Feb 21, 2020 · Artificial Intelligence

Dual DNN Ranking Model with Online Knowledge Distillation for Recommender Systems

iQIYI’s dual‑DNN ranking model uses an online teacher‑student knowledge‑distillation framework where a complex teacher DNN shares representations with a lightweight student DNN, enabling end‑to‑end training, large‑scale feature crossing, and substantially higher recommendation accuracy while cutting inference latency and model size.

CTR prediction · Knowledge Distillation · Online Learning

0 likes · 15 min read

iQIYI Technical Product Team

Jan 17, 2020 · Artificial Intelligence

Ultrafast Video Attention Prediction with Coupled Knowledge Distillation

The paper presents UVA‑Net, a lightweight video‑attention network trained via coupled knowledge distillation, which matches the accuracy of eleven state‑of‑the‑art models while using only 0.68 MB of storage and achieving up to 10,106 FPS on GPU (404 FPS on CPU), thanks to a MobileNetV2‑based CA‑Res block and a teacher‑student framework that leverages low‑resolution inputs to drastically cut parameters and computational cost.

Knowledge Distillation · Mobile Video Processing · UVA-Net

0 likes · 5 min read

Hulu Beijing

Apr 30, 2019 · Artificial Intelligence

How Can Deep Neural Networks Be Accelerated and Compressed? Key Techniques Explained

This article reviews why deep neural networks are over‑parameterized, outlines the challenges of deploying them on mobile and embedded devices, and presents six major strategies—pruning, low‑rank approximation, filter selection, quantization, knowledge distillation, and novel architecture design—to accelerate and compress models while preserving performance.

Knowledge Distillation · deep learning · model acceleration

0 likes · 11 min read

Alibaba Cloud Developer

Oct 9, 2018 · Artificial Intelligence

How Rocket Launching Boosts Online CTR Prediction Without Slowing Inference

Rocket Launching introduces a novel co‑training framework that jointly trains a lightweight network and a more powerful booster network, sharing parameters and using gradient‑blocking and hint loss to improve click‑through‑rate prediction accuracy while keeping online inference latency unchanged, validated on public datasets and Alibaba’s ad system.

CTR prediction · Knowledge Distillation · Model Compression

0 likes · 13 min read

Alibaba Cloud Developer

Sep 11, 2018 · Artificial Intelligence

Rocket Launching: Boosting Real-Time CTR Prediction Without Extra Latency

Online click‑through‑rate (CTR) prediction demands millisecond‑level response times, yet deep models are too slow; this paper introduces a “Rocket Launching” framework that jointly trains a lightweight net and a powerful booster net, sharing parameters and using gradient‑blocking and hint loss to improve accuracy without increasing inference latency.

CTR prediction · Knowledge Distillation · co-training

0 likes · 13 min read
