Tagged articles
7 articles
Page 1 of 1
AIWalker
AIWalker
May 19, 2026 · Artificial Intelligence

How EUPE’s Three‑Stage Distillation Lets an 86M Model Run Classification, Segmentation and VLM on iPhone in 62 ms (SOTA)

EUPE introduces a three‑stage “scale‑then‑shrink” distillation pipeline that first trains a large proxy model to absorb heterogeneous expert knowledge and then compresses it into an 86M encoder, achieving state‑of‑the‑art performance on image classification, dense prediction and vision‑language tasks on an iPhone with only 62 ms latency.

EUPEEdge AIKnowledge Distillation
0 likes · 16 min read
How EUPE’s Three‑Stage Distillation Lets an 86M Model Run Classification, Segmentation and VLM on iPhone in 62 ms (SOTA)
Data Party THU
Data Party THU
Mar 25, 2026 · Artificial Intelligence

How Knowledge‑Guided Context Optimization Boosts Zero‑Shot Vision‑Language Models

The article analyzes the Base‑to‑New generalization problem of CLIP‑based visual‑language models, explains why standard prompt tuning (CoOp) forgets base knowledge, and presents the KgCoOp framework that adds a knowledge‑guided loss to keep learned prompts close to hand‑crafted ones, dramatically improving unseen‑class performance while preserving efficiency.

CLIPGeneralizationKnowledge-guided Optimization
0 likes · 12 min read
How Knowledge‑Guided Context Optimization Boosts Zero‑Shot Vision‑Language Models
DeepHub IMBA
DeepHub IMBA
Mar 23, 2026 · Artificial Intelligence

How KgCoOp Uses Knowledge‑Guided Context Optimization to Prevent Prompt Tuning Forgetting

The article analyzes why standard prompt tuning (CoOp) causes catastrophic forgetting in visual‑language models, introduces the KgCoOp framework that adds a knowledge‑guided loss to regularize prompts, and shows through extensive experiments on 11 benchmarks that KgCoOp improves unseen‑class accuracy, harmonic mean, and efficiency while discussing trade‑offs and limitations.

Catastrophic ForgettingKnowledge-guided OptimizationPrompt Tuning
0 likes · 11 min read
How KgCoOp Uses Knowledge‑Guided Context Optimization to Prevent Prompt Tuning Forgetting
Rare Earth Juejin Tech Community
Rare Earth Juejin Tech Community
Jul 12, 2023 · Artificial Intelligence

Comprehensive Guide to Vision Transformer (ViT): Architecture, Patch Tokenization, Embedding, Fine‑tuning, and Performance

This article provides an in‑depth, English‑language overview of Vision Transformer (ViT), covering its Transformer‑based architecture, patch‑to‑token conversion, token and position embeddings, fine‑tuning strategies such as 2‑D interpolation, experimental results versus CNNs, and the model’s broader significance for multimodal AI research.

Fine‑tuningPatch EmbeddingTransformer
0 likes · 25 min read
Comprehensive Guide to Vision Transformer (ViT): Architecture, Patch Tokenization, Embedding, Fine‑tuning, and Performance
Rare Earth Juejin Tech Community
Rare Earth Juejin Tech Community
Oct 18, 2022 · Artificial Intelligence

Practical Implementation of Vision Transformer (ViT) for Image Classification in PyTorch

This article walks readers through building, training, and evaluating a Vision Transformer (ViT) model for a five‑class flower classification task, providing detailed code snippets, model architecture explanations, training script adjustments, and experimental results that highlight the importance of pre‑trained weights.

Image ClassificationPyTorchViT
0 likes · 13 min read
Practical Implementation of Vision Transformer (ViT) for Image Classification in PyTorch