Comprehensive Guide to Vision Transformer (ViT): Architecture, Patch Tokenization, Embedding, Fine‑tuning, and Performance
This article provides an in‑depth overview of the Vision Transformer (ViT), covering its Transformer‑based architecture, patch‑to‑token conversion, token and position embeddings, fine‑tuning strategies such as 2‑D interpolation of position embeddings, experimental results versus CNNs, and the model’s broader significance for multimodal AI research.
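Before diving into the details, the patch‑to‑token pipeline at the heart of ViT can be sketched in a few lines. The following is a minimal NumPy sketch, not the reference implementation: the sizes (224×224 input, 16×16 patches, 768‑dimensional tokens) match the commonly cited ViT‑Base configuration, and the projection, `[CLS]` token, and position embeddings are random placeholders standing in for learned parameters.

```python
import numpy as np

# Hypothetical ViT-Base-style sizes: a H x W x C image is split into
# P x P patches, each flattened and projected to a D-dimensional token.
H, W, C = 224, 224, 3      # input image size
P, D = 16, 768             # patch size and embedding dimension

rng = np.random.default_rng(0)
image = rng.standard_normal((H, W, C))

# Split into non-overlapping P x P patches and flatten each one.
patches = image.reshape(H // P, P, W // P, P, C)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, P * P * C)  # (196, 768)

# A linear projection (random here, learned in the real model)
# maps each flattened patch to a token.
W_proj = rng.standard_normal((P * P * C, D)) * 0.02
tokens = patches @ W_proj                                          # (196, 768)

# Prepend a [CLS] token and add position embeddings (placeholders here,
# learned parameters in the real model).
cls_token = np.zeros((1, D))
pos_embed = rng.standard_normal((tokens.shape[0] + 1, D)) * 0.02
sequence = np.concatenate([cls_token, tokens], axis=0) + pos_embed

print(sequence.shape)  # (197, 768): 196 patch tokens plus one [CLS] token
```

The resulting 197×768 sequence is what the standard Transformer encoder consumes; everything after this step is ordinary self‑attention over tokens.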