Tag

low-cost training

1 view collected around this technical thread.

Data Thinking Notes
Feb 11, 2025 · Artificial Intelligence

Why DeepSeek V3 and R1 Are Redefining LLM Efficiency and Power

This article analyzes DeepSeek's V3 and R1 large language models, detailing their low‑cost Mixture‑of‑Experts architecture, Multi‑Head Latent Attention redesign, distributed training optimizations, and reasoning‑focused innovations that together challenge traditional GPU/NPU compute demands.

AI inference · DeepSeek · MLA
0 likes · 15 min read
DataFunTalk
Feb 20, 2023 · Artificial Intelligence

Low‑Cost Open‑Source Replication of ChatGPT Using Colossal‑AI

This article explains how researchers reproduced the full ChatGPT training pipeline—including supervised fine‑tuning, reward‑model training, and RLHF—using the open‑source Colossal‑AI system, dramatically reducing GPU memory and hardware requirements while providing ready‑to‑run code and performance benchmarks.

AI optimization · ChatGPT · Colossal-AI
0 likes · 10 min read