Tag

scaling laws

14 articles collected around this technical topic.

Cognitive Technology Team
Mar 7, 2025 · Artificial Intelligence

From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution

This article traces the development of AI models—from early word embeddings like Word2Vec and ELMo, through transformer‑based encoders such as BERT and decoder‑only models like GPT‑1/2/3, to recent multimodal systems and scaling laws—explaining their architectures, training methods, and impact on modern AI applications.

AI · Transformer · embedding
0 likes · 22 min read
DataFunTalk
Feb 28, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3) and R1: Architecture, Training Strategies, Evaluation, and Distillation

An in‑depth overview of the DeepSeek LLM series (V1‑V3) and the R1 models, covering their architectures, scaling‑law experiments, data pipelines, training strategies (including MoE, MLA, FP8 training, multi‑step learning‑rate scheduling, and reinforcement learning), extensive evaluation results, and knowledge‑distillation techniques.

AI research · Mixture of Experts · Model Distillation
0 likes · 36 min read
Tencent Cloud Developer
Feb 27, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3, R1) Technical Overview and Analysis

The DeepSeek technical overview traces the evolution from the dense 67B V1 model, through the 236B MoE‑based V2 and the 671B V3 trained in FP8, to the R1 series, which learns reasoning through reinforcement learning alone, without supervised fine‑tuning. It highlights innovations such as Grouped‑Query Attention, Multi‑Head Latent Attention, auxiliary‑loss‑free MoE load balancing, Multi‑Token Prediction, and knowledge distillation, and reports state‑of‑the‑art benchmark results and open‑source reproduction projects.

AI research · DeepSeek · Mixture of Experts
0 likes · 37 min read
Tencent Cloud Developer
Feb 6, 2025 · Artificial Intelligence

DeepSeek V Series: Technical Overview of Scaling Laws, Grouped Query Attention, and Mixture‑of‑Experts

The article reviews DeepSeek’s V‑series papers, explaining how scaling‑law insights, Grouped Query Attention, a depth‑first model design, auxiliary‑loss‑free load balancing, multi‑token prediction, and Multi‑Head Latent Attention together enable economical mixture‑of‑experts LLMs that rival closed‑source models while cutting compute and hardware costs.
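As a concrete illustration of the top‑k expert routing at the heart of such mixture‑of‑experts layers, here is a minimal pure‑Python sketch. The function names, the toy experts, and the inputs are invented for illustration; this is not DeepSeek’s actual implementation, which additionally uses latent attention and its own load‑balancing machinery.

```python
import math

def top_k_gate(logits, k=2):
    """Pick the k highest-scoring experts and renormalize their
    softmax weights so they sum to 1 (standard top-k MoE routing)."""
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, gate_logits, k=2):
    """Combine only the selected experts' outputs, weighted by the gate;
    the other experts are never evaluated, which is where MoE saves compute."""
    gate = top_k_gate(gate_logits, k)
    return sum(w * experts[i](x) for i, w in gate.items())

# Toy example: four "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
y = moe_forward(10.0, experts, gate_logits=[0.1, 0.2, 2.0, 1.0], k=2)
```

With k=2 only half the experts run per token here; scaling the same idea to hundreds of experts is what lets a 671B‑parameter model activate only a small fraction of its weights per token.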

DeepSeek · Grouped Query Attention · Mixture of Experts
0 likes · 13 min read
DataFunTalk
Dec 5, 2024 · Artificial Intelligence

VAR: Scalable Image Generation via Next‑Scale Prediction Wins NeurIPS 2024 Best Paper

The VAR model, a Visual AutoRegressive framework that introduces a novel multi‑scale “next‑scale prediction” paradigm, dramatically improves image generation efficiency and quality, surpasses diffusion models, validates scaling laws in vision, and earned the Best Paper award at NeurIPS 2024.

Image Generation · NeurIPS 2024 · autoregressive models
0 likes · 7 min read
DaTaobao Tech
Oct 30, 2024 · Artificial Intelligence

Understanding OpenAI o1: Chain‑of‑Thought, Scaling Laws, and Training Strategies

The article explains how OpenAI’s o1 model leverages chain‑of‑thought prompting, dual‑system cognitive theory, and new scaling laws—pre‑training on code/math and post‑training reinforcement with step‑wise reward models—to achieve superior reasoning, safety, and performance over GPT‑4, heralding a shift toward models that learn to think.

Chain-of-Thought · LLM · Safety
0 likes · 42 min read
DataFunSummit
Sep 24, 2024 · Artificial Intelligence

Streaming Data Pipelines and Scaling Laws for Efficient Large‑Model Training

The article discusses the challenges of training ever‑larger AI models on internet‑scale data, critiques traditional batch ETL pipelines, and proposes a streaming data‑flow architecture with dynamic data selection and a shared‑memory/Alluxio middle layer to decouple data processing from model training, improving efficiency and scalability.

AI infrastructure · Large Models · data pipelines
0 likes · 20 min read
DataFunSummit
Sep 1, 2024 · Artificial Intelligence

Data Management in Large Language Model Training: Overview, Pre‑training, SFT, and Future Challenges

This article surveys data management for large language model training, covering an overview, pre‑training data composition, scaling‑law‑driven quantity control, quality filtering, deduplication, harmful‑content removal, instruction fine‑tuning strategies, dynamic data selection, and emerging research challenges such as bias mitigation, multimodal data handling, and synthetic‑data filtering.

Artificial Intelligence · Data quality · Instruction Fine-Tuning
0 likes · 18 min read
Xiaohongshu Tech REDtech
Jul 29, 2024 · Artificial Intelligence

Scaling Laws for Dense Retrieval: Empirical Study of Model Size, Training Data, and Annotation Quality

The award‑winning study shows that dense retrieval performance follows precise power‑law scaling with model size, training data quantity, and annotation quality, introduces contrast entropy for evaluation, validates joint scaling formulas on MS MARCO and T2Ranking, and uses cost models to guide budget‑optimal resource allocation.
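The joint power‑law form that such scaling studies typically fit can be sketched as follows; the symbols and exponents here are generic placeholders in the Chinchilla style, not the paper’s reported constants:

```latex
% Power-law scaling of a loss-like metric (e.g. contrast entropy)
% with model size N and annotated training data size D;
% A, B, \alpha, \beta are fitted, dataset-specific constants.
L(N, D) \approx \left(\frac{A}{N}\right)^{\alpha}
       + \left(\frac{B}{D}\right)^{\beta}
```

Given a cost model for parameters versus annotations, minimizing such a formula under a fixed budget is what yields the budget‑optimal allocations the study describes.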

annotation quality · contrast entropy · dense retrieval
0 likes · 13 min read
DataFunTalk
Mar 16, 2023 · Artificial Intelligence

Technical Optimizations and Breakthroughs of GPT‑4: Multimodal Capabilities, Alignment Strategies, and Predictable Scaling

The article summarizes the technical innovations behind GPT‑4, highlighting its multimodal abilities, improved alignment methods, scaling‑law‑based performance prediction, and remaining limitations, while referencing the official OpenAI technical report and community analyses.

AI research · GPT-4 · alignment
0 likes · 10 min read
Architect
Feb 9, 2023 · Artificial Intelligence

Emergent Abilities of Large Language Models: Complex Reasoning, Knowledge Reasoning, and Out‑of‑Distribution Robustness

This article reviews recent research on the emergent abilities of large language models—such as chain‑of‑thought reasoning, knowledge retrieval without external sources, and robustness to distribution shifts—examining scaling laws, model size thresholds, and the open questions surrounding a potential paradigm shift from fine‑tuning to in‑context learning.

AI research · Emergent Abilities · In-Context Learning
0 likes · 23 min read
DataFunTalk
Jan 10, 2023 · Artificial Intelligence

Paradigm Shifts in Large Language Model Research and Future Directions

The article reviews the evolution of large language models from the pre‑GPT‑3 era to the present, analyzes the conceptual and technical gaps between Chinese and global research, and outlines key future research directions such as scaling laws, prompting techniques, multimodal training, and efficient model architectures.

AI research · ChatGPT · In-Context Learning
0 likes · 73 min read
Model Perspective
Nov 21, 2022 · Fundamentals

Why Ants Defy Gravity: The Science of Surface Area vs Volume

The article explains how ants can lift objects many times their weight by leveraging their huge surface‑to‑volume ratio, contrasting this with human scaling, and explores how surface area influences air resistance, falling speed, and even the challenges ants face with water due to surface tension.
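The geometry behind this is the square‑cube law: for a body of characteristic length $L$,

```latex
\text{surface area} \propto L^{2}, \qquad
\text{volume (and weight)} \propto L^{3}, \qquad
\frac{\text{surface area}}{\text{volume}} \propto \frac{1}{L}
```

so shrinking $L$ a hundredfold multiplies the surface‑to‑volume ratio a hundredfold, which is why air resistance and surface tension dominate at ant scale.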

ants · biology · gravity
0 likes · 5 min read
Model Perspective
Oct 30, 2022 · Fundamentals

Why Giants Can’t Exist: The Physics of Scaling and Bone Strength

The article explains that when a human’s height is doubled, their volume and weight increase eightfold while bone strength only quadruples, leaving the legs unable to support the extra load; this scaling law is what prevents real giants from existing.
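The arithmetic in the summary follows directly from how weight and bone strength scale with height $h$:

```latex
\text{weight} \propto h^{3}
  \;\Rightarrow\; 2^{3} = 8\times, \qquad
\text{bone strength} \propto \text{cross-sectional area} \propto h^{2}
  \;\Rightarrow\; 2^{2} = 4\times
```

so each doubling of height doubles the load that every unit of bone cross‑section must carry.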

allometric scaling · biomechanics · physics of bodies
0 likes · 4 min read