Tag

scaling laws

14 articles collected around this technical topic.

Cognitive Technology Team
Mar 7, 2025 · Artificial Intelligence

From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution

This article traces the development of AI models—from early word embeddings like Word2Vec and ELMo, through transformer‑based encoders such as BERT and decoder‑only models like GPT‑1/2/3, to recent multimodal systems and scaling laws—explaining their architectures, training methods, and impact on modern AI applications.

AI · Transformer · embedding
0 likes · 22 min read
DataFunTalk
Feb 28, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3) and R1: Architecture, Training Strategies, Evaluation, and Distillation

An in‑depth overview of the DeepSeek LLM series (V1‑V3) and the R1 models, covering their architectures, scaling‑law experiments, data pipelines, training strategies (including MoE, MLA, FP8 training, multi‑step learning‑rate scheduling, and reinforcement learning), extensive evaluation results, and knowledge‑distillation techniques.

AI research · Mixture of Experts · Model Distillation
0 likes · 36 min read
Tencent Cloud Developer
Feb 27, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3, R1) Technical Overview and Analysis

The DeepSeek technical overview traces the evolution from the dense 67B V1 model, through the 236B MoE‑based V2 and the 671B V3 trained in FP8, to the R1 series, which learns reasoning through reinforcement learning alone, without supervised fine‑tuning. It highlights innovations such as Grouped‑Query Attention, Multi‑Head Latent Attention, auxiliary‑loss‑free MoE load balancing, Multi‑Token Prediction, and knowledge distillation, and reports state‑of‑the‑art benchmark results and open‑source reproduction projects.

AI research · DeepSeek · Mixture of Experts
0 likes · 37 min read
Tencent Cloud Developer
Feb 6, 2025 · Artificial Intelligence

DeepSeek V Series: Technical Overview of Scaling Laws, Grouped Query Attention, and Mixture‑of‑Experts

The article reviews DeepSeek’s V‑series papers, explaining how scaling‑law insights, Grouped Query Attention, a depth‑first model design, auxiliary‑loss‑free load balancing, multi‑token prediction, and Multi‑Head Latent Attention together enable economical mixture‑of‑experts LLMs that rival closed‑source models while cutting compute and hardware costs.
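As a concrete illustration of the top‑k expert routing at the heart of such mixture‑of‑experts layers, here is a minimal pure‑Python sketch. The function names, the toy experts, and the inputs are invented for illustration; this is not DeepSeek’s actual implementation, which additionally uses latent attention and its own load‑balancing machinery.

```python
import math

def top_k_gate(logits, k=2):
    """Pick the k highest-scoring experts and renormalize their
    softmax weights so they sum to 1 (standard top-k MoE routing)."""
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, gate_logits, k=2):
    """Combine only the selected experts' outputs, weighted by the gate;
    the other experts are never evaluated, which is where MoE saves compute."""
    gate = top_k_gate(gate_logits, k)
    return sum(w * experts[i](x) for i, w in gate.items())

# Toy example: four "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
y = moe_forward(10.0, experts, gate_logits=[0.1, 0.2, 2.0, 1.0], k=2)
```

With k=2 only half the experts run per token here; scaling the same idea to hundreds of experts is what lets a 671B‑parameter model activate only a small fraction of its weights per token.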

DeepSeek · Grouped Query Attention · Mixture of Experts
0 likes · 13 min read
DataFunTalk
Dec 5, 2024 · Artificial Intelligence

VAR: Scalable Image Generation via Next‑Scale Prediction Wins NeurIPS 2024 Best Paper

The VAR model, a Visual AutoRegressive framework that introduces a novel multi‑scale “next‑scale prediction” paradigm, dramatically improves image generation efficiency and quality, surpasses diffusion models, validates scaling laws in vision, and earned the Best Paper award at NeurIPS 2024.

Image Generation · NeurIPS 2024 · autoregressive models
0 likes · 7 min read
DaTaobao Tech
Oct 30, 2024 · Artificial Intelligence

Understanding OpenAI o1: Chain‑of‑Thought, Scaling Laws, and Training Strategies

The article explains how OpenAI’s o1 model leverages chain‑of‑thought prompting, dual‑system cognitive theory, and new scaling laws—pre‑training on code/math and post‑training reinforcement with step‑wise reward models—to achieve superior reasoning, safety, and performance over GPT‑4, heralding a shift toward models that learn to think.

Chain-of-Thought · LLM · Safety
0 likes · 42 min read
DataFunSummit
Sep 24, 2024 · Artificial Intelligence

Streaming Data Pipelines and Scaling Laws for Efficient Large‑Model Training

The article discusses the challenges of training ever‑larger AI models on internet‑scale data, critiques traditional batch ETL pipelines, and proposes a streaming data‑flow architecture with dynamic data selection and a shared‑memory/Alluxio middle layer to decouple data processing from model training, improving efficiency and scalability.

AI infrastructure · Large Models · data pipelines
0 likes · 20 min read
DataFunSummit
Sep 1, 2024 · Artificial Intelligence

Data Management in Large Language Model Training: Overview, Pre‑training, SFT, and Future Challenges

This article surveys data management for large language model training, covering an overview, pre‑training data composition, scaling‑law‑driven quantity control, quality filtering, deduplication, harmful‑content removal, instruction fine‑tuning strategies, dynamic data selection, and emerging research challenges such as bias mitigation, multimodal data handling, and synthetic‑data filtering.

Artificial Intelligence · Data quality · Instruction Fine-Tuning
0 likes · 18 min read
Xiaohongshu Tech REDtech
Jul 29, 2024 · Artificial Intelligence

Scaling Laws for Dense Retrieval: Empirical Study of Model Size, Training Data, and Annotation Quality

The award‑winning study shows that dense retrieval performance follows precise power‑law scaling with model size, training data quantity, and annotation quality, introduces contrast entropy for evaluation, validates joint scaling formulas on MS MARCO and T2Ranking, and uses cost models to guide budget‑optimal resource allocation.
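The joint power‑law form that such scaling studies typically fit can be sketched as follows; the symbols and exponents here are generic placeholders in the Chinchilla style, not the paper’s reported constants:

```latex
% Power-law scaling of a loss-like metric (e.g. contrast entropy)
% with model size N and annotated training data size D;
% A, B, \alpha, \beta are fitted, dataset-specific constants.
L(N, D) \approx \left(\frac{A}{N}\right)^{\alpha}
       + \left(\frac{B}{D}\right)^{\beta}
```

Given a cost model for parameters versus annotations, minimizing such a formula under a fixed budget is what yields the budget‑optimal allocations the study describes.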

annotation quality · contrast entropy · dense retrieval
0 likes · 13 min read
DataFunTalk
Mar 16, 2023 · Artificial Intelligence

Technical Optimizations and Breakthroughs of GPT‑4: Multimodal Capabilities, Alignment Strategies, and Predictable Scaling

The article summarizes the technical innovations behind GPT‑4, highlighting its multimodal abilities, improved alignment methods, scaling‑law‑based performance prediction, and remaining limitations, while referencing the official OpenAI technical report and community analyses.

AI research · GPT-4 · alignment
0 likes · 10 min read
Architect
Feb 9, 2023 · Artificial Intelligence

Emergent Abilities of Large Language Models: Complex Reasoning, Knowledge Reasoning, and Out‑of‑Distribution Robustness

This article reviews recent research on the emergent abilities of large language models—such as chain‑of‑thought reasoning, knowledge retrieval without external sources, and robustness to distribution shifts—examining scaling laws, model size thresholds, and the open questions surrounding a potential paradigm shift from fine‑tuning to in‑context learning.

AI research · Emergent Abilities · In-Context Learning
0 likes · 23 min read
DataFunTalk
Jan 10, 2023 · Artificial Intelligence

Paradigm Shifts in Large Language Model Research and Future Directions

The article reviews the evolution of large language models from the pre‑GPT‑3 era to the present, analyzes the conceptual and technical gaps between Chinese and global research, and outlines key future research directions such as scaling laws, prompting techniques, multimodal training, and efficient model architectures.

AI research · ChatGPT · In-Context Learning
0 likes · 73 min read
Model Perspective
Nov 21, 2022 · Fundamentals

Why Ants Defy Gravity: The Science of Surface Area vs Volume

The article explains how ants can lift objects many times their weight by leveraging their huge surface‑to‑volume ratio, contrasting this with human scaling, and explores how surface area influences air resistance, falling speed, and even the challenges ants face with water due to surface tension.
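The geometry behind this is the square‑cube law: for a body of characteristic length $L$,

```latex
\text{surface area} \propto L^{2}, \qquad
\text{volume (and weight)} \propto L^{3}, \qquad
\frac{\text{surface area}}{\text{volume}} \propto \frac{1}{L}
```

so shrinking $L$ a hundredfold multiplies the surface‑to‑volume ratio a hundredfold, which is why air resistance and surface tension dominate at ant scale.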

ants · biology · gravity
0 likes · 5 min read
Model Perspective
Oct 30, 2022 · Fundamentals

Why Giants Can’t Exist: The Physics of Scaling and Bone Strength

The article explains that when a human’s height is doubled, their volume and weight increase eightfold while bone strength only quadruples, leaving the legs unable to support the extra load; this scaling law is what prevents real giants from existing.
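The arithmetic in the summary follows directly from how weight and bone strength scale with height $h$:

```latex
\text{weight} \propto h^{3}
  \;\Rightarrow\; 2^{3} = 8\times, \qquad
\text{bone strength} \propto \text{cross-sectional area} \propto h^{2}
  \;\Rightarrow\; 2^{2} = 4\times
```

so each doubling of height doubles the load that every unit of bone cross‑section must carry.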

allometric scaling · biomechanics · physics of bodies
0 likes · 4 min read