Tagged articles

69 articles

Machine Learning Algorithms & Natural Language Processing

May 20, 2026 · Artificial Intelligence

How New LLM Architectures Like Gemma 4 and DeepSeek V4 Cut Long‑Context Costs

The article surveys recent open‑weight LLM releases—Gemma 4, Laguna XS.2, ZAYA1‑8B and DeepSeek V4—detailing how KV‑cache sharing, per‑layer embeddings, layer‑wise attention budgeting, compressed convolutional attention and manifold‑constrained hyper‑connections dramatically reduce memory and compute for ultra‑long contexts while preserving model quality.

Attention optimization · KV Cache · LLM

0 likes · 25 min read

Machine Heart

May 14, 2026 · Artificial Intelligence

How SenseNova U1’s Native Unified Architecture Lets a Small Model Beat Larger Ones

SenseNova U1 introduces the NEO‑Unify native unified architecture that eliminates separate vision encoders and VAEs, enabling simultaneous multimodal understanding, reasoning, and generation, and achieves state‑of‑the‑art benchmark scores that surpass larger proprietary models across vision‑language, reasoning, and generation tasks.

Model architecture · NEO-Unify · Open Source

0 likes · 19 min read

Machine Heart

Apr 13, 2026 · Artificial Intelligence

Embracing the Paradigm Shift: A Comprehensive Review of Large‑Model Latent Space

From early 2024 explorations to a 2026 research surge, this review explains how large‑model latent space replaces explicit token‑based processing, outlines its five analytical lenses—foundation, evolution, mechanism, ability, outlook—compares representational properties, details architectural and computational strategies, enumerates new capabilities, and discusses remaining challenges and future directions.

Artificial Intelligence · Large Models · Latent Space

0 likes · 20 min read

SuanNi

Mar 18, 2026 · Artificial Intelligence

Explore the LLM Architecture Gallery: Visualizing Seven Years of Model Evolution

The LLM Architecture Gallery, created by Sebastian Raschka, offers an interactive visual compendium of open‑weight large language models from 2019 to 2026, detailing their core parameters, architectural innovations, and the broader trends shaping modern AI research.

AI · Artificial Intelligence · LLM

0 likes · 8 min read

PaperAgent

Mar 17, 2026 · Artificial Intelligence

Can Attention Replace Fixed Residuals? Inside the ‘Attention Residuals’ Breakthrough

This article analyzes the newly released Attention Residuals paper, explaining how learnable attention weighting replaces fixed residual addition to mitigate information dilution in deep LLMs, detailing the proposed Block AttnRes design, engineering trade‑offs, experimental results, and its significance for foundational model architecture.

Block Attention · LLM · Model architecture

0 likes · 9 min read

AI Explorer

Mar 7, 2026 · Artificial Intelligence

SenseTime’s Multimodal Model Skips the Encoder, Boosting Performance and Shifting AI Design Paradigms

SenseTime eliminates the intermediate encoder in multimodal AI models, allowing direct cross‑modal learning, which yields markedly higher performance at 2‑trillion‑parameter scale while reducing training cost, and may trigger a broader industry move toward simpler, more efficient architectures.

AI Paradigm Shift · Efficiency · Large Models

0 likes · 6 min read

AI Explorer

Mar 5, 2026 · Artificial Intelligence

Can a Thousand Hours of Data Spark True AI Emergence?

An AI startup claims that training with only a thousand hours of data produced emergent intelligence and outperformed industry leaders in benchmark tests, prompting a debate over whether this represents a paradigm shift in efficient learning or an overhyped breakthrough requiring further validation.

AI · Model architecture · benchmark

0 likes · 5 min read

PaperAgent

Feb 15, 2026 · Artificial Intelligence

How MiniCPM‑SALA Merges Sparse and Linear Attention to Break Long‑Context Limits

MiniCPM‑SALA introduces a hybrid sparse‑linear attention architecture that reduces quadratic compute and memory costs, achieves state‑of‑the‑art performance on long‑context benchmarks, and delivers up to 3.5× faster inference than full‑attention models on sequences up to 1 million tokens.

LLM · Linear Attention · Model architecture

0 likes · 17 min read

AI Cyberspace

Feb 15, 2026 · Artificial Intelligence

From GPT-1 to GPT-4o: A Deep Dive into the Evolution of Large Language Models

This article chronicles the rapid progression of GPT models from the 2018 GPT‑1 pre‑training breakthrough through GPT‑2’s multitask learning, GPT‑3’s scaling laws and few‑shot abilities, to GPT‑4’s multimodal capabilities and the 2024 GPT‑4 Turbo, Sora, and GPT‑4o releases, while also explaining core LLM abilities and the decoder‑only architecture of GPT‑2.

AI evolution · GPT · Model architecture

0 likes · 20 min read

AI Frontier Lectures

Jan 30, 2026 · Artificial Intelligence

Inside MOVA: Open-Source End-to-End Audio-Video Generation

OpenMOSS and MOSI unveiled MOVA, China’s first high‑performance open‑source audio‑video generation model, detailing its dual‑tower architecture, bridge module, aligned RoPE, multi‑stage data pipeline, training strategies, dual CFG guidance, and benchmark results that surpass leading closed‑source systems.

MOVA · Model architecture · audio-video generation

0 likes · 20 min read

Tencent Technical Engineering

Nov 10, 2025 · Artificial Intelligence

How Large Language Models Evolved in 2025: From DeepSeek to Kimi‑K2 and Beyond

This article maps the rapid evolution of open‑source large language models in 2025, explains the underlying architectural breakthroughs such as MLA, MoE, and NSA, compares dozens of models—including DeepSeek‑V3, OLMo2, Gemma3, Llama4, Qwen3, and Kimi‑K2—and highlights the emergence of powerful AI assistants like Dola, providing developers with a concise technical roadmap.

AI assistant · LLM efficiency · Mixture of Experts

0 likes · 44 min read

Baobao Algorithm Notes

Nov 3, 2025 · Artificial Intelligence

Inside Kimi Linear: How Aggressive MoE Sparsity and Hybrid Linear Attention Boost a 3B‑Scale LLM

The author details Kimi Linear's architecture, training challenges, aggressive MoE sparsity, hybrid linear attention design, benchmark gains, and post‑training insights, offering a transparent technical review of this 48B‑parameter MoE LLM (about 3B parameters activated per token) trained on 5.7T tokens.

Hybrid Model · Kimi Linear · LLM

0 likes · 9 min read

DataFunTalk

Oct 29, 2025 · Artificial Intelligence

Voice Agents Transform Gaming & Insurance: Real‑World Lessons from Silicon Valley

At a Silicon Valley tech conference, Mu Shen shared how voice agents (real‑time, task‑oriented AI) were applied to an open‑world game as an AI NPC and to a Fortune‑500 insurer as an AI tele‑salesperson, revealing technical challenges, model architectures, training strategies, evaluation methods, and key lessons for future deployments.

Model architecture · game AI · insurance automation

0 likes · 19 min read

AI2ML AI to Machine Learning

Oct 20, 2025 · Artificial Intelligence

nanochat Source Code Deep Dive: Data Prep, Model Design, Training & Evaluation

This article revisits nanochat's core components, detailing the preparation of diverse training datasets, the scaling calculations for tokens and parameters, the model's MQA and KV‑cache design, the full training pipeline with gradient accumulation and mixed‑precision, cost breakdown, inference optimizations, evaluation tasks, and identified limitations with suggested improvements.

KV Cache · LLM · MQA

0 likes · 9 min read

AIWalker

Sep 24, 2025 · Artificial Intelligence

Top 2025 Object Detection Research Paths: From Grounding DINO 1.5 to Open‑Set Breakthroughs

The article outlines four key innovation avenues—architecture redesign, task expansion, information fusion, and paradigm shift—highlighting recent works such as Mr. DETR, Grounding DINO 1.5, SM3Det, and RoboFusion, and offers a curated list of 176 cutting‑edge object‑detection papers with code and datasets for free.

Model architecture · deep learning · object detection

0 likes · 8 min read

Architect

Sep 16, 2025 · Artificial Intelligence

Why Transformers Outperform RNNs: A Beginner’s Guide to Attention and Architecture

This article introduces the Transformer architecture, explaining its attention mechanism, encoder‑decoder design, training and inference processes, and why it surpasses RNN‑based models, while also covering common applications and variations in natural language processing.

Model architecture · NLP · Transformer

0 likes · 13 min read

Data Party THU

Sep 10, 2025 · Industry Insights

MoE vs MoR: Deep Dive into Expert and Recursive Mixture Architectures for LLMs

This article provides a comprehensive technical comparison between Mixture of Experts (MoE) and the newly proposed Mixture of Recursion (MoR) architectures, covering design principles, parameter efficiency, inference latency, training stability, routing mechanisms, hardware deployment considerations, and suitable application scenarios.

Hardware Deployment · Inference Performance · Mixture of Experts

0 likes · 13 min read

AI Frontier Lectures

Sep 9, 2025 · Artificial Intelligence

Can UniConvNet Expand Receptive Fields While Preserving Gaussian Distribution?

The paper introduces UniConvNet, a novel convolutional architecture that expands the effective receptive field (ERF) of ConvNets without breaking the asymptotically Gaussian distribution (AGD), achieving superior accuracy‑parameter and accuracy‑FLOPs trade‑offs across image classification, detection, and segmentation benchmarks.

Effective Receptive Field · Image Classification · Model architecture

0 likes · 9 min read

Baobao Algorithm Notes

Sep 2, 2025 · Artificial Intelligence

How LongCat‑Flash Achieves Record Speed and Efficiency for a 560B MoE Model

LongCat‑Flash is a 560‑billion‑parameter Mixture‑of‑Experts LLM that combines a dynamic zero‑computation expert design, shortcut‑connected MoE communication, variance‑aligned scaling, and a three‑stage agent‑centric pre‑training pipeline, delivering over 100 TPS on H800 GPUs at a cost of $0.70 per million tokens.

Artificial Intelligence · Inference Optimization · Large Language Model

0 likes · 23 min read

Java Tech Enthusiast

Sep 1, 2025 · Artificial Intelligence

How Meituan’s LongCat‑Flash‑Chat Beats Top LLMs with Zero‑Computation Experts

LongCat‑Flash‑Chat, Meituan’s newly open‑sourced 560B MoE model, outperforms leading LLMs on agent tool use and instruction following benchmarks, introduces zero‑computation experts and shortcut‑connected MoE for higher throughput, and demonstrates strong programming and reasoning abilities across diverse evaluation tasks.

Large Language Model · Meituan AI · Model architecture

0 likes · 12 min read

Qborfy AI

Aug 8, 2025 · Artificial Intelligence

Why Transformers Revolutionized AI: A Deep Dive into Self‑Attention

This article explains how the Transformer model replaces sequential RNN processing with parallel self‑attention, detailing its core components, positional encoding, encoder‑decoder workflow, industry impact, and surprising facts such as training speed gains and energy efficiency.

AI · Model architecture · Self-Attention

0 likes · 5 min read

Baobao Algorithm Notes

Aug 4, 2025 · Artificial Intelligence

Why GPT‑OSS Chooses a 64‑Dimensional Attention Head and 2880 Hidden Size

This article analyzes the surprising design choices of the rumored GPT‑OSS 120B model, explaining the rationale behind a 64‑dimensional attention head, the equal hidden and intermediate sizes, and other quirky parameters such as MLP bias and KV‑sink SWA, backed by theoretical formulas and empirical benchmarks.

Attention Head · GPT-OSS · MLP Ratio

0 likes · 13 min read

AI Frontier Lectures

Jul 31, 2025 · Artificial Intelligence

What’s Driving the Latest LLM Architecture Trends? DeepSeek, OLMo, Gemma, and More Explained

This article examines the evolution of large language model architectures over the past seven years, comparing key design choices such as Multi‑Head Latent Attention, Grouped‑Query Attention, Mixture‑of‑Experts, sliding‑window attention, normalization placement, and optimizer variants across models like DeepSeek V3, OLMo 2, Gemma 3, Llama 4, Qwen 3, SmolLM 3, and Kimi 2.

AI research · LLM comparison · Mixture of Experts

0 likes · 30 min read

Alibaba Cloud Developer

Jul 24, 2025 · Artificial Intelligence

Unlocking AI Model Choices: From CNNs to Transformers and Fine‑Tuning Strategies

This comprehensive guide walks you through the evolution of AI model architectures—from CNNs and RNNs to Transformers and GANs—explaining their core concepts, typical use cases, and how to select, train, and fine‑tune pre‑trained models using practical code examples.

AI · Model architecture · Python

0 likes · 25 min read

DataFunTalk

Jul 16, 2025 · Artificial Intelligence

MiniMax-M1 Revealed: Hybrid Attention, RL Training, and 1M Token Context

MiniMax’s latest M1 model, unveiled after a $300 million funding round, showcases a 456‑billion‑parameter hybrid Mixture‑of‑Experts architecture with lightning attention, supporting a context of up to one million tokens, and leverages reinforcement‑learning techniques to enhance long‑context handling, inference efficiency, and system‑2 reasoning capabilities.

AI scaling · Hybrid Attention · Model architecture

0 likes · 16 min read

AI Frontier Lectures

Jul 11, 2025 · Artificial Intelligence

How Llama Evolved: From Llama‑1 to Llama‑3 – Architecture, Data, and Performance Insights

This article provides a comprehensive technical analysis of Meta's Llama series, tracing the evolution from Llama‑1 through Llama‑2 to Llama‑3, detailing model architectures, training data pipelines, optimization methods, benchmark results, and the broader impact on the open‑source AI community.

AI research · LLaMA · Model architecture

0 likes · 25 min read

Qborfy AI

Jul 1, 2025 · Artificial Intelligence

Why CNNs Outperform Fully Connected Networks: A Deep Dive into Architecture and Applications

This article explains the fundamentals of convolutional neural networks (CNNs), detailing their definition, advantages over fully connected networks, architectural components such as input, hidden, and output layers, key operations like convolution, pooling, and activation, and showcases practical applications and notable insights.

Artificial Intelligence · CNN · Model architecture

0 likes · 5 min read

ITFLY8 Architecture Home

Jun 10, 2025 · Artificial Intelligence

DeepSeek Evolution: Technical Highlights, Architecture, and Performance Explained

This article examines DeepSeek’s various versions, detailing their core modules, underlying principles, architectural diagrams, and performance metrics, offering practical guidance for enthusiasts, professionals, and practitioners while inspiring further exploration of artificial intelligence innovations.

Artificial Intelligence · DeepSeek · Model architecture

0 likes · 2 min read

IT Services Circle

May 25, 2025 · Artificial Intelligence

DeepSeek Core Technologies and Model Innovations: DeepSeek‑V3 and DeepSeek‑R1 Technical Overview

The article provides a detailed technical overview of DeepSeek's flagship large language models, DeepSeek‑V3 and DeepSeek‑R1, describing their MoE architecture, training frameworks, reinforcement‑learning based fine‑tuning, inference optimizations, and the broader impact of these innovations on the AI landscape while also promoting related books and resources.

AI · DeepSeek · Large Language Model

0 likes · 10 min read

Tencent Technical Engineering

May 12, 2025 · Artificial Intelligence

Comprehensive Summary and Expansion of Andrej Karpathy’s 7‑Hour LLM Lecture

This article provides a detailed Chinese‑to‑English summary of Andrej Karpathy’s 7‑hour LLM tutorial, covering chat process analysis, tokenization, pre‑training data pipelines, model architecture, training strategies, post‑training fine‑tuning, reinforcement learning, chain‑of‑thought reasoning, and current industry applications.

AI · LLM · Model architecture

0 likes · 25 min read

AI Frontier Lectures

May 10, 2025 · Artificial Intelligence

Can the ‘Canon’ Layer Unlock New Limits in Large Language Models?

A new study introduces the lightweight “Canon” layer for large language models, showing how it improves information flow, inference depth, and scalability across Transformers, linear attention, and state‑space architectures, while offering a controlled synthetic pre‑training benchmark for deeper architectural analysis.

AI research · Mamba · Model architecture

0 likes · 11 min read

AI2ML AI to Machine Learning

Apr 17, 2025 · Artificial Intelligence

Inside Qwen: A Deep Dive into the Large Model’s Source Code

The article provides a comprehensive technical walkthrough of Qwen’s large‑model series, covering data preparation, tokenization, model tweaks, training settings, RLHF pipeline, Code‑Qwen specifics, Qwen2 and Qwen3 architectural changes, scaling‑law experiments, and detailed source‑code analysis with illustrative diagrams.

Large Language Model · MoE · Model architecture

0 likes · 7 min read

Alibaba Cloud Developer

Mar 26, 2025 · Artificial Intelligence

Why DeepSeek Is Shaking Up the LLM Landscape: Architecture, Performance, and Cost

DeepSeek, a Chinese AI startup, offers open‑source large language models—DeepSeek‑V3 for general tasks and DeepSeek‑R1 for intensive reasoning—featuring MoE, MLA, low‑cost training, and competitive performance against OpenAI’s GPT‑4o, while providing detailed usage guidance and cost analysis.

AI inference · DeepSeek · Model architecture

0 likes · 21 min read

Architect

Mar 10, 2025 · Artificial Intelligence

What Makes DeepSeek’s New Architecture a Game‑Changer? Inside MLA, GRPO, and MoE Innovations

This article analyzes DeepSeek’s latest large‑model breakthroughs, covering the MLA attention compression, GRPO alignment algorithm, MoE load‑balancing redesign, multi‑stage training pipelines, reinforcement‑learning tricks, and performance comparisons with GPT‑4o‑Mini and Llama 3.1, highlighting both strengths and remaining challenges.

AI training · DeepSeek · GRPO

0 likes · 19 min read

Alibaba Cloud Developer

Feb 28, 2025 · Artificial Intelligence

How DeepSeek’s RL‑Powered Time Scaling Is Redefining AI Model Training

DeepSeek’s rapid rise is examined through its RL‑based Time Scaling paradigm, cost‑effective architecture, innovative training pipeline, open‑source strategy, and security challenges, highlighting how these breakthroughs disrupt traditional AI model development, lower resource demands, and influence industry dynamics.

AI model training · DeepSeek · Model architecture

0 likes · 13 min read

Data Thinking Notes

Feb 19, 2025 · Artificial Intelligence

DeepSeek Evolution: Key Technical Highlights from V1 to R1

This article examines DeepSeek’s various versions, detailing their core modules, underlying principles, architecture diagrams, and performance metrics, while illustrating the internal logic and advantages of each model to guide enthusiasts, professionals, and practitioners toward deeper AI innovation insights.

AI · DeepSeek · Model architecture

0 likes · 4 min read

Architects' Tech Alliance

Feb 12, 2025 · Industry Insights

DeepSeek’s Technical Innovations: MoE Architecture, Efficient Inference, and Multimodal Capabilities

The article analyzes DeepSeek’s recent breakthroughs—including its Mixture‑of‑Experts architecture, cost‑effective inference optimizations, high‑accuracy multimodal processing, and open‑source collaboration—while also offering a curated bundle of technical e‑books covering AI chips, networking, storage, and more.

Artificial Intelligence · DeepSeek · Industry Insights

0 likes · 4 min read

AI Algorithm Path

Feb 9, 2025 · Artificial Intelligence

Understanding Multi-Token Prediction in DeepSeek‑R1 Architecture

This article dissects the Multi‑Token Prediction (MTP) technique used in DeepSeek‑R1, contrasting it with traditional next‑token prediction, detailing Meta’s MTP design, DeepSeek’s adapted architecture, loss weighting, and why MTP is applied only during training to boost efficiency and model capability.

DeepSeek · MTP · Model architecture

0 likes · 9 min read

JavaEdge

Feb 8, 2025 · Artificial Intelligence

Why DeepSeek R1 Rivals ChatGPT o1: Architecture, Training, and Cost Insights

This article provides a detailed technical analysis of DeepSeek's R1 large language model, covering its background, architecture, training methods, hardware optimizations, performance claims, user impressions, deployment options, and the challenges of reproducing its results.

AI training · DeepSeek · GPU Cost

0 likes · 16 min read

NewBeeNLP

Jan 17, 2025 · Artificial Intelligence

Unlocking Multimodal Intelligence: A Deep Dive into Next Token Prediction

This comprehensive survey examines the foundations, tokenization techniques, model architectures, training paradigms, evaluation benchmarks, and open challenges of multimodal next‑token prediction (MMNTP), offering researchers a clear roadmap for future advances in multimodal AI.

Model architecture · Next Token Prediction · Training Paradigms

0 likes · 9 min read

DataFunSummit

Dec 17, 2024 · Artificial Intelligence

Exploring Baidu PaddlePaddle's Multimodal Large Model Innovations and the PaddleMIX Development Kit

This article presents Baidu's latest advances in multimodal large models, detailing their capabilities, architectural evolution, real‑world applications, and the open‑source PaddleMIX toolkit that streamlines data processing, training, fine‑tuning, and high‑performance inference for developers.

AI applications · Model architecture · PaddleMIX

0 likes · 20 min read

Baobao Algorithm Notes

Nov 14, 2024 · Artificial Intelligence

How I Built a 1B‑Parameter Chinese LLM on a Single A100: Lessons Learned

This article details the end‑to‑end process of pre‑training, fine‑tuning, and evaluating a 1‑billion‑parameter Chinese LLM named Steel‑LLM on limited hardware, covering data collection, pipeline design, training framework choices, architectural tweaks, performance results, and practical lessons for resource‑constrained developers.

LLM · Model architecture · Training Optimization

0 likes · 18 min read

Zhuanzhuan Tech

Nov 6, 2024 · Artificial Intelligence

Multi-Task Learning for E-commerce Search: Overview, Practices, and Model Design in the Zhuanzhuan Scenario

This article reviews the necessity, benefits, and practical implementations of multi-task learning in e‑commerce search, detailing model selection, architecture extensions such as ESMM and ESM², and future directions for handling user behavior sequences and multi‑objective optimization.

ESMM · Model architecture · Recommendation Systems

0 likes · 13 min read

DataFunSummit

Nov 1, 2024 · Artificial Intelligence

Progress in Multimodal Large Language Models: Background, Architecture, Evolution, Team Work, and Future Outlook

This article reviews recent advances in multimodal large language models, covering their background, architectural components, training strategies, application scenarios, evaluation benchmarks, team research on hallucination mitigation and long‑video understanding, and outlines promising future research directions.

Model architecture · evaluation benchmarks · future research

0 likes · 15 min read

DataFunSummit

Oct 28, 2024 · Artificial Intelligence

Exploration and Practice of Multimodal Large Models at 360

This article presents 360's comprehensive exploration of image‑text multimodal large models, covering background concepts, research routes, three generations of model development, proprietary architectures like SEEChat, 360VL and Inner‑Adaptor, and real‑world AI applications across various products and services.

AI applications · Model architecture · vision-language

0 likes · 19 min read

NewBeeNLP

Oct 21, 2024 · Artificial Intelligence

Why Do MoE Experts Collapse? An In‑Depth Look at HOME’s Multi‑Task Architecture

This article analyzes the polarization issues in industrial Mixture‑of‑Experts (MoE) frameworks, explains expert collapse, degradation, and under‑fitting, and details the HOME model’s input types, architectural innovations, normalization, gating mechanisms, and related DICE‑BN insights.

Expert Normalization · Gating Mechanisms · Mixture of Experts

0 likes · 10 min read

Architect

Sep 26, 2024 · Artificial Intelligence

Decoding OpenAI o1: How RL‑LLM Fusion Powers Next‑Gen Reasoning

This article provides a detailed technical analysis of OpenAI’s o1 model, exploring its enhanced logical reasoning, the likely use of reinforcement learning with hidden chain‑of‑thought generation, multi‑model architecture, training data pipelines, reward modeling, and how these innovations could reshape AI safety and scaling strategies.

AI safety · LLM · Model architecture

0 likes · 43 min read

DataFunTalk

Aug 7, 2024 · Artificial Intelligence

Multi-Scenario Modeling for NetEase Cloud Music Recommendation: Architecture, Challenges, and Results

This article presents NetEase Cloud Music's multi‑scenario recommendation modeling work, detailing background, overall system architecture, key modules, modeling goals, technical difficulties, performance improvements, future outlook, and a comprehensive Q&A session that addresses practical deployment challenges.

AB testing · AI · Machine Learning

0 likes · 14 min read

Baobao Algorithm Notes

Jul 25, 2024 · Artificial Intelligence

Why LLaMA 3 405B Matches GPT‑4o: Architecture, Training, and Industry Impact

The article provides an in‑depth analysis of LLaMA 3 405B, covering its dense Transformer architecture, three‑stage pre‑training (initial, long‑context, annealing), iterative post‑training with RM‑guided rejection sampling, the decision not to use an MoE design, and the broader implications for both large and small model development.

405B · Model architecture · model distillation

0 likes · 17 min read

Baobao Algorithm Notes

Jul 24, 2024 · Artificial Intelligence

What Powers Meta’s Llama 3 405B? Inside the Architecture, Scaling Laws, and Massive Training Infrastructure

This article dissects Meta’s Llama 3 405‑billion‑parameter model, covering its dense Transformer design, data‑mixing strategy, two‑stage scaling‑law prediction, 4‑D parallelism, custom hardware clusters, training schedules, post‑training alignment methods, and the extensive evaluation results that benchmark it against leading LLMs.

AI infrastructure · Llama 3 · Model architecture

0 likes · 56 min read

Baobao Algorithm Notes

Jun 28, 2024 · Artificial Intelligence

What Makes Gemma 2 a Competitive Open‑Source LLM? Architecture, Training, and Evaluation Insights

The article provides a detailed technical overview of Gemma 2, covering its decoder‑only transformer design, novel attention mechanisms, logit soft‑capping, RMSNorm, knowledge‑distillation training on trillions of tokens, extensive pre‑training infrastructure, and benchmark evaluations that demonstrate its competitiveness against larger proprietary models.

AI · Gemma 2 · Model architecture

0 likes · 14 min read

NewBeeNLP

Jun 7, 2024 · Artificial Intelligence

Scaling Laws, Synthetic Data, and New Model Architectures: What’s Next?

In a recent round‑table, experts debated the validity of scaling laws, the role of synthetic and semi‑synthetic data in overcoming data scarcity, explored alternatives to Transformers such as RNN‑based models and MoE, and examined techniques for handling long‑context inference efficiently.

Mixture of Experts · Model architecture · scaling laws

0 likes · 12 min read

Sohu Tech Products

Apr 24, 2024 · Artificial Intelligence

Evolution, Architecture, Training Data, Methods, and Performance of Meta's Llama Series (Llama 1, 2, 3)

Meta's Llama series has progressed from the 7‑65B Llama‑1 in early 2023 to the 8B and 70B Llama‑3 in 2024, scaling token counts from 1 T to over 15 T, adopting decoder‑only Transformers with RMSNorm, SwiGLU, RoPE and GQA, and adding supervised fine‑tuning, RLHF and DPO, resulting in state‑of‑the‑art benchmark performance and a vibrant open‑source ecosystem.

AI · LLaMA · Model architecture

0 likes · 25 min read

NewBeeNLP

Mar 27, 2024 · Artificial Intelligence

Deep Dive into Llama 2: Architecture, Pre‑training, SFT, and Safety Insights

This article provides a comprehensive technical overview of Meta's Llama 2 series, covering its architectural upgrades such as Grouped‑Query Attention, the pre‑training dataset and hyper‑parameters, loss behavior, benchmark comparisons, and the supervised fine‑tuning pipeline with safety considerations.

AI · Llama-2 · Model architecture

0 likes · 11 min read

21CTO

Mar 18, 2024 · Artificial Intelligence

Inside Grok-1: Elon Musk’s Open‑Source 314B LLM Architecture Revealed

Elon Musk’s AI startup xAI has open‑sourced its 314‑billion‑parameter Grok‑1 model. The article details its Rust‑based, JAX‑powered architecture, training data limits, licensing terms, hardware requirements, and community reactions, and considers what open access to a competitive large language model means for developers.

AI · Grok-1 · JAX

0 likes · 9 min read

Bilibili Tech

Mar 1, 2024 · Artificial Intelligence

Bilibili's Self-Developed Video Super-Resolution Algorithm: Background, Optimization Directions, and Implementation Details

Bilibili’s self‑supervised video super‑resolution system upgrades low‑resolution streams to 4K by using three parallel degradation‑branch networks—texture‑enhancing, line‑recovering, and noise‑removing—tailored to anime, game, and real‑world content, delivering sharper edges, finer textures, and measurable quality gains across its online playback pipeline.

AI · Bilibili · Model architecture

0 likes · 16 min read

DataFunSummit

Jan 15, 2024 · Artificial Intelligence

Financial Large Language Model: Characteristics, Construction, Architecture, and Practical Applications

This article presents a comprehensive overview of financial large language models, covering their unique characteristics, construction methods, layered technical architecture, evaluation strategies, and real‑world use cases such as quality inspection, AIGC‑driven material generation, sales‑lead mining, and knowledge‑graph‑enhanced intelligent Q&A.

Financial AI · Model architecture · data engineering

0 likes · 14 min read

Sohu Tech Products

Dec 27, 2023 · Artificial Intelligence

Analysis of LLaMA Model Architecture in the Transformers Library

This article walks through the core LLaMA implementation in HuggingFace’s Transformers library, detailing the inheritance hierarchy, configuration defaults, model initialization, embedding and stacked decoder layers, the RMSNorm‑based attention and MLP modules, and the forward pass that produces normalized hidden states.

Artificial Intelligence · Model architecture · PyTorch

0 likes · 14 min read

Huawei Cloud Developer Alliance

Dec 14, 2023 · Artificial Intelligence

Unlocking LLaMA: Key Innovations, Architecture Insights, and MindSpore Inference Guide

This article reviews the LLaMA large‑language‑model series, covering its background, architectural innovations such as Add&Norm, SwiGLU, and RoPE, a known reversal‑curse bug, and provides step‑by‑step MindSpore Transformers code for model configuration, inference, and pipeline usage while previewing the upcoming LLaMA‑2 session.

AI · LLaMA · MindSpore

0 likes · 6 min read

Rare Earth Juejin Tech Community

May 5, 2023 · Artificial Intelligence

Limitations of Generative Pre‑trained Transformers: Hallucinations, Memory, Planning, and Architectural Proposals

The article critically examines GPT‑4 and similar transformer models, highlighting persistent hallucinations, outdated knowledge, insufficient domain coverage, lack of planning and memory, and proposes architectural extensions inspired by fast‑slow thinking and differentiable modules to overcome these fundamental constraints.

AI limitations · GPT-4 · Model architecture

0 likes · 24 min read

DataFunTalk

Mar 6, 2023 · Artificial Intelligence

Explainable Recommendation Algorithms at Alibaba Health: System Design, Feature Engineering, and Experimental Results

This article presents Alibaba Health's exploration of explainable recommendation algorithms, covering business context, data preparation, feature extraction and encoding, model architecture combining selection and prediction components, experimental offline and online results, and a detailed Q&A on implementation challenges and future directions.

AI · Alibaba Health · Model architecture

0 likes · 12 min read

DataFunTalk

Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Model architecture · Multimodal · computer vision

0 likes · 16 min read

Zhuanzhuan Tech

Aug 17, 2022 · Artificial Intelligence

Designing a Scalable Image Classification System for Prohibited Item Detection in a Second‑hand E‑commerce Platform

This article describes how a second‑hand e‑commerce company built a fast, modular image‑classification pipeline using small binary classifiers, EfficientNet‑B0, and active‑learning‑driven data annotation to detect prohibited items while keeping inference latency under 200 ms and substantially reducing labeling costs.

AI · Image Classification · Model architecture

0 likes · 10 min read

DataFunTalk

Aug 16, 2021 · Artificial Intelligence

Intelligent Risk Control in Live Streaming: Architecture, Challenges, and Model Evolution at Douyu

This article presents Douyu's intelligent risk‑control system for live streaming, detailing the operational, activity, traffic, account, transaction and content safety challenges, the multi‑layer algorithm architecture, and the evolution of models for spam detection, risk scoring, gang identification, behavior sequencing, device fingerprinting, and interpretability.

Artificial Intelligence · Machine Learning · Model architecture

0 likes · 13 min read

JD Tech Talk

Sep 17, 2020 · Artificial Intelligence

Federated Transfer Learning: Concepts, Examples, and Model Structures

This article introduces the fundamentals of transfer learning and federated transfer learning, explains domain adaptation for sentiment analysis, presents two illustrative examples—mid-level image feature transfer and text-to-image transfer—and outlines the model architecture and loss functions of federated transfer learning frameworks.

Model architecture · Sentiment Analysis · domain adaptation

0 likes · 14 min read

iQIYI Technical Product Team

Nov 22, 2019 · Artificial Intelligence

Analysis of ICCV 2019 Lightweight Face Recognition Challenge Champion Solutions

The ICCV 2019 Lightweight Face Recognition Challenge attracted 292 teams and defined four strict FLOP‑ and size‑limited protocols for image and video recognition, with champions employing near‑30 GFLOP EfficientNet‑style backbones, novel loss functions, frame‑fusion, and knowledge‑distilled VarGNet models to balance accuracy and computational budget.

ICCV Challenge · Lightweight Face Recognition · Model architecture

0 likes · 8 min read

Alibaba Cloud Developer

Aug 19, 2019 · Artificial Intelligence

How RE2 Boosts FAQ Chatbot Accuracy: A Deep Dive into Text Matching Models

This article explains the design and evaluation of RE2, a lightweight yet expressive text‑matching framework for FAQ‑style chatbots, detailing its five‑layer architecture, block‑wise residual connections, experimental results on SNLI, MultiNLI, SciTail, Quora and WikiQA datasets, and its significant performance improvements in Alibaba’s DingXiaoMi service.

FAQ chatbot · Model architecture · NLP

0 likes · 13 min read

360 Tech Engineering

May 21, 2019 · Artificial Intelligence

Understanding Residual Networks: Ideas, Mechanisms, Variants, and Insights

This article reviews the concept of residual networks, explains their working principle and data‑flow interpretation, discusses why they improve deep models, analyzes path‑length effects on gradients, and surveys various residual block designs and practical takeaways.

Model architecture · ResNet · ensemble

0 likes · 9 min read

Hulu Beijing

Mar 7, 2019 · Artificial Intelligence

From AlexNet to ResNeXt: Key Milestones in CNN Evolution

This article traces the evolution of convolutional neural networks from the pioneering AlexNet through VGG, Inception, ResNet, Inception‑v4, Inception‑ResNet and ResNeXt, highlighting architectural innovations, performance gains, and the underlying biological inspirations that shaped modern deep learning models.

AlexNet · CNN · Inception

0 likes · 13 min read
