Tagged articles
179 articles
Machine Heart
May 30, 2026 · Artificial Intelligence

How Apple’s AI‑Powered PICO Codec Cuts Image Files to One‑Third While Preserving Quality

Apple’s new PICO perceptual image codec, detailed in the paper “What Matters in Practical Learned Image Compression”, combines a one‑shot context model, TextFidelityLoss, and TilingArtifactLoss to produce files 70%‑80% smaller than those from AV1, VVC, JPEG AI, and other learned codecs while running in real time on an iPhone 17 Pro Max, though it still lags on traditional metrics such as PSNR.

AI, JPEG AI, PICO
0 likes · 10 min read
AsiaInfo Technology: New Tech Exploration
May 12, 2026 · Artificial Intelligence

Silicon Brain: Neural Connections, Symbolic Reasoning, and Reinforcement Learning in AGI

This article analyses DeepMind’s three‑pronged AGI paradigm—combining neural networks, symbolic systems, and reinforcement learning—by dissecting AlphaGo, AlphaFold 2, Gemini, and the Genie‑Sima loop, mapping the biological inspiration, outlining engineering and safety challenges, and proposing research directions for large‑scale deployment in communication scenarios.

AGI, DeepMind, Engineering Challenges
0 likes · 21 min read
AgentGuide
Apr 26, 2026 · Artificial Intelligence

Can You Explain Large Model Training Without Complex Formulas? A Simple, Clear Guide

This article breaks down the fundamentals of large model training—covering data, parameters, neural networks, loss functions, gradient descent, pre‑training, and fine‑tuning—in plain language so readers can grasp how massive models learn without needing to dive into complex mathematics.

fine-tuning, gradient descent, loss function
0 likes · 12 min read
Machine Heart
Apr 26, 2026 · Artificial Intelligence

Has Deep Learning Discovered Its Own “Newton’s Law”?

A new collaborative paper titled “There Will Be a Scientific Theory of Deep Learning” proposes a unified “Learning Mechanics” framework that connects solvable idealized models, tractable limits, empirical scaling laws, hyperparameter theory, and universal representation behavior, aiming to give deep learning a first‑principles scientific foundation.

deep learning, hyperparameters, learning mechanics
0 likes · 14 min read
SuanNi
Apr 10, 2026 · Artificial Intelligence

Can Neural Networks Replace Traditional CPUs? Inside the New Neural Computer

A groundbreaking study shows how Meta AI and KAUST transformed a video‑generation model into a neural‑computer that unifies computation, storage, and I/O, enabling pixel‑perfect command‑line and graphical UI control while highlighting current limitations in arithmetic reasoning and long‑term program stability.

AI video generation, Human‑computer interaction, Machine Learning
0 likes · 9 min read
Machine Learning Algorithms & Natural Language Processing
Mar 20, 2026 · Artificial Intelligence

Why Kimi Dropped Residual Connections: A First‑Person Deep Dive into Attention Residuals

This article explains how Attention Residuals (AttnRes) replace traditional residual shortcuts with layer‑wise attention, details the mathematical reformulation, design constraints, static‑Q trick, full and block variants, and presents experimental evidence of significant accuracy gains with modest overhead.

Efficient Attention, NLP, RMSNorm
0 likes · 11 min read
DeepHub IMBA
Feb 28, 2026 · Artificial Intelligence

Why Energy‑Based Models Could Outperform Probabilistic LLMs, According to Yann LeCun

Yann LeCun argues that the probability‑driven, token‑by‑token design of current large language models may never reach human‑level intelligence, and explains how Energy‑Based Models replace probability distributions with an energy function, offering more flexible training, inference, and multi‑modal capabilities.

Contrastive Divergence, Density Estimation, EBM
0 likes · 23 min read
Data Party THU
Feb 28, 2026 · Artificial Intelligence

How MIT’s Attention Matching Turns Linear Regression into Fast KV Compression

The article explains MIT’s Attention Matching technique that reformulates large‑model context compression as a linear regression problem, detailing its theoretical foundations, three‑step gradient‑free implementation, architectural adaptations, non‑uniform budgeting, and extensive evaluations showing orders‑of‑magnitude speed gains with minimal accuracy loss.

Attention Matching, KV compression, Memory Optimization
0 likes · 10 min read
Data Party THU
Feb 21, 2026 · Artificial Intelligence

Unlocking Compositional Generalization: Meta‑Learning Strategies for Neural Networks

This article examines how meta‑learning combined with compositionality enables neural networks to rapidly adapt to new tasks by formalizing hierarchical optimization, leveraging modular architectures with hypernetworks, and exploiting Transformer latent codes for effective compositional generalization.

Bilevel Optimization, Meta Learning, Transformer
0 likes · 5 min read
Tencent Technical Engineering
Feb 2, 2026 · Artificial Intelligence

Why Neural Networks Are the Hidden Engine Behind Modern AI: From Basics to Large Language Models

This comprehensive guide walks through the fundamentals of neural networks, activation functions, training methods, and how they power large language models, while also covering tokenization, self‑attention, transformer architectures, AI infrastructure, and practical usage through agents and retrieval‑augmented generation.

Agent Systems, Artificial Intelligence, GPU infrastructure
0 likes · 75 min read
AI Architecture Hub
Jan 7, 2026 · Artificial Intelligence

Why “Attention Is All You Need” Still Shapes AI: A Beginner’s Deep Dive

This article provides a comprehensive, beginner‑friendly walkthrough of the landmark 2017 paper “Attention Is All You Need,” covering its authors, historical context, the shortcomings of RNNs and CNNs, the birth of self‑attention, the Transformer architecture, and its transformative impact on modern AI.

AI history, Transformer, attention mechanism
0 likes · 9 min read
Alibaba Cloud Developer
Nov 5, 2025 · Artificial Intelligence

How TinyAI Brings a Full‑Stack AI Framework to Pure Java

TinyAI is a completely Java‑implemented, lightweight full‑stack AI framework that demonstrates how to build a production‑grade deep‑learning system—from low‑level numeric tensors and automatic differentiation to modular neural‑network layers, training pipelines, large‑language‑model implementations, and intelligent agent architectures—while remaining education‑friendly and free of external dependencies.

AI Framework, Agent System, Code Examples
0 likes · 33 min read
Tencent Cloud Developer
Nov 4, 2025 · Artificial Intelligence

From Functions to Transformers: Mastering Neural Networks Step by Step

This article walks you through the evolution from basic mathematical functions to modern large‑scale models, explaining activation functions, forward and backward propagation, loss calculation, gradient descent, regularization, dropout, word embeddings, RNNs, and the core mechanics of the Transformer architecture.

RNN, Regularization, Transformer
0 likes · 15 min read
Bighead's Algorithm Notes
Oct 21, 2025 · Artificial Intelligence

KANMixer: A New KAN‑Centric Paradigm for Long‑Term Time Series Forecasting

This article reviews the KANMixer model, which places Kolmogorov‑Arnold Networks at the core of a lightweight architecture for long‑term time series forecasting, detailing its design, extensive benchmark experiments on seven real‑world datasets, ablation analyses, and its computational trade‑offs versus MLP and Transformer baselines.

Ablation Study, KAN, Long-term Time Series Forecasting
0 likes · 8 min read
Data Party THU
Oct 4, 2025 · Artificial Intelligence

Unveiling Transformer Internals: From Theory to PyTorch Code

This article explores the Transformer architecture in depth by combining the principles of the original paper with PyTorch source code, covering encoder‑decoder design, positional encoding assumptions, core parameters, residual connections, attention mechanisms, and detailed implementation snippets to help readers understand and reproduce the model.

Positional Encoding, PyTorch, Transformer
0 likes · 22 min read
Architects' Tech Alliance
Aug 31, 2025 · Artificial Intelligence

Why the Last Decade Became the Golden Age of AI Chip Architecture

The article traces the evolution of AI hardware over the past ten years, outlining three key phases—from early chip limitations that sidelined neural networks, through CPU advances that still fell short, to the rise of GPUs and specialized AI chips that finally unlocked rapid AI deployment, while also highlighting the parallel impact of algorithmic breakthroughs and massive data growth.

AI hardware, Big Data, GPU
0 likes · 5 min read
Qborfy AI
Aug 7, 2025 · Artificial Intelligence

Understanding RNNs: From Memory Cells to Real‑World Applications

This article explains how recurrent neural networks (RNNs) add memory to neural models, details the gate mechanisms of LSTM and GRU, compares their structures and parameter counts, and illustrates their use in speech recognition, translation, stock prediction, and video generation, while highlighting practical insights and energy considerations.

AI, GRU, LSTM
0 likes · 5 min read
Didi Tech
Jul 31, 2025 · Artificial Intelligence

How to Build Efficient Causal Effect Estimators for Exponential‑Family Outcomes

This article presents a unified framework for efficiently estimating causal treatment effects on exponential‑family outcomes, extending target regularization beyond Gaussian assumptions, deriving bias analysis for plug‑in estimators, proposing DR and TMLE‑based estimators, and validating them on synthetic and real datasets.

causal inference, exponential family, neural networks
0 likes · 12 min read
Qborfy AI
Jul 2, 2025 · Artificial Intelligence

Mastering Activation Functions: From Sigmoid to Swish and When to Use Them

This article explains the role of activation functions in neural networks, compares five classic functions with formulas, performance trade‑offs, and gradient behavior, and provides a Python visualization demo plus several practical insights and real‑world examples.

ReLU, Swish, activation functions
0 likes · 7 min read
AI Large Model Application Practice
May 16, 2025 · Artificial Intelligence

Why Residual Connections Keep Deep Neural Networks Stable

This article explains why residual connections are essential in deep neural networks, describing the problems of network degradation and gradient vanishing, how shortcut paths add the input to the layer output, the requirement of matching dimensions, and the resulting stability for training large language models.

LLM, Residual Connections, gradient flow
0 likes · 7 min read
AI Cyberspace
May 3, 2025 · Artificial Intelligence

How Hopfield Networks Mimic Brain Memory: Theory, Math, and Python Demo

This article explores the 1982 Hopfield associative memory neural network, detailing its biological inspiration, energy‑minimization principle, mathematical formulation, training and recall processes, capacity limits, practical Python implementation, and the model's strengths and weaknesses.

Hopfield network, Python implementation, associative memory
0 likes · 21 min read
IT Services Circle
May 2, 2025 · Artificial Intelligence

Understanding Gradient Vanishing in Deep Neural Networks and How to Mitigate It

The article explains why deep networks suffer from gradient vanishing—especially when using sigmoid or tanh activations—covers the underlying mathematics, compares activation functions, and presents practical techniques such as proper weight initialization, batch normalization, residual connections, and code examples to visualize the phenomenon.

Batch Normalization, ResNet, activation functions
0 likes · 7 min read
AI Frontier Lectures
Apr 30, 2025 · Artificial Intelligence

How Dual‑Domain Strip Attention Revolutionizes Image Restoration

The paper introduces Dual‑Domain Strip Attention Network (DSANet), a lightweight architecture that combines spatial and frequency strip attention to boost multi‑scale representation learning, achieving state‑of‑the‑art performance on dehazing, desnowing, defocus deblurring, and denoising tasks with significantly lower computational cost.

deep learning, dual-domain attention, neural networks
0 likes · 10 min read
Cognitive Technology Team
Apr 12, 2025 · Artificial Intelligence

Analyzing a Trained Neural Network: Visualizing Hidden Layers and Understanding Its Limitations

This article walks through an interactive exploration of a simple two‑hidden‑layer neural network, showing how real‑time visualizations reveal its learned representations, accuracy limits, and why constrained training leads to over‑confident yet unintelligent predictions before introducing backpropagation.

Backpropagation, deep learning, hidden layers
0 likes · 10 min read
Cognitive Technology Team
Apr 9, 2025 · Artificial Intelligence

How Neural Networks Learn: Gradient Descent and Loss Functions

This article explains how neural networks learn by using labeled training data, describing the role of weights, biases, activation functions, and how gradient descent iteratively adjusts parameters to minimize loss, illustrated with the MNIST digit‑recognition example.

MNIST, deep learning, gradient descent
0 likes · 16 min read
Cognitive Technology Team
Apr 8, 2025 · Artificial Intelligence

Understanding Neural Networks: Structure, Layers, and Activation

This article explains how a simple neural network can recognize handwritten digits by preprocessing images, organizing neurons into input, hidden, and output layers, using weighted sums, biases, sigmoid compression, and matrix multiplication to illustrate the fundamentals of deep learning.

Layers, Sigmoid, activation functions
0 likes · 16 min read
Python Programming Learning Circle
Feb 18, 2025 · Artificial Intelligence

Getting Started with PyTorch: Installation, Core Operations, and Practical Deep Learning Projects

This article introduces PyTorch, covering installation on CPU/GPU, basic tensor operations, automatic differentiation, building and training neural networks, data loading with DataLoader, image classification on MNIST, model deployment, and useful tips for accelerating deep‑learning workflows.

GPU, Machine Learning, PyTorch
0 likes · 9 min read
AI Code to Success
Feb 13, 2025 · Artificial Intelligence

Why PyTorch Is the Go-To Framework for Modern AI Development

This article introduces PyTorch, explains its dynamic computation graph, Python‑centric design, and tensor operations, surveys its major applications in computer vision, natural language processing, and reinforcement learning, and provides a step‑by‑step tutorial for building and training a multilayer perceptron on the MNIST dataset.

Dynamic Computation Graph, MNIST, PyTorch
0 likes · 11 min read
Cognitive Technology Team
Feb 12, 2025 · Artificial Intelligence

Introduction to Neural Networks by Professor Li Yongle

In this introductory session, renowned graduate exam instructor Professor Li Yongle provides a clear, beginner-friendly overview of neural networks, covering basic concepts and their relevance within artificial intelligence, including their structure, learning mechanisms, and typical applications in modern AI systems.

AI, deep learning, education
0 likes · 1 min read
AI Code to Success
Feb 11, 2025 · Artificial Intelligence

Unlocking TensorFlow: From Basics to Building Your First Linear Regression Model

This article introduces TensorFlow's core concepts—tensors, computational graphs, variables, and sessions—covers its wide range of AI applications from traditional machine learning to deep learning in NLP and computer vision, and provides a step‑by‑step Python tutorial for implementing a simple linear regression model.

AI Tutorial, Machine Learning, Python
0 likes · 6 min read
Architect
Feb 10, 2025 · Artificial Intelligence

Evolution of DeepSeek Mixture‑of‑Experts (MoE) Architecture from V1 to V3

This article reviews the development of DeepSeek's Mixture-of-Experts (MoE) models, tracing their evolution from the original DeepSeekMoE V1 through V2 to V3, detailing architectural innovations such as fine‑grained expert segmentation, shared‑expert isolation, load‑balancing losses, device‑limited routing, and the shift from softmax to sigmoid gating.

DeepSeek, LLM, Mixture of Experts
0 likes · 21 min read
Cognitive Technology Team
Feb 9, 2025 · Artificial Intelligence

A Beginner’s Guide to the History and Key Concepts of Deep Learning

From the perceptron’s inception in 1958 to modern Transformer-based models like GPT, this article traces the evolution of deep learning, explaining foundational architectures such as DNNs, CNNs, RNNs, LSTMs, attention mechanisms, and recent innovations like DeepSeek’s MLA, highlighting their principles and impact.

GPT, MLA, attention
0 likes · 19 min read
AI Cyberspace
Jan 28, 2025 · Artificial Intelligence

From Biological Neurons to Deep Learning: How MP Models Evolve

This article explains the structure of biological neurons, introduces the McCulloch‑Pitts (MP) mathematical model, shows how manual weight adjustments work, and walks through the development from single‑layer perceptrons to two‑layer networks and modern deep learning techniques, covering activation functions, training algorithms, and practical examples.

Backpropagation, MP model, Perceptron
0 likes · 30 min read
AI Large Model Application Practice
Jan 20, 2025 · Artificial Intelligence

How Embeddings Transform Simple Character Codes into Powerful Vectors for LLMs

This article explains how embeddings convert basic character indices into high‑dimensional vectors, describes their training via gradient descent, introduces the embedding matrix, and shows how these vectors enable modern language models to capture semantic relationships and be reused across tasks.

Embeddings, LLM, Machine Learning
0 likes · 8 min read
AI Large Model Application Practice
Jan 14, 2025 · Artificial Intelligence

Turning Classification Nets into Language Generators: A Step‑by‑Step Guide

This article explains how a simple neural network trained for classification can be adapted to generate natural language by expanding its output layer, encoding characters as numbers, using a sliding‑window context, and recursively predicting the next token, illustrating each step with diagrams and concrete examples.

AI, LLM, language generation
0 likes · 10 min read
AI Large Model Application Practice
Jan 9, 2025 · Artificial Intelligence

How Does Gradient Descent Train a Neural Network? A Step‑by‑Step Guide

This article walks through the complete training cycle of a simple neural network—from random weight initialization and forward propagation with labeled data, through loss calculation and gradient‑based weight updates, to iterative epochs, average loss, and practical issues like gradient explosion and vanishing.

AI, Machine Learning, gradient descent
0 likes · 11 min read
Model Perspective
Dec 26, 2024 · Fundamentals

What Makes a Mathematical Model Enduring? Lessons from AI and Ecology

The article explores the characteristics of long‑lasting mathematical models—continuous refinement, expanding applicability, elegant simplicity, extensibility, focus on essence, and philosophical depth—illustrated with examples such as neural networks and the Lotka‑Volterra predator‑prey system, and offers guidance on creating such vibrant models.

Lotka-Volterra, interdisciplinary, mathematical models
0 likes · 6 min read
DevOps
Dec 5, 2024 · Artificial Intelligence

A Brief History of Artificial Intelligence: From McCulloch‑Pitts Neurons to GPT‑4

This article traces the evolution of artificial intelligence from the 1943 McCulloch‑Pitts neuron model through key milestones such as Turing's test, the Dartmouth conference, the rise of neural networks, deep learning breakthroughs, and recent large language models like GPT‑4, illustrating the field's rapid progress.

Artificial Intelligence, GPT, Machine Learning
0 likes · 7 min read
Model Perspective
Dec 5, 2024 · Artificial Intelligence

Choosing the Right Activation Function: Pros, Cons, and Best Practices

Activation functions are crucial for neural networks, providing non‑linearity, normalization, and gradient flow; this article reviews common functions such as Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, Noisy ReLU, Softmax, and Swish, comparing their characteristics, advantages, drawbacks, and guidance for selecting the appropriate one.

Machine Learning, activation functions, model optimization
0 likes · 10 min read
DaTaobao Tech
Nov 13, 2024 · Artificial Intelligence

Understanding Neural Networks and Transformers: Principles, Implementation, and Applications

The article surveys neural networks from basic neuron operations and loss functions through deep architectures to the Transformer model, detailing embeddings, positional encoding, self‑attention, multi‑head attention, residual links, and encoder‑decoder design, and includes PyTorch code examples for linear regression, translation, and fine‑tuning Hugging Face’s MiniRBT for text classification.

AI, NLP, PyTorch
0 likes · 44 min read
Model Perspective
Oct 17, 2024 · Artificial Intelligence

Visualizing How Neural Networks Approximate Any Function

This article explains the universal approximation theorem, showing how even a simple neural network with one hidden layer can approximate any continuous function by adjusting weights and biases, and illustrates the process with visual examples of step and bump functions, linking theory to recent Nobel recognitions.

AI, function approximation, neural networks
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Aug 27, 2024 · Artificial Intelligence

How AI Detects Cluster-Wide Task Slowdowns in Cloud Systems

A new AI‑driven method for detecting cluster‑wide task slowdowns in cloud platforms improves the F1 score by 5.3% over state‑of‑the‑art techniques, addressing the challenges of composite periodic patterns and training data contamination while focusing specifically on slowdown anomalies.

Anomaly Detection, Time-series, Unsupervised Learning
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Aug 26, 2024 · Cloud Computing

How Neural Attention Detects Cluster-Wide Task Slowdowns in Cloud Systems

A new paper accepted at ACM SIGKDD 2024 presents a neural‑network‑based framework that uses a skim‑attention mechanism and a picky loss function to accurately detect cluster‑wide task slowdown anomalies in large‑scale cloud platforms, achieving a 5.3% average F1‑score improvement over state‑of‑the‑art methods.

Anomaly Detection, Cluster Performance, neural networks
0 likes · 5 min read
21CTO
Aug 11, 2024 · Artificial Intelligence

Demystifying LLMs: How Tokens, Training, and Transformers Power Generative AI

This article explains the fundamentals of large language models, covering tokenization, probability prediction, Markov chain basics, training data limitations, context windows, and the transition to neural network architectures like Transformers, while providing Python examples and insights into model scaling and the illusion of intelligence.

AI, LLM, Transformer
0 likes · 18 min read
Ops Development & AI Practice
Jul 6, 2024 · Artificial Intelligence

How Backpropagation Powers Modern Deep Learning: A Deep Dive

This article explains the backpropagation algorithm—its origins, mathematical basis, step‑by‑step workflow, importance for efficient neural network training, and widespread applications in image recognition, natural language processing, and recommendation systems.

Backpropagation, Machine Learning, deep learning
0 likes · 6 min read
Ops Development & AI Practice
Jul 3, 2024 · Artificial Intelligence

How Do Artificial Neural Networks Mirror Animal Brains? An In‑Depth Overview

This article explains the fundamental concepts and architecture of artificial neural networks, describes their learning process, compares them with biological neural systems, and highlights both the similarities and key differences in structure, learning mechanisms, flexibility, and energy efficiency.

Artificial Intelligence, Biological Inspiration, Machine Learning
0 likes · 7 min read
JD Tech Talk
Jun 25, 2024 · Artificial Intelligence

Understanding Large Language Models: From Parameters to Transformer Architecture

This article explains the fundamental concepts behind large language models, including their two-file structure, training process, neural network basics, perceptron examples, weight and threshold calculations, the TensorFlow Playground, and a detailed walkthrough of the Transformer architecture with tokenization, positional encoding, self‑attention, normalization, and feed‑forward layers.

AI, Machine Learning, Self-Attention
0 likes · 20 min read
Rare Earth Juejin Tech Community
Jun 12, 2024 · Artificial Intelligence

A Simple Introduction to the Transformer Model

This article provides a comprehensive, beginner-friendly explanation of the Transformer architecture, covering its encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, residual connections, decoding process, final linear and softmax layers, and training considerations, illustrated with numerous diagrams and code snippets.

Self-Attention, Transformer, deep learning
0 likes · 24 min read
Architects Research Society
May 21, 2024 · Artificial Intelligence

27 Essential AI Papers Recommended by Ilya Sutskever for John Carmack

Ilya Sutskever, former OpenAI chief scientist, shared a curated list of 27 seminal AI research papers—including the Annotated Transformer, Attention Is All You Need, and Deep Residual Learning—with links, claiming mastering them covers roughly 90% of today’s essential artificial‑intelligence knowledge.

AI, Machine Learning, Research Papers
0 likes · 7 min read
Rare Earth Juejin Tech Community
May 5, 2024 · Artificial Intelligence

Comprehensive Guide to Neural Network Algorithms: Definitions, Structure, Implementation, and Training

This article provides an in‑depth tutorial on neural network algorithms, covering their biological inspiration, significance, advantages and drawbacks, detailed architecture, data preparation, one‑hot encoding, weight initialization, forward and backward propagation, cost functions, regularization, gradient checking, and complete Python code examples.

AI, Backpropagation, Machine Learning
0 likes · 37 min read
DaTaobao Tech
Apr 22, 2024 · Artificial Intelligence

Neural Networks and Deep Learning: Principles and MNIST Example

The article reviews recent generative‑AI breakthroughs such as GPT‑5 and AI software engineers, explains that AI systems are deterministic rather than black boxes, and then teaches neural‑network fundamentals—including activation functions, back‑propagation, and a hands‑on MNIST digit‑recognition example with discussion of overfitting and regularization.

MNIST, activation functions, deep learning
0 likes · 17 min read
AI Algorithm Path
Apr 5, 2024 · Artificial Intelligence

Master CNN, RNN, GAN, and Transformer Architectures in One Guide

This article provides a friendly, step‑by‑step overview of five core deep‑learning architectures—CNN, RNN, GAN, Transformers, and encoder‑decoder—explaining their structures, key components, and typical use cases in image and natural‑language processing.

CNN · Encoder-Decoder · GAN
0 likes · 12 min read
Architect
Mar 26, 2024 · Artificial Intelligence

Why Transformers Outperform RNNs: A Deep Dive into Architecture and Training

This article explains the Transformer model’s core architecture, self‑attention mechanism, encoder‑decoder workflow, training with teacher forcing, inference steps, and why it surpasses RNNs and CNNs, while also outlining its major NLP applications.

NLP · Transformer · attention mechanism
0 likes · 14 min read
NetEase Smart Enterprise Tech+
Feb 28, 2024 · Artificial Intelligence

Mastering Multi-Task Learning: Network Designs & Loss Balancing

This article reviews the challenges of multi‑task learning, compares various network architectures such as hard‑parameter sharing, MMoE, CGC, and PLE, and examines loss‑balancing techniques like GradNorm, Dynamic Weight Average and task‑prioritization, offering insights on how to mitigate the “seesaw” effect and improve overall performance.

AI research · dynamic weighting · gradient normalization
0 likes · 15 min read
JD Tech
Nov 30, 2023 · Artificial Intelligence

Understanding ChatGPT: Mechanisms, Attention, Emergence, and the Chinese Room

This article examines the principles behind ChatGPT, detailing its continuation-based operation, the role of attention mechanisms and transformer architecture, the scaling of neural networks that leads to emergent abilities, and interprets these phenomena through the lenses of compression theory and the Chinese Room thought experiment.

ChatGPT · attention mechanism · compression
0 likes · 27 min read
Rare Earth Juejin Tech Community
Nov 15, 2023 · Artificial Intelligence

Understanding the Transformer Architecture: Encoder, Decoder, and Attention Mechanisms

This article explains the Transformer model, comparing it with RNNs, detailing its encoder‑decoder structure, multi‑head and scaled dot‑product attention, embedding layers, feed‑forward networks, and the final linear‑softmax output, supplemented with diagrams and code examples.

Artificial Intelligence · Encoder-Decoder · Transformer
0 likes · 10 min read
JD Cloud Developers
Oct 10, 2023 · Artificial Intelligence

Do Large Language Models Have a Mind? Attention, Emergence & Compression Explained

This article examines whether ChatGPT and other large language models exhibit true Theory of Mind, detailing the role of attention mechanisms, neural network architecture, emergent abilities, the Chinese‑room argument, and how compression of massive textual data underlies their apparent intelligence.

Theory of Mind · attention mechanism · compression
0 likes · 30 min read
MaGe Linux Operations
Sep 25, 2023 · Artificial Intelligence

How ChatGPT Works: Inside the Neural Network That Generates Human‑Like Text

Stephen Wolfram explains the inner workings of ChatGPT, covering its transformer architecture, probability‑based word selection, training on massive text corpora, the role of embeddings, neural network layers, attention mechanisms, and the challenges of modeling language, offering a deep technical overview for AI enthusiasts.

AI · ChatGPT · Machine Learning
0 likes · 80 min read
Open Source Linux
Sep 8, 2023 · Artificial Intelligence

How ChatGPT Works: Inside the Neural Network That Generates Human‑Like Text

This article explains the inner workings of ChatGPT, covering how large language models predict the next token using probability distributions, the role of embeddings, the transformer architecture with attention heads, training methods, loss functions, and why such a massive neural network can produce coherent, human‑like language.

ChatGPT · Embeddings · Machine Learning
0 likes · 79 min read
21CTO
Aug 15, 2023 · Artificial Intelligence

Why Do Neural Networks Suddenly ‘Grok’ After Long Training? Insights from Google

Google’s recent research reveals that when small neural networks are trained for extended periods on tasks like modular addition, they can abruptly shift from memorizing training data to genuinely generalizing—a sudden “grokking” phenomenon driven by weight decay and the emergence of periodic weight structures.

AI research · Generalization · MLP
0 likes · 9 min read
Network Intelligence Research Center (NIRC)
Aug 15, 2023 · Artificial Intelligence

Neural Networks for Rapid Network Configuration: A Concise Overview

The article presents a neural‑algorithmic reasoning approach that replaces slow SMT‑based network configuration tools with a graph‑neural‑network model, describing dataset creation, model architecture, and experiments that show 20‑to‑490× speedups while maintaining over 92% configuration consistency on large topologies.

Graph Neural Network · Network Synthesis · SMT
0 likes · 5 min read
Alimama Tech
Aug 9, 2023 · Artificial Intelligence

End-to-End Inventory Prediction and Contract Allocation for Guaranteed Delivery Advertising

The paper introduces Neural Lagrangian Selling, an end‑to‑end framework that jointly learns traffic forecasting and contract inventory allocation by embedding a differentiable Lagrangian solver and a graph convolutional network into a neural model, achieving higher prediction accuracy, fulfillment rates, utilization, and revenue than two‑stage and other methods.

Graph Neural Network · end-to-end learning · inventory optimization
0 likes · 16 min read
Rare Earth Juejin Tech Community
Jul 31, 2023 · Artificial Intelligence

Overview of Deep Neural Network Architectures

This article provides a comprehensive overview of deep neural network families, introducing twelve major architectures—including Feedforward, CNN, RNN, LSTM, DBN, GAN, Autoencoder, Residual, Capsule, Transformer, Attention, and Deep Reinforcement Learning—explaining their principles, structures, training methods, and offering Python/TensorFlow/PyTorch code examples.

CNN · GAN · Python
0 likes · 29 min read
Programmer DD
Jul 20, 2023 · Artificial Intelligence

Why ChatGPT Mirrors Human Thought: Insights from Stephen Wolfram

ChatGPT, built on massive text training and simple neural network operations, generates human-like language yet lacks true understanding, prompting integration with Wolfram|Alpha’s precise computational language—a synergy highlighted by Stephen Wolfram’s insights on language structure, AI limits, and future computational possibilities.

Artificial Intelligence · ChatGPT · Computational Language
0 likes · 13 min read
Sohu Tech Products
Jul 19, 2023 · Artificial Intelligence

Understanding the Inner Workings of ChatGPT and Neural Networks

This article explains how ChatGPT generates text by predicting the next token using large language models, describes the role of probability, temperature, and attention mechanisms in transformers, and discusses neural network training, embeddings, semantic spaces, and the broader implications for artificial intelligence research.

Artificial Intelligence · ChatGPT · Language Models
0 likes · 79 min read
DataFunSummit
May 29, 2023 · Artificial Intelligence

Neuron‑level Shared Multi‑task Learning for Joint CTR and CVR Prediction

This article introduces a neuron‑level shared multi‑task learning framework that jointly estimates click‑through rate (CTR) and conversion rate (CVR), discusses the background and advantages of multi‑task learning, reviews classic shared‑bottom models, describes the proposed pruning‑based architecture, and presents experimental results demonstrating its effectiveness in large‑scale recommendation systems.

CTR · CVR · Model Pruning
0 likes · 11 min read
21CTO
May 7, 2023 · Artificial Intelligence

The Untold Journey of AI’s Godfather: Geoffrey Hinton’s Life, Legacy, and Risks

Geoffrey Hinton, the Canadian cognitive psychologist and computer scientist known as the father of deep learning, rose from a distinguished scientific family, endured decades of skepticism, pioneered deep belief networks, mentored future AI leaders, and now warns of AI risks after leaving Google, embodying a lifelong commitment to humanity and ethical AI.

AI history · AI risk · Geoffrey Hinton
0 likes · 15 min read
DataFunTalk
Apr 3, 2023 · Artificial Intelligence

Implementing RNN, LSTM, and GRU with PyTorch

This article introduces the basic architectures of recurrent neural networks (RNN), LSTM, and GRU, explains PyTorch APIs such as nn.RNN, nn.LSTM, nn.GRU, details their parameters, demonstrates code examples for building and testing these models, and provides practical insights for deep learning practitioners.

GRU · LSTM · PyTorch
0 likes · 9 min read
DataFunTalk
Apr 1, 2023 · Artificial Intelligence

Nvidia Meets OpenAI: Highlights from the GTC Fireside Chat on GPT‑4, Deep Learning History, and the Future of AI

In a GTC fireside chat, Nvidia CEO Jensen Huang and OpenAI co‑founder Ilya Sutskever discuss GPT‑4's multimodal advances, the evolution of deep learning from early neural networks to large‑scale models, the pivotal role of GPUs and datasets like ImageNet, and their vision for more reliable, scalable artificial intelligence.

Artificial Intelligence · GPT-4 · Multimodal
0 likes · 10 min read
Baidu Geek Talk
Mar 8, 2023 · Artificial Intelligence

Understanding Motion Decomposition in AI-Based Image Animation: From Sparse to Dense Optical Flow

The article details how AI‑based image animation and face‑swapping decompose video motion into zero‑order rigid and first‑order affine components via Taylor expansion, using unsupervised U‑Net keypoint extraction, sparse-to-dense optical flow conversion, and dense motion networks that learn masks for region‑wise rigidity and non‑rigid deformation.

Taylor expansion · affine transformation · face swapping
0 likes · 21 min read
Top Architect
Mar 1, 2023 · Artificial Intelligence

Understanding the Internals of ChatGPT: Neural Networks, Embeddings, and Training Techniques

This article provides a comprehensive overview of how ChatGPT works, covering its probabilistic text generation, transformer architecture, embedding representations, neural network training processes, and the underlying principles that enable large language models to produce coherent and meaningful human-like language.

AI · ChatGPT · Embeddings
0 likes · 80 min read
Model Perspective
Jan 12, 2023 · Artificial Intelligence

Neural Networks Explained: Architecture, Training, and Reinforcement Basics

This article introduces neural networks, covering their layered structure, common types like CNNs and RNNs, key components such as activation functions, loss, learning rate, backpropagation, dropout, batch normalization, and extends to reinforcement learning concepts including MDPs, policies, value functions, and Q‑learning.

CNN · Machine Learning · RNN
0 likes · 6 min read
MaGe Linux Operations
Nov 26, 2022 · Artificial Intelligence

The Timeless Foundations of Machine Learning: 6 Core Algorithms Explained

Andrew Ng’s latest AI newsletter article revisits six foundational machine‑learning algorithms—linear regression, logistic regression, gradient descent, neural networks, decision trees, and k‑means clustering—tracing their historical origins, core concepts, and lasting impact on modern AI applications.

Decision Trees · Machine Learning · gradient descent
0 likes · 20 min read
Model Perspective
Oct 6, 2022 · Artificial Intelligence

Demystifying RNNs and LSTMs: Architecture, Limits, and Python Forecasting

This article explains the structure and operation of recurrent neural networks (RNNs), their limitations, how long short‑term memory (LSTM) networks overcome these issues with gated mechanisms, and provides a complete Python implementation for time‑series airline passenger forecasting.

LSTM · Python · RNN
0 likes · 17 min read
ELab Team
Aug 24, 2022 · Artificial Intelligence

Demystifying AI: From Linear Regression to Neural Networks with TensorFlow.js

This article walks through the fundamentals of artificial intelligence, explaining linear and logistic regression, loss functions, gradient descent, and neural network basics, illustrated with TensorFlow.js code examples, visual analogies, and practical demos, helping readers grasp core concepts and their real‑world applications.

Artificial Intelligence · Machine Learning · TensorFlow.js
0 likes · 18 min read
Model Perspective
Aug 8, 2022 · Artificial Intelligence

Build a Multi‑Layer Perceptron with Keras: Step‑by‑Step Guide

This tutorial walks through using Keras to create, compile, train, and evaluate a multi‑layer perceptron for image classification on the Fashion MNIST dataset, covering data loading, model construction with the Sequential API, hyperparameter choices, and prediction of new samples.

Fashion-MNIST · Keras · MLP
0 likes · 16 min read
DataFunSummit
Jun 11, 2022 · Artificial Intelligence

Transforming Regular Expressions into Neural Networks for Text Classification and Slot Filling

This article explains how regular expressions can be converted into equivalent neural network models—FA‑RNN for classification and FST‑RNN for slot filling—by leveraging finite‑state automata, tensor decomposition, and pretrained word embeddings, achieving zero‑shot performance and strong results in low‑resource scenarios.

FA-RNN · neural networks · regular expressions
0 likes · 17 min read
Code DAO
May 26, 2022 · Artificial Intelligence

Understanding Denoising Diffusion Probabilistic Models: Fundamentals and Process

This article explains the fundamentals of denoising diffusion probabilistic models, detailing the forward Gaussian noise injection, the reverse reconstruction via learned conditional densities, model architecture, loss functions, and experimental results on synthetic datasets, all supported by key research citations.

Generative Models · Markov chain · arXiv
0 likes · 8 min read
DataFunTalk
May 16, 2022 · Artificial Intelligence

Applying Knowledge Graphs to Meituan's Recommendation System: Architecture, Challenges, and Future Directions

This article presents Meituan's large‑scale knowledge graph, its integration into location‑based recommendation, the challenges of explainability, domain diversity, data sparsity and spatiotemporal complexity, and describes a dual‑memory neural network and cross‑domain learning approach that improve recall, ranking and recommendation fairness.

AI · cross-domain learning · explainability
0 likes · 15 min read
Baobao Algorithm Notes
Apr 19, 2022 · Artificial Intelligence

Understanding Nonlinearity in Machine Learning: From Logistic Regression to Neural Networks

The article explores the concept of nonlinearity in machine learning, illustrating why tasks like distinguishing cats from dogs or predicting body shape from height and weight are challenging for linear models, and discusses feature engineering, kernel tricks, and periodic activation functions as strategies to introduce nonlinearity and improve model performance.

feature engineering · kernel methods · logistic regression
0 likes · 7 min read
NetEase LeiHuo Testing Center
Apr 11, 2022 · Artificial Intelligence

Understanding AI: Definitions, Applications in Games and Products, and Basic Machine Learning Concepts

This article explains what artificial intelligence is, distinguishes weak and strong AI, explores its applications in games, product testing, and NetEase's Fuxi platform, and introduces fundamental machine‑learning concepts such as supervised, unsupervised, and reinforcement learning, as well as neural networks and loss functions.

AI · Machine Learning · game AI
0 likes · 10 min read
DeWu Technology
Mar 11, 2022 · Artificial Intelligence

Deep Learning in Face Recognition

The article surveys deep‑learning‑based face‑recognition systems, detailing detection, preprocessing, and recognition pipelines, describing evaluation metrics such as TAR, FAR, and Rank‑K, reviewing major datasets like LFW, MS‑Celeb‑1M and VGGFace2, and comparing leading architectures—including FaceNet, CenterLoss, SphereFace and InsightFace—while highlighting their strengths, limitations, real‑world applications, and seminal research references.

AI · datasets · deep learning
0 likes · 14 min read
DataFunSummit
Jan 29, 2022 · Artificial Intelligence

Survey of Model Pruning and Quantization Techniques for Deep Learning

This article provides a comprehensive overview of recent advances in deep learning model compression, focusing on pruning methods—including unstructured, structured, filter-wise, channel-wise, shape-wise, and stripe-wise approaches—and quantization techniques such as linear, non‑linear, clustering, power‑of‑two, binary, and 8‑bit quantization, while discussing evaluation criteria, sparsity ratios, fine‑tuning, and training‑aware quantization.

Model Compression · deep learning · neural networks
0 likes · 23 min read
Laiye Technology Team
Jan 28, 2022 · Artificial Intelligence

Survey of Model Compression and Quantization Techniques for Deep Neural Networks

This article provides a comprehensive overview of deep learning model compression and acceleration methods, detailing pruning strategies, various pruning types, evaluation criteria, sparsity ratios, fine‑tuning procedures, as well as linear and non‑linear quantization approaches, their implementations, and practical considerations.

Efficiency · Model Compression · deep learning
0 likes · 26 min read
Python Programming Learning Circle
Jan 18, 2022 · Artificial Intelligence

Fashion MNIST Image Classification Using TensorFlow 2.x in Python

This tutorial demonstrates how to load the Fashion MNIST dataset, explore and preprocess the images, build and compile a neural network with TensorFlow 2.x, train the model, evaluate its accuracy, and use the trained model to make predictions on clothing images, providing complete Python code examples throughout.

Fashion-MNIST · Image Classification · Python
0 likes · 16 min read
Code DAO
Dec 24, 2021 · Artificial Intelligence

Understanding Neural Network Predictions with Integrated Gradients

This article introduces the Integrated Gradients (IG) method for explaining deep neural networks, compares it with saliency maps and Shapley‑based approaches, discusses its axiomatic foundations, and provides a step‑by‑step guide to implementing IG using the open‑source TruLens library, including custom baselines and attribution measures.

Attribution Methods · Integrated Gradients · Model Explainability
0 likes · 14 min read
Baobao Algorithm Notes
Dec 21, 2021 · Artificial Intelligence

Boost Time Series Forecasting with Autocorrelated Error Adjustment – A 5‑Line PyTorch Trick

This article explains a NeurIPS 2021 paper that introduces a learnable autocorrelation correction for neural network time‑series models, shows the underlying theory, provides concise PyTorch code implementing the adjustment, reports a ~17% average performance gain across datasets, and lists additional practical tricks for time‑series forecasting.

PyTorch · Time-series · autocorrelation
0 likes · 6 min read
Code DAO
Dec 5, 2021 · Artificial Intelligence

Why Neural Networks Need Batch Normalization: Principles and Mechanics

The article explains the principle behind Batch Normalization, why it is essential for training deep neural networks, how it standardizes activations, the role of learnable scale and shift parameters, the computation steps during training and inference, and discusses placement strategies within a model.

Batch Normalization · deep learning · gradient descent
0 likes · 9 min read
Code DAO
Dec 5, 2021 · Artificial Intelligence

Understanding DeepMind’s PonderNet: A Thinkable Network for MNIST

This article explains DeepMind’s PonderNet framework, which lets any neural network allocate computation adaptively, demonstrates its implementation with PyTorch Lightning on the MNIST dataset, details the underlying theory, loss functions, training procedure, and evaluates its pondering behavior on rotated digit experiments.

Adaptive Computation · MNIST · PonderNet
0 likes · 27 min read
DataFunTalk
Dec 4, 2021 · Artificial Intelligence

Practical Deep Learning Training Tricks: Cyclic LR, Flooding, Warmup, RAdam, Adversarial Training, Focal Loss, Dropout, Normalization and More

This article compiles essential deep learning training techniques—including cyclic learning rates, flooding, warmup, RAdam optimizer, adversarial training, focal loss, dropout, batch/group/weight normalization, label smoothing, Wasserstein GAN, skip connections, and weight initialization—providing concise explanations and code snippets for each method.

Regularization · deep learning · neural networks
0 likes · 11 min read