Tagged articles

179 articles

May 30, 2026 · Artificial Intelligence

How Apple’s AI‑Powered PICO Codec Cuts Image Files to One‑Third While Preserving Quality

Apple’s new PICO perceptual image codec, detailed in the paper “What Matters in Practical Learned Image Compression,” combines a one‑shot context model, TextFidelityLoss, and TilingArtifactLoss to achieve files up to 70–80% smaller than AV1, VVC, JPEG AI, and other learned codecs while running in real time on an iPhone 17 Pro Max, though it still lags on traditional metrics such as PSNR.

AI · JPEG AI · PICO

0 likes · 10 min read

AsiaInfo Technology: New Tech Exploration

May 12, 2026 · Artificial Intelligence

Silicon Brain: Neural Connections, Symbolic Reasoning, and Reinforcement Learning in AGI

This article analyses DeepMind’s three‑pronged AGI paradigm (combining neural networks, symbolic systems, and reinforcement learning) by dissecting AlphaGo, AlphaFold 2, Gemini, and the Genie‑SIMA loop, mapping the biological inspiration, outlining engineering and safety challenges, and proposing research directions for large‑scale deployment in communication scenarios.

AGI · DeepMind · Engineering Challenges

0 likes · 21 min read

AgentGuide

Apr 26, 2026 · Artificial Intelligence

Can You Explain Large Model Training Without Complex Formulas? A Simple, Clear Guide

This article breaks down the fundamentals of large model training—covering data, parameters, neural networks, loss functions, gradient descent, pre‑training, and fine‑tuning—in plain language so readers can grasp how massive models learn without needing to dive into complex mathematics.

fine-tuning · gradient descent · loss function

0 likes · 12 min read

Machine Heart

Apr 26, 2026 · Artificial Intelligence

Has Deep Learning Discovered Its Own “Newton’s Law”?

A new collaborative paper titled “There Will Be a Scientific Theory of Deep Learning” proposes a unified “Learning Mechanics” framework that connects solvable idealized models, tractable limits, empirical scaling laws, hyperparameter theory, and universal representation behavior, aiming to give deep learning a first‑principles scientific foundation.

deep learning · hyperparameters · learning mechanics

0 likes · 14 min read

SuanNi

Apr 10, 2026 · Artificial Intelligence

Can Neural Networks Replace Traditional CPUs? Inside the New Neural Computer

A groundbreaking study shows how Meta AI and KAUST transformed a video‑generation model into a neural computer that unifies computation, storage, and I/O, enabling pixel‑perfect command‑line and graphical UI control while highlighting current limitations in arithmetic reasoning and long‑term program stability.

AI video generation · Human‑computer interaction · Machine Learning

0 likes · 9 min read

Machine Learning Algorithms & Natural Language Processing

Mar 20, 2026 · Artificial Intelligence

Why Kimi Dropped Residual Connections: A First‑Person Deep Dive into Attention Residuals

This article explains how Attention Residuals (AttnRes) replace traditional residual shortcuts with layer‑wise attention, details the mathematical reformulation, design constraints, static‑Q trick, full and block variants, and presents experimental evidence of significant accuracy gains with modest overhead.

Efficient Attention · NLP · RMSNorm

0 likes · 11 min read

DeepHub IMBA

Feb 28, 2026 · Artificial Intelligence

Why Energy‑Based Models Could Outperform Probabilistic LLMs, According to Yann LeCun

Yann LeCun argues that the probability‑driven, token‑by‑token design of current large language models may never reach human‑level intelligence, and explains how Energy‑Based Models replace probability distributions with an energy function, offering more flexible training, inference, and multi‑modal capabilities.

Contrastive Divergence · Density Estimation · EBM

0 likes · 23 min read

Data Party THU

Feb 28, 2026 · Artificial Intelligence

How MIT’s Attention Matching Turns Linear Regression into Fast KV Compression

The article explains MIT’s Attention Matching technique that reformulates large‑model context compression as a linear regression problem, detailing its theoretical foundations, three‑step gradient‑free implementation, architectural adaptations, non‑uniform budgeting, and extensive evaluations showing orders‑of‑magnitude speed gains with minimal accuracy loss.

Attention Matching · KV compression · Memory Optimization

0 likes · 10 min read

Data Party THU

Feb 21, 2026 · Artificial Intelligence

Unlocking Compositional Generalization: Meta‑Learning Strategies for Neural Networks

This article examines how meta‑learning combined with compositionality enables neural networks to rapidly adapt to new tasks by formalizing hierarchical optimization, leveraging modular architectures with hypernetworks, and exploiting Transformer latent codes for effective compositional generalization.

Bilevel Optimization · Meta Learning · Transformer

0 likes · 5 min read

Tencent Technical Engineering

Feb 2, 2026 · Artificial Intelligence

Why Neural Networks Are the Hidden Engine Behind Modern AI: From Basics to Large Language Models

This comprehensive guide walks through the fundamentals of neural networks, activation functions, training methods, and how they power large language models, while also covering tokenization, self‑attention, transformer architectures, AI infrastructure, and practical usage through agents and retrieval‑augmented generation.

Agent Systems · Artificial Intelligence · GPU infrastructure

0 likes · 75 min read

AI Architecture Hub

Jan 7, 2026 · Artificial Intelligence

Why “Attention Is All You Need” Still Shapes AI: A Beginner’s Deep Dive

This article provides a comprehensive, beginner‑friendly walkthrough of the landmark 2017 paper “Attention Is All You Need,” covering its authors, historical context, the shortcomings of RNNs and CNNs, the birth of self‑attention, the Transformer architecture, and its transformative impact on modern AI.

AI history · Transformer · attention mechanism

0 likes · 9 min read

Alibaba Cloud Developer

Nov 5, 2025 · Artificial Intelligence

How TinyAI Brings a Full‑Stack AI Framework to Pure Java

TinyAI is a completely Java‑implemented, lightweight full‑stack AI framework that demonstrates how to build a production‑grade deep‑learning system—from low‑level numeric tensors and automatic differentiation to modular neural‑network layers, training pipelines, large‑language‑model implementations, and intelligent agent architectures—while remaining education‑friendly and free of external dependencies.

AI Framework · Agent System · Code Examples

0 likes · 33 min read

Tencent Cloud Developer

Nov 4, 2025 · Artificial Intelligence

From Functions to Transformers: Mastering Neural Networks Step by Step

This article walks you through the evolution from basic mathematical functions to modern large‑scale models, explaining activation functions, forward and backward propagation, loss calculation, gradient descent, regularization, dropout, word embeddings, RNNs, and the core mechanics of the Transformer architecture.

RNN · Regularization · Transformer

0 likes · 15 min read

Bighead's Algorithm Notes

Oct 21, 2025 · Artificial Intelligence

KANMixer: A New KAN‑Centric Paradigm for Long‑Term Time Series Forecasting

This article reviews the KANMixer model, which places Kolmogorov‑Arnold Networks at the core of a lightweight architecture for long‑term time series forecasting, detailing its design, extensive benchmark experiments on seven real‑world datasets, ablation analyses, and its computational trade‑offs versus MLP and Transformer baselines.

Ablation Study · KAN · Long-term Time Series Forecasting

0 likes · 8 min read

Data Party THU

Oct 4, 2025 · Artificial Intelligence

Unveiling Transformer Internals: From Theory to PyTorch Code

This article deeply explores the Transformer architecture by combining original paper principles with PyTorch source code, covering encoder‑decoder design, positional encoding assumptions, core parameters, residual connections, attention mechanisms, and detailed implementation snippets to help readers understand and reproduce the model.

Positional Encoding · PyTorch · Transformer

0 likes · 22 min read

IT Services Circle

Sep 27, 2025 · Artificial Intelligence

Why “Neural Network” and “Deep Learning” Are Actually the Same Thing

The article explains how the terms “neural network” and “deep learning” originated, why they were once treated as distinct branches of AI, and how historical biases and naming politics eventually merged them into a single research direction.

AI history · Machine Learning · Terminology

0 likes · 4 min read

Architects' Tech Alliance

Aug 31, 2025 · Artificial Intelligence

Why the Last Decade Became the Golden Age of AI Chip Architecture

The article traces the evolution of AI hardware over the past ten years, outlining three key phases—from early chip limitations that sidelined neural networks, through CPU advances that still fell short, to the rise of GPUs and specialized AI chips that finally unlocked rapid AI deployment, while also highlighting the parallel impact of algorithmic breakthroughs and massive data growth.

AI hardware · Big Data · GPU

0 likes · 5 min read

Qborfy AI

Aug 7, 2025 · Artificial Intelligence

Understanding RNNs: From Memory Cells to Real‑World Applications

This article explains how recurrent neural networks (RNNs) add memory to neural models, details the gate mechanisms of LSTM and GRU, compares their structures and parameter counts, and illustrates their use in speech recognition, translation, stock prediction, and video generation, while highlighting practical insights and energy considerations.

AI · GRU · LSTM

0 likes · 5 min read

Didi Tech

Jul 31, 2025 · Artificial Intelligence

How to Build Efficient Causal Effect Estimators for Exponential‑Family Outcomes

This article presents a unified framework for efficiently estimating causal treatment effects on exponential‑family outcomes, extending target regularization beyond Gaussian assumptions, deriving bias analysis for plug‑in estimators, proposing DR and TMLE‑based estimators, and validating them on synthetic and real datasets.

causal inference · exponential family · neural networks

0 likes · 12 min read

AI Large Model Application Practice

Jul 7, 2025 · Artificial Intelligence

Understanding Dropout: Preventing Overfitting in Neural Networks

This article explains what overfitting is, introduces dropout as a regularization technique, describes how dropout randomly deactivates neurons during training and rescales outputs during inference, discusses its limitations, and outlines why large language models may use alternative strategies.

AI · Dropout · Machine Learning

0 likes · 11 min read

Qborfy AI

Jul 2, 2025 · Artificial Intelligence

Mastering Activation Functions: From Sigmoid to Swish and When to Use Them

This article explains the role of activation functions in neural networks, compares five classic functions with formulas, performance trade‑offs, and gradient behavior, and provides a Python visualization demo plus several practical insights and real‑world examples.

ReLU · Swish · activation functions

0 likes · 7 min read

AI Large Model Application Practice

May 16, 2025 · Artificial Intelligence

Why Residual Connections Keep Deep Neural Networks Stable

This article explains why residual connections are essential in deep neural networks, describing the problems of network degradation and gradient vanishing, how shortcut paths add the input to the layer output, the requirement of matching dimensions, and the resulting stability for training large language models.

LLM · Residual Connections · gradient flow

0 likes · 7 min read

AI Cyberspace

May 3, 2025 · Artificial Intelligence

How Hopfield Networks Mimic Brain Memory: Theory, Math, and Python Demo

This article explores the 1982 Hopfield associative memory neural network, detailing its biological inspiration, energy‑minimization principle, mathematical formulation, training and recall processes, capacity limits, practical Python implementation, and the model's strengths and weaknesses.

Hopfield network · Python implementation · associative memory

0 likes · 21 min read

IT Services Circle

May 2, 2025 · Artificial Intelligence

Understanding Gradient Vanishing in Deep Neural Networks and How to Mitigate It

The article explains why deep networks suffer from gradient vanishing—especially when using sigmoid or tanh activations—covers the underlying mathematics, compares activation functions, and presents practical techniques such as proper weight initialization, batch normalization, residual connections, and code examples to visualize the phenomenon.

Batch Normalization · ResNet · activation functions

0 likes · 7 min read

AI Frontier Lectures

Apr 30, 2025 · Artificial Intelligence

How Dual‑Domain Strip Attention Revolutionizes Image Restoration

The paper introduces Dual‑Domain Strip Attention Network (DSANet), a lightweight architecture that combines spatial and frequency strip attention to boost multi‑scale representation learning, achieving state‑of‑the‑art performance on dehazing, desnowing, defocus deblurring, and denoising tasks with significantly lower computational cost.

deep learning · dual-domain attention · neural networks

0 likes · 10 min read

Cognitive Technology Team

Apr 12, 2025 · Artificial Intelligence

Analyzing a Trained Neural Network: Visualizing Hidden Layers and Understanding Its Limitations

This article walks through an interactive exploration of a simple two‑hidden‑layer neural network, showing how real‑time visualizations reveal its learned representations, accuracy limits, and why constrained training leads to over‑confident yet unintelligent predictions before introducing backpropagation.

Backpropagation · deep learning · hidden layers

0 likes · 10 min read

Cognitive Technology Team

Apr 9, 2025 · Artificial Intelligence

How Neural Networks Learn: Gradient Descent and Loss Functions

This article explains how neural networks learn by using labeled training data, describing the role of weights, biases, activation functions, and how gradient descent iteratively adjusts parameters to minimize loss, illustrated with the MNIST digit‑recognition example.

MNIST · deep learning · gradient descent

0 likes · 16 min read

Cognitive Technology Team

Apr 8, 2025 · Artificial Intelligence

Understanding Neural Networks: Structure, Layers, and Activation

This article explains how a simple neural network can recognize handwritten digits by preprocessing images, organizing neurons into input, hidden, and output layers, using weighted sums, biases, sigmoid compression, and matrix multiplication to illustrate the fundamentals of deep learning.

Layers · Sigmoid · activation functions

0 likes · 16 min read

Python Programming Learning Circle

Feb 18, 2025 · Artificial Intelligence

Getting Started with PyTorch: Installation, Core Operations, and Practical Deep Learning Projects

This article introduces PyTorch, covering installation on CPU/GPU, basic tensor operations, automatic differentiation, building and training neural networks, data loading with DataLoader, image classification on MNIST, model deployment, and useful tips for accelerating deep‑learning workflows.

GPU · Machine Learning · PyTorch

0 likes · 9 min read

AI Code to Success

Feb 13, 2025 · Artificial Intelligence

Why PyTorch Is the Go-To Framework for Modern AI Development

This article introduces PyTorch, explains its dynamic computation graph, Python‑centric design, and tensor operations, surveys its major applications in computer vision, natural language processing, and reinforcement learning, and provides a step‑by‑step tutorial for building and training a multilayer perceptron on the MNIST dataset.

Dynamic Computation Graph · MNIST · PyTorch

0 likes · 11 min read

Cognitive Technology Team

Feb 12, 2025 · Artificial Intelligence

Introduction to Neural Networks by Professor Li Yongle

In this introductory session, renowned graduate exam instructor Professor Li Yongle provides a clear, beginner-friendly overview of neural networks, covering basic concepts and their relevance within artificial intelligence, including their structure, learning mechanisms, and typical applications in modern AI systems.

AI · deep learning · education

0 likes · 1 min read

AI Code to Success

Feb 11, 2025 · Artificial Intelligence

Unlocking TensorFlow: From Basics to Building Your First Linear Regression Model

This article introduces TensorFlow's core concepts—tensors, computational graphs, variables, and sessions—covers its wide range of AI applications from traditional machine learning to deep learning in NLP and computer vision, and provides a step‑by‑step Python tutorial for implementing a simple linear regression model.

AI Tutorial · Machine Learning · Python

0 likes · 6 min read

Architect

Feb 10, 2025 · Artificial Intelligence

Evolution of DeepSeek Mixture‑of‑Experts (MoE) Architecture from V1 to V3

This article reviews the development of DeepSeek's Mixture-of-Experts (MoE) models, tracing their evolution from the original DeepSeekMoE V1 through V2 to V3, detailing architectural innovations such as fine‑grained expert segmentation, shared‑expert isolation, load‑balancing losses, device‑limited routing, and the shift from softmax to sigmoid gating.

DeepSeek · LLM · Mixture of Experts

0 likes · 21 min read

Cognitive Technology Team

Feb 9, 2025 · Artificial Intelligence

A Beginner’s Guide to the History and Key Concepts of Deep Learning

From the perceptron’s inception in 1958 to modern Transformer-based models like GPT, this article traces the evolution of deep learning, explaining foundational architectures such as DNNs, CNNs, RNNs, LSTMs, attention mechanisms, and recent innovations like DeepSeek’s MLA, highlighting their principles and impact.

GPT · MLA · attention

0 likes · 19 min read

AI Cyberspace

Jan 28, 2025 · Artificial Intelligence

From Biological Neurons to Deep Learning: How MP Models Evolve

This article explains the structure of biological neurons, introduces the McCulloch‑Pitts (MP) mathematical model, shows how manual weight adjustments work, and walks through the development from single‑layer perceptrons to two‑layer networks and modern deep learning techniques, covering activation functions, training algorithms, and practical examples.

Backpropagation · MP model · Perceptron

0 likes · 30 min read

AI Large Model Application Practice

Jan 20, 2025 · Artificial Intelligence

How Embeddings Transform Simple Character Codes into Powerful Vectors for LLMs

This article explains how embeddings convert basic character indices into high‑dimensional vectors, describes their training via gradient descent, introduces the embedding matrix, and shows how these vectors enable modern language models to capture semantic relationships and be reused across tasks.

Embeddings · LLM · Machine Learning

0 likes · 8 min read

AI Large Model Application Practice

Jan 14, 2025 · Artificial Intelligence

Turning Classification Nets into Language Generators: A Step‑by‑Step Guide

This article explains how a simple neural network trained for classification can be adapted to generate natural language by expanding its output layer, encoding characters as numbers, using a sliding‑window context, and recursively predicting the next token, illustrating each step with diagrams and concrete examples.

AI · LLM · language generation

0 likes · 10 min read

AI Large Model Application Practice

Jan 9, 2025 · Artificial Intelligence

How Does Gradient Descent Train a Neural Network? A Step‑by‑Step Guide

This article walks through the complete training cycle of a simple neural network—from random weight initialization and forward propagation with labeled data, through loss calculation and gradient‑based weight updates, to iterative epochs, average loss, and practical issues like gradient explosion and vanishing.

AI · Machine Learning · gradient descent

0 likes · 11 min read

Model Perspective

Dec 26, 2024 · Fundamentals

What Makes a Mathematical Model Enduring? Lessons from AI and Ecology

The article explores the characteristics of long‑lasting mathematical models—continuous refinement, expanding applicability, elegant simplicity, extensibility, focus on essence, and philosophical depth—illustrated with examples such as neural networks and the Lotka‑Volterra predator‑prey system, and offers guidance on creating such vibrant models.

Lotka-Volterra · interdisciplinary · mathematical models

0 likes · 6 min read

DevOps

Dec 5, 2024 · Artificial Intelligence

A Brief History of Artificial Intelligence: From McCulloch‑Pitts Neurons to GPT‑4

This article traces the evolution of artificial intelligence from the 1943 McCulloch‑Pitts neuron model through key milestones such as Turing's test, the Dartmouth conference, the rise of neural networks, deep learning breakthroughs, and recent large language models like GPT‑4, illustrating the field's rapid progress.

Artificial Intelligence · GPT · Machine Learning

0 likes · 7 min read

Model Perspective

Dec 5, 2024 · Artificial Intelligence

Choosing the Right Activation Function: Pros, Cons, and Best Practices

Activation functions are crucial for neural networks, providing non‑linearity, normalization, and gradient flow; this article reviews common functions such as Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, Noisy ReLU, Softmax, and Swish, comparing their characteristics, advantages, drawbacks, and guidance for selecting the appropriate one.

Machine Learning · activation functions · model optimization

0 likes · 10 min read

Alibaba Cloud Developer

Nov 22, 2024 · Artificial Intelligence

Demystifying Neural Networks and Transformers: From Basics to Hands‑On Code

This comprehensive guide walks you through the fundamentals of neural networks, explains the evolution to transformer models, provides detailed Python code for training and inference, and shows how to fine‑tune open‑source AI models for real‑world tasks such as automated technical PM prediction.

AI · Machine Learning · deep learning

0 likes · 47 min read

DaTaobao Tech

Nov 13, 2024 · Artificial Intelligence

Understanding Neural Networks and Transformers: Principles, Implementation, and Applications

The article surveys neural networks from basic neuron operations and loss functions through deep architectures to the Transformer model, detailing embeddings, positional encoding, self‑attention, multi‑head attention, residual links, and encoder‑decoder design, and includes PyTorch code examples for linear regression, translation, and fine‑tuning Hugging Face’s MiniRBT for text classification.

AI · NLP · PyTorch

0 likes · 44 min read

Model Perspective

Oct 17, 2024 · Artificial Intelligence

Visualizing How Neural Networks Approximate Any Function

This article explains the universal approximation theorem, showing how even a simple neural network with one hidden layer can approximate any continuous function by adjusting weights and biases, and illustrates the process with visual examples of step and bump functions, linking theory to recent Nobel recognitions.

AI · function approximation · neural networks

0 likes · 9 min read

DataFunSummit

Oct 9, 2024 · Artificial Intelligence

2024 Nobel Physics Prize Recognizes Hopfield and Hinton for Foundational AI Discoveries

The 2024 Nobel Prize in Physics was awarded to John J. Hopfield and Geoffrey E. Hinton for pioneering neural‑network research that transformed machine learning, underscoring artificial intelligence’s evolution from a technology into a scientific discipline.

Artificial Intelligence · Hinton · Hopfield

0 likes · 6 min read

Alibaba Cloud Big Data AI Platform

Aug 27, 2024 · Artificial Intelligence

How AI Detects Cluster-Wide Task Slowdowns in Cloud Systems

A new AI‑driven method for detecting cluster‑wide task slowdowns in cloud platforms improves F1 score by 5.3% over state‑of‑the‑art techniques, addressing challenges of composite periodic patterns, training data contamination, and focusing on slowdown anomalies.

Anomaly Detection · Time-series · Unsupervised Learning

0 likes · 8 min read

Alibaba Cloud Big Data AI Platform

Aug 26, 2024 · Cloud Computing

How Neural Attention Detects Cluster-Wide Task Slowdowns in Cloud Systems

A new paper accepted at ACM SIGKDD 2024 presents a neural‑network‑based framework that uses a skim‑attention mechanism and a picky loss function to accurately detect cluster‑wide task slowdown anomalies in large‑scale cloud platforms, achieving a 5.3% average F1‑score improvement over state‑of‑the‑art methods.

Anomaly Detection · Cluster Performance · neural networks

0 likes · 5 min read

21CTO

Aug 11, 2024 · Artificial Intelligence

Demystifying LLMs: How Tokens, Training, and Transformers Power Generative AI

This article explains the fundamentals of large language models, covering tokenization, probability prediction, Markov chain basics, training data limitations, context windows, and the transition to neural network architectures like Transformers, while providing Python examples and insights into model scaling and the illusion of intelligence.

AI · LLM · Transformer

0 likes · 18 min read

Ops Development & AI Practice

Jul 6, 2024 · Artificial Intelligence

How Backpropagation Powers Modern Deep Learning: A Deep Dive

This article explains the backpropagation algorithm—its origins, mathematical basis, step‑by‑step workflow, importance for efficient neural network training, and widespread applications in image recognition, natural language processing, and recommendation systems.

Backpropagation · Machine Learning · deep learning

0 likes · 6 min read

Ops Development & AI Practice

Jul 3, 2024 · Artificial Intelligence

How Do Artificial Neural Networks Mirror Animal Brains? An In‑Depth Overview

This article explains the fundamental concepts and architecture of artificial neural networks, describes their learning process, compares them with biological neural systems, and highlights both the similarities and key differences in structure, learning mechanisms, flexibility, and energy efficiency.

Artificial Intelligence · Biological Inspiration · Machine Learning

0 likes · 7 min read

JD Tech Talk

Jun 25, 2024 · Artificial Intelligence

Understanding Large Language Models: From Parameters to Transformer Architecture

This article explains the fundamental concepts behind large language models, including their two-file structure, training process, neural network basics, perceptron examples, weight and threshold calculations, the TensorFlow Playground, and a detailed walkthrough of the Transformer architecture with tokenization, positional encoding, self‑attention, normalization, and feed‑forward layers.

AI · Machine Learning · Self-Attention

0 likes · 20 min read

Ops Development & AI Practice

Jun 22, 2024 · Artificial Intelligence

Machine Learning Demystified: Traditional Algorithms vs Neural Networks

Machine learning, a core AI discipline, encompasses traditional learning paradigms such as supervised, unsupervised, and reinforcement learning as well as neural network models such as CNNs, RNNs, GANs, and VAEs, each with distinct principles, strengths, and typical application scenarios.

Machine Learning · Traditional Algorithms · Unsupervised Learning

0 likes · 10 min read

Rare Earth Juejin Tech Community

Jun 12, 2024 · Artificial Intelligence

A Simple Introduction to the Transformer Model

This article provides a comprehensive, beginner-friendly explanation of the Transformer architecture, covering its encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, residual connections, decoding process, final linear and softmax layers, and training considerations, illustrated with numerous diagrams and code snippets.

Self-Attention · Transformer · deep learning

0 likes · 24 min read

Architects Research Society

May 21, 2024 · Artificial Intelligence

27 Essential AI Papers Recommended by Ilya Sutskever for John Carmack

Ilya Sutskever, former OpenAI chief scientist, shared a curated list of 27 seminal AI research papers (including the Annotated Transformer, Attention Is All You Need, and Deep Residual Learning) with links, claiming that mastering them covers roughly 90% of today’s essential artificial‑intelligence knowledge.

AI, Machine Learning, Research Papers

0 likes · 7 min read

Rare Earth Juejin Tech Community

May 5, 2024 · Artificial Intelligence

Comprehensive Guide to Neural Network Algorithms: Definitions, Structure, Implementation, and Training

This article provides an in‑depth tutorial on neural network algorithms, covering their biological inspiration, significance, advantages and drawbacks, detailed architecture, data preparation, one‑hot encoding, weight initialization, forward and backward propagation, cost functions, regularization, gradient checking, and complete Python code examples.

AI, Backpropagation, Machine Learning

0 likes · 37 min read

DaTaobao Tech

Apr 22, 2024 · Artificial Intelligence

Neural Networks and Deep Learning: Principles and MNIST Example

The article reviews recent generative‑AI breakthroughs such as GPT‑5 and AI software engineers, explains that AI systems are deterministic rather than black boxes, and then teaches neural‑network fundamentals—including activation functions, back‑propagation, and a hands‑on MNIST digit‑recognition example with discussion of overfitting and regularization.

MNIST, activation functions, deep learning

0 likes · 17 min read

AI Algorithm Path

Apr 5, 2024 · Artificial Intelligence

Master CNN, RNN, GAN, and Transformer Architectures in One Guide

This article provides a friendly, step‑by‑step overview of five core deep‑learning architectures—CNN, RNN, GAN, Transformers, and encoder‑decoder—explaining their structures, key components, and typical use cases in image and natural‑language processing.

CNN, Encoder-Decoder, GAN

0 likes · 12 min read

Architect

Mar 26, 2024 · Artificial Intelligence

Why Transformers Outperform RNNs: A Deep Dive into Architecture and Training

This article explains the Transformer model’s core architecture, self‑attention mechanism, encoder‑decoder workflow, training with teacher forcing, inference steps, and why it surpasses RNNs and CNNs, while also outlining its major NLP applications.

NLP, Transformer, attention mechanism

0 likes · 14 min read

NetEase Smart Enterprise Tech+

Feb 28, 2024 · Artificial Intelligence

Mastering Multi-Task Learning: Network Designs & Loss Balancing

This article reviews the challenges of multi‑task learning, compares various network architectures such as hard‑parameter sharing, MMoE, CGC, and PLE, and examines loss‑balancing techniques like GradNorm, Dynamic Weight Average and task‑prioritization, offering insights on how to mitigate the “seesaw” effect and improve overall performance.

AI research, dynamic weighting, gradient normalization

0 likes · 15 min read

Rare Earth Juejin Tech Community

Jan 14, 2024 · Artificial Intelligence

Understanding and Implementing LoRA (Low‑Rank Adaptation) for Model Training with PyTorch

This article explains the principle of LoRA (Low‑Rank Adaptation) for large language models, demonstrates how to decompose weight updates into low‑rank matrices, and provides a complete PyTorch implementation that fine‑tunes a small VGG‑19 network on a custom goldfish dataset.

LoRA, PyTorch, deep learning

0 likes · 11 min read

JD Tech

Nov 30, 2023 · Artificial Intelligence

Understanding ChatGPT: Mechanisms, Attention, Emergence, and the Chinese Room

This article examines the principles behind ChatGPT, detailing its continuation-based operation, the role of attention mechanisms and transformer architecture, the scaling of neural networks that leads to emergent abilities, and interprets these phenomena through the lenses of compression theory and the Chinese Room thought experiment.

ChatGPT, attention mechanism, compression

0 likes · 27 min read

Rare Earth Juejin Tech Community

Nov 15, 2023 · Artificial Intelligence

Understanding the Transformer Architecture: Encoder, Decoder, and Attention Mechanisms

This article explains the Transformer model, comparing it with RNNs, detailing its encoder‑decoder structure, multi‑head and scaled dot‑product attention, embedding layers, feed‑forward networks, and the final linear‑softmax output, supplemented with diagrams and code examples.

Artificial Intelligence, Encoder-Decoder, Transformer

0 likes · 10 min read

21CTO

Oct 30, 2023 · Artificial Intelligence

Geoffrey Hinton Warns AI Could Take Over Earth Within Five Years – What You Need to Know

Renowned AI pioneer Geoffrey Hinton cautions that rapidly advancing artificial intelligence may surpass human control in as little as five years, highlighting self‑modifying code, the "black‑box" problem, and the urgent need for robust safety regulations.

AI risk, AI safety, Geoffrey Hinton

0 likes · 8 min read

JD Cloud Developers

Oct 10, 2023 · Artificial Intelligence

Do Large Language Models Have a Mind? Attention, Emergence & Compression Explained

This article examines whether ChatGPT and other large language models exhibit true Theory of Mind, detailing the role of attention mechanisms, neural network architecture, emergent abilities, the Chinese‑room argument, and how compression of massive textual data underlies their apparent intelligence.

Theory of Mind, attention mechanism, compression

0 likes · 30 min read

MaGe Linux Operations

Sep 25, 2023 · Artificial Intelligence

How ChatGPT Works: Inside the Neural Network That Generates Human‑Like Text

Stephen Wolfram explains the inner workings of ChatGPT, covering its transformer architecture, probability‑based word selection, training on massive text corpora, the role of embeddings, neural network layers, attention mechanisms, and the challenges of modeling language, offering a deep technical overview for AI enthusiasts.

AI, ChatGPT, Machine Learning

0 likes · 80 min read

Open Source Linux

Sep 8, 2023 · Artificial Intelligence

How ChatGPT Works: Inside the Neural Network That Generates Human‑Like Text

This article explains the inner workings of ChatGPT, covering how large language models predict the next token using probability distributions, the role of embeddings, the transformer architecture with attention heads, training methods, loss functions, and why such a massive neural network can produce coherent, human‑like language.

ChatGPT, Embeddings, Machine Learning

0 likes · 79 min read

21CTO

Aug 15, 2023 · Artificial Intelligence

Why Do Neural Networks Suddenly ‘Grok’ After Long Training? Insights from Google

Google’s recent research reveals that when small neural networks are trained for extended periods on tasks like modular addition, they can abruptly shift from memorizing training data to genuinely generalizing—a sudden “grokking” phenomenon driven by weight decay and the emergence of periodic weight structures.

AI research, Generalization, MLP

0 likes · 9 min read

Network Intelligence Research Center (NIRC)

Aug 15, 2023 · Artificial Intelligence

Neural Networks for Rapid Network Configuration: A Concise Overview

The article presents a neural‑algorithmic reasoning approach that replaces slow SMT‑based network configuration tools with a graph‑neural‑network model, describing dataset creation, model architecture, and experiments that show 20‑to‑490× speedups while maintaining over 92% configuration consistency on large topologies.

Graph Neural Network, Network Synthesis, SMT

0 likes · 5 min read

Alimama Tech

Aug 9, 2023 · Artificial Intelligence

End-to-End Inventory Prediction and Contract Allocation for Guaranteed Delivery Advertising

The paper introduces Neural Lagrangian Selling, an end‑to‑end framework that jointly learns traffic forecasting and contract inventory allocation by embedding a differentiable Lagrangian solver and a graph convolutional network into a neural model, achieving higher prediction accuracy, fulfillment rates, utilization, and revenue than two‑stage and other methods.

Graph Neural Network, end-to-end learning, inventory optimization

0 likes · 16 min read

Rare Earth Juejin Tech Community

Jul 31, 2023 · Artificial Intelligence

Overview of Deep Neural Network Architectures

This article provides a comprehensive overview of deep neural network families, introducing twelve major architectures—including Feedforward, CNN, RNN, LSTM, DBN, GAN, Autoencoder, Residual, Capsule, Transformer, Attention, and Deep Reinforcement Learning—explaining their principles, structures, training methods, and offering Python/TensorFlow/PyTorch code examples.

CNN, GAN, Python

0 likes · 29 min read

Programmer DD

Jul 20, 2023 · Artificial Intelligence

Why ChatGPT Mirrors Human Thought: Insights from Stephen Wolfram

ChatGPT, built on massive text training and simple neural network operations, generates human-like language yet lacks true understanding, prompting integration with Wolfram|Alpha’s precise computational language—a synergy highlighted by Stephen Wolfram’s insights on language structure, AI limits, and future computational possibilities.

Artificial Intelligence, ChatGPT, Computational Language

0 likes · 13 min read

Sohu Tech Products

Jul 19, 2023 · Artificial Intelligence

Understanding the Inner Workings of ChatGPT and Neural Networks

This article explains how ChatGPT generates text by predicting the next token using large language models, describes the role of probability, temperature, and attention mechanisms in transformers, and discusses neural network training, embeddings, semantic spaces, and the broader implications for artificial intelligence research.

Artificial Intelligence, ChatGPT, Language Models

0 likes · 79 min read

DataFunSummit

May 29, 2023 · Artificial Intelligence

Neuron‑level Shared Multi‑task Learning for Joint CTR and CVR Prediction

This article introduces a neuron‑level shared multi‑task learning framework that jointly estimates click‑through rate (CTR) and conversion rate (CVR), discusses the background and advantages of multi‑task learning, reviews classic shared‑bottom models, describes the proposed pruning‑based architecture, and presents experimental results demonstrating its effectiveness in large‑scale recommendation systems.

CTR, CVR, Model Pruning

0 likes · 11 min read

21CTO

May 7, 2023 · Artificial Intelligence

The Untold Journey of AI’s Godfather: Geoffrey Hinton’s Life, Legacy, and Risks

Geoffrey Hinton, the Canadian cognitive psychologist and computer scientist known as the father of deep learning, rose from a distinguished scientific family, endured decades of skepticism, pioneered deep belief networks, mentored future AI leaders, and now warns of AI risks after leaving Google, embodying a lifelong commitment to humanity and ethical AI.

AI history, AI risk, Geoffrey Hinton

0 likes · 15 min read

DataFunTalk

Apr 3, 2023 · Artificial Intelligence

Implementing RNN, LSTM, and GRU with PyTorch

This article introduces the basic architectures of recurrent neural networks (RNN), LSTM, and GRU, explains PyTorch APIs such as nn.RNN, nn.LSTM, nn.GRU, details their parameters, demonstrates code examples for building and testing these models, and provides practical insights for deep learning practitioners.

GRU, LSTM, PyTorch

0 likes · 9 min read

DataFunTalk

Apr 1, 2023 · Artificial Intelligence

Nvidia Meets OpenAI: Highlights from the GTC Fireside Chat on GPT‑4, Deep Learning History, and the Future of AI

In a GTC fireside chat, Nvidia CEO Jensen Huang and OpenAI co‑founder Ilya Sutskever discuss GPT‑4's multimodal advances, the evolution of deep learning from early neural networks to large‑scale models, the pivotal role of GPUs and datasets like ImageNet, and their vision for more reliable, scalable artificial intelligence.

Artificial Intelligence, GPT-4, Multimodal

0 likes · 10 min read

Baidu Geek Talk

Mar 8, 2023 · Artificial Intelligence

Understanding Motion Decomposition in AI-Based Image Animation: From Sparse to Dense Optical Flow

The article details how AI‑based image animation and face‑swapping decompose video motion into zero‑order rigid and first‑order affine components via Taylor expansion, using unsupervised U‑Net keypoint extraction, sparse-to-dense optical flow conversion, and dense motion networks that learn masks for region‑wise rigidity and non‑rigid deformation.

Taylor expansion, affine transformation, face swapping

0 likes · 21 min read

Top Architect

Mar 1, 2023 · Artificial Intelligence

Understanding the Internals of ChatGPT: Neural Networks, Embeddings, and Training Techniques

This article provides a comprehensive overview of how ChatGPT works, covering its probabilistic text generation, transformer architecture, embedding representations, neural network training processes, and the underlying principles that enable large language models to produce coherent and meaningful human-like language.

AI, ChatGPT, Embeddings

0 likes · 80 min read

DataFunSummit

Feb 4, 2023 · Artificial Intelligence

Overview of Deep Learning Algorithms: Supervised, Unsupervised, and Semi‑Supervised Methods

This article introduces deep learning as a powerful AI technique, explains its core algorithms—including supervised, unsupervised, and semi‑supervised approaches—and provides concrete examples such as CNN, RNN, autoencoders, GAN, self‑supervised and transfer learning, illustrated with visual demos.

AI, GAN, Machine Learning

0 likes · 6 min read

Model Perspective

Jan 12, 2023 · Artificial Intelligence

Neural Networks Explained: Architecture, Training, and Reinforcement Basics

This article introduces neural networks, covering their layered structure, common types like CNNs and RNNs, key components such as activation functions, loss, learning rate, backpropagation, dropout, batch normalization, and extends to reinforcement learning concepts including MDPs, policies, value functions, and Q‑learning.

CNN, Machine Learning, RNN

0 likes · 6 min read

MaGe Linux Operations

Nov 26, 2022 · Artificial Intelligence

The Timeless Foundations of Machine Learning: 6 Core Algorithms Explained

Andrew Ng’s latest AI newsletter article revisits six foundational machine‑learning algorithms—linear regression, logistic regression, gradient descent, neural networks, decision trees, and k‑means clustering—tracing their historical origins, core concepts, and lasting impact on modern AI applications.

Decision Trees, Machine Learning, gradient descent

0 likes · 20 min read

Model Perspective

Oct 6, 2022 · Artificial Intelligence

Demystifying RNNs and LSTMs: Architecture, Limits, and Python Forecasting

This article explains the structure and operation of recurrent neural networks (RNNs), their limitations, how long short‑term memory (LSTM) networks overcome these issues with gated mechanisms, and provides a complete Python implementation for time‑series airline passenger forecasting.

LSTM, Python, RNN

0 likes · 17 min read

ELab Team

Aug 24, 2022 · Artificial Intelligence

Demystifying AI: From Linear Regression to Neural Networks with TensorFlow.js

This article walks through the fundamentals of artificial intelligence, explaining linear and logistic regression, loss functions, gradient descent, and neural network basics, illustrated with TensorFlow.js code examples, visual analogies, and practical demos, helping readers grasp core concepts and their real‑world applications.

Artificial Intelligence, Machine Learning, TensorFlow.js

0 likes · 18 min read

Model Perspective

Aug 9, 2022 · Artificial Intelligence

Build a Regression MLP with Keras: Predict California Housing Prices

Learn how to load the California housing dataset, preprocess features, construct a Keras sequential regression MLP, train it with SGD, evaluate performance, and make predictions, all illustrated with concise Python code snippets.

California Housing, Keras, MLP

0 likes · 3 min read

Model Perspective

Aug 8, 2022 · Artificial Intelligence

Build a Multi‑Layer Perceptron with Keras: Step‑by‑Step Guide

This tutorial walks through using Keras to create, compile, train, and evaluate a multi‑layer perceptron for image classification on the Fashion MNIST dataset, covering data loading, model construction with the Sequential API, hyperparameter choices, and prediction of new samples.

Fashion-MNIST, Keras, MLP

0 likes · 16 min read

DataFunSummit

Jun 11, 2022 · Artificial Intelligence

Transforming Regular Expressions into Neural Networks for Text Classification and Slot Filling

This article explains how regular expressions can be converted into equivalent neural network models—FA‑RNN for classification and FST‑RNN for slot filling—by leveraging finite‑state automata, tensor decomposition, and pretrained word embeddings, achieving zero‑shot performance and strong results in low‑resource scenarios.

FA-RNN, neural networks, regular expressions

0 likes · 17 min read

Code DAO

May 26, 2022 · Artificial Intelligence

Understanding Denoising Diffusion Probabilistic Models: Fundamentals and Process

This article explains the fundamentals of denoising diffusion probabilistic models, detailing the forward Gaussian noise injection, the reverse reconstruction via learned conditional densities, model architecture, loss functions, and experimental results on synthetic datasets, all supported by key research citations.

Generative Models, Markov chain, arXiv

0 likes · 8 min read

DataFunTalk

May 16, 2022 · Artificial Intelligence

Applying Knowledge Graphs to Meituan's Recommendation System: Architecture, Challenges, and Future Directions

This article presents Meituan's large‑scale knowledge graph, its integration into location‑based recommendation, the challenges of explainability, domain diversity, data sparsity and spatiotemporal complexity, and describes a dual‑memory neural network and cross‑domain learning approach that improve recall, ranking and recommendation fairness.

AI, cross-domain learning, explainability

0 likes · 15 min read

Baobao Algorithm Notes

Apr 19, 2022 · Artificial Intelligence

Understanding Nonlinearity in Machine Learning: From Logistic Regression to Neural Networks

The article explores the concept of nonlinearity in machine learning, illustrating why tasks like distinguishing cat versus dog or predicting body shape from height and weight are challenging for linear models, and discusses feature engineering, kernel tricks, and periodic activation functions as strategies to introduce nonlinearity and improve model performance.

feature engineering, kernel methods, logistic regression

0 likes · 7 min read

NetEase LeiHuo Testing Center

Apr 11, 2022 · Artificial Intelligence

Understanding AI: Definitions, Applications in Games and Products, and Basic Machine Learning Concepts

This article explains what artificial intelligence is, distinguishes weak and strong AI, explores its applications in games, product testing, and NetEase's Fuxi platform, and introduces fundamental machine‑learning concepts such as supervised, unsupervised, and reinforcement learning, as well as neural networks and loss functions.

AI, Machine Learning, game AI

0 likes · 10 min read

DeWu Technology

Mar 11, 2022 · Artificial Intelligence

Deep Learning in Face Recognition

The article surveys deep‑learning‑based face‑recognition systems, detailing detection, preprocessing, and recognition pipelines, describing evaluation metrics such as TAR, FAR, and Rank‑K, reviewing major datasets like LFW, MS‑Celeb‑1M and VGGFace2, and comparing leading architectures—including FaceNet, CenterLoss, SphereFace and InsightFace—while highlighting their strengths, limitations, real‑world applications, and seminal research references.

AI, datasets, deep learning

0 likes · 14 min read

DataFunSummit

Jan 29, 2022 · Artificial Intelligence

Survey of Model Pruning and Quantization Techniques for Deep Learning

This article provides a comprehensive overview of recent advances in deep learning model compression, focusing on pruning methods—including unstructured, structured, filter-wise, channel-wise, shape-wise, and stripe-wise approaches—and quantization techniques such as linear, non‑linear, clustering, power‑of‑two, binary, and 8‑bit quantization, while discussing evaluation criteria, sparsity ratios, fine‑tuning, and training‑aware quantization.

Model Compression, deep learning, neural networks

0 likes · 23 min read

Laiye Technology Team

Jan 28, 2022 · Artificial Intelligence

Survey of Model Compression and Quantization Techniques for Deep Neural Networks

This article provides a comprehensive overview of deep learning model compression and acceleration methods, detailing pruning strategies, various pruning types, evaluation criteria, sparsity ratios, fine‑tuning procedures, as well as linear and non‑linear quantization approaches, their implementations, and practical considerations.

Efficiency, Model Compression, deep learning

0 likes · 26 min read

Rare Earth Juejin Tech Community

Jan 21, 2022 · Artificial Intelligence

Software 2.0: From Rule‑Based Programming to Model‑Based Development

Software 2.0 replaces traditional rule‑based logic with neural‑network‑driven models, reshaping development across domains by centering on data, probabilistic outcomes, and new workflows for engineers, product managers, and QA while outlining future trends and practical preparations.

AI trends, Model Programming, Software 2.0

0 likes · 13 min read

Python Programming Learning Circle

Jan 18, 2022 · Artificial Intelligence

Fashion MNIST Image Classification Using TensorFlow 2.x in Python

This tutorial demonstrates how to load the Fashion MNIST dataset, explore and preprocess the images, build and compile a neural network with TensorFlow 2.x, train the model, evaluate its accuracy, and use the trained model to make predictions on clothing images, providing complete Python code examples throughout.

Fashion-MNIST, Image Classification, Python

0 likes · 16 min read

Code DAO

Dec 24, 2021 · Artificial Intelligence

Understanding Neural Network Predictions with Integrated Gradients

This article introduces the Integrated Gradients (IG) method for explaining deep neural networks, compares it with saliency maps and Shapley‑based approaches, discusses its axiomatic foundations, and provides a step‑by‑step guide to implementing IG using the open‑source TruLens library, including custom baselines and attribution measures.

Attribution Methods, Integrated Gradients, Model Explainability

0 likes · 14 min read

Baobao Algorithm Notes

Dec 21, 2021 · Artificial Intelligence

Boost Time Series Forecasting with Autocorrelated Error Adjustment – A 5‑Line PyTorch Trick

This article explains a NeurIPS 2021 paper that introduces a learnable autocorrelation correction for neural network time‑series models, shows the underlying theory, provides concise PyTorch code implementing the adjustment, reports a ~17% average performance gain across datasets, and lists additional practical tricks for time‑series forecasting.

PyTorch, Time-series, autocorrelation

0 likes · 6 min read

Code DAO

Dec 5, 2021 · Artificial Intelligence

Why Neural Networks Need Batch Normalization: Principles and Mechanics

The article explains the principle behind Batch Normalization, why it is essential for training deep neural networks, how it standardizes activations, the role of learnable scale and shift parameters, the computation steps during training and inference, and discusses placement strategies within a model.

Batch Normalization, deep learning, gradient descent

0 likes · 9 min read

Code DAO

Dec 5, 2021 · Artificial Intelligence

Understanding DeepMind’s PonderNet: A Thinkable Network for MNIST

This article explains DeepMind’s PonderNet framework, which lets any neural network allocate computation adaptively, demonstrates its implementation with PyTorch Lightning on the MNIST dataset, details the underlying theory, loss functions, training procedure, and evaluates its pondering behavior on rotated digit experiments.

Adaptive Computation, MNIST, PonderNet

0 likes · 27 min read

DataFunTalk

Dec 4, 2021 · Artificial Intelligence

Practical Deep Learning Training Tricks: Cyclic LR, Flooding, Warmup, RAdam, Adversarial Training, Focal Loss, Dropout, Normalization and More

This article compiles essential deep learning training techniques—including cyclic learning rates, flooding, warmup, RAdam optimizer, adversarial training, focal loss, dropout, batch/group/weight normalization, label smoothing, Wasserstein GAN, skip connections, and weight initialization—providing concise explanations and code snippets for each method.

Regularization, deep learning, neural networks

0 likes · 11 min read
