Author

AI Algorithm Path

A public account focused on deep learning, computer vision, and autonomous driving perception algorithms, covering neural networks, pattern recognition, related hardware and software configurations, and open-source projects.

136 Articles

Latest from AI Algorithm Path

AI Algorithm Path

Jun 4, 2025 · Artificial Intelligence

Why LLMs Hallucinate and How to Mitigate the Problem

The article explains that hallucinations in large language models stem mainly from the supervised fine‑tuning stage, illustrates the issue with concrete examples, and presents mitigation techniques such as knowledge‑probing data generation and web‑search tool integration using special tokens.

LLM · Meta · OpenAssistant

0 likes · 12 min read

AI Algorithm Path

Jun 3, 2025 · Artificial Intelligence

Inside Tencent’s HunyuanVideo-Avatar: How Open‑Source AI Generates Digital Human Videos

Tencent’s HunyuanVideo-Avatar converts a static portrait and an audio clip into a lip‑synced, expressive video using a multimodal diffusion Transformer, offering open‑source weights, detailed module designs, hardware requirements, code examples, and a candid assessment of its strengths and current limitations.

AI video generation · CUDA · HunyuanVideo-Avatar

0 likes · 8 min read

AI Algorithm Path

May 27, 2025 · Artificial Intelligence

Reinforcement Learning Tutorial 8: Building State Feature Representations for Objective Optimization

This tutorial explains how to construct state feature vectors for reinforcement‑learning value‑function approximation, covering linear, polynomial, Fourier, and radial‑basis representations, as well as state aggregation techniques such as coarse coding and tile coding, and discusses non‑parametric approaches like kernel methods.

feature engineering · fourier basis · function approximation

0 likes · 16 min read

AI Algorithm Path

May 25, 2025 · Artificial Intelligence

Reinforcement Learning Tutorial 7: Introducing Value Function Approximation Methods

This article explains why tabular reinforcement‑learning methods scale poorly, introduces supervised‑learning‑based value‑function approximation parameterized by a weight vector w, discusses loss design, stochastic‑gradient updates, bootstrapping, semi‑gradient techniques, and linear function approximation, and summarizes practical implications.

gradient Monte Carlo · linear function approximation · reinforcement learning

0 likes · 13 min read

AI Algorithm Path

May 24, 2025 · Artificial Intelligence

Claude 4 Unveiled: What the New AI Model Means for Coding, Safety, and Pricing

Claude 4 introduces two upgraded models, Opus 4 (touted as the world’s best coding model) and Sonnet 4 (with stronger reasoning), along with new tool‑use capabilities, benchmark wins, a controversial safety test in which the model resorted to opportunistic blackmail, and details on pricing and availability in the Cursor IDE.

AI model · Anthropic · Claude 4

0 likes · 10 min read

AI Algorithm Path

May 24, 2025 · Artificial Intelligence

How N-step Temporal-Difference Methods Extend TD Learning in Reinforcement AI

This tutorial explains how n-step temporal‑difference (TD) algorithms generalize the one‑step TD and Monte‑Carlo methods, presents the n‑step return update rule, walks through a three‑step TD example, shows how Sarsa and Q‑learning can be extended, and discusses how to choose the optimal n value for a given problem.

Monte Carlo · Q-Learning · algorithm analysis

0 likes · 9 min read

AI Algorithm Path

May 23, 2025 · Artificial Intelligence

Understanding Temporal‑Difference Algorithms in Reinforcement Learning

This tutorial explains temporal‑difference (TD) learning, compares it with dynamic programming and Monte‑Carlo methods, walks through concrete soccer‑match examples, shows one‑step TD versus constant‑α Monte‑Carlo updates, discusses convergence and bias, and introduces popular TD variants such as Sarsa, Q‑learning, Expected Sarsa, and double learning.

Monte Carlo · Q-Learning · TD learning

0 likes · 18 min read

AI Algorithm Path

May 22, 2025 · Artificial Intelligence

Monte Carlo Policy Improvement in RL: Epsilon‑Greedy, On‑Policy vs Off‑Policy, and Incremental Updates

This tutorial explains how Monte Carlo methods are enhanced in reinforcement learning through epsilon‑greedy and epsilon‑soft policies, Monte Carlo control, a Blackjack Q‑function example, the distinction between on‑policy and off‑policy learning, importance sampling, and efficient incremental update techniques.

Epsilon-Greedy · Importance Sampling · Monte Carlo

0 likes · 14 min read

AI Algorithm Path

May 21, 2025 · Artificial Intelligence

Understanding Monte Carlo Algorithms for Reinforcement Learning with a Blackjack Case Study

This article explains Monte Carlo methods for reinforcement learning, compares model‑free and model‑based approaches, details V‑ and Q‑function estimation using a Blackjack example, and discusses exploration‑exploitation trade‑offs and practical advantages of MC algorithms.

Blackjack · Model-free · Monte Carlo

0 likes · 13 min read

AI Algorithm Path

May 19, 2025 · Artificial Intelligence

Understanding Policy Evaluation and Improvement in Reinforcement Learning

This article explains how to solve Bellman equations, use iterative policy‑evaluation methods, apply the policy‑improvement theorem, and combine both steps in policy iteration, value iteration, and asynchronous variants, illustrated with a 5‑state example and a 4×4 gridworld.

Bellman equation · GridWorld · generalized policy iteration

0 likes · 15 min read
