Tagged articles

691 articles

Mar 20, 2025 · Artificial Intelligence

Unlocking Large‑Scale Deep Reinforcement Learning: PPO, GAE, and PPG Deep Dive

This comprehensive guide examines large‑scale deep reinforcement learning, detailing policy‑gradient fundamentals, the mathematics of PPO and GAE, practical implementation tricks, reward and observation normalization, network initialization, and the newer Phasic Policy Gradient method, all supported by code snippets and key research references.

Algorithm Optimization · Deep RL · GAE

0 likes · 19 min read

Baobao Algorithm Notes

Mar 19, 2025 · Artificial Intelligence

Why Does GRPO Loss Start at Zero and Grow During OpenR1 Training?

The article explains why the GRPO loss in OpenR1 and trl starts at zero and then rises, detailing the underlying KL‑divergence formulation, the single‑step update mechanism, and how gradients are preserved despite a zero scalar loss, with code examples from the trl implementation.

GRPO · Loss Initialization · OpenR1

0 likes · 5 min read

JD Retail Technology

Mar 18, 2025 · Artificial Intelligence

Multi‑Agent Reinforcement Learning Based Full‑Chain Computation Allocation (MaRCA) for Advertising Systems

MaRCA, a multi‑agent reinforcement‑learning framework, allocates compute across JD’s ad‑serving chain by jointly estimating user value, resource consumption, and action outcomes while dynamically adjusting to real‑time load, achieving roughly 15% higher ad revenue without extra compute resources.

Advertising · Compute Scheduling · deep learning

0 likes · 18 min read

Architect

Mar 17, 2025 · Artificial Intelligence

Can a 7B Language Model Solve Sudoku with Reinforcement Learning? Findings and Lessons

This article details a reinforcement‑learning experiment that teaches 7B‑ and 3B‑parameter language models to solve Sudoku, covering data preparation, GRPO‑based reward design, training configurations, performance comparisons, key insights, and future research directions.

GRPO · Language Models · Model Scaling

0 likes · 15 min read

Data Thinking Notes

Mar 16, 2025 · Artificial Intelligence

Why DeepSeek R1 Swaps PPO for GRPO: A Deep Dive into RLHF Alternatives

DeepSeek‑R1 replaces the traditional PPO‑based RLHF approach with GRPO, reducing reliance on human‑labeled data by using pure reinforcement learning environments and carefully designed reward mechanisms; the article explains reinforcement learning fundamentals, compares PPO, DPO and GRPO, and offers practical application recommendations.

AI alignment · DPO · GRPO

0 likes · 14 min read

Architect

Mar 16, 2025 · Artificial Intelligence

Training a 0.5B LLM with Chain‑of‑Thought Reasoning: From Pre‑training to GRPO Fine‑tuning

This article walks through the complete lifecycle of building a small large‑language model, covering token‑level inference, pre‑training, post‑training steps such as supervised fine‑tuning, reward‑model creation, and reinforcement‑learning methods like DPO, PPO and GRPO, culminating in a practical 0.5B model fine‑tuned for chain‑of‑thought reasoning.

GRPO · LLM training · Reward Modeling

0 likes · 22 min read

AI Algorithm Path

Mar 14, 2025 · Artificial Intelligence

Understanding Different Types of AI Agents: From Simple Reflex to Multi‑Agent Systems

This article introduces the main categories of AI agents—including simple reflex, model‑based, goal‑based, utility‑based, learning, hierarchical, and multi‑agent systems—explaining their operating principles, typical use cases, advantages, limitations, and providing concrete Python code examples for each.

AI agents · Agent Types · Machine Learning

0 likes · 19 min read

JD Retail Technology

Mar 14, 2025 · Artificial Intelligence

CTR-Driven Advertising Image Generation Using Multimodal Large Language Models

The paper presents CAIG, a CTR‑driven advertising image generation pipeline that pre‑trains a multimodal LLM on e‑commerce data, trains a reward model on CTR‑labeled image pairs, and fine‑tunes generation via product‑centric preference optimization, achieving state‑of‑the‑art online and offline performance.

AI · CTR · Multimodal LLM

0 likes · 11 min read

JD Tech Talk

Mar 13, 2025 · Artificial Intelligence

CTR-Driven Advertising Image Generation with Multimodal Large Language Models

This paper proposes CAIG, a novel method for generating high-CTR advertising images using multimodal large language models, combining reinforcement learning and preference optimization to align generated content with product features.

CTR prediction · Multimodal Large Language Models · Preference Optimization

0 likes · 10 min read

NewBeeNLP

Mar 11, 2025 · Artificial Intelligence

How DeepSeek’s New Architecture Redefines LLM Efficiency and Performance

This article analyzes DeepSeek’s recent breakthroughs—including the Multi‑Head Latent Attention (MLA), Group Relative Policy Optimization (GRPO), and a refined Mixture‑of‑Experts design—along with its three‑stage training pipeline, RL‑only R1‑Zero variant, and benchmark comparisons against GPT‑4o‑Mini and Llama 3.1, highlighting both gains and remaining challenges.

DeepSeek · LLM · Mixture of Experts

0 likes · 18 min read

Architect

Mar 9, 2025 · Artificial Intelligence

Experiments with Reinforcement Learning Fine‑Tuning of a 0.5B Qwen Model on the KK Dataset

The author reports a series of reinforcement‑learning‑based fine‑tuning experiments on a 0.5‑billion‑parameter Qwen instruct model using the KK dataset, detailing reward design adjustments, curriculum‑style data scaling, observed convergence issues, and hypotheses about why small models fail to develop long reasoning chains.

Curriculum Learning · LLM fine-tuning · reinforcement learning

0 likes · 11 min read

Top Architect

Mar 9, 2025 · Artificial Intelligence

Alibaba Unveils Qwen QwQ-32B: A Compact Open‑Source LLM Rivaling DeepSeek

Alibaba has released the open‑source Qwen QwQ‑32B model, a 32‑billion‑parameter LLM that matches DeepSeek‑R1's performance while being deployable on consumer‑grade GPUs, and the announcement is accompanied by extensive promotional offers for AI‑related products and services.

AI Benchmark · Alibaba · Large Language Model

0 likes · 7 min read

DataFunTalk

Mar 7, 2025 · Artificial Intelligence

DeepSeek R1 Technical Report: Insights into Reasoning Models and Their Impact

This presentation reviews the development, technical details, and societal impact of DeepSeek's R1 model, explaining its reasoning capabilities, training pipeline, comparisons with other models, and future directions for AI research and product applications.

AI research · DeepSeek · R1

0 likes · 53 min read

Baobao Algorithm Notes

Mar 6, 2025 · Artificial Intelligence

Alibaba Unveils QwQ-32B: A 32‑Billion‑Parameter Inference Model with Agent Capabilities

Alibaba has open‑sourced its new QwQ‑32B inference model, a 32.5‑billion‑parameter transformer that rivals top models like DeepSeek‑R1 and o1‑mini, features integrated agent abilities for tool use and critical thinking, and offers a low inference barrier with extensive technical specifications and RL‑based training details.

Alibaba · Large Language Model · Transformer

0 likes · 4 min read

21CTO

Mar 5, 2025 · Artificial Intelligence

Why Barto and Sutton Won the 2024 Turing Award: The Rise of Reinforcement Learning

The ACM awarded Andrew Barto and Richard Sutton the 2024 Turing Award for pioneering reinforcement learning, detailing their seminal contributions, academic biographies, and urgent warnings about AI safety in a comprehensive overview.

Andrew Barto · Artificial Intelligence · Machine Learning

0 likes · 7 min read

Ma Wei Says

Mar 4, 2025 · Artificial Intelligence

Microsoft’s Open‑Source Multimodal AI Agent Model Magma: Capabilities and Innovations

On February 25, 2025, Microsoft open‑sourced its first multimodal AI agent foundation model, Magma, which extends multimodal processing to images, video, and text, introduces Set‑of‑Mark and Trace‑of‑Mark techniques for spatial‑temporal reasoning, optimizes modular inference for edge devices, and integrates reinforcement learning for adaptive task execution.

Foundation Model · Magma · Set-of-Mark

0 likes · 6 min read

Architect

Mar 3, 2025 · Artificial Intelligence

Unlocking Reasoning LLMs: Methods, DeepSeek R1 Insights, and Cost‑Effective Strategies

This article examines how to build and improve reasoning‑capable large language models, explains the definition and use‑cases of reasoning models, details DeepSeek‑R1’s training pipeline, compares four key enhancement methods—including inference‑time scaling, pure RL, SFT + RL, and distillation—and offers budget‑friendly advice.

AI research · DeepSeek · Inference Scaling

0 likes · 27 min read

Architect

Feb 27, 2025 · Artificial Intelligence

Understanding Inference Large Language Models: DeepSeek‑R1 and the Rise of Test‑Time Computation

This article explains how inference‑oriented large language models such as DeepSeek‑R1 and OpenAI o1‑mini shift AI research from training‑time scaling to test‑time computation, detailing the underlying principles, new scaling laws, verification techniques, reinforcement‑learning pipelines, and practical methods for distilling reasoning capabilities into smaller models.

DeepSeek-R1 · Large Language Models · inference

0 likes · 18 min read

ShiZhen AI

Feb 27, 2025 · Artificial Intelligence

Step‑by‑Step Guide: Build Your Own Lerobot SO‑ARM100 Robotic Arm from Scratch

This article walks you through the entire process of assembling a low‑cost Lerobot SO‑ARM100 6‑DOF robotic arm, configuring its Feetech servos, calibrating motion, adding dual cameras for teleoperation, collecting a dataset, and training a reinforcement‑learning policy locally or on cloud GPUs, with detailed troubleshooting tips and code examples.

Lerobot · Python · SO-ARM100

0 likes · 16 min read

Tencent Technical Engineering

Feb 26, 2025 · Artificial Intelligence

Engineers' Perspectives on DeepSeek: Technical Innovations and Implications

Thirteen engineers praise DeepSeek’s open‑source, reinforcement‑learning‑driven architecture—using FP8 storage and SFT‑free training—to deliver GPT‑4‑level reasoning at one‑twentieth the cost, enabling single‑GPU deployment, lowering barriers for academia and startups, and prompting notable market reactions that could democratize advanced AI.

AI cost reduction · DeepSeek · FP8

0 likes · 9 min read

DevOps

Feb 23, 2025 · Artificial Intelligence

Understanding Reinforcement Learning, RLHF, PPO and GRPO for AI Applications

This article explains how DeepSeek‑R1‑Zero uses group‑relative policy optimization (GRPO) to enhance inference without labeled data, introduces reinforcement learning with human feedback (RLHF) and its components, and compares the PPO and GRPO algorithms, highlighting their suitable engineering scenarios and practical implications for AI applications.

AI model training · GRPO · PPO

0 likes · 15 min read

Architect

Feb 22, 2025 · Artificial Intelligence

How Open‑Source Projects Reproduced DeepSeek‑R1 and Pushed LLM Limits

This article reviews the most notable open‑source reproductions of DeepSeek‑R1—including Open R1, OpenThoughts, LIMO and DeepScaleR—detailing their data pipelines, training steps, reinforcement‑learning strategies, dataset constructions, and benchmark results that demonstrate how small, high‑quality data can rival massive‑scale models.

AI research · Dataset Construction · DeepSeek-R1

0 likes · 26 min read

Python Programming Learning Circle

Feb 20, 2025 · Artificial Intelligence

Building a StarCraft II AI Bot with DeepMind's pysc2 in Python

This article provides a step‑by‑step guide, complete with Python code examples, for creating a Protoss AI bot using DeepMind's pysc2 library to mine resources, construct buildings, train units, implement scouting, and execute attack strategies against increasingly difficult computer opponents in StarCraft II.

AI bot · DeepMind · StarCraft II

0 likes · 28 min read

Architect's Alchemy Furnace

Feb 19, 2025 · Artificial Intelligence

DeepSeek’s Self‑Correction: Transforming AI Reliability and Safety

The article explores DeepSeek’s innovative self‑correction system—combining a Mixture‑of‑Experts architecture with reinforcement‑learning feedback—to achieve real‑time error detection, dynamic knowledge‑graph updates, and enhanced safety in high‑risk fields like autonomous driving and medical diagnostics.

AI safety · DeepSeek · Mixture of Experts

0 likes · 9 min read

Tencent Technical Engineering

Feb 19, 2025 · Artificial Intelligence

Reproduction and Analysis of DeepSeek R1/R1‑zero Reinforcement Learning Experiments

This note surveys four open‑source reproductions of DeepSeek R1/R1‑zero reinforcement‑learning pipelines, re‑implements their training on math and logic datasets using Qwen‑based models, shows that format‑plus‑accuracy rewards improve long‑chain reasoning though stability and scaling remain challenges, and outlines future directions for large‑scale RL and business deployment.

DeepSeek-R1 · Large Language Model · long chain of thought

0 likes · 39 min read

AI Algorithm Path

Feb 18, 2025 · Artificial Intelligence

Build DeepSeek‑R1 from Scratch: Complete Training Process with Code Walkthrough

This article provides a step‑by‑step, code‑first guide to reproducing DeepSeek‑R1 from the ground up, covering model selection, dataset preparation, custom reward functions, GRPO reinforcement‑learning training, supervised fine‑tuning, reasoning‑oriented RL, rejection sampling, and model distillation.

DeepSeek-R1 · LLM training · Python

0 likes · 48 min read

DataFunTalk

Feb 16, 2025 · Artificial Intelligence

Understanding Reasoning LLMs: DeepSeek R1 Variants, Inference‑Time Scaling, and Training Strategies

This article explains what reasoning language models are, outlines their strengths and weaknesses, details DeepSeek R1's three variants and their training pipelines—including pure reinforcement learning, SFT + RL, and distillation—while also discussing inference‑time scaling techniques and related research such as Sky‑T1 and TinyZero.

DeepSeek · Inference Scaling · model distillation

0 likes · 16 min read

JD Cloud Developers

Feb 13, 2025 · Artificial Intelligence

Unlocking DeepSeek R1: Concepts, Training Secrets, and Real-World Experiments

This article demystifies DeepSeek R1 by explaining key concepts such as online search integration and the R1 model, detailing its two‑phase training pipeline, core techniques like iterative data enhancement, and showcases practical reproductions, benchmark tests, and deployment examples for AI developers.

DeepSeek · Knowledge Distillation · Large Language Model

0 likes · 12 min read

AI Algorithm Path

Feb 12, 2025 · Artificial Intelligence

Essential DeepSeek‑R1 Reading List: Papers Behind the 2025 Hottest LLM

This article compiles a curated reading list of foundational and recent research papers—from the original Transformer to chain‑of‑thought, mixture‑of‑experts, and reinforcement‑learning studies—that together explain the breakthroughs behind DeepSeek‑R1 and guide readers through the technical evolution of modern large language models.

DeepSeek · Large Language Model · Mixture of Experts

0 likes · 15 min read

Baobao Algorithm Notes

Feb 12, 2025 · Artificial Intelligence

How X‑R1 Triggers Aha Moments in Low‑Cost RL Training of 0.5B LLMs

The X‑R1 open‑source framework demonstrates that a 0.5B language model can achieve rapid reasoning improvements and observable "Aha Moments" using reinforcement learning on a modest 4‑GPU setup, detailing its design, performance metrics, installation steps, and future roadmap.

AI · LLM · Open Source

0 likes · 6 min read

Architects' Tech Alliance

Feb 10, 2025 · Industry Insights

What Makes DeepSeek’s New V3 Model Rival GPT‑4o? A Deep Dive into Large‑Scale AI

This article explains what defines a large AI model, compares parameter scales of GPT‑3, GPT‑4 and M6, and analyzes DeepSeek’s recent releases—V3, R1, and Janus‑Pro—highlighting their benchmark performance, reinforcement‑learning techniques, and cost efficiency versus leading proprietary models.

AI Benchmark · DeepSeek · Large Models

0 likes · 5 min read

Big Data Technology Architecture

Feb 9, 2025 · Artificial Intelligence

Reproducing DeepSeek R1 Reasoning Ability with GRPO on Qwen2.5‑7B in Colab

This article explains how to replicate DeepSeek R1's slow‑thinking inference using the GRPO reinforcement‑learning algorithm on the Qwen2.5‑7B model in a free Colab notebook, covering the underlying chain‑of‑thought (CoT) concept, reward‑function design, data preparation, training configuration, and observed results.

GRPO · LLM · Python

0 likes · 14 min read

Top Architect

Feb 9, 2025 · Artificial Intelligence

DeepSeek‑R1: Training Pipeline, Reinforcement‑Learning Techniques, and Experimental Results

The article reviews DeepSeek‑R1’s training methodology—including cold‑start data collection, multi‑stage RL fine‑tuning, SFT data generation, and model distillation—highlights its performance comparable to OpenAI‑o1‑1217, and discusses key contributions, reward design, successful experiments, and failed attempts.

AI research · DeepSeek · LLM

0 likes · 12 min read

Architects' Tech Alliance

Feb 9, 2025 · Artificial Intelligence

How DeepSeek R1 Replicates OpenAI o1 Using Large‑Scale Reinforcement Learning

The article provides an in‑depth technical analysis of DeepSeek R1, explaining how it reproduces OpenAI o1's reasoning abilities through rule‑based large‑scale reinforcement learning, mixed SFT data, and efficient scaling, while discussing its broader impact on AI model development and capability density trends.

AI Industry · Capability Density · DeepSeek

0 likes · 19 min read

AI2ML AI to Machine Learning

Feb 8, 2025 · Artificial Intelligence

Analyzing DeepSeek R1 Inference Projects: Source Code, Cold‑Start, and Scaling Techniques

This article examines DeepSeek R1’s three breakthroughs, its low‑cost optimizations that bypass CUDA, and the resulting impact on the AI ecosystem, then provides a detailed technical review of seven open‑source reproductions, including Open‑R1, Tiny‑Zero, SimpleScaling‑S1, and simpleRL‑reason, covering their architectures, reinforcement‑learning pipelines, and code implementations.

DeepSeek · Inference Scaling · Large Language Models

0 likes · 10 min read

JavaEdge

Feb 8, 2025 · Artificial Intelligence

Why DeepSeek R1 Rivals ChatGPT o1: Architecture, Training, and Cost Insights

This article provides a detailed technical analysis of DeepSeek's R1 large language model, covering its background, architecture, training methods, hardware optimizations, performance claims, user impressions, deployment options, and the challenges of reproducing its results.

AI training · DeepSeek · GPU Cost

0 likes · 16 min read

Architect

Feb 8, 2025 · Artificial Intelligence

DeepSeek‑R1: From Zero to Full‑Featured AI Model via Cold‑Start Data and Multi‑Stage Training

The article explains how DeepSeek‑R1 improves upon the Zero version by introducing expert‑crafted cold‑start data and a four‑phase multi‑stage training pipeline, resulting in markedly better reasoning, coding, and general knowledge performance across benchmark tests.

AI inference · DeepSeek · cold-start data

0 likes · 8 min read

Architect

Feb 6, 2025 · Artificial Intelligence

DeepSeek‑R1: Reinforcement‑Learning‑Driven Long‑Chain Reasoning for Large Language Models

The article reviews DeepSeek‑R1, detailing its reinforcement‑learning‑based training pipeline that uses minimal supervised data, cold‑start fine‑tuning, multi‑stage RL, rejection‑sampling SFT, and distillation to achieve reasoning performance comparable to OpenAI‑o1‑1217, while also discussing successful contributions and failed experiments.

AI research · DeepSeek-R1 · LLM reasoning

0 likes · 11 min read

Architect

Feb 3, 2025 · Artificial Intelligence

How DeepSeek‑R1 Uses Pure Reinforcement Learning to Match OpenAI’s o1

This article presents DeepSeek‑R1 and DeepSeek‑R1‑Zero, two next‑generation LLMs trained with pure reinforcement learning and multi‑stage fine‑tuning, details their GRPO training framework, model‑distillation pipeline, open‑source release, and evaluation results that rival OpenAI’s o1‑1217 across reasoning, knowledge, and coding benchmarks.

DeepSeek · LLM evaluation · Large Language Models

0 likes · 10 min read

Cognitive Technology Team

Feb 3, 2025 · Artificial Intelligence

DeepSeek R1 Introduces Group Relative Policy Optimization for Advanced Reasoning in Large Language Models

DeepSeek AI’s new open‑source model DeepSeek‑R1 leverages a novel Group Relative Policy Optimization (GRPO) reinforcement‑learning framework and multi‑stage training to dramatically boost complex reasoning performance, achieving AIME 2024 Pass@1 scores comparable to OpenAI’s o1 model.

AI · DeepSeek · GRPO

0 likes · 4 min read

Baobao Algorithm Notes

Jan 22, 2025 · Artificial Intelligence

Can RL‑Only Training Make LLMs Beat OpenAI‑o1? Inside DeepSeek‑R1’s Architecture and Results

DeepSeek‑R1’s open‑source series demonstrates that reinforcement‑learning‑only training can match top‑tier models like OpenAI‑o1, while a small amount of SFT further improves readability; the article dissects its technical report, training pipeline, reward design, distillation strategy, benchmark outcomes, and remaining challenges.

DeepSeek · Large Language Model · Supervised Fine‑Tuning

0 likes · 11 min read

php Courses

Jan 21, 2025 · Artificial Intelligence

Building Reinforcement Learning Algorithms with PHP

This article introduces reinforcement learning, explains its core concepts, and demonstrates how to implement a simple reinforcement learning algorithm in PHP using libraries such as Keras or TensorFlow, providing complete example code for environment and agent classes and outlining training and testing steps.

Artificial Intelligence · Code Example · PHP

0 likes · 5 min read

Alimama Tech

Jan 8, 2025 · Artificial Intelligence

Model-Based Reinforcement Learning Auto‑Bidding Algorithms for Online Advertising

The paper introduces a model‑based reinforcement‑learning auto‑bidding framework that learns a neural‑network environment model from real logs, generates confidence‑aware virtual data fused with real data, and employs the COMBO+MICRO stabilizer and a Lagrange‑dual method for ROI‑constrained bidding, delivering up to 6.8 % higher consumption, 5 % GMV growth and 3.7 % ROI improvement on Alibaba’s platform.

auto-bidding · budget constrained bidding · model-based RL

0 likes · 22 min read

DataFunSummit

Jan 5, 2025 · Artificial Intelligence

Multi‑Objective Deep Reinforcement Learning Framework for E‑commerce Traffic Allocation (MODRL‑TA)

The article presents a CIKM‑2024 paper that introduces MODRL‑TA, a multi‑objective deep reinforcement learning system combining multi‑objective Q‑learning, a cross‑entropy‑based decision‑fusion algorithm, and a progressive data‑augmentation pipeline to dynamically allocate search traffic on JD.com, with both offline and online experiments showing substantial gains in CTR, CVR, and overall platform performance.

cross-entropy method · deep learning · e-commerce

0 likes · 14 min read

JD Retail Technology

Dec 26, 2024 · Artificial Intelligence

Multi‑Objective Deep Reinforcement Learning Framework for E‑commerce Traffic Allocation (MODRL‑TA)

MODRL‑TA is a multi‑objective deep reinforcement learning framework that unites independent Q‑learning agents, a cross‑entropy‑based decision‑fusion module, and progressive data‑augmentation to overcome cold‑start and multi‑objective trade‑offs in e‑commerce traffic allocation, delivering up to 18% more impressions, 4% higher CTR and 5% higher CVR in live tests.

deep learning · e-commerce · multi-objective

0 likes · 14 min read

Alimama Tech

Dec 17, 2024 · Artificial Intelligence

AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games

AuctionNet is a newly introduced benchmark that recreates a massive, realistic online advertising auction environment using latent diffusion‑generated traffic data, provides an 80 GB dataset of 5 × 10⁸ logs from 48 bidding agents, and offers baseline evaluations—including an Online LP that outperforms others—supporting thousands of fair NeurIPS 2024 competition submissions and open‑source tools for large‑scale game decision‑making research.

Generative Models · auto-bidding · benchmark

0 likes · 15 min read

Kuaishou Tech

Dec 17, 2024 · Artificial Intelligence

NeurIPS 2024 Auto‑Bidding in Large‑Scale Auctions: Kuaishou Team Wins Both General and AIGB Tracks

The NeurIPS 2024 Auto‑Bidding competition attracted over 15,000 submissions and 1,500 teams, featuring two tracks—General and AI‑Generated Bidding—where Kuaishou’s commercial algorithm team secured first place in both by leveraging reinforcement‑learning‑based online exploration and a decision‑transformer‑driven generative approach, achieving more than a 5% lift in ad revenue.

Advertising · Generative Models · Kuaishou

0 likes · 13 min read

Baobao Algorithm Notes

Dec 7, 2024 · Artificial Intelligence

What Is Reinforcement Fine-Tuning (RFT) and How Does It Supercharge LLMs?

Reinforcement Fine-Tuning (RFT) combines supervised fine‑tuning with reinforcement learning to teach large language models to reason more effectively, using separate training and validation datasets, graders, and PPO optimization, and has shown superior performance on tasks like gene prediction and math reasoning compared to standard SFT.

AI · Large Language Models · Machine Learning

0 likes · 8 min read

AI Product Manager Community

Dec 7, 2024 · Artificial Intelligence

How Reinforcement Fine-Tuning (RFT) Is Redefining AI Customization

Reinforcement Fine-Tuning (RFT), unveiled at OpenAI’s 12‑day launch, introduces a feedback‑loop approach that transforms generic models into specialized experts using reinforcement learning, small data, and domain‑specific scorers, offering product managers a powerful tool for rapid, cost‑effective AI customization across industries.

AI customization · Machine Learning · fine-tuning

0 likes · 7 min read

Bilibili Tech

Dec 6, 2024 · Artificial Intelligence

Ensemble-based Offline-to-Online Reinforcement Learning (ENOTO): Methodology, Experiments, and Analysis

ENOTO introduces ensemble Q‑networks into the offline‑to‑online reinforcement‑learning pipeline, using minimum‑Q and uncertainty‑driven exploration to stabilize fine‑tuning, boost learning efficiency, and achieve 10‑25 % higher cumulative returns with minimal online interaction across MuJoCo and AntMaze benchmarks.

AntMaze · ENOTO · Ensemble Q-Networks

0 likes · 16 min read

Alimama Tech

Dec 4, 2024 · Artificial Intelligence

AIGB: Generative Auto‑Bidding via Diffusion Modeling

AIGB, introduced by Alimama in 2023, reframes large‑scale ad‑auction auto‑bidding as a generative sequence task using diffusion models, achieving up to 5 % GMV gains, improved stability and interpretability, and is now commercialized, open‑sourced, and featured in a NeurIPS‑endorsed competition.

AI · Generative Models · auto-bidding

0 likes · 12 min read

Model Perspective

Dec 3, 2024 · Artificial Intelligence

How Recommendation Algorithms Shape Our Habits—and What You Can Do About It

The article examines how recommendation algorithms reinforce user preferences, turning habits into stable feedback loops, and proposes mathematical models and practical strategies to introduce diversity and break behavioral fixation in the age of algorithmic personalization.

Diversity · behavioral modeling · habit formation

0 likes · 7 min read

DataFunTalk

Nov 30, 2024 · Artificial Intelligence

Interview with Rich Sutton on Continuous Learning, Reinforcement Learning, and the Future of AI

In this extensive interview, Rich Sutton critiques the focus on transient deep learning, advocates for continuous learning, discusses the reward hypothesis, outlines research challenges, offers advice to emerging scholars, and predicts breakthroughs in AI understanding by 2030‑2040.

AI research · continuous learning · future of AI

0 likes · 27 min read

Python Programming Learning Circle

Nov 4, 2024 · Artificial Intelligence

Reinforcement Learning with highway‑env and Gym: DQN for Autonomous Driving

This tutorial explains how to install the gym and highway‑env packages, configure a highway simulation environment, process observations and actions, build a DQN network in PyTorch, train the agent, and analyze training results for autonomous driving scenarios.

Autonomous Driving · DQN · gym

0 likes · 11 min read

DaTaobao Tech

Oct 30, 2024 · Artificial Intelligence

Understanding OpenAI o1: Chain‑of‑Thought, Scaling Laws, and Training Strategies

The article explains how OpenAI’s o1 model leverages chain‑of‑thought prompting, dual‑system cognitive theory, and new scaling laws—pre‑training on code/math and post‑training reinforcement with step‑wise reward models—to achieve superior reasoning, safety, and performance over GPT‑4, heralding a shift toward models that learn to think.

LLM · Safety · chain-of-thought

0 likes · 42 min read

Baobao Algorithm Notes

Oct 21, 2024 · Artificial Intelligence

Unraveling RLHF: From PPO to DPO and Beyond – A Comprehensive Guide

This article provides a thorough, four‑part overview of RLHF for large language models, covering preference‑optimization algorithms (PPO‑based and offline RL approaches), reward‑model training techniques, inference‑time exploration strategies, and practical implementation details including the OpenRLHF framework and resource‑allocation tricks.

DPO · LLM optimization · OpenRLHF

0 likes · 27 min read

Baobao Algorithm Notes

Oct 15, 2024 · Artificial Intelligence

How DPO Simplifies RLHF: A Deep Dive into Direct Preference Optimization

This article breaks down how Direct Preference Optimization (DPO) mathematically reduces the two‑stage RLHF pipeline into a single‑stage SFT process, explains the underlying loss transformations, and discusses DPO's practical limitations and trade‑offs for large language model alignment.

DPO · Direct Preference Optimization · Machine Learning

0 likes · 9 min read

DataFunSummit

Sep 27, 2024 · Artificial Intelligence

Advances in Educational Large Language Models for Youth Programming and Personalized Learning

The presentation by Dr. Su Yu outlines challenges such as data sparsity and delayed learning effects in AI‑driven education, introduces three technical breakthroughs—domain‑specific LLM training, small‑knowledge learning via hierarchical knowledge graphs, and reinforcement‑based cognitive recommendation—and showcases product applications like the Frog Programming Platform, AI Programming Learning Machine, and digital‑human AI recorded courses.

AI education · Knowledge Graph · Personalized Learning

0 likes · 18 min read

Model Perspective

Sep 27, 2024 · Artificial Intelligence

Modeling Everyday Learning: From Reinforcement to Social Learning

The article explores how everyday decision‑making can be modeled using reinforcement learning and social learning frameworks, illustrating their strengths, limitations, and combined insights for understanding individual and collective behavior.

AI · behavioral modeling · decision making

0 likes · 8 min read

Architect

Sep 26, 2024 · Artificial Intelligence

Decoding OpenAI o1: How RL‑LLM Fusion Powers Next‑Gen Reasoning

This article provides a detailed technical analysis of OpenAI’s o1 model, exploring its enhanced logical reasoning, the likely use of reinforcement learning with hidden chain‑of‑thought generation, multi‑model architecture, training data pipelines, reward modeling, and how these innovations could reshape AI safety and scaling strategies.

AI safety · LLM · OpenAI o1

0 likes · 43 min read

JD Tech Talk

Sep 23, 2024 · Artificial Intelligence

JD Advertising R&D: AI‑Driven Solutions for Traffic Valuation, Multimodal Understanding, Auction Mechanisms, Generative Recommendation, and Large‑Model Engineering

The JD Advertising R&D team applies cutting‑edge AI techniques—including query intent models, multimodal representation pipelines, reinforcement‑learning‑based auction mechanisms, generative recommendation with quantized product tokens, and large‑model infrastructure—to boost traffic valuation, ad relevance, revenue, and creative generation across the platform.

AI · Advertising · Large Models

0 likes · 19 min read

Data Thinking Notes

Sep 13, 2024 · Artificial Intelligence

How OpenAI’s o1 Series Redefines Complex Reasoning and AI Safety

OpenAI’s new o1 series, including o1‑preview and o1‑mini, leverages reinforcement‑learning‑based chain‑of‑thought reasoning to achieve superior performance on academic exams, coding contests, and safety benchmarks, offering faster, cost‑effective options while advancing AI alignment and human‑preference evaluation.

AI safety · Large Language Model · OpenAI

0 likes · 15 min read

Tencent Advertising Technology

Aug 15, 2024 · Artificial Intelligence

Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding

This paper introduces RLLR, a label‑sensitive reward reinforcement learning method that improves natural language understanding tasks by aligning training objectives with label accuracy, and demonstrates its effectiveness across eight public NLU datasets and real‑world advertising feature evaluation, outperforming standard RLHF and SFT baselines.

Advertising · RLHF · label-sensitive reward

0 likes · 14 min read

Tencent Advertising Technology

Aug 13, 2024 · Artificial Intelligence

Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

This paper investigates selection bias in large language models for multiple‑choice tasks, proposes metrics to quantify symbol‑content binding, introduces Reweighting Symbol‑Content Binding (RSCB) and Point‑wise Intelligent Feedback (PIF) methods, and demonstrates their effectiveness in reducing bias and improving accuracy, including a real‑world Tencent advertising feature‑evaluation deployment.

Symbol Binding · multiple choice · pointwise feedback

0 likes · 16 min read

Model Perspective

Jul 31, 2024 · Artificial Intelligence

How Monte Carlo Tree Search Powers AlphaGo and Beyond: A Deep Dive

Monte Carlo Tree Search (MCTS) is a statistical heuristic algorithm that builds decision trees through selection, expansion, simulation, and backpropagation, enabling breakthroughs like AlphaGo’s victory and finding applications in robotics, autonomous driving, finance, and bioinformatics.

AI applications · AlphaGo · MCTS

0 likes · 7 min read

Model Perspective

Jul 30, 2024 · Artificial Intelligence

Your Complete AI Learning Roadmap: From Basics to Large Model Mastery

This guide presents a comprehensive AI learning roadmap, dividing study into five progressive stages—from foundational math and programming to core deep‑learning and reinforcement‑learning techniques, large‑model training, industry applications, and future trends—plus curated book lists, tool recommendations, and practical RAG tutorials.

AI learning roadmap · AI resources · Large Models

0 likes · 9 min read

Alimama Tech

Jul 29, 2024 · Artificial Intelligence

Generative Auto-bidding via Diffusion Modeling (AIGB)

The paper presents AIGB, a generative auto‑bidding framework that replaces reinforcement‑learning with a conditional diffusion model to generate optimal bidding trajectories, and demonstrates through offline benchmarks and Alibaba’s online A/B tests that it consistently outperforms RL baselines, boosting buy count, GMV, and ROI while maintaining low latency.

Generative Models · Marketing AI · auto-bidding

0 likes · 18 min read

php Courses

Jul 29, 2024 · Artificial Intelligence

Building Reinforcement Learning Algorithms with PHP

This article explains the fundamentals of reinforcement learning, demonstrates how PHP can be used with neural‑network libraries such as Keras or TensorFlow to implement a simple reinforcement‑learning agent, provides a complete PHP code example, and discusses its potential applications.

AI · Code Example · reinforcement learning

0 likes · 5 min read

Kuaishou Tech

Jul 17, 2024 · Artificial Intelligence

Key Technical Innovations in Kuaishou’s “Kuaiyi” Large Model and Its Real-World Applications

The article details Kuaishou’s development of the 175B “Kuaiyi” multimodal large model, presenting eight novel technical innovations—from Temporal Scaling Law and MiLe Loss to MoE‑enhanced reward modeling—and describes how these advances enable high‑performance AI services such as the AI Xiao Kuai chatbot across diverse real‑world scenarios.

AI applications · Large Language Model · Scaling Law

0 likes · 12 min read

Alimama Tech

Jul 15, 2024 · Artificial Intelligence

Why Auto‑Bidding in Large‑Scale Auctions Is the Hottest NeurIPS Challenge

The article explains how NeurIPS ranks among top AI conferences, introduces the newly selected “Auto‑Bidding in Large‑Scale Auctions” competition, outlines its technical background, four generations of bidding strategies—from classic control to generative models—and details the competition’s tracks, rewards, and how researchers can participate.

Advertising · NeurIPS · auto-bidding

0 likes · 12 min read

DataFunSummit

Jul 8, 2024 · Artificial Intelligence

World Models and Causal Inference in Reinforcement Learning: A Comprehensive Overview

This article reviews the role of world (mental) models and causal inference in reinforcement learning, covering their theoretical foundations, model‑based RL frameworks such as Dyna, sample‑efficiency challenges, causal structure learning, distribution correction, dynamics‑reward modeling, and experimental results that demonstrate performance gains across multiple tasks.

causal inference · model-based RL · reinforcement learning

0 likes · 21 min read

Ops Development & AI Practice

Jun 22, 2024 · Artificial Intelligence

Machine Learning Demystified: Traditional Algorithms vs Neural Networks

Machine learning, a core AI discipline, encompasses traditional algorithms grouped under supervised, unsupervised, and reinforcement learning as well as neural network models such as CNNs, RNNs, GANs, and VAEs, each with distinct principles, strengths, and typical application scenarios.

Machine Learning · Traditional Algorithms · Unsupervised Learning

0 likes · 10 min read

Baobao Algorithm Notes

May 30, 2024 · Artificial Intelligence

What’s the Latest RLHF Landscape? From PPO to ORPO Explained

This article surveys the current RLHF ecosystem, comparing on‑policy methods like PPO with off‑policy approaches such as DPO, and examines recent variants—including ReMax, GRPO, DPOP, TDPO, and ORPO—highlighting their algorithmic differences, resource trade‑offs, and practical performance insights.

DPO · LLM · PPO

0 likes · 23 min read

Kuaishou Tech

May 27, 2024 · Artificial Intelligence

What Kuaishou’s Four ACL Papers Reveal About the Future of Large Language Models

The 62nd ACL conference accepted four papers from Kuaishou that explore multi‑turn instruction following, self‑agreement reasoning, fine‑grained reinforcement learning, and dynamic routing in Mixture‑of‑Experts models, each with detailed methods, experimental results, author lists, and public arXiv links.

ACL 2024 · Kuaishou Research · Large Language Models

0 likes · 11 min read

NewBeeNLP

May 13, 2024 · Artificial Intelligence

Why DPO Treats LLMs as Q‑Functions: A Deep Theoretical Dive

This article offers a detailed theoretical interpretation of the DPO algorithm, showing how large language models can be viewed as Q‑functions, unifying sequence‑wise and step‑wise decision perspectives, and discussing the resulting implications for reinforcement‑learning‑based alignment research.

DPO · LLM · Preference Optimization

0 likes · 14 min read

DataFunSummit

Apr 16, 2024 · Artificial Intelligence

Intelligent Risk Control: Definitions, Expert Systems, Algorithmic Systems, and Emerging AI Techniques

This article explains intelligent risk control as a synergy of expert experience and algorithmic decision‑making, outlines its definition, expert human systems, digital algorithmic systems, and explores advanced AI methods such as reinforcement learning, large language models with knowledge graphs, adversarial learning, graph neural networks, and a practical supply‑chain case study.

Graph Neural Network · Knowledge Graph · Large Language Model

0 likes · 11 min read

JD Retail Technology

Apr 15, 2024 · Artificial Intelligence

Design and Evolution of JD.com Recommendation Advertising Ranking Auction Mechanism

The article analyzes JD.com's recommendation advertising ranking auction mechanism, detailing its objectives, challenges in traffic value estimation, user interest exploration, and multi‑item auction fairness, and describing the technical evolution from traditional auctions to deep‑learning‑driven solutions.

Advertising · Machine Learning · auction

0 likes · 18 min read

DataFunTalk

Apr 4, 2024 · Artificial Intelligence

Enhancing Interactive Agents with Large Language Models: The SwiftSage Framework

This article reviews the challenges of text‑only large language model interaction, introduces benchmark environments such as ALFWorld and ScienceWorld, compares baseline reinforcement‑learning approaches, and presents SwiftSage—a hybrid system that combines a fast T5‑based small model with a powerful LLM for planning and grounding, demonstrating superior performance, efficiency, and cost‑effectiveness while outlining current limitations and future research directions.

AI · Large Language Models · SwiftSage

0 likes · 22 min read

DataFunSummit

Apr 2, 2024 · Artificial Intelligence

Reinforcement Learning: Fundamentals, Classic Algorithms, and Applications in Short Video Recommendation

This article provides an in-depth overview of reinforcement learning, covering its goals, mathematical foundations such as Markov Decision Processes, classic algorithms like DQN, and practical applications including short‑video recommendation systems that aim to improve user retention through RL‑based ranking.

DQN · Markov Decision Process · RL applications

0 likes · 12 min read

DataFunTalk

Mar 30, 2024 · Artificial Intelligence

Reinforcement Learning and Multi‑Task Recommendation: Two‑Stage Constrained Actor‑Critic and Multi‑Task RL Approaches at Kuaishou

This talk presents Kuaishou's research on combining reinforcement learning with multi‑task recommendation, detailing a two‑stage constrained actor‑critic method for short‑video ranking, a multi‑task RL framework, experimental results on offline and online systems, and practical Q&A insights.

Kuaishou · actor-critic · multi-task recommendation

0 likes · 18 min read

Python Programming Learning Circle

Mar 28, 2024 · Artificial Intelligence

Tutorial: Setting Up highway‑env with OpenAI Gym and Training a DQN for Autonomous Driving

This article explains how to install the gym and highway‑env packages, configure the environment for various driving scenarios, define observations, actions and rewards, build a DQN network in PyTorch, run the training loop, and analyze the resulting performance metrics.

Autonomous Driving · DQN · Python

0 likes · 9 min read

Model Perspective

Mar 8, 2024 · Artificial Intelligence

Master the Three Machine Learning Types and Model Paradigms

This article introduces the three core machine learning categories—supervised, unsupervised, and reinforcement learning—detailing their definitions, typical algorithms, and real‑world applications, and then compares generative and discriminative models, highlighting key examples, characteristics, and use‑case differences.

Discriminative Models · Generative Models · Machine Learning

0 likes · 13 min read

DataFunTalk

Mar 7, 2024 · Artificial Intelligence

Enhancing Interactive Agents with Large Language Models: The SwiftSage Framework and Benchmark Analysis

This article reviews recent advances in using large language models for interactive embodied agents, introduces the SwiftSage dual‑model framework that combines a fast T5‑based small model with a powerful LLM for planning, evaluates it on benchmarks such as ALFWorld and ScienceWorld, and discusses efficiency, cost‑effectiveness, limitations, and future research directions.

AI · Large Language Models · SwiftSage

0 likes · 23 min read

php Courses

Feb 22, 2024 · Artificial Intelligence

Building Reinforcement Learning Algorithms with PHP

This article introduces reinforcement learning, explains its core concepts, and demonstrates how to implement a simple reinforcement learning algorithm in PHP using neural‑network libraries such as Keras, providing a complete code example that includes environment and agent classes.

Artificial Intelligence · Code Example · PHP

0 likes · 4 min read

DaTaobao Tech

Jan 31, 2024 · Artificial Intelligence

Highlights of Recent AI Research Papers from Top Conferences (2023)

The article curates standout AI papers from 2023 CCF‑A conferences—including CVPR, ICLR, ACM MM, and INFORMS—showcasing advances such as Swin‑Transformer video quality assessment, cross‑modal e‑commerce product search, transformer‑based vehicle routing heuristics, diffusion‑driven dance generation, and reinforcement‑learning inventory replenishment.

AI · Multimedia · computer vision

0 likes · 23 min read

DataFunSummit

Jan 27, 2024 · Artificial Intelligence

Enhancing Interactive Agents with Large Language Models: The SwiftSage Framework

This article reviews recent advances in using large language models for embodied interactive agents, introduces the dual‑model SwiftSage architecture that combines a fast T5‑based small model with a powerful large model for planning and grounding, and evaluates its performance on benchmarks such as ScienceWorld.

AI2 · Planning · SwiftSage

0 likes · 23 min read

DataFunTalk

Jan 25, 2024 · Artificial Intelligence

World Models, Reinforcement Learning, and Causal Inference: A Comprehensive Overview

This article presents a detailed overview of world models and their role in reinforcement learning, explains how causal inference can enhance model-based RL, discusses sample efficiency challenges, and shares experimental findings and practical insights from recent research and industry applications.

AI · Machine Learning · causal inference

0 likes · 22 min read

Alimama Tech

Jan 10, 2024 · Artificial Intelligence

Advances in Automated Bidding and Auction Mechanisms for Online Advertising

Advances in automated bidding for online ads have progressed from classic control and linear programming to reinforcement‑learning pipelines, offline and sustainable online RL, and finally generative‑model approaches, each enhancing decision strength, adaptability, and fairness while addressing simulation gaps, multi‑objective constraints, and real‑time efficiency.

Auction Design · automated bidding · generative AI

0 likes · 25 min read

DataFunSummit

Dec 27, 2023 · Artificial Intelligence

Two-Stage Constrained Actor-Critic for Short‑Video Recommendation and a Reinforcement‑Learning Multi‑Task Framework

This article presents a two‑stage constrained actor‑critic (TSCAC) algorithm that models short‑video recommendation as a constrained reinforcement‑learning problem, details its theoretical formulation and optimization loss, and validates its superiority through extensive offline and online experiments, followed by a multi‑task reinforcement‑learning framework (RMTL) that further improves multi‑objective recommendation performance.

Recommendation Systems · constrained optimization · multi-task learning

0 likes · 16 min read

Python Programming Learning Circle

Dec 16, 2023 · Artificial Intelligence

Using highway‑env with OpenAI Gym for Reinforcement Learning: Installation, Configuration, and DQN Training

This tutorial explains how to install the gym and highway‑env packages, configure the highway‑v0 environment, explore its observation types, and implement a DQN agent in Python to train and evaluate autonomous driving policies, complete with code snippets and performance visualizations.

DQN · Python · gym

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Dec 8, 2023 · Artificial Intelligence

How BeautifulPrompt Automates Prompt Engineering for Text-to-Image Generation

BeautifulPrompt, presented at EMNLP 2023, introduces a deep generation model that automatically crafts high-quality prompts from simple image descriptions, enhancing text-to-image synthesis through data-driven fine‑tuning, reward modeling, and reinforcement learning techniques.

AI Generation · reinforcement learning · text-to-image synthesis

0 likes · 8 min read

Sohu Tech Products

Dec 6, 2023 · Artificial Intelligence

Real-time Controllable Multi-Objective Re-ranking Models for Taobao Feed Recommendation

The paper introduces a real‑time controllable, multi‑objective re‑ranking framework for Taobao’s feed recommendation that combines actor‑critic reinforcement learning with hypernetworks to instantly adjust objective weights, handling diverse media and cold‑start constraints while delivering higher click‑through, diversity, and cold‑start ratios with only 20‑25 ms latency.

Alibaba · Real-time Control · Recommendation Systems

0 likes · 34 min read

Kuaishou Tech

Dec 1, 2023 · Artificial Intelligence

Short Video Recommendation Algorithm Frontier Research Forum at CCIR 2023

The CCIR 2023 conference in Beijing, sponsored by Kuaishou, hosted a short‑video recommendation algorithm frontier research forum where over 100 experts and students shared the latest AI‑driven recommendation technologies, open datasets, and interdisciplinary challenges in short‑video platforms.

AI · conference · datasets

0 likes · 8 min read

Alimama Tech

Nov 28, 2023 · Artificial Intelligence

Evolution of Alibaba's AI-Driven Advertising Decision Technologies

The article traces Alibaba’s Alimama platform from classic control‑based bidding through linear programming and reinforcement‑learning approaches to generative‑AI‑driven strategies, detailing how deep‑learning models, offline and sustainable online RL frameworks, and large‑language‑model‑based bidding reshape automated auctions, fairness, and scalability in e‑commerce advertising.

AI · Auction Design · Machine Learning

0 likes · 38 min read

21CTO

Nov 27, 2023 · Artificial Intelligence

What Is the Mysterious Q* Model and Could It Redefine AI?

A speculative look at OpenAI's rumored Q* project explores its possible blend of Q‑learning and A* search, the potential for advanced logical reasoning, and the broader philosophical questions about AI consciousness, alignment, and the future of intelligent systems.

AI alignment · AI consciousness · OpenAI

0 likes · 9 min read

Kuaishou Tech

Nov 23, 2023 · Artificial Intelligence

KuaiSim: A Comprehensive User Simulator for Reinforcement Learning in Recommendation Systems

KuaiSim is a comprehensive user simulation environment for recommendation systems that models immediate, long‑term, and cross‑session feedback, supports list‑wise, whole‑session, and retention tasks, provides baselines and evaluation metrics, and demonstrates superior performance on KuaiRand and ML‑1M datasets.

KuaiSim · User Simulation · benchmark

0 likes · 14 min read

DataFunSummit

Nov 21, 2023 · Artificial Intelligence

Automatic Hyperparameter Tuning in Tencent Recommendation System (TRS): Techniques, Evolution, and Practice

This article presents an in‑depth overview of Tencent's TRS automatic hyperparameter tuning, covering background, challenges, the evolution from Bayesian optimization to evolution strategies and reinforcement learning, a systematic platform solution, real‑world deployment results, and a Q&A session.

Bayesian Optimization · Evolution Strategies · Online Learning

0 likes · 20 min read

Kuaishou Tech

Nov 21, 2023 · Artificial Intelligence

Kuaishou Academic Forum on Cutting-Edge Short Video Recommendation Algorithms (Nov 23, 2023)

The Kuaishou Academic Forum held on November 23 in Beijing presented cutting‑edge research on short‑video recommendation algorithms, featuring talks on reinforcement learning, user interest modeling, graph neural networks, and a comprehensive recommender‑system simulator, while also offering registration details and a brief company overview.

Kuaishou · academic forum · graph neural networks

0 likes · 5 min read

DataFunTalk

Nov 14, 2023 · Artificial Intelligence

Real-Time Controllable Multi-Objective Re‑ranking for Taobao Feed

This article presents a comprehensive study of a controllable multi‑objective re‑ranking model for Taobao's information‑flow recommendation, detailing the challenges of complex feed scenarios, three modeling paradigms (V1‑V3), an actor‑critic reinforcement learning framework with hypernet‑generated weights, and extensive online evaluation results.

Real-time Control · Recommendation Systems · Re‑ranking

0 likes · 31 min read

Sohu Tech Products

Nov 8, 2023 · Artificial Intelligence

Two‑Stage Constrained Actor‑Critic for Short‑Video Recommendation and a Reinforcement‑Learning Multi‑Task Recommendation Framework

The presentation introduces a two‑stage constrained actor‑critic algorithm that learns auxiliary policies for interaction signals before optimizing watch‑time under KL constraints, and a reinforcement‑learning multi‑task learning framework that models session‑level dynamics with adaptive multi‑critic weighting, both achieving significant offline and online gains in short‑video recommendation.

Recommendation Systems · actor-critic · constrained optimization

0 likes · 16 min read
