Tagged articles

690 articles

Oct 19, 2023 · Artificial Intelligence

Unleashing Game AI: Inside NetEase’s Bray Distributed RL Framework

NetEase’s AI team reveals how their self‑developed distributed reinforcement‑learning platform, Bray, enables high‑level AI agents for the MOBA game Dream of Kingdom 2, covering GameCore integration, weighted random initialization, modular APIs, difficulty scaling, and cost‑effective training for realistic player experiences.

AI Framework, MoBA, distributed training

0 likes · 9 min read

Zhuanzhuan Tech

Oct 18, 2023 · Artificial Intelligence

Design and Implementation of a Home‑Page Recommendation System Using Reinforcement Learning and DPP

This article presents a comprehensive design for Zhuanzhuan's home‑page recommendation pipeline, detailing the system architecture, challenges of traffic efficiency and diversity, and a two‑stage solution that applies Proximal Policy Optimization reinforcement learning in the re‑ranking module and Determinantal Point Process optimization in the coarse‑ranking and traffic‑pool stages, followed by offline simulation, online deployment, and evaluation metrics.

DPP, Machine Learning, ranking

0 likes · 18 min read

Alimama Tech

Oct 11, 2023 · Artificial Intelligence

How Minimax Regret Optimization Tackles Black‑Box Adversarial Bidding Constraints

This article explains how the Alimama team addresses constrained ROI bidding in a black‑box adversarial environment by introducing a Minimax Regret Optimization framework that aligns training and test distributions, builds a causal world model, and demonstrates robust performance on synthetic and real‑world ad auctions.

adversarial bidding, constrained optimization, minimax regret

0 likes · 14 min read

Baobao Algorithm Notes

Oct 9, 2023 · Artificial Intelligence

Demystifying RLHF and PPO for Large Language Models: Theory and Practice

This article explains why Reinforcement Learning from Human Feedback (RLHF) is crucial for LLM intelligence, outlines the three-stage training pipeline, details InstructGPT's reward model and PPO optimization, and provides a practical guide to implementing RLHF with deep‑learning frameworks.

Artificial Intelligence, PPO, RLHF

0 likes · 17 min read

Alibaba Cloud Big Data AI Platform

Sep 13, 2023 · Artificial Intelligence

How Pai‑Megatron‑Patch Accelerates Large Language Model Training on Alibaba Cloud

This article introduces Pai‑Megatron‑Patch, an open‑source tool from Alibaba Cloud that streamlines large language model (LLM) training, weight conversion, FP8 mixed‑precision acceleration, and reinforcement‑learning workflows, providing detailed architecture, key features, code examples, and step‑by‑step usage instructions.

FP8, LLM training, Megatron

0 likes · 19 min read

Alimama Tech

Aug 23, 2023 · Artificial Intelligence

Reinforcement Learning for Pacing in Preloaded Ads (RLTP)

The paper introduces RLTP, a reinforcement‑learning‑based pacing system that models delayed‑impression preloaded ads as an MDP, uses a dueling DQN to select traffic probabilities, and simultaneously meets exposure targets, ensures smooth delivery, and maximizes CTR, outperforming rule‑based and PID baselines while removing complex multi‑stage pipelines.

RLTP, ad pacing, delayed impression

0 likes · 16 min read

ByteDance SE Lab

Aug 21, 2023 · Artificial Intelligence

How Fastbot Uses Reinforcement Learning for Faster Android GUI Testing

Fastbot is a reusable, model‑based Android GUI testing tool that uses reinforcement learning to learn from previous test runs. It accelerates coverage and crash detection through a two‑phase workflow with probabilistic and learning‑based event selection, and it provides configurable custom events, widget blocking, and tree‑pruning features.

GUI automation, android testing, fastbot

0 likes · 16 min read

Python Crawling & Data Mining

Aug 20, 2023 · Artificial Intelligence

What Is RLHF? Benefits, Limits, and Design Tips for Human‑Feedback Reinforcement Learning

This article explains Reinforcement Learning from Human Feedback (RLHF), outlining its definition, suitable tasks, advantages over other reward‑model methods, types of algorithms, challenges of human feedback, and practical strategies to mitigate its limitations for building robust AI systems.

AI alignment, Human Feedback, Machine Learning

0 likes · 14 min read

Alimama Tech

Aug 16, 2023 · Artificial Intelligence

Personalized Automated Bidding Framework (PerBid) for Fairness‑Aware Online Advertising

PerBid introduces a personalized automated bidding framework that creates context‑aware RL agents for advertiser clusters using a profiling network to embed static and dynamic campaign features, and experiments on Alibaba’s display‑ad platform show up to 10.85% performance gains while markedly improving fairness across heterogeneous advertisers.

Fairness, automated bidding, online advertising

0 likes · 23 min read

Baidu Geek Talk

Aug 16, 2023 · Artificial Intelligence

Understanding Reinforcement Learning: From Basics to PPO and Policy Gradient

This article provides a comprehensive overview of reinforcement learning, covering fundamental concepts, differences from supervised learning, algorithm families, policy gradient methods, practical tricks like baselines and reward‑to‑go, and detailed explanations of TRPO and PPO variants with illustrative diagrams.

Machine Learning, PPO, Policy Gradient

0 likes · 19 min read

DataFunTalk

Aug 7, 2023 · Artificial Intelligence

DataFun Decision Intelligence Summit – Reinforcement Learning Forum Overview

The DataFun Decision Intelligence Summit brings together leading researchers and industry experts to present cutting‑edge reinforcement learning algorithms, safety considerations, distributional methods, and real‑world applications such as vehicle routing, recommender systems, and power‑grid scheduling, highlighting future directions and audience takeaways.

AI, Recommendation Systems, distributional RL

0 likes · 12 min read

Python Programming Learning Circle

Aug 5, 2023 · Artificial Intelligence

Building and Training a DQN Agent with highway‑env for Autonomous Driving Simulation

This article explains how to install gym and highway‑env, configure the environment, process state, action and reward data, build a DQN model in PyTorch, run training loops, and analyze results for autonomous driving simulations using reinforcement learning.

Autonomous Driving, DQN, gym

0 likes · 10 min read

Meituan Technology Team

Jul 20, 2023 · Artificial Intelligence

Novelty Recommendation for Meituan Food Delivery: System Design, Challenges, and Solutions

Meituan’s food‑delivery team built a novelty‑focused recommendation pipeline—combining dual‑tower recall, novelty‑aware ranking, personalized mixed‑ranking weights, and reinforcement‑learning insertion—to surface merchants unseen by users, achieving 19% higher exposure novelty, 25% more order novelty, and improved ratings while keeping RPM loss under 0.5%.

food delivery, novelty, ranking

0 likes · 28 min read

DataFunSummit

Jun 19, 2023 · Artificial Intelligence

Overview of Decision Intelligence and Reinforcement Learning

This article provides a comprehensive overview of decision intelligence: it distinguishes predictive tasks from decision tasks, classifies decision environments, covers reinforcement learning fundamentals and algorithms such as SARSA and deep reinforcement learning, and discusses current applications, challenges, and future research directions.

Artificial Intelligence, Optimization, decision intelligence

0 likes · 12 min read

Tencent Tech

Jun 14, 2023 · Artificial Intelligence

How Tencent’s Robot Dog Max Gains Human‑Like Decision‑Making with Pre‑trained AI and RL

Tencent Robotics X unveiled how its robot dog Max combines pre‑trained AI models with reinforcement learning across three learning stages, enabling it to acquire, store, and apply skills for autonomous decision‑making in complex tasks such as the World Chase Tag competition.

AI, Pre‑training, Simulation

0 likes · 6 min read

DaTaobao Tech

Jun 9, 2023 · Artificial Intelligence

Generator-Evaluator Architecture for End-to-End Re-ranking in Information Flow

The paper introduces a Generator‑Evaluator (GE) architecture that end‑to‑end re‑ranks information‑flow items using a pointer‑network seq2seq generator and a reward‑estimating evaluator, jointly optimizing relevance and business utilities such as diversity, traffic control, inter‑group ordering, and fixed‑slot insertion, achieving over 70% better‑percentage and significant online gains on Taobao.

Information Flow, generator-evaluator, ranking

0 likes · 19 min read

Network Intelligence Research Center (NIRC)

Jun 9, 2023 · Artificial Intelligence

2023 NIRC PhD Graduates Reveal Cutting-Edge AI and Network Intelligence Research

In 2023 the Network Intelligence Research Center celebrated its largest PhD graduating class: seven scholars whose dissertations span deep‑vision hand‑gesture estimation, multi‑scenario network transmission, graph alignment, interactive streaming, knowledge‑defined networking, wireless body‑area networking, and more, showcasing significant AI‑driven advances and high‑impact publications.

Artificial Intelligence, Graph Alignment, Network Intelligence

0 likes · 30 min read

Didi Tech

May 23, 2023 · Artificial Intelligence

Driver‑Passenger Matching in Didi’s Ride‑Hailing Market: Algorithms and Techniques

The article surveys Didi’s driver‑passenger matching challenges and presents a suite of solutions—from greedy nearest‑driver and Kuhn‑Munkres bipartite matching to stable marriage, dynamic and one‑to‑many assignments, reinforcement‑learning, routing and queueing models—while validating assumptions statistically, integrating preference‑aware machine learning, and outlining multi‑objective and digital‑twin future research.

Optimization, Ride Hailing, algorithm

0 likes · 23 min read

DataFunTalk

May 20, 2023 · Artificial Intelligence

Understanding Didi’s Online Marketplace: Core Concepts, Technical Challenges, and Emerging Technologies

This article introduces Didi’s real‑time online marketplace, explains its fundamental principles, network effects, and social efficiency benefits, and examines key technical areas such as mechanism design, decision intelligence, operations research, reinforcement learning, and causal inference that drive its advanced matching and dispatch strategies.

Artificial Intelligence, Operations Research, decision intelligence

0 likes · 16 min read

Rare Earth Juejin Tech Community

May 8, 2023 · Artificial Intelligence

Understanding the Principles Behind ChatGPT: NLP, Transformers, and Reinforcement Learning

This article explains how ChatGPT works by covering the fundamentals of natural language processing, generative language models, deep learning, the Transformer architecture, attention mechanisms, few‑shot learning, and the reinforcement‑learning techniques that align its outputs with human preferences.

AI, ChatGPT, Large Language Model

0 likes · 24 min read

Kuaishou Tech

Apr 29, 2023 · Artificial Intelligence

RMTL: A Reinforcement Learning Based Multi‑Task Learning Framework for Session‑Level Recommendation

The paper proposes RMTL, a reinforcement‑learning driven multi‑task learning framework that builds session‑level MDPs, trains a multi‑task actor‑critic network with dynamic loss weighting, and demonstrates significant AUC improvements over state‑of‑the‑art MTL recommendation models on public datasets.

actor‑critic, adaptive loss weighting, multi-task learning

0 likes · 8 min read

Kuaishou Tech

Apr 28, 2023 · Artificial Intelligence

How Hyper‑Actor Critic Redefines Reinforcement Learning for Recommendation Systems

This article presents the Hyper‑Actor Critic (HAC) framework that splits reinforcement‑learning policies into continuous hyper‑actions and effective recommendation lists, introduces alignment and supervised losses, and demonstrates superior performance on an online simulator compared to existing RL and supervised methods.

AI research, Recommendation Systems, hyper-actor critic

0 likes · 9 min read

Kuaishou Tech

Apr 27, 2023 · Artificial Intelligence

Two-Stage Constrained Actor‑Critic (TSCAC) for Short‑Video Recommendation

The paper models short‑video recommendation as a constrained Markov decision process and introduces a two‑stage constrained actor‑critic algorithm that jointly maximizes watch time while satisfying multiple interaction constraints, demonstrating superior offline and online performance on the KuaiRand dataset and Kuaishou app.

actor-critic, constrained optimization, offline evaluation

0 likes · 7 min read

Kuaishou Tech

Apr 22, 2023 · Artificial Intelligence

Reinforcement Learning for User Retention (RLUR) in Short Video Recommendation Systems

This paper presents RLUR, a reinforcement‑learning algorithm that models user‑retention optimization as an infinite‑horizon request‑based Markov Decision Process, addressing uncertainty, bias, and delayed reward challenges to directly improve retention, DAU, and engagement in short‑video recommendation platforms.

Kuaishou, RLUR, User Retention

0 likes · 8 min read

Alimama Tech

Apr 3, 2023 · Artificial Intelligence

AI-Generated Bidding (AIGB): Using Generative Models for Automated Advertising Bidding

AI‑Generated Bidding (AIGB) replaces reinforcement‑learning with a conditional generative model that learns the joint distribution of bids, objectives and constraints from historical trajectories, enabling interpretable, diverse, constraint‑aware bidding strategies that improve efficiency, scalability and explainability for large‑scale advertising platforms.

automated bidding, conditional modeling, generative AI

0 likes · 15 min read

Kuaishou Tech

Mar 29, 2023 · Artificial Intelligence

ResAct: A Reinforcement Learning Approach for Long-Term User Retention in Sequential Recommendation

The paper introduces ResAct, a reinforcement‑learning framework that improves long‑term user retention in sequential recommendation by constraining the policy space near the online‑serving policy and employing a conditional variational auto‑encoder, residual actor, and state‑action value network, achieving significant gains over existing methods on a large‑scale short‑video dataset.

ResAct, User Retention, reinforcement learning

0 likes · 9 min read

Python Programming Learning Circle

Mar 27, 2023 · Artificial Intelligence

Reinforcement Learning with highway‑env: Installation, Configuration, and DQN Training in Python

This article demonstrates how to install and configure the highway‑env reinforcement‑learning environment, set up a DQN agent in Python, and train it on various traffic scenarios, providing code examples and performance visualizations.

DQN, Python, Simulation

0 likes · 10 min read

NetEase Smart Enterprise Tech+

Mar 27, 2023 · Artificial Intelligence

How Reinforcement Learning Powers AI Bots in ‘Barbarian Battle 2’

This article details NetEase Zhiji and Dianhun Network's use of reinforcement learning, a distributed training framework, and middleware to create, train, deploy, and iterate AI robots for the game "Barbarian Battle 2", highlighting technical challenges, solutions, and the impact on player experience.

AI bots, Game Development, distributed training

0 likes · 13 min read

Python Programming Learning Circle

Mar 10, 2023 · Artificial Intelligence

Google's i‑S2R and GoalsEye: Robot Table‑Tennis Learning from Human Interaction

The article explains how Google's i‑S2R and GoalsEye projects use iterative simulation‑to‑real training, behavior cloning and goal‑conditioned learning to enable robots to play table‑tennis with humans, highlighting the challenges, experimental setup, and performance improvements achieved across player skill levels.

AI research, behavior cloning, human-robot interaction

0 likes · 6 min read

Top Architect

Mar 10, 2023 · Artificial Intelligence

Understanding InstructGPT and ChatGPT: Architecture, Training Pipeline, and Performance Analysis

This article provides a comprehensive overview of the GPT series, explains the differences between prompt learning and instruction learning, details the three‑stage training pipeline of InstructGPT/ChatGPT—including supervised fine‑tuning, reward‑model training, and PPO‑based reinforcement learning—examines their strengths, weaknesses, and future research directions, and discusses the broader impact of these models on AI development.

AI, ChatGPT, GPT

0 likes · 22 min read

21CTO

Feb 23, 2023 · Artificial Intelligence

How Does ChatGPT Really Work? Inside the RLHF Training Process

This article explains ChatGPT’s architecture, the distinction between model capability and consistency, how next‑token and masked‑language‑model training lead to inconsistencies, and how OpenAI’s supervised fine‑tuning, reward‑model training, and PPO reinforcement learning (RLHF) are combined to improve alignment while highlighting the method’s limitations.

AI alignment, ChatGPT, RLHF

0 likes · 15 min read

IT Architects Alliance

Feb 23, 2023 · Artificial Intelligence

Training a Positive Review Generator with RLHF and PPO

This article demonstrates how to use Reinforcement Learning from Human Feedback (RLHF) with a PPO algorithm and a sentiment‑analysis model to train a language model that generates positive product reviews, covering task definition, data sampling, reward evaluation, model optimization, and experimental results.

GPT, PPO, RLHF

0 likes · 11 min read

DataFunTalk

Feb 20, 2023 · Artificial Intelligence

ChatGPT Technology, Localization Efforts, and Open‑Source Large Models – Overview and Practices

This article presents an overview of ChatGPT technology, its evolution, current challenges, a three‑stage learning process, data organization and evaluation, details of domestic localization efforts, practical solutions, and the release of a Chinese open‑source large model with training guidance.

ChatGPT, Large Language Model, Model Localization

0 likes · 12 min read

Architect

Feb 19, 2023 · Artificial Intelligence

Training a Positive Review Generator with RLHF and PPO

This article demonstrates how to apply Reinforcement Learning from Human Feedback (RLHF) using a sentiment‑analysis model as a reward function and Proximal Policy Optimization (PPO) to fine‑tune a language model that generates positive product reviews, complete with code snippets and experimental results.

PPO, RLHF, Sentiment Analysis

0 likes · 10 min read

dbaplus Community

Feb 18, 2023 · Artificial Intelligence

Why ChatGPT Still Gets It Wrong: Inside RLHF and Model Consistency

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 but uses supervised fine‑tuning and Reinforcement Learning from Human Feedback (RLHF) to improve alignment, yet its training methods still cause consistency issues such as unhelpful responses, hallucinations, bias, and limited explainability.

ChatGPT, PPO, RLHF

0 likes · 17 min read

Open Source Linux

Feb 13, 2023 · Artificial Intelligence

How Does ChatGPT Work? Inside RLHF and Model Consistency

This article explains the inner workings of ChatGPT, detailing its evolution from GPT‑3, the role of reinforcement learning from human feedback (RLHF) in improving consistency, the training pipeline steps, and the limitations and evaluation methods of large language models.

AI, ChatGPT, RLHF

0 likes · 15 min read

Kuaishou Tech

Feb 10, 2023 · Artificial Intelligence

Seven Kuaishou Papers Accepted at WWW 2023 on Reinforcement Learning and Recommendation Systems

On January 25, Kuaishou’s community science team announced that seven of its papers were accepted at the ACM Web Conference 2023 (WWW’23), covering reinforcement‑learning‑based user retention, constrained actor‑critic recommendation, divide‑and‑conquer embedding retrieval, causal embedding with contrastive learning, latent action space exploration, dual‑interest factorization attention, and multi‑task reinforcement learning for recommendation.

AI, Kuaishou, WWW 2023

0 likes · 17 min read

Laravel Tech Community

Feb 9, 2023 · Artificial Intelligence

Understanding ChatGPT: Architecture, Training Strategies, and Alignment Challenges

This article explains how ChatGPT builds on GPT‑3, describes the supervised‑plus‑reinforcement learning (RLHF) pipeline that fine‑tunes the model, compares model capability with consistency, and discusses the performance evaluation and remaining limitations of large language models.

ChatGPT, RLHF, alignment

0 likes · 15 min read

Top Architect

Feb 9, 2023 · Artificial Intelligence

How ChatGPT Works: Training, RLHF, and Consistency Issues

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 and improves performance through supervised fine‑tuning, human‑feedback reinforcement learning (RLHF), and PPO optimization, addressing consistency challenges such as misaligned outputs, bias, and hallucinations while evaluating helpfulness, truthfulness, and harmlessness.

ChatGPT, RLHF, large language models

0 likes · 15 min read

DataFunSummit

Feb 8, 2023 · Artificial Intelligence

Technical Architecture and Training Process of ChatGPT

ChatGPT, a dialogue-focused language model, builds on the GPT family and employs techniques such as Reinforcement Learning from Human Feedback (RLHF), the TAMER framework, and a three-stage training pipeline (supervised fine‑tuning, reward modeling, and PPO reinforcement learning) to achieve advanced conversational capabilities.

ChatGPT, GPT, RLHF

0 likes · 7 min read

Architects' Tech Alliance

Feb 7, 2023 · Artificial Intelligence

ChatGPT: Technical Principles, Architecture, and the Role of Human‑Feedback Reinforcement Learning

This article explains how ChatGPT builds on GPT‑3 with improved accuracy and coherence, details its training pipeline that combines supervised fine‑tuning and Reinforcement Learning from Human Feedback (RLHF), discusses consistency challenges, evaluation metrics, and the limitations of the RLHF approach.

AI alignment, ChatGPT, PPO

0 likes · 15 min read

Model Perspective

Jan 12, 2023 · Artificial Intelligence

Neural Networks Explained: Architecture, Training, and Reinforcement Basics

This article introduces neural networks, covering their layered structure, common types like CNNs and RNNs, key components such as activation functions, loss, learning rate, backpropagation, dropout, batch normalization, and extends to reinforcement learning concepts including MDPs, policies, value functions, and Q‑learning.

CNN, Machine Learning, RNN

0 likes · 6 min read

DataFunTalk

Dec 30, 2022 · Artificial Intelligence

Graph Representation Learning for Drug Package Recommendation: Discriminative and Generative Approaches

This article reviews the challenges of drug package recommendation in smart healthcare and presents two graph‑based solutions—a discriminative model (DPR) that scores existing drug packages and a generative model (DPG) that creates personalized packages—demonstrating superior performance through extensive experiments and analysis.

AI in healthcare, Generative Models, drug recommendation

0 likes · 19 min read

Alimama Tech

Dec 28, 2022 · Artificial Intelligence

Sustainable Online Reinforcement Learning for Auto-bidding (SORL)

The Sustainable Online Reinforcement Learning (SORL) framework tackles offline inconsistency in auto‑bidding by iteratively gathering safe online data from real ad systems with a Lipschitz‑based exploration method and training a variance‑suppressed conservative Q‑learning policy, achieving safer, more stable, and higher‑performing bids on Alibaba’s platform.

auto-bidding, offline inconsistency, online advertising

0 likes · 18 min read

Architecture Digest

Dec 15, 2022 · Artificial Intelligence

Technical Overview of ChatGPT: Training Pipeline, RLHF, and Its Potential to Replace Search Engines

This article explains ChatGPT's underlying technology—including its three‑stage training pipeline with supervised fine‑tuning, reward‑model learning, and reinforcement learning from human feedback—while analyzing whether the model can realistically replace traditional search engines such as Google or Baidu.

AI, ChatGPT, Large Language Model

0 likes · 15 min read

IT Architects Alliance

Dec 13, 2022 · Artificial Intelligence

Technical Principles and Training Process of ChatGPT

The article explains ChatGPT’s underlying technology, detailing its three-stage training pipeline—supervised fine‑tuning, reward‑model learning, and reinforcement learning with PPO—while discussing its strengths, limitations, and potential integration with traditional search engines.

AI, ChatGPT, LLM

0 likes · 14 min read

Tencent Cloud Developer

Dec 9, 2022 · Artificial Intelligence

An Overview of ChatGPT: Technology, Training Process, and Applications

The article outlines ChatGPT’s conversational capabilities, its InstructGPT‑based architecture, a three‑stage RLHF training pipeline involving supervised fine‑tuning, human‑ranked response generation, and PPO optimization, and discusses its strengths, limitations, diverse applications, and future directions for multimodal, up‑to‑date assistants.

AI applications, ChatGPT, Large Language Model

0 likes · 18 min read

Architect's Guide

Dec 9, 2022 · Artificial Intelligence

Technical Principles and Training Process of ChatGPT

The article explains how ChatGPT builds on the GPT‑3.5 large language model, using human‑annotated data and Reinforcement Learning from Human Feedback (RLHF) across three training stages to improve instruction understanding, answer quality, and continual model enhancement, while also discussing its potential to complement or replace traditional search engines.

AI, ChatGPT, Instruction Tuning

0 likes · 15 min read

IT Architects Alliance

Dec 8, 2022 · Artificial Intelligence

Technical Principles and Training Process of ChatGPT

This article explains the technical foundations of ChatGPT, detailing its three-stage training pipeline—supervised fine‑tuning with human‑annotated data, reward model training via pairwise ranking, and reinforcement learning from human feedback—while also discussing its limitations compared to traditional search engines and potential future enhancements.

AI, ChatGPT, Large Language Model

0 likes · 14 min read

vivo Internet Technology

Dec 7, 2022 · Artificial Intelligence

Mixing Heterogeneous Queues in Vivo's Information Flow and App Store: Challenges, Practices, and RL/Deep Learning Solutions

Vivo tackles the complex problem of mixing heterogeneous content queues—ads, games, and organic items—in its information‑flow and app‑store by evolving from rule‑based weighting to Q‑learning and deep‑learning position models that respect product constraints, preserve ordering, and balance short‑term revenue with long‑term user experience, while planning deeper personalization and on‑device solutions.

Advertising, App Store, Information Flow

0 likes · 14 min read

Top Architect

Dec 7, 2022 · Artificial Intelligence

Technical Principles of ChatGPT and Its Prospects for Replacing Traditional Search Engines

The article explains how ChatGPT builds on GPT‑3.5 with supervised fine‑tuning, reward‑model training and reinforcement learning from human feedback, analyzes why it cannot yet replace search engines due to hallucinations, knowledge freshness and cost, and proposes a hybrid architecture that combines LLM generation with traditional retrieval to overcome these limitations.

AI, ChatGPT, Large Language Model

0 likes · 16 min read

HomeTech

Nov 16, 2022 · Artificial Intelligence

Fundamentals and Policy Gradient Algorithms in Reinforcement Learning with Applications to Scene Text Recognition

This article introduces the basic concepts of reinforcement learning, derives model‑based and model‑free policy gradient methods—including vanilla policy gradient and Actor‑Critic—explains their mathematical foundations, and demonstrates their use in scene text recognition and image captioning tasks.

AI, Policy Gradient, actor-critic

0 likes · 22 min read

AntTech

Nov 7, 2022 · Blockchain

Effectively Generating Vulnerable Transaction Sequences in Smart Contracts with Reinforcement Learning‑Guided Fuzzing

This paper presents a reinforcement‑learning‑based fuzzer (RLF) that generates transaction sequences likely to trigger smart‑contract vulnerabilities, combining vulnerability‑driven and coverage‑driven rewards to improve detection efficiency and outperform existing state‑of‑the‑art tools.

RL-based fuzzer · reinforcement learning

0 likes · 12 min read

NetEase LeiHuo Testing Center

Nov 4, 2022 · Artificial Intelligence

Applying AI for Game Balance Testing: DNN Victory Prediction and Genetic Algorithm Optimization

This article details a practical AI-driven workflow for a turn‑based card game, covering problem background, data modeling with a DNN victory‑prediction network, reinforcement‑learning‑based data generation, and a genetic‑algorithm search to identify the strongest and weakest team compositions.

AI · DNN · Game Balance

0 likes · 18 min read

DataFunTalk

Nov 4, 2022 · Artificial Intelligence

Explainable Knowledge Graph Reasoning: Background, Advances, Motivation, Recent Research, and Outlook

This article reviews explainable knowledge graph reasoning, covering its background, core concepts, downstream applications, major reasoning methods, motivations for interpretability, recent advances such as hierarchical and Bayesian reinforcement learning, meta‑path mining, and future research directions.

explainable AI · graph reasoning · hierarchical RL

0 likes · 18 min read

Youku Technology

Oct 28, 2022 · Artificial Intelligence

Enlarging Long‑time Dependencies via Reinforcement‑Learning‑Based Memory Network for Movie Affective Analysis

The authors introduce a reinforcement‑learning‑driven memory network that augments long‑range dependencies for continuous valence‑arousal emotion prediction in movies, integrating five multimodal features and a DDPG‑based update policy, which yields state‑of‑the‑art performance across multiple affective‑analysis and summarization benchmarks.

VA affect model · long‑term dependencies · memory network

0 likes · 16 min read

Model Perspective

Oct 26, 2022 · Artificial Intelligence

Master Machine Learning Algorithms: Types, Python Code & Real-World Examples

This article categorizes machine learning algorithms into supervised, unsupervised, and reinforcement learning, then details ten common algorithms—including linear regression, logistic regression, decision trees, SVM, Naive Bayes, K‑NN, K‑means, random forest, and dimensionality reduction—accompanied by clear Python code examples and illustrative diagrams.

Algorithms · Machine Learning · Python

0 likes · 14 min read

Sohu Tech Products

Oct 12, 2022 · Artificial Intelligence

AlphaTensor: DeepMind’s AI System for Discovering Faster Matrix Multiplication Algorithms

DeepMind’s AlphaTensor, built on AlphaZero and reinforcement learning, automatically discovers novel, provably correct matrix multiplication algorithms that outperform classic methods like Strassen’s, demonstrating how modern AI can automate algorithm discovery and significantly accelerate computations across many fields.

AI · AlphaTensor · DeepMind

0 likes · 8 min read

Alimama Tech

Sep 21, 2022 · Artificial Intelligence

Alibaba's Three Papers Accepted at NeurIPS 2022

Alibaba’s research team secured three NeurIPS 2022 papers—introducing an Adaptive Parameter Generation network that boosts click‑through rates and revenue, a tuning‑free Global Batch Gradient Aggregation method that speeds recommendation model training by 2.4×, and a Sustainable Online Reinforcement Learning framework that outperforms existing auto‑bidding strategies.

NeurIPS · Recommendation Systems · gradient aggregation

0 likes · 6 min read

GuanYuan Data Tech Team

Sep 8, 2022 · Artificial Intelligence

How AI Reinforcement Learning Transforms Smart Replenishment in Retail

This article examines the technical challenges of intelligent replenishment—model stability, complexity, generalization, and interpretability—and explains how a few‑shot imitation learning and inverse reinforcement learning framework can overcome these issues to deliver reliable, low‑cost AI‑driven supply‑chain decisions.

AI · Supply Chain · imitation learning

0 likes · 22 min read

Alimama Tech

Sep 7, 2022 · Artificial Intelligence

Curriculum-Guided Bayesian Reinforcement Learning for ROI-Constrained Real-Time Bidding

The paper presents a Curriculum‑Guided Bayesian Reinforcement Learning (CBRL) framework that models ROI‑constrained real‑time bidding as a partially observable constrained MDP, using hard‑margin indicator rewards and a curriculum of relaxed proxy problems to achieve fast, constraint‑satisfying, Bayes‑optimal policies that outperform existing methods on large‑scale industrial data.

Bayesian RL · Curriculum Learning · MDP

0 likes · 15 min read

DataFunTalk

Sep 2, 2022 · Artificial Intelligence

Applying Reinforcement Learning to E‑commerce Traffic Control: Practices and Future Directions

This talk by JD Retail's Zhao Yu explains how reinforcement learning is modeled and deployed for large‑scale traffic control during major sales events, detailing system architecture, reward design, offline simulation, model upgrades, and future research directions.

JD.com · RL modeling · online advertising

0 likes · 20 min read

Bilibili Tech

Aug 30, 2022 · Artificial Intelligence

Neural MMO Massive AI Team Survival Challenge: Advances in Multi‑Agent Decision AI

The IJCAI‑2022 Neural MMO Massive AI Team Survival Challenge demonstrated that deep reinforcement‑learning agents can achieve sophisticated cooperation and competition among 128 agents in a large‑scale MMO‑style world, highlighting the growing focus on decision‑AI, the effectiveness of self‑play and CTDE, and the platform’s potential for future research into population‑level behavior, economics, and complex real‑world decision making.

AI competition · Decision AI · Massive AI

0 likes · 11 min read

Bilibili Tech

Aug 30, 2022 · Artificial Intelligence

Reinforcement Learning in Neural MMO: Background, Environment, Competition Solution, and Insights

The article reviews reinforcement learning applied to Neural MMO—a large‑scale, multi‑agent MMO environment—detailing its competitive IJCAI 2022 track, the winning LastOrder solution with transformer‑CNN‑LSTM architecture, reward shaping, a Fictitious Self‑Play meta‑solver, and Bilibili’s scalable Newton training framework.

AI in Games · Meta Solver · Neural MMO

0 likes · 9 min read

Laiye Technology Team

Aug 29, 2022 · Artificial Intelligence

Evolution of Dialogue Management: From Rule‑Based to Data‑Driven Systems and Industrial Deployments

This article reviews the historical development of dialogue management—from early rule‑based and finite‑state approaches to modern data‑driven and reinforcement‑learning methods—and examines how major industry platforms such as Amazon Alexa, Amazon Lex, and RASA implement these techniques in practice.

Amazon Alexa · NLU · RASA

0 likes · 16 min read

IEG Growth Platform Technology Team

Aug 16, 2022 · Artificial Intelligence

Actor‑Critic Reinforcement Learning for Real‑Time Bidding in Mobile Game Advertising

The paper proposes an actor‑critic reinforcement‑learning model (ACRL) that leverages PPO and a deep structured semantic model to optimize real‑time bidding strategies for mobile game ads under CPM and budget constraints, addressing long user lifecycles and sparse conversion data while demonstrably improving ROI in both offline simulations and online A/B tests.

Mobile Advertising · ROI · actor-critic

0 likes · 16 min read

IEG Growth Platform Technology Team

Aug 10, 2022 · Artificial Intelligence

Two Tencent IEG Papers Accepted at CIKM: Actor‑Critic Reinforcement Learning for Optimal Bidding and Adversarial Adaptation for Cross‑Domain Recommendation

Tencent's IEG Growth Middle Platform team announced that two of its research papers—one presenting an actor‑critic reinforcement learning model for real‑time bidding in online display advertising and the other proposing an adversarial adaptation framework for cross‑domain recommendation—were accepted at the top‑tier CIKM conference, highlighting novel algorithms that achieve state‑of‑the‑art performance and have been deployed to serve billions of daily impressions.

Advertising · adversarial adaptation · cross-domain recommendation

0 likes · 4 min read

Model Perspective

Aug 5, 2022 · Artificial Intelligence

What Are the Essential Steps and Types of Machine Learning?

Machine learning involves five core steps—from data collection and preparation to model training, evaluation, and improvement—while encompassing supervised, unsupervised, and reinforcement learning methods, each with distinct algorithms and real-world applications across finance, healthcare, and retail.

Applications · Machine Learning · Unsupervised Learning

0 likes · 7 min read

NetEase LeiHuo Testing Center

Jul 29, 2022 · Artificial Intelligence

AI‑Powered Compatibility Testing for Mobile Games: Platform Design, Scene Traversal, and Anomaly Detection

This article describes an AI‑driven mobile game compatibility testing framework that combines a cloud device farm, a Poco‑based scene‑traversal module with reinforcement‑learning click strategies, and a computer‑vision anomaly detection model enhanced by data‑augmentation techniques to identify UI defects across diverse devices and game scenarios.

AI · Scene Traversal · reinforcement learning

0 likes · 14 min read

GuanYuan Data Tech Team

Jul 28, 2022 · Artificial Intelligence

Unlocking Reinforcement Learning: Core Concepts, Algorithms, and Real‑World Applications

This article introduces reinforcement learning by defining agents, environments, rewards, and policies, explains key concepts such as Markov Decision Processes and Bellman equations, and surveys major algorithms—including dynamic programming, Monte‑Carlo, TD learning, policy gradients, Q‑learning, DQN, and evolution strategies—while highlighting practical challenges and notable case studies like AlphaGo Zero.

Evolution Strategies · MDP · Machine Learning

0 likes · 27 min read

Youku Technology

Jul 5, 2022 · Artificial Intelligence

Enlarging the Long-time Dependencies via RL-based Memory Network in Movie Affective Analysis

The paper introduces a reinforcement‑learning‑driven memory network that stores and updates historical video information via DDPG, overcoming LSTM/Transformer limitations on long‑duration movie sequences, and achieves state‑of‑the‑art affective prediction on LIRIS‑ACCEDE and related datasets, with real‑world deployments in AI content inspection and film‑element knowledge graphs.

long-term dependencies · memory network · movie affective analysis

0 likes · 5 min read

58 Tech

Jun 24, 2022 · Artificial Intelligence

Reinforcement Learning for Lead Generation in Task‑Oriented Dialogue Systems

This article presents a reinforcement‑learning‑based approach to improve lead‑capture efficiency of a task‑oriented chatbot used in local services, detailing the system architecture, RL algorithms (DQN/DDQN), data construction, model training, offline and online evaluation, and the resulting commercial gains.

Customer Service · DQN · Lead Generation

0 likes · 27 min read

AntTech

Jun 22, 2022 · Cloud Computing

Meta Reinforcement Learning Framework for Predictive Autoscaling in Cloud Environments

This article presents a cloud-native, end‑to‑end autoscaling solution that integrates traffic forecasting, CPU utilization meta‑prediction, and a reinforcement‑learning‑based scaling decision module into a fully differentiable system; according to the accompanying ACM SIGKDD 2022 research, it achieves higher resource utilization and cost efficiency.

Autoscaling · Cloud Computing · Meta Learning

0 likes · 10 min read

DataFunSummit

Jun 21, 2022 · Artificial Intelligence

JiuGe: An Automatic Chinese Classical Poetry Generation System – Algorithms and Research Overview

This article presents the JiuGe system developed by THUNLP for automatically generating Chinese classical poetry, detailing its research motivations, model architecture—including salient‑clue, working‑memory, topic‑memory, style‑transfer and reinforcement‑learning components—implementation, applications, and future directions.

Artificial Intelligence · Poetry Generation · deep learning

0 likes · 18 min read

Huawei Cloud Developer Alliance

Jun 1, 2022 · Artificial Intelligence

How AI Beats Super Mario with PPO in 5 Minutes

This tutorial demonstrates how to use Huawei Cloud ModelArts and the Proximal Policy Optimization (PPO) reinforcement‑learning algorithm to train an AI agent that can clear most Super Mario levels within about 1500 episodes, even for users with no coding experience.

AI · ModelArts · PPO

0 likes · 6 min read

DataFunSummit

May 16, 2022 · Artificial Intelligence

Reinforcement Learning for E‑commerce Search Ranking: RNN User State Modeling and DDPG Long‑Term Value Optimization

This presentation details how JD applied reinforcement learning—using RNN‑based user state modeling and a DDPG framework—to improve e‑commerce search ranking by optimizing long‑term cumulative value, showing significant offline and online gains in conversion and GMV.

DDPG · RNN · e-commerce

0 likes · 20 min read

Meituan Technology Team

Apr 28, 2022 · Artificial Intelligence

Multi-Action Computation Allocation via Evolutionary Strategies in Meituan Takeaway Advertising

This article analyzes Meituan's delivery advertising system, detailing the shift from linear programming to an evolutionary‑strategy‑based multi‑action allocation (ES‑MACA), describing problem formalization, offline training, reward evaluation, online decision flow, extensive offline and online experiments, and future directions toward reinforcement learning.

Advertising · Meituan · evolutionary strategies

0 likes · 28 min read

Code DAO

Apr 28, 2022 · Artificial Intelligence

Model-Based Reinforcement Learning from Raw Video: A Detailed Walkthrough

The article explains how to train robots to learn tasks directly from raw video using model-based reinforcement learning, covering POMDP formulation, CNN auto‑encoders, latent‑space representations, iLQR optimization, and a step‑by‑step pipeline with concrete examples and references.

CNN autoencoder · POMDP · iLQR

0 likes · 11 min read

Code DAO

Apr 24, 2022 · Artificial Intelligence

How Transfer Learning Accelerates Deep Learning Across Vision, NLP, and Reinforcement Learning

The article explains how transfer learning reduces data and time requirements in deep learning by reusing pretrained models for vision, natural language processing, and reinforcement learning, while discussing challenges such as overfitting, the need for progressive networks, entropy regularization, domain adaptation, multi‑task learning, and model distillation.

deep learning · domain adaptation · model distillation

0 likes · 10 min read

DaTaobao Tech

Apr 13, 2022 · Artificial Intelligence

Machine‑Learning Based Bandwidth Prediction and Adaptive Streaming for Taobao Live: Concerto, OnRL, and Loki

Alibaba’s Taobao Live team replaced rule‑based bandwidth estimators with three machine‑learning solutions—Concerto, OnRL, and Loki—trained on over a million hours of global live‑stream data, achieving up to 13% throughput gain, threefold stall reduction, and up to 44% lower 95th‑percentile stalls, now deployed commercially.

Machine Learning · Real-time Video · adaptive bitrate

0 likes · 14 min read

Python Programming Learning Circle

Apr 6, 2022 · Artificial Intelligence

Building a DQN‑based Autonomous Driving Agent with highway‑env in Python

This tutorial explains how to install the gym and highway‑env packages, configure the simulation environment, process state and action representations, implement a DQN network in PyTorch, and train the model while visualizing performance metrics for autonomous driving tasks.

Autonomous Driving · DQN · Python

0 likes · 11 min read

Alimama Tech

Mar 16, 2022 · Artificial Intelligence

Deep GSP: Multi‑Objective Deep Learning Based Advertising Auction Mechanism

Deep GSP is a multi‑objective, deep‑learning ad auction that jointly learns rank scores while enforcing game‑theoretic constraints—monotonicity, incentive compatibility, and Nash equilibrium—and a smooth‑transition penalty, using DDPG reinforcement learning to outperform traditional GSP across revenue, clicks, conversions, and add‑to‑cart metrics.

advertising auction · mechanism design · multi-objective optimization

0 likes · 18 min read

DataFunSummit

Mar 12, 2022 · Artificial Intelligence

Evolution of Re‑ranking Techniques in Kuaishou Short‑Video Recommendation System

This article details Kuaishou's short‑video recommendation pipeline, explaining the challenges of large‑scale sequencing, the development of sequence re‑ranking, multi‑content mixing, on‑device re‑ranking, and reinforcement‑learning‑based strategies, and demonstrates how these innovations improve user engagement and business metrics.

Kuaishou · Recommendation Systems · multi-content mixing

0 likes · 15 min read

DataFunSummit

Mar 3, 2022 · Artificial Intelligence

Sequence Optimization, Context-Aware CTR Re-Estimation, and Session-Level Auction for JD Advertising Ranking

The article presents JD's technical evolution for advertising ranking, covering technology selection for recommendation ad sorting, context‑aware CTR re‑estimation, reinforcement‑learning‑based sequence optimization, and a session‑level auction mechanism that together improve monetization efficiency and long‑term user value.

CTR · auction · reinforcement learning

0 likes · 18 min read

DataFunTalk

Feb 24, 2022 · Artificial Intelligence

Sequence Optimization and Context-Aware CTR Re-Estimation for JD Advertising Ranking

The article presents JD's technical evolution for advertising ranking, covering recommendation ad sorting, context‑aware CTR re‑estimation, reinforcement‑learning‑based sequence optimization, and session‑level auction mechanisms, and includes a Q&A that highlights practical gains and implementation challenges.

Advertising · CTR prediction · Context-Aware

0 likes · 14 min read

DataFunTalk

Feb 20, 2022 · Artificial Intelligence

Distilled Reinforcement Learning Framework for Recommendation (DRL-Rec): Design, Modules, and Experimental Evaluation

This article presents DRL-Rec, a distilled reinforcement learning framework for recommendation that integrates an exploring‑filtering module and confidence‑guided distillation to compress RL‑based recommenders while improving accuracy, and reports significant offline and online performance gains on a large‑scale system.

Knowledge Distillation · online experiments · reinforcement learning

0 likes · 16 min read

DataFunTalk

Feb 10, 2022 · Artificial Intelligence

Evolution of Re‑ranking Techniques in Kuaishou Short‑Video Recommendation System

This article details the technical evolution of Kuaishou's short‑video recommendation pipeline, focusing on sequence re‑ranking, multi‑content mixing, and on‑device re‑ranking, and explains how transformer‑based models, generator‑evaluator frameworks, and reinforcement‑learning strategies are employed to maximize overall sequence value, user engagement, and revenue.

Kuaishou · Re‑ranking · Sequence Modeling

0 likes · 15 min read

IEG Growth Platform Technology Team

Jan 10, 2022 · Artificial Intelligence

Applying Reinforcement Learning to Optimize Advertising Bidding ROI

This article presents a comprehensive overview of using reinforcement learning to solve advertising bidding ROI optimization, covering historical foundations, methodological reasoning, system architecture, practical implementation details, challenges, evaluation metrics, and recommended algorithms for real‑world ad placement scenarios.

Advertising · ROI optimization · ad bidding

0 likes · 17 min read

DataFunTalk

Jan 3, 2022 · Artificial Intelligence

Intelligent Advertising Delivery System: Budget‑Constrained Bidding, Multi‑Constraint Bidding, Sequential Allocation, and Multi‑Channel Optimization

This article systematically introduces Alibaba's advertising intelligence platform, covering the evolution from simple CPM/CPC models to advanced budget‑constrained, multi‑constraint, and sequential bidding strategies, multi‑channel optimization, and reinforcement‑learning‑based solutions that jointly maximize advertiser ROI and platform revenue.

Machine Learning · Multi‑Channel · budget optimization

0 likes · 34 min read

58 Tech

Dec 28, 2021 · Artificial Intelligence

Reinforcement Learning for Cold‑Start Job Recommendation in 58.com

This talk explains how 58.com tackles the cold‑start and interest‑divergence problems of its massive blue‑collar job recruitment platform by modeling the recommendation process as a reinforcement‑learning task, detailing the use of multi‑armed bandit, contextual bandit, and linear‑UCB algorithms, offline evaluation pipelines, online deployment, and observed performance gains.

Contextual Bandit · cold start · job recommendation

0 likes · 25 min read

DataFunTalk

Dec 17, 2021 · Artificial Intelligence

Applying Reinforcement Learning to Solve Cold‑Start Problems in 58.com Job Recruitment

This talk explains how 58.com’s massive blue‑collar recruitment platform uses reinforcement‑learning techniques—including multi‑armed bandits, contextual MAB, and linear UCB—to address cold‑start and interest‑divergence challenges, describes the system architecture, offline evaluation, online deployment, and reports an 8% uplift in new‑user conversion.

Online Learning · cold start · contextual MAB

0 likes · 26 min read

Code DAO

Dec 14, 2021 · Artificial Intelligence

Building a Chess AI from Scratch: Combining AlphaZero and Transformers (Part 2)

This article walks through constructing a learnable chess AI by integrating AlphaZero‑style Monte Carlo Tree Search with a decoder‑only Transformer, detailing the game tree logic, model architecture, input and output encodings, self‑play training loop, and code implementation in PyTorch.

AlphaZero · MonteCarloTreeSearch · PyTorch

0 likes · 23 min read

IEG Growth Platform Technology Team

Dec 6, 2021 · Artificial Intelligence

Model-Free Reinforcement Learning for ROI Optimization: Methods, Advertising Applications, and Tencent Game Advertising Practice

This article introduces model‑free reinforcement learning fundamentals, reviews mainstream solution methods such as Monte‑Carlo, Temporal‑Difference, and n‑step TD with eligibility traces, discusses their application in online advertising and presents Tencent's game advertising practice, including algorithm choices, reward design, and experimental results.

A3C · Advertising · PPO

0 likes · 17 min read

Code DAO

Dec 3, 2021 · Artificial Intelligence

Understanding Actor‑Critic and A2C: From Policy Gradients to REINFORCE in RL

This article derives the policy‑gradient objective for discrete actions, implements the Monte‑Carlo REINFORCE algorithm in PyTorch, explains the actor‑critic framework, introduces Advantage Actor‑Critic (A2C) versus A3C, and demonstrates their performance on the OpenAI Gym CartPole‑v0 environment.

A2C · OpenAI Gym · Policy Gradient

0 likes · 13 min read

Code DAO

Nov 28, 2021 · Artificial Intelligence

Adapting Soft Actor‑Critic for Discrete Action Spaces in Deep Reinforcement Learning

This article explains how to modify the Soft Actor‑Critic (SAC) algorithm—originally designed for continuous actions—to work with discrete action environments, presents the required changes to the actor and critic loss functions, provides a full PyTorch implementation, and evaluates the method on the CartPole‑v1 benchmark.

CartPole · Discrete Actions · Entropy Regularization

0 likes · 20 min read

ByteDance Terminal Technology

Oct 26, 2021 · Mobile Development

Fastbot: Cross‑Platform Intelligent Automated Testing System for Android and iOS

This article details ByteDance’s Fastbot system, an AI‑driven cross‑platform automated testing framework for Android and iOS that leverages model‑based testing, reinforcement learning, and image‑based UI analysis to improve test coverage, fault injection, and scalability across mobile applications and games.

AI · cross-platform · mobile testing

0 likes · 36 min read

Alimama Tech

Sep 29, 2021 · Artificial Intelligence

Unified Solution to Constrained Bidding in Online Display Advertising (USCB)

The paper proposes a unified solution for real‑time bidding in online display ads that formulates advertiser budget and KPI limits as a constrained linear program, derives a closed‑form optimal bidding function with m+1 parameters, and uses model‑free reinforcement learning to dynamically adjust those parameters, achieving superior traffic‑value capture in large‑scale deployment on Alibaba’s Taobao platform.

Parameter Tuning · constrained optimization · real-time bidding

0 likes · 11 min read

Python Programming Learning Circle

Sep 27, 2021 · Artificial Intelligence

Training Reinforcement Learning Agents on Street Fighter III Using a MAME Wrapper Python Library

This tutorial explains how to install and use a Python library that wraps the MAME emulator to train reinforcement‑learning agents on arcade games such as Street Fighter III, covering system requirements, installation, environment configuration, debugging, step‑wise simulation, and a simple ConvNet agent example.

AI · MAME · Python

0 likes · 4 min read

ByteFE

Aug 2, 2021 · Artificial Intelligence

An Overview of Artificial Intelligence, Machine Learning, and Neural Networks

This article provides a beginner‑friendly overview of artificial intelligence, its relationship with machine learning, the four major learning paradigms—supervised, unsupervised, semi‑supervised and reinforcement learning—along with a historical sketch of neural networks, their training workflow, loss functions, back‑propagation, and parameter‑update mechanisms, while also containing a brief recruitment notice.

Artificial Intelligence · Machine Learning · Unsupervised Learning

0 likes · 18 min read
