Tagged articles

15 articles

Machine Learning Algorithms & Natural Language Processing

May 21, 2026 · Artificial Intelligence

Breaking the UED Bottleneck: PACE Locates the Reinforcement‑Learning Zone of Proximal Development

The paper introduces PACE, a Parameter‑Change based Unsupervised Environment Design method that evaluates training levels by the magnitude of induced policy‑parameter updates, offering a low‑variance, computationally cheap signal that consistently outperforms prior UED approaches on MiniGrid and Craftax benchmarks.

Craftax · Curriculum Learning · ICML 2026

0 likes · 11 min read

Machine Heart

May 21, 2026 · Artificial Intelligence

Breaking the Traditional UED Bottleneck: Using RL to Precisely Locate the Zone of Proximal Development

The paper introduces PACE, a Parameter Change Environment Design method that evaluates training levels by measuring induced policy parameter updates, offering a low‑variance learning‑progress signal that outperforms prior UED approaches on MiniGrid and Craftax benchmarks, achieving higher success rates and more stable generalization.

Craftax · Curriculum Learning · ICML 2026

0 likes · 10 min read

Bighead's Algorithm Notes

Feb 1, 2026 · Artificial Intelligence

Beyond Historical Data: Adaptive Synthesis for Financial Time Series

This article reviews a recent paper that proposes a drift‑aware data‑stream system integrating machine‑learning‑based adaptive control into financial data management, introducing a parametric data‑operation module, a gradient‑based bi‑level optimizer, and a curriculum planner to improve model robustness and risk‑adjusted returns in non‑stationary markets.

Curriculum Learning · Quantitative Finance · adaptive data synthesis

0 likes · 18 min read

Tencent Advertising Technology

Dec 25, 2025 · Artificial Intelligence

How RAVEN Leverages Reinforcement Reasoning for Precise Ad Video Violation Grounding

RAVEN is a reinforcement‑reasoning framework that combines curriculum learning with hierarchical rewards to enable multimodal large language models to accurately locate and classify violation segments in advertisement videos, even under noisy, large‑scale industrial data.

Advertising · Curriculum Learning · Multimodal LLM

0 likes · 17 min read

Amap Tech

Nov 4, 2025 · Artificial Intelligence

Spacetime‑GR: AI‑Powered Spatiotemporal Model Transforming POI Recommendations

This article introduces Spacetime‑GR, a large‑scale generative recommendation model that integrates hierarchical geographic POI indexing and spatiotemporal token encoding to enhance POI prediction for Amap, detailing its pre‑training pipeline, data cleaning, curriculum learning strategy, experimental results, scaling law observations, and the resulting improvements in hit rate and discovery rate.

Amap · Curriculum Learning · Large Language Model

0 likes · 14 min read

Network Intelligence Research Center (NIRC)

Nov 4, 2025 · Artificial Intelligence

SEAgent: A Self‑Evolving Computer Agent that Learns Software Use Autonomously

SEAgent introduces a self‑evolving framework that enables a GUI agent to master unfamiliar software through autonomous exploration and experience learning, leveraging a curriculum generator, a world‑state model, and GRPO‑based reinforcement with adversarial imitation, achieving state‑of‑the‑art performance on OSWorld.

Curriculum Learning · GUI automation · SEAgent

0 likes · 6 min read

Architect

Mar 9, 2025 · Artificial Intelligence

Experiments with Reinforcement Learning Fine‑Tuning of a 0.5B Qwen Model on the KK Dataset

The author reports a series of reinforcement‑learning‑based fine‑tuning experiments on a 0.5‑billion‑parameter Qwen instruct model using the KK dataset, detailing reward design adjustments, curriculum‑style data scaling, observed convergence issues, and hypotheses about why small models fail to develop long reasoning chains.

Curriculum Learning · LLM fine-tuning · reinforcement learning

0 likes · 11 min read

Baobao Algorithm Notes

Mar 5, 2025 · Artificial Intelligence

Why My 0.5B LLM’s Reasoning Collapsed During RLHF on Logic Puzzles

The author experiments with reinforcement‑learning‑from‑human‑feedback on a 0.5B Qwen instruct model using Logic‑RL and Open‑R1, discovers that reward mis‑design and curriculum learning cause the model to produce overly short or incorrect reasoning chains on knight‑and‑knave puzzles, and analyses the underlying causes.

Artificial Intelligence · Curriculum Learning · Large Language Model

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Nov 8, 2024 · Artificial Intelligence

How TAPIR Boosts Small LLMs with Task‑Aware Curriculum Planning

The paper introduces TAPIR, a task‑aware curriculum planning framework that distills instruction‑following abilities from black‑box LLM teachers into smaller student models by filtering difficult prompts, resampling tasks, enhancing response styles, and iteratively optimizing across multiple training rounds, achieving superior performance on benchmark evaluations.

Curriculum Learning · Instruction Tuning · Knowledge Distillation

0 likes · 10 min read

Baobao Algorithm Notes

Sep 24, 2024 · Artificial Intelligence

From Zero to One: A Practical Guide to Pretraining Large Language Models

This comprehensive guide walks you through every stage of LLM pretraining—from data sourcing, cleaning, and deduplication to tokenizer design, model architecture choices, training framework selection, optimization tricks, and evaluation methods—highlighting common pitfalls and practical solutions for building robust models.

Curriculum Learning · Data cleaning · LLM pretraining

0 likes · 34 min read

DataFunTalk

Aug 24, 2023 · Artificial Intelligence

Multi-Agent Decision Large Models: Challenges, Action Semantic Networks, Permutation Invariance/Equivariance, and Automated Curriculum Learning

This talk outlines the fundamental challenges of multi‑agent decision large models, introduces three core design priors (action semantic networks, permutation invariance/equivariance, and cross‑task automated curriculum learning), and demonstrates how these concepts improve performance across diverse environments such as StarCraft, Neural‑MMO, and SMAC.

AI · Curriculum Learning · action semantics

0 likes · 12 min read

Alimama Tech

Sep 7, 2022 · Artificial Intelligence

Curriculum-Guided Bayesian Reinforcement Learning for ROI-Constrained Real-Time Bidding

The paper presents a Curriculum‑Guided Bayesian Reinforcement Learning (CBRL) framework that models ROI‑constrained real‑time bidding as a partially observable constrained MDP, using hard‑margin indicator rewards and a curriculum of relaxed proxy problems to achieve fast, constraint‑satisfying, Bayes‑optimal policies that outperform existing methods on large‑scale industrial data.

Bayesian RL · Curriculum Learning · MDP

0 likes · 15 min read

Baobao Algorithm Notes

Mar 3, 2022 · Artificial Intelligence

How Hierarchical Curriculum Learning Improves Dialogue Response Selection

This article explains how treating negative response candidates with varying difficulty through a hierarchical curriculum learning framework—combining corpus‑level and instance‑level curricula—enhances dialogue response selection models, backed by experiments on Douban, Ubuntu, and E‑Commerce datasets.

Curriculum Learning · dialogue response selection · hierarchical learning

0 likes · 8 min read

Youku Technology

Dec 2, 2021 · Artificial Intelligence

Hybrid Curriculum Learning for Emotion Recognition in Conversation

The paper introduces a hybrid curriculum learning framework that tackles emotion shifts and confusing labels in emotion recognition in conversation (ERC). It applies nested curricula at the conversation and utterance levels so that training progresses from easy to hard, markedly improving classic ERC models on four public datasets. The framework is already deployed in the script health‑check service of Alibaba’s entertainment AI brain.

Curriculum Learning · Emotion Recognition · conversation analysis

0 likes · 2 min read

DataFunTalk

Mar 20, 2019 · Artificial Intelligence

Addressing Sparse Reward Problems in Model-Free Reinforcement Learning

This article reviews the challenges of model‑free reinforcement learning, especially sparse reward issues exemplified by Montezuma’s Revenge, and surveys recent approaches such as expert demonstrations, curriculum learning, self‑play, hierarchical reinforcement learning, and count‑based exploration to mitigate these problems.

Curriculum Learning · Model-free · exploration

0 likes · 12 min read
