Tag

reinforcement learning

2 views collected around this technical thread.

Kuaishou Tech
Jun 4, 2025 · Artificial Intelligence

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning

The KwaiPilot team released the KwaiCoder‑AutoThink‑preview model, which introduces a novel automatic‑thinking training paradigm and a process‑supervised reinforcement‑learning method called Step‑SRPO, enabling the model to dynamically switch between thinking and non‑thinking modes, reduce inference cost, and achieve up to 20‑point gains on code and math benchmarks while handling large‑scale codebases.

AI research · automatic thinking · code generation
12 min read
JD Tech Talk
May 27, 2025 · Artificial Intelligence

Solving Real-World AI Challenges at JD Retail: Reward Model Ensembles, Query Expansion, and Model Pruning

This article recounts how JD Retail's young algorithm engineers tackled diverse AI problems—optimizing reward‑model ensembles for ad image generation, building large‑language‑model‑based query expansion, and pruning diffusion models with FFT and RDP—while sharing their technical approaches, code snippets, and growth reflections.

AI · algorithm engineering · large language models
14 min read
JD Tech
May 26, 2025 · Artificial Intelligence

Solving Technical Challenges at JD Retail: Multi‑Reward Models, LLM‑Based Query Expansion, Model Pruning, and Reinforcement Learning

This article details how JD Retail's young algorithm engineers tackled a series of AI engineering problems—including advertising image quality assessment with multi‑reward models, large‑language‑model‑driven query expansion, FFT‑and‑RDP‑based model pruning, and agent‑centric reinforcement learning—while sharing practical growth insights and code snippets.

AI · Computer Vision · large language models
15 min read
IT Services Circle
May 25, 2025 · Artificial Intelligence

DeepSeek Core Technologies and Model Innovations: DeepSeek‑V3 and DeepSeek‑R1 Technical Overview

The article provides a detailed technical overview of DeepSeek's flagship large language models, DeepSeek‑V3 and DeepSeek‑R1, describing their MoE architecture, training frameworks, reinforcement‑learning based fine‑tuning, inference optimizations, and the broader impact of these innovations on the AI landscape while also promoting related books and resources.

AI · DeepSeek · Mixture of Experts
10 min read
Python Programming Learning Circle
May 23, 2025 · Artificial Intelligence

Useful Python Libraries for Data Science (Beyond pandas and NumPy)

This article introduces a curated list of lesser‑known Python packages for data‑science tasks—including Wget, Pendulum, imbalanced‑learn, FlashText, fuzzywuzzy, PyFlux, Ipyvolume, Dash, and Gym—providing installation commands, brief usage examples, and explanations of when each library is useful.

Libraries · Python · data science
10 min read
JD Tech Talk
May 22, 2025 · Artificial Intelligence

From Academic Research to Industrial Anti‑Fraud: Leveraging LLMs, Reinforcement Learning, and Model Distillation for Advertising Risk Detection

The article recounts Xiaoting’s journey from a PhD research background to leading JD.com’s ad‑fraud detection, detailing how large language models, reinforcement learning, and model distillation were applied to identify hidden address codes, reduce false‑positive rates to 0.3%, and balance accuracy with real‑time performance in a high‑traffic e‑commerce environment.

AI · LLM · Model Distillation
11 min read
Tencent Technical Engineering
May 12, 2025 · Artificial Intelligence

Comprehensive Summary and Expansion of Andrej Karpathy’s 7‑Hour LLM Lecture

This article provides a detailed Chinese‑to‑English summary of Andrej Karpathy’s 7‑hour LLM tutorial, covering chat process analysis, tokenization, pre‑training data pipelines, model architecture, training strategies, post‑training fine‑tuning, reinforcement learning, chain‑of‑thought reasoning, and current industry applications.

AI · LLM · Model Architecture
25 min read
JD Retail Technology
May 7, 2025 · Artificial Intelligence

Solving Technical Challenges with Large AI Models at JD Retail: Reward Modeling, Query Expansion, and Model Pruning

JD Retail’s engineering team tackles hard AI problems on three fronts: replacing a monolithic reward model with specialized small models for ad‑image generation, deploying an LLM‑driven query‑expansion pipeline that lifts conversion rates, and pruning text‑to‑image transformers using FFT and RDP to boost throughput by 40% without loss. The team also built comprehensive evaluation tools and a semantic smart assistant.

AI · Large Models · Reward Modeling
14 min read
DevOps
May 5, 2025 · Artificial Intelligence

DeepSeek Releases Math‑Specialized Large Model V2 and ProverBench Evaluation Suite

DeepSeek has quietly open‑sourced a new mathematics‑focused large language model, DeepSeek‑Prover‑V2 (available in 671B and 7B variants), achieving 88.9% on MiniF2F and strong results on PutnamBench, alongside the high‑quality ProverBench dataset and a novel recursive theorem‑proving pipeline.

AI · DeepSeek · ProverBench
4 min read
DataFunTalk
Apr 25, 2025 · Artificial Intelligence

Does Reinforcement Learning Really Expand Reasoning Capacity in Large Language Models? Insights from Recent Empirical Study

Recent empirical research by Tsinghua’s LeapLab and Shanghai Jiao Tong University finds that fine‑tuning with reinforcement learning with verifiable rewards (RLVR) improves sampling efficiency but does not extend the reasoning abilities of large language models beyond those of their base models, as demonstrated across mathematics, code, and visual reasoning benchmarks.

AI research · RLVR · large language models
12 min read
AntTech
Apr 24, 2025 · Artificial Intelligence

Key Takeaways from Ant Group and Tsinghua’s Presentations on the AReaL Reinforcement Learning Framework and AWorld Multi‑Agent Framework at ICLR 2025

At ICLR 2025 in Singapore, Ant Group and Tsinghua University showcased the open‑source reinforcement‑learning platform AReaL and the multi‑agent system AWorld, highlighting their recent breakthroughs, system design challenges, performance results on the GAIA benchmark, and upcoming development plans.

AI frameworks · ICLR2025 · Multi-Agent Systems
7 min read
Kuaishou Tech
Apr 24, 2025 · Artificial Intelligence

Two‑Stage History‑Resampling Policy Optimization (SRPO) for Large‑Scale LLM Reinforcement Learning

The article introduces SRPO, a two‑stage history‑resampling reinforcement‑learning framework that systematically tackles common GRPO training issues and achieves state‑of‑the‑art performance on both math and code benchmarks with far fewer training steps, while also revealing emergent self‑reflection behaviors in large language models.

LLM optimization · SRPO · cross-domain training
12 min read
AntTech
Apr 21, 2025 · Artificial Intelligence

InclusionAI Community to Present AReaL Reinforcement Learning Framework and AWorld Multi‑Agent Framework at ICLR 2025

The InclusionAI open‑source community, initiated by Ant Group, will showcase the latest advances of its reinforcement‑learning framework AReaL and multi‑agent framework AWorld at the ICLR 2025 conference in Singapore, highlighting performance breakthroughs, open‑source contributions, and industry‑focused AI research.

AReaL · AWorld · Ant Group
5 min read
DataFunTalk
Apr 21, 2025 · Artificial Intelligence

Mechanize: A Controversial AI Startup Aiming to Fully Automate All Work and the Global Economy

Mechanize, a new AI startup founded by Epoch AI co‑founder Tamay Besiroglu, aims to fully automate all white‑collar work and the global economy, targeting a $60 trillion labor market, but faces technical hurdles, investor scrutiny, and widespread criticism over its radical vision.

AI Automation · AI startups · Artificial Intelligence
6 min read
Architect
Apr 17, 2025 · Artificial Intelligence

The Second Half of AI: From Model Innovation to Real‑World Utility

The article argues that artificial intelligence has entered a new phase where reinforcement learning finally generalizes, evaluation becomes more important than pure model performance, and researchers must redesign benchmarks and utility‑focused tasks to drive truly transformative progress.

Artificial Intelligence · Research Strategy · evaluation
16 min read
DataFunSummit
Mar 30, 2025 · Artificial Intelligence

RLHF Techniques and Challenges in Large Language Models and Multimodal Applications

This article reviews reinforcement learning, RLHF, and related alignment techniques for large language models and multimodal systems, covering fundamentals, recent advances such as InstructGPT, Constitutional AI, RLAIF, Super Alignment, GPT‑4o, video LLMs, and experimental evaluations of proposed methods.

Preference Learning · RLHF · large language models
26 min read
AntTech
Mar 27, 2025 · Artificial Intelligence

LMM‑R1: A Two‑Stage Reinforcement Learning Framework for Enhancing Multimodal Model Reasoning

Researchers from Ant Group, Southeast University and others introduced the open‑source LMM‑R1 framework, a two‑stage reinforcement‑learning approach that first strengthens textual reasoning and then generalizes it to multimodal tasks, achieving significant performance gains on benchmarks such as football, Sokoban, and geometry reasoning with modest GPU costs.

AI · large language models · multimodal
8 min read
JD Tech
Mar 26, 2025 · Artificial Intelligence

CTR-Driven Advertising Image Generation Using Multimodal Large Language Models (CAIG)

The JD advertising team proposes a CTR‑driven advertising image generation framework (CAIG) that leverages multimodal large language models, a novel reward model, and product‑centric preference optimization to produce ad images with superior click‑through performance, validated by extensive offline and online experiments.

CTR optimization · Reward Model · advertising image generation
10 min read
AntTech
Mar 26, 2025 · Artificial Intelligence

BodyGen: A Bio‑Inspired Embodied Co‑Design Framework for Autonomous Robot Evolution

BodyGen, a new embodied co‑design framework presented at ICLR 2025, enables robots to autonomously evolve their morphology and control policies using reinforcement learning and transformer‑based networks, achieving up to 60% performance gains with a lightweight 1.43M‑parameter model, and its code is publicly released.

Co-design · Robotics · Transformer
10 min read