Tagged articles
1071 articles
Page 6 of 11
Data Thinking Notes
Jul 20, 2025 · Artificial Intelligence

Mastering Context Engineering: Boost LLM Performance with Advanced Techniques

This article introduces Context Engineering, a new discipline for optimizing large language model inputs. It covers the expansion of context windows, compares the discipline with prompt engineering, outlines core techniques such as information organization, dynamic management, and semantic retrieval, and offers practical applications and recommendations to enhance AI performance across domains.

Prompt Engineering · ai-optimization · large language models
0 likes · 11 min read
Fun with Large Models
Jul 17, 2025 · Artificial Intelligence

How to Integrate Large Models with LangChain: A Step‑by‑Step Tutorial

This tutorial explains LangChain's core modules and three‑layer architecture, shows how to set up a Python environment, and provides concrete code examples for connecting SiliconFlow Qwen3‑8B and DeepSeek models via the init_chat_model API, including result inspection and references to official documentation.

DeepSeek · LangChain · Python
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Jul 16, 2025 · Artificial Intelligence

ChunkFlow: Accelerating Long‑Context Model Fine‑Tuning Up to 4.5× Faster

The paper introduces ChunkFlow, an efficient training framework for variable‑length and ultra‑long sequence datasets that powers Qwen models, achieving up to 4.53× speedup over Megatron‑LM and more than 2× overall performance gains by reorganizing data into fixed‑size chunks and employing a state‑aware scheduler.

AI Performance · ChunkFlow · GPU efficiency
0 likes · 7 min read
DataFunTalk
Jul 16, 2025 · Artificial Intelligence

How Jason Wei’s Breakthroughs Are Shaping the Future of Large Language Models

Jason Wei, a former Google Brain and OpenAI researcher now at Meta, has driven key advances in large language models—including chain‑of‑thought prompting, instruction tuning, emergent abilities, zero‑shot learning, and data augmentation—shaping both AI research paradigms and real‑world applications.

Chain-of-Thought · Instruction Tuning · emergent abilities
0 likes · 7 min read
DataFunTalk
Jul 16, 2025 · Artificial Intelligence

MiniMax-M1 Revealed: Hybrid Attention, RL Training, and 1M Token Context

MiniMax’s latest M1 model, unveiled after a $300 million funding round, showcases a 456‑billion‑parameter mixture‑of‑experts (MoE) architecture with lightning attention that supports up to one million tokens of context, and leverages reinforcement‑learning techniques to enhance long‑context handling, inference efficiency, and System 2 reasoning capabilities.

AI scaling · Hybrid Attention · large language models
0 likes · 16 min read
DataFunSummit
Jul 15, 2025 · Artificial Intelligence

Unlocking Semantic Search: Elasticsearch Vector Search & RAG Applications

This article explains why traditional keyword search falls short, introduces Elasticsearch's vector search and hybrid retrieval capabilities, and shows how combining it with large language models enables Retrieval‑Augmented Generation (RAG) for more accurate, context‑aware AI-driven search across text and multimedia data.

AI · Elasticsearch · RAG
0 likes · 5 min read
DataFunTalk
Jul 13, 2025 · Artificial Intelligence

What 2025’s AI API Market Data Reveals About the Future of Large Models

An in‑depth analysis of 2025 H1 OpenRouter token usage shows explosive growth in Q1, highlights Google Gemini’s market dominance, reveals diverse long‑tail demand across domains, and examines shifting API preferences, offering key insights into the evolving landscape of large‑model services.

AI market analysis · API trends · OpenRouter
0 likes · 10 min read
DataFunSummit
Jul 13, 2025 · Artificial Intelligence

How Alibaba Tackles Low-Resource Language Data for Multilingual LLMs

In this interview, Alibaba International’s senior data‑science expert Li Haijun explains the challenges of low‑resource languages for multilingual large models and details a five‑step data‑collection, augmentation, quality‑optimization, engineering, and evaluation framework that powers their cross‑border e‑commerce AI applications.

AI · large language models · low-resource languages
0 likes · 12 min read
AI Frontier Lectures
Jul 11, 2025 · Artificial Intelligence

How Llama Evolved: From Llama‑1 to Llama‑3 – Architecture, Data, and Performance Insights

This article provides a comprehensive technical analysis of Meta's Llama series, tracing the evolution from Llama‑1 through Llama‑2 to Llama‑3, detailing model architectures, training data pipelines, optimization methods, benchmark results, and the broader impact on the open‑source AI community.

AI research · LLaMA · large language models
0 likes · 25 min read
Kuaishou Tech
Jul 10, 2025 · Artificial Intelligence

How MODA’s Modular Duplex Attention Solves Multimodal Attention Imbalance and Boosts Emotion Understanding

The paper introduces MODA, a modular duplex attention multimodal model that addresses severe cross‑modal attention imbalance in existing large multimodal models, proposes a novel attention paradigm and masking scheme, and demonstrates significant performance gains across 21 benchmarks in perception, cognition, and emotion tasks, earning a Spotlight paper at ICML 2025.

Emotion Recognition · MoDA · attention mechanisms
0 likes · 13 min read
Nightwalker Tech
Jul 10, 2025 · Artificial Intelligence

Master Prompt Engineering: From Basics to Advanced AI Prompt Techniques

This comprehensive guide introduces Prompt Engineering, explaining its core concepts, why clear prompts matter, and how to craft effective instructions using roles, tasks, requirements, and examples, while covering beginner to advanced techniques such as chain‑of‑thought, self‑correction, and building reusable prompt workflows for AI models.

AI · ChatGPT · Prompt Engineering
0 likes · 29 min read
DataFunSummit
Jul 8, 2025 · Artificial Intelligence

Explore Cutting-Edge AI Knowledge Graphs: From Multimodal GraphRAG to Industry Applications

This article presents a curated catalog of cutting‑edge AI resources, covering multimodal GraphRAG, knowledge‑graph and large‑model integration, financial industry AI products, Chinese‑medicine decision support, AI‑driven knowledge‑graph evolution, private‑domain Q&A pipelines, and emerging trends and standards, with a QR code for the full ebook.

Artificial Intelligence · Document Intelligence · Knowledge Graph
0 likes · 2 min read
Data Thinking Notes
Jul 6, 2025 · Artificial Intelligence

How Quantization Shrinks Giant AI Models for Edge Devices

This article explains why quantizing massive AI models is essential for deploying them on resource‑constrained devices, outlines core quantization concepts, techniques, and methods, compares their pros and cons, and presents practical application scenarios such as smartphones, autonomous driving, IoT, and edge computing.

AI deployment · Edge AI · Model Quantization
0 likes · 9 min read
dbaplus Community
Jul 6, 2025 · Artificial Intelligence

Why Build AI Agents? Benefits, Challenges, and Real-World Examples

This article explores the definition of AI agents, examines why they are essential despite challenges like latency and hallucinations, highlights their advantages such as lowered development barriers and workflow simplification, and presents real-world cases and future multi‑agent prospects.

AI agents · Prompt Engineering · large language models
0 likes · 25 min read
DataFunTalk
Jul 5, 2025 · Artificial Intelligence

Is AI Turning Human Thought into a Uniform, Safe Echo Chamber?

Recent studies from MIT, Cornell and Santa Clara reveal that reliance on AI tools like ChatGPT reduces brain activity, narrows creative thinking, and drives cultural homogenization, prompting urgent reflection on the trade‑off between efficiency and originality in human expression.

Artificial Intelligence · Cultural Homogenization · cognitive science
0 likes · 12 min read
Nightwalker Tech
Jul 4, 2025 · Artificial Intelligence

Bypass Membership Limits: Access Overseas LLMs Easily with Chatbox

This guide explains how to overcome domestic membership restrictions and quickly connect to overseas large language models such as ChatGPT, Gemini, Claude, and Grok using the open‑source Chatbox client, covering download, configuration, model selection, and various interaction modes with step‑by‑step screenshots.

AI models · Chatbox · Tutorial
0 likes · 8 min read
Instant Consumer Technology Team
Jul 3, 2025 · Artificial Intelligence

Why Buying an AI Appliance Is a Strategic Pitfall for Enterprises

Enterprises rushing to purchase DeepSeek AI appliances and smart‑agent platforms often face hidden technical, data, and organizational challenges that turn promised "plug‑and‑play" solutions into costly missteps, highlighting the need for realistic strategy, robust data governance, and continuous capability building.

AI capability building · AI deployment · Data Governance
0 likes · 28 min read
iQIYI Technical Product Team
Jul 3, 2025 · Artificial Intelligence

Three iQIYI AI Papers Break New Ground at ACL 2025 & INTERSPEECH 2025

iQIYI’s AI research team secured three paper acceptances: two at ACL 2025 (one main‑conference paper and one Findings paper) and one at INTERSPEECH 2025. The papers cover long‑context large language model evaluation, Chinese novel summarization, and efficient Thai speech recognition, with links to each work.

ACL 2025 · AI research · INTERSPEECH 2025
0 likes · 7 min read
AI Frontier Lectures
Jul 2, 2025 · Artificial Intelligence

Can Language Models Self‑Edit? Inside the SEAL Framework for Self‑Adapting LLMs

This article reviews recent AI self‑evolution research and provides an in‑depth analysis of the SEAL (Self‑Adapting Language) framework, which enables large language models to generate and learn from their own synthetic data through a nested reinforcement‑learning and fine‑tuning loop, with experimental results on few‑shot and knowledge‑integration tasks.

Meta Learning · SEAL · few-shot learning
0 likes · 11 min read
DataFunTalk
Jul 2, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Complex Document OCR

In a detailed interview, Zhao Chenyang explains how multimodal large models (VLMs) overcome the limitations of traditional OCR in mixed layouts, table reconstruction, and handwritten text by leveraging self‑supervised pre‑training, lightweight fine‑tuning, and hybrid pipelines that dramatically cut annotation costs and improve recall rates.

AI deployment · document OCR · hybrid pipeline
0 likes · 13 min read
Tencent Cloud Developer
Jul 2, 2025 · Artificial Intelligence

Big Model Evolution: From Transformers to Enterprise Deployment

This article surveys the rapid evolution of large language models from the Transformer breakthrough to trillion‑parameter capabilities, explains key techniques such as self‑attention, MoE and KV‑Cache, explores practical aspects like temperature tuning, sales AI applications, and compares private versus cloud deployment strategies for enterprises.

Enterprise Deployment · KV Cache · Temperature
0 likes · 6 min read
DataFunTalk
Jul 1, 2025 · Artificial Intelligence

Will OpenAI Reach ASI First? Dylan Patel’s Bold Prediction

In a candid hour‑long interview, SemiAnalysis founder Dylan Patel predicts OpenAI will be the first to achieve artificial superintelligence (ASI), while dissecting GPT‑4.5’s failure, Meta’s costly AI missteps, Apple’s strategic lag, and the shifting partnership between OpenAI and Microsoft.

AI competition · ASI · Apple
0 likes · 11 min read
Ops Development Stories
Jul 1, 2025 · Artificial Intelligence

From Lean to AIOps: How AI is Transforming Modern Operations

This comprehensive guide walks through the evolution from Lean and Agile practices to DevOps and finally AIOps, explaining core concepts, key algorithms, the role of large language models, RAG‑based root‑cause analysis, and practical implementation steps for intelligent operations.

Lean · RAG · Root Cause Analysis
0 likes · 19 min read
DataFunSummit
Jun 30, 2025 · Artificial Intelligence

How Large Language Models Are Evolving Toward Autonomous Meta‑Learning Agents

This talk reviews the rapid evolution of generative large‑model AI from rule‑based systems to massive pre‑training, examines the current bottlenecks in continual learning and knowledge discovery, and proposes large‑scale meta‑learning, especially in‑context reinforcement learning (ICRL), as a path toward truly autonomous, self‑learning agents.

AI research · Meta Learning · autonomous agents
0 likes · 24 min read
DataFunTalk
Jun 30, 2025 · Artificial Intelligence

Wenxin 4.5 Series: Open‑Source Multimodal MoE Models and FastDeploy Guide

The Wenxin 4.5 series introduces ten open‑source models—including large‑scale MoE and dense variants—featuring a novel multimodal heterogeneous architecture, high training efficiency, SOTA benchmark performance, and comprehensive toolkits (ERNIEKit, FastDeploy) for fine‑tuning and multi‑hardware deployment.

ERNIEKit · FastDeploy · MoE
0 likes · 8 min read
DataFunTalk
Jun 29, 2025 · Artificial Intelligence

Large Models Boost Douyin User Experience: Expert Insights

In an interview at the DA Digital Intelligence Conference, ByteDance AI specialist Cai Conghuai explains how large language models, combined with techniques like SFT, DPO, and RAG, are reshaping Douyin's user‑experience signal detection, root‑cause analysis, and evaluation, while outlining future AI‑agent breakthroughs.

AI · DPO · RAG
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Jun 26, 2025 · Artificial Intelligence

Master Cloud AI Inference: Load‑Testing Strategies with Alibaba PAI‑EAS

This article explains how Alibaba Cloud’s PAI‑EAS platform enables efficient, scalable AI inference by detailing distributed architecture, serverless resource scheduling, comprehensive load‑testing modes, key performance metrics, and step‑by‑step usage instructions, helping developers optimize latency, throughput, and cost for large language models.

AI inference · Alibaba PAI · Cloud Computing
0 likes · 7 min read
Alimama Tech
Jun 25, 2025 · Artificial Intelligence

Introducing ROLL: A Scalable, User‑Friendly RL Framework for Large‑Scale LLM Training

ROLL is an open‑source reinforcement‑learning framework designed for large language model post‑training that combines multi‑task RL, agentic support, flexible algorithm configuration, elastic resource scheduling, and rich observability, delivering significant accuracy gains across benchmarks while remaining easy to use for researchers, product developers, and infrastructure engineers.

AI Framework · Open Source · RLHF
0 likes · 11 min read
DeWu Technology
Jun 25, 2025 · Artificial Intelligence

Engineering Large Language Models with Spring AI: From Basics to RAG and Function Calls

This article walks through the fundamentals of large language models, their stateless and structured-output nature, explains how Spring‑AI provides a Java‑friendly API for model integration, covers RAG architecture, the MCP protocol, and demonstrates end‑to‑end code examples for building intelligent agents.

AI integration · Function Calling · Java
0 likes · 15 min read
ITFLY8 Architecture Home
Jun 24, 2025 · Artificial Intelligence

How Transformers and Mixture-of-Experts Power Large Language Models

This article explores the role of Transformers and Mixture‑of‑Experts in large models, outlines five fine‑tuning methods, compares traditional and agentic RAG, presents classic agent design patterns, text‑chunking strategies, levels of intelligent agent systems, and explains KV‑caching techniques.

Mixture of Experts · RAG · Transformer
0 likes · 2 min read
AsiaInfo Technology: New Tech Exploration
Jun 23, 2025 · Artificial Intelligence

How Generative Data‑Driven Model Distillation Boosts Large‑Model Performance and Cuts Compute

This article examines generative data‑driven model distillation as a technique that not only compresses large language models but also improves their accuracy, addresses data‑privacy constraints, and reduces computational costs, offering a practical roadmap and real‑world results from a corporate AI platform.

Knowledge Transfer · MaaS platform · ai-optimization
0 likes · 22 min read
Programmer Xu Shu
Jun 23, 2025 · Artificial Intelligence

From Bag‑of‑Words to ChatGPT: How Large Language Models Evolved

Tracing the evolution of large language models—from early bag‑of‑words techniques, through word embeddings, RNNs, attention mechanisms, Transformers, BERT, and GPT—this article explains each breakthrough, its limitations, and how they culminated in ChatGPT’s conversational AI.

AI evolution · ChatGPT · Transformer
0 likes · 12 min read
Data Thinking Notes
Jun 22, 2025 · Artificial Intelligence

What Powers the Rise of AI Agents? Inside the Tech Behind Agentic AI

This report explores the fundamentals, core technologies, leading platforms, current state, and future outlook of AI Agents and Agentic AI, detailing how large language models and mature infrastructure enable autonomous, reactive, proactive, and adaptive agents, and examines prominent projects such as Manus, Genspark, and Lovart.

AI agents · Autonomous Systems · Technology Report
0 likes · 5 min read
DataFunTalk
Jun 22, 2025 · Artificial Intelligence

How Cursor’s CEO Envisions the Future of AI‑Powered Programming

In this interview, Cursor CEO Michael Truell explains the company’s mission to revolutionize coding with AI, discusses the evolution of AI‑assisted development, shares insights on product strategy, scaling challenges, and the broader impact of intent‑driven programming on software engineering.

AI programming · Cursor · intent-driven programming
0 likes · 37 min read
AI Algorithm Path
Jun 20, 2025 · Artificial Intelligence

Beginner’s Guide to Visual Language Models – Day 1: What They Are and Why They Matter

This article introduces visual‑language models (VLMs), explaining how they combine large language models with visual encoders, why they overcome the rigidity of traditional computer‑vision systems, their key advantages, modular architecture, training methods, and practical applications such as image captioning and visual question answering.

AI applications · computer vision · contrastive learning
0 likes · 8 min read
Xiaohongshu Tech REDtech
Jun 19, 2025 · Artificial Intelligence

Can Adaptive Chain‑of‑Thought Learning Halve LLM Thinking Time?

The article introduces the Think When You Need (TWYN) method, a reinforcement‑learning approach that dynamically adapts chain‑of‑thought length, dramatically cuts redundant token generation in large language models, and maintains or improves accuracy across diverse reasoning benchmarks.

Efficiency · adaptive inference · chain-of-thought
0 likes · 9 min read
Fun with Large Models
Jun 19, 2025 · Artificial Intelligence

How GraphRAG Boosts Answer Accuracy with Knowledge Graphs (Part 1)

This article explains GraphRAG’s architecture, compares it with traditional RAG, and presents experimental results showing that GraphRAG’s knowledge‑graph‑driven retrieval markedly improves answer accuracy, especially on low‑match, multi‑paragraph queries.

GraphRAG · Knowledge Graph · Performance Evaluation
0 likes · 11 min read
AntTech
Jun 18, 2025 · Artificial Intelligence

How Ant Group’s Baoling Models Push Toward AGI with MoE and Multimodal Innovations

In a detailed AICon talk, Ant Group’s Baoling team leader Zhou Jun outlines their latest large‑model training techniques, MoE architecture optimizations, multimodal breakthroughs, open‑source releases, and the strategic roadmap needed to turn AI into a ubiquitous, “scan‑code‑level” everyday assistant.

AI Infrastructure · Mixture of Experts · Open Source
0 likes · 25 min read
Instant Consumer Technology Team
Jun 17, 2025 · Artificial Intelligence

Mastering Fine‑Tuning Datasets: From Basics to Advanced LLM Techniques

This comprehensive guide explains the importance of fine‑tuning datasets for large language models, covering task classification, dataset formats, supervised and instruction tuning, domain adaptation, multimodal data, and practical code examples to help practitioners build effective training, validation, and test sets.

Instruction Tuning · dataset preparation · fine-tuning
0 likes · 33 min read
Data Thinking Notes
Jun 15, 2025 · Artificial Intelligence

Mastering Fine-Tuning: From Basics to Advanced Techniques for Large Language Models

Fine‑tuning transforms a general‑purpose large language model into a domain‑specific expert by training on a small, labeled dataset, and this guide explains its background, core concepts, technical mechanisms, various methods—including full‑parameter, LoRA, adapters, and prompt tuning—plus practical use cases, advantages, challenges, and best‑practice recommendations.

AI · Adapter · LoRA
0 likes · 13 min read
ByteFE
Jun 13, 2025 · Artificial Intelligence

How AI Coding Powered a 3‑Day English Learning App: Insights from ByteDance’s TRAE

In a three‑day sprint, ByteDance’s VP Hong Dingkun built an English‑learning app using the AI‑coding platform TRAE, illustrating how large‑model‑driven code completion, natural‑language programming, and AI‑enhanced development can dramatically boost productivity, democratize coding, and push the limits of software intelligence.

AI coding · ByteDance · large language models
0 likes · 14 min read
Zuoyebang Tech Team
Jun 12, 2025 · Information Security

How AI‑Powered RAG and Agents Are Revolutionizing Enterprise Security Operations

This article explains how the rise of AI large‑model technology and Retrieval‑Augmented Generation (RAG) combined with autonomous AI agents enable a three‑layer network‑boundary defense, address deep operational challenges such as alert overload and response latency, and dramatically improve incident‑response efficiency in large‑scale enterprises.

AI agents · AI security · Incident Response
0 likes · 16 min read
Open Source Linux
Jun 12, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: The Evolution of Large Language Models (2017‑2025)

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through the rise of BERT, GPT‑3, multimodal models, alignment techniques like RLHF, and finally the cost‑efficient DeepSeek‑R1 in 2025, highlighting key innovations, scaling trends, and real‑world impacts.

AI alignment · Model Scaling · Transformer
0 likes · 26 min read
Architects' Tech Alliance
Jun 11, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: The 2017‑2025 Evolution of Large Language Models

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through the rise of BERT, GPT‑3, ChatGPT, multimodal systems like GPT‑4V/o, and the recent cost‑efficient DeepSeek‑R1, highlighting key architectural innovations, scaling trends, alignment techniques, and their transformative impact on AI research and industry.

AI alignment · BERT · Cost‑Efficient Inference
0 likes · 26 min read
DataFunTalk
Jun 9, 2025 · Artificial Intelligence

Can AI Models Pass the Chinese Math Gaokao? A Fair, Objective Test

The author conducts a transparent, objective assessment of several large language models on the 2025 Chinese national math exam, converting all questions to LaTeX, applying strict Gaokao scoring rules, and revealing each model's strengths and weaknesses across single‑choice, multiple‑choice, and fill‑in‑the‑blank items.

AI benchmarking · Gaokao · large language models
0 likes · 7 min read
DataFunSummit
Jun 6, 2025 · Artificial Intelligence

Automating High‑Quality NL2SQL Data Synthesis with Intermediate Representations

This work tackles the difficulty of incorporating extensive domain knowledge into in‑domain NL2SQL tasks by proposing an intermediate‑representation‑based data synthesis method that decouples knowledge compliance from SQL generation, enabling automated creation of high‑quality training data with 60× human efficiency and over 97% accuracy.

NL2SQL · SQL Generation · data synthesis
0 likes · 2 min read
Code Mala Tang
Jun 5, 2025 · Artificial Intelligence

Mastering LLM Prompts: Proven Techniques to Get Precise Answers

By rethinking how we interact with large language models—using role‑play, task decomposition, chain‑of‑thought, ReAct, and other advanced prompting strategies—readers can transform generic ChatGPT answers into precise, context‑aware responses, leveraging pattern recognition and context windows for superior AI assistance.

AI reasoning · LLM techniques · Prompt Engineering
0 likes · 21 min read
Kuaishou Large Model
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou Papers Accepted at ACL 2025 Reveal Cutting‑Edge AI Advances

Kuaishou's foundational large‑model team secured seven papers at the prestigious ACL 2025 conference, covering alignment bias during model training, safety in inference, decoding strategies, fine‑grained video‑temporal understanding, and new evaluation benchmarks that push the frontier of multimodal large language models.

ACL 2025 · benchmark · large language models
0 likes · 16 min read
Fun with Large Models
Jun 5, 2025 · Artificial Intelligence

EvalScope: The Ultimate Large‑Model Evaluation Framework You Control

This article introduces EvalScope, an open‑source framework for evaluating large language models, detailing its architecture, built‑in benchmarks, installation steps, and step‑by‑step guides for both performance stress testing and dataset‑based capability assessment, enabling users to independently verify model quality without relying on official documentation.

EvalScope · benchmark datasets · large language models
0 likes · 12 min read
Kuaishou Tech
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has secured seven papers at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

AI safety · acl · benchmark
0 likes · 13 min read
AntTech
Jun 4, 2025 · Artificial Intelligence

LLaDA and LLaDA‑V: Large Language Diffusion Models and Their Multimodal Extensions

This article presents the LLaDA series of diffusion‑based large language models, explains how their generative‑modeling principle yields language intelligence comparable to autoregressive models, and details the multimodal LLaDA‑V architecture, training methods, experimental results, and broader implications for AI research.

Generative Modeling · diffusion models · instruction following
0 likes · 10 min read
DataFunTalk
Jun 3, 2025 · Artificial Intelligence

Meta‑Capability Alignment: Psychologically Inspired Training to Endow Large Language Models with Stable Reasoning

Researchers from NUS, Tsinghua and Salesforce AI Research introduce a meta‑capability alignment framework that integrates deductive, inductive and abductive reasoning via a psychology‑based triple, automatically generates and validates training data, and demonstrates over 10% accuracy gains on math, coding and scientific benchmarks for 7B and 32B models.

Meta‑Capability Alignment · Reasoning · large language models
0 likes · 8 min read
Baobao Algorithm Notes
Jun 3, 2025 · Artificial Intelligence

Can 1K Fine‑Tuning Replace 100K RL Steps? Insights from Re‑distillation Research

An extensive analysis shows that a 1K‑sample fine‑tuning stage can replicate the generalization gains of thousands of reinforcement‑learning steps, explains the compressibility of RL, introduces a sample‑effect theory, and demonstrates that re‑distillation and small‑scale SFT dramatically improve LLM performance.

Re-distillation · Sample Effect · large language models
0 likes · 23 min read
Data Thinking Notes
Jun 2, 2025 · Artificial Intelligence

Why Pre‑Training Powers Modern AI: From Theory to Real‑World Applications

Pre‑training enables AI models to first acquire a universal knowledge map from massive unlabelled text, then quickly adapt to specific tasks with minimal labelled data, offering superior generalization, reduced annotation costs, and versatile applications across chatbots, content creation, retrieval, coding assistance, and more.

AI applications · Transformer · large language models
0 likes · 14 min read
AntTech
May 31, 2025 · Artificial Intelligence

Machine Reasoning and Deep Thinking: Insights from Ant Financial’s NLP Lead Wu Wei

The article explores how DeepSeek R1 and long‑thinking chains have revived interest in machine reasoning, tracing the evolution of natural‑language models, defining reasoning as logical knowledge composition, and outlining future research directions in efficient reasoning architectures and deep‑thinking applications.

AI research · Efficient Reasoning · deep thinking
0 likes · 8 min read
AntTech
May 30, 2025 · Artificial Intelligence

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

Ant Group’s 10th Technical Open Day gathered leading AI experts to examine the current state and future directions of multimodal large models, embodied AI, world models, transformer architectures, and vertical applications, covering the challenges and opportunities on the path toward AGI.

AGI · AI safety · Embodied AI
0 likes · 16 min read
Model Perspective
May 30, 2025 · Artificial Intelligence

Why Large Language Models Are Just Mathematical Functions: A Rational Perspective

The article argues that large language models are fundamentally mathematical functions that model human language, emphasizing their role as simplified representations, explaining their structural nature, sources of errors, the importance of prompts as boundary conditions, and the need for clear usage assumptions to avoid anthropomorphic misconceptions.

AI fundamentals · Prompt Engineering · large language models
0 likes · 11 min read
DevOps
May 28, 2025 · Artificial Intelligence

Google Proposes a “Sufficient Context” Framework to Strengthen Enterprise Retrieval‑Augmented Generation Systems

Google researchers introduce a “sufficient context” framework that classifies retrieved passages as adequate or inadequate for answering a query, enabling large language models in enterprise RAG systems to decide when to answer, refuse, or request more information, thereby improving accuracy and reducing hallucinations.

AI reliability · Enterprise AI · RAG
0 likes · 9 min read
JD Cloud Developers
May 27, 2025 · Artificial Intelligence

How JD’s Young AI Engineers Tackle Real-World Model Challenges

Young JD algorithm engineers share how they solve tough AI problems—from optimizing large‑model training and reward‑model design for ad image generation, to building LLM‑based query expansion, agent evaluation, and model pruning with FFT and RDP—illustrating practical breakthroughs and personal growth in cutting‑edge AI research.

AI · Model Pruning · Reward Modeling
0 likes · 15 min read
Efficient Ops
May 26, 2025 · Artificial Intelligence

How AI Agents Are Revolutionizing AIOps: Boosting Automation and Efficiency

This article explains how AI agents enhance large‑model capabilities for AIOps, detailing single‑agent use cases like knowledge retrieval, tool guidance, and fault diagnosis, as well as multi‑agent collaborations, required skills, and future prospects for autonomous operations.

AI · Operations · agent
0 likes · 7 min read
JD Tech
May 26, 2025 · Artificial Intelligence

Solving Technical Challenges at JD Retail: Multi‑Reward Models, LLM‑Based Query Expansion, Model Pruning, and Reinforcement Learning

This article details how JD Retail's young algorithm engineers tackled a series of AI engineering problems—including advertising image quality assessment with multi‑reward models, large‑language‑model‑driven query expansion, FFT‑and‑RDP‑based model pruning, and agent‑centric reinforcement learning—while sharing practical growth insights and code snippets.

AI · computer vision · large language models
0 likes · 15 min read
AI Frontier Lectures
May 25, 2025 · Artificial Intelligence

Can Alternating Generation‑Reduction Make LLMs Think Faster? Introducing PENCIL

The paper presents PENCIL, a novel alternating generation‑and‑erasure reasoning paradigm that achieves optimal space‑time complexity for chain‑of‑thought tasks, dramatically improves accuracy and efficiency on hard SAT, QBF, and Einstein puzzle benchmarks, and is provably Turing‑complete.

Benchmark Results · Pencil · chain-of-thought
0 likes · 12 min read
DataFunTalk
May 24, 2025 · Artificial Intelligence

Why Apple and WeChat’s AI Rollouts Are Slower Than Expected

The article analyses how privacy concerns, data‑security priorities and an application‑first strategy cause both Apple’s Apple Intelligence and WeChat’s AI features to lag behind hype, examining product decisions, technical constraints, and the potential future of AI agents within these ecosystems.

AI integration · Apple · large language models
0 likes · 13 min read
AsiaInfo Technology: New Tech Exploration
May 19, 2025 · Artificial Intelligence

How WASP Generates High‑Quality DP Synthetic Data with Multi‑Model Collaboration

WASP is a privacy‑preserving framework that fuses multiple pretrained language models through a weighted Top‑Q voting scheme to synthesize differential‑private data, dramatically improving downstream task performance even when only a few private samples are available, and it scales to federated settings.

Multi-Model Fusion · differential privacy · federated learning
0 likes · 19 min read
21CTO
May 17, 2025 · Artificial Intelligence

Are Large Language Models Killing Stack Overflow? Data Shows the Decline

Recent data confirms that large language models have dramatically reduced Stack Overflow’s monthly question volume, dropping to levels seen in 2009, with key milestones from 2014 to 2025 illustrating how policy changes, the pandemic surge, and the rise of ChatGPT accelerated the platform’s decline.

Data Analysis · Question Volume · large language models
0 likes · 5 min read
Fighter's World
May 17, 2025 · Industry Insights

Hidden Roadblocks That Sabotage B2B Large Model Products

The article dissects why many B2B GenAI projects fail to scale despite heavy investment, highlighting overlooked challenges in data preparation, model specialization, product integration, user experience, and organizational culture, and proposes concrete ways to bridge these gaps.

B2B · Data Engineering · GenAI
0 likes · 21 min read
Architects' Tech Alliance
May 16, 2025 · Industry Insights

Can DeepSeek Survive the AI Arms Race? A Deep Dive into Its Challenges and Competition

The article provides a comprehensive analysis of DeepSeek’s rise in the large‑model market, examining its technical merits, security and customization hurdles, slowing innovation, fierce competition from OpenAI, Google and Alibaba’s Qwen3, as well as the fragility of its open‑source ecosystem and data preparation, ultimately questioning its long‑term viability.

AI models · DeepSeek · Open Source
0 likes · 13 min read
Architect's Guide
May 13, 2025 · Artificial Intelligence

DeepSeek Model Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

This article provides a comprehensive overview of DeepSeek's model distillation technology, detailing its definition, key innovations, architecture, training methods, performance gains, and the remaining challenges such as the implicit performance ceiling and multimodal data distillation.

DeepSeek · Knowledge Transfer · ai-optimization
0 likes · 14 min read
Baidu Geek Talk
May 12, 2025 · Artificial Intelligence

One‑Click Deployment of Qwen3 Large Models on Baidu Baige AI Platform

This guide explains how to use Baidu Baige's AI heterogeneous computing platform to deploy the eight‑model Qwen3 family—including dense and MoE variants—via a one‑click process, covering resource configuration, inference acceleration options, and post‑deployment service access.

AI · Baidu Baige · Cloud AI
0 likes · 4 min read
Youzan Coder
May 12, 2025 · Artificial Intelligence

How Large Language Models Empower Business Development Engineers: Data Analysis, Model Training, and Rapid Prototyping

This article demonstrates how large language models can augment business development engineers by providing data insight, automating algorithm training, and enabling low‑cost rapid product prototyping, thereby transforming traditional backend‑focused roles into full‑stack, AI‑enhanced innovators.

AI · Data Analysis · Python
0 likes · 10 min read
AI Frontier Lectures
May 10, 2025 · Artificial Intelligence

Can the ‘Canon’ Layer Unlock New Limits in Large Language Models?

A new study introduces the lightweight “Canon” layer for large language models, showing how it improves information flow, inference depth, and scalability across Transformers, linear attention, and state‑space architectures, while offering a controlled synthetic pre‑training benchmark for deeper architectural analysis.

AI research · Mamba · canonical layer
0 likes · 11 min read
JD Tech
May 8, 2025 · Artificial Intelligence

The Emerging Boom of Large Model Applications and Why 2025 Will Be the Turning Point

Amid the AI wave, applications built on large language models such as DeepSeek R1 are poised to take off in 2025, driven by open-source availability, low-cost access and stronger reasoning. Successful deployment depends on four key factors (domain expertise, knowledge bases, robust search, and engineered agent architectures) that unlock value beyond simple chat.

2025 · AI applications · Agent Architecture
0 likes · 10 min read
Frontend AI Walk
May 7, 2025 · Artificial Intelligence

How Cursor AI Coding Tool Transforms Development Workflow

The article introduces Cursor, an AI‑powered coding assistant, outlines its supported large models, demonstrates practical front‑end use cases such as automatic layout creation, button logic, screenshot‑to‑code generation, error fixing and code cleanup, and reflects on prompt engineering and tool selection.

AI coding assistant · Cursor · Frontend Development
0 likes · 6 min read
Architect
May 5, 2025 · Artificial Intelligence

How Agentic RAG‑R1 Turns Retrieval‑Augmented Generation into an Autonomous AI Agent

Agentic RAG‑R1, an open‑source project from Peking University, combines Retrieval‑Augmented Generation with an agentic AI loop, introduces the GRPO reinforcement‑learning optimizer, supports LoRA‑based fine‑tuning, quantization and multimodal tool calls, and demonstrates significant accuracy gains on the MedQA benchmark across both Chinese and English test sets.

LLM Tool Use · Open Source · Retrieval-Augmented Generation
0 likes · 8 min read
AI Frontier Lectures
May 5, 2025 · Industry Insights

What Will Large Language Models Look Like in the Next Five Years? A Deep Dive into Trends and Challenges

The article reviews five years of AI model evolution, analyzes current scaling and reinforcement‑learning trends, and forecasts architectural, mathematical, and infrastructure directions for large language models through 2030, highlighting potential breakthroughs and the risks of over‑reliance on benchmarks.

AI trends · Model Scaling · industry analysis
0 likes · 22 min read
Code Mala Tang
May 2, 2025 · Artificial Intelligence

Debunking Common Misconceptions About the Model Context Protocol (MCP)

This article clarifies three major misunderstandings about the Model Context Protocol (MCP), explaining that it does not require large‑model support, works even without function‑calling capabilities, and is not natively built into models, while outlining how MCP standardizes context augmentation through a black‑box server architecture.

AI · Function Calling · MCP
0 likes · 5 min read
JD Tech
Apr 30, 2025 · Artificial Intelligence

TimeHF: A Billion‑Scale Time Series Forecasting Model Guided by Human Feedback

The JD Supply Chain algorithm team introduces TimeHF, a billion‑parameter time‑series large model that leverages RLHF to boost demand‑forecast accuracy by over 10%, detailing dataset construction, the PCTLM architecture, a custom RLHF framework (TPO), and extensive SOTA experimental results.

Big Data · RLHF · Supply Chain
0 likes · 10 min read
Cognitive Technology Team
Apr 30, 2025 · Artificial Intelligence

AI Claims of Human-Level Intelligence Unveiled: Reliance on Massive Rules Over True Reasoning

The article critiques AI giants’ claims of nearing human-level intelligence, highlighting research that shows current models rely on massive rule memorization rather than genuine reasoning, leading to brittleness in navigation, mathematics, and adaptability, and emphasizing the need to understand these limitations for future progress.

AI limitations · Artificial Intelligence · Machine Learning
0 likes · 8 min read
Data Thinking Notes
Apr 29, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: How LLMs Evolved to 2025

This article chronicles the evolution of large language models from the 2017 Transformer breakthrough through BERT, GPT series, multimodal models, and recent cost‑efficient innovations like DeepSeek‑R1, highlighting key architectures, training methods, alignment techniques, and their transformative impact on AI applications.

AI alignment · Transformer · large language models
0 likes · 29 min read
DevOps
Apr 28, 2025 · Artificial Intelligence

Vibe Coding: An Introduction to AI‑Driven Natural‑Language Programming

This article introduces Vibe Coding, an AI‑driven programming approach proposed by Andrej Karpathy, explains its core concepts, workflow, advantages, tools, use cases, best practices, and future outlook, and provides a complete example of generating a simple weather app using natural‑language prompts.

AI programming · Best Practices · Vibe Coding
0 likes · 16 min read
ITPUB
Apr 28, 2025 · Artificial Intelligence

How Large Language Models are Transforming Automotive Operations and Optimization

In this interview, an automotive industry expert explains how large language models and advanced operations‑optimization techniques are reshaping vehicle design, production planning, logistics, and customer services, while also discussing implementation challenges, team requirements, and future AI‑driven opportunities.

AI adoption · Automotive AI · RAG
0 likes · 15 min read
ZhongAn Tech Team
Apr 28, 2025 · Artificial Intelligence

Weekly Tech Overview: Major AI Model Updates, Industry Funding, and Expert Perspectives on AI Agents and Consciousness

This weekly technology digest highlights significant advancements in artificial intelligence, including OpenAI's GPT-4o upgrades, Tencent's Hunyuan 3D v2.5 release, and major funding rounds for xAI and Manus, alongside expert discussions on the future evolution of AI agent networks and the theoretical possibility of machine consciousness.

AI agents · AI funding · Artificial Intelligence
0 likes · 7 min read
Volcano Engine Developer Services
Apr 28, 2025 · Artificial Intelligence

How ByteBrain’s AI‑Powered Infra is Redefining Cloud and Database Performance

ByteDance’s ByteBrain team showcases how large‑model AI, operations research, and system‑level innovations have produced award‑winning papers and billions of yuan in cost savings while improving on‑call efficiency, database estimation, and cloud infrastructure reliability.

AI · Cloud Computing · Infrastructure Optimization
0 likes · 9 min read
AsiaInfo Technology: New Tech Exploration
Apr 25, 2025 · Artificial Intelligence

How Evidence Generation Boosts Document-Grounded Dialogue with LLMs

This study introduces DGDE, a document‑grounded dialogue framework that leverages large language model‑generated evidence, combining retrieval, reranking, fine‑tuning, and iterative question correction to markedly improve accuracy, comprehensiveness, coherence, and completeness on the Doc2dial benchmark.

document-grounded dialogue · evidence generation · fine-tuning
0 likes · 21 min read
DataFunTalk
Apr 25, 2025 · Artificial Intelligence

Does Reinforcement Learning Really Expand Reasoning Capacity in Large Language Models? Insights from Recent Empirical Study

Recent empirical research by Tsinghua’s LeapLab and Shanghai Jiao Tong University finds that reinforcement learning with verifiable rewards (RLVR) improves sampling efficiency but does not extend the reasoning abilities of large language models beyond those of their base models, as shown across mathematics, code, and visual reasoning benchmarks.

AI research · RLVR · Reasoning
0 likes · 12 min read
Didi Tech
Apr 24, 2025 · Artificial Intelligence

Algorithmic Foundations and Evolution of Natural Language Processing

The article surveys the Algorithmic Foundations of Engineering R&D series, tracing NLP’s evolution from rule‑based systems to today’s multimodal large‑model era, reviewing core machine‑learning and deep‑learning techniques, transformer breakthroughs, representation learning, optimization methods, and emerging research such as retrieval‑augmented generation and AI agents.

AI · NLP · Transformer
0 likes · 43 min read
Alimama Tech
Apr 23, 2025 · Artificial Intelligence

How AI Agents Outsmart Humans in the “Who Is Spy” Campus Challenge

The campus AI Agent competition showcased how large‑language‑model‑powered agents can reason, deceive, and collaborate in a social deduction game, revealing model performance trends, participant insights, and future directions for multi‑agent AI research.

AI · Agent Competition · large language models
0 likes · 6 min read
DevOps
Apr 22, 2025 · Artificial Intelligence

How to Think About Agent Frameworks: A Critical Review of Design Patterns, Challenges, and LangGraph

This article critically examines popular agent frameworks, compares OpenAI and Anthropic definitions, highlights the core difficulty of maintaining proper context for reliable agents, and presents LangGraph’s declarative and imperative features along with practical guidance for building production‑grade agent systems.

AI Engineering · Agent Frameworks · Agent Systems
0 likes · 24 min read
Architects' Tech Alliance
Apr 22, 2025 · Artificial Intelligence

What Are AI Agents? Definitions, Types, and Cutting‑Edge Technologies Explained

This article provides a comprehensive overview of AI agents, covering their definition, classification into language‑based, vision‑based, and multimodal types, core capabilities such as understanding, perception, planning, and action, and recent breakthroughs like OpenAI ComputerUse, SpiritSight, and MobileFlow.

AI agents · ComputerUse · MobileFlow
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Apr 22, 2025 · Artificial Intelligence

How DistilQwen2.5-DS3-0324 Achieves Fast, Accurate Reasoning via Quick‑Think Distillation

This article introduces DistilQwen2.5-DS3-0324, a distilled language model series that balances rapid inference with strong reasoning by applying a fast‑thinking chain‑of‑thought strategy, details its two‑stage distillation framework, evaluation on diverse benchmarks, and provides code for downloading and using the models.

chain-of-thought · deep learning · fast inference
0 likes · 17 min read
Architect
Apr 21, 2025 · Artificial Intelligence

Microsoft Research Releases BitNet b1.58 2B4T: A 1‑Bit Native Large Language Model with Ultra‑Low Memory and Energy Consumption

Microsoft Research introduced BitNet b1.58 2B4T, a native 1‑bit large language model with 2 billion parameters trained on 4 trillion tokens, achieving only 0.4 GB non‑embedding memory, 0.028 J decoding energy, and 29 ms CPU latency while matching full‑precision performance.

1-bit LLM · AI research · BitNet
0 likes · 7 min read
Baidu Tech Salon
Apr 16, 2025 · Artificial Intelligence

Release of the 'Fangsheng' Large Model Benchmark Results (Q1 2025) and Overview of Baidu's Wenxin 4.5 and X1 Models

The China AI Industry Alliance unveiled its Q1 2025 Fangsheng benchmark, showing Baidu’s new multimodal models—Wenxin 4.5 leading basic abilities and Wenxin X1 excelling in reasoning—available for free on the Wenxin Yiyan platform, while Baidu pledges major 2025 investments in AI, data‑center and cloud infrastructure.

AI · FactTesting · Wenxin
0 likes · 4 min read
Baidu Geek Talk
Apr 16, 2025 · Industry Insights

What Do the Latest AIIA FactTesting Benchmarks Reveal About China’s Large Language Models?

At the AIIA’s 14th plenary meeting in Nanjing, the FactTesting benchmark released its Q1 2025 results, evaluating over 200 large models and highlighting Baidu’s Wenxin 4.5 and Wenxin X1 as leaders in basic and reasoning capabilities, while outlining the expanded multimodal and agent testing roadmap for the year.

AI Benchmark · China AI · FactTesting
0 likes · 5 min read
DaTaobao Tech
Apr 16, 2025 · Artificial Intelligence

Comparative Analysis of AI Development Tools (2024‑2025)

The 2024‑2025 comparative review evaluates cloud‑based AI development platforms, AI‑native code editors, IDE plugins, and their underlying large language models—detailing features, user experience, pricing, open‑source status, strengths and weaknesses, offering recommendations for UI prototyping, full‑stack projects, and forecasting future multimodal, collaborative AI‑assisted development trends.

AI Development Tools · code generation · large language models
0 likes · 24 min read
Data Thinking Notes
Apr 15, 2025 · Artificial Intelligence

Understanding AI Agents: From Reinforcement Learning to LLM-Powered Planning

Professor Li Hongyi’s lecture provides a comprehensive, step‑by‑step exploration of AI agents, covering their definitions, reinforcement‑learning roots, LLM integration, memory mechanisms, tool usage, planning strategies, benchmarks, and practical examples, offering a valuable resource for anyone studying modern artificial intelligence.

AI agents · Memory · Planning
0 likes · 67 min read
Baidu Geek Talk
Apr 14, 2025 · Artificial Intelligence

PaddlePaddle Framework 3.0: Five Core Breakthroughs Reshaping Large Model Development

PaddlePaddle Framework 3.0 delivers five breakthroughs—dynamic‑static unified automatic parallelism, integrated training‑inference pipelines, high‑order scientific differentiation, a neural‑network compiler with automatic operator fusion, and streamlined heterogeneous chip adaptation—drastically reducing development effort, boosting training speed, and expanding compatibility for large‑scale AI models.

AI Infrastructure · Model Inference Optimization · PaddlePaddle
0 likes · 23 min read
AI Algorithm Path
Apr 13, 2025 · Artificial Intelligence

Understanding GRPO: Group Relative Policy Optimization for LLM Training

The article explains GRPO, a reinforcement‑learning algorithm that extends PPO with group sampling, no value network, dual penalties and KL regularisation, showing how it improves efficiency and stability when fine‑tuning large language models such as DeepSeek‑Math and DeepSeek‑R1.

DeepSeek · GRPO · PPO
0 likes · 6 min read
AntTech
Apr 10, 2025 · Artificial Intelligence

Ant Group Presents Four AI Research Papers at ICLR 2025 Live Showcase

At the ICLR 2025 live session in Singapore, Ant Group showcased four cutting‑edge papers—CodePlan, Animate‑X, Group Position Embedding, and OmniKV—demonstrating advances in large‑language‑model reasoning, universal character animation, layout‑aware document understanding, and efficient long‑context inference.

AI research · Reasoning · document understanding
0 likes · 6 min read