Tag

large language model

1 views collected around this technical thread.

JD Tech
JD Tech
Jun 16, 2025 · Artificial Intelligence

How JD Engineers Leverage LLMs and Sparse Models to Boost Search and Ads

This article showcases three JD tech case studies—using large language models for e‑commerce query expansion, applying sparse large models with scaling‑law experiments to improve ad prediction, and building proactive risk‑prevention systems—to illustrate practical AI engineering that drives higher recall, conversion, and system robustness.

advertisinge‑commercelarge language model
0 likes · 8 min read
How JD Engineers Leverage LLMs and Sparse Models to Boost Search and Ads
TAL Education Technology
TAL Education Technology
Jun 13, 2025 · Operations

How Large Language Models Are Revolutionizing Fault Localization

This article explores how the rapid rise of large language models and techniques like Retrieval‑Augmented Generation, Chain‑of‑Thought prompting, and multi‑agent architectures can dramatically improve the speed, accuracy, and automation of fault localization in modern operations environments.

CoTOperationsRAG
0 likes · 14 min read
How Large Language Models Are Revolutionizing Fault Localization
Baidu Tech Salon
Baidu Tech Salon
Jun 11, 2025 · Artificial Intelligence

Why Baidu’s Wenxin Model Dominates IDC’s 2025 Large Model Evaluation

IDC’s 2025 China foundational large‑model evaluation crowns Baidu’s Wenxin as the top performer, scoring perfect marks in seven of eight criteria and highlighting its superior multimodal, dialogue, and ecosystem capabilities among twelve leading models.

AIBaidu WenxinIDC evaluation
0 likes · 5 min read
Why Baidu’s Wenxin Model Dominates IDC’s 2025 Large Model Evaluation
Nightwalker Tech
Nightwalker Tech
Jun 11, 2025 · Artificial Intelligence

Turn Your AI Coding Assistant into a Critical Mentor, Not Just a Tool

This guide explains how to shift AI coding tools like Cursor, Windsurf, and RooCode from simple code generators into proactive mentors that critique, suggest improvements, and adopt multiple specialized modes, while also covering prompt design, multi‑round dialogue, and practical code examples.

AISoftware Developmentcoding assistant
0 likes · 15 min read
Turn Your AI Coding Assistant into a Critical Mentor, Not Just a Tool
DataFunSummit
DataFunSummit
Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI safetyLoRARLHF
0 likes · 13 min read
How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety
Xiaohongshu Tech REDtech
Xiaohongshu Tech REDtech
Jun 6, 2025 · Artificial Intelligence

How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models

dots.llm1, an open‑source 142‑billion‑parameter Mixture‑of‑Experts language model from hi lab, achieves Qwen2.5‑72B‑level performance after training on 11.2 T high‑quality tokens, and the release includes full models, intermediate checkpoints, and detailed training pipelines for the research community.

AI researchMixture of ExpertsTraining Efficiency
0 likes · 10 min read
How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models
Kuaishou Tech
Kuaishou Tech
Jun 4, 2025 · Artificial Intelligence

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning

The KwaiPilot team released the KwaiCoder‑AutoThink‑preview model, which introduces a novel automatic‑thinking training paradigm and a process‑supervised reinforcement‑learning method called Step‑SRPO, enabling the model to dynamically switch between thinking and non‑thinking modes, reduce inference cost, and achieve up to 20‑point gains on code and math benchmarks while handling large‑scale codebases.

AI researchCode Generationautomatic thinking
0 likes · 12 min read
KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning
Efficient Ops
Efficient Ops
May 29, 2025 · Artificial Intelligence

DeepSeek R1 0528 Update: New Features, Performance Gains Over OpenAI o3

DeepSeek quietly launched the R1 0528 model, which early testers report matches OpenAI’s o3 in benchmarks and style, while adding deeper chain‑of‑thought reasoning, better writing output, and extended thinking windows, and the announcement is followed by a promotion for the GOPS Global Ops Conference.

AI performanceChain-of-ThoughtDeepSeek
0 likes · 3 min read
DeepSeek R1 0528 Update: New Features, Performance Gains Over OpenAI o3
IT Services Circle
IT Services Circle
May 25, 2025 · Artificial Intelligence

DeepSeek Core Technologies and Model Innovations: DeepSeek‑V3 and DeepSeek‑R1 Technical Overview

The article provides a detailed technical overview of DeepSeek's flagship large language models, DeepSeek‑V3 and DeepSeek‑R1, describing their MoE architecture, training frameworks, reinforcement‑learning based fine‑tuning, inference optimizations, and the broader impact of these innovations on the AI landscape while also promoting related books and resources.

AIDeepSeekMixture of Experts
0 likes · 10 min read
DeepSeek Core Technologies and Model Innovations: DeepSeek‑V3 and DeepSeek‑R1 Technical Overview
Tencent Technical Engineering
Tencent Technical Engineering
May 23, 2025 · Artificial Intelligence

The Evolution, Challenges, and Future Directions of AI Agents

An in‑depth overview traces the development of AI agents from early LLM milestones to modern “class‑Agent” models, examines core components such as memory, tool use, planning and reflection, analyzes current limitations, and outlines emerging solutions like workflows, multi‑agent systems, and model‑as‑product paradigms.

AI AgentAgentic WorkflowFunction Call
0 likes · 40 min read
The Evolution, Challenges, and Future Directions of AI Agents
Baidu Tech Salon
Baidu Tech Salon
May 21, 2025 · Artificial Intelligence

Baidu AI Day 2024: Wenxin X1 Turbo Sets New Benchmark with Top‑Level Evaluation and Advanced Multimodal Capabilities

At Baidu AI Day in Beijing, the company unveiled the Wenxin 4.5 Turbo and X1 Turbo models, detailing multimodal training breakthroughs, self‑feedback loops, enhanced reasoning and tool‑calling, while the China Academy of Information and Communications Technology awarded X1 Turbo the highest "4+" rating across 24 capability tests, highlighting its leading position in domestic large‑model performance.

Artificial IntelligenceBaiduWenxin
0 likes · 9 min read
Baidu AI Day 2024: Wenxin X1 Turbo Sets New Benchmark with Top‑Level Evaluation and Advanced Multimodal Capabilities
Tencent Technical Engineering
Tencent Technical Engineering
May 19, 2025 · Artificial Intelligence

RAG, Agents, and Multimodal Large Models: Evolution, Challenges, and Future Trends

This article examines the evolution of large model technologies—including Retrieval‑Augmented Generation, AI agents, and multimodal models—detailing their technical foundations, practical challenges, industry applications, and future development trends, offering a comprehensive perspective for AI practitioners and researchers.

AI AgentRAGknowledge retrieval
0 likes · 14 min read
RAG, Agents, and Multimodal Large Models: Evolution, Challenges, and Future Trends
DataFunSummit
DataFunSummit
May 17, 2025 · Artificial Intelligence

Integrating Knowledge Graphs with DeepSeek AI for Enterprise Knowledge Management

This presentation explores how combining knowledge graphs with DeepSeek large‑model agents can revolutionize enterprise knowledge management, detailing DeepSeek’s technical strengths, the graph‑model complementarity paradigm, various knowledge types, practical frameworks, case studies, and future outlooks for AI‑enhanced intelligent systems.

Artificial IntelligenceDeepSeekEnterprise Knowledge Management
0 likes · 23 min read
Integrating Knowledge Graphs with DeepSeek AI for Enterprise Knowledge Management
Architecture and Beyond
Architecture and Beyond
May 16, 2025 · Artificial Intelligence

Understanding AI Hallucinations: The Fictional Reality of Large Language Models

The essay explores why AI systems produce hallucinations by viewing their reality as a vast fictional narrative built from human language data, arguing that their knowledge is bounded by the corpus they ingest, and reflecting on philosophical limits of language and truth.

AIhallucinationknowledge limits
0 likes · 11 min read
Understanding AI Hallucinations: The Fictional Reality of Large Language Models
Alimama Tech
Alimama Tech
May 14, 2025 · Artificial Intelligence

Deep Research‑Driven Risk Root‑Cause Analysis with Domain Graph Constraints for Large‑Scale Advertising Traffic

This article presents a large‑scale advertising risk‑control solution that combines deep‑research paradigms, domain‑graph constraints, and large language models to enable explainable, responsible, and high‑precision fraud detection, detailing system architecture, challenges, demo workflow, and future directions.

AIDeep Researchadvertising fraud
0 likes · 11 min read
Deep Research‑Driven Risk Root‑Cause Analysis with Domain Graph Constraints for Large‑Scale Advertising Traffic
Alimama Tech
Alimama Tech
May 12, 2025 · Artificial Intelligence

Universal Recommendation Model (URM): A General Large‑Model Recall System for Advertising

The article presents the Universal Recommendation Model (URM), a large‑language‑model‑based recall framework that integrates world knowledge and e‑commerce expertise through knowledge injection and prompt‑driven alignment, achieving significant offline recall gains and a 3.1% increase in ad consumption while meeting high‑QPS, low‑latency production constraints.

advertisinghigh QPSlarge language model
0 likes · 17 min read
Universal Recommendation Model (URM): A General Large‑Model Recall System for Advertising
DevOps
DevOps
May 5, 2025 · Artificial Intelligence

DeepSeek Releases Math‑Specialized Large Model V2 and ProverBench Evaluation Suite

DeepSeek has quietly open‑sourced a new mathematics‑focused large language model, DeepSeek‑Prover‑V2 (available in 671B and 7B variants), achieving 88.9% on MiniF2F and strong results on PutnamBench, alongside the high‑quality ProverBench dataset and a novel recursive theorem‑proving pipeline.

AIDeepSeekProverBench
0 likes · 4 min read
DeepSeek Releases Math‑Specialized Large Model V2 and ProverBench Evaluation Suite
Architects' Tech Alliance
Architects' Tech Alliance
May 2, 2025 · Artificial Intelligence

DeepSeek‑Prover‑V2‑671B: A Massive AI Model for Formal Mathematical Theorem Proving

DeepSeek‑Prover‑V2‑671B, a 671 billion‑parameter AI model released on Hugging Face, dramatically advances formal mathematical theorem proving with MoE architecture, FP8 quantization, 163 k token context, superior performance over GPT‑4 Turbo and other models, and broad implications for research and industry.

AIDeepSeekFP8 Quantization
0 likes · 11 min read
DeepSeek‑Prover‑V2‑671B: A Massive AI Model for Formal Mathematical Theorem Proving
DataFunTalk
DataFunTalk
Apr 29, 2025 · Artificial Intelligence

ChatGPT Adds Shopping Feature and Alibaba Unveils Qwen3 Model Series

OpenAI announced new shopping capabilities for ChatGPT, improving product recommendation, visual presentation, and direct purchase links, while Alibaba released the Qwen3 series of large and MoE language models with detailed parameter counts and benchmark performance, highlighting rapid advancements in consumer‑focused AI applications.

AIArtificial IntelligenceChatGPT
0 likes · 4 min read
ChatGPT Adds Shopping Feature and Alibaba Unveils Qwen3 Model Series
Java Architecture Diary
Java Architecture Diary
Apr 29, 2025 · Artificial Intelligence

Why Qwen3 Is the New Powerhouse in Open‑Source AI Models

Qwen3 introduces a suite of open‑source models—from a 235B expert model to compact 0.6B versions—offering competitive performance against top proprietary models, multilingual support, flexible thinking modes, and low deployment requirements, with detailed usage instructions via Ollama and OpenRouter.

OllamaQwen3large language model
0 likes · 8 min read
Why Qwen3 Is the New Powerhouse in Open‑Source AI Models