Tagged articles
2079 articles
AI Large Model Application Practice
Feb 17, 2025 · Artificial Intelligence

Mastering Structured Output for DeepSeek‑R1 with LangChain, LangGraph, and ReAct Agents

DeepSeek‑R1 excels at deep reasoning but lacks native structured output; this guide explains why structured output matters, outlines common API‑level techniques, and provides three practical solutions—using an auxiliary model with a LangChain chain, a LangGraph workflow, and a ReAct agent—complete with code snippets and JSON‑mode tips.

DeepSeek, LLM, LangChain
0 likes · 12 min read
Code Mala Tang
Feb 16, 2025 · Artificial Intelligence

17 Proven Prompt Engineering Techniques to Master LLM Interactions

This article presents 17 practical prompt‑engineering strategies—ranging from zero‑shot and few‑shot prompting to role, style, and chain‑of‑thought methods—explaining their usage, ideal scenarios, and concrete examples to help you obtain higher‑quality responses from large language models.

Artificial Intelligence, ChatGPT, LLM
0 likes · 14 min read
Bighead's Algorithm Notes
Feb 15, 2025 · Artificial Intelligence

FinRL‑DeepSeek: How Integrating DeepSeek with RL Improves Portfolio Returns (Code Open‑Source)

This article reviews a new risk‑sensitive trading agent that combines reinforcement learning with large language models to extract stock recommendations and news‑based risk scores, describes the extended CVaR‑PPO algorithm, presents extensive experiments on the FNSPID dataset, and discusses the resulting performance gains and future work.

Algorithmic Trading, CVaR, DeepSeek
0 likes · 10 min read
Alibaba Cloud Developer
Feb 14, 2025 · Artificial Intelligence

Unlock Faster LLM Inference: Full Stack of Chips, Frameworks & Services

The article examines the end‑to‑end architecture for large‑model inference, detailing seven layers—from chip hardware and programming toolkits to deep‑learning frameworks, inference accelerators, model providers, compute platforms, application orchestration, and traffic management—highlighting key vendors, open‑source projects, and performance‑optimizing techniques.

AI hardware, LLM, Open-source
0 likes · 12 min read
Architect
Feb 13, 2025 · Artificial Intelligence

How to Build a Mini ChatGPT on a Single GPU with MiniMind

This article provides a comprehensive, step‑by‑step guide to training and fine‑tuning a miniature large‑language model called MiniMind, covering lightweight model design, open‑source training pipelines, required datasets, tokenizer options, and deployment via a web UI, all using PyTorch on modest hardware.

AI, LLM, MiniMind
0 likes · 11 min read
Alibaba Cloud Infrastructure
Feb 13, 2025 · Cloud Computing

Deploy DeepSeek‑R1 LLM on Alibaba Cloud ACK One with ACS GPU in Minutes

This guide walks you through deploying the DeepSeek‑R1 large‑language‑model inference service on Alibaba Cloud ACK One registered clusters using ACS GPU compute, covering model preparation, OSS storage setup, PersistentVolume configuration, arena‑based service deployment, and verification steps with concrete commands and parameters.

ACK One, ACS GPU, DeepSeek
0 likes · 14 min read
Alibaba Cloud Infrastructure
Feb 13, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 671B Distributed Inference Service on Alibaba Cloud ACK with vLLM and Dify

This article explains how to quickly deploy the full‑parameter DeepSeek‑R1 671B model in a multi‑node GPU‑enabled Kubernetes cluster on Alibaba Cloud ACK, covering prerequisites, model parallelism, vLLM‑Ray distributed deployment, service verification, and integration with Dify to build a private AI Q&A assistant.

DeepSeek, Dify, Distributed Deployment
0 likes · 12 min read
JD Tech Talk
Feb 13, 2025 · Artificial Intelligence

DeepSeek R1: Concept Overview, Training Principles, and Practical Implementations

This article introduces the DeepSeek family of models, explains the concepts of online search and deep reasoning, details the two‑phase training pipeline with data augmentation and reinforcement learning, and showcases practical experiments and deployment examples for the R1 and distilled variants.

DeepSeek, Knowledge Distillation, LLM
0 likes · 10 min read
Baobao Algorithm Notes
Feb 13, 2025 · Artificial Intelligence

How to Build and Improve Reasoning LLMs: Methods, Trade‑offs, and DeepSeek Insights

This article explains what reasoning language models are and when they are needed, and reviews four main techniques (inference‑time scaling, pure reinforcement learning, combined SFT + RL, and distillation), illustrated with DeepSeek‑R1’s development, cost analysis, and low‑budget alternatives.

AI research, DeepSeek, Inference Scaling
0 likes · 27 min read
vivo Internet Technology
Feb 12, 2025 · Artificial Intelligence

Bidirectional Optimization of NLLB-200 and ChatGPT for Low-Resource Language Translation

The paper proposes a bidirectional optimization framework that fine‑tunes the low‑resource NLLB‑200 translation model with LoRA using data generated by ChatGPT, while also translating low‑resource prompts with NLLB before feeding them to LLMs, thereby improving multilingual translation quality yet requiring careful validation of noisy synthetic data.

LLM, LoRA, NLLB
0 likes · 28 min read
JD Retail Technology
Feb 12, 2025 · Artificial Intelligence

Accelerating Generative Recommendation with NVIDIA TensorRT‑LLM in JD Advertising

JD Advertising accelerates its generative‑recall recommendation system by integrating NVIDIA TensorRT‑LLM, which simplifies the pipeline, injects LLM knowledge, scales to billions of parameters, and delivers over five‑fold throughput gains, one‑fifth the cost, and significant CTR improvements in both recommendation and search.

Inference Optimization, LLM, Recommendation Systems
0 likes · 13 min read
Architect
Feb 10, 2025 · Artificial Intelligence

Evolution of DeepSeek Mixture‑of‑Experts (MoE) Architecture from V1 to V3

This article reviews the development of DeepSeek's Mixture-of-Experts (MoE) models, tracing their evolution from the original DeepSeekMoE V1 through V2 to V3, detailing architectural innovations such as fine‑grained expert segmentation, shared‑expert isolation, load‑balancing losses, device‑limited routing, and the shift from softmax to sigmoid gating.

DeepSeek, LLM, Mixture of Experts
0 likes · 21 min read
Alibaba Cloud Infrastructure
Feb 10, 2025 · Artificial Intelligence

Hybrid Cloud Elastic LLM Inference Solution with ACK Edge and KServe

This article presents a hybrid‑cloud solution that uses ACK Edge and KServe to dynamically allocate on‑premise and cloud GPU resources for large‑language‑model inference, addressing tidal traffic patterns, reducing costs, and ensuring high availability through elastic scaling and custom scheduling policies.

ACK@Edge, Auto Scaling, KServe
0 likes · 13 min read
JD Retail Technology
Feb 10, 2025 · Artificial Intelligence

JD Merchant Intelligent Assistant: Multi‑Agent Architecture and Technical Exploration

The JD Merchant Intelligent Assistant employs a large‑language‑model‑driven multi‑agent architecture with dynamic ReAct planning, enabling merchants to query and execute store operations in under a second with over 90% decision accuracy, while reducing inference cost, hallucinations, and engineering effort across diverse e‑commerce tasks.

AI, LLM, ReAct
0 likes · 25 min read
Top Architect
Feb 9, 2025 · Artificial Intelligence

DeepSeek‑R1: Training Pipeline, Reinforcement‑Learning Techniques, and Experimental Results

The article reviews DeepSeek‑R1’s training methodology—including cold‑start data collection, multi‑stage RL fine‑tuning, SFT data generation, and model distillation—highlights its performance comparable to OpenAI‑o1‑1217, and discusses key contributions, reward design, successful experiments, and failed attempts.

AI research, DeepSeek, LLM
0 likes · 12 min read
Infra Learning Club
Feb 8, 2025 · Artificial Intelligence

Multi-Agent LLMs Explained: Benefits, Workflows, and Leading Frameworks

The article surveys the rise of multi‑agent LLM systems, detailing how specialized agents collaborate on tasks such as travel planning, outlining their workflow, comparing them with single‑agent models, listing prominent frameworks, and discussing current challenges and research citations.

AI, Agent Collaboration, AutoGen
0 likes · 13 min read
MaGe Linux Operations
Feb 7, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 Locally: A Step‑by‑Step AI Model Guide

This article walks you through everything you need to know about DeepSeek R1—including its different model sizes, hardware requirements, installation tools like Ollama, LM Studio and Docker, and how to set up a visual interface with Open‑WebUI or Dify—for offline, private, and cost‑effective AI inference.

AI, DeepSeek, Docker
0 likes · 15 min read
iKang Technology Team
Feb 7, 2025 · Artificial Intelligence

Retrieval‑Augmented Generation (RAG) with LangChain: Concepts and Python Implementation

Retrieval‑Augmented Generation (RAG) using LangChain lets developers enhance large language models by embedding user queries, fetching relevant documents from a vector store, inserting the context into a prompt template, and generating concise, source‑grounded answers, offering low‑cost, up‑to‑date knowledge while reducing hallucinations and fine‑tuning expenses.

LLM, LangChain, RAG
0 likes · 10 min read
Top Architect
Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama: Quantization, Hardware Requirements, and Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the full‑size DeepSeek R1 671B model using Ollama, covering dynamic quantization options, hardware specifications, detailed installation commands, configuration files, performance observations, and practical recommendations for consumer‑grade systems.

AI, DeepSeek, GPU
0 likes · 14 min read
Alibaba Cloud Developer
Feb 5, 2025 · Artificial Intelligence

10 Common Prompt Engineering Mistakes and How to Overcome Them

This article lists ten common misconceptions about prompt engineering, explains why each is flawed, and offers practical insights and strategies—such as using the CO‑STAR framework, tailoring prompts to specific models, keeping prompts concise, and continuously testing and refining—to help readers communicate effectively with large language models.

AI misconceptions, LLM, large language models
0 likes · 10 min read
21CTO
Feb 4, 2025 · Artificial Intelligence

Is DeepSeek the Next Challenger to ChatGPT? A Deep Dive into Its AI Edge

This article explains what DeepSeek is, how its open‑source large language model works, its unique multilingual training, free access, the DeepSeek‑Coder variant, and compares its capabilities and goals with ChatGPT, highlighting strengths, limitations, and market impact.

AI models, ChatGPT comparison, DeepSeek
0 likes · 7 min read
Alibaba Cloud Big Data AI Platform
Feb 1, 2025 · Artificial Intelligence

Deploy DeepSeek-V3 and R1 Models with One-Click on Alibaba Cloud PAI Model Gallery

This article introduces Alibaba Cloud's PAI Model Gallery, detailing the DeepSeek-V3 and DeepSeek‑R1 large language models, their architectures and parameters, and provides a step‑by‑step guide for one‑click deployment of these models and their distilled variants using vLLM or BladeLLM.

AI inference, Alibaba Cloud, DeepSeek
0 likes · 6 min read
CSS Magic
Jan 31, 2025 · Artificial Intelligence

Cursor vs. Windsurf vs. GitHub Copilot: Hands‑On Comparison of Three AI Code Editors

The article conducts a practical, step‑by‑step evaluation of Cursor, Windsurf, and GitHub Copilot’s multi‑file editing capabilities using a simple web‑chat bot, revealing that Cursor completes all required UI, storage, and application changes in a single interaction, while the others need two rounds, with Copilot showing notable improvement on a retest.

AI code editor, Cursor, GitHub Copilot
0 likes · 9 min read
DataFunSummit
Jan 30, 2025 · Databases

Mature Practices for Building Risk‑Control Knowledge Graphs on NebulaGraph and Leveraging Large Language Models

This article explains how NebulaGraph’s large‑scale graph database can be used to construct real‑time risk‑control knowledge graphs, describes practical applications such as community detection and path analysis, and explores how large language models enhance graph queries through Text‑to‑GQL, agents, exploration chains, and semi‑structured knowledge extraction.

AI, Graph Database, Knowledge Graph
0 likes · 11 min read
DataFunSummit
Jan 29, 2025 · Artificial Intelligence

Tencent OlaChat: An LLM‑Powered Intelligent Business Intelligence Platform – Architecture, Capabilities, and Practice

This article presents Tencent's OlaChat intelligent BI platform, detailing its evolution from traditional to intelligent BI, the impact of large language models on data analytics, the system's multi‑task dialogue, metadata retrieval enhancements, Text2SQL solutions, and real‑world deployment insights.

AI, Business Intelligence, Data Platform
0 likes · 21 min read
Architect
Jan 27, 2025 · Artificial Intelligence

How to Build a Retrieval‑Augmented Generation QA Assistant for an Open Platform

This article details a step‑by‑step design of a RAG‑based intelligent Q&A assistant for the DeWu Open Platform, covering background, RAG fundamentals, system architecture, technology selection, prompt engineering with CO‑STAR, data preprocessing, vector store setup, LangChain.js implementation, similarity search, runnable chaining, debugging, and future prospects.

AI, LLM, LangChain
0 likes · 28 min read
DataFunTalk
Jan 26, 2025 · Artificial Intelligence

58.com’s LingXi Large Language Model Platform: Development, Deployment, and Performance Optimizations

Since the launch of ChatGPT, 58.com has built a Model‑as‑a‑Service platform called LingXi that trains and serves domain‑specific large language models, supports over a hundred internal scenarios with daily inference exceeding ten million calls, and continuously improves performance through quantization, GPU optimization, model miniaturization, and advanced AI applications such as interview assistants, voice agents, and RAG‑enabled agents.

AI applications, AI platform, Inference Optimization
0 likes · 9 min read
DataFunSummit
Jan 24, 2025 · Artificial Intelligence

Exploring LLM‑Based Generative Business Intelligence (GenBI): Architecture, Implementation, and Lessons Learned

With the rise of LLM‑based generative AI, this article examines the emerging GenBI (Generative Business Intelligence) paradigm, detailing why self‑service analytics is needed, the progress of Text‑to‑SQL, an LLM‑driven agent architecture, practical AWS Bedrock implementation, technical choices, lessons learned, and future outlook.

AWS Bedrock, Business Intelligence, Generative AI
0 likes · 18 min read
DataFunSummit
Jan 21, 2025 · Artificial Intelligence

NVIDIA NeMo Full Stack: End‑to‑End Large Language Model Training, Alignment, and RLHF

This article presents NVIDIA's NeMo technology stack for end‑to‑end large language model (LLM) training, covering the full software pipeline, model alignment with reinforcement learning from human feedback (RLHF), performance optimizations such as model parallelism, FP8, TensorRT‑LLM inference, dynamic load balancing, and future research directions.

GPU optimization, LLM, NeMo
0 likes · 24 min read
ByteFE
Jan 20, 2025 · Artificial Intelligence

Eino: An Open‑Source Golang Framework for Large‑Model Application Development

Eino is a Golang‑based, open‑source framework that streamlines the full DevOps lifecycle of large‑model applications by providing stable, strongly‑typed components, graph‑based orchestration, built‑in tooling, and extensible architecture to help developers quickly build reliable AI services.

AI, Framework, Golang
0 likes · 13 min read
AI Large Model Application Practice
Jan 20, 2025 · Artificial Intelligence

How Embeddings Transform Simple Character Codes into Powerful Vectors for LLMs

This article explains how embeddings convert basic character indices into high‑dimensional vectors, describes their training via gradient descent, introduces the embedding matrix, and shows how these vectors enable modern language models to capture semantic relationships and be reused across tasks.

Embeddings, LLM, machine learning
0 likes · 8 min read
DataFunTalk
Jan 18, 2025 · Artificial Intelligence

Understanding Xiaohongshu’s Content Recommendation Mechanisms: NoteLLM and SSD

This article analyzes Xiaohongshu’s content recommendation system by reviewing two official papers, detailing the NoteLLM framework for interest discovery and the Sliding Spectrum Decomposition (SSD) method for diversified recommendations, and explaining their underlying models, loss functions, and experimental results.

Diversity, LLM, Recommendation Systems
0 likes · 13 min read
Alibaba Cloud Infrastructure
Jan 17, 2025 · Artificial Intelligence

Elastic Scaling of Large Language Model Inference on Alibaba Cloud ACK with Knative, ResourcePolicy, and Fluid

This article explains how to reduce inference cost and improve performance for large language models on Alibaba Cloud ACK by using Knative's request‑based autoscaling, custom ResourcePolicy priority scheduling, and Fluid data‑caching to achieve elastic scaling, resource pre‑emption, and faster model loading.

Fluid, Knative, Kubernetes
0 likes · 22 min read
Baobao Algorithm Notes
Jan 15, 2025 · Artificial Intelligence

How Multi-Token Prediction Boosts LLM Training and Inference Efficiency

This article reviews the evolution of Multi‑Token Prediction (MTP) techniques—from early blockwise parallel decoding to Meta's and DeepSeek's implementations—explaining their architectures, training and inference workflows, and the speed‑up gains they offer for large language models.

DeepSeek, Inference Acceleration, LLM
0 likes · 20 min read
Alibaba Cloud Big Data AI Platform
Jan 15, 2025 · Artificial Intelligence

Build an Education‑Focused RAG Solution Using Alibaba PAI

This guide explains how to create a Retrieval‑Augmented Generation (RAG) solution for education on Alibaba PAI, covering knowledge‑base construction with PAI‑Designer, model deployment, connection setup in LangStudio, workflow configuration, online deployment, and a legal‑domain case comparison that highlights RAG's accuracy benefits.

Alibaba PAI, Embedding, Knowledge Base
0 likes · 14 min read
Bilibili Tech
Jan 14, 2025 · Artificial Intelligence

Technical Practices and Productization of Intelligent Advertising Title Generation for Bilibili

We built an LLM‑powered system for Bilibili that automatically creates ad titles from user keywords, employing fluency, style, and quality classifiers, mixed domain data cleaning, and alignment methods such as SFT, DPO and KTO, resulting in a product that now generates about ten percent of daily titles and drives significant ad spend.

AI alignment, Ad Title Generation, Bilibili
0 likes · 24 min read
JD Tech Talk
Jan 14, 2025 · Artificial Intelligence

Advantages and Engineering Implementation of Generative Recommendation Systems Using Large Language Models

This article explains how generative recommendation systems powered by large language models simplify the recommendation pipeline, integrate world knowledge, benefit from scaling laws, and require specialized engineering optimizations such as TensorRT‑LLM deployment, inference acceleration, and hybrid model strategies to achieve low latency and high throughput in real‑world e‑commerce scenarios.

AI, Inference Optimization, LLM
0 likes · 10 min read
JD Cloud Developers
Jan 14, 2025 · Artificial Intelligence

How Generative Recommendation Systems Transform E‑Commerce with LLMs

This article explains how large language models reshape recommendation systems by simplifying pipelines, integrating world knowledge, and leveraging scaling laws, and details the engineering steps for deploying generative recall models—including product encoding, user prompting, model training, TensorRT‑LLM optimization, and continuous performance improvements.

Generative Recommendation, LLM, Recommendation Systems
0 likes · 13 min read
AI Large Model Application Practice
Jan 14, 2025 · Artificial Intelligence

Turning Classification Nets into Language Generators: A Step‑by‑Step Guide

This article explains how a simple neural network trained for classification can be adapted to generate natural language by expanding its output layer, encoding characters as numbers, using a sliding‑window context, and recursively predicting the next token, illustrating each step with diagrams and concrete examples.

AI, LLM, language generation
0 likes · 10 min read
Java Architecture Diary
Jan 10, 2025 · Artificial Intelligence

Generate Structured JSON with Ollama LLM Using Java

This guide explains why structured JSON output from LLMs is essential, walks through installing and running Ollama, and provides a complete Java Spring Boot implementation—including POJOs, service code, and best‑practice tips—to retrieve AI‑generated data in a reliable, parsable format.

AI, JSON, LLM
0 likes · 7 min read
Tencent Advertising Technology
Jan 9, 2025 · Artificial Intelligence

Applying Large Language Models to Search Advertising: End‑to‑End Generative Recall and System Optimizations

This report details how large language models (LLMs) were integrated into Tencent's search advertising pipeline—from early extraction‑distillation experiments in 2023 to a 2024 end‑to‑end generative recall architecture—showing significant improvements in relevance, diversity, and revenue through knowledge injection, supervised fine‑tuning, constrained beam‑search decoding, and high‑performance inference services.

AI, Beam Search, LLM
0 likes · 11 min read
Data Thinking Notes
Jan 7, 2025 · Databases

Unlocking LLM-Powered Text-to-SQL: From Basics to Cutting-Edge Techniques

This article provides a comprehensive overview of LLM-based Text-to-SQL technology, covering its background, evolution, challenges, various LLM-driven methods, benchmark datasets, evaluation metrics, and future research directions to guide researchers and practitioners in advancing natural language interfaces for databases.

Database, LLM, Prompt Engineering
0 likes · 18 min read
Infra Learning Club
Jan 7, 2025 · Artificial Intelligence

How GitHub Copilot Workspace Made Me Fear Unemployment

The author experiments with GitHub Copilot Workspace to automatically generate a WeChat mini‑program for family library management, documents the prompting process, code generation, bug fixes, UI tweaks, and reflects on the broader impact of AI‑driven development on programmers' future jobs.

AI Code Generation, GitHub Copilot, LLM
0 likes · 5 min read
DataFunSummit
Jan 7, 2025 · Artificial Intelligence

Tencent OlaChat: Intelligent Data Analysis Platform – Research, Architecture, and Capabilities

This article presents the Tencent PCG OlaChat team's research and practice in intelligent data analysis, covering the DIKW model, evolution of BI platforms, the impact of large language models, challenges of third‑generation data products, detailed product features, agent architecture, system design, and related academic publications.

Data Analysis, Intelligent BI, LLM
0 likes · 19 min read
DevOps
Jan 6, 2025 · Artificial Intelligence

Ten Popular Large Language Model Deployment Engines and Tools: Features, Advantages, and Limitations

This article reviews ten mainstream LLM deployment solutions—including WebLLM, LM Studio, Ollama, vLLM, LightLLM, OpenLLM, HuggingFace TGI, GPT4ALL, llama.cpp, and Triton Inference Server—detailing their technical characteristics, strengths, drawbacks, and example deployment workflows for both personal and enterprise environments.

AI inference, GPU Acceleration, LLM
0 likes · 16 min read
DeWu Technology
Jan 6, 2025 · Artificial Intelligence

Design and Implementation of a Retrieval‑Augmented Generation (RAG) Answering Assistant for the Dewu Open Platform

The paper describes building a Retrieval‑Augmented Generation assistant for the Dewu Open Platform that leverages GPT‑4o‑mini, OpenAI embeddings, Milvus vector store, and LangChain.js to semantically retrieve API documentation, structure user queries, and generate accurate, JSON‑formatted answers, thereby reducing manual support and hallucinations.

AI, LLM, LangChain
0 likes · 28 min read
Fighter's World
Jan 4, 2025 · Industry Insights

Is Unlimited Digital Labor Arriving? A Deep Dive into Salesforce’s Agentforce 2.0

Salesforce’s Agentforce 2.0 positions AI agents as a limitless digital labor platform, reshaping enterprise software with a new agent‑first model, consumption‑based pricing, and real‑world case studies that illustrate productivity gains, cost reductions, and strategic advantages in today’s AI‑driven market.

AI Agents, Agentforce, Digital Labor
0 likes · 19 min read
Alibaba Cloud Infrastructure
Jan 3, 2025 · Cloud Native

How to Enable LLM Traffic Observability with Alibaba Cloud Service Mesh (ASM)

This guide explains how to use Alibaba Cloud Service Mesh (ASM) to add infrastructure‑level observability for large language model (LLM) traffic, covering custom access‑log fields, new Prometheus metrics for token usage, and adding model dimensions to native Istio metrics, with step‑by‑step commands and configuration examples.

ASM, Kubernetes, LLM
0 likes · 14 min read
AI Large Model Application Practice
Jan 3, 2025 · Artificial Intelligence

How to Build an Orchestrator‑Workers AI Agent Workflow with Pydantic AI

This article explains the Orchestrator‑Workers pattern from Anthropic’s “Building effective agents”, compares it with routing and parallel modes, distinguishes it from Supervisor agents, and provides a step‑by‑step Python implementation using Pydantic AI, including model definitions, prompts, orchestration logic, worker execution, and a test example.

AI Agents, LLM, Orchestrator-Workers
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Jan 3, 2025 · Artificial Intelligence

Build an Education‑Focused Retrieval‑Augmented Generation (RAG) Solution with Alibaba PAI

This guide walks you through creating a RAG‑enhanced AI solution for education using Alibaba PAI, covering prerequisite setup, knowledge‑base construction with PAI‑Designer, model deployment, connection configuration, workflow assembly, and a side‑by‑side comparison of RAG versus non‑RAG answers.

AI platform, LLM, Milvus
0 likes · 16 min read
Infra Learning Club
Jan 2, 2025 · Artificial Intelligence

Three Major LLM Trends in 2025: Ubiquitous Agents, Rising Small Models, and Multimodal Fusion

In 2025, large language models will see three key trends—agents becoming pervasive in daily life and industry, the emergence of efficient small models for edge and specialized tasks, and the integration of multimodal capabilities that combine text, images, and audio to enable more natural human‑machine interaction.

AI trends · LLM · agents
0 likes · 4 min read
DataFunSummit
Jan 1, 2025 · Artificial Intelligence

Challenges and Evaluation Strategies for LLM Agents in 2024

The article outlines the rapid progress of LLM agents in 2024 while highlighting key difficulties in planning capabilities, evaluation methods, dataset generation, and metric design, and suggests practical combinations and product‑level enhancements to improve efficiency, accuracy, and usability.

AI · LLM · Planning
0 likes · 3 min read
ByteFE
Dec 31, 2024 · Artificial Intelligence

In‑Depth Review of Cursor: AI‑Powered Coding Assistant, Capabilities, Use Cases, and Limitations

This article evaluates the Cursor AI coding assistant, describing its context‑aware indexing, Composer panel, and code‑generation features, while outlining practical scenarios such as Q&A, test creation, language conversion, and prototype development, and discussing its inherent randomness, domain‑knowledge gaps, and best‑practice recommendations for developers.

AI coding assistant · LLM · code generation
0 likes · 27 min read
ZhongAn Tech Team
Dec 28, 2024 · Artificial Intelligence

Weekly AI Digest Issue 8: OpenAI Robotics, ModernBERT Upgrade, Spatial Cognition, LLM Agent Evolution, and GNN‑LLM Fusion

This issue surveys recent AI developments, covering OpenAI's revived robotics program, the ModernBERT encoder upgrade, spatial reasoning advances in multimodal models, automated environment generation for LLM agents, and a novel GNN‑LLM approach for label‑free node classification.

Artificial Intelligence · BERT · LLM
0 likes · 10 min read
DataFunTalk
Dec 28, 2024 · Big Data

Next‑Generation Data Analysis Platform: Integrating Chat BI and Headless BI

This article examines the current challenges of enterprise data analysis platforms, outlines three traditional analysis modes, and presents a next‑generation solution that combines Headless BI’s semantic modeling with Chat BI’s large‑language‑model interaction to deliver a more efficient, secure, and user‑friendly analytics experience.

ChatBI · Data Analysis · DataGovernance
0 likes · 15 min read
Volcano Engine Developer Services
Dec 26, 2024 · Artificial Intelligence

How LLMs Can Auto-Generate Unit Tests: Insights from ByteDance’s QCon Talk

This article summarizes ByteDance’s quality‑efficiency expert Zhao Liang’s QCon presentation on using large language models to automatically generate unit tests, covering pain points, goals, data‑quality engineering, model‑analysis fusion, architecture, evaluation metrics, and future plans for a production‑grade testing tool.

AI · LLM · Test Generation
0 likes · 26 min read
Alibaba Cloud Big Data AI Platform
Dec 24, 2024 · Artificial Intelligence

Build a Medical RAG Solution with Alibaba PAI: Step-by-Step Guide

Learn how to create a Retrieval‑Augmented Generation (RAG) system for medical applications using Alibaba's PAI platform, covering knowledge‑base construction with PAI‑Designer, template setup in PAI‑LangStudio, deployment of LLM and embedding models, vector database integration, and end‑to‑end workflow configuration.

Embedding · LLM · Milvus
0 likes · 18 min read
NewBeeNLP
Dec 23, 2024 · Artificial Intelligence

What’s New in Qwen2.5? A Deep Dive into the Latest LLM Advances

The Qwen2.5 Technical Report introduces a new series of large language models with up to 72B parameters, pre‑training data expanded to 18 trillion tokens, and more advanced supervised fine‑tuning and reinforcement learning pipelines. It reports strong performance across comprehension, reasoning, coding, and long‑context tasks.

LLM · Large Language Model · Qwen2.5
0 likes · 5 min read
DataFunSummit
Dec 22, 2024 · Artificial Intelligence

From Concept to Deployment: The Evolution of 1688’s AI Purchasing Assistant “Yuanbao”

This article chronicles the development of 1688’s AI buyer assistant “Yuanbao”, detailing why an e‑commerce AI assistant is needed, its functional design, MVP constraints, the shift to a data‑driven 2.0 version, future prospects, and a Q&A, providing practical insights for AI product rollout in B‑to‑C platforms.

AI · LLM · agent
0 likes · 24 min read
Baobao Algorithm Notes
Dec 18, 2024 · Artificial Intelligence

How STAR Enables Training‑Free Recommendations with Large Language Models

The article reviews the STAR framework, a training‑free recommendation approach that leverages large language model embeddings and collaborative co‑occurrence scores to retrieve and rank items, and evaluates its performance, hyper‑parameter effects, and ablation studies against existing LLM‑based recommender methods.

Artificial Intelligence · LLM · Recommendation Systems
0 likes · 10 min read
Alibaba Cloud Developer
Dec 17, 2024 · Frontend Development

Choosing the Best LangChain Text Splitter for Frontend LLM Apps

This article compares five LangChain text splitters—CharacterTextSplitter, RecursiveCharacterTextSplitter, TokenTextSplitter, MarkdownTextSplitter, and LatexTextSplitter—by examining their principles, pros and cons, and ideal use cases, helping developers select the most suitable splitter for their frontend large‑model applications.

Frontend Development · JavaScript · LLM
0 likes · 10 min read
Huolala Tech
Dec 17, 2024 · Artificial Intelligence

How to Secure AI Agents: Privacy Risks, Threats, and Governance Strategies

This article examines the rapid growth of AI agents, outlines typical privacy and security challenges such as data leakage, model attacks, and prompt injection, and proposes comprehensive governance and technical measures to mitigate these risks in enterprise deployments.

AI Agents · Governance · LLM
0 likes · 22 min read
Huolala Safety Emergency Response Center
Dec 17, 2024 · Information Security

How Secure Are AI Agents? Risks, Attacks, and Governance Strategies

This article examines the rapid growth of AI agents, outlines their core components and classifications, analyzes a wide range of privacy and security threats—including data leakage, prompt injection, jailbreak, backdoor, hallucination, and memory attacks—and proposes practical governance measures to mitigate these risks.

AI Agents · Governance · LLM
0 likes · 25 min read
Baobao Algorithm Notes
Dec 16, 2024 · Artificial Intelligence

What Do Leading Open‑Source LLMs Do After Pretraining? A Deep Dive into Post‑Training Strategies

This article surveys the post‑training pipelines of major open‑source large language models released in 2024, detailing their alignment algorithms, data synthesis, reward modeling, DPO/GRPO variants, long‑context handling, tool use, and model‑averaging techniques, and highlights emerging trends such as data‑centric pipelines and iterative weak‑to‑strong alignment.

AI research · LLM · alignment
0 likes · 99 min read
ZhongAn Tech Team
Dec 15, 2024 · Artificial Intelligence

AI Weekly Digest Issue 6: OpenAI’s AI Christmas Season, LeCun’s AGI Forecast, Chinese Text‑to‑Image Breakthrough, and EchoMimic V2

This issue reviews OpenAI’s twelve‑day product launch, LeCun’s surprising AGI timeline, a new Chinese text‑to‑image capability from ByteDance’s Doubao, and the open‑source EchoMimic V2 digital‑human system, highlighting trends, technical details, and industry reactions across the AI landscape.

Artificial Intelligence · Chinese Text Generation · EchoMimic
0 likes · 13 min read
Baobao Algorithm Notes
Dec 15, 2024 · Artificial Intelligence

What Are the Best Practices for Retrieval‑Augmented Generation (RAG)?

This comprehensive study evaluates various components of Retrieval‑Augmented Generation pipelines—including query classification, chunking, embedding models, vector databases, retrieval, re‑ranking, summarization, and generator fine‑tuning—identifies optimal configurations, and proposes best‑practice guidelines for both performance‑maximizing and efficiency‑balanced RAG systems.

Best Practices · LLM · RAG
0 likes · 17 min read
Fighter's World
Dec 14, 2024 · Industry Insights

Sequoia’s 2025 AI Outlook: From Hype to Real‑World Value

Sequoia Capital’s 2025 AI outlook argues that the industry is shifting from early excitement and massive spending to a phase focused on differentiated large‑model providers, AI‑search as a killer app, and a more disciplined, ROI‑driven investment climate.

2025 predictions · AI · AI Search
0 likes · 16 min read
DevOps
Dec 12, 2024 · Artificial Intelligence

The Future of Large Language Models: From Consumer Q&A to Agentic Workflows

Andrew Ng highlights that large language models are shifting from optimizing simple question‑answering for consumers to supporting complex agentic workflows, including tool usage, computer interaction, and multi‑agent collaboration, signaling a major evolution in AI capabilities.

AI Agents · AI trends · Anthropic
0 likes · 8 min read
AI Large Model Application Practice
Dec 12, 2024 · Artificial Intelligence

Mastering AutoGen: Build Multi‑Agent LLM Applications in Minutes

AutoGen, Microsoft’s advanced multi‑agent framework, lets developers quickly assemble collaborative LLM agents—supporting chat, tool use, and hierarchical group chats—through concise Python code, with examples ranging from simple two‑agent dialogues to complex three‑agent reporting pipelines, while outlining its strengths, limitations, and upcoming v0.4 enhancements.

AI · AutoGen · Framework
0 likes · 9 min read
Airbnb Technology Team
Dec 12, 2024 · Artificial Intelligence

Airbnb Automation Platform v2: Enabling LLM‑Driven Conversational AI

Airbnb’s Automation Platform v2 replaces the rigid, workflow‑driven architecture of v1 with an LLM‑centric design that orchestrates context gathering, chain‑of‑thought reasoning, tool execution, and guardrails, enabling more natural, scalable, and safe conversational AI while preserving the reliability of traditional workflows.

AI Architecture · Airbnb · Conversational AI
0 likes · 11 min read
37 Interactive Technology Team
Dec 9, 2024 · Artificial Intelligence

Optimizing Request Concurrency for LLM Workflows: Rationale, Implementation, and Results

By breaking iterable inputs into parallel LLM calls and batching 20 items across three languages within Dify’s platform limits, the workflow achieves 43‑64% average runtime reductions and markedly higher success rates, demonstrating that request‑level concurrency dramatically improves throughput for large‑scale translation tasks.

Coze · Dify · LLM
0 likes · 6 min read