Tagged articles

2079 articles


Feb 17, 2025 · Artificial Intelligence

How to Deploy the Full-Feature DeepSeek LLM Locally and on Alibaba Cloud

This guide walks you through preparing the environment, installing Docker, cloning the DeepSeek repository, running the model with Docker or Ollama for a quick start, using the enterprise API, and deploying the same model on Alibaba Cloud's free Bailian service within minutes.

AI · Alibaba Cloud · DeepSeek

0 likes · 6 min read


AI Large Model Application Practice

Feb 17, 2025 · Artificial Intelligence

Mastering Structured Output for DeepSeek‑R1 with LangChain, LangGraph, and ReAct Agents

DeepSeek‑R1 excels at deep reasoning but lacks native structured output; this guide explains why structured output matters, outlines common API‑level techniques, and provides three practical solutions—using an auxiliary model with a LangChain chain, a LangGraph workflow, and a ReAct agent—complete with code snippets and JSON‑mode tips.

DeepSeek · LLM · LangChain

0 likes · 12 min read


Code Mala Tang

Feb 16, 2025 · Artificial Intelligence

17 Proven Prompt Engineering Techniques to Master LLM Interactions

This article presents 17 practical prompt‑engineering strategies—ranging from zero‑shot and few‑shot prompting to role, style, and chain‑of‑thought methods—explaining their usage, ideal scenarios, and concrete examples to help you obtain higher‑quality responses from large language models.

Artificial Intelligence · ChatGPT · LLM

0 likes · 14 min read


Bighead's Algorithm Notes

Feb 15, 2025 · Artificial Intelligence

FinRL‑DeepSeek: How Integrating DeepSeek with RL Improves Portfolio Returns (Code Open‑Source)

This article reviews a new risk‑sensitive trading agent that combines reinforcement learning with large language models to extract stock recommendations and news‑based risk scores, describes the extended CVaR‑PPO algorithm, presents extensive experiments on the FNSPID dataset, and discusses the resulting performance gains and future work.

Algorithmic Trading · CVaR · DeepSeek

0 likes · 10 min read


Alibaba Cloud Developer

Feb 14, 2025 · Artificial Intelligence

Unlock Faster LLM Inference: Full Stack of Chips, Frameworks & Services

The article examines the end‑to‑end architecture for large‑model inference, detailing seven layers—from chip hardware and programming toolkits to deep‑learning frameworks, inference accelerators, model providers, compute platforms, application orchestration, and traffic management—highlighting key vendors, open‑source projects, and performance‑optimizing techniques.

AI hardware · LLM · Open-source

0 likes · 12 min read


AI Large Model Application Practice

Feb 14, 2025 · Artificial Intelligence

Why Sub‑word Tokenizers Power Modern LLMs: From Characters to Tokens

This article explains how language models evolved from character‑level embeddings to word‑level and finally to sub‑word tokenizers, highlighting the efficiency, vocabulary coverage, and practical engineering challenges of sub‑word segmentation in modern AI systems.

AI fundamentals · LLM · Subword Tokenization

0 likes · 8 min read


JD Tech

Feb 14, 2025 · Artificial Intelligence

JD Merchant Intelligent Assistant – Multi‑Agent System Architecture, Planning, and Evaluation

JD’s Merchant Intelligent Assistant leverages a large‑language‑model‑based multi‑agent architecture to provide 24/7 e‑commerce support, detailing its evolution, planning techniques, online inference, evaluation methods, sample generation, and practical insights for scalable AI‑driven operations.

E-commerce AI · LLM · Planning

0 likes · 22 min read

Architect

Feb 13, 2025 · Artificial Intelligence

How to Build a Mini ChatGPT on a Single GPU with MiniMind

This article provides a comprehensive, step‑by‑step guide to training and fine‑tuning a miniature large‑language model called MiniMind, covering lightweight model design, open‑source training pipelines, required datasets, tokenizer options, and deployment via a web UI, all using PyTorch on modest hardware.

AI · LLM · MiniMind

0 likes · 11 min read


Alibaba Cloud Infrastructure

Feb 13, 2025 · Cloud Computing

Deploy DeepSeek‑R1 LLM on Alibaba Cloud ACK One with ACS GPU in Minutes

This guide walks you through deploying the DeepSeek‑R1 large‑language‑model inference service on Alibaba Cloud ACK One registered clusters using ACS GPU compute, covering model preparation, OSS storage setup, PersistentVolume configuration, arena‑based service deployment, and verification steps with concrete commands and parameters.

ACK One · ACS GPU · DeepSeek

0 likes · 14 min read


Alibaba Cloud Infrastructure

Feb 13, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 671B Distributed Inference Service on Alibaba Cloud ACK with vLLM and Dify

This article explains how to quickly deploy the full‑parameter DeepSeek‑R1 671B model in a multi‑node GPU‑enabled Kubernetes cluster on Alibaba Cloud ACK, covering prerequisites, model parallelism, vLLM‑Ray distributed deployment, service verification, and integration with Dify to build a private AI Q&A assistant.

DeepSeek · Dify · Distributed Deployment

0 likes · 12 min read


JD Tech Talk

Feb 13, 2025 · Artificial Intelligence

DeepSeek R1: Concept Overview, Training Principles, and Practical Implementations

This article introduces the DeepSeek family of models, explains the concepts of online search and deep reasoning, details the two‑phase training pipeline with data augmentation and reinforcement learning, and showcases practical experiments and deployment examples for the R1 and distilled variants.

DeepSeek · Knowledge Distillation · LLM

0 likes · 10 min read


Baobao Algorithm Notes

Feb 13, 2025 · Artificial Intelligence

How to Build and Improve Reasoning LLMs: Methods, Trade‑offs, and DeepSeek Insights

This article explains what reasoning language models are, when they are needed, and reviews four main techniques (inference‑time scaling, pure reinforcement learning, combined SFT + RL, and distillation), illustrated with DeepSeek‑R1’s development, cost analysis, and low‑budget alternatives.

AI research · DeepSeek · Inference Scaling

0 likes · 27 min read


Baobao Algorithm Notes

Feb 12, 2025 · Artificial Intelligence

How X‑R1 Triggers Aha Moments in Low‑Cost RL Training of 0.5B LLMs

The X‑R1 open‑source framework demonstrates that a 0.5B language model can achieve rapid reasoning improvements and observable "Aha Moments" using reinforcement learning on a modest 4‑GPU setup, detailing its design, performance metrics, installation steps, and future roadmap.

AI · LLM · Open Source

0 likes · 6 min read


vivo Internet Technology

Feb 12, 2025 · Artificial Intelligence

Bidirectional Optimization of NLLB-200 and ChatGPT for Low-Resource Language Translation

The paper proposes a bidirectional optimization framework that fine‑tunes the low‑resource NLLB‑200 translation model with LoRA using data generated by ChatGPT, while also translating low‑resource prompts with NLLB before feeding them to LLMs, thereby improving multilingual translation quality yet requiring careful validation of noisy synthetic data.

LLM · LoRA · NLLB

0 likes · 28 min read


Alibaba Cloud Infrastructure

Feb 12, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 Distilled Qwen‑32B‑FP8 Model on Alibaba Cloud GPU Instances with Docker and OpenWebUI

This guide explains how to prepare an Alibaba Cloud GPU instance, install Docker and NVIDIA tools, pull or build a container image, and run the FP8‑quantized DeepSeek‑R1‑Distill‑Qwen‑32B model using vLLM and OpenWebUI for both offline and online inference.

DeepSeek · FP8 quantization · GPU

0 likes · 18 min read


DataFunSummit

Feb 12, 2025 · Artificial Intelligence

Didi's ChatBI: Evolution, Exploration, and Future of AI‑Powered Business Intelligence

This article details Didi's journey since early 2023 in building ChatBI, covering the evolution of BI platforms, the technical advances behind intelligent BI such as LLM‑driven NL2SQL, two main product paths, practical implementations, key challenges, and future directions for AI‑enhanced data analysis.

AI · Business Intelligence · ChatBI

0 likes · 12 min read


JD Retail Technology

Feb 12, 2025 · Artificial Intelligence

Accelerating Generative Recommendation with NVIDIA TensorRT‑LLM in JD Advertising

JD Advertising accelerates its generative‑recall recommendation system by integrating NVIDIA TensorRT‑LLM, which simplifies the pipeline, injects LLM knowledge, scales to billions of parameters, and delivers over five‑fold throughput gains, one‑fifth the cost, and significant CTR improvements in both recommendation and search.

Inference Optimization · LLM · Recommendation Systems

0 likes · 13 min read


Architect's Alchemy Furnace

Feb 11, 2025 · Artificial Intelligence

How to Build a High‑Performance Local Enterprise Knowledge Base with AI

This article explains how to design and implement an on‑premise enterprise knowledge base by covering data preprocessing, vector database selection, LLM integration, system architecture, security, deployment, testing, and cost‑control, providing practical code snippets and best‑practice recommendations.

AI · Data Processing · LLM

0 likes · 22 min read


Infra Learning Club

Feb 11, 2025 · Artificial Intelligence

How to Run DeepSeek R1 Locally and Build a RAG System with Ollama and LangChain

This guide walks you through installing Ollama, pulling the open‑source DeepSeek R1 model, and using LangChain and Streamlit to create a locally hosted Retrieval‑Augmented Generation (RAG) system that can answer questions from uploaded PDFs without any cloud API.

DeepSeek · LLM · LangChain

0 likes · 6 min read


Ops Development & AI Practice

Feb 10, 2025 · Artificial Intelligence

Mastering LLM Output: How Temperature, Top‑K, Top‑P & Max Tokens Shape AI Text

This article explains how the key LLM parameters—Temperature, Top‑K, Top‑P, and MaxOutputTokens—affect randomness, creativity, candidate selection, and output length, and provides practical guidance on tuning them for different AI text generation tasks.

AI Generation · LLM · Temperature

0 likes · 7 min read


Architect

Feb 10, 2025 · Artificial Intelligence

Evolution of DeepSeek Mixture‑of‑Experts (MoE) Architecture from V1 to V3

This article reviews the development of DeepSeek's Mixture-of-Experts (MoE) models, tracing their evolution from the original DeepSeekMoE V1 through V2 to V3, detailing architectural innovations such as fine‑grained expert segmentation, shared‑expert isolation, load‑balancing losses, device‑limited routing, and the shift from softmax to sigmoid gating.

DeepSeek · LLM · Mixture of Experts

0 likes · 21 min read


Alibaba Cloud Infrastructure

Feb 10, 2025 · Artificial Intelligence

Hybrid Cloud Elastic LLM Inference Solution with ACK Edge and KServe

This article presents a hybrid‑cloud solution that uses ACK Edge and KServe to dynamically allocate on‑premise and cloud GPU resources for large‑language‑model inference, addressing tidal traffic patterns, reducing costs, and ensuring high availability through elastic scaling and custom scheduling policies.

ACK@Edge · Auto Scaling · KServe

0 likes · 13 min read


JD Retail Technology

Feb 10, 2025 · Artificial Intelligence

JD Merchant Intelligent Assistant: Multi‑Agent Architecture and Technical Exploration

The JD Merchant Intelligent Assistant employs a large‑language‑model‑driven multi‑agent architecture with dynamic ReAct planning, enabling merchants to query and execute store operations in under a second with over 90 % decision accuracy, while reducing inference cost, hallucinations, and engineering effort across diverse e‑commerce tasks.

AI · LLM · ReAct

0 likes · 25 min read


JD Cloud Developers

Feb 10, 2025 · Artificial Intelligence

How to Deploy DeepSeek LLM Locally on JD Cloud GPU with Ollama and Chatbox

Learn step‑by‑step how to prepare a JD Cloud GPU instance, install GPU drivers, deploy Ollama, run DeepSeek‑R1 models, configure graphical clients like Chatbox on Windows and macOS, and optionally feed local data using AnythingLLM to build an offline knowledge base.

AnythingLLM · Chatbox · DeepSeek

0 likes · 19 min read


Big Data Technology Architecture

Feb 9, 2025 · Artificial Intelligence

Reproducing DeepSeek R1 Reasoning Ability with GRPO on Qwen2.5‑7B in Colab

This article explains how to replicate DeepSeek R1's slow‑thinking inference using the GRPO reinforcement‑learning algorithm on the Qwen2.5‑7B model in a free Colab notebook, covering the underlying chain‑of‑thought (CoT) concept, reward‑function design, data preparation, training configuration, and observed results.

GRPO · LLM · Python

0 likes · 14 min read


Top Architect

Feb 9, 2025 · Artificial Intelligence

DeepSeek‑R1: Training Pipeline, Reinforcement‑Learning Techniques, and Experimental Results

The article reviews DeepSeek‑R1’s training methodology—including cold‑start data collection, multi‑stage RL fine‑tuning, SFT data generation, and model distillation—highlights its performance comparable to OpenAI‑o1‑1217, and discusses key contributions, reward design, successful experiments, and failed attempts.

AI research · DeepSeek · LLM

0 likes · 12 min read


Infra Learning Club

Feb 8, 2025 · Artificial Intelligence

Why People Pay for DeepSeek Installation Packages (and How to Install It Yourself)

The article explains that DeepSeek is an open‑source LLM that many sellers monetize by offering paid installation packages, outlines the model lineup and size options, and provides a step‑by‑step guide to install and run DeepSeek locally with Ollama and Open WebUI.

AI models · DeepSeek · LLM

0 likes · 7 min read


Open Source Tech Hub

Feb 8, 2025 · Artificial Intelligence

How to Integrate DeepSeek’s Open‑Source LLM into Your Application

This guide introduces DeepSeek, outlines its cutting‑edge open‑source LLMs, and provides step‑by‑step instructions for accessing the admin backend, adding and configuring DeepSeek models, setting API endpoints and keys, and enabling frontend access.

API · DeepSeek · LLM

0 likes · 3 min read


Infra Learning Club

Feb 8, 2025 · Artificial Intelligence

Multi-Agent LLMs Explained: Benefits, Workflows, and Leading Frameworks

The article surveys the rise of multi‑agent LLM systems, detailing how specialized agents collaborate on tasks such as travel planning, outlining their workflow, comparing them with single‑agent models, listing prominent frameworks, and discussing current challenges and research citations.

AI · Agent Collaboration · AutoGen

0 likes · 13 min read


Alibaba Cloud Infrastructure

Feb 8, 2025 · Artificial Intelligence

Deploying a Production‑Ready DeepSeek‑R1 Inference Service on Alibaba Cloud ACK with KServe

This guide explains how to deploy a production‑ready DeepSeek‑R1 inference service on Alibaba Cloud ACK using KServe, covering model preparation, storage configuration, service deployment, observability, autoscaling, model acceleration, gray‑release and GPU‑shared inference.

DeepSeek · GPU · KServe

0 likes · 13 min read


Full-Stack DevOps & Kubernetes

Feb 8, 2025 · Artificial Intelligence

Deploy DeepSeek‑R1 on Tencent Cloud with Ollama: A Complete Step‑by‑Step Guide

This guide walks you through preparing a Tencent Cloud account, creating a Cloud Studio workspace, installing Ollama, downloading and running the DeepSeek‑R1 large language model, interacting via terminal or API, and managing resources and model versions.

AI model deployment · API · DeepSeek

0 likes · 8 min read


MaGe Linux Operations

Feb 7, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 Locally: A Step‑by‑Step AI Model Guide

This article walks you through everything you need to know about DeepSeek R1—including its different model sizes, hardware requirements, installation tools like Ollama, LM Studio and Docker, and how to set up a visual interface with Open‑WebUI or Dify—for offline, private, and cost‑effective AI inference.

AI · DeepSeek · Docker

0 likes · 15 min read


iKang Technology Team

Feb 7, 2025 · Artificial Intelligence

Retrieval‑Augmented Generation (RAG) with LangChain: Concepts and Python Implementation

Retrieval‑Augmented Generation (RAG) using LangChain lets developers enhance large language models by embedding user queries, fetching relevant documents from a vector store, inserting the context into a prompt template, and generating concise, source‑grounded answers, offering low‑cost, up‑to‑date knowledge while reducing hallucinations and fine‑tuning expenses.

LLM · LangChain · RAG

0 likes · 10 min read


Top Architect

Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama: Quantization, Hardware Requirements, and Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the full‑size DeepSeek R1 671B model using Ollama, covering dynamic quantization options, hardware specifications, detailed installation commands, configuration files, performance observations, and practical recommendations for consumer‑grade systems.

AI · DeepSeek · GPU

0 likes · 14 min read


Alibaba Cloud Developer

Feb 5, 2025 · Artificial Intelligence

10 Common Prompt Engineering Mistakes and How to Overcome Them

This article lists ten common misconceptions about prompt engineering, explains why each is flawed, and offers practical insights and strategies—such as using the CO‑STAR framework, tailoring prompts to specific models, keeping prompts concise, and continuously testing and refining—to help readers communicate effectively with large language models.

AI misconceptions · LLM · large language models

0 likes · 10 min read


21CTO

Feb 4, 2025 · Artificial Intelligence

Is DeepSeek the Next Challenger to ChatGPT? A Deep Dive into Its AI Edge

This article explains what DeepSeek is, how its open‑source large language model works, its unique multilingual training, free access, the DeepSeek‑Coder variant, and compares its capabilities and goals with ChatGPT, highlighting strengths, limitations, and market impact.

AI models · ChatGPT comparison · DeepSeek

0 likes · 7 min read


AIWalker

Feb 4, 2025 · Artificial Intelligence

Meta’s Open‑Source MILS Enables LLMs to See and Hear Without Training – SOTA on Images, Video, and Audio

The paper introduces MILS, a training‑free multimodal iterative LLM solver that lets large language models perceive and generate across image, video, and audio domains, achieving new state‑of‑the‑art results without any task‑specific data or fine‑tuning.

AI research · LLM · MILS

0 likes · 18 min read


Alibaba Cloud Big Data AI Platform

Feb 1, 2025 · Artificial Intelligence

Deploy DeepSeek-V3 and R1 Models with One-Click on Alibaba Cloud PAI Model Gallery

This article introduces Alibaba Cloud's PAI Model Gallery, detailing the DeepSeek-V3 and DeepSeek‑R1 large language models, their architectures and parameters, and provides a step‑by‑step guide for one‑click deployment of these models and their distilled variants using vLLM or BladeLLM.

AI inference · Alibaba Cloud · DeepSeek

0 likes · 6 min read


CSS Magic

Jan 31, 2025 · Artificial Intelligence

Cursor vs. Windsurf vs. GitHub Copilot: Hands‑On Comparison of Three AI Code Editors

The article conducts a practical, step‑by‑step evaluation of Cursor, Windsurf, and GitHub Copilot’s multi‑file editing capabilities using a simple web‑chat bot, revealing that Cursor completes all required UI, storage, and application changes in a single interaction, while the others need two rounds, with Copilot showing notable improvement on a retest.

AI code editor · Cursor · GitHub Copilot

0 likes · 9 min read


DataFunSummit

Jan 30, 2025 · Databases

Mature Practices for Building Risk‑Control Knowledge Graphs on NebulaGraph and Leveraging Large Language Models

This article explains how NebulaGraph’s large‑scale graph database can be used to construct real‑time risk‑control knowledge graphs, describes practical applications such as community detection and path analysis, and explores how large language models enhance graph queries through Text‑to‑GQL, agents, exploration chains, and semi‑structured knowledge extraction.

AI · Graph Database · Knowledge Graph

0 likes · 11 min read


DataFunSummit

Jan 29, 2025 · Artificial Intelligence

Tencent OlaChat: An LLM‑Powered Intelligent Business Intelligence Platform – Architecture, Capabilities, and Practice

This article presents Tencent's OlaChat intelligent BI platform, detailing its evolution from traditional to intelligent BI, the impact of large language models on data analytics, the system's multi‑task dialogue, metadata retrieval enhancements, Text2SQL solutions, and real‑world deployment insights.

AI · Business Intelligence · Data Platform

0 likes · 21 min read

Architect

Jan 27, 2025 · Artificial Intelligence

How to Build a Retrieval‑Augmented Generation QA Assistant for an Open Platform

This article details a step‑by‑step design of a RAG‑based intelligent Q&A assistant for the DeWu Open Platform, covering background, RAG fundamentals, system architecture, technology selection, prompt engineering with CO‑STAR, data preprocessing, vector store setup, LangChain.js implementation, similarity search, runnable chaining, debugging, and future prospects.

AI · LLM · LangChain

0 likes · 28 min read


DataFunTalk

Jan 26, 2025 · Artificial Intelligence

58.com’s LingXi Large Language Model Platform: Development, Deployment, and Performance Optimizations

Since the launch of ChatGPT, 58.com has built a Model‑as‑a‑Service platform called LingXi that trains and serves domain‑specific large language models, supports over a hundred internal scenarios with daily inference exceeding ten million calls, and continuously improves performance through quantization, GPU optimization, model miniaturization, and advanced AI applications such as interview assistants, voice agents, and RAG‑enabled agents.

AI applications · AI platform · Inference Optimization

0 likes · 9 min read


DataFunSummit

Jan 24, 2025 · Artificial Intelligence

Exploring LLM‑Based Generative Business Intelligence (GenBI): Architecture, Implementation, and Lessons Learned

With the rise of LLM‑based generative AI, this article examines the emerging GenBI (Generative Business Intelligence) paradigm, detailing why self‑service analytics is needed, the progress of Text‑to‑SQL, an LLM‑driven agent architecture, practical AWS Bedrock implementation, technical choices, lessons learned, and future outlook.

AWS Bedrock · Business Intelligence · Generative AI

0 likes · 18 min read


AI Large Model Application Practice

Jan 23, 2025 · Artificial Intelligence

Mastering Microsoft AutoGen 0.4: Build Async Multi‑Agent Apps from Scratch

This article provides a comprehensive, step‑by‑step guide to Microsoft AutoGen 0.4, explaining its layered architecture, core concepts such as Agent, Runtime, and Agent ID, and demonstrating both a simple Hello‑World multi‑agent example and an AI‑enabled agent with full Python code snippets.

Async · AutoGen · Framework

0 likes · 13 min read


DataFunSummit

Jan 21, 2025 · Artificial Intelligence

NVIDIA NeMo Full Stack: End‑to‑End Large Language Model Training, Alignment, and RLHF

This article presents NVIDIA's NeMo technology stack for end‑to‑end large language model (LLM) training, covering the full software pipeline, model alignment with reinforcement learning from human feedback (RLHF), performance optimizations such as model parallelism, FP8, TensorRT‑LLM inference, dynamic load balancing, and future research directions.

GPU optimization · LLM · NeMo

0 likes · 24 min read


ByteFE

Jan 20, 2025 · Artificial Intelligence

Eino: An Open‑Source Golang Framework for Large‑Model Application Development

Eino is a Golang‑based, open‑source framework that streamlines the full DevOps lifecycle of large‑model applications by providing stable, strongly‑typed components, graph‑based orchestration, built‑in tooling, and an extensible architecture to help developers quickly build reliable AI services.

AI · Framework · Golang

0 likes · 13 min read


AI Large Model Application Practice

Jan 20, 2025 · Artificial Intelligence

How Embeddings Transform Simple Character Codes into Powerful Vectors for LLMs

This article explains how embeddings convert basic character indices into high‑dimensional vectors, describes their training via gradient descent, introduces the embedding matrix, and shows how these vectors enable modern language models to capture semantic relationships and be reused across tasks.

Embeddings · LLM · machine learning

0 likes · 8 min read


DataFunTalk

Jan 18, 2025 · Artificial Intelligence

Understanding Xiaohongshu’s Content Recommendation Mechanisms: NoteLLM and SSD

This article analyzes Xiaohongshu’s content recommendation system by reviewing two official papers, detailing the NoteLLM framework for interest discovery and the Sliding Spectrum Decomposition (SSD) method for diversified recommendations, and explaining their underlying models, loss functions, and experimental results.

Diversity · LLM · Recommendation Systems

0 likes · 13 min read


Alibaba Cloud Infrastructure

Jan 17, 2025 · Artificial Intelligence

Elastic Scaling of Large Language Model Inference on Alibaba Cloud ACK with Knative, ResourcePolicy, and Fluid

This article explains how to reduce inference cost and improve performance for large language models on Alibaba Cloud ACK by using Knative's request‑based autoscaling, custom ResourcePolicy priority scheduling, and Fluid data‑caching to achieve elastic scaling, resource pre‑emption, and faster model loading.

Fluid · Knative · Kubernetes

0 likes · 22 min read


AI Large Model Application Practice

Jan 16, 2025 · Artificial Intelligence

Boosting AI Agent Accuracy with External Validation and Multi‑Path Optimization

The article explains how AI agents can move beyond single‑turn responses by using two enhanced reflection strategies—external tool validation and multi‑path optimization (LATS)—to iteratively improve output quality, reliability, and applicability in complex, high‑stakes tasks.

AI · External Validation · LATS

0 likes · 10 min read


Baobao Algorithm Notes

Jan 15, 2025 · Artificial Intelligence

How Multi-Token Prediction Boosts LLM Training and Inference Efficiency

This article reviews the evolution of Multi‑Token Prediction (MTP) techniques—from early blockwise parallel decoding to Meta's and DeepSeek's implementations—explaining their architectures, training and inference workflows, and the speed‑up gains they offer for large language models.

DeepSeek · Inference Acceleration · LLM

0 likes · 20 min read


Alibaba Cloud Big Data AI Platform

Jan 15, 2025 · Artificial Intelligence

Build an Education‑Focused RAG Solution Using Alibaba PAI

This guide explains how to create a Retrieval‑Augmented Generation (RAG) solution for education on Alibaba PAI, covering knowledge‑base construction with PAI‑Designer, model deployment, connection setup in LangStudio, workflow configuration, online deployment, and a legal‑domain case comparison that highlights RAG's accuracy benefits.

Alibaba PAI · Embedding · Knowledge Base

0 likes · 14 min read


Bilibili Tech

Jan 14, 2025 · Artificial Intelligence

Technical Practices and Productization of Intelligent Advertising Title Generation for Bilibili

We built an LLM‑powered system for Bilibili that automatically creates ad titles from user keywords, employing fluency, style, and quality classifiers, mixed domain data cleaning, and alignment methods such as SFT, DPO and KTO, resulting in a product that now generates about ten percent of daily titles and drives significant ad spend.

AI alignment · Ad Title Generation · Bilibili

0 likes · 24 min read


JD Tech Talk

Jan 14, 2025 · Artificial Intelligence

Advantages and Engineering Implementation of Generative Recommendation Systems Using Large Language Models

This article explains how generative recommendation systems powered by large language models simplify the recommendation pipeline, integrate world knowledge, benefit from scaling laws, and require specialized engineering optimizations such as TensorRT‑LLM deployment, inference acceleration, and hybrid model strategies to achieve low latency and high throughput in real‑world e‑commerce scenarios.

AI · Inference Optimization · LLM

0 likes · 10 min read

JD Cloud Developers

Jan 14, 2025 · Artificial Intelligence

How Generative Recommendation Systems Transform E‑Commerce with LLMs

This article explains how large language models reshape recommendation systems by simplifying pipelines, integrating world knowledge, and leveraging scaling laws, and details the engineering steps for deploying generative recall models—including product encoding, user prompting, model training, TensorRT‑LLM optimization, and continuous performance improvements.

Generative Recommendation · LLM · Recommendation Systems

0 likes · 13 min read

AI Large Model Application Practice

Jan 14, 2025 · Artificial Intelligence

Turning Classification Nets into Language Generators: A Step‑by‑Step Guide

This article explains how a simple neural network trained for classification can be adapted to generate natural language by expanding its output layer, encoding characters as numbers, using a sliding‑window context, and recursively predicting the next token, illustrating each step with diagrams and concrete examples.

AI · LLM · language generation

0 likes · 10 min read

Baobao Algorithm Notes

Jan 10, 2025 · Artificial Intelligence

Unlocking Text Classification with Qwen2: Experiments, Tips, and LoRA Fine‑Tuning

This article shares practical experiments and insights on using Qwen2ForSequenceClassification for short‑ and long‑text sentiment tasks, compares it with BERT, outlines improvement strategies such as generative fine‑tuning and LoRA, and provides end‑to‑end code, training details, and evaluation results.

FineTuning · LLM · LoRA

0 likes · 25 min read

Java Architecture Diary

Jan 10, 2025 · Artificial Intelligence

Generate Structured JSON with Ollama LLM Using Java

This guide explains why structured JSON output from LLMs is essential, walks through installing and running Ollama, and provides a complete Java Spring Boot implementation—including POJOs, service code, and best‑practice tips—to retrieve AI‑generated data in a reliable, parsable format.

AI · JSON · LLM

0 likes · 7 min read

Tencent Advertising Technology

Jan 9, 2025 · Artificial Intelligence

Applying Large Language Models to Search Advertising: End‑to‑End Generative Recall and System Optimizations

This report details how large language models (LLMs) were integrated into Tencent's search advertising pipeline—from early extraction‑distillation experiments in 2023 to a 2024 end‑to‑end generative recall architecture—showing significant improvements in relevance, diversity, and revenue through knowledge injection, supervised fine‑tuning, constrained beam‑search decoding, and high‑performance inference services.

AI · Beam Search · LLM

0 likes · 11 min read

Baobao Algorithm Notes

Jan 9, 2025 · Artificial Intelligence

How to Efficiently Deploy and Manage 100 LoRA‑Enhanced LLMs with vLLM

A technical walkthrough shows how to use vLLM to load multiple LoRA adapters for role‑playing LLMs, analyzes the massive GPU and labor costs of naïve deployment, and presents a hosted multi‑LoRA platform as a cost‑effective solution.

AI inference · LLM · LoRA

0 likes · 11 min read

Data Thinking Notes

Jan 7, 2025 · Databases

Unlocking LLM-Powered Text-to-SQL: From Basics to Cutting-Edge Techniques

This article provides a comprehensive overview of LLM-based Text-to-SQL technology, covering its background, evolution, challenges, various LLM-driven methods, benchmark datasets, evaluation metrics, and future research directions to guide researchers and practitioners in advancing natural language interfaces for databases.

Database · LLM · Prompt Engineering

0 likes · 18 min read

Infra Learning Club

Jan 7, 2025 · Artificial Intelligence

How GitHub Copilot Workspace Made Me Fear Unemployment

The author experiments with GitHub Copilot Workspace to automatically generate a WeChat mini‑program for family library management, documents the prompting process, code generation, bug fixes, UI tweaks, and reflects on the broader impact of AI‑driven development on programmers' future jobs.

AI Code Generation · GitHub Copilot · LLM

0 likes · 5 min read

DataFunSummit

Jan 7, 2025 · Artificial Intelligence

Tencent OlaChat: Intelligent Data Analysis Platform – Research, Architecture, and Capabilities

This article presents the Tencent PCG OlaChat team's research and practice in intelligent data analysis, covering the DIKW model, evolution of BI platforms, the impact of large language models, challenges of third‑generation data products, detailed product features, agent architecture, system design, and related academic publications.

Data Analysis · Intelligent BI · LLM

0 likes · 19 min read

DevOps

Jan 6, 2025 · Artificial Intelligence

Ten Popular Large Language Model Deployment Engines and Tools: Features, Advantages, and Limitations

This article reviews ten mainstream LLM deployment solutions (WebLLM, LM Studio, Ollama, vLLM, LightLLM, OpenLLM, Hugging Face TGI, GPT4All, llama.cpp, and Triton Inference Server), detailing their technical characteristics, strengths, drawbacks, and example deployment workflows for both personal and enterprise environments.

AI inference · GPU Acceleration · LLM

0 likes · 16 min read

DeWu Technology

Jan 6, 2025 · Artificial Intelligence

Design and Implementation of a Retrieval‑Augmented Generation (RAG) Answering Assistant for the Dewu Open Platform

This article describes building a Retrieval‑Augmented Generation assistant for the Dewu Open Platform that leverages GPT‑4o‑mini, OpenAI embeddings, Milvus vector store, and LangChain.js to semantically retrieve API documentation, structure user queries, and generate accurate, JSON‑formatted answers, thereby reducing manual support and hallucinations.

AI · LLM · LangChain

0 likes · 28 min read

AI Large Model Application Practice

Jan 6, 2025 · Artificial Intelligence

Boost LLM Agent Performance with the Evaluator‑Optimizer Reflection Loop

This article explains the Evaluator‑Optimizer reflection pattern for LLM agents, shows how it can improve output quality in single‑ or multi‑agent tasks, and provides a step‑by‑step PydanticAI implementation with code examples and practical usage tips.

LLM · PydanticAI · Reflection

0 likes · 9 min read

Fighter's World

Jan 4, 2025 · Industry Insights

Is Unlimited Digital Labor Arriving? A Deep Dive into Salesforce’s Agentforce 2.0

Salesforce’s Agentforce 2.0 positions AI agents as a limitless digital labor platform, reshaping enterprise software with a new agent‑first model, consumption‑based pricing, and real‑world case studies that illustrate productivity gains, cost reductions, and strategic advantages in today’s AI‑driven market.

AI Agents · Agentforce · Digital Labor

0 likes · 19 min read

Alibaba Cloud Infrastructure

Jan 3, 2025 · Cloud Native

How to Enable LLM Traffic Observability with Alibaba Cloud Service Mesh (ASM)

This guide explains how to use Alibaba Cloud Service Mesh (ASM) to add infrastructure‑level observability for large language model (LLM) traffic, covering custom access‑log fields, new Prometheus metrics for token usage, and adding model dimensions to native Istio metrics, with step‑by‑step commands and configuration examples.

ASM · Kubernetes · LLM

0 likes · 14 min read

AI Large Model Application Practice

Jan 3, 2025 · Artificial Intelligence

How to Build an Orchestrator‑Workers AI Agent Workflow with Pydantic AI

This article explains the Orchestrator‑Workers pattern from Anthropic’s “Building effective agents”, compares it with routing and parallel modes, distinguishes it from Supervisor agents, and provides a step‑by‑step Python implementation using Pydantic AI, including model definitions, prompts, orchestration logic, worker execution, and a test example.

AI Agents · LLM · Orchestrator-Workers

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Jan 3, 2025 · Artificial Intelligence

Build an Education‑Focused Retrieval‑Augmented Generation (RAG) Solution with Alibaba PAI

This guide walks you through creating a RAG‑enhanced AI solution for education using Alibaba PAI, covering prerequisite setup, knowledge‑base construction with PAI‑Designer, model deployment, connection configuration, workflow assembly, and a side‑by‑side comparison of RAG versus non‑RAG answers.

AI platform · LLM · Milvus

0 likes · 16 min read

Infra Learning Club

Jan 2, 2025 · Artificial Intelligence

Three Major LLM Trends in 2025: Ubiquitous Agents, Rising Small Models, and Multimodal Fusion

In 2025, large language models will see three key trends—agents becoming pervasive in daily life and industry, the emergence of efficient small models for edge and specialized tasks, and the integration of multimodal capabilities that combine text, images, and audio to enable more natural human‑machine interaction.

AI trends · LLM · agents

0 likes · 4 min read

DataFunSummit

Jan 1, 2025 · Artificial Intelligence

Challenges and Evaluation Strategies for LLM Agents in 2024

The article outlines the rapid progress of LLM agents in 2024 while highlighting key difficulties in planning capabilities, evaluation methods, dataset generation, and metric design, and suggests practical combinations and product‑level enhancements to improve efficiency, accuracy, and usability.

AI · LLM · Planning

0 likes · 3 min read

ByteFE

Dec 31, 2024 · Artificial Intelligence

In‑Depth Review of Cursor: AI‑Powered Coding Assistant, Capabilities, Use Cases, and Limitations

This article evaluates the Cursor AI coding assistant, describing its context‑aware indexing, Composer panel, and code‑generation features, while outlining practical scenarios such as Q&A, test creation, language conversion, and prototype development, and discussing its inherent randomness, domain‑knowledge gaps, and best‑practice recommendations for developers.

AI coding assistant · LLM · code generation

0 likes · 27 min read

Alibaba Cloud Observability

Dec 30, 2024 · Operations

How to Quickly Diagnose Error and Performance Issues in Cloud‑Native Applications

This article outlines a comprehensive approach to identifying and resolving both error‑related and slow‑request problems in online systems by leveraging trace data, log correlation, method‑stack analysis, unified entity models, and large‑language‑model assistance to accelerate root‑cause diagnosis.

APM · LLM · Performance debugging

0 likes · 12 min read

AI Large Model Application Practice

Dec 30, 2024 · Artificial Intelligence

Implementing LLM Routing and Parallel Agent Workflows with PydanticAI

This tutorial walks through building semantic routing and parallel execution patterns for LLM agents using the lightweight PydanticAI framework, providing step‑by‑step code, example configurations, and practical observations to help developers create flexible AI‑driven workflows.

LLM · PydanticAI · Python

0 likes · 11 min read

ZhongAn Tech Team

Dec 28, 2024 · Artificial Intelligence

Weekly AI Digest Issue 8: OpenAI Robotics, ModernBERT Upgrade, Spatial Cognition, LLM Agent Evolution, and GNN‑LLM Fusion

This issue surveys recent AI developments, covering OpenAI's renewed robot program, the ModernBERT encoder upgrade, spatial reasoning advances in multimodal models, automated environment generation for LLM agents, and a novel GNN‑LLM approach for label‑free node classification.

Artificial Intelligence · BERT · LLM

0 likes · 10 min read

DataFunTalk

Dec 28, 2024 · Big Data

Next‑Generation Data Analysis Platform: Integrating Chat BI and Headless BI

This article examines the current challenges of enterprise data analysis platforms, outlines three traditional analysis modes, and presents a next‑generation solution that combines Headless BI’s semantic modeling with Chat BI’s large‑language‑model interaction to deliver a more efficient, secure, and user‑friendly analytics experience.

ChatBI · Data Analysis · DataGovernance

0 likes · 15 min read

Volcano Engine Developer Services

Dec 26, 2024 · Artificial Intelligence

How LLMs Can Auto-Generate Unit Tests: Insights from ByteDance’s QCon Talk

This article summarizes a QCon presentation by ByteDance quality‑efficiency expert Zhao Liang on using large language models to automatically generate unit tests, covering pain points, goals, data‑quality engineering, model‑analysis fusion, architecture, evaluation metrics, and future plans for a production‑grade testing tool.

AI · LLM · Test Generation

0 likes · 26 min read

DevOps

Dec 25, 2024 · Artificial Intelligence

Anthropic’s Agent Development: The Counter‑Intuitive “Less Is More” Principle

Anthropic argues that building effective AI agents should start with simple, enhanced LLMs and only add workflow or autonomous agent complexity when necessary, emphasizing a “Less is More” approach to reduce latency, cost, and debugging difficulty.

Anthropic · LLM · Less is More

0 likes · 13 min read

Alibaba Cloud Native

Dec 24, 2024 · Operations

How to Quickly Diagnose Error and Latency Issues in Cloud‑Native Applications

This article outlines a practical, end‑to‑end approach for identifying and resolving both error‑related and slow‑request problems in online systems by leveraging trace links, correlated logs, entity relationships, and large‑language‑model‑driven analysis to achieve rapid root‑cause isolation.

APM · LLM · Observability

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Dec 24, 2024 · Artificial Intelligence

Build a Medical RAG Solution with Alibaba PAI: Step-by-Step Guide

Learn how to create a Retrieval‑Augmented Generation (RAG) system for medical applications using Alibaba's PAI platform, covering knowledge‑base construction with PAI‑Designer, template setup in PAI‑LangStudio, deployment of LLM and embedding models, vector database integration, and end‑to‑end workflow configuration.

Embedding · LLM · Milvus

0 likes · 18 min read

NewBeeNLP

Dec 23, 2024 · Artificial Intelligence

What’s New in Qwen2.5? A Deep Dive into the Latest LLM Advances

The Qwen2.5 Technical Report introduces a new series of large language models with up to 72B parameters, pre‑training data expanded to 18 trillion tokens, and more advanced supervised fine‑tuning and reinforcement learning pipelines, and reports strong performance across comprehension, reasoning, coding, and long‑context tasks.

LLM · Large Language Model · Qwen2.5

0 likes · 5 min read

DataFunSummit

Dec 22, 2024 · Artificial Intelligence

From Concept to Deployment: The Evolution of 1688’s AI Purchasing Assistant “Yuanbao”

This article chronicles the development of 1688’s AI buyer assistant “Yuanbao”, detailing why an e‑commerce AI assistant is needed, its functional design, MVP constraints, the shift to a data‑driven 2.0 version, future prospects, and a Q&A, providing practical insights for AI product rollout in B‑to‑C platforms.

AI · LLM · agent

0 likes · 24 min read

Baobao Algorithm Notes

Dec 18, 2024 · Artificial Intelligence

How STAR Enables Training‑Free Recommendations with Large Language Models

The article reviews the STAR framework, a training‑free recommendation approach that leverages large language model embeddings and collaborative co‑occurrence scores to retrieve and rank items, and evaluates its performance, hyper‑parameter effects, and ablation studies against existing LLM‑based recommender methods.

Artificial Intelligence · LLM · Recommendation Systems

0 likes · 10 min read

Full-Stack Cultivation Path

Dec 18, 2024 · Frontend Development

Midscene.js: An AI‑Powered UI Automation Framework for Web Testing

Midscene.js leverages multimodal AI to simplify web UI automation by providing .ai, .aiQuery and .aiAssert methods, supporting JavaScript and YAML integrations, a Chrome extension, and detailed cost analysis while acknowledging latency, interaction limits, and prompt‑engineering challenges.

Chrome Extension · JavaScript · LLM

0 likes · 9 min read

Alibaba Cloud Developer

Dec 17, 2024 · Frontend Development

Choosing the Best LangChain Text Splitter for Frontend LLM Apps

This article compares five LangChain text splitters—CharacterTextSplitter, RecursiveCharacterTextSplitter, TokenTextSplitter, MarkdownTextSplitter, and LatexTextSplitter—by examining their principles, pros and cons, and ideal use cases, helping developers select the most suitable splitter for their frontend large‑model applications.

Frontend Development · JavaScript · LLM

0 likes · 10 min read

Huolala Tech

Dec 17, 2024 · Artificial Intelligence

How to Secure AI Agents: Privacy Risks, Threats, and Governance Strategies

This article examines the rapid growth of AI agents, outlines typical privacy and security challenges such as data leakage, model attacks, and prompt injection, and proposes comprehensive governance and technical measures to mitigate these risks in enterprise deployments.

AI Agents · Governance · LLM

0 likes · 22 min read

Huolala Safety Emergency Response Center

Dec 17, 2024 · Information Security

How Secure Are AI Agents? Risks, Attacks, and Governance Strategies

This article examines the rapid growth of AI agents, outlines their core components and classifications, analyzes a wide range of privacy and security threats—including data leakage, prompt injection, jailbreak, backdoor, hallucination, and memory attacks—and proposes practical governance measures to mitigate these risks.

AI Agents · Governance · LLM

0 likes · 25 min read

Baobao Algorithm Notes

Dec 16, 2024 · Artificial Intelligence

What Do Leading Open‑Source LLMs Do After Pretraining? A Deep Dive into Post‑Training Strategies

This article surveys the post‑training pipelines of major open‑source large language models released this year, detailing their alignment algorithms, data synthesis, reward modeling, DPO/GRPO variants, long‑context handling, tool use, and model‑averaging techniques, and highlights emerging trends such as data‑centric pipelines and iterative weak‑to‑strong alignment.

AI research · LLM · alignment

0 likes · 99 min read

Alibaba Cloud Big Data AI Platform

Dec 16, 2024 · Artificial Intelligence

Build a RAG-Powered Q&A App with Alibaba Cloud Milvus, DashScope & PAI

This guide walks you through creating a Retrieval‑Augmented Generation (RAG) question‑answering application by integrating Alibaba Cloud Milvus vector search, DashScope embedding models, and PAI EAS LLM services, covering prerequisites, service deployment, configuration, Python code setup, and execution steps.

LLM · LangChain · Milvus

0 likes · 12 min read

ZhongAn Tech Team

Dec 15, 2024 · Artificial Intelligence

AI Weekly Digest Issue 6: OpenAI’s AI Christmas Season, LeCun’s AGI Forecast, Chinese Text‑to‑Image Breakthrough, and EchoMimic V2

This issue reviews OpenAI’s twelve‑day product launch, LeCun’s surprising AGI timeline, a new Chinese text‑to‑image capability from ByteDance’s Doubao, and the open‑source EchoMimic V2 digital‑human system, highlighting trends, technical details, and industry reactions across the AI landscape.

Artificial Intelligence · Chinese Text Generation · EchoMimic

0 likes · 13 min read

Baobao Algorithm Notes

Dec 15, 2024 · Artificial Intelligence

What Are the Best Practices for Retrieval‑Augmented Generation (RAG)?

This comprehensive study evaluates various components of Retrieval‑Augmented Generation pipelines—including query classification, chunking, embedding models, vector databases, retrieval, re‑ranking, summarization, and generator fine‑tuning—identifies optimal configurations, and proposes best‑practice guidelines for both performance‑maximizing and efficiency‑balanced RAG systems.

Best Practices · LLM · RAG

0 likes · 17 min read

Fighter's World

Dec 14, 2024 · Industry Insights

Sequoia’s 2025 AI Outlook: From Hype to Real‑World Value

Sequoia Capital’s 2025 AI outlook argues that the industry is shifting from early excitement and massive spending to a phase focused on differentiated large‑model providers, AI‑search as a killer app, and a more disciplined, ROI‑driven investment climate.

2025 predictions · AI · AI Search

0 likes · 16 min read

DevOps

Dec 12, 2024 · Artificial Intelligence

The Future of Large Language Models: From Consumer Q&A to Agentic Workflows

Andrew Ng highlights that large language models are shifting from optimizing simple question‑answering for consumers to supporting complex agentic workflows, including tool usage, computer interaction, and multi‑agent collaboration, signaling a major evolution in AI capabilities.

AI Agents · AI trends · Anthropic

0 likes · 8 min read

AI Large Model Application Practice

Dec 12, 2024 · Artificial Intelligence

Mastering AutoGen: Build Multi‑Agent LLM Applications in Minutes

AutoGen, Microsoft’s advanced multi‑agent framework, lets developers quickly assemble collaborative LLM agents—supporting chat, tool use, and hierarchical group chats—through concise Python code, with examples ranging from simple two‑agent dialogues to complex three‑agent reporting pipelines, while outlining its strengths, limitations, and upcoming v0.4 enhancements.

AI · AutoGen · Framework

0 likes · 9 min read

Airbnb Technology Team

Dec 12, 2024 · Artificial Intelligence

Airbnb Automation Platform v2: Enabling LLM‑Driven Conversational AI

Airbnb’s Automation Platform v2 replaces the rigid, workflow‑driven architecture of v1 with an LLM‑centric design that orchestrates context gathering, chain‑of‑thought reasoning, tool execution, and guardrails, enabling more natural, scalable, and safe conversational AI while preserving the reliability of traditional workflows.

AI Architecture · Airbnb · Conversational AI

0 likes · 11 min read

DaTaobao Tech

Dec 9, 2024 · Artificial Intelligence

Analyzing LLM Failure Cases: Tokenization, Next‑Token Prediction, and Chain‑of‑Thought Prompting

The article explains how tokenization mismatches and biased next‑token prediction cause LLMs to miscount letters in “Strawberry” and incorrectly compare 9.9 versus 9.11, and shows that step‑by‑step Chain‑of‑Thought prompting with reason‑first output dramatically improves accuracy.

AI · Chain-of-Thought · LLM

0 likes · 13 min read

37 Interactive Technology Team

Dec 9, 2024 · Artificial Intelligence

Optimizing Request Concurrency for LLM Workflows: Rationale, Implementation, and Results

By breaking iterable inputs into parallel LLM calls and batching 20 items across three languages within Dify’s platform limits, the workflow achieves 43‑64% average runtime reductions and markedly higher success rates, demonstrating that request‑level concurrency dramatically improves throughput for large‑scale translation tasks.

Coze · Dify · LLM

0 likes · 6 min read

DataFunSummit

Dec 4, 2024 · Artificial Intelligence

Accelerating Large Language Model Inference with the YiNian LLM Framework

This article presents the YiNian LLM framework, detailing how KVCache, prefill/decoding separation, continuous batching, PagedAttention, and multi‑hardware scheduling are used to speed up large language model inference while managing GPU memory and latency.

AI acceleration · GPU · KVCache

0 likes · 20 min read
