Tag

language model

JD Tech Talk
Mar 5, 2025 · Artificial Intelligence

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

GLM introduces a unified pretraining framework that combines autoregressive blank‑filling with 2D positional encoding and span‑shuffle, achieving superior performance over BERT, T5 and GPT on a range of NLU and generation tasks such as SuperGLUE, text‑filling, and language modeling.

2D positional encoding · NLU · Pretraining
0 likes · 27 min read
Code Mala Tang
Jan 31, 2025 · Artificial Intelligence

Master DeepSeek: 7 Prompt Engineering Tricks to Boost AI Responses

This guide presents seven practical prompt‑engineering techniques—clear goals, structured queries, domain terminology, concrete examples, scoped questions, step‑by‑step breakdowns, and multi‑turn interactions—to help users get more accurate and useful answers from DeepSeek.

AI prompts · DeepSeek · Prompt Engineering
0 likes · 6 min read
360 Tech Engineering
Dec 17, 2024 · Artificial Intelligence

Innovative Multimodal Architectures: IAA for Extending Language Models and BDM for Chinese-Native AI Painting

The article introduces two 360 AI Research Institute projects—IAA, an architecture that equips frozen language models with multimodal capabilities via plug‑in layers, and BDM, a Chinese‑native diffusion model compatible with the Stable Diffusion ecosystem—detailing their motivations, designs, benchmark results, and open‑source resources.

Chinese AI painting · Plugin Architecture · Stable Diffusion
0 likes · 6 min read
DataFunTalk
Nov 2, 2023 · Artificial Intelligence

Enhancing Language and Vision Models with External Knowledge and Tools: OREO‑LM, REVEAL, and AVIS

This article reviews recent research on augmenting language and multimodal models with external knowledge sources and tool‑calling mechanisms, covering three systems—OREO‑LM for knowledge‑graph reasoning, REVEAL for multi‑source visual‑language pretraining, and AVIS for dynamic tool selection—and their experimental results and implications.

Tool Integration · knowledge graph · language model
0 likes · 28 min read
DataFunSummit
Oct 17, 2023 · Artificial Intelligence

Enhancing Vision and Language Models with External Knowledge Graphs and Tool Integration

This article reviews recent research on augmenting language and vision models by incorporating external knowledge sources such as knowledge graphs, multi‑source retrieval, and dynamic tool‑calling frameworks, presenting three systems—OREO‑LM, REVEAL, and AVIS—and their experimental results.

AI research · Tool Integration · knowledge graph
0 likes · 27 min read
DataFunSummit
Sep 22, 2023 · Artificial Intelligence

Exploring Game AI Agents: Review, LLM‑Driven Exploration, and Future Directions

This article reviews the evolution of game AI agents, examines how large language models (LLMs) can drive new AI behaviors in games, and discusses practical case studies across genres such as Werewolf‑style, war‑SLG, and MOBA games, concluding with challenges and future research directions.

AI agents · Game development · LLM
0 likes · 31 min read
58 Tech
Jun 21, 2023 · Artificial Intelligence

GPU Hotword Enhancement for WeNet End-to-End Speech Recognition

This article explains the design, implementation, and experimental evaluation of hot‑word augmentation in WeNet's GPU runtime, detailing how character‑ and word‑based language model scoring is extended to boost recognition of rare proper nouns in both streaming and non‑streaming ASR services.

ASR · CTC decoder · GPU
0 likes · 12 min read
Top Architect
Mar 1, 2023 · Artificial Intelligence

Understanding the Internals of ChatGPT: Neural Networks, Embeddings, and Training Techniques

This article provides a comprehensive overview of how ChatGPT works, covering its probabilistic text generation, transformer architecture, embedding representations, neural network training processes, and the underlying principles that enable large language models to produce coherent and meaningful human-like language.

AI · ChatGPT · Embeddings
0 likes · 80 min read
IT Architects Alliance
Feb 23, 2023 · Artificial Intelligence

Training a Positive Review Generator with RLHF and PPO

This article demonstrates how to use Reinforcement Learning from Human Feedback (RLHF) with a PPO algorithm and a sentiment‑analysis model to train a language model that generates positive product reviews, covering task definition, data sampling, reward evaluation, model optimization, and experimental results.

GPT · PPO · RLHF
0 likes · 11 min read
Architect
Feb 19, 2023 · Artificial Intelligence

Training a Positive Review Generator with RLHF and PPO

This article demonstrates how to apply Reinforcement Learning from Human Feedback (RLHF) using a sentiment‑analysis model as a reward function and Proximal Policy Optimization (PPO) to fine‑tune a language model that generates positive product reviews, complete with code snippets and experimental results.

PPO · RLHF · language model
0 likes · 10 min read
DataFunSummit
Feb 8, 2023 · Artificial Intelligence

Technical Architecture and Training Process of ChatGPT

ChatGPT, a dialogue-focused language model, builds on the GPT family and employs techniques such as Reinforcement Learning from Human Feedback (RLHF), the TAMER framework, and a three-stage training pipeline (supervised fine‑tuning, reward modeling, and PPO reinforcement learning) to achieve advanced conversational capabilities.

ChatGPT · GPT · RLHF
0 likes · 7 min read
Xiaohongshu Tech REDtech
Nov 11, 2022 · Artificial Intelligence

Language Model as a Service and Black‑Box Optimization: Insights from Prof. Qiu Xipeng’s Talk

Prof. Qiu Xipeng’s talk highlighted how large language models can be offered as a service and efficiently adapted via in‑context learning, lightweight label‑tuning, and gradient‑free black‑box optimization, showcasing a unified asymmetric Transformer (CPT) that handles understanding, generation, ABSA and NER tasks while reducing resource demands.

LLM · NLP · Pretraining
0 likes · 15 min read
DataFunTalk
Jul 30, 2021 · Artificial Intelligence

Fundamentals of Natural Language Processing: Language Models, Smoothing, and Basic Tasks

This article provides a comprehensive overview of natural language processing fundamentals, covering the challenges of language modeling, N‑gram and Markov assumptions, smoothing techniques such as discounting and add‑one, evaluation via perplexity, basic tasks like Chinese word segmentation, subword tokenization, POS tagging, syntactic and semantic parsing, and a range of downstream applications including information extraction, sentiment analysis, question answering, machine translation, and dialogue systems.

AI · NLP · language model
0 likes · 29 min read
58 Tech
Dec 11, 2020 · Artificial Intelligence

Weighted Finite State Transducers (WFST) in Traditional Speech Recognition: Principles and Optimization

This article explains the role of Weighted Finite State Transducers in conventional HMM‑based speech recognition, covering language models, pronunciation dictionaries, WFST definitions, semiring theory, composition and determinization operations, decoding graph construction (HCLG), lattice rescoring, and practical optimization techniques for real‑world scenarios.

ASR · Optimization · Speech Recognition
0 likes · 23 min read
Sohu Tech Products
Nov 25, 2020 · Artificial Intelligence

Illustrated Guide to GPT-2: Detailed Explanation of the Decoder‑Only Transformer Model

This article provides a comprehensive, illustrated walkthrough of OpenAI's GPT‑2 language model, covering its decoder‑only Transformer architecture, self‑attention mechanisms, token processing, training data, differences from BERT, and applications beyond language modeling, enriched with visual diagrams and code snippets for deeper understanding.

AI · GPT-2 · Self-Attention
0 likes · 24 min read
Didi Tech
Nov 5, 2020 · Artificial Intelligence

Self-Learning Platform for Speech Recognition Model Optimization at DiDi

DiDi’s self‑learning ASR platform lets non‑technical users upload business data, automatically train, test and deploy models with semi‑supervised learning, hot‑word updates and LSTM rescoring, creating a closed‑loop pipeline that boosted vehicle voice‑interaction accuracy from around 80% to over 95% within months.

AI · Semi-supervised Learning · Speech Recognition
0 likes · 14 min read
58 Tech
Mar 2, 2020 · Artificial Intelligence

Low-Quality Text Detection Using Unsupervised Language Model Perplexity

This article proposes a method to identify low-quality text in business data by training a large-scale unsupervised language model to compute sentence perplexity, converting the detection problem into a threshold decision, and details model design, challenges, optimizations, and online performance results.

BERT · NLP · Transformer
0 likes · 13 min read
DataFunTalk
Sep 3, 2019 · Artificial Intelligence

Feed‑Forward Neural Networks and Their Applications in Language Modeling, Ranking, and Recommendation

This article excerpt explains the structure and training of feed‑forward neural networks, illustrates their use in neural language models, describes deep structured semantic models for ranking tasks, and details two‑stage recommendation systems such as YouTube's, covering both theoretical formulas and practical deployment considerations.

Artificial Intelligence · Ranking · deep learning
0 likes · 13 min read
Tencent Cloud Developer
Aug 25, 2019 · Artificial Intelligence

Understanding Intelligent Speech Recognition Technology

Intelligent speech recognition converts spoken audio to text through a pipeline of feature extraction, acoustic modeling, and language modeling. Deep neural networks, especially CNN, LSTM, and hybrid CLDNN architectures, drive its accuracy and enable applications such as mobile voice input, call‑center transcription, and legal record keeping. The article also covers Tencent Cloud ASR, which it reports as reaching 97% Mandarin accuracy with speaker separation and on‑premises deployment.

AI · Speech Recognition · Tencent Cloud
0 likes · 7 min read
Tencent Cloud Developer
Jul 17, 2019 · Artificial Intelligence

Design and Implementation of a Multi‑Turn Conversational Chatbot

The article outlines the design and implementation of a multi‑turn conversational chatbot, detailing how natural‑language understanding converts user utterances into structured representations, a CNN‑LSTM language model classifies topics, intents, and sentiments, and an XML‑based answer engine orchestrates tasks and services for real‑world deployment.

AI · Chatbot · Multi-turn Dialogue
0 likes · 9 min read