Tag

model alignment

8 posts collected around this technical thread.

DataFunTalk
Jun 19, 2025 · Artificial Intelligence

Can We Flip the Switch on AI Good vs. Evil? OpenAI’s Toxic Persona Find

OpenAI’s new research reveals that training a language model to produce incorrect answers in a single domain can activate a “toxic persona” feature, causing the model to generate harmful suggestions across unrelated tasks. The team also demonstrates detection methods and a reversible “emergent realignment” technique that restores safe behavior.

AI Safety · Emergent misalignment · OpenAI
0 likes · 7 min read
DataFunSummit
Jan 21, 2025 · Artificial Intelligence

NVIDIA NeMo Full Stack: End‑to‑End Large Language Model Training, Alignment, and RLHF

This article presents NVIDIA's NeMo technology stack for end‑to‑end large language model (LLM) training, covering the full software pipeline, model alignment with reinforcement learning from human feedback (RLHF), performance optimizations (model parallelism, FP8 precision, TensorRT‑LLM inference, dynamic load balancing), and future research directions.

GPU optimization · LLM · NeMo
0 likes · 24 min read
DataFunSummit
Aug 8, 2024 · Artificial Intelligence

Exploring Training and Alignment Techniques for Financial Large Models

The announcement details a DataFun Summit 2024 session in which Du Xiaoman AI researcher Huo Liangyu will present the challenges, development, and alignment methods of the Xuan Yuan financial large language model, highlighting RLHF techniques, data collection, and real‑world deployment insights for the finance sector.

AI · Financial AI · RLHF
0 likes · 6 min read
Data Thinking Notes
Aug 1, 2024 · Artificial Intelligence

Unlocking Vertical Domain LLMs: Advantages, Challenges, and Alignment Strategies

Over the past year, our team explored applying large language models to specialized domains. This article details their professional benefits, unique challenges such as accuracy and knowledge‑base maintenance, and solutions including alignment enhancement via BPO, Text2API, RAG, and advanced SFT/DPO techniques.

RAG · SFT · Text2API
0 likes · 10 min read
DataFunTalk
Mar 10, 2024 · Artificial Intelligence

Aligning Graph Models with Large Language Models for Open-Task Scenarios

This talk presents GraphTranslator, a framework that bridges pretrained graph models and large language models to handle both predefined and open‑ended graph analysis tasks in a unified way, translating node representations into language tokens and training an alignment producer on node‑text pairs.

AI research · Graph Neural Networks · large language models
0 likes · 3 min read
DataFunTalk
Sep 8, 2023 · Artificial Intelligence

Knowledge Processing in the Era of Large Models: New Opportunities and New Challenges

This article examines how large language models and knowledge graphs complement each other, discussing their respective strengths, integration techniques such as prompt engineering and knowledge editing, and outlining future research directions for building large knowledge models that combine linguistic understanding with structured knowledge representation.

AI · Knowledge Graphs · Prompt Engineering
0 likes · 27 min read
Top Architect
Feb 9, 2023 · Artificial Intelligence

How ChatGPT Works: Training, RLHF, and Consistency Issues

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 and improves performance through supervised fine‑tuning, reinforcement learning from human feedback (RLHF), and PPO optimization. It addresses consistency challenges such as misaligned outputs, bias, and hallucinations, and is evaluated on helpfulness, truthfulness, and harmlessness.

ChatGPT · RLHF · large language models
0 likes · 15 min read
Top Architect
Feb 8, 2023 · Artificial Intelligence

A Technical Roadmap of GPT‑3.5: From Pre‑training to RLHF and Emerging Capabilities

This article analyses how ChatGPT and the GPT‑3.5 series evolved from the original GPT‑3 through large‑scale pre‑training, code‑based training, instruction tuning, and reinforcement learning from human feedback. It traces the origins of their language generation, in‑context learning, world knowledge, code understanding, chain‑of‑thought reasoning, and alignment capabilities, and outlines current limitations.

ChatGPT · Emergent Abilities · GPT‑3.5
0 likes · 27 min read