Artificial Intelligence 22 min read

InternLM Model Fine-Tuning Tutorial with XTuner: Chat Format and Practical Implementation Guide

This tutorial walks through fine-tuning Shanghai AI Lab's open-source InternLM models with XTuner. It explains chat-format conventions, model loading and inference (including the multimodal InternLM-XComposer), dataset preparation, the configuration file sections, DeepSpeed acceleration, and memory-efficient QLoRA for 7B-parameter chat models.

OPPO Kernel Craftsman

This article is a tutorial on the InternLM large language model and on fine-tuning it with the XTuner framework. InternLM is developed by Shanghai AI Lab and SenseTime and ships with a complete open-source toolchain covering data (Shu Sheng Wan Juan), training (XTuner), deployment (LMDeploy), evaluation (OpenCompass), and application (Lagent).

The tutorial first explains the concept of the chat format (chat template), which defines how dialogue turns are serialized into a single token sequence and is therefore essential for training dialogue models. It covers OpenAI's ChatML format, with the roles system, user, and assistant and the special tokens <|im_start|> and <|im_end|>; Llama2's chat format; and InternLM's chat format, which uses the tokens <|System|>, <|User|>, <eoh>, <|Bot|>, and <eoa>.
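To make the two templates concrete, here is a minimal sketch that renders the same message list in ChatML and in the first-generation InternLM chat layout. The function names are hypothetical, and the exact separators (colons, newlines, and the spelling of the end-of-turn markers as <eoh> and <eoa>) are assumptions based on the commonly published templates; check the tokenizer of the model you actually use before relying on them.

```python
from typing import Dict, List

Message = Dict[str, str]  # {"role": "system" | "user" | "assistant", "content": "..."}


def render_chatml(messages: List[Message], add_generation_prompt: bool = True) -> str:
    """Serialize messages in ChatML: each turn is wrapped in <|im_start|> ... <|im_end|>."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def render_internlm(messages: List[Message], add_generation_prompt: bool = True) -> str:
    """Serialize messages in an InternLM-style layout (separators are assumed, see lead-in)."""
    prefix = {"system": "<|System|>:", "user": "<|User|>:", "assistant": "<|Bot|>:"}
    suffix = {"system": "\n", "user": "<eoh>\n", "assistant": "<eoa>\n"}
    parts = [prefix[m["role"]] + m["content"] + suffix[m["role"]] for m in messages]
    if add_generation_prompt:
        parts.append("<|Bot|>:")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"},
    ]
    print(render_chatml(demo))
    print(render_internlm(demo))
```

Both renderers take the same role-tagged message list, which is why a fine-tuning framework can switch templates without touching the dataset itself.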

The practical section demonstrates how to load InternLM models for inference using AutoTokenizer and AutoModelForCausalLM, including multi-turn dialogue examples. It also covers the InternLM multimodal model (InternLM-XComposer) for text and image generation.
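A loading and multi-turn loop along these lines might look like the sketch below. It assumes the checkpoint is published on the Hugging Face Hub as internlm/internlm-chat-7b, that a CUDA GPU is available, and that the model's remote code exposes a chat(tokenizer, query, history=...) method returning the reply and the updated history; load_internlm and multi_turn are hypothetical helper names, not part of any library.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # list of (user query, model reply) pairs


def load_internlm(model_name: str = "internlm/internlm-chat-7b"):
    """Load tokenizer and model; the hub id above is an assumption."""
    # Imported lazily so the rest of this sketch runs without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True is needed because the model class ships with the checkpoint.
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True, torch_dtype=torch.float16
    ).cuda()
    return tokenizer, model.eval()


def multi_turn(
    chat_fn: Callable[[str, History], Tuple[str, History]], queries: List[str]
) -> Tuple[List[str], History]:
    """Feed queries one by one, threading the history through every call."""
    history: History = []
    responses: List[str] = []
    for query in queries:
        response, history = chat_fn(query, history)
        responses.append(response)
    return responses, history


# Usage (requires a GPU and downloads the weights):
# tokenizer, model = load_internlm()
# chat_fn = lambda q, h: model.chat(tokenizer, q, history=h)
# responses, history = multi_turn(chat_fn, ["Hello", "What can you do?"])
```

Passing the history back in on every call is what makes the dialogue multi-turn: the model sees all earlier exchanges rendered through its chat template.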

The core of the tutorial is the XTuner fine-tuning workflow: (1) install XTuner and select a configuration file, (2) prepare a dataset (e.g., openassistant-guanaco) and modify the configuration accordingly, and (3) start training, optionally with DeepSpeed acceleration. The article provides detailed code examples and explains the data processing pipeline: dataset loading, dataset_map_fn for format conversion, template_map_fn for adding the chat template, encode_fn for tokenization, and the optional pack_to_max_length step for data packing.
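The four pipeline stages can be illustrated with a simplified sketch. These are not XTuner's implementations: the functions only borrow the stage names, the guanaco-style "### Human: ... ### Assistant: ..." text layout and the template strings are assumptions, and toy_tokenize is a hypothetical character-level stand-in for a real tokenizer.

```python
from typing import Callable, Dict, List

IGNORE_INDEX = -100  # label value conventionally skipped by the cross-entropy loss


def dataset_map_fn(example: Dict) -> Dict:
    """Format conversion: split a guanaco-style 'text' field into input/output turns."""
    turns = []
    for chunk in example["text"].split("### Human:")[1:]:
        human, _, assistant = chunk.partition("### Assistant:")
        turns.append({"input": human.strip(), "output": assistant.strip()})
    return {"conversation": turns}


def template_map_fn(
    example: Dict,
    instruction: str = "<|User|>:{input}<eoh>\n<|Bot|>:",  # assumed template
    suffix: str = "<eoa>\n",
) -> Dict:
    """Wrap every turn in the chat template."""
    return {
        "conversation": [
            {"input": instruction.format(input=t["input"]), "output": t["output"] + suffix}
            for t in example["conversation"]
        ]
    }


def encode_fn(example: Dict, tokenize: Callable[[str], List[int]]) -> Dict:
    """Tokenize; prompt tokens are masked so only the replies contribute to the loss."""
    input_ids: List[int] = []
    labels: List[int] = []
    for turn in example["conversation"]:
        prompt_ids = tokenize(turn["input"])
        answer_ids = tokenize(turn["output"])
        input_ids += prompt_ids + answer_ids
        labels += [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return {"input_ids": input_ids, "labels": labels}


def pack_to_max_length(samples: List[Dict], max_length: int) -> List[Dict]:
    """Concatenate all samples, then cut the stream into chunks of at most max_length."""
    flat_ids = [i for s in samples for i in s["input_ids"]]
    flat_labels = [i for s in samples for i in s["labels"]]
    return [
        {"input_ids": flat_ids[i : i + max_length], "labels": flat_labels[i : i + max_length]}
        for i in range(0, len(flat_ids), max_length)
    ]


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical tokenizer: one id per character."""
    return [ord(c) for c in text]


if __name__ == "__main__":
    raw = {"text": "### Human: Hi### Assistant: Hello"}
    encoded = encode_fn(template_map_fn(dataset_map_fn(raw)), toy_tokenize)
    print(len(encoded["input_ids"]), len(pack_to_max_length([encoded], 10)))
```

Packing trades a little sample-boundary purity for throughput: short conversations no longer waste most of a fixed-length batch slot on padding.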

Key technical details include: parameter counts (7.3B for internlm-chat-7b, 7.7B for internlm2-chat-7b), memory usage with QLoRA (approximately 3.5GB for model parameters, 14.6GB total with LoRA/gradients/optimizer), and the five configuration file sections (Settings, Model & Tokenizer, Dataset & Dataloader, Scheduler & Optimizer, Runtime).
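The parameter-memory figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes QLoRA stores the base weights at 4 bits per parameter and ignores the overhead of quantization constants; weight_memory_gb is a hypothetical helper, and the breakdown of the remaining memory (LoRA weights, gradients, optimizer state) is not reconstructed here because the summary does not give it.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts as quoted in the article.
PARAM_COUNTS = {
    "internlm-chat-7b": 7.3e9,
    "internlm2-chat-7b": 7.7e9,
}

if __name__ == "__main__":
    for name, n in PARAM_COUNTS.items():
        gb = weight_memory_gb(n, bits_per_param=4)
        gib = gb * 1e9 / 2**30  # same quantity in binary gibibytes
        print(f"{name}: {gb:.2f} GB ({gib:.2f} GiB) for 4-bit weights")
```

For internlm-chat-7b this gives 3.65 GB, or roughly 3.4 GiB, which is in line with the approximately 3.5GB quoted for model parameters.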

Model Deployment, Fine-tuning, QLoRA, large language model, DeepSpeed, LLM Training, HuggingFace, Chat Format, InternLM, XTuner
Written by

OPPO Kernel Craftsman

Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials
