InternLM Model Fine-Tuning Tutorial with XTuner: Chat Format and Practical Implementation Guide
This tutorial walks through fine-tuning Shanghai AI Lab's open-source InternLM models with XTuner. It covers chat-format conventions, model loading and inference (including the multimodal InternLM-XComposer), dataset preparation, the sections of an XTuner configuration file, DeepSpeed acceleration, and the memory footprint of QLoRA for 7B-parameter chat models.
This article is a tutorial on the InternLM large language model and on fine-tuning it with the XTuner framework. InternLM is developed by Shanghai AI Lab and SenseTime and comes with a complete open-source ecosystem: data (the WanJuan corpus, "Shu Sheng Wan Juan"), training (XTuner), deployment (LMDeploy), evaluation (OpenCompass), and applications (Lagent).
The tutorial first explains the chat format (dialogue template), which defines how a conversation is serialized into the text a dialogue model is trained on and is therefore essential for training such models. It covers OpenAI's ChatML format, with the roles system, user, and assistant and the special tokens <|im_start|> and <|im_end|>; Llama 2's chat format; and InternLM's chat format, which uses markers such as <|System|>, <|User|>, <eoh>, <|Bot|>, and <eoa>.
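To make the two templates concrete, here is a minimal sketch that renders a conversation as a ChatML prompt and as an InternLM-style prompt. The function names are hypothetical, and the exact spelling and placement of the InternLM markers are an assumption based on the first-generation InternLM chat models; check the tokenizer's special tokens before relying on them.

```python
def render_chatml(messages):
    """Serialize [{'role': ..., 'content': ...}, ...] in ChatML style."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    # Leave the prompt open so the model writes the assistant turn.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Assumed marker spellings for first-generation InternLM chat models.
SYSTEM, USER, BOT = "<|System|>", "<|User|>", "<|Bot|>"
EOH, EOA = "<eoh>", "<eoa>"  # end of human turn / end of assistant turn


def render_internlm(history, query, system=""):
    """Serialize (user, bot) history pairs plus a new query, InternLM style."""
    prompt = f"{SYSTEM}:{system}\n" if system else ""
    for user_msg, bot_msg in history:
        prompt += f"{USER}:{user_msg}{EOH}\n{BOT}:{bot_msg}{EOA}\n"
    # The final turn ends at the bot marker so the model continues from there.
    prompt += f"{USER}:{query}{EOH}\n{BOT}:"
    return prompt


if __name__ == "__main__":
    print(render_chatml([{"role": "user", "content": "Hello"}]))
    print(render_internlm([("Hello", "Hi, how can I help?")], "Tell me a joke"))
```

In both cases the template only decides where role markers and end-of-turn markers go; the model learns to stop generating when it emits the end-of-turn marker.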
The practical section demonstrates how to load InternLM models for inference using AutoTokenizer and AutoModelForCausalLM, including multi-turn dialogue examples. It also covers the InternLM multimodal model (InternLM-XComposer) for text and image generation.
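A multi-turn loop of this kind might look like the sketch below. It assumes the `internlm/internlm-chat-7b` checkpoint from the Hugging Face Hub, a CUDA GPU, and the `chat(tokenizer, query, history=...)` helper that the model's remote code provides; `load_internlm_chat` and `run_dialogue` are hypothetical names introduced for illustration.

```python
def load_internlm_chat(model_name="internlm/internlm-chat-7b"):
    """Load tokenizer and model, and return a chat function (needs a GPU)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code is needed because the chat helper ships with the model.
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True, torch_dtype=torch.float16
    ).cuda().eval()

    def chat_fn(query, history):
        # Assumed to return (response, updated_history).
        return model.chat(tokenizer, query, history=history)

    return chat_fn


def run_dialogue(chat_fn, queries):
    """Feed queries one by one, carrying the history between turns."""
    history, replies = [], []
    for query in queries:
        response, history = chat_fn(query, history)
        replies.append(response)
    return replies, history


if __name__ == "__main__":
    chat = load_internlm_chat()
    replies, _ = run_dialogue(chat, ["Hello", "Please give me three tips on time management"])
    for reply in replies:
        print(reply)
```

Passing the returned history back into the next call is what makes the dialogue multi-turn: each new prompt is rendered from all previous turns plus the new query.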
The core of the tutorial is the XTuner fine-tuning workflow: (1) installing XTuner and selecting a configuration file, (2) preparing a dataset (e.g., openassistant-guanaco) and modifying the configuration, and (3) starting training, optionally with DeepSpeed acceleration. The article provides detailed code examples and explains the data processing pipeline step by step: loading the dataset, dataset_map_fn for converting records to a common format, template_map_fn for applying the chat template, encode_fn for tokenization, and the optional pack_to_max_length for packing several samples into one sequence.
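The stages of that pipeline can be sketched in plain Python. These are simplified stand-ins that share the stage names, not XTuner's actual implementations: the guanaco-style record layout ("### Human: ... ### Assistant: ..."), the template string, the character-level `toy_tokenize`, and the use of -100 as the ignored label value are all assumptions made for illustration.

```python
IGNORE_INDEX = -100  # common convention: loss is not computed on these labels


def toy_tokenize(text):
    """Hypothetical tokenizer: one token id per character."""
    return [ord(ch) for ch in text]


def dataset_map_fn(example):
    """Split a guanaco-style 'text' field into input/output turns."""
    turns = []
    for chunk in example["text"].split("### Human:")[1:]:
        human, _, assistant = chunk.partition("### Assistant:")
        turns.append({"input": human.strip(), "output": assistant.strip()})
    return {"conversation": turns}


def template_map_fn(example, template="<|User|>:{input}<eoh>\n<|Bot|>:"):
    """Wrap every user input in the chat template."""
    turns = [
        {"input": template.format(input=t["input"]), "output": t["output"]}
        for t in example["conversation"]
    ]
    return {"conversation": turns}


def encode_fn(example, tokenize, max_length):
    """Tokenize turns; only the answer tokens contribute to the loss."""
    input_ids, labels = [], []
    for turn in example["conversation"]:
        prompt_ids = tokenize(turn["input"])
        answer_ids = tokenize(turn["output"])
        input_ids += prompt_ids + answer_ids
        labels += [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return {"input_ids": input_ids[:max_length], "labels": labels[:max_length]}


def pack_to_max_length(samples, max_length):
    """Concatenate samples and cut them into chunks of at most max_length."""
    ids = [t for s in samples for t in s["input_ids"]]
    labels = [t for s in samples for t in s["labels"]]
    return [
        {"input_ids": ids[i:i + max_length], "labels": labels[i:i + max_length]}
        for i in range(0, len(ids), max_length)
    ]


if __name__ == "__main__":
    record = {"text": "### Human: Hi### Assistant: Hello"}
    sample = encode_fn(template_map_fn(dataset_map_fn(record)), toy_tokenize, 64)
    print(len(sample["input_ids"]), len(pack_to_max_length([sample, sample], 16)))
```

Packing trades a little bookkeeping for throughput: short samples no longer waste most of a padded sequence, so each training step processes more useful tokens.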
Key technical details include: parameter counts (7.3B for internlm-chat-7b, 7.7B for internlm2-chat-7b), memory usage with QLoRA (approximately 3.5 GB for the 4-bit quantized model parameters, 14.6 GB in total once LoRA weights, gradients, and optimizer state are included), and the five sections of a configuration file (Settings, Model & Tokenizer, Dataset & Dataloader, Scheduler & Optimizer, Runtime).
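The 3.5 GB figure follows from simple arithmetic, sketched below under the assumptions of a round 7 billion parameters, 4 bits per parameter for the quantized base model, and 1 GB = 10^9 bytes. The helper name is hypothetical, and the 14.6 GB total is taken from the article rather than derived.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(7e9, 16)  # 14.0 GB for half-precision weights
int4_gb = weight_memory_gb(7e9, 4)   # 3.5 GB for 4-bit quantized weights

# Whatever remains of the reported total is taken up by everything else
# (LoRA weights, gradients, optimizer state, as listed in the article).
reported_total_gb = 14.6
other_gb = round(reported_total_gb - int4_gb, 1)

if __name__ == "__main__":
    print(f"fp16 weights: {fp16_gb} GB, 4-bit weights: {int4_gb} GB")
    print(f"remaining share of the reported total: {other_gb} GB")
```

With the exact parameter count of 7.3B the 4-bit weights come out slightly higher, which is why the article's figure is only approximate.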
OPPO Kernel Craftsman