MOSS 003: Open‑Source Large Language Model Development, Training Data, and Plugin‑Enabled Deployment
The article details the evolution of the open‑source MOSS series—from OpenChat 001 to MOSS 003—covering data collection, fine‑tuning procedures, multilingual capabilities, plugin architecture, example code for inference, and upcoming releases, providing a comprehensive technical overview for AI practitioners.
The post introduces the MOSS family of open‑source large language models, starting with the early internal prototype OpenChat 001, which was built by expanding ~400k dialogue pairs using self‑instruction techniques and fine‑tuned on a 16B CodeGen base.
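The self-instruction expansion can be sketched as a loop that few-shot-prompts a model with existing instructions and asks it to continue with a new one. This is a minimal illustration only, not the team's actual pipeline: the seed list, prompt wording, and the `generate` callback (standing in for a call to the base model) are all assumptions.

```python
import random

# Hypothetical seed tasks; the real pipeline started from human-written seeds.
SEED_INSTRUCTIONS = [
    "Summarize the following paragraph in one sentence.",
    "Translate this sentence into French.",
    "Write a Python function that reverses a string.",
]

def build_selfinstruct_prompt(seeds, n_examples=2):
    """Assemble a few-shot prompt asking the model to invent a new instruction."""
    shots = random.sample(seeds, k=n_examples)
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(shots))
    # The trailing number invites the model to continue with a fresh instruction.
    return f"Here are some task instructions:\n{numbered}\n{n_examples + 1}."

def expand_pool(seeds, generate, rounds=10):
    """Grow the instruction pool by repeatedly sampling the generator."""
    pool = list(seeds)
    for _ in range(rounds):
        candidate = generate(build_selfinstruct_prompt(pool)).strip()
        if candidate and candidate not in pool:  # crude exact-match dedup
            pool.append(candidate)
    return pool
```

In the real pipeline each accepted instruction would also be answered by the model, yielding the instruction-response pairs used for fine-tuning; filtering was presumably stricter than the exact-match dedup shown here.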
OpenChat 001 already demonstrated instruction‑following, multi‑turn dialogue, and surprising cross‑language alignment despite being trained on almost no Chinese data.
Following OpenChat 001, the team released MOSS 002, adding ~30B Chinese tokens and over 1.16M bilingual helpfulness, honesty, and harmlessness dialogues (available on HuggingFace). Engineering work on inference acceleration, model deployment, and front‑end/back‑end integration was also completed, and a closed beta began on February 21.
MOSS 003 further scales pre‑training to 100B Chinese tokens (total 700B tokens, including ~300B code) and incorporates ~1.1M real‑world user dialogues plus ~300k plugin‑enhanced conversations covering search, image generation, calculators, and equation solving. A small subset of this data is publicly released.
The model suite uploaded to HuggingFace includes:
moss-moon-003-base – the base language model with extensive Chinese knowledge.
moss-moon-003-sft – a dialogue‑fine‑tuned model with initial helpfulness, honesty, and harmlessness.
moss-moon-003-sft-plugin – a plugin‑enhanced version capable of invoking at least four external tools.
Interaction with MOSS can be done in a few Python lines:
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code is required because MOSS ships custom model code alongside its weights.
tokenizer = AutoTokenizer.from_pretrained("fnlp/moss-moon-003-sft", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("fnlp/moss-moon-003-sft", trust_remote_code=True).half()  # fp16 weights; move to GPU with .cuda() if available
model.eval()
meta_instruction = "You are an AI assistant whose name is MOSS. ..."
query = meta_instruction + "<|Human|>: 你好\n<|MOSS|>:"  # "你好" means "Hello"
inputs = tokenizer(query, return_tensors="pt")
outputs = model.generate(**inputs, do_sample=True, temperature=0.7, top_p=0.8, repetition_penalty=1.1, max_new_tokens=128)
response = tokenizer.decode(outputs[0])
print(response[len(query)+2:])
For plugin calls, MOSS first generates <|Inner Thoughts|> and <|Commands|>, executes the indicated API, inserts the result into <|Results|>, and then performs a second inference pass to produce the final <|MOSS|> reply. The web UI surfaces these inner thoughts via a small light-bulb icon.
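The two-pass plugin protocol can be sketched with a mocked model. This is a schematic sketch, not MOSS's actual implementation: the `TOOLS` table, the command regex, and the `model_generate` callback are all assumptions standing in for the real server-side plugin dispatch.

```python
import re

# Hypothetical toolbox; the real MOSS plugins (search, image generation,
# calculator, equation solving) run server-side.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_plugin_turn(model_generate, query):
    """Two-pass plugin protocol: pass 1 emits <|Inner Thoughts|> and <|Commands|>,
    we execute the command, splice its output into <|Results|>, and pass 2
    produces the final <|MOSS|> reply."""
    # Pass 1: the model decides whether a tool is needed.
    first = model_generate(query)
    m = re.search(r'<\|Commands\|>:\s*(\w+)\("(.+?)"\)', first)
    if m is None:
        return first  # no tool call; reply directly
    tool, arg = m.group(1), m.group(2)
    result = TOOLS[tool](arg)
    # Pass 2: feed the tool output back so the model can answer with it.
    second_input = f"{query}{first}\n<|Results|>: {result}\n<|MOSS|>:"
    return model_generate(second_input)
```

A stub generator makes the control flow visible: the first call returns a command string, the second call (which sees <|Results|> in its input) returns the final answer.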
Future work includes releasing quantized Int‑4/8 models, expanding the full fine‑tuning dataset, and improving plugin reliability. The team also open‑sources front‑end and back‑end code repositories for community experimentation.