Open‑Source 35B Intern‑S2‑Preview Rivals Trillion‑Parameter Models on Scientific Benchmarks

The open‑source 35‑billion‑parameter Intern‑S2‑Preview model reaches scientific‑task performance comparable to trillion‑parameter models through full‑link “general‑specialized” training, reinforcement‑learning scaling, and hardware‑aware optimizations. It also outperforms leading closed‑source models on benchmarks such as MolecularIQ and crystal‑structure generation.

Machine Learning Algorithms & Natural Language Processing

May 28, 2026

On May 15, Shanghai AI Laboratory released Intern‑S2‑Preview, a 35‑billion‑parameter open‑source large language model designed to extend the capability frontier of “deeply specialized general models” while lowering the barrier to using them. The release highlights three advances: a smaller model that matches trillion‑parameter models in core scientific domains; stronger scientific abilities, including crystal‑structure generation for materials; and a scientific agent that surpasses closed‑source models such as Claude‑Haiku‑4.5 and GPT‑5.4‑Nano on both general and scientific tasks.

The team attributes these gains to a scaling strategy that goes beyond merely increasing parameters or data. By raising task difficulty and diversity, they achieve a scaling effect that lifts model capability. Intern‑S2‑Preview adopts a “full‑link training” paradigm: each specialized scientific task is equipped with high‑quality data and training strategies from pre‑training through post‑training, and all tasks are jointly trained on a stable, efficient infrastructure. When many high‑difficulty, diverse tasks are fused, the small model reaches performance levels comparable to trillion‑parameter models, while avoiding the “one‑task‑wins‑others‑lose” trade‑off through synergistic task interaction.
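The article does not say how the specialized tasks are mixed during joint training. As a minimal sketch of one common approach (an assumption, not the team's method), temperature‑scaled sampling draws each batch from a task with probability proportional to its dataset size raised to a power below 1, so small, hard tasks are not drowned out by large ones. The task names and sizes below are hypothetical.

```python
import random


def mixture_weights(task_sizes, temperature=0.5):
    """Return sampling weights p_i proportional to n_i ** temperature.

    temperature=1.0 samples in proportion to dataset size;
    temperature=0.0 samples every task uniformly.
    """
    scaled = {name: size ** temperature for name, size in task_sizes.items()}
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


def sample_task_schedule(task_sizes, num_batches, temperature=0.5, seed=0):
    """Pick which task each training batch is drawn from."""
    rng = random.Random(seed)
    weights = mixture_weights(task_sizes, temperature)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=num_batches)


if __name__ == "__main__":
    # Hypothetical task sizes (number of training examples), for illustration only.
    sizes = {"general_text": 900, "crystal_generation": 100}
    print(mixture_weights(sizes, temperature=1.0))
    print(mixture_weights(sizes, temperature=0.5))
    print(sample_task_schedule(sizes, num_batches=10))
```

Lowering the temperature shifts sampling toward the smaller task, which is one simple way to keep a specialized task from being under‑trained in a joint run.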

Reinforcement‑learning enhancements further accelerate this “general‑specialized fusion.” The model is guided to use chain‑of‑thought reasoning for tasks such as bio‑omics understanding, and the team credits the generalization advantage of chain‑of‑thought for letting a 35B model rival trillion‑parameter performance. More RL training steps, combined with graduate‑level reasoning problems, enable the model to acquire cross‑domain inference abilities. Guided by the IQPT (Intelligence Quality per Token) concept, the team explored chain‑folding algorithms that shorten the reasoning chain while preserving its effectiveness; on math reasoning tasks, Intern‑S2‑Preview matches the performance of a recent ~300B‑parameter model.
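The article names IQPT but gives no formula. As an illustrative assumption only, the sketch below treats it as accuracy divided by mean output length in thousands of tokens. That is enough to show why shortening a reasoning chain without losing accuracy raises a quality‑per‑token score. `EvalRecord` and `quality_per_kilotoken` are hypothetical names, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    correct: bool        # whether the final answer was right
    output_tokens: int   # tokens generated, including the reasoning chain


def quality_per_kilotoken(records):
    """Accuracy divided by mean output length in thousands of tokens.

    This is an assumed stand-in for IQPT, not the team's definition.
    """
    if not records:
        raise ValueError("records must not be empty")
    accuracy = sum(r.correct for r in records) / len(records)
    mean_tokens = sum(r.output_tokens for r in records) / len(records)
    if mean_tokens <= 0:
        raise ValueError("mean output length must be positive")
    return accuracy / (mean_tokens / 1000.0)


if __name__ == "__main__":
    # Same accuracy (3 of 4 correct), but the second run uses chains half as long.
    outcomes = [True, True, False, True]
    long_chain = [EvalRecord(c, 4000) for c in outcomes]
    folded_chain = [EvalRecord(c, 2000) for c in outcomes]
    print(quality_per_kilotoken(long_chain))
    print(quality_per_kilotoken(folded_chain))
```

Under this assumed definition, halving chain length at equal accuracy doubles the score, which matches the stated goal of compressing chains while preserving effectiveness.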

The reported benchmark results support these claims. On the MolecularIQ evaluation set, which tests spatial modeling and topological understanding of molecular structures, Intern‑S2‑Preview scores 57.26 against Gemini‑3.1‑Pro’s 41.33, a margin of 15.93 points. In material crystal‑structure generation, the model achieves a success rate above 40%, compared with roughly 10% for GPT‑5.5, a significant improvement in both quality and usability. Notably, these results are obtained without relying on diffusion models, which the team presents as evidence of efficient, high‑precision coordinate regression.

Scientific agent capabilities are also upgraded. By incorporating systematic task synthesis and high‑quality agent training data drawn from open‑source skill repositories and real tool ecosystems, Intern‑S2‑Preview improves step‑decomposition, tool‑calling, and autonomous execution. The model shows robust performance on PinchBench and SciCode benchmarks, excelling in multi‑step decision‑making, state tracking, and scientific code generation, placing it among the top models of its scale.

Hardware‑software co‑design further boosts training and inference efficiency. On Ascend A3 super‑nodes, the training framework introduces multiple host‑memory and device‑memory optimizations that stabilize long‑sequence multimodal training. Data‑chunk planning reduces host‑device communication, while shared‑weight inference narrows the gap between training and deployment. The vision and language modules are balanced by simulating their compute ratios offline, which leads to more even resource allocation and higher overall throughput.
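The article does not describe how the offline simulation works. As a rough sketch under stated assumptions, the idea can be shown as a search over device splits: given an estimated per‑step cost for the vision module and the language module, and assuming each module's time scales inversely with the devices assigned to it, pick the split that minimizes the slower of the two. The function name, the cost figures, and the linear‑scaling assumption are all hypothetical.

```python
def best_device_split(vision_cost, language_cost, num_devices):
    """Return (vision_devices, language_devices, step_time) for the best split.

    Costs are estimated single-device times per training step. The step time
    is the slower of the two modules, assuming ideal linear scaling.
    """
    if num_devices < 2:
        raise ValueError("need at least one device per module")
    best = None
    for vision_devices in range(1, num_devices):
        language_devices = num_devices - vision_devices
        step_time = max(vision_cost / vision_devices,
                        language_cost / language_devices)
        if best is None or step_time < best[2]:
            best = (vision_devices, language_devices, step_time)
    return best


if __name__ == "__main__":
    # Hypothetical costs: the language module is three times as expensive.
    print(best_device_split(vision_cost=2.0, language_cost=6.0, num_devices=8))
```

A real planner would have to account for communication, memory limits, and non‑linear scaling. The point of the sketch is only that the balance can be chosen before training from measured or simulated costs.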

Beyond the model itself, Shanghai AI Laboratory continues to expand its open‑source ecosystem, including the XTuner training framework, the LMDeploy inference framework, the OpenCompass evaluation suite, and the MinerU document‑analysis engine. Since the first Intern (Shusheng) model launched in 2023, the Intern‑S series has topped Hugging Face multimodal leaderboards and amassed over one million downloads, lowering the barrier for global research teams to engage in AI‑for‑Science.
