Tag

SWE-bench


DataFunTalk
Jun 17, 2025 · Artificial Intelligence

Kimi-Dev-72B Sets New Open‑Source SOTA on SWE‑bench Verified (60.4% Score)

Kimi-Dev-72B, an open-source 72-billion-parameter code model from Moonshot AI, scored 60.4% on the SWE-bench Verified benchmark, a new state of the art among open-source models that surpasses larger models. It combines BugFixer and TestWriter dual roles, extensive mid-stage training on billions-scale GitHub data, and reinforcement-learning-driven self-play. The code is available on Hugging Face and GitHub.

AI · Open Source · SWE-bench
Continuous Delivery 2.0
Jul 3, 2024 · Artificial Intelligence

Applying Large Language Models to Software Engineering: Challenges, Cross‑File Editing Issues, Bug‑Fixing Evaluation, and SWE‑Bench Results

This article examines the practical challenges of using large language models in software development, including long-context handling, cross‑file editing, and bug‑fixing evaluation methods. It also presents benchmark results from SWE‑Bench and its Lite subset to assess model capabilities.

Cross-File Editing · LLM · SWE-bench