Can We Trust AI? Inside GPT‑5.2‑Codex’s Monitorability Breakthrough
OpenAI’s new GPT‑5.2‑Codex model achieves state‑of‑the‑art performance on SWE‑Bench Pro and Terminal‑Bench 2.0, and a 90‑page technical report introduces the concept of monitorability, defining metrics, benchmark suites, and key findings about chain‑of‑thought length, RL training, and model size.
