DataFunTalk
Apr 3, 2025 · Artificial Intelligence
Large Language Models GPT-4.5 and LLaMa-3.1-405B Pass Standard Turing Test in UCSD Study
A UC San Diego study found that GPT-4.5 was judged to be human 73% of the time and LLaMa-3.1-405B 56% of the time, indicating that both large language models can pass a standard three-party Turing test. The article covers the methodology, the results, and an analysis of judge behavior.
AI evaluation · GPT-4.5 · LLaMa-3.1