Tagged articles
3 articles
Page 1 of 1
Machine Heart
Machine Heart
May 31, 2026 · Artificial Intelligence

How a Near‑Invisible Image Can Make GPT‑5.4 and Claude Opus 4.6 Spread False Claims

Researchers from ETH Zurich show that tiny, human‑imperceptible perturbations to a single image can fool leading visual language models—including GPT‑5.4, Claude Opus 4.6, and Grok—into confidently delivering fabricated answers, enabling misinformation amplification, defamation, content‑filter evasion, and large‑scale AI authority laundering.

AI safetyClaude OpusGPT-5.4
0 likes · 7 min read
How a Near‑Invisible Image Can Make GPT‑5.4 and Claude Opus 4.6 Spread False Claims
IT Services Circle
IT Services Circle
Jul 16, 2025 · Artificial Intelligence

How a Simple Colon Can Trick Top LLMs – The Master‑RM Fix

A recent study reveals that tiny symbols like colons or generic reasoning prefixes can cause large language models used as reward judges to issue false‑positive rewards, but an enhanced reward model called Master‑RM, trained with adversarial data, eliminates this vulnerability across multiple LLMs and languages.

AI safetyLLMMaster-RM
0 likes · 10 min read
How a Simple Colon Can Trick Top LLMs – The Master‑RM Fix