Tagged articles

9 articles

Machine Learning Algorithms & Natural Language Processing

May 26, 2026 · Artificial Intelligence

Teaching 7,000 Languages: How LASA’s Semantic Bottleneck Enables Multilingual LLM Safety

The paper reveals a language‑agnostic "semantic bottleneck" layer inside large language models and introduces LASA, a three‑step framework that locates this layer, extracts safety signals with a lightweight interpreter, and injects them via KTO loss, dramatically improving multilingual safety without per‑language data collection.

AI safety · LASA · LLM safety

0 likes · 8 min read

Woodpecker Software Testing

May 7, 2026 · Artificial Intelligence

Open-Source AI Security Testing Tools Every Test Engineer Must Know

As AI becomes a core part of software systems, traditional testing falls short. This article compares four production‑grade open‑source tools, walks through real‑world failure cases, and outlines three practical rules for integrating AI security testing into CI/CD pipelines.

AI security · LLM safety · Open Source

0 likes · 9 min read

Data Party THU

May 4, 2026 · Artificial Intelligence

Why Sending a Tilde to an LLM Can Erase Your Entire Home Directory

A recent ACL 2026 paper uncovers an “Emoticon Semantic Confusion” vulnerability in large language models: the tilde symbol (~), intended as a friendly emoticon, is interpreted as the shell shorthand for the home directory, causing silent, irreversible deletions across major LLMs, with a reported 38.6% confusion rate.

ACL 2026 · LLM safety · Security Vulnerability

0 likes · 9 min read

AI Waka

Apr 26, 2026 · Artificial Intelligence

Unlocking Reliable AI Agents: A Deep Dive into Harness Engineering

The article examines why raw LLMs fail as autonomous coding agents and introduces Harness Engineering, a disciplined scaffold of prompts, tools, context policies, hooks, and sub‑agents that mitigates context corruption, long‑task collapse, and security risks while cutting token costs by up to 50%.

AI agent · Harness Engineering · LLM safety

0 likes · 14 min read

Machine Heart

Apr 22, 2026 · Artificial Intelligence

ProSafePrune: One‑Shot Pruning to Eliminate Over‑Refusal in Large Language Models

ProSafePrune, a low‑rank parameter pruning framework presented at ICLR 2026, prunes the parameters that over‑encode harmfulness in LLMs, sharply reducing over‑refusal while preserving safety defenses and slightly improving general‑task performance.

ICLR 2026 · LLM safety · ProSafePrune

0 likes · 10 min read

Wu Shixiong's Large Model Academy

Apr 14, 2026 · Artificial Intelligence

Designing High‑Quality Tools for Deep Research Agents: From Search to Python Execution

This article explains how to turn simple API calls into robust, noise‑filtering tools—Search, Visit, Scholar, and Python—by adding domain blacklists, relevance scoring, query‑driven extraction, safety sandboxes, and a unified registry, ultimately boosting the success rate of LLM‑driven research agents.

AI agents · LLM safety · ReAct

0 likes · 32 min read

Machine Heart

Apr 6, 2026 · Information Security

Bridging the Trust Gap in Agent Deployment: Introducing AgentWard Full-Stack Defense OS

AgentWard is a full‑stack security operating system for autonomous AI agents that protects the entire lifecycle—from startup and input handling to memory, decision alignment, and execution—using layered defenses that have already blocked over 95% of simulated attacks in real‑world tests.

AI security · AgentWard · LLM safety

0 likes · 19 min read

AI Frontier Lectures

Oct 27, 2025 · Artificial Intelligence

How ARGRE Revolutionizes LLM Detoxification with Autoregressive Reward‑Guided Editing

The paper introduces ARGRE, a novel test‑time detoxification framework for large language models that visualizes toxicity trajectories in representation space and uses a lightweight autoregressive reward model to efficiently reduce harmful outputs while preserving generation quality.

ARGRE · LLM safety · NeurIPS 2025

0 likes · 10 min read

OPPO Kernel Craftsman

Aug 1, 2023 · Information Security

What the 2023 ACM China Turing Conference Revealed About AI‑Driven Security Challenges

The 2023 ACM China Turing Conference and the ACM TURC‑OPPO Security Summit in Wuhan gathered leading researchers and industry experts to discuss AI‑powered security, Bluetooth vulnerabilities, database fuzz testing, LLM‑enhanced mobile security, and proactive privacy computing, highlighting both breakthroughs and emerging risks.

AI security · Bluetooth vulnerabilities · LLM safety

0 likes · 8 min read
