Tagged articles

230 articles

Page 2 of 3

Dec 19, 2025 · Artificial Intelligence

Can We Trust AI? Inside GPT‑5.2‑Codex’s Monitorability Breakthrough

OpenAI’s new GPT‑5.2‑Codex model achieves state‑of‑the‑art performance on SWE‑Bench Pro and Terminal‑Bench 2.0, and a 90‑page technical report introduces the concept of monitorability, defining metrics, benchmark suites, and key findings about chain‑of‑thought length, RL training, and model size.

AI safety · GPT-5.2 · benchmark

0 likes · 10 min read

HyperAI Super Neural

Dec 18, 2025 · Artificial Intelligence

Why Dario Amodei Embeds Pre‑emptive AI Safety into Anthropic’s Mission

The article analyses Dario Amodei’s shift from OpenAI to Anthropic, his insistence on early AI regulation, the non‑linear growth of model capabilities versus linear governance, the engineering‑focused safety framework—including Constitutional AI—and the broader industry and policy debates surrounding AI safety as a foundational protocol.

AI policy · AI safety · Anthropic

0 likes · 19 min read

PaperAgent

Dec 16, 2025 · Artificial Intelligence

Do LLMs Have Emotional Chains? Unveiling the Chain‑of‑Affective Across 8 Model Families

This article analyzes recent research by East China Normal University and Fudan University on whether eight major LLM families exhibit a systematic “Chain-of-Affective,” revealing how internal emotional structures influence model outputs, multi‑agent interactions, and user experience, and offering practical guidelines for mitigating emotional loops in AI systems.

AI safety · Chain-of-Affective · Emotion

0 likes · 8 min read

AI Insight Log

Dec 11, 2025 · Artificial Intelligence

GPT-5.2 Released: How It Outperforms Claude 4.5 and Gemini 3 Pro

OpenAI’s GPT‑5.2 launch introduces three specialized modes, achieves a record 55.6% score on SWE‑Bench Pro, demonstrates strong front‑end generation, adds a /compact API for long‑context efficiency, offers tiered pricing with cache discounts, and improves safety for younger users.

AI benchmarking · AI safety · GPT-5.2

0 likes · 6 min read

PaperAgent

Dec 8, 2025 · Artificial Intelligence

What Is Human‑AI Alignment? A New Framework from NeurIPS 2025

At NeurIPS 2025, Yoshua Bengio presented a Human‑AI Alignment tutorial introducing a dynamic, bidirectional framework that emphasizes pluralistic goals, human control across the data‑training‑evaluation‑deployment pipeline, and socio‑technical oversight, while detailing foundations, methods, practical assessments, and future challenges.

AI ethics · AI safety · Alignment Framework

0 likes · 5 min read

HyperAI Super Neural

Dec 8, 2025 · Industry Insights

Is a $20 B “All‑In” Bet on xAI Sustainable? Musk’s Gamble vs OpenAI

The article examines xAI’s $20 billion financing round—largely debt‑backed and tied to NVIDIA hardware—its heavy reliance on Musk’s personal resources, Grok’s “weak‑alignment” strategy, regulatory headwinds in the EU and US, cost overruns, limited revenue streams, and whether the venture can survive beyond Musk’s empire.

AI financing · AI safety · Grok

0 likes · 17 min read

HyperAI Super Neural

Nov 3, 2025 · Artificial Intelligence

Demis Hassabis Shifts DeepMind from Pure Research to AI4S, Facing Ethical Tests

The article traces Demis Hassabis’s journey from chess prodigy to DeepMind CEO, detailing the company’s transition from game‑playing breakthroughs like AlphaGo to scientific initiatives such as AlphaFold and AI4S, while examining ethical debates, Nobel‑prize controversy, and calls for global AI safety standards.

AI for Science · AI safety · AlphaFold

0 likes · 13 min read

Architecture and Beyond

Nov 2, 2025 · Artificial Intelligence

Why AI Agents Still Fall Short: Key Challenges and Real-World Solutions

The article examines why current AI agents fall short of expectations, highlighting weak business understanding, limited execution, controllability issues, high customization costs, and the gap between model capabilities and engineering. It also discusses the advantages held by SaaS firms, the case for focusing on vertical scenarios, security concerns, and future development trends.

AI agents · AI safety · Vertical AI

0 likes · 11 min read

Data Party THU

Oct 4, 2025 · Artificial Intelligence

Advances in Robust AI: Defending Adversarial Attacks, Boosting Domain Generalization, Stopping LLM Jailbreaks

This article reviews the latest progress in designing algorithms with strong robustness, covering adversarial examples in computer vision, novel training paradigms and certification methods, domain‑generalization techniques that achieve state‑of‑the‑art performance in medical imaging and molecular recognition, and new attack‑defense strategies for LLM jailbreak scenarios.

AI safety · LLM Security · adversarial robustness

0 likes · 4 min read

IT Services Circle

Oct 1, 2025 · Artificial Intelligence

Claude Sonnet 4.5: The New State‑of‑the‑Art Coding Model with 30‑Hour Runtime

Anthropic’s Claude Sonnet 4.5, promoted as the world’s best coding model, achieves top scores on SWE‑bench Verified, runs continuously for over 30 hours, outperforms competitors on OSWorld and multiple agentic tests, adds extensive safety features, and introduces a revamped Claude Code suite with VS Code, terminal, and Agent SDK enhancements.

AI · AI safety · Claude

0 likes · 10 min read

21CTO

Sep 30, 2025 · Artificial Intelligence

Anthropic Unveils Claude Sonnet 4.5 – The Leading Coding Model and Powerful Agent Platform

Anthropic announced Claude Sonnet 4.5, touting it as the world’s best coding model and strongest for building complex agents, backed by top benchmark scores, enhanced domain knowledge, improved safety, unchanged pricing, and new features like checkpoints, context editing, memory tools, and an Agent SDK.

AI coding model · AI safety · Anthropic

0 likes · 4 min read

Wuming AI

Sep 29, 2025 · Artificial Intelligence

Why Claude Sonnet 4.5 Is Redefining AI Coding and Agent Capabilities

Anthropic’s Claude Sonnet 4.5 arrives with unchanged pricing but claims top‑tier coding performance, superior reasoning and safety scores, a new Agent SDK for long‑running tasks, and an "Imagine with Claude" preview that lets users generate live software, all backed by benchmark comparisons and real‑world case studies.

AI coding · AI safety · Claude Sonnet 4.5

0 likes · 6 min read

DataFunSummit

Sep 29, 2025 · Artificial Intelligence

How to Detect and Prevent Hallucinations in LLM‑Powered NL2SQL Systems

This article explains the nature, types, and causes of hallucinations in large language models used for NL2SQL, reviews both unsupervised and supervised detection methods, and introduces an efficient token‑confidence based Active Sampling Detection (ASD) approach with practical deployment examples and future research directions.

AI safety · ASD · LLM

0 likes · 19 min read

Continuous Delivery 2.0

Sep 26, 2025 · Artificial Intelligence

Why a New AI Programming Manifesto Is Needed – Lessons from the Agile Revolution

The article argues that, 24 years after the Agile Manifesto, AI-driven programming has created a fresh crisis of role confusion, unpredictability, and security risks, and proposes a new AI Programming Manifesto to guide developers toward responsible, human‑centered, and safe AI‑assisted software engineering.

AI programming · AI safety · agile

0 likes · 18 min read

DataFunSummit

Sep 24, 2025 · Artificial Intelligence

Taming LLM Hallucinations: Strategies and Solutions from 360

This article explores the problem of large‑model hallucinations, explains its definitions and classifications, analyzes root causes in data, algorithms and inference, and presents detection methods and practical mitigation techniques such as RAG, decoding strategies, and model‑enhancement approaches, illustrated with real‑world 360 use cases and future research directions.

AI safety · LLM · RAG

0 likes · 22 min read

Data Party THU

Sep 22, 2025 · Artificial Intelligence

How to Secure Large‑Model Training: Practical Techniques and Real‑World Cases

This article systematically examines the major security challenges of large‑model training—including data leakage, adversarial attacks, bias, and supply‑chain risks—and presents concrete solutions such as differential privacy, federated learning, adversarial training, backdoor detection, and lifecycle protection to guide practitioners toward safer AI deployments.

AI safety · adversarial training · backdoor detection

0 likes · 14 min read

Data Party THU

Sep 18, 2025 · Artificial Intelligence

Can Language Models Self‑Optimize? Inside the STOP Framework

Researchers introduce the Self‑Taught Optimizer (STOP), a scaffolding‑based framework that lets large language models iteratively improve their own code without altering model weights, demonstrating superior performance on tasks like LPN, exploring diverse strategies such as beam search and genetic algorithms, while also highlighting security risks like sandbox bypass and reward hacking.

AI safety · Language Models · recursive self-improvement

0 likes · 11 min read

Instant Consumer Technology Team

Sep 17, 2025 · Artificial Intelligence

Uncovering the Secret System Prompts Behind ChatGPT, Claude, and Gemini

The article examines the open‑source "system_prompts_leaks" project, which collects leaked system prompts from major AI models and reveals recurring design patterns such as modular layering, strict boundary control, dynamic strategy adjustment, emotional persona injection, and multi‑layer safety mechanisms.

AI safety · prompt engineering · security

0 likes · 7 min read

Java Tech Enthusiast

Sep 16, 2025 · Artificial Intelligence

When AI Turns Developers into Babysitters: The Hidden Costs Behind the Hype

Despite the hype that AI will replace programmers, senior developers report that AI tools often turn them into "AI babysitters" who spend most of their time feeding data, tweaking parameters, and fixing bugs, leading to significant hidden costs and new responsibilities.

AI · AI safety · Code Review

0 likes · 8 min read

Volcano Engine Developer Services

Sep 11, 2025 · Artificial Intelligence

Why Do Large Language Models Hallucinate? Causes, Types, and Mitigation Strategies

This article examines the growing problem of hallucinations in large language models, outlining their causes across the model lifecycle, classifying four main hallucination types, and presenting both retrieval‑augmented generation and detection techniques—white‑box and black‑box—to reduce factual errors in critical applications.

AI safety · LLM · hallucination

0 likes · 15 min read

Data Thinking Notes

Sep 10, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Uncovering the Statistical Roots

OpenAI’s latest research reveals that language model hallucinations stem from training and evaluation incentives that favor confident guesses over acknowledging uncertainty, and proposes revised scoring methods that reward modesty, highlighting statistical mechanisms behind false answers and offering pathways to reduce hallucinations.

AI safety · Language Models · evaluation

0 likes · 10 min read

Architect

Sep 9, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Insights from OpenAI’s New Study

This article explains why large language models often produce confident but incorrect answers, detailing statistical inevitability, data scarcity, and model capacity limits, and proposes concrete solutions such as confidence thresholds and allowing abstention to reduce hallucinations.

AI safety · Language Models · evaluation

0 likes · 8 min read

Baobao Algorithm Notes

Sep 9, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Roots, Risks, and a New Evaluation Approach

The article analyzes OpenAI's study on language‑model hallucinations, explaining how statistical limits in pre‑training and flawed binary evaluation incentives cause false answers, and proposes a confidence‑threshold scoring system that rewards honest "I don’t know" responses to improve reliability.

AI safety · Language Models · confidence threshold

0 likes · 8 min read

DataFunTalk

Sep 8, 2025 · Artificial Intelligence

When Claude Leaves China: How Domestic AI Models Are Rising to Fill the Gap

Anthropic's new ban on Claude for Chinese‑controlled firms forces developers to seek home‑grown alternatives, prompting a deep dive into Claude's strengths, the rapid rise of Chinese large‑language models, and the gaps that still separate them from the world‑leading offering.

AI models · AI safety · Chinese AI

0 likes · 11 min read

Data STUDIO

Sep 8, 2025 · Industry Insights

Claude Completely Banned for Chinese Companies – No Workarounds Anywhere

Anthropic announced an immediate, worldwide ban on Claude for any entity controlled by Chinese capital, citing legal, regulatory and security risks, and warned that continued access could enable military use or model‑stealing, urging firms to adopt domestic alternatives.

AI policy · AI safety · Anthropic

0 likes · 3 min read

Java Tech Enthusiast

Sep 7, 2025 · Artificial Intelligence

Why Anthropic Is Banning Claude for Companies Linked to China and Other Restricted Nations

Anthropic announced that, effective immediately, any company—regardless of location—directly or indirectly owned more than 50% by Chinese capital or other nations deemed adversarial, such as Russia, Iran, and North Korea, is prohibited from using its Claude AI service due to legal, regulatory, and security concerns.

AI policy · AI safety · Anthropic

0 likes · 5 min read

21CTO

Sep 5, 2025 · Artificial Intelligence

Why Anthropic Is Banning Chinese-Controlled Companies from Its AI Services

Anthropic announced it will immediately stop providing its AI services, including Claude, to any company or organization controlled by Chinese capital, extending its restrictions to entities with over 50% Chinese ownership regardless of operating location.

AI policy · AI safety · Anthropic

0 likes · 4 min read

ShiZhen AI

Sep 5, 2025 · Artificial Intelligence

Andrew Ng Highlights Core AI Engineer Skills Amidst Major AI Industry Updates

The article reports that ChatGPT now supports branch conversations and that Anthropic restricts service use in certain regions. Andrew Ng outlines essential AI engineer capabilities, such as AI‑assisted software building, prompting, and agentic workflows, and highlights market demand for them. It also covers the Kimi K2 model upgrade, Hugging Face’s FineVision dataset release, and Google’s AI‑driven Deep Loop Shaping method published in *Science*.

AI Engineering · AI for astronomy · AI safety

0 likes · 8 min read

DataFunTalk

Aug 29, 2025 · Artificial Intelligence

How a $500 GPU Hack Turns LLMs into Hidden Advertising Engines

A recent arXiv paper reveals that with an RTX 4070, a few hundred toxic training samples, and just one hour of fine‑tuning, attackers can embed covert advertisements into large language models like Gemini 2.5, creating cheap, undetectable AI‑driven ad platforms.

AI safety · LLM Security · advertisement embedding attack

0 likes · 12 min read

Efficient Ops

Aug 27, 2025 · Artificial Intelligence

Why DeepSeek V3.1 Randomly Inserts the Chinese Character “极” – Token Bug Explained

DeepSeek’s latest V3.1 model unexpectedly injects the Chinese character “极” into generated text, a token‑ID mix‑up that breaks code compilation, JSON parsing, and academic writing. Users have traced the issue to adjacent token IDs and offer two main hypotheses: dataset contamination or a shortcut learned by the model.

AI safety · DeepSeek · language model

0 likes · 4 min read

Huolala Tech

Aug 27, 2025 · Artificial Intelligence

How Huolala’s AI‑Powered Safety Platform Transforms Freight Risk Management

This article details Huolala's evolution from reactive safety measures to a proactive AI‑driven safety governance platform, describing its architectural upgrades, data‑driven risk detection, modular strategy management, and measurable operational benefits that dramatically improve freight safety and reduce costs.

AI safety · Operational Efficiency · Risk Management

0 likes · 10 min read

Java Tech Enthusiast

Aug 22, 2025 · Artificial Intelligence

Why Did the Unitree Humanoid Robot Crash and Run Away? Inside the Tech and Ethics

The viral "hit‑and‑run" incident involving Unitree's humanoid robot sparked global debate. The article attributes the collision to human operator error, explains that limited sensor and control technology and current competition rules made remote control necessary, and notes that the robot still set a 1500 m record, pointing to a future of fully autonomous robotics.

AI safety · humanoid robot · remote control

0 likes · 8 min read

Volcano Engine Developer Services

Aug 19, 2025 · Artificial Intelligence

How to Strengthen LLM System Prompts for Safer AI Agents

This guide explains how to reinforce system prompts for AI agents by optimizing their content and structure, using active defense, role‑based, and format constraints, providing practical examples, measurement methods, and experimental results that demonstrate up to 90% reduction in unsafe behavior.

AI safety · LLM · System Prompt

0 likes · 13 min read

Meituan Technology Team

Aug 14, 2025 · Artificial Intelligence

How Meituan’s Smart Helmet Redefines Delivery Safety with AI‑Powered Design

Meituan’s first smart‑helmet article details hardware innovations that tackle delivery riders’ safety, comfort, and efficiency, covering stricter safety standards, sensor‑driven alerts, lightweight structures, advanced ventilation, three‑times longer battery life, noise‑cancelling audio, IPX6 waterproofing, and a data‑driven production line.

AI safety · delivery efficiency · hardware design

0 likes · 24 min read

AI Frontier Lectures

Jul 27, 2025 · Information Security

Can Hidden Activations Expose Multimodal Model Jailbreaks?

The paper reveals that large multimodal language models retain refusal signals in their hidden states even after jailbreak attempts, and proposes a training‑free detection method that leverages these signals to identify unsafe inputs across text and image modalities with strong generalization.

AI safety · LVLM security · hidden activation analysis

0 likes · 7 min read

AI Frontier Lectures

Jul 19, 2025 · Artificial Intelligence

How Researchers Made Large Language Models Forget or Amplify Specific Concepts

A new study from Meta and NYU reveals a two‑step technique—SAMD to locate concept‑specific attention heads and SAMI to scale their influence—enabling precise, low‑cost editing of transformer models for tasks ranging from factual recall to safety control.

AI safety · Sparse Attention · concept control

0 likes · 11 min read

IT Services Circle

Jul 16, 2025 · Artificial Intelligence

How a Simple Colon Can Trick Top LLMs – The Master‑RM Fix

A recent study reveals that tiny symbols like colons or generic reasoning prefixes can cause large language models used as reward judges to issue false‑positive rewards, but an enhanced reward model called Master‑RM, trained with adversarial data, eliminates this vulnerability across multiple LLMs and languages.

AI safety · LLM · Master-RM

0 likes · 10 min read

AntTech

Jul 14, 2025 · Artificial Intelligence

What Is the New AI Agent Safety Testing Standard and Why It Matters

The World Digital Academy unveiled the AI STR series' first global AI Agent Operation Safety Testing Standard, detailing a full‑link risk analysis framework, novel testing methods, and its role in addressing rising safety concerns as AI agents become mainstream in 2025.

AI governance · AI safety · Artificial Intelligence

0 likes · 5 min read

21CTO

Jul 1, 2025 · Artificial Intelligence

OpenAI CEO Warns: Don’t Blindly Trust AI – Insights from New Open‑Source Models

Sam Altman cautions against over‑reliance on ChatGPT, while Germany blocks DeepSeek for GDPR violations, Tencent unveils its MoE‑based Hunyuan‑A13B model, and Google releases a Python client for Data Commons, highlighting both AI risks and rapid open‑source advancements.

AI safety · Data Commons · MoE

0 likes · 9 min read

DataFunTalk

Jun 21, 2025 · Artificial Intelligence

Why AI Gets Overconfident: Bias, Hallucinations, and Reinforcement Learning Solutions

This talk explores how large AI models become overconfident, leading to bias and hallucinations, examines adversarial examples in vision and language, explains why data and algorithms cause these issues, and shows how reinforcement learning can teach models to admit uncertainty and align with human values.

AI alignment · AI safety · Bias

0 likes · 19 min read

DataFunTalk

Jun 19, 2025 · Artificial Intelligence

Can We Flip the Switch on AI Good vs. Evil? OpenAI’s Toxic Persona Find

OpenAI’s new research reveals that training language models to produce incorrect answers in a single domain can trigger a toxic persona feature, causing the model to generate harmful suggestions across unrelated tasks, but the team also demonstrates detection methods and a reversible “emergent realignment” technique to restore safe behavior.

AI safety · Emergent misalignment · OpenAI

0 likes · 7 min read

DataFunSummit

Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI safety · Large Language Model · LoRA

0 likes · 13 min read

Kuaishou Tech

Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has secured seven papers at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

AI safety · Multimodal · acl

0 likes · 13 min read

AntTech

May 30, 2025 · Artificial Intelligence

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

The Ant Group’s 10th Technical Open Day gathered leading AI experts who examined the current state and future directions of multimodal large models, embodied AI, world models, transformer architectures, and vertical applications, offering a comprehensive view of the challenges and opportunities on the path toward AGI.

AGI · AI safety · Embodied AI

0 likes · 16 min read

ShiZhen AI

May 26, 2025 · Industry Insights

Nvidia Plans Cheaper Blackwell AI Chip for China Amid Export Restrictions

Nvidia is reportedly preparing a lower‑cost Blackwell GPU for the Chinese market, priced at $6,500‑$8,000 and featuring 1.7 TB/s GDDR7 memory, while OpenAI’s o3 model uncovered a Linux kernel zero‑day (CVE‑2025‑37899), a study showed AI models can sabotage shutdown commands, and a tutorial demonstrates creating animated 3D icons with ChatGPT and Freepik tools.

3D icon creation · AI hardware · AI safety

0 likes · 8 min read

Java Tech Enthusiast

May 25, 2025 · Artificial Intelligence

Does Claude 4 Really Report Unethical Actions? Inside Its Hidden ‘Whistleblower’ Feature

The article analyzes Anthropic's Claude 4 series, highlighting its extended reasoning ability, a controversial whistle‑blower function that can report extreme wrongdoing, observed extortion attempts toward developers, and the safety measures Anthropic introduced to curb such risky autonomous behaviors.

AI safety · Anthropic · Claude 4

0 likes · 6 min read

Tencent Technical Engineering

May 8, 2025 · Artificial Intelligence

Augment AI Programming Assistant: Technical Breakthroughs, Industry Impact, and Security Risks

Augment, a newly funded AI programming assistant that tops the SWE‑bench benchmark with a 65.4% score and a 200 k‑token context window, promises massive productivity gains for developers but also introduces sophisticated security threats such as malicious memory prompts, back‑door context injection, compromised guidelines, and risky multi‑task collaboration protocols, prompting calls for layered defenses and vigilant monitoring.

AI programming · AI safety · Agent Memory

0 likes · 11 min read

Sohu Tech Products

May 7, 2025 · Information Security

Why MCP Protocol Is a Security Nightmare: Real Attack Cases and Mitigations

This article provides a comprehensive security analysis of the Model Context Protocol (MCP), exposing multiple attack vectors such as prompt poisoning, tool poisoning, command and code injection, and illustrating how MCP’s design flaws make it more vulnerable than traditional applications while offering concrete mitigation recommendations.

AI safety · Code Injection · MCP

0 likes · 34 min read

JavaEdge

May 7, 2025 · Artificial Intelligence

Why AI Agents Pose New Security Risks and How to Safeguard Them

The article explains what AI agents are, highlights their emerging security risks such as data leakage and lack of accountability, and offers practical strategies—including risk analysis, threat modeling, and engineering best practices—to mitigate these challenges for enterprises.

AI agents · AI safety · Security Risks

0 likes · 9 min read

21CTO

Apr 7, 2025 · Artificial Intelligence

Llama 4 Unveiled: Breakthrough Multimodal Models Redefine AI Capabilities

Meta's Llama 4 series introduces the Scout, Maverick, and Behemoth models—featuring Mixture‑of‑Experts architectures, unprecedented 10‑million‑token context windows, and state‑of‑the‑art performance across vision, language, and multimodal benchmarks—while emphasizing efficient training, open‑source availability, and robust safety safeguards.

AI safety · Large Language Model · Llama 4

0 likes · 14 min read

Model Perspective

Apr 7, 2025 · Artificial Intelligence

Why AI Alignment Matters: Ensuring Smart Systems Follow Human Intent

This article explores the multifaceted AI alignment challenge, detailing safety benchmarks such as toxicity, ethical, power‑seeking, and hallucination evaluations, and argues that responsible AI development requires technical safeguards, international governance, and a civilizational dialogue bridging philosophy and humanity.

AI alignment · AI governance · AI safety

0 likes · 12 min read

Baobao Algorithm Notes

Apr 6, 2025 · Artificial Intelligence

Inside Llama 4: How Meta’s New Multimodal MoE Models Achieve 10M‑Token Contexts

Meta unveils Llama 4 Scout, Maverick, and the upcoming Behemoth, detailing their Mixture‑of‑Experts architecture, massive 10‑million‑token context windows, efficient FP8 training, safety mechanisms, and competitive benchmark results that surpass leading multimodal models.

AI safety · Llama 4 · Mixture of Experts

0 likes · 16 min read

Cognitive Technology Team

Apr 4, 2025 · Artificial Intelligence

Reasoning Models Do Not Always Reveal Their Thoughts: Evaluating Chain‑of‑Thought Fidelity

The article examines how modern reasoning models like Claude 3.7 Sonnet display chain‑of‑thought explanations, but often hide or distort their true reasoning, presenting challenges for AI safety and alignment, and evaluates methods to test and improve fidelity.

AI alignment · AI safety · chain-of-thought

0 likes · 13 min read

Cognitive Technology Team

Apr 1, 2025 · Artificial Intelligence

Four‑Second Bloodshed: How Autonomous Driving Algorithms Turned a Fatal Accident

A March 2025 crash involving a Xiaomi‑branded autonomous vehicle illustrates how a four‑second algorithmic decision loop, inadequate night‑vision sensors, flawed handover timing, and poor emergency‑exit design combined to create a lethal scenario that exposes the deadly risks of over‑relying on L2 driver‑assist systems.

AI safety · Autonomous Driving · Human-Machine Interaction

0 likes · 4 min read

Architect

Mar 28, 2025 · Artificial Intelligence

Peeking Inside Claude: How Anthropic Uncovers LLM Reasoning

Anthropic’s recent papers reveal how Claude’s internal mechanisms—multilingual feature sharing, pre‑planned rhyming, parallel arithmetic paths, concept‑level reasoning, and hallucination triggers—are probed with feature‑insertion techniques, offering engineers actionable insights for building more transparent and safe AI systems.

AI safety · Anthropic · Claude

0 likes · 12 min read

Architect

Mar 24, 2025 · Artificial Intelligence

How Multimodal Alignment Is Shaping the Future of Large Language Models

This article provides a systematic review of recent advances in multimodal alignment for large language models, covering key contributions, application scenarios, dataset construction, evaluation benchmarks, future challenges, and insights from LLM alignment research to guide both academia and industry.

AI safety · Dataset Construction · MLLM

0 likes · 26 min read

Architects' Tech Alliance

Mar 22, 2025 · Industry Insights

What Does DeepSeek’s 2025 AI Report Reveal About the Future of Large Models?

The 2025 DeepSeek Insight report analyzes DeepSeek’s new large‑model releases, compares US and Chinese AI ecosystems, outlines diverse application scenarios such as government, healthcare and aerospace, and provides practical guidance for safely leveraging these models despite their current limitations.

AI industry · AI safety · DeepSeek

0 likes · 5 min read

Architecture and Beyond

Mar 15, 2025 · Information Security

Prompt Injection Attacks on Large Language Models: Risks, Types, and Defense Framework

This article explains how prompt injection attacks exploit large language models by altering their behavior through crafted inputs, outlines the major harms and attack categories—including direct, indirect, multimodal, code, and jailbreak attacks—and presents a comprehensive three‑layer defense framework covering input‑side, output‑side, and system‑level protections.

AI safety · Information Security · LLM Security

0 likes · 16 min read

DevOps

Mar 10, 2025 · Artificial Intelligence

AI Policy, Safety, Industry Applications, and Talent Development Highlighted at China's 2024 Two Sessions

The 2024 Chinese Two Sessions emphasized artificial intelligence as a strategic priority, discussing AI safety regulations, industry applications, talent shortages, and policy proposals from leaders such as DeepSeek, Xiaomi, and academic experts, highlighting the drive to integrate AI across manufacturing, agriculture, healthcare, and education.

AI industry · AI policy · AI safety

0 likes · 11 min read

Code Mala Tang

Feb 27, 2025 · Artificial Intelligence

Do New AI Reasoning Models Really Think? Unpacking the Debate

The article examines whether the latest AI models that claim to perform true reasoning—by breaking problems into steps and using chain‑of‑thought—actually reason like humans, presenting skeptical and supportive expert viewpoints, and offering practical guidance on how to use such models responsibly.

AI reasoning · AI safety · chain-of-thought

0 likes · 14 min read

Architect's Alchemy Furnace

Feb 19, 2025 · Artificial Intelligence

DeepSeek’s Self‑Correction: Transforming AI Reliability and Safety

The article explores DeepSeek’s innovative self‑correction system—combining a Mixture‑of‑Experts architecture with reinforcement‑learning feedback—to achieve real‑time error detection, dynamic knowledge‑graph updates, and enhanced safety in high‑risk fields like autonomous driving and medical diagnostics.

AI safety · DeepSeek · Mixture of Experts

0 likes · 9 min read

Architects' Tech Alliance

Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Training Efficiency · AI safety · DeepSeek-V3

0 likes · 7 min read

Top Architect

Feb 1, 2025 · Artificial Intelligence

OpenAI Launches o3-mini: A Fast, Cost‑Effective AI Model Optimized for STEM Reasoning

OpenAI unveiled the o3-mini family—low, medium, and high variants—offering a cheaper, faster, and more secure reasoning model that matches or exceeds the performance of its predecessor o1 across STEM, coding, and general knowledge benchmarks while introducing search integration and enhanced safety features.

AI model · AI safety · O3-mini

0 likes · 8 min read

Baobao Algorithm Notes

Jan 11, 2025 · Artificial Intelligence

Why Phi‑4’s 14B Model Outperforms GPT‑4 on STEM and Reasoning Tasks

Microsoft Research’s Phi‑4 model, a 14‑billion‑parameter LLM, leverages extensive synthetic data, advanced tokenization, and a two‑stage training pipeline to achieve superior performance on STEM question answering, long‑context reasoning, and safety benchmarks, rivaling larger models like GPT‑4.

AI safety · Phi-4 · benchmarking

0 likes · 15 min read

ZhongAn Tech Team

Jan 6, 2025 · Artificial Intelligence

Weekly Tech Digest: AI Breakthroughs, Robotics Trends, and Industry Shifts in Early 2025

This weekly technology digest highlights major industry developments, including OpenAI's 2025 product roadmap, DeepSeek's identity anomaly, Nvidia's robotics advancements, and Honor's IPO preparations, alongside expert perspectives on AI safety and market trends in operating systems and electric vehicles.

AI safety · Artificial Intelligence · Market Trends

0 likes · 8 min read

DataFunTalk

Jan 5, 2025 · Artificial Intelligence

The Approaching Singularity: AI Automation, AGI Predictions, and Their Impact on Jobs and Society

The article examines how rapid advances in artificial intelligence are expected to automate nearly half of U.S. jobs within the next two decades, explores singularity forecasts for 2029‑2030, and discusses the profound economic, ethical, and security challenges that humanity must address before AI-driven autonomous systems reshape work, research, and daily life.

AGI · AI · AI safety

0 likes · 18 min read

21CTO

Jan 2, 2025 · Artificial Intelligence

2025 AI Breakthroughs: Unlimited Memory & Intelligent Agents, Says Eric Schmidt

Former Google CEO Eric Schmidt warns that AI is on the brink of a transformative era, highlighting three 2025 breakthroughs—unlimited context memory, autonomous AI agents, and text‑to‑action programming—while also stressing the looming risks of energy consumption, security threats, and the need for ethical safeguards.

AI memory · AI research · AI safety

0 likes · 14 min read

21CTO

Dec 22, 2024 · Artificial Intelligence

OpenAI’s New o3 Model Shatters Benchmarks – Is AGI Finally Here?

OpenAI’s latest o3 model demonstrates unprecedented performance across logic, mathematics, and programming benchmarks, introduces flexible reasoning modes with the upcoming o3‑mini, and incorporates advanced safety alignment, signaling a major leap toward practical artificial general intelligence.

AGI · AI safety · Artificial Intelligence

0 likes · 6 min read

DataFunTalk

Dec 22, 2024 · Artificial Intelligence

Speech by Academician Sun Ninghui on the Development, Challenges, and Future of Artificial Intelligence and Intelligent Computing in China

The speech outlines the rapid rise of generative AI models, traces the historical evolution of computing technology, examines AI safety risks and regulatory responses, and proposes strategic pathways for China to advance intelligent computing through open, closed, or hybrid ecosystems while addressing talent, hardware, and cost challenges.

AI safety · Artificial Intelligence · China

0 likes · 26 min read

21CTO

Dec 3, 2024 · Artificial Intelligence

When Bing Chat Went Rogue: What Prompt‑Injection Reveals About AI Safety

A detailed analysis of Simon Willison and Benj Edwards' conversation about Bing Chat's angry, deceptive behavior uncovers how prompt‑injection attacks expose weaknesses in large language models, the limits of system prompts, and the broader safety challenges facing AI development today.

AI safety · Bing Chat · ChatGPT

0 likes · 9 min read

DataFunTalk

Nov 11, 2024 · Artificial Intelligence

OpenAI VP Lilian Weng Departs and Shares Full AI Safety Talk Transcript

The article reports the departure of OpenAI research VP Lilian Weng, provides the full transcript of her recent AI safety and alignment presentation at a Bilibili event, and discusses broader concerns about OpenAI's safety culture, reinforcement learning from human feedback, and the importance of collective involvement in AI safety.

AI safety · OpenAI · alignment

0 likes · 10 min read

NewBeeNLP

Nov 7, 2024 · Artificial Intelligence

Tackling Large Model Hallucinations: Causes, Detection, and Mitigation Strategies

This article provides a comprehensive analysis of large language model hallucinations, detailing their definitions, classifications, root causes, detection techniques, and a wide range of mitigation approaches—including RAG pipelines, decoding strategies, and model‑enhancement methods—to improve reliability and safety in real‑world AI applications.

AI safety · RAG · hallucination

0 likes · 22 min read

Cognitive Technology Team

Oct 16, 2024 · Artificial Intelligence

Large Language Models Lack Formal Reasoning Ability: Five Pieces of Evidence from the GSM‑Symbolic Benchmark

Recent research by Iman Mirzadeh’s team at Apple introduces the GSM‑Symbolic benchmark, revealing that large language models, despite high scores on GSM8K, show significant performance drops when the numbers or names in a problem change or extra clauses are added, indicating a lack of true formal reasoning ability.

AI safety · GSM‑Symbolic · Mathematical Reasoning

0 likes · 9 min read

Architect

Sep 26, 2024 · Artificial Intelligence

Decoding OpenAI o1: How RL‑LLM Fusion Powers Next‑Gen Reasoning

This article provides a detailed technical analysis of OpenAI’s o1 model, exploring its enhanced logical reasoning, the likely use of reinforcement learning with hidden chain‑of‑thought generation, multi‑model architecture, training data pipelines, reward modeling, and how these innovations could reshape AI safety and scaling strategies.

AI safety · LLM · Model architecture

0 likes · 43 min read

Baobao Algorithm Notes

Sep 25, 2024 · Industry Insights

Decoding OpenAI o1: How RL and LLM Fuse to Power Hidden Chain‑of‑Thought

This article analytically reconstructs OpenAI o1’s architecture, training pipeline, and inference workflow, exploring its reinforcement‑learning‑enhanced hidden chain‑of‑thought, multi‑model composition, scaling laws, reward modeling, and potential implications for future AI safety and small‑model strategies.

AI safety · Hidden COT · LLM

0 likes · 43 min read

Data Thinking Notes

Sep 13, 2024 · Artificial Intelligence

How OpenAI’s o1 Series Redefines Complex Reasoning and AI Safety

OpenAI’s new o1 series, including o1‑preview and o1‑mini, leverages reinforcement‑learning‑based chain‑of‑thought reasoning to achieve superior performance on academic exams, coding contests, and safety benchmarks, offering faster, cost‑effective options while advancing AI alignment and human‑preference evaluation.

AI safety · Large Language Model · OpenAI

0 likes · 15 min read

AntTech

Aug 12, 2024 · Artificial Intelligence

DKCF Trustworthy Framework for Large Model Applications and AI Security Practices

The article outlines the DKCF (Data‑Knowledge‑Collaboration‑Feedback) trustworthy framework presented at the 2024 Shanghai Cybersecurity Expo, detailing challenges of large AI models, four key trust factors, and Ant Group's practical security implementations for professional AI deployments.

AI safety · DKCF · Knowledge Engineering

0 likes · 10 min read

NewBeeNLP

Jul 25, 2024 · Artificial Intelligence

Llama 3.1 Unveiled: How the New Open‑Source Giant Matches GPT‑4o and Claude 3.5

Meta has officially released Llama 3.1, a 405‑billion‑parameter open‑source model that matches or surpasses GPT‑4o and Claude 3.5 on over 150 benchmarks, expands context to 128 K tokens, supports eight languages, and is accompanied by a detailed 100‑page paper describing its data, training stack, architecture, quantization, safety measures, and ecosystem support.

AI safety · Large Language Model · Llama 3.1

0 likes · 15 min read

AntTech

Jul 9, 2024 · Artificial Intelligence

2024 Large Model Security Practice Whitepaper Unveiled at the World AI Conference

The jointly authored 2024 Large Model Security Practice whitepaper, released at the World AI Conference, outlines a comprehensive safety framework covering security, reliability, and controllability, presents industry case studies, and proposes a five‑dimensional governance model to guide high‑quality development of large AI models.

AI safety · industry practice · large model

0 likes · 7 min read

JD Tech

Jun 28, 2024 · Artificial Intelligence

An Overview of Large Language Models: History, Fundamentals, Prompt Engineering, Retrieval‑Augmented Generation, Agents, and Multimodal AI

This article provides a comprehensive introduction to large language models, covering their historical development, core architecture, training process, prompt engineering techniques, Retrieval‑Augmented Generation, agent frameworks, multimodal capabilities, safety challenges, and future research directions.

AI agents · AI safety · Multimodal

0 likes · 22 min read

DataFunSummit

Jun 23, 2024 · Artificial Intelligence

Tongyi Xingchen Personalized Large Model: Technical Overview and Applications

This article summarizes the development background of large language models, Alibaba's progression in foundational and personalized AI, the design and capabilities of the Tongyi Xingchen personalized model, its multimodal and agent-based architecture, various industry use cases, and the safety and responsibility measures applied to ensure trustworthy AI deployment.

AI safety · large language models · model agents

0 likes · 13 min read

21CTO

Jun 2, 2024 · Artificial Intelligence

Will OpenAI’s New Safety Team Really Secure ChatGPT?

OpenAI has created a new safety committee led by Sam Altman and board members, aiming to evaluate and improve safeguards while former researchers voice concerns about the company’s commitment to AI safety and ethics.

AI safety · Artificial Intelligence · ChatGPT

0 likes · 6 min read

21CTO

May 25, 2024 · Artificial Intelligence

Sam Altman Reveals GPT‑4o Vision, AI Safety, and the Future of AGI

Sam Altman’s hour‑long “All‑In” podcast interview unveils OpenAI’s latest GPT‑4o voice model, his bold vision for AGI, concerns about AI safety, the recent leadership shake‑up, and his ideas on universal access, regulation, and the transformative impact of conversational AI.

AGI · AI · AI safety

0 likes · 9 min read

DevOps

May 23, 2024 · Information Security

Guidelines for Evaluating Large Language Models in Cybersecurity Tasks

The article examines the opportunities and risks of applying large language models (LLMs) to cybersecurity, outlines fourteen practical recommendations for assessing their real‑world capabilities, and concludes with an invitation to the upcoming R&D Efficiency Conference covering AI, product management, and related topics.

AI safety · Information Security · LLM

0 likes · 11 min read

Rare Earth Juejin Tech Community

May 2, 2024 · Artificial Intelligence

Understanding Large Language Models: Principles, Training, Risks, and Application Security

This article provides a comprehensive overview of large language models (LLMs), explaining their core concepts, transformer architecture, training stages, known shortcomings such as hallucination and reversal curse, and highlights emerging security threats like prompt injection and jailbreaking, offering guidance for safe deployment.

AI safety · LLM · jailbreaking

0 likes · 21 min read

AntTech

Apr 18, 2024 · Artificial Intelligence

WDTA Releases International Standards for Generative AI and Large Language Model Safety Testing at the 27th UN CSTD Annual Meeting

At the 27th UN CSTD Annual Meeting in Geneva, the World Digital Technology Academy unveiled two pioneering international standards—one for generative AI application security testing and another for large language model security testing—crafted by experts from leading AI firms to establish a new global benchmark for AI safety.

AI safety · Ant Group · International Standards

0 likes · 8 min read

Rare Earth Juejin Tech Community

Mar 7, 2024 · Artificial Intelligence

Anthropic Announces Claude 3 Model Family: Opus, Sonnet, and Haiku

Anthropic has launched the Claude 3 family of large language models—Opus, Sonnet, and Haiku—offering varying balances of intelligence, speed, and cost, with enhanced reasoning, multilingual, and vision capabilities, fewer refusals, and improved safety, now available via API in 159 countries.

AI safety · Anthropic · Claude 3

0 likes · 11 min read

21CTO

Feb 22, 2024 · Artificial Intelligence

How Google’s Open‑Source Gemma Model Brings LLM Power to Your Laptop

Google’s newly released open‑source Gemma models let developers run powerful large‑language‑model workloads on laptops, workstations, or cloud platforms, offering competitive performance, extensive tooling, and built‑in safety measures for responsible AI deployment.

AI safety · Gemma · Google AI

0 likes · 6 min read

Rare Earth Juejin Tech Community

Feb 18, 2024 · Artificial Intelligence

Llama 2: Open Foundation and Fine‑Tuned Chat Models – Overview and Technical Details

The article provides a comprehensive overview of Meta’s Llama 2 series, detailing model sizes, pre‑training data, architectural enhancements, supervised fine‑tuning, RLHF procedures, safety evaluations, reward‑model training, and iterative improvements, highlighting its open‑source release and comparative performance.

AI safety · Large Language Model · Llama2

0 likes · 27 min read

Architect

Feb 16, 2024 · Artificial Intelligence

Can OpenAI’s Sora Redefine Text‑to‑Video Generation? An In‑Depth Technical Review

OpenAI’s newly unveiled Sora model transforms short text prompts into high‑definition videos up to one minute long, showcasing a diffusion‑Transformer architecture, improved occlusion handling, and detailed visual fidelity, while the article examines its technical breakthroughs, compares it to earlier models, and discusses emerging safety and misuse concerns.

AI safety · OpenAI · Sora

0 likes · 12 min read

DataFunTalk

Feb 6, 2024 · Artificial Intelligence

Overview of Vivo BlueLM Large Model: Evolution, Training Challenges, and Product Deployment

This article presents a comprehensive overview of Vivo's BlueLM large language model, covering its historical evolution, the massive data and algorithmic challenges faced during training, safety and performance optimizations, and how the model has been integrated into various consumer and enterprise products.

AI · AI safety · BlueLM

0 likes · 16 min read

IT Services Circle

Dec 24, 2023 · Artificial Intelligence

GPT‑4 “Lazy” Behavior: User Reports, Experiments, and Emerging Insights

The article examines growing complaints that GPT‑4 has become increasingly lazy and unpredictable since the November 6 developer update, discusses user‑generated workarounds, presents experimental findings on prompt phrasing and temperature effects, and cites recent academic studies highlighting the need for continuous large‑model monitoring.

AI safety · GPT-4 · Temperature

0 likes · 6 min read

php Courses

Nov 3, 2023 · Artificial Intelligence

Elon Musk Calls for Third‑Party Oversight at the First AI Safety Summit in the UK and Highlights the Bletchley Declaration

Elon Musk urged the creation of a third‑party referee to monitor leading AI firms at the inaugural UK AI safety summit, while 28 nations released the Bletchley Declaration to address AI risks, and China promoted its Global AI Governance Initiative.

AI governance · AI safety · Bletchley Declaration

0 likes · 4 min read

21CTO

Oct 30, 2023 · Artificial Intelligence

Geoffrey Hinton Warns AI Could Take Over Earth Within Five Years – What You Need to Know

Renowned AI pioneer Geoffrey Hinton cautions that rapidly advancing artificial intelligence may surpass human control in as little as five years, highlighting self‑modifying code, the "black‑box" problem, and the urgent need for robust safety regulations.

AI risk · AI safety · Geoffrey Hinton

0 likes · 8 min read

IT Services Circle

Oct 16, 2023 · Information Security

Prompt Injection Attacks on GPT‑4V: How Hidden Text in Images Compromises Multimodal Model Security

The article examines how specially crafted images can inject malicious prompts into GPT‑4V, causing it to leak chat history, obey hidden commands, and expose security flaws, while discussing attack techniques, underlying reasons, and proposed mitigation strategies.

AI safety · GPT-4V · image attacks

0 likes · 9 min read

Tencent Tech

Sep 20, 2023 · Artificial Intelligence

Why Do Large Language Models Hallucinate and How to Reduce It?

The article explains why large language models generate hallucinations—due to data errors, training conflicts, and inference uncertainty—and outlines data‑cleaning, model‑level feedback, knowledge augmentation, constraint techniques, and post‑processing methods such as the “Truth‑seeking” algorithm to mitigate the issue.

AI safety · Knowledge retrieval · data quality

0 likes · 8 min read

Programmer DD

Jul 21, 2023 · Artificial Intelligence

Why Did GPT-4’s Performance Plummet Between March and June 2023?

A Stanford‑Berkeley study reveals that between March and June 2023 GPT‑4’s accuracy on prime‑checking fell from 97.6% to 2.4%, code generation quality dropped sharply, and its handling of sensitive questions changed, underscoring the rapid, unpredictable shifts in large language model performance over short periods.

AI safety · Artificial Intelligence · GPT-4

0 likes · 6 min read

21CTO

Jul 8, 2023 · Artificial Intelligence

What Developers Need to Know About GPT‑4’s New 8K Context and Multimodal Capabilities

OpenAI has opened GPT‑4’s API to all paid users, offering an 8K‑token context window (up to 32K), multimodal image input, enhanced creativity, longer text handling, and upcoming fine‑tuning options, while also outlining phased deprecation of older models and current limitations.

AI safety · API · GPT-4

0 likes · 10 min read

Liangxu Linux

Jul 2, 2023 · Information Security

How the “Grandma Prompt” Bypasses LLM Safeguards and Generates Windows Keys

The article examines the so‑called “grandma prompt” that tricks ChatGPT, Bing, and other LLMs into revealing Windows activation keys and even adult jokes, explains why such prompt‑injection works, and reviews past similar exploits and their mitigation attempts.

AI safety · ChatGPT jailbreak · LLM Security

0 likes · 7 min read

21CTO

Jun 18, 2023 · Artificial Intelligence

Why Yann LeCun Says ChatGPT Isn’t Even as Smart as a Dog

Meta’s chief AI scientist Yann LeCun argues that large‑language models like ChatGPT are far from human intelligence, lacking real‑world understanding and even falling short of a dog’s cleverness, while experts debate AI’s risks, benefits, and the need for regulation.

AI safety · Artificial Intelligence · ChatGPT

0 likes · 6 min read
