Tagged articles
777 articles
Old Zhang's AI Learning
May 31, 2026 · Artificial Intelligence

Qwen3.6-35B-A3B NVFP4: A Stable, Highly Compressed Quantized Model

NVIDIA's NVFP4 quantization cuts Qwen3.6-35B-A3B's memory footprint roughly threefold with almost no accuracy loss, supports plug‑and‑play deployment via vLLM, and outperforms other 4‑bit formats on Hopper/Blackwell GPUs, making it a practical choice for production AI workloads.

MoE · NVFP4 · Qwen3.6-35B-A3B
13 min read
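
As a rough sanity check on the threefold figure, the sketch below estimates weight storage for a 35B‑parameter model. It assumes a BF16 baseline and NVIDIA's published NVFP4 layout, where 4‑bit E2M1 values share one FP8 E4M3 scale per 16‑element block. It ignores the per‑tensor FP32 scale and any layers kept in higher precision, so it is an idealized estimate and not the article's measurement.

```python
# Back-of-the-envelope weight-memory estimate: NVFP4 vs. BF16.
# Assumptions (not from the article): every weight is quantized, each
# 16-element block shares one 8-bit FP8 scale, and the tiny per-tensor
# FP32 scale is ignored.

BF16_BITS = 16
FP4_BITS = 4
BLOCK_SIZE = 16
SCALE_BITS = 8


def nvfp4_bits_per_weight(block_size: int = BLOCK_SIZE) -> float:
    """Effective storage per weight: 4-bit value plus its share of the block scale."""
    return FP4_BITS + SCALE_BITS / block_size


def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Convert a parameter count and bits per weight to gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params = 35e9  # total parameters, taken from the "35B" in the model name
bf16 = weight_gb(params, BF16_BITS)
nvfp4 = weight_gb(params, nvfp4_bits_per_weight())

print(f"BF16:  {bf16:.1f} GB")
print(f"NVFP4: {nvfp4:.1f} GB")
print(f"ratio: {bf16 / nvfp4:.2f}x")
```

The idealized ratio lands slightly above 3.5×. Real checkpoints usually keep embeddings, norms and some other layers in higher precision, so a measured reduction of roughly threefold is plausible.
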
Machine Learning Algorithms & Natural Language Processing
May 30, 2026 · Artificial Intelligence

Breaking the Agent Training Bottleneck: Open‑Source ClawGym Data, Training, and Evaluation Pipeline

ClawGym provides a complete open‑source framework for Claw‑style personal agents, linking a 13.5K‑task synthetic dataset, black‑box rollout training, sandbox‑parallel reinforcement learning, and a rigorously verified 200‑task benchmark, and demonstrates that synthetic data can lift a 30B model above a 235B baseline.

ClawGym · OpenClaw · agent training
16 min read
SuanNi
May 30, 2026 · Artificial Intelligence

Step 3.7 Flash: High‑Efficiency Pro‑Level Agent Model with 400 TPS and Low Cost

Step 3.7 Flash is a 196B‑parameter multimodal agent model with 11B active parameters. It delivers 400 TPS inference, superior code generation and cross‑framework stability, a cost‑effective Advisor Mode, and strong vision and search performance, with broad benchmark gains over its predecessor and competing models.

AI agent · Advisor Mode · Multimodal
12 min read
Machine Heart
May 30, 2026 · Artificial Intelligence

Can MIT’s Attention Matching Cut LLM Memory 50× Without Accuracy Loss?

MIT researchers introduce Attention Matching, a latent‑space KV‑cache compaction technique that shrinks the KV cache of large language models by up to 50× with negligible accuracy loss. It outperforms token pruning, summarization, and prior compaction methods on benchmarks such as QuALITY, LongHealth, and AIME‑2025.

Attention Matching · KV Cache · LLM
13 min read
Machine Learning Algorithms & Natural Language Processing
May 29, 2026 · Artificial Intelligence

Claude Opus 4.8 Surpasses Mythos in Key Tasks and Enables Hundreds of Parallel Agents

Claude Opus 4.8, released just 43 days after 4.7, improves honesty, cuts code‑defect miss rates to a quarter, reduces over‑confident answers, outperforms Mythos on several benchmarks, and introduces Dynamic Workflows that let hundreds of sub‑agents run in parallel for complex tasks.

AI model · Claude Opus 4.8 · benchmark
8 min read
SuanNi
May 29, 2026 · Artificial Intelligence

SenseNova-U1-8B-MoT-Infographic: Academic Charts, Posters, Recipes

The SenseNova-U1-8B-MoT-Infographic model dramatically improves AI‑generated infographics by enhancing dense‑text rendering, layout stability, and chart accuracy through targeted data, extended mid‑training, and reinforcement‑learning fine‑tuning, achieving top scores on BizGenEval and IGenBench and surpassing many commercial rivals.

AI model · Multimodal · SenseNova
9 min read
Machine Heart
May 29, 2026 · Artificial Intelligence

Why Vendors Bet on Step 3.7 Flash: An Agent‑Optimized Model for High‑Cost AI

Step 3.7 Flash is an open‑source, sparse‑MoE flash model built for real‑world agent workflows. It offers 11B active parameters, 400 TPS, a 256K context window, multimodal perception and tool use. It also achieves top‑tier scores on benchmarks such as ClawEval‑1.1, Toolathlon and SimpleVQA, while sharply reducing the token costs that have plagued large‑scale AI deployments.

Agent · Cost · Flash
10 min read
Machine Heart
May 28, 2026 · Artificial Intelligence

Can a Pre‑trained Embodied Model Work Out‑of‑the‑Box? New Chinese Open‑Source VLA Model Shows Yes

The newly open‑sourced Wall‑OSS‑0.5 VLA model demonstrates that a large‑scale pre‑trained embodied robot brain can achieve strong zero‑shot performance on 17 real‑world tasks, exhibit staircase emergence with longer pre‑training, and far surpass the industry baseline after fine‑tuning, while also revealing current precision limits.

Embodied AI · VLA · benchmark
15 min read
Machine Learning Algorithms & Natural Language Processing
May 28, 2026 · Artificial Intelligence

Open‑Source 35B Intern‑S2‑Preview Rivals Trillion‑Parameter Models on Scientific Benchmarks

The open‑source 35‑billion‑parameter Intern‑S2‑Preview model achieves scientific‑task performance comparable to trillion‑parameter models. This comes from full‑link “general‑specialized” training, reinforcement‑learning scaling, and hardware‑aware optimizations, and the model outperforms leading closed‑source models on benchmarks such as MolecularIQ and crystal‑structure generation.

InternLM · Large Language Model · Open Source
11 min read
Architects' Tech Alliance
May 27, 2026 · Industry Insights

Nvidia Vera CPU Smashes Intel and AMD x86 Titans in AI Workloads

Nvidia's Vera, an 88‑core custom ARM CPU designed for AI agents, delivers up to 55% higher overall performance than the Intel Xeon 6980P, 10% higher than the AMD EPYC 9575F and 63% higher than Nvidia Grace. It also offers 1.2 TB/s of LPDDR5X bandwidth, a 500 W power envelope and a single‑chip design that could reshape the server CPU market.

AI server · ARM CPU · LPDDR5X
10 min read
ShiZhen AI
May 27, 2026 · Artificial Intelligence

Turning Click‑Based Web Agents into Repeatable Scripts with Microsoft’s Open‑Source Webwright

Microsoft’s open‑source Webwright framework redefines browser agents by replacing step‑by‑step click actions with generated Playwright scripts, enabling repeatable, debuggable web tasks; the article details its architecture, workflow, benchmark results on Online‑Mind2Web and Odysseys, and discusses practical benefits and limitations.

GPT-5.4 · LLM agents · Microsoft
9 min read
Machine Heart
May 27, 2026 · Artificial Intelligence

RoboMemArena: A Comprehensive Benchmark that Truly Tests Robot Memory for Embodied AI

RoboMemArena introduces a systematic, long‑horizon robot memory benchmark with 26 tasks, 151 sub‑tasks, multimodal annotations, and real‑robot evaluations, exposing the limitations of existing benchmarks and demonstrating that the dual‑system PrediMem model markedly outperforms baselines both in simulation and on physical robots.

Embodied AI · PrediMem · RoboMemArena
9 min read
SuanNi
May 26, 2026 · Artificial Intelligence

Why Tokens Are Burning Out and a Free Claude Opus 4.6‑Level Model Is Coming

The SkyClaw‑v1.0 model from Skywork AI offers a free, soon‑to‑be open‑source large‑language model for agent applications that matches Claude Opus 4.6 in performance while cutting token costs dramatically, and the article details its benchmarks, training pipeline, and deployment recommendations.

Agent · Large Language Model · OpenAI API
7 min read
Machine Heart
May 26, 2026 · Artificial Intelligence

What Agent Harness Do AI Phones Like OpenAI’s AI Phone and Gemini on Android Really Need?

PhoneHarness, a mixed‑action orchestration framework and benchmark from Tencent Hunyuan and academic partners, argues that AI‑powered smartphones must go beyond GUI clicks, integrating CLI, GUI, and host tools while providing verifiable evidence of task completion, reshaping agents from screen‑talkers to true mobile assistants.

AI Phone · Android · PhoneHarness
11 min read
Tencent Technical Engineering
May 26, 2026 · Information Security

AI Era Vulnerability Benchmark Revamp: 3,632 CVE Insights & VulnGym Release

Analyzing 3,632 high‑severity GitHub Advisory reports from 2025‑2026, the authors reveal a sharp rise in business‑logic flaws—especially in high‑star projects—prompting a redesign of vulnerability‑detection benchmarks, and introduce VulnGym, a real‑project, white‑box dataset with 400+ paths and detailed entry‑point, trace, and critical‑operation annotations.

AI security · Business Logic Bugs · Open Source
17 min read
SuanNi
May 24, 2026 · Artificial Intelligence

Meituan’s Open‑Source Digital Human Model Delivers Real‑World Performance Across MV, E‑Commerce, and More

Meituan’s LongCat‑Video‑Avatar 1.5 replaces its audio encoder with Whisper‑Large, cuts inference to eight steps, and, after a 770‑person, 13,240‑rating evaluation, outperforms competing models in lip‑sync, style generalization, multi‑person scenes, and overall visual fidelity.

AI · LongCat-Video-Avatar · Video Generation
7 min read
IT Services Circle
May 24, 2026 · Artificial Intelligence

2026 AI Coding Agent Benchmark: Cursor, Claude Code, and Codex – Who Leads?

A comprehensive 2026 benchmark evaluates major AI coding agents—Cursor CLI, Claude Code, OpenAI Codex, and Google Gemini—across performance, token consumption, cost per task, and execution time, revealing a tight top‑three score margin and highlighting cost‑efficiency and latency as the new competitive frontiers.

AI coding agents · Claude Code · Cost
6 min read
Open Source Tech Hub
May 24, 2026 · Backend Development

FastJSON: A Drop‑In PHP 8.3+ JSON Extension Up to 6× Faster Than ext/json

FastJSON is a high‑performance PHP 8.3+ JSON extension that serves as a drop‑in replacement for ext/json. It offers namespaced fastjson_* APIs, full compatibility with json_last_error, and up to six‑fold speed gains in encoding, decoding, and validation. The article also details installation steps, supported flags, memory trade‑offs, and benchmark results.

FastJSON · JSON · PHP
7 min read
AI Architecture Path
May 24, 2026 · Artificial Intelligence

How agentmemory Fixes Claude Code Forgetting and Slashes Token Usage by 92%

The article explains how the open‑source agentmemory system solves common AI‑coding assistant pain points—session forgetfulness, repetitive context feeding, and high token costs—by providing automatic, cross‑tool persistent memory, hybrid retrieval, and a zero‑dependency deployment that reduces token consumption by 92% while offering detailed benchmarks and configuration guides.

AI agent · MCP · agentmemory
15 min read
SuanNi
May 22, 2026 · Artificial Intelligence

Why Qwen3.7-Max Is Sending Overseas Developers Into a Frenzy

Qwen3.7-Max demonstrates product‑level long‑task autonomy with 35 hours of uninterrupted operation, 1,158 tool calls, and kernel‑level optimizations, while outperforming Gemini 3.5‑Flash, Claude Opus, and GPT‑5.5 across a wide range of benchmarks, cost‑effectiveness, and real‑world agent scenarios.

AI · Agent · Kernel Optimization
11 min read
Machine Learning Algorithms & Natural Language Processing
May 22, 2026 · Artificial Intelligence

ESI‑Bench: The ImageNet‑Style Benchmark for Embodied Spatial Intelligence

ESI‑Bench, introduced by Fei‑Fei Li's team, transforms the observer into an active agent to evaluate embodied spatial intelligence across 10 task categories and 3,081 instances, revealing that perception is not the bottleneck, action strategies are critical, imperfect 3D reconstructions can hurt performance, and current models suffer from action blindness and metacognitive deficits compared with humans.

Embodied AI · action blindness · benchmark
11 min read
Data Party THU
May 22, 2026 · Artificial Intelligence

First Survey of Agent Harnesses: What Powers Agents Beyond the Model?

The article surveys recent research on Agent Harness engineering, showing that real‑world agent instability stems from system‑level factors beyond model capability, introduces the seven‑layer ETCLOVG architecture, presents benchmark gains from harness tweaks, maps open‑source projects to the framework, and outlines five key open research directions.

AI · Agent Harness · ETCLOVG
12 min read
Meituan Technology Team
May 22, 2026 · Artificial Intelligence

From High-Fidelity to Real-World Use: LongCat Video Avatar 1.5 Open‑Source Release

LongCat Video Avatar 1.5 is now open‑source, delivering commercial‑grade lip sync, physical realism, long‑video stability, multi‑person interaction and 15× faster inference through Whisper‑large audio encoding, DMD 8‑step distillation and LoRA adapters, and it outperforms leading closed‑source models in extensive human‑rated benchmarks.

AI · LongCat-Video-Avatar · Video Generation
9 min read
SuanNi
May 20, 2026 · Artificial Intelligence

Why Harness Is the Future of AI Agents: Insights from CMU, Yale, and Amazon

The article argues that an AI agent’s performance now hinges on its surrounding Harness rather than on the model itself. It presents the ETCLOVG seven‑layer architecture, benchmark gains of up to ten‑fold, and a roadmap of engineering stages evolving from prompt design to context design to harness design.

AI agents · Context Management · ETCLOVG
13 min read
IT Services Circle
May 20, 2026 · Artificial Intelligence

Google I/O 2026 Unveils Gemini Omni and Gemini 3.5 Flash – A Leap in Multimodal AI

At Google I/O 2026 the company introduced Gemini Omni and Gemini 3.5 Flash. Gemini Omni is a truly multimodal model that can ingest any combination of text, image, audio or video and generate high‑quality content. Gemini 3.5 Flash outperforms Gemini 3.1 Pro across major benchmarks while delivering roughly four times faster token throughput. Google also announced the Antigravity 2.0 agent platform and the Gemini Spark personal AI assistant.

AI Generation · Agent Platform · Gemini
13 min read
Machine Heart
May 20, 2026 · Artificial Intelligence

Qwen3.7-Max Sets New Agent Benchmarks – China’s New Model King

Alibaba’s Qwen3.7‑Max model tops multiple Arena leaderboards, achieves SOTA scores in programming, reasoning, and multilingual benchmarks, runs a 35‑hour autonomous coding task on a custom AI chip with 10× speedup, and demonstrates end‑to‑end desktop app creation and web‑search agents, illustrating a rapid monthly model‑iteration strategy.

AI Chip · Agent · Alibaba
13 min read
Java Backend Technology
May 20, 2026 · Artificial Intelligence

Claude Code vs Codex: 10× Cost, 4× Speed – A Deep Comparative Review

The article provides a data‑driven comparison between Anthropic's Claude Code and OpenAI's Codex, covering benchmark scores (SWE‑bench, Terminal‑Bench), blind‑test code‑quality results, token consumption, real‑world cost scenarios, ecosystem integration (MCP), and community feedback to help teams choose the right AI coding agent for their workflow.

AI coding agents · Claude Code · Codex
14 min read
AI Insight Log
May 19, 2026 · Artificial Intelligence

Gemini 3.5 Flash Launches with 4× Speed, Beats Gemini 3.1 Pro in Coding Benchmarks

Google unveiled Gemini 3.5 Flash at I/O 2026. It claims roughly four times faster token output than comparable frontier models at half the price, along with benchmark results that surpass its own Gemini 3.1 Pro in coding, agent, and multimodal tasks. Google also acknowledges trade‑offs in deep reasoning and long‑context performance.

AI · Agent · Antigravity
12 min read
SuanNi
May 19, 2026 · Artificial Intelligence

Is Google Search Obsolete? How AnySearch Builds AI‑Era Search Infrastructure

AnySearch launches a unified API that aggregates 22 professional data sources for AI agents. It uses intent classification and RRF (Reciprocal Rank Fusion) to cut token usage by up to 70% and to improve both accuracy and latency relative to Parallel and Brave, while offering architecture‑level privacy protections.

AI Search · RRF · benchmark
9 min read
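
Reciprocal Rank Fusion, the fusion step named in the summary above, merges several ranked result lists by giving each document a score equal to the sum of 1/(k + rank) over every list it appears in. The sketch below implements that textbook formula. It assumes the commonly used k = 60 and 1‑based ranks. The source names and document IDs are hypothetical, and the summary does not describe AnySearch's actual weighting or parameters.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: dict[str, list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists with RRF: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: defaultdict[str, float] = defaultdict(float)
    for results in rankings.values():
        # Ranks are 1-based: the top result of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from three data sources.
rankings = {
    "web": ["doc_a", "doc_b", "doc_c"],
    "papers": ["doc_b", "doc_d"],
    "news": ["doc_c", "doc_b", "doc_a"],
}

for doc_id, score in reciprocal_rank_fusion(rankings):
    print(f"{doc_id}: {score:.4f}")
```

RRF uses only rank positions, not raw relevance scores. That means results from heterogeneous sources can be merged without calibrating their scoring scales against each other, which makes it a common choice when combining many search APIs.
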
PaperAgent
May 19, 2026 · Artificial Intelligence

Why Long-Term Memory Needs Vision: How MemEye Evaluates Multimodal Agent Recall

MemEye is a multimodal memory benchmark that tests agents across eight real‑world scenarios, measuring visual evidence granularity and reasoning depth, and reveals that captions fall short for fine‑grained visual recall, highlighting the need for true visual memory in long‑term AI agents.

AI agents · MemEye · benchmark
4 min read
Machine Heart
May 19, 2026 · Artificial Intelligence

HyperEyes: Parallel Multimodal Search Agents Move from Deep to Wide for Efficiency

HyperEyes introduces a unified‑location‑as‑search (UGS) action space, parallel data synthesis, and a dual‑granularity efficiency‑aware RL framework that enable multimodal agents to perform simultaneous multi‑target retrieval, dramatically reducing interaction rounds while improving accuracy and cost‑efficiency across benchmark evaluations.

Agent · Efficiency · benchmark
9 min read
Machine Heart
May 18, 2026 · Artificial Intelligence

JiuwenSwarm Launches Coordination Engineering for the ‘Beekeeping’ Era of AI Agents

openJiuwen’s open‑source JiuwenSwarm implements Coordination Engineering—a full‑stack system comprising Agent Swarm, Swarm Skills, a Skills Hub and self‑evolution—enabling autonomous multi‑agent collaboration, demonstrated by medical, coding, video and game case studies and achieving a 94.2% PinchBench score with 34.8% token savings.

AI agents · Coordination Engineering · JiuwenSwarm
13 min read
AIWalker
May 17, 2026 · Artificial Intelligence

From Image Captioning to Detective‑Style Perception: Pixel‑Searcher Beats Closed‑Source Models

Pixel‑Searcher introduces an agentic search‑driven visual perception framework that integrates web‑based evidence with pixel‑level grounding, and the new WebEyes benchmark demonstrates its superiority over existing open‑ and closed‑source multimodal models across localization, segmentation, and VQA tasks.

Multimodal · Pixel-Searcher · WebEyes
16 min read
Machine Heart
May 16, 2026 · Artificial Intelligence

Why Robots Need World Models: A Joint Survey from Leading Institutions

This article surveys recent advances in robot world models, explaining why predictive models are essential for embodied intelligence, how they integrate with Vision‑Language‑Action systems, the various architectural approaches, benchmark trends, and the remaining challenges for reliable deployment.

Simulation · Survey · Vision-Language-Action
14 min read
Machine Heart
May 16, 2026 · Artificial Intelligence

Embodied AI Breakthrough: Beijing Humanoid’s Pelican‑Unify 1.0 Tops WorldArena and Wins Dual Crown

The article details how Beijing Humanoid’s Pelican‑Unify 1.0 model achieved top scores on WorldArena—including a 66.03 overall rating and 98.12% 3D accuracy—by unifying perception, reasoning, imagination and action in a single latent space, marking a milestone for model‑based end‑to‑end embodied intelligence.

Embodied AI · Multimodal Learning · Pelican-Unify
17 min read
AI Engineering
May 16, 2026 · Backend Development

Cut 92% of Claude Code Tool Calls for Large Codebases with CodeGraph

CodeGraph builds a semantic knowledge graph of a codebase so Claude Code can query the graph instead of scanning files, reducing tool calls by an average of 92% and speeding up exploration by 71% across multiple large, multi‑language projects.

AI code assistance · Claude Code · CodeGraph
6 min read
Machine Learning Algorithms & Natural Language Processing
May 15, 2026 · Artificial Intelligence

ClawMark: A Living‑World Benchmark for Multi‑Turn, Multi‑Day, Multimodal Coworker Agents

The ClawMark benchmark introduces 100 multi‑turn, multi‑day tasks across 13 professional scenarios and five stateful sandbox services, evaluating seven cutting‑edge agent systems with a top weighted score of 75.8 but only a 20% strict success rate, highlighting the difficulty of end‑to‑end collaborative agent performance.

LLM · agent performance · benchmark
4 min read
PaperAgent
May 15, 2026 · Artificial Intelligence

How a 0.6B Model Beats GPT‑5.2 at Agent Privacy – Introducing MemPrivacy

The article analyzes the long‑standing privacy dilemma of cloud‑based agents, presents MemPrivacy’s three‑stage de‑identification framework and four‑level privacy taxonomy, details its two‑phase training with the MemPrivacy‑Bench dataset, and shows benchmark results where a 0.6B model outperforms GPT‑5.2 while keeping latency under 0.5 seconds.

Agent · MemPrivacy · benchmark
11 min read
Machine Heart
May 15, 2026 · Artificial Intelligence

When AI Knows Too Much: How MemPrivacy Secures Agent Memory

MemPrivacy introduces a reversible, fine‑grained privacy layer for edge‑cloud agents, outperforming OpenAI's privacy‑filter by over 50% F1 while keeping system utility loss under 2%, thus enabling agents to remain useful without exposing raw sensitive data.

AI · Agent Memory · F1
16 min read
Machine Heart
May 14, 2026 · Artificial Intelligence

How SenseNova U1’s Native Unified Architecture Lets a Small Model Beat Larger Ones

SenseNova U1 introduces the NEO‑Unify native unified architecture that eliminates separate vision encoders and VAEs, enabling simultaneous multimodal understanding, reasoning, and generation, and achieves state‑of‑the‑art benchmark scores that surpass larger proprietary models across vision‑language, reasoning, and generation tasks.

Model architecture · NEO-Unify · Open Source
19 min read
Xiaomi Tech
May 13, 2026 · Artificial Intelligence

Xiaomi OneVL: A Breakthrough Open‑Source Model for Fast, Accurate Autonomous Driving

Xiaomi unveils OneVL, an open‑source stepwise latent language‑vision reasoning framework that unifies VLA, world‑model and latent inference, delivering higher accuracy than explicit CoT and inference speed comparable to answer‑only models, with SOTA benchmark results across multiple autonomous‑driving tests.

Autonomous Driving · OneVL · Open Source
8 min read
SuanNi
May 13, 2026 · Artificial Intelligence

How MiniCPM-V 4.6 Achieves Lightning‑Fast Multimodal AI on Smartphones (Open‑Source)

MiniCPM-V 4.6 combines a SigLIP2 visual encoder with a Qwen3.5 LLM, cuts FLOPs by over 50%, lowers token cost up to 43×, scores 13 on the Artificial Analysis Intelligence Index, and runs with 75 ms first‑token latency on 3136×3136 images across iOS, Android and HarmonyOS, all with fully open‑source code and extensive quantization support.

MiniCPM-V · Open Source · benchmark
6 min read
AI Engineering
May 13, 2026 · Artificial Intelligence

First End‑to‑End Voice Agent Benchmark Shows Grok Leads with 52% Real‑World Success Rate

Artificial Analysis released the τ‑Voice benchmark, testing speech‑to‑speech agents across 278 real‑world customer‑service scenarios, and found the top‑performing Grok Voice Think Fast 1.0 achieves only a 52.1% task‑completion rate while average dialogue lengths stay under seven minutes.

Grok Voice · benchmark · speech-to-speech
7 min read
Bighead's Algorithm Notes
May 11, 2026 · Artificial Intelligence

Analyzing CN‑Buzz2Portfolio: A Chinese Market Dataset for LLM‑Driven Macro and Sector Asset Allocation

This article reviews the CN‑Buzz2Portfolio benchmark, which maps daily Chinese hot‑news streams to macro‑ and industry‑level ETF allocations, introduces a three‑stage CPA pipeline for evaluating large language models as autonomous financial agents, and reports extensive experiments on nine state‑of‑the‑art LLMs across two rolling market periods.

CN-Buzz2Portfolio · CPA framework · LLM
18 min read
Machine Heart
May 11, 2026 · Artificial Intelligence

Why Visual Perception Limits STEM Large Models and How CodePercept Breaks the Barrier

The authors demonstrate that visual perception, not reasoning, is the primary bottleneck for STEM multimodal large language models, introduce the CodePercept paradigm and the ICC-1M dataset, and show that code‑driven perception dramatically improves performance, surpassing much larger models on new benchmarks.

CVPR2026 · CodePercept · STEM
9 min read
Geek Labs
May 11, 2026 · Artificial Intelligence

Train a 64M LLM from Scratch in 2 Hours for $3 and Master LLM Systems

This article introduces two open‑source projects—MiniMind, which lets you train a 64M‑parameter LLM in about two hours for under $3, and Happy‑LLM, a systematic tutorial that explains LLM theory and practice—detailing their features, training pipelines, benchmarks, data, and how they complement each other for comprehensive LLM learning.

AI · Happy-LLM · LLM
7 min read
Machine Learning Algorithms & Natural Language Processing
May 9, 2026 · Artificial Intelligence

AI Code‑Generation Benchmarks Show Zero Pass Rate for GPT, Claude, and Gemini

A new benchmark called ProgramBench challenges top‑tier LLMs to rebuild 200 real‑world software projects from scratch, revealing that GPT‑5.4, Claude Opus, and Gemini all achieve a 0% full‑pass score while exposing design flaws, language‑choice biases, and rampant cheating when network access is allowed.

AI Code Generation · ProgramBench · benchmark
11 min read
Machine Heart
May 9, 2026 · Artificial Intelligence

BARD-VL Achieves New SOTA for Multimodal Diffusion Models via Autoregressive‑Diffusion Bridge

The BARD-VL framework bridges pretrained autoregressive vision‑language models to diffusion‑based VLMs, preserving or surpassing original performance while boosting decoding throughput up to three times, through progressive block merging, stage‑wise diffusion distillation, and engineering optimizations validated on multiple benchmarks.

BARD-VL · Efficiency · Multimodal
9 min read
Architects' Tech Alliance
May 7, 2026 · Artificial Intelligence

Huawei Ascend AI Chip Detailed Specs Comparison (2025‑2028 Roadmap)

The article analyzes Huawei's Ascend AI chip evolution from the 910C baseline through the 950 series' low‑precision FP8/FP4 breakthrough to the 960/970 generation’s 8 PFLOPS performance, highlighting architectural innovations, memory and interconnect upgrades, scenario‑specific models, and a cost advantage over competing solutions.

AI Chip · Ascend · FP8
6 min read
Machine Heart
May 7, 2026 · Artificial Intelligence

How TACO Lets CLI Agents Self‑Evolve to Drop Useless Context

TACO is a plug‑and‑play, training‑free framework that lets terminal‑based autonomous agents automatically learn compression rules to filter low‑value output while preserving critical decision cues, achieving higher task success rates and better token efficiency across multiple terminal‑related benchmarks.

Context Compression · LLM · Self‑Evolving Rules
14 min read
Bighead's Algorithm Notes
May 6, 2026 · Artificial Intelligence

AI‑Trader: Real‑time Benchmark for Autonomous LLM Agents in Financial Markets

The AI‑Trader benchmark evaluates large language model agents in fully autonomous, real‑time US stock, Chinese A‑share, and cryptocurrency markets, revealing that general intelligence alone does not guarantee profitable trading, while robust risk‑control mechanisms drive cross‑market stability and excess returns.

LLM · Risk Management · autonomous agents
17 min read
Data Party THU
May 6, 2026 · Artificial Intelligence

When AI Seems Obedient, Hidden Alignment Risks Surface

The AutoControl Arena framework offers a high‑fidelity, low‑cost automated safety evaluation for frontier AI agents, exposing a dramatic rise in alignment‑illusion risk—from 21.7% under low pressure to 54.5% under high pressure—through a logic‑narrative decoupling design, a 70‑scenario benchmark, and validation against real‑world red‑team environments.

AI safety · AutoControl Arena · alignment illusion
0 likes · 9 min read
Machine Heart
May 6, 2026 · Artificial Intelligence

Luma’s Uni‑1.1 API Launch: Third‑Place Ranking and Text Rendering Near GPT‑Image 2

Luma released the Uni‑1.1 image‑generation API, which ranks third on the Arena blind‑test leaderboard, offers sub‑half‑price per image, and demonstrates production‑grade capabilities such as multi‑reference fusion, multi‑turn editing, and a decoder‑only transformer that jointly models text and image tokens.

API pricing · Luma · benchmark
0 likes · 13 min read
Machine Heart
May 6, 2026 · Artificial Intelligence

PromptEcho: Leveraging Frozen Multimodal Models for High‑Quality Text‑to‑Image Rewards Without Labels

PromptEcho computes a continuous reward for text‑to‑image generation by measuring how well a frozen vision‑language model can reconstruct the original prompt from the generated image, eliminating the need for annotated data or a trained reward model and outperforming prior methods across multiple benchmarks.

PromptEcho · Reward Modeling · benchmark
0 likes · 10 min read
Old Zhang's AI Learning
May 5, 2026 · Artificial Intelligence

Claude Enters Finance: 10 Open‑Source Financial Agent Templates Unveiled

Anthropic released ten ready‑to‑use financial agent templates, open‑sourced on GitHub, that bundle skills, data connectors and sub‑agents. They run natively in Excel, PowerPoint, Word and Outlook, support two deployment modes, integrate dozens of market data sources, and score 64.37% on the Vals AI finance benchmark. The article weighs their strengths against notable limitations.

Agent Templates · Claude · Data Connectors
0 likes · 14 min read
Machine Heart
May 4, 2026 · Artificial Intelligence

Thought-Based Gloss-Free Sign Language Translation Model for the Deaf (ACL 2026)

The paper introduces SignThought, a gloss‑free sign language translation framework that combines a latent chain‑of‑thought reasoning layer with a plan‑then‑ground decoder. It achieves state‑of‑the‑art BLEU‑4 and ROUGE scores on five benchmarks, and the authors release a large new Hong Kong sign language dataset.

ACL 2026 · Gloss-Free · Latent Thoughts
0 likes · 11 min read
Old Zhang's AI Learning
May 4, 2026 · Artificial Intelligence

How DeepSeek’s New Paper Redefines Multimodal Reasoning with Visual Primitives

DeepSeek’s new paper "Thinking with Visual Primitives" tackles the reference gap in multimodal models by using points and boxes as reasoning units. It reports up to 8× better token efficiency and outscores GPT‑5.4, Claude‑Sonnet‑4.6 and Gemini‑3‑Flash on counting, spatial reasoning, and maze navigation benchmarks.

DeepSeek · Multimodal · Visual Primitives
0 likes · 10 min read
Machine Learning Algorithms & Natural Language Processing
May 3, 2026 · Artificial Intelligence

Do Large Language Models Wear Two Faces? New Study Reveals Alignment Illusion Under Pressure

A joint study from Fudan University, Shanghai Chuangzhi, and Oxford introduces AutoControl Arena, a safety‑evaluation framework built on logic‑narrative decoupling. It shows that AI agents’ risk rates jump from 21.7% to 54.5% under high pressure and temptation, and it provides an open‑source benchmark for systematic safety evaluation.

AI safety · AutoControl Arena · alignment illusion
0 likes · 9 min read
PaperAgent
May 2, 2026 · Artificial Intelligence

Can Harnesses Self‑Evolve? Fudan & Peking University’s Agentic Harness Engineering Breakthrough

The paper introduces Agentic Harness Engineering (AHE), an approach built on three observability pillars. Ten rounds of evolution raise a coding agent’s pass@1 on Terminal‑Bench 2 from 69.7% to 77.0%, outperforming Codex‑CLI. The evolved harness also transfers zero‑shot to SWE‑bench and to multiple model families.

Ablation Study · Coding Agent · Harness Engineering
0 likes · 11 min read
Node.js Tech Stack
May 2, 2026 · Databases

Why Drizzle ORM on Bun Beats Go’s Latency – Even Evan You Uses It

Drizzle ORM v1.0.0‑rc.1 introduces JIT row mappers and Effect v4 integration. In the cited benchmark, Bun + Drizzle reaches 7.3 ms latency versus 18.1 ms for Go, at the cost of higher CPU usage. The article analyzes the feature changes, performance trade‑offs, and migration considerations.

Bun · Drizzle ORM · Go
0 likes · 10 min read
PaperAgent
Apr 30, 2026 · Artificial Intelligence

DeepSeek Unveils Open‑Source Multimodal Model: “Thinking with Visual Primitives”

DeepSeek releases an open‑source multimodal LLM built on a visual‑primitive framework that elevates bounding boxes and points to the token level. The design targets the reference gap and achieves extreme KV‑cache compression. The model outperforms GPT‑5.4, Claude‑Sonnet‑4.6 and Gemini‑3‑Flash on counting, spatial reasoning, maze navigation and path‑tracing benchmarks.

DeepSeek · LLM · Multimodal
0 likes · 13 min read
ArcThink
Apr 29, 2026 · Artificial Intelligence

DeepSeek V4 Vision Mode: Architecture Breakdown and Benchmark vs Top Models

The article dissects DeepSeek V4's newly released vision mode, explains its mounted visual‑language architecture, compares its multimodal capabilities and costs against GPT‑5.5, Gemini 3 and Claude Opus 4.7, and outlines a roadmap from image understanding to native multimodal AI.

AI · DeepSeek · Multimodal
0 likes · 15 min read
SuanNi
Apr 29, 2026 · Artificial Intelligence

SenseNova U1: Open‑Source SOTA Multimodal Model Unifies Vision and Language

SenseNova U1, an open‑source multimodal model from SenseTime, replaces traditional visual encoders and VAEs with a native NEO‑Unify architecture. It delivers near‑lossless pixel‑level fidelity and combines a Mixture‑of‑Transformers backbone with unified training objectives. The model achieves SOTA performance on diverse vision‑language benchmarks and runs efficiently on multiple Chinese chips.

Multimodal · NEO-Unify · Open Source
0 likes · 9 min read
Lao Guo's Learning Space
Apr 29, 2026 · Artificial Intelligence

What’s Inside GPT‑6’s ‘Spud’ Release? 5‑6 Trillion Parameters and 2 M Token Context

OpenAI’s GPT‑6 ‘Spud’ launch packs 5‑6 trillion parameters with MoE sparsity, a unified Symphony multimodal architecture, dual System‑1/System‑2 reasoning, a 2‑million‑token context window, and competitive benchmark results. It keeps prices unchanged and introduces autonomous agent capabilities that reshape AI workflows.

Agent · GPT-6 · Large Language Model
0 likes · 15 min read
Old Meng AI Explorer
Apr 28, 2026 · Artificial Intelligence

One Subscription for All Top Chinese Coding Models – Save Hundreds Monthly

Volcengine’s Coding Plan bundles six leading Chinese AI coding models into a single subscription with seamless IDE integration and automatic model selection. Performance is comparable to calling each model’s API individually, while monthly costs drop from hundreds of yuan to under ten. The article backs this with benchmark tests and provides a four‑step setup guide.

AI coding · Chinese models · Coding Plan
0 likes · 10 min read
PaperAgent
Apr 28, 2026 · Artificial Intelligence

MiniCPM‑o 4.5 Achieves Full‑Duplex Multimodal AI That DeepSeek V4 Missed

MiniCPM‑o 4.5 is presented as the world’s first end‑to‑end full‑duplex multimodal model. It has 9 billion parameters, is powered by the Omni‑Flow framework, and runs on a single consumer‑grade GPU with 12 GB of memory. Its benchmark results match or surpass Gemini 2.5 Flash, and it ships with open‑source demos, APIs, and a Windows/macOS installer.

AI · MiniCPM-o · Multimodal
0 likes · 13 min read
Machine Heart
Apr 28, 2026 · Artificial Intelligence

How SenseNova U1’s Unified Architecture Eliminates Multimodal ‘Frankenstein’ Models

SenseNova U1 Lite, an 8‑billion‑parameter open‑source multimodal model from SenseTime, uses the NEO‑Unify architecture to fuse vision and language in a single space, achieving commercial‑grade efficiency and benchmark scores that surpass much larger proprietary models while supporting continuous image‑text generation.

NEO-Unify · SenseNova U1 · benchmark
0 likes · 12 min read
DataFunSummit
Apr 28, 2026 · Big Data

Dynamic Table: A Next‑Generation Data Processing Architecture Powered by Incremental Computing

The article examines the limitations of traditional batch and stream processing, explains how Hologres Dynamic Table combines declarative freshness settings with stateful incremental computation to bridge the gap between low‑cost batch jobs and low‑latency streaming, and presents benchmark results and real‑world case studies.

Dynamic Table · Hologres · benchmark
0 likes · 13 min read
Machine Heart
Apr 28, 2026 · Artificial Intelligence

World’s First Open‑Source Large Model for Real‑World Medical Video Understanding

The article introduces uAI‑NEXUS‑MedVLM, billed as the world’s first open‑source large model for real‑world medical video understanding. It is built on the MedVidBench dataset and the MedGRPO training framework, which together address data scarcity, evaluation gaps, and task specialization in surgical video AI. The model achieves state‑of‑the‑art performance across eight benchmark tasks.

AI in Surgery · Large Language Model · MedVidBench
0 likes · 18 min read
DataFunTalk
Apr 28, 2026 · Artificial Intelligence

Manifold AI’s WorldScape 0.2 Tops WorldArena: How MoE Drives Superior Physics and 3D Understanding

Manifold AI’s WorldScape 0.2 achieved the highest overall score on WorldArena, a benchmark for embodied world models, outperforming giants such as Google and Nvidia. It excels in comprehensive perception, physics compliance, and 3D accuracy while using only about 10% of the parameters of competing models. The article credits these results to a newly introduced MoE architecture.

Embodied AI · MoE · Scaling Law
0 likes · 9 min read
ZhiKe AI
Apr 28, 2026 · Artificial Intelligence

Demystifying DeepSeek‑V4 Benchmarks with Real‑World Data

This article breaks down DeepSeek‑V4's six core capability categories—knowledge, reasoning, programming, math, long‑context, and agent—showing how each benchmark works, presenting concrete scores that place V4 first or second against leading models, and explaining the hidden efficiency gains that make V4 up to 13.7× cheaper to run.

AI evaluation · DeepSeek V4 · Efficiency
0 likes · 14 min read
SuanNi
Apr 27, 2026 · Artificial Intelligence

How MIT’s RUBICON Cuts AI Agent Costs by 90% While Achieving 100% Accuracy

The paper shows that conventional LLM agents fail on real‑world enterprise data because of chaotic data sources. The RUBICON architecture instead uses a minimal Agentic Query Language that lets users direct data retrieval. It reaches 100% accuracy with a much cheaper model at dramatically lower token and monetary cost.

Agentic Query Language · LLM agents · RUBICON
0 likes · 11 min read
ArcThink
Apr 27, 2026 · Artificial Intelligence

GPT-5.5 Deep Dive: What Makes This True Generational Leap Stand Out?

GPT‑5.5, the first fully retrained base model since GPT‑4.5, delivers an 11.7‑point jump on ARC‑AGI‑2, dramatic long‑context gains, and wins 9 of 10 shared benchmarks against GPT‑5.4, while a side‑by‑side comparison with Claude Opus 4.7 shows each model excelling in different domains, heralding a multi‑polar era for frontier AI.

Agent · Claude Opus 4.7 · GPT-5.5
0 likes · 16 min read
SuanNi
Apr 26, 2026 · Artificial Intelligence

Xiaomi’s MiMo‑V2.5: Halving Cost, Doubling Efficiency with a New Multimodal LLM

Xiaomi unveiled the MiMo‑V2.5 and MiMo‑V2.5‑Pro large language models, highlighting up to 50% lower API cost, multimodal perception, token‑efficiency gains, benchmark superiority over Claude Opus 4.6 and GPT‑5.4, and real‑world demos that built a full compiler in 4.3 hours and a video‑editing web app in 11.5 hours.

AI agent · Large Language Model · MiMo V2.5
0 likes · 6 min read
Machine Learning Algorithms & Natural Language Processing
Apr 25, 2026 · Artificial Intelligence

Why DeepSeek‑V4 Took Twice as Long: Inside the Training‑Stability Challenges and Engineering Hacks

The DeepSeek‑V4 technical report attributes the model’s doubled training time to massive token and parameter scaling and to severe training‑stability issues in MoE layers. It details the engineering fixes, including Anticipatory Routing, SwiGLU Clamping, specialist expert training, and a custom sandbox cluster. The report also exposes high hallucination rates despite impressive benchmark performance.

DeepSeek V4 · Generative Reward Model · LLM
0 likes · 12 min read
JavaEdge
Apr 25, 2026 · Artificial Intelligence

GPT-5.5 Launch: A New Agentic AI for Real‑World Work

OpenAI’s GPT‑5.5, now available via API, claims agentic capabilities that let it autonomously plan, execute, and verify complex programming, knowledge‑work, and scientific tasks while matching GPT‑5.4 latency, delivering higher benchmark scores, stronger security controls, and a tiered pricing model.

GPT-5.5 · agentic AI · benchmark
0 likes · 12 min read
SuanNi
Apr 25, 2026 · Artificial Intelligence

Is Tencent’s Large Model Lagging? How Hy3‑preview Propels It Into the Top Tier

Tencent’s AI division rebuilt its Hunyuan model from the ground up and released the 295‑billion‑parameter Hy3‑preview, which uses a fast‑slow hybrid expert architecture. Extensive internal benchmarks show strong performance on scientific, coding, and real‑world tasks, marking a decisive leap into the leading LLM tier.

Agent · Hy3-preview · Large Language Model
0 likes · 7 min read
Architect's Tech Stack
Apr 25, 2026 · Artificial Intelligence

DeepSeek‑V4 Launch: 1.6 T Parameters, 1 M‑Token Context, Programming Skills Lead Open‑Source Rankings

DeepSeek released the V4 series—V4‑Pro (1.6 T total, 49 B active) and V4‑Flash (284 B total, 13 B active)—featuring three architectural upgrades, three inference modes, mixed‑precision FP4/FP8 weights, and benchmark results that place its programming ability at the top of open‑source models while supporting a million‑token context window.

AI Architecture · DeepSeek · Large Language Model
0 likes · 5 min read
ArcThink
Apr 25, 2026 · Artificial Intelligence

DeepSeek V4’s Silent Launch: 1.6 T Parameters, Triple Innovation, and Redefined Accessibility

DeepSeek V4 quietly debuted as a 1.6‑trillion‑parameter MoE model that introduces CSA+HCA compressed attention, mHC manifold‑constrained hyperconnections, and the Muon optimizer. It reaches a 1M‑token context at a quarter of V3’s cost and posts top Codeforces and LiveCodeBench scores. It is priced at one‑seventh of Opus, ships under the MIT open‑source license, and supports both Ascend NPUs and NVIDIA GPUs.

DeepSeek V4 · Large Language Model · Manifold-constrained Hyperconnection
0 likes · 17 min read
Java Web Project
Apr 25, 2026 · Artificial Intelligence

Why GPT-5.5’s Silent Release Signals Real Engineering Power

OpenAI’s April 23, 2026 launch of GPT-5.5 delivers record‑high scores on SWE‑Bench Pro (58.6%) and Terminal‑Bench 2.0 (82.7%). It adds persistent multi‑file context, dynamic reasoning time, and better token efficiency, and real‑world case studies show substantial productivity gains across engineering teams.

AI Engineering · Codex · GPT-5.5
0 likes · 13 min read
Shuge Unlimited
Apr 25, 2026 · Artificial Intelligence

DeepSeek V4: Comeback? 1.6 T Params, Million‑Token Context, Open‑Source Matches Closed‑Source

DeepSeek V4, released shortly after GPT‑5.5, comprises two models: V4‑Pro (1.6 T parameters) and V4‑Flash (284 B parameters). They introduce a hybrid CSA/HCA attention architecture that makes million‑token contexts efficient, with dramatic FLOPs and KV‑cache savings. Both deliver competitive programming and agent benchmark results and come with a disruptive pricing strategy. The article also covers the training‑stability tricks disclosed in the report and weighs strengths against remaining gaps.

DeepSeek V4 · Hybrid Attention · LLM
0 likes · 25 min read
PaperAgent
Apr 24, 2026 · Artificial Intelligence

DeepSeek‑V4 Open‑Sources Its Million‑Token Architecture and Calls Out Claude Opus 4.6

DeepSeek‑V4’s open‑source report reveals a hybrid CSA/HCA attention design, manifold‑constrained residuals, and the Muon optimizer, which together cut per‑token FLOPs to 27% and the KV cache to 10% at 1M tokens. Benchmark results show it outperforms Claude Opus 4.6 on most tasks but still lags on complex instruction following and multi‑turn dialogue.

AI Architecture · Claude Opus · DeepSeek V4
0 likes · 11 min read
ZhiKe AI
Apr 24, 2026 · Artificial Intelligence

DeepSeek V4 Launch: Open‑Source Model Beats Closed‑Source Leaders in Coding & Math, 1.6 T Params, 1 M Context

DeepSeek V4, released today, offers two open‑source models (Pro and Flash) with up to 1.6 T parameters and a 1‑million‑token context. Their programming and mathematics benchmark scores surpass those of the three major closed‑source competitors, while API prices are a fraction of theirs.

API · DeepSeek · V4
0 likes · 7 min read
SuanNi
Apr 24, 2026 · Artificial Intelligence

Why GPT‑5.5 Beats Opus 4.7 and Sets a New Global SOTA

OpenAI’s newly released GPT‑5.5, marketed as a “next‑generation AI for real work,” outperforms competitors across coding, knowledge‑work, and scientific research benchmarks—achieving 82.7% accuracy on Terminal‑Bench 2.0, 58.6% on SWE‑Bench Pro, 84.9% on GDPval, and 98.0% on Tau2‑bench Telecom—while offering higher token efficiency and new pricing tiers.

AI agent · GPT-5.5 · OpenAI
0 likes · 11 min read
SuanNi
Apr 24, 2026 · Artificial Intelligence

DeepSeek-V4 Launches: Million-Token Context Becomes Affordable for All

DeepSeek-V4 introduces a hybrid attention architecture, manifold‑constrained hyper‑connections, and the Muon optimizer to cut inference FLOPs and KV cache dramatically, enabling open‑source models to handle million‑token contexts at a fraction of the cost of leading closed‑source services while matching their performance.

DeepSeek V4 · Hybrid Attention · Large Language Model
0 likes · 7 min read
AI Large Model Application Practice
Apr 24, 2026 · Artificial Intelligence

DeepSeek V4 Preview: Key Technical Highlights, Benchmarks, and Pricing

The DeepSeek‑V4 preview details two model variants—Pro and Flash—with trillion‑scale parameters, outlines benchmark scores that surpass or match leading overseas models across code generation, real‑world fixes, engineering tasks, and world knowledge, and explains core innovations, pricing, API endpoints, and open‑source licensing.

API · DeepSeek · Hybrid Attention
0 likes · 7 min read