Tagged articles

396 articles

Page 2 of 4

Jan 30, 2026 · Artificial Intelligence

Can Rendering Thought Chains as Images Speed Up LLM Reasoning?

This article introduces Render‑of‑Thought (RoT), a novel paradigm that compresses chain‑of‑thought reasoning into visual embeddings using frozen vision encoders, achieving 3‑4× token reduction, faster inference, and improved interpretability while requiring minimal pre‑training.

Inference Optimization · Latent Space · Multimodal

0 likes · 12 min read

Old Zhang's AI Learning

Jan 28, 2026 · Artificial Intelligence

RAG-Anything: A Universal RAG Framework for PDFs, Office Docs, and Images

RAG-Anything is an open-source, end-to-end multimodal RAG framework that ingests PDFs, Office files, images, and scientific papers, parses them with high fidelity using MinerU, builds a multimodal knowledge graph, and enables hybrid retrieval. The article also notes its resource and dependency considerations.

AI · Document Processing · Knowledge Base

0 likes · 7 min read

Baobao Algorithm Notes

Jan 27, 2026 · Artificial Intelligence

Putting Kimi K2.5 and Kimi Code to the Test: Real‑World AI Agent Benchmarks

This article presents a hands‑on evaluation of Kimi K2.5 and its open‑source Kimi Code agent across a series of hard‑core prompts, covering Python API generation, cost‑optimized routing, multimodal ECharts visualisation, massive‑scale SQL optimisation, web‑search‑driven research, MoE explanation and video‑to‑code workflows.

AI agent · Kimi · Large Language Model

0 likes · 9 min read

AI Frontier Lectures

Jan 25, 2026 · Artificial Intelligence

Turning Chain‑of‑Thought into Images: The Render‑of‑Thought Breakthrough

Render‑of‑Thought (RoT) proposes a novel visual‑latent reasoning framework that compresses textual chain‑of‑thought into dense image embeddings, achieving faster inference, better interpretability, and plug‑and‑play integration without costly pre‑training, as demonstrated on multiple math and logic benchmarks.

Chain-of-Thought · Implicit CoT · Inference Acceleration

0 likes · 11 min read

PaperAgent

Jan 25, 2026 · Industry Insights

Top 10 Chinese Large Models to Watch: Features, Benchmarks, and Download Links

This roundup highlights ten cutting‑edge Chinese AI models—including Qwen3‑TTS, LongCat‑Flash‑Thinking‑2601, GLM‑4.7‑Flash, STEP3‑VL‑10B, Baichuan‑M3, and Youtu‑LLM—detailing their multilingual capabilities, architecture innovations, performance claims, and providing direct repository links for researchers and developers.

AI research · Chinese AI · Multimodal

0 likes · 7 min read

Old Meng AI Explorer

Jan 25, 2026 · Artificial Intelligence

How AI Can Control Your Desktop: Inside the Open‑Source TuriX‑CUA Agent

TuriX‑CUA is an open‑source AI desktop agent that captures screen content, uses multimodal large models to decide actions, and automatically moves the mouse or types, offering cross‑platform support, multi‑model architecture, and detailed setup instructions for Windows and macOS.

AI automation · Multimodal · Open Source

0 likes · 7 min read

PaperAgent

Jan 23, 2026 · Artificial Intelligence

Top AAAI 2026 Papers: New Vision‑Language‑Action Model, LLM2CLIP and More

AAAI 2026 in Singapore drew 23,680 submissions; this roundup highlights breakthrough papers such as ReconVLA’s reconstructive vision‑language‑action model, LLM2CLIP’s language‑enhanced multimodal representation, a sheaflet‑based hypergraph neural network design, advances in description logic modeling, and a novel causal discovery method for dynamical systems.

AAAI 2026 · AI Papers · LLM

0 likes · 7 min read

Xiaomi Tech

Jan 21, 2026 · Artificial Intelligence

Xiaomi’s AI Breakthroughs Earn Spot at ICASSP 2026

Xiaomi announced that a suite of its AI research papers was accepted to ICASSP 2026. The papers cover a large‑scale audio‑text dataset, a federated learning framework for domain and class generalization, a dual‑encoder music evaluation model, a cross‑domain audio‑text pre‑training system, a one‑step video‑to‑audio synthesis method, a training‑free frame‑selection technique for long‑video understanding, and a unified multimodal retrieval architecture. The article details their methodologies, benchmark results, and potential impact across audio, vision, and multimodal AI applications.

AI · ICASSP 2026 · Multimodal

0 likes · 14 min read

StarRocks

Jan 15, 2026 · Artificial Intelligence

How AI‑First Lakehouse Redefines Data Platforms for Multimodal Analytics

The article outlines the evolution from traditional OLAP to an AI‑first Lakehouse, detailing unified multimodal storage, CPU/GPU heterogeneous scheduling, native vector search, in‑database AI inference, agent‑centric execution, and self‑evolving platform capabilities that together reshape modern data analytics.

AI · Agent Architecture · Big Data

0 likes · 11 min read

AI Algorithm Path

Jan 11, 2026 · Artificial Intelligence

How Vector Embeddings Enable AI to Understand Anything

This article explains the principle of vector embeddings, shows how they turn words, images, audio and other data into dense numeric vectors, compares them with one‑hot encoding, describes static and contextual models, training methods, similarity metrics, and a wide range of real‑world AI applications.

AI fundamentals · Multimodal · RAG

0 likes · 15 min read

IT Services Circle

Jan 11, 2026 · Artificial Intelligence

Can AI Really Control Your Computer? Inside TuriX‑CUA Open‑Source Agent

TuriX‑CUA is an open‑source Python‑based AI agent that equips artificial intelligence with visual perception and mouse‑keyboard control, enabling it to see the screen, reason with multimodal models, and act autonomously across macOS and Windows, with a multi‑model architecture, MCP support, and step‑by‑step setup instructions.

AI · CrossPlatform · Multimodal

0 likes · 7 min read

Amap Tech

Jan 8, 2026 · Artificial Intelligence

How AI Powers Fancy Video Generation for Real‑World POI Scenes

This article details the AI techniques behind Gaode's "Street Ranking" project, explaining the Fancy video concept, the dual training and production pipelines, and the use of SFT, reinforcement learning, MoE‑LoRA, distribution‑matching distillation, and quality‑filtering to achieve 25× faster generation with high aesthetic fidelity.

AI video generation · Multimodal · distillation

0 likes · 24 min read

IT Services Circle

Jan 2, 2026 · Artificial Intelligence

Top Open‑Source NotebookLM Alternatives: AI‑Powered Docs, Podcasts & Research Tools

This article surveys the most popular open‑source replacements for Google NotebookLM, detailing each project's star count, supported AI models, multimodal input capabilities, Docker deployment options, and unique features such as multi‑speaker podcast generation, semantic search, and collaborative knowledge‑base integration.

AI · Docker · LLM

0 likes · 8 min read

Alibaba Cloud Developer

Dec 22, 2025 · Artificial Intelligence

Turning Real‑Time Hotspot Detection into AI‑Powered E‑Commerce Recommendations

Traditional recommendation systems lag behind fast‑moving external trends, missing the freshness and surprise users crave. This article details an end‑to‑end AI pipeline that perceives, understands, and reacts to hotspots within hours, automatically generating high‑quality product selections and continuously optimizing through feedback loops.

AI recommendation · LLM · Multimodal

0 likes · 25 min read

AI Insight Log

Dec 17, 2025 · Artificial Intelligence

Google Unveils Gemini 3 Flash: Free, Lightning‑Fast, and Outperforms Its Predecessor

Google released Gemini 3 Flash without warning, offering Pro‑level intelligence at Flash speed. It costs $0.50 per million input tokens and $3 per million output tokens, runs three times faster than Gemini 2.5 Pro, and surpasses it on benchmarks such as GPQA Diamond (90.4%), SWE‑bench (78.0%) and MMMU‑Pro (81.2%). It is freely accessible to all users and developers via the Gemini app, AI Studio, or API.

Gemini 3 Flash · Google AI · Large Language Model

0 likes · 5 min read

Alimama Tech

Dec 17, 2025 · Artificial Intelligence

How MUSE Revives Long-Tail User Behaviors with Multimodal Search for Lifelong Interest Modeling

MUSE introduces a multimodal search‑based framework that reorganizes tens of thousands of dormant user actions into a unified visual‑semantic interest graph, enabling CTR models to leverage ultra‑long behavior sequences with a 12.6% lift in online performance.

CTR · Multimodal · Taobao-MM

0 likes · 19 min read

PaperAgent

Dec 16, 2025 · Artificial Intelligence

Open Notebook: The Open‑Source, Privacy‑First Alternative to Google NotebookLM

Open Notebook is a fully local, open‑source AI notebook that rivals Google NotebookLM by supporting over 16 LLM providers, handling multimodal content, and enabling advanced multi‑speaker podcast generation while giving users complete data sovereignty and flexible deployment options.

AI Notebook · LLM · Multimodal

0 likes · 4 min read

PaperAgent

Dec 13, 2025 · Artificial Intelligence

Why Unified Multimodal Models Are the Key to Next‑Gen AGI – A Deep Survey

This article surveys the latest research on Unified Multimodal Foundations (UFM), explaining why integrating understanding and generation across text, image, video, and audio is essential for AGI, and detailing modeling paradigms, encoding/decoding strategies, training pipelines, benchmarks, and real‑world applications.

AI research · Encoding · Multimodal

0 likes · 10 min read

Baidu Tech Salon

Dec 8, 2025 · Artificial Intelligence

How Baidu’s HuiBosheng AI Live Platform Generates Super‑Human Scripts and Real‑Time Interaction

The article details Baidu HuiBosheng's end‑to‑end AI live‑streaming platform, covering merchant workflow, multimodal product understanding, style‑aware script generation, reinforcement‑learning‑driven smart control, voice and avatar cloning, and a data‑flywheel that continuously improves model performance, illustrated with real‑world GMV results.

AI · Data Flywheel · Multimodal

0 likes · 20 min read

Kuaishou Tech

Dec 4, 2025 · Artificial Intelligence

Can a Tree‑Reasoned Model Master Video Emotion Understanding?

The paper introduces VidEmo, a multimodal video foundation model that uses a two‑stage emotion‑clue‑guided reasoning framework and a large emotion‑centric dataset (Emo‑CFG) to achieve state‑of‑the‑art performance on facial attribute, expression, and fine‑grained emotion tasks, surpassing Gemini 2.0.

AI · Foundation Model · Multimodal

0 likes · 15 min read

Alimama Tech

Dec 3, 2025 · Artificial Intelligence

How LORE Transforms E‑Commerce Search Relevance with Generative AI

The article details the development and deployment of LORE, a large generative model that reshapes e‑commerce search relevance by combining knowledge injection, chain‑of‑thought reasoning, and multimodal alignment, achieving simultaneous improvements in user experience and revenue metrics.

Multimodal · System Architecture · chain-of-thought

0 likes · 15 min read

ShiZhen AI

Dec 1, 2025 · Artificial Intelligence

AI Comic Episode 3: What Exactly Is a Token?

This episode explains that a token is the smallest text chunk an LLM processes—ranging from characters to subwords—covers why subword tokenization avoids vocabulary explosion, compares token counts across languages, describes the computational cost of sequential generation, and introduces visual tokens for multimodal models.

AI fundamentals · Multimodal · large language models

0 likes · 7 min read

Fighter's World

Nov 28, 2025 · Artificial Intelligence

Is Gemini 3 Pro Google’s New Starting Point? An In‑Depth Technical and Market Analysis

The article examines Google’s Gemini 3 Pro launch, highlighting its full‑stack vertical integration, advanced System 2 reasoning, dynamic compute budgeting, native multimodal architecture, TPU cost advantages, the Antigravity IDE platform, generative UI capabilities, and the strategic implications for Google’s AI ecosystem and competitive positioning.

AI infrastructure · Antigravity · Gemini 3 Pro

0 likes · 32 min read

Kuaishou Tech

Nov 28, 2025 · Artificial Intelligence

Keye-VL-671B-A37B Leads Vision, Video, and Math Benchmarks

Kwai has open‑sourced its new flagship multimodal model Keye‑VL‑671B‑A37B, which upgrades visual perception, cross‑modal alignment and complex reasoning, achieving top scores on image, video, and mathematical reasoning benchmarks while detailing its architecture, three‑stage pre‑training, post‑training strategies, and future multimodal agent plans.

Large Language Model · Multimodal · Open Source

0 likes · 10 min read

AI Large Model Application Practice

Nov 24, 2025 · Artificial Intelligence

How to Turn Text into an AI‑Powered PPT Video: A Step‑by‑Step Guide

This article breaks down the end‑to‑end engineering pipeline that converts a knowledge source such as a URL or PDF into a narrated PPT‑style video, detailing six core stages—from knowledge extraction and script generation to image creation, voice synthesis, and final video stitching—while highlighting practical model choices, prompt design, and stability tricks.

Artificial Intelligence · LLM · Multimodal

0 likes · 16 min read

Amap Tech

Nov 19, 2025 · Artificial Intelligence

How Gaode’s Spacetime‑GR Model Boosts POI Recommendation with AI‑Powered SFT and DPO

Gaode transforms its map app into a dynamic, AI‑driven “living map” by fine‑tuning the large Spacetime‑GR model through embedding‑based and generative ranking SFT, DPO alignment, and multimodal augmentation, achieving significant offline CTR‑AUC improvements and online CTR gains in POI recommendation.

AI recommendation · DPO · Multimodal

0 likes · 12 min read

Data Party THU

Nov 5, 2025 · Artificial Intelligence

How VLM‑FO1 Turns Vision‑Language Models into Precise Perception Machines

VLM‑FO1 introduces a generate‑plus‑reference paradigm that replaces coordinate generation with region token referencing, adding plug‑in modules such as a proposal generator, a hybrid fine‑grained encoder, and a region‑language connector to give any pretrained visual language model accurate, fine‑grained perception while preserving its original capabilities.

AI research · Multimodal · Plug-and-Play

0 likes · 15 min read

AsiaInfo Technology: New Tech Exploration

Nov 4, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Video Analysis

This article examines the evolution from single‑frame video analysis to multimodal large models, detailing their architecture, optimization techniques, experimental validation on edge devices, and practical scenarios, while highlighting current limitations and future directions for AI‑driven video understanding.

AI · Large Models · Multimodal

0 likes · 20 min read

Meituan Technology Team

Nov 3, 2025 · Artificial Intelligence

LongCat-Flash-Omni: 560B Open‑Source Multimodal Model with Real‑Time Interaction

LongCat-Flash-Omni, the latest open‑source model from Meituan, combines a 560 billion‑parameter architecture, efficient multimodal perception and speech reconstruction modules, and a progressive training strategy to deliver real‑time audio‑video interaction and state‑of‑the‑art performance across text, image, audio, and video tasks.

AI · Large Language Model · Multimodal

0 likes · 9 min read

AI Info Trend

Nov 3, 2025 · Industry Insights

2025 Q3 AI Landscape: Key Players, Model Trends, and Hardware Shifts

Artificial Analysis’s Q3 2025 AI report reveals a rapidly accelerating industry across the entire stack, with US and Chinese labs neck‑and‑neck, fierce competition among OpenAI, Google, Anthropic, xAI, DeepSeek and Alibaba, cost‑efficient models, booming multimodal agents, and a hardware race led by NVIDIA’s Blackwell accelerators.

2025 · AI · Hardware

0 likes · 12 min read

Huawei Cloud Developer Alliance

Nov 3, 2025 · Artificial Intelligence

How AI Agents Are Revolutionizing Technology: The New Engine of Innovation

This article explores the rise of AI agents—from their definition as intelligent digital assistants powered by large language models to their evolution through planning, memory, and tool use—highlighting real‑world applications, core technical mechanisms, code implementations, and future trends such as autonomy, multimodal fusion, standardization, and safety considerations.

AI agent · Large Language Model · Multimodal

0 likes · 24 min read

DataFunSummit

Oct 30, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Document Processing and OCR

This article explores how the explosion of unstructured data exposes the limits of traditional OCR and shows how emerging multimodal large language models provide end‑to‑end document understanding, reduce pipeline complexity, cut training costs, enable hybrid retrieval‑augmented generation, and drive real‑world industry deployments.

AI · Document Processing · Large Language Model

0 likes · 28 min read

BirdNest Tech Talk

Oct 30, 2025 · Artificial Intelligence

How to Build Multimodal Prompts with LangChain: A Step‑by‑Step Guide

Learn how LangChain enables multimodal interactions by preparing inputs, constructing prompts, invoking models like GPT‑4o, and processing responses, with a complete example that demonstrates image‑question answering, code walkthrough, environment setup, and key considerations for API keys and image URLs.

LLM · LangChain · Multimodal

0 likes · 9 min read

AntTech

Oct 28, 2025 · Artificial Intelligence

Ming-Flash-Omni-Preview: 103B Open-Source Multimodal Model Excelling in Image, Video, and Speech

Introducing Ming‑Flash‑Omni‑Preview, a 103‑billion‑parameter open‑source multimodal model built on a sparse MoE architecture that delivers state‑of‑the‑art performance in controllable image generation, streaming video understanding, and context‑aware speech recognition, surpassing prior models on GenEval and GEdit benchmarks.

Large Language Model · Multimodal · Sparse MoE

0 likes · 8 min read

HyperAI Super Neural

Oct 24, 2025 · Artificial Intelligence

Google Teams Unite on Earth AI: Boosting Geospatial Reasoning by 64% with Three Core Data Types

Google Research, X, and Cloud teams introduced Earth AI, an interoperable GeoAI model family that fuses image, population, and environmental data via a Gemini‑driven reasoning Agent, achieving state‑of‑the‑art performance and a 64% reasoning boost over Gemini 2.5 Pro while enabling non‑experts to run real‑time cross‑domain analyses.

Agent · Earth AI · Geospatial AI

0 likes · 16 min read

Network Intelligence Research Center (NIRC)

Oct 17, 2025 · Artificial Intelligence

LucaOne: Unified Nucleic Acid & Protein Language Model Surpasses Other Models

Researchers present LucaOne, a Transformer‑based foundation model that unifies DNA/RNA and protein sequences using a 39‑token vocabulary, rotary positional encoding, and molecule‑type embeddings, and demonstrate through extensive multi‑task benchmarks that it outperforms domain‑specific models across seven biological tasks.

DNA · Foundation Model · Multimodal

0 likes · 5 min read

Wuming AI

Oct 16, 2025 · Industry Insights

Top AI Model Releases This Week: NanoChat, Ring‑1T, Qwen3‑VL, Veo 3.1, Claude Haiku 4.5

This week’s AI landscape saw Karpathy’s NanoChat open‑sourcing an 8‑K‑line ChatGPT replica, Ant Group unveiling a trillion‑parameter Ring‑1T model, Alibaba releasing the 4B/8B Qwen3‑VL visual language models that outperform Gemini 2.5 Flash Lite and GPT‑5 Nano, Google launching Veo 3.1 for high‑fidelity video generation, and Anthropic announcing Claude Haiku 4.5, a faster and cheaper LLM that excels on SWE‑bench benchmarks.

AI models · Multimodal · Open Source

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Oct 15, 2025 · Big Data

How MaxCompute’s AI‑Native Data Warehouse Redefines Big Data for the Generative AI Era

The article details Alibaba Cloud's MaxCompute transformation into an AI‑native data warehouse, highlighting its serverless elasticity, multimodal data management, unified model lifecycle, AI Function integration, and new distributed Python engine that together address the bursty, high‑complexity data and compute challenges of the generative AI era.

AI-native · Multimodal · distributed Python

0 likes · 11 min read

Bighead's Algorithm Notes

Oct 12, 2025 · Artificial Intelligence

Trading-R1: Open-Source LLM Framework for Explainable Financial Trading

This article reviews Trading‑R1, an open‑source LLM reasoning framework that integrates multimodal financial data, three‑stage supervised fine‑tuning, and reinforcement learning to generate structured investment arguments and risk‑adjusted trade decisions, achieving superior Sharpe ratio and drawdown performance on real‑world stock and ETF tests.

Financial Trading · LLM · Multimodal

0 likes · 11 min read

DataFunSummit

Oct 12, 2025 · Artificial Intelligence

How Kuaishou Uses Large Models to Supercharge Ad Targeting with COPE and LEARN

This article reviews Kuaishou's two‑year exploration of multimodal large‑model techniques for advertising, outlining challenges in content‑domain ad estimation, the COPE unified product representation framework, and the LEARN LLM knowledge‑transfer approach that together improve ad system performance.

Advertising · Kuaishou · LLM

0 likes · 6 min read

DataFunSummit

Oct 10, 2025 · Artificial Intelligence

How Kuaishou Boosted Ad Performance with Multimodal Large Models

This article reviews Kuaishou's two‑year exploration of large‑model techniques in advertising, outlining challenges in content‑domain ad estimation, introducing the COPE unified content representation framework and the LEARN LLM knowledge‑transfer approach, and showing how these innovations delivered tangible business gains.

AI · Advertising · Knowledge Transfer

0 likes · 5 min read

DataFunSummit

Oct 9, 2025 · Artificial Intelligence

How Kuaishou Boosted Ad Performance with Multimodal Large Models: COPE & LEARN

This article reviews Kuaishou's two‑year exploration of multimodal large‑model techniques for advertising, detailing challenges of fragmented user behavior, the COPE unified product representation framework, and the LEARN LLM knowledge‑transfer approach that together delivered measurable business gains.

AI · Advertising · Knowledge Transfer

0 likes · 6 min read

Data Party THU

Oct 9, 2025 · Artificial Intelligence

Can One Model Master All Audio‑Visual Tasks? Introducing Crab’s Unified Approach

This article presents Crab, a unified audio‑visual scene understanding model that leverages a novel explicit‑cooperation learning paradigm, introduces the AV‑UIE dataset with explicit reasoning steps, and demonstrates superior performance across temporal, spatial, pixel‑level, and spatio‑temporal tasks through extensive experiments and ablations.

LoRA · Multimodal · audio-visual

0 likes · 12 min read

DataFunSummit

Oct 8, 2025 · Artificial Intelligence

How Kuaishou Boosted Ad Performance with Multimodal LLMs and the COPE Framework

This article reviews Kuaishou’s two‑year exploration of large‑model techniques in advertising, detailing the content‑domain estimation challenges, how multimodal and LLM approaches improve full‑domain behavior utilization and external knowledge integration, and introducing the COPE product‑content representation framework and the LEARN LLM knowledge‑transfer system.

Advertising · Kuaishou · LLM

0 likes · 7 min read

Data Party THU

Oct 6, 2025 · Artificial Intelligence

How OneCAT Redefines Multimodal AI with a Decoder‑Only Architecture

OneCAT introduces a unified decoder‑only transformer that eliminates separate visual encoders, employs a modality‑specific MoE, integrates multi‑scale visual generation, and achieves state‑of‑the‑art performance and efficiency across multimodal understanding, text‑to‑image synthesis, and image editing tasks.

AI model · Efficiency · Multimodal

0 likes · 14 min read

DataFunSummit

Sep 30, 2025 · Artificial Intelligence

How Kuaishou Uses Large Models to Boost Ad Performance with COPE and LEARN

This article outlines Kuaishou's two‑year exploration of large‑model techniques in advertising, detailing challenges of sparse cross‑domain data, the COPE unified product representation framework, and the LEARN LLM knowledge‑transfer approach that together improve ad system effectiveness.

COPE · LLM · Large Models

0 likes · 6 min read

DataFunSummit

Sep 30, 2025 · Artificial Intelligence

How Kuaishou Boosted Ad Performance with Multimodal LLMs: COPE & LEARN Frameworks

Over the past two years, Kuaishou has leveraged multimodal large‑model techniques to overcome sparse advertising data, integrating full‑domain user behavior and external knowledge via the COPE unified product representation framework and the LEARN LLM knowledge‑transfer system, achieving measurable business gains.

Kuaishou · LLM · Large Models

0 likes · 6 min read

Tech Freedom Circle

Sep 27, 2025 · Artificial Intelligence

What Is an AI‑Native Application and How to Design One?

The article explains the concept of AI‑native applications, distinguishes them from AI‑plugin extensions, outlines their core principles such as model‑first design, data flywheel, event‑driven agents, multimodal semantics, continuous learning, and provides a seven‑step practical guide with code examples for building an AI‑native app.

AI assistant · AI-native · Data Flywheel

0 likes · 23 min read

Bighead's Algorithm Notes

Sep 26, 2025 · Artificial Intelligence

Paper Summaries: Recent AI-Driven Finance Research (Sep 20‑26, 2025)

This article presents concise English summaries of four recent arXiv papers that explore AI-driven trading frameworks, dual‑view risk‑relation identification from 10‑K filings, multimodal language models for financial forecasting, and credit‑spread prediction enhanced by non‑financial data, highlighting their methods, datasets, and performance results.

AI · Credit Spreads · Finance

0 likes · 9 min read

AIWalker

Sep 23, 2025 · Artificial Intelligence

Manzano: A Small 3B Multimodal Model That Unifies Image Understanding and Generation with SOTA Performance

Manzano introduces a hybrid vision tokenizer and a three‑stage training recipe that let a 3‑billion‑parameter multimodal LLM achieve state‑of‑the‑art results on both image‑understanding benchmarks and text‑to‑image generation, while scaling smoothly to larger sizes and minimizing task conflict.

AI research · Large Language Model · Manzano

0 likes · 25 min read

HyperAI Super Neural

Sep 12, 2025 · Industry Insights

Why Apple and ASML Back Mistral AI: Inside Its Tech, Funding and Controversies

The article examines Mistral AI's rapid rise, from its Paris founding and record‑breaking seed round to ASML's €1.3 billion Series C stake and Apple acquisition rumors, detailing its lightweight and multimodal models, open‑source strategy, product ecosystem, and the plagiarism and geopolitical debates that shape its valuation.

AI models · ASML · Apple

0 likes · 15 min read

Architect's Journey

Sep 12, 2025 · Artificial Intelligence

Coze vs Yuanqi: In‑Depth Comparison of Two AI Agent Platforms – Who Will Own the Future?

This article provides a detailed side‑by‑side analysis of ByteDance's Coze and Tencent's Yuanqi, examining their features, performance, ecosystem integration, free‑tier limits, target users, and future prospects to help developers and enterprises choose the platform that best fits their needs.

AI agents · Coze · Ecosystem Integration

0 likes · 13 min read

DataFunTalk

Sep 11, 2025 · Artificial Intelligence

How AI Dressing and Multimodal Models Transform Home Service Experiences

During a pre-conference interview, AI expert Wang Mingzhong details how multimodal AI dressing, video résumé creation, short‑video templates, and interactive digital‑human live streams are technically realized for 58 Home Services, highlighting model training, workflow optimization, and future fusion of template‑based and agent‑driven video generation.

AI · Domestic Service · Multimodal

0 likes · 11 min read

DataFunTalk

Sep 7, 2025 · Artificial Intelligence

Why Apple’s FastVLM Is 85× Faster and What It Means for On‑Device AI

Apple recently open‑sourced its FastVLM and MobileCLIP2 models, showcasing a multimodal vision‑language system that runs up to 85 times faster than comparable models, enabling real‑time AI on iPhones and other edge devices and illustrating Apple’s broader “Plan B” strategy of small on‑device models.

Apple · FastVLM · Multimodal

0 likes · 15 min read

ByteDance Data Platform

Sep 3, 2025 · Artificial Intelligence

Revolutionizing AI Data Lakes: How Daft + Lance Enable Multimodal Processing

This article explores how the LAS team's AI‑driven data lake solution, built on Daft for lake computing and Lance for lake storage, tackles the emerging challenges of multimodal data handling, offering faster I/O, heterogeneous CPU‑GPU scheduling, and seamless integration for AI workloads.

AI · Daft · Distributed computing

0 likes · 11 min read

Fun with Large Models

Sep 3, 2025 · Artificial Intelligence

Mastering Multimodal Fine-Tuning of Large Models: Interview‑Ready Techniques

The article explains how to fine‑tune large multimodal models by focusing on the projection layer, optionally using LoRA to adapt the language model, and highlights data alignment, common applications, and the added difficulty of modality alignment for interview preparation.

Large Models · Multimodal · fine-tuning

0 likes · 6 min read

AI2ML AI to Machine Learning

Sep 2, 2025 · Artificial Intelligence

Why Enterprise Large‑Model Digitalization Is So Hard: Key Challenges and Capabilities

The article analyzes why enterprise‑wide large‑model AI projects face steep hurdles, outlining required human capabilities, historical labor shifts, current hot technologies such as RAG, Agent, CoT and multimodal, their limits, a three‑stage implementation roadmap, typical case pitfalls, and the key success factors for sustainable digital transformation.

Agent · CoT · Multimodal

0 likes · 15 min read

IT Services Circle

Sep 1, 2025 · Artificial Intelligence

Unlocking Gemini CLI: Extending Google’s AI Agent for Any LLM

This article introduces the rapidly popular Gemini CLI, compares it with Claude Code, explains its core features, demonstrates coding, multimodal, and MCP use cases, and details the author’s Easy LLM CLI fork that enables custom model integration, flexible configuration, and direct code embedding for developers.

AI agent · Gemini CLI · LLM integration

0 likes · 15 min read

Data Party THU

Aug 31, 2025 · Artificial Intelligence

How Google’s Gemini 2.5 “Nano Banana” Redefines Image Generation and Editing

Google’s Gemini 2.5 Flash Image model, codenamed “Nano Banana”, dramatically improves visual quality, natural editing, identity consistency, instruction following, and generation speed, while researchers discuss its new metrics, interleaved generation capabilities, comparisons with Imagen, and future directions for smarter, more factual multimodal AI.

AI model · Gemini · Multimodal

0 likes · 23 min read

DataFunTalk

Aug 26, 2025 · Artificial Intelligence

Exploring Cutting-Edge AI & Knowledge Graph Applications: A Curated Resource Guide

This resource guide presents a curated list of cutting‑edge topics—including multimodal GraphRAG, knowledge‑graph‑driven large‑model applications in finance, traditional Chinese medicine, automotive manufacturing, and knowledge‑management trends—offering insights into AI‑powered knowledge services, and invites readers to scan the QR code to download the full e‑book.

AI · Large Language Model · Multimodal

0 likes · 2 min read

Qborfy AI

Aug 25, 2025 · Artificial Intelligence

Unlocking AI Understanding: A Deep Dive into Embeddings and Their Real‑World Applications

This article explains how embeddings transform discrete items such as text, images, or user actions into continuous vectors, walks through the step‑by‑step workflow—from tokenization to normalization—highlights core properties, compares popular models, and showcases practical use cases in e‑commerce intent filtering and medical image retrieval, all backed by concrete examples and code.

AI fundamentals · Embeddings · Multimodal

0 likes · 7 min read
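
As a rough illustration of the normalization and similarity steps the summary above mentions, here is a minimal sketch using only the Python standard library. The three-dimensional vectors and the catalog names are made-up toy values, not output from any embedding model discussed in the article; real embeddings typically have hundreds of dimensions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity is the dot product of the two normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


# Toy "embeddings" (hypothetical numbers chosen for illustration only).
query = [0.9, 0.1, 0.0]
catalog = {
    "running shoes": [0.8, 0.2, 0.1],
    "chest x-ray": [0.0, 0.1, 0.9],
}

# Rank catalog items by similarity to the query, most similar first.
ranked = sorted(
    catalog,
    key=lambda name: cosine_similarity(query, catalog[name]),
    reverse=True,
)
print(ranked)
```

Normalizing first means ranking by dot product and ranking by cosine similarity give the same order, which is one reason many retrieval pipelines store unit-length vectors.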

Kuaishou Tech

Aug 23, 2025 · Artificial Intelligence

How Thyme Enables Models to Think Beyond Images with Code‑Driven Multimodal Reasoning

The Kwai Keye team presents Thyme, a novel multimodal reasoning framework that lets large language models generate and safely execute Python code for image manipulation and complex calculations, achieving significant performance gains over existing vision‑language models across perception, reasoning, and hallucination‑reduction benchmarks.

AI research · Large Language Model · Multimodal

0 likes · 12 min read

Instant Consumer Technology Team

Aug 21, 2025 · Artificial Intelligence

How Data‑Juicer Supercharges LLM Training with High‑Quality Multimodal Data

Data‑Juicer is an open‑source, one‑stop multimodal data processing system that provides fine‑grained operators, scalable pipelines, and ready‑made recipes to deliver high‑quality, diverse, and model‑friendly data for large language model pre‑training, fine‑tuning, and multimodal applications.

AI · LLM · Multimodal

0 likes · 22 min read

AI Info Trend

Aug 19, 2025 · Industry Insights

What’s Driving the AI Revolution in 2025? Key Trends and Insights

The 2025 H1 AI Core Achievements and Trends report reveals how agents are reshaping productivity, models are gaining inference power and becoming smaller, reinforcement learning is overtaking pre‑training, and industry competition is intensifying, with China and the US narrowing their technology gap.

AI · China · Industry Insights

0 likes · 10 min read

AI Info Trend

Aug 13, 2025 · Industry Insights

How China’s AI Labs Are Closing the Gap with the US in Q2 2025

The Q2 2025 State of AI report analyzes Chinese AI labs’ rapid progress across language models, open‑source weights, and multimodal generation, showing a shrinking performance gap with US leaders, detailed benchmark scores, ecosystem classifications, and emerging competitive dynamics.

AI · China · Multimodal

0 likes · 10 min read

Data Party THU

Aug 11, 2025 · Artificial Intelligence

Can Hidden Signals Reveal Multimodal Model Jailbreaks? Introducing HiddenDetect

This article presents HiddenDetect, a training‑free method that leverages refusal‑semantic vectors and layer‑wise activation analysis to detect jailbreak attempts in multimodal large language models, revealing distinct safety signals across text and image modalities and demonstrating strong performance on several LVLM benchmarks.

LVLM · Multimodal · activation analysis

0 likes · 7 min read

Volcano Engine Developer Services

Aug 6, 2025 · Artificial Intelligence

How VeOmni Revolutionizes Multimodal Model Training with 40% Speed Gains

VeOmni, ByteDance’s open‑source unified multimodal training framework, tackles fragmented training pipelines by integrating LoRA fine‑tuning, FSDP, Ulysses, and Expert Parallel, delivering up to 40% higher throughput, up to 55% memory savings, and streamlined one‑click deployment for LLM, VLM, and video models.

AI · Framework · Multimodal

0 likes · 14 min read

AI Info Trend

Aug 4, 2025 · Industry Insights

How AI Agents and Small Models Are Redefining Productivity in 2025 H1

The report analyzes first‑half‑2025 AI breakthroughs, covering the rise of general‑purpose agents, rapid inference improvements, small‑model proliferation, reinforcement‑learning compute dominance, evolving transformer architectures, and shifting industry dynamics, offering actionable insights for researchers, product leaders, and decision‑makers.

AI · Agent · Large Language Model

0 likes · 9 min read

AIWalker

Aug 4, 2025 · Artificial Intelligence

Can Lumina-mGPT 2.0 Replace Diffusion Models? A Deep Dive into Its Autoregressive Power

Lumina-mGPT 2.0 is a decoder‑only autoregressive image model trained from scratch that rivals diffusion systems like DALL·E 3 in quality while offering unified multimodal tokenization, flexible multi‑task generation, and several inference‑speed tricks, yet it still faces licensing, scaling and sampling‑time challenges.

AI model analysis · Inference Optimization · Lumina-mGPT

0 likes · 22 min read

DataFunTalk

Jul 21, 2025 · Artificial Intelligence

Top AI & Knowledge Graph Resources: A Curated Guide to Emerging Research

This article presents a curated list of cutting‑edge resources covering multimodal GraphRAG, knowledge‑graph‑driven large‑model applications in finance, healthcare, automotive, and more, offering insights into the evolving synergy between AI and knowledge graphs.

AI · Multimodal · knowledge graph

0 likes · 2 min read

DataFunSummit

Jul 16, 2025 · Artificial Intelligence

Unlock AI Frontiers: Multimodal GraphRAG, Knowledge Graphs, and Large‑Model Innovations

This resource compiles a curated list of cutting‑edge studies and frameworks that combine multimodal GraphRAG, knowledge‑graph techniques, and large‑model AI across sectors such as finance, healthcare, and manufacturing, guiding readers through emerging paradigms and future directions.

AI · Document Intelligence · Large Models

0 likes · 2 min read

DataFunSummit

Jul 14, 2025 · Artificial Intelligence

How AI Agents Transform E‑commerce Content from Production to Optimization

This presentation explores the evolution of AI agents in e‑commerce content creation, detailing the transition from text‑only industrial production (1.0) to multimodal image and video generation (2.0) and finally to quality‑driven optimization and decision‑making (3.0), highlighting technical architectures, challenges, and future directions.

AI · Multimodal · automation

0 likes · 27 min read

Fun with Large Models

Jul 10, 2025 · Artificial Intelligence

Grok 4: The ‘Problem‑Solving Champion’ That Falters in Real‑World Use – Detailed Evaluation

The article reviews Grok 4’s flashy launch and claimed first‑principles advantage, then presents benchmark results—showing strong reasoning, multimodal and agent scores but disappointing coding performance versus DeepSeek‑R1—concluding that the model’s real‑world capabilities fall short of its hype.

Agent · Grok4 · LLM

0 likes · 11 min read

Instant Consumer Technology Team

Jul 10, 2025 · Artificial Intelligence

How LLMs and Vector Search Power Real-Time Icon Recommendations

This article explains a system that combines large language models with multimodal vector retrieval to automatically understand user intent and instantly recommend the most relevant icons, detailing the workflow, semantic vectorization, offline indexing, online inference, and evaluation methods.

CLIP · HNSW · LLM

0 likes · 13 min read

DataFunTalk

Jul 10, 2025 · Artificial Intelligence

Inside Elon Musk’s Grok‑4 Launch: Breakthrough AI Capabilities and Pricing

Elon Musk unveiled Grok‑4, a subscription‑based AI reasoning model that claims near‑human performance on elite exams, showcases unprecedented benchmark scores, multimodal understanding, voice synthesis, and a roadmap of upcoming coding and video generation models, while introducing $30/month and $300/month subscription tiers.

AI model · Grok 4 · Multimodal

0 likes · 6 min read

Kuaishou Tech

Jul 7, 2025 · Artificial Intelligence

8 Kuaishou Papers Spotlighted at ICML 2025: Multimodal AI, Causal Inference and More

Kuaishou has had eight cutting‑edge papers accepted at the International Conference on Machine Learning 2025, covering breakthroughs in multimodal emotion modeling, monotonic probability learning, causal effect generalization, cascade ranking, multimodal LLM alignment, ultra‑low‑rate image compression, and visual autoregressive super‑resolution, with links to each work and accompanying code repositories.

AI · Machine Learning · Multimodal

0 likes · 13 min read

DataFunSummit

Jul 6, 2025 · Artificial Intelligence

AI-Driven Knowledge Graphs: Key Insights from Multimodal GraphRAG Research

This article presents a comprehensive overview of cutting‑edge research on integrating large language models with knowledge graphs, covering multimodal GraphRAG, financial AI solutions, traditional Chinese medicine decision support, and industry‑specific knowledge services, guiding readers through emerging paradigms and practical implementations.

AI · Large Language Model · Multimodal

0 likes · 2 min read

AntTech

Jul 3, 2025 · Artificial Intelligence

How Ant Group’s AI Multimodal Evaluation Transforms Image, Speech, and Video Quality Testing

In a QECon 2025 talk, Ant Group’s AI team detailed a comprehensive multimodal evaluation framework that leverages large‑model metrics, custom pipelines, and benchmark datasets to assess image generation, speech recognition, and video quality, while also contributing to industry standards and academic research.

AI evaluation · Large Models · Multimodal

0 likes · 16 min read

DataFunTalk

Jul 3, 2025 · Artificial Intelligence

How Vivo’s Blue Heart XiaoV Leverages LLMs to Transform Conversational Recommendations

In an interview with Vivo AI engineer Liang Tianan, the article explores the challenges of post‑Q&A recommendation, the integration of large language models into recall, ranking and evaluation pipelines, and the engineering trade‑offs required to deliver high‑quality, diverse suggestions on mobile devices.

LLM · Model Compression · Multimodal

0 likes · 15 min read

DataFunTalk

Jun 29, 2025 · Artificial Intelligence

Large Models Boost Douyin User Experience: Expert Insights

In an interview at the DA Digital Intelligence Conference, ByteDance AI specialist Cai Conghuai explains how large language models, combined with techniques like SFT, DPO, and RAG, are reshaping Douyin's user‑experience signal detection, root‑cause analysis, and evaluation, while outlining future AI‑agent breakthroughs.

AI · DPO · Multimodal

0 likes · 12 min read

Alibaba Cloud Developer

Jun 26, 2025 · Artificial Intelligence

How to Build a Multi‑Dimensional Evaluation Framework for AI‑Powered Data Analysis Platforms

This article outlines the design of a scientific, quantifiable, multi‑dimensional evaluation system for the DataV‑Note intelligent analysis platform, addressing the lack of unified standards and accuracy challenges in AI‑driven data reporting, and proposes concrete metrics, model architecture, and future automation plans.

AI evaluation · Data Analysis · Metrics

0 likes · 13 min read

Open Source Linux

Jun 12, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: The Evolution of Large Language Models (2017‑2025)

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through the rise of BERT, GPT‑3, multimodal models, alignment techniques like RLHF, and finally the cost‑efficient DeepSeek‑R1 in 2025, highlighting key innovations, scaling trends, and real‑world impacts.

AI alignment · Model Scaling · Multimodal

0 likes · 26 min read

AI Algorithm Path

Jun 11, 2025 · Artificial Intelligence

OpenAI's O3‑Pro Model: Deep Reasoning, Pricing, Benchmarks, and Access Guide

The article covers OpenAI's launch of the O3‑Pro multimodal deep‑reasoning model alongside an 80% price cut for O3, describes its training via large‑scale reinforcement learning, compares its capabilities and costs against GPT‑4o, GPT‑4.1 and O3, lists its core specs, limitations, and access methods, and presents benchmark tests that highlight both strengths and weaknesses.

AI · Multimodal · O3-Pro

0 likes · 10 min read

Baidu Tech Salon

Jun 11, 2025 · Artificial Intelligence

Why Baidu’s Wenxin Model Dominates IDC’s 2025 Large Model Evaluation

IDC’s 2025 China foundational large‑model evaluation crowns Baidu’s Wenxin as the top performer, scoring perfect marks in seven of eight criteria and highlighting its superior multimodal, dialogue, and ecosystem capabilities among twelve leading models.

AI · Baidu Wenxin · IDC evaluation

0 likes · 5 min read

Kuaishou Audio & Video Technology

Jun 11, 2025 · Artificial Intelligence

Kuaishou Showcases 12 Cutting-Edge CVPR 2025 Papers on Video Generation and AI

Kuaishou presented twelve peer‑reviewed papers at CVPR 2025 covering video quality assessment, large‑scale video datasets, dynamic 3D avatar reconstruction, 4D scene simulation, controllable video generation, scaling laws for diffusion transformers, multimodal foundations, and more, highlighting the company's leading research in computer vision and AI.

AI research · CVPR2025 · Multimodal

0 likes · 21 min read

DataFunSummit

Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI safety · Large Language Model · LoRA

0 likes · 13 min read

IT Services Circle

Jun 7, 2025 · Artificial Intelligence

Run Powerful Multimodal AI Offline on a 2 GB Android Phone with Google AI Edge Gallery

Google AI Edge Gallery lets a 2 GB RAM Android phone run sophisticated multimodal AI models completely offline, offering image understanding, text generation, and conversational capabilities without any network connection, and provides a quick three‑step setup to start experimenting with on‑device AI.

AI · Android · Google

0 likes · 5 min read

Kuaishou Tech

Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has secured seven papers at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

AI safety · Multimodal · ACL

0 likes · 13 min read

Fighter's World

Jun 2, 2025 · Artificial Intelligence

Why Is Context King for Large Language Models?

This article provides a comprehensive technical analysis of LLM context, covering its definition, types, tokenization, window‑size evolution, diminishing returns, management techniques such as RAG, CoT, memory‑as‑a‑service, and future challenges like multimodal fusion, privacy, and autonomous agent memory.

Agent Memory · Context Management · LLM

0 likes · 48 min read

Baidu MEUX

May 28, 2025 · Artificial Intelligence

Top 10 AI Breakthroughs This Week: New Models, Tools, and Industry Moves

This roundup highlights ten recent AI developments, from Apple's Matrix3D model that creates 3D scenes from photos, to Qwen's Deep Research assistant, Tencent's CodeBuddy 3.0, ByteDance's Seed1.5‑VL, StepFun's open‑source Step1X‑3D, Google's iOS icon refresh, Apple's eye‑tracking scrolling test, Chrome's upcoming Gemini AI assistant, Shanghai's AI Identity Ecosystem Alliance, and Kuaishou's Kling AI 2.0 topping the global video‑generation leaderboard.

3D generation · AI assistants · AI models

0 likes · 5 min read

DataFunTalk

May 23, 2025 · Artificial Intelligence

2025 AI Landscape: Inference Models Dominate, Open‑Source Momentum Accelerates

The 2025 Q1 AI report from Artificial Analysis highlights six major trends—including a thousand‑fold drop in inference cost, the rise of MoE models, the growing parity of Chinese open‑source labs, the emergence of autonomous AI agents, native multimodal capabilities, and the trade‑off between performance, cost, and context windows—painting a picture of a rapidly evolving, increasingly competitive AI ecosystem.

AI · Multimodal · Open Source

0 likes · 11 min read

Baidu Tech Salon

May 21, 2025 · Artificial Intelligence

Baidu AI Day 2024: Wenxin X1 Turbo Sets New Benchmark with Top‑Level Evaluation and Advanced Multimodal Capabilities

At Baidu AI Day in Beijing, the company unveiled the Wenxin 4.5 Turbo and X1 Turbo models, detailing multimodal training breakthroughs, self‑feedback loops, enhanced reasoning and tool‑calling, while the China Academy of Information and Communications Technology awarded X1 Turbo the highest "4+" rating across 24 capability tests, highlighting its leading position in domestic large‑model performance.

Baidu · Multimodal · Wenxin

0 likes · 9 min read

Tencent Technical Engineering

May 19, 2025 · Artificial Intelligence

RAG, Agents, and Multimodal Large Models: Evolution, Challenges, and Future Trends

This article examines the evolution of large model technologies—including Retrieval‑Augmented Generation, AI agents, and multimodal models—detailing their technical foundations, practical challenges, industry applications, and future development trends, offering a comprehensive perspective for AI practitioners and researchers.

AI agent · Knowledge retrieval · Multimodal

0 likes · 14 min read

Bilibili Tech

May 16, 2025 · Artificial Intelligence

How FineVQ Sets New Standards for Fine‑Grained UGC Video Quality Assessment

The article introduces FineVD, the first large‑scale multi‑dimensional UGC video quality dataset, and presents FineVQ, a unified model that predicts quality scores, attributes, and distortion types across six dimensions, achieving state‑of‑the‑art performance on multiple benchmarks and cross‑dataset evaluations.

FineVQ · Multimodal · UGC

0 likes · 9 min read

Network Intelligence Research Center (NIRC)

May 14, 2025 · Artificial Intelligence

Hands‑On CLIP: Implementing Multimodal Vision‑Language Understanding

This article introduces OpenAI’s CLIP multimodal model, explains its architecture and contrastive training, details hardware and installation steps, and demonstrates a hands‑on zero‑shot image classification workflow that achieves 97% confidence on a cat image without any task‑specific fine‑tuning.

CLIP · Multimodal · Python

0 likes · 6 min read
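
To make the zero‑shot scoring step in the summary above concrete, here is a minimal sketch of how CLIP‑style classification turns image–text similarities into label probabilities. It assumes the image and text embeddings are already computed and unit‑normalized; the two‑dimensional vectors are invented toy values, and the `logit_scale` of 100 is an assumed temperature in the range CLIP models are commonly reported to use, not a value taken from the article.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_vec, text_vecs, logit_scale=100.0):
    """Score one image embedding against one text embedding per label.

    Assumes every vector is already unit-normalized, so each dot product
    is a cosine similarity. logit_scale is an assumed temperature.
    """
    sims = [sum(i * t for i, t in zip(image_vec, tv)) for tv in text_vecs]
    return softmax([logit_scale * s for s in sims])


# Toy unit vectors (hypothetical; real CLIP embeddings are much larger).
labels = ["a photo of a cat", "a photo of a dog"]
image_vec = [0.6, 0.8]
text_vecs = [[0.6, 0.8], [0.8, 0.6]]

probs = zero_shot_probs(image_vec, text_vecs)
best = labels[probs.index(max(probs))]
print(best, probs)
```

No classifier is trained here: the label set is just a list of text prompts, which is why the approach works without task‑specific fine‑tuning.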

DevOps

May 13, 2025 · Artificial Intelligence

The Rise of AI Agents: Current Trends, Core Capabilities, and Future Outlook

This article surveys the rapid emergence of AI agents, outlining their projected 2025 breakthrough, market momentum, key frameworks such as Manus and MCP, the four core abilities of perception, planning, tool use, and memory, and the evolving landscape of multimodal and autonomous AI systems.

AI agents · Artificial Intelligence · Memory

0 likes · 11 min read

DataFunSummit

May 13, 2025 · Artificial Intelligence

Integrating Large Language Models and Knowledge Graphs for Financial Applications: Challenges, Solutions, and Future Directions

This talk explores the technical challenges of applying large language models and knowledge graphs in finance, discusses solutions such as RAG enhancements, graph‑guided retrieval, multimodal extensions, and presents future research directions including multimodal graph integration, agentic systems, and decision‑making applications.

AI · Agentic Systems · Finance

0 likes · 33 min read

Alimama Tech

May 12, 2025 · Artificial Intelligence

Universal Recommendation Model (URM): A General Large‑Model Recall System for Advertising

The article presents the Universal Recommendation Model (URM), a large‑language‑model‑based recall framework that integrates world knowledge and e‑commerce expertise through knowledge injection and prompt‑driven alignment, achieving significant offline recall gains and a 3.1% increase in ad consumption while meeting high‑QPS, low‑latency production constraints.

Advertising · Large Language Model · Multimodal

0 likes · 17 min read

AntTech

May 12, 2025 · Industry Insights

How AI Large Models Are Revolutionizing Multimodal Content Safety

An award‑winning joint project by Shanghai Jiao Tong University and Ant Group unveils a multimodal foundation model and advanced detection techniques that dramatically improve AI‑driven content risk governance across massive online services.

AI · Ant Group · Content Safety

0 likes · 3 min read

Alibaba Cloud Developer

May 9, 2025 · Information Security

What’s New in MCP 2025‑03‑26? Deep Dive into OAuth 2.1, Streamable HTTP, and JSON‑RPC Enhancements

The MCP 2025‑03‑26 release introduces mandatory OAuth 2.1 with PKCE, a single‑endpoint Streamable HTTP transport, required JSON‑RPC batch processing, richer tool metadata, structured progress notifications, audio multimodal support, and robust session management, all backed by extensive security hardening and performance gains.

API Security · JSON-RPC · MCP

0 likes · 14 min read
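
For readers unfamiliar with the PKCE step that the summary above says OAuth 2.1 makes mandatory, the following is a minimal sketch of the S256 challenge derivation described in RFC 7636, using only the Python standard library. The function names are hypothetical helpers for illustration; they are not part of MCP or of any SDK, and the sketch omits the surrounding authorization request and token exchange.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes=32):
    """Create a random, URL-safe verifier that the client keeps secret."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier):
    """S256 method: base64url(SHA-256(verifier)) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def server_verifies(verifier, stored_challenge):
    """What an authorization server checks at the token endpoint."""
    return secrets.compare_digest(make_code_challenge(verifier), stored_challenge)


# The client sends the challenge with the authorization request and
# reveals the verifier only when exchanging the code for a token.
verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
print(server_verifies(verifier, challenge))
```

Because only the hash travels with the initial request, an attacker who intercepts the authorization code cannot redeem it without the original verifier.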
