Tag: multimodal


Baidu Tech Salon
Jun 11, 2025 · Artificial Intelligence

Why Baidu’s Wenxin Model Dominates IDC’s 2025 Large Model Evaluation

IDC’s 2025 China foundational large‑model evaluation crowns Baidu’s Wenxin as the top performer, scoring perfect marks in seven of eight criteria and highlighting its superior multimodal, dialogue, and ecosystem capabilities among twelve leading models.

AI · Baidu Wenxin · IDC evaluation
5 min read
Kuaishou Audio & Video Technology
Jun 11, 2025 · Artificial Intelligence

Kuaishou Showcases 12 Cutting-Edge CVPR 2025 Papers on Video Generation and AI

Kuaishou presented twelve peer‑reviewed papers at CVPR 2025 covering video quality assessment, large‑scale video datasets, dynamic 3D avatar reconstruction, 4D scene simulation, controllable video generation, scaling laws for diffusion transformers, multimodal foundations, and more, highlighting the company's leading research in computer vision and AI.

AI research · CVPR2025 · Computer Vision
21 min read
DataFunSummit
Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI safety · LoRA · RLHF
13 min read
IT Services Circle
Jun 7, 2025 · Artificial Intelligence

Run Powerful Multimodal AI Offline on a 2 GB Android Phone with Google AI Edge Gallery

Google AI Edge Gallery lets a 2 GB RAM Android phone run sophisticated multimodal AI models completely offline, offering image understanding, text generation, and conversational capabilities without any network connection, and provides a quick three‑step setup to start experimenting with on‑device AI.

AI · Google · Mobile
5 min read
Kuaishou Tech
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has secured seven papers at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

ACL · AI safety · benchmark
13 min read
DataFunTalk
May 23, 2025 · Artificial Intelligence

2025 AI Landscape: Reasoning Models Dominate, Open‑Source Momentum Accelerates

The 2025 Q1 AI report from Artificial Analysis highlights six major trends—including a thousand‑fold drop in inference cost, the rise of MoE models, the growing parity of Chinese open‑source labs, the emergence of autonomous AI agents, native multimodal capabilities, and the trade‑off between performance, cost, and context windows—painting a picture of a rapidly evolving, increasingly competitive AI ecosystem.

AI · Agents · Inference
11 min read
Baidu Tech Salon
May 21, 2025 · Artificial Intelligence

Baidu AI Day 2024: Wenxin X1 Turbo Sets New Benchmark with Top‑Level Evaluation and Advanced Multimodal Capabilities

At Baidu AI Day in Beijing, the company unveiled the Wenxin 4.5 Turbo and X1 Turbo models, detailing multimodal training breakthroughs, self‑feedback loops, enhanced reasoning and tool‑calling, while the China Academy of Information and Communications Technology awarded X1 Turbo the highest "4+" rating across 24 capability tests, highlighting its leading position in domestic large‑model performance.

Artificial Intelligence · Baidu · Wenxin
9 min read
Tencent Technical Engineering
May 19, 2025 · Artificial Intelligence

RAG, Agents, and Multimodal Large Models: Evolution, Challenges, and Future Trends

This article examines the evolution of large model technologies—including Retrieval‑Augmented Generation, AI agents, and multimodal models—detailing their technical foundations, practical challenges, industry applications, and future development trends, offering a comprehensive perspective for AI practitioners and researchers.

AI Agent · RAG · knowledge retrieval
14 min read
DevOps
May 13, 2025 · Artificial Intelligence

The Rise of AI Agents: Current Trends, Core Capabilities, and Future Outlook

This article surveys the rapid emergence of AI agents, outlining their projected 2025 breakthrough, market momentum, key examples such as the Manus agent and the Model Context Protocol (MCP), the four core abilities of perception, planning, tool use, and memory, and the evolving landscape of multimodal and autonomous AI systems.

AI agents · Artificial Intelligence · Tool Integration
11 min read
DataFunSummit
May 13, 2025 · Artificial Intelligence

Integrating Large Language Models and Knowledge Graphs for Financial Applications: Challenges, Solutions, and Future Directions

This talk explores the technical challenges of applying large language models and knowledge graphs in finance, discusses solutions such as RAG enhancements, graph‑guided retrieval, multimodal extensions, and presents future research directions including multimodal graph integration, agentic systems, and decision‑making applications.

AI · Agentic Systems · Finance
33 min read
Alimama Tech
May 12, 2025 · Artificial Intelligence

Universal Recommendation Model (URM): A General Large‑Model Recall System for Advertising

The article presents the Universal Recommendation Model (URM), a large‑language‑model‑based recall framework that integrates world knowledge and e‑commerce expertise through knowledge injection and prompt‑driven alignment, achieving significant offline recall gains and a 3.1% increase in ad consumption while meeting high‑QPS, low‑latency production constraints.

advertising · high QPS · large language model
17 min read
Tencent Cloud Developer
May 8, 2025 · Artificial Intelligence

Advances and Future of AI Agents: Capabilities, Trends, and Applications

AI agents are evolving rapidly toward an expected 2025 breakthrough in perception, autonomous planning, tool use, and memory, driven by multimodal models, neural‑symbolic reasoning, and embodied intelligence. The article cites investment forecasts of $27 billion, points to general‑purpose agents such as Manus, and surveys emerging applications in code generation, research, healthcare, and risk analysis.

AGENT framework · AI Agent · Autonomous Planning
12 min read
DataFunTalk
May 7, 2025 · Artificial Intelligence

Google Gemini 2.5 Pro Preview 05-06: Code Generation Breakthroughs and Multimodal Video‑to‑Web Capabilities

The Gemini 2.5 Pro 05‑06 update dramatically improves code‑generation performance, tops the WebDev Arena leaderboard over Claude 3.7 Sonnet, and introduces unique video‑to‑web multimodal abilities, while still facing UI bugs and naming inconsistencies ahead of the upcoming Google I/O conference.

AI · Code Generation · Gemini
7 min read
Spring Full-Stack Practical Cases
May 7, 2025 · Artificial Intelligence

Unlock Multimodal AI with Spring AI: Hands‑On Image & ID Recognition Cases

This article introduces Spring AI's multimodal capabilities, explains the Message API for handling text, image, audio, and video inputs, and provides step‑by‑step Spring Boot examples for image analysis, ID card extraction, and structured JSON output of car‑color counts.

Artificial Intelligence · Image Recognition · Java
8 min read
DevOps
Apr 27, 2025 · Artificial Intelligence

Large Model Technologies: RAG, AI Agents, Multimodal Applications, and Future Trends

This article examines how Retrieval‑Augmented Generation (RAG), AI agents, and multimodal large‑model techniques are reshaping the way AI is integrated into industry, discusses their technical challenges and practical implementations, and outlines future development directions across algorithms, products, and domain‑specific applications.

AI agents · Artificial Intelligence · Large Models
14 min read
Didi Tech
Apr 24, 2025 · Artificial Intelligence

Algorithmic Foundations and Evolution of Natural Language Processing

This installment of the "Algorithmic Foundations of Engineering R&D" series traces NLP's evolution from rule‑based systems to today's multimodal large‑model era, reviewing core machine‑learning and deep‑learning techniques, transformer breakthroughs, representation learning, optimization methods, and emerging research such as retrieval‑augmented generation and AI agents.

AI · NLP · Transformer
43 min read
Kuaishou Tech
Apr 23, 2025 · Artificial Intelligence

Kuaishou's Accepted Papers at ICLR 2025 and Their Summaries

The article highlights Kuaishou's eleven high‑quality papers accepted at ICLR 2025, covering advances in streaming video understanding, 3D trajectory control, multimodal talking‑face animation, transformer indexing, efficient video generation, industrial recommendation datasets, token gradient conflict in MoE, stable segmentation, multi‑camera video synthesis, large‑scale multimodal instruction tuning, and hallucination detection in retrieval‑augmented generation.

AIResearch · DeepLearning · ICLR2025
20 min read
Swan Home Tech Team
Apr 21, 2025 · Artificial Intelligence

How Front-End Teams Leverage AI: FastGPT Platform, Intelligent Search, and Video Synthesis

This article examines how a front‑end team uses AI innovations—FastGPT visual platform, AI‑powered semantic search, and AI video synthesis—to rebuild business workflows, cut costs, and boost efficiency, highlighting architecture, technical highlights, and practical use cases.

AI · Semantic Search · frontend development
7 min read
DataFunTalk
Apr 18, 2025 · Artificial Intelligence

Applying ByteDance’s Doubao‑1.5 Vision Model for Image Counting and Automated Annotation

The article demonstrates how ByteDance’s new Doubao‑1.5 multimodal model can be used to locate and count objects in images—such as sushi plates, street signs, and cartoon hats—by generating coordinates and overlaying visual annotations through a concise Python script.

AI · Computer Vision · Doubao
5 min read
Baidu Tech Salon
Apr 16, 2025 · Artificial Intelligence

Release of the 'Fangsheng' Large Model Benchmark Results (Q1 2025) and Overview of Baidu's Wenxin 4.5 and X1 Models

The China AI Industry Alliance released its Q1 2025 Fangsheng benchmark results, in which Baidu's new multimodal models stood out: Wenxin 4.5 led in basic abilities and Wenxin X1 excelled in reasoning. Both models are available for free on the Wenxin Yiyan platform, and Baidu has pledged major 2025 investments in AI, data‑center, and cloud infrastructure.

AI · FactTesting · Wenxin
4 min read