Tag

AI research

Kuaishou Audio & Video Technology
Jun 11, 2025 · Artificial Intelligence

Kuaishou Showcases 12 Cutting-Edge CVPR 2025 Papers on Video Generation and AI

Kuaishou presented twelve peer‑reviewed papers at CVPR 2025 covering video quality assessment, large‑scale video datasets, dynamic 3D avatar reconstruction, 4D scene simulation, controllable video generation, scaling laws for diffusion transformers, multimodal foundation models, and more, highlighting the company's leading research in computer vision and AI.

AI research · CVPR2025 · Computer Vision
21 min read
DataFunTalk
Jun 8, 2025 · Artificial Intelligence

Why Autoregressive Video Models Like MAGI-1 May Outperform Diffusion Approaches

The article examines the current dominance of diffusion models in commercial video generation, contrasts them with autoregressive methods, and details how the open‑source MAGI‑1 model combines both paradigms to achieve longer, more controllable video synthesis while addressing scalability and quality challenges.

AI research · MAGI-1 · autoregressive models
70 min read
Architect
Jun 7, 2025 · Artificial Intelligence

Mass Framework: Boosting Multi‑Agent Design with Smarter Prompts & Topologies

The Mass framework, developed by Google and Cambridge University, automates multi‑agent system design by jointly optimizing prompts and topologies through a three‑stage optimization process, demonstrating significant performance gains over existing methods across various tasks while highlighting the importance of coordinated prompt‑topology optimization.

AI research · Mass framework · multi-agent systems
6 min read
Xiaohongshu Tech REDtech
Jun 6, 2025 · Artificial Intelligence

How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models

dots.llm1, an open‑source 142‑billion‑parameter Mixture‑of‑Experts language model from hi lab, achieves Qwen2.5‑72B‑level performance after training on 11.2 T high‑quality tokens, and the release includes full models, intermediate checkpoints, and detailed training pipelines for the research community.
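
For readers new to the architecture family, the sketch below shows top‑k Mixture‑of‑Experts routing in miniature: a router scores each token, and only the top‑k experts run on it. The layer sizes, expert count, and top_k value here are illustrative and do not reflect dots.llm1's actual configuration.

```python
# Toy top-k Mixture-of-Experts routing (illustrative sizes, not the
# dots.llm1 configuration): a router scores each token against every
# expert, and only the top-k experts are evaluated per token.
import torch
import torch.nn.functional as F

def moe_forward(x, experts, router, top_k: int = 2):
    logits = router(x)                          # (tokens, n_experts)
    weights, idx = logits.topk(top_k, dim=-1)   # route to top-k experts
    weights = F.softmax(weights, dim=-1)        # normalize gate weights
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e            # tokens routed to expert e
            if mask.any():
                out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
    return out

d, n_experts = 16, 4
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]
router = torch.nn.Linear(d, n_experts)
x = torch.randn(8, d)                           # 8 tokens, dim 16
print(moe_forward(x, experts, router).shape)    # torch.Size([8, 16])
```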

AI research · Mixture of Experts · Training Efficiency
10 min read
Kuaishou Tech
Jun 4, 2025 · Artificial Intelligence

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning

The KwaiPilot team released KwaiCoder‑AutoThink‑preview, which introduces a novel automatic‑thinking training paradigm and a process‑supervised reinforcement‑learning method called Step‑SRPO. The model dynamically switches between thinking and non‑thinking modes, reducing inference cost while achieving gains of up to 20 points on code and math benchmarks and handling large‑scale codebases.

AI research · automatic thinking · code generation
12 min read
Xiaohongshu Tech REDtech
Jun 3, 2025 · Artificial Intelligence

Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient Evaluation

The TailoredBench framework dramatically reduces large‑language‑model evaluation cost and error by using a global probe set, model‑specific source selection, extensible K‑Medoids clustering, and calibration, achieving up to 300× speedup and a 31.4% MAE reduction across diverse benchmarks.
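
As background for the clustering step, here is a minimal PAM‑style K‑Medoids sketch that selects representative "anchor" examples from embedded benchmark items. The function, toy data, and anchor count are illustrative and are not taken from the TailoredBench implementation.

```python
# Minimal PAM-style K-Medoids: pick k representative "anchor" examples
# so that evaluating a model on the anchors approximates evaluating it
# on the whole benchmark. Illustrative, not the TailoredBench code.
import numpy as np

def k_medoids(dist: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    """dist: (n, n) pairwise distances between benchmark examples."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(dist[:, medoids], axis=1)  # nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            # New medoid = member minimizing total within-cluster distance.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):      # converged
            break
        medoids = new_medoids
    return medoids, labels

# Toy usage: embeddings of 200 benchmark items, pick 20 anchors.
emb = np.random.default_rng(1).normal(size=(200, 32))
dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
anchors, _ = k_medoids(dist, k=20)
```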

AI research · K-Medoids · LLM evaluation
10 min read
AntTech
May 31, 2025 · Artificial Intelligence

Machine Reasoning and Deep Thinking: Insights from Ant Financial’s NLP Lead Wu Wei

The article explores how DeepSeek R1 and long chain‑of‑thought reasoning have revived interest in machine reasoning, tracing the evolution of natural‑language models, defining reasoning as logical knowledge composition, and outlining future research directions in efficient reasoning architectures and deep‑thinking applications.

AI research · deep thinking · efficient reasoning
8 min read
DataFunTalk
Apr 25, 2025 · Artificial Intelligence

Does Reinforcement Learning Really Expand Reasoning Capacity in Large Language Models? Insights from Recent Empirical Study

Recent empirical research by Tsinghua's LeapLab and Shanghai Jiao Tong University reveals that fine‑tuning with reinforcement learning from verifiable rewards (RLVR) improves sampling efficiency but does not extend the fundamental reasoning abilities of large language models beyond those of their base models, as demonstrated across mathematics, code, and visual reasoning benchmarks.
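
Coverage comparisons in studies like this one typically rest on the unbiased pass@k estimator from the Codex paper (Chen et al., 2021). A small sketch, with illustrative sample counts:

```python
# Unbiased pass@k estimator: given n samples per problem of which c are
# correct, the chance that at least one of k drawn samples is correct
# is 1 - C(n-c, k) / C(n, k).  (Chen et al., 2021.)
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:                 # every size-k draw hits a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 256 samples, 8 correct. The gap between pass@1 and pass@64
# is what separates sampling efficiency from underlying capacity.
print(pass_at_k(256, 8, 1))       # 0.03125
print(pass_at_k(256, 8, 64))      # ~0.90
```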

AI research · RLVR · large language models
12 min read
Architect
Apr 21, 2025 · Artificial Intelligence

Microsoft Research Releases BitNet b1.58 2B4T: A 1‑Bit Native Large Language Model with Ultra‑Low Memory and Energy Consumption

Microsoft Research introduced BitNet b1.58 2B4T, a native 1‑bit large language model with 2 billion parameters trained on 4 trillion tokens. It requires only 0.4 GB of non‑embedding memory, an estimated 0.028 J of decoding energy, and 29 ms of CPU decoding latency, while delivering performance comparable to full‑precision models of similar size.
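
For context on what "1‑bit native" means in practice, below is a sketch of the absmean ternary quantization described in the BitNet b1.58 line of work: weights are scaled by their mean absolute value and rounded to −1, 0, or +1. The helper name and tensor sizes are illustrative, and the straight‑through estimator used during training is omitted.

```python
# Absmean ternary quantization as described for BitNet b1.58-style
# models: scale by mean |w|, round to the nearest value in {-1, 0, +1}.
# Sketch only; training-time straight-through estimation is omitted.
import torch

def weight_quant_ternary(w: torch.Tensor, eps: float = 1e-5):
    scale = w.abs().mean().clamp(min=eps)   # absmean scaling factor
    q = (w / scale).round().clamp(-1, 1)    # ternary weights
    return q, scale                         # reconstruct via q * scale

w = torch.randn(4, 8)
q, s = weight_quant_ternary(w)
print(q.unique())                           # typically tensor([-1., 0., 1.])
```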

1-bit LLM · AI research · BitNet
7 min read
DevOps
Apr 13, 2025 · Artificial Intelligence

The Amazing Magic of GPT‑4o and a Speculative Technical Roadmap

This article reviews the breakthrough image‑generation capabilities of GPT‑4o, showcases diverse examples, and offers detailed speculation on the underlying autoregressive architecture, tokenization methods, VQ‑VAE/GAN advances, and training strategies that could explain its performance.
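
Since the speculation centers on image tokenization, a toy sketch of the vector‑quantization lookup at the heart of VQ‑VAE‑style tokenizers may help; the codebook size and dimensions are illustrative.

```python
# Vector quantization as used in VQ-VAE-style image tokenizers: each
# latent vector is replaced by (the index of) its nearest codebook
# entry, turning an image into a sequence of discrete tokens.
import torch

def vector_quantize(z, codebook):
    # z: (n, d) latents; codebook: (K, d) learned code vectors.
    d2 = torch.cdist(z, codebook)    # (n, K) pairwise distances
    idx = d2.argmin(dim=-1)          # discrete token ids
    return codebook[idx], idx

codebook = torch.randn(512, 16)      # K=512 codes (illustrative)
z = torch.randn(10, 16)
z_q, tokens = vector_quantize(z, codebook)
print(tokens.shape, z_q.shape)       # torch.Size([10]) torch.Size([10, 16])
```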

AI research · GPT-4o · Image Generation
16 min read
AntTech
Apr 10, 2025 · Artificial Intelligence

Ant Group Presents Four AI Research Papers at ICLR 2025 Live Showcase

At the ICLR 2025 live session in Singapore, Ant Group showcased four cutting‑edge papers—CodePlan, Animate‑X, Group Position Embedding, and OmniKV—demonstrating advances in large‑language‑model reasoning, universal character animation, layout‑aware document understanding, and efficient long‑context inference.

AI research · Document Understanding · large language models
6 min read
DevOps
Apr 7, 2025 · Artificial Intelligence

Meta Llama 4 Scout, Maverick, and Behemoth: Architecture, NoPE Innovation, and Training Advances

The article introduces Meta's newly open‑sourced Llama 4 series, including Scout with a 10 million‑token context window, Maverick with 400 billion total parameters, and the upcoming Behemoth teacher model, detailing their mixture‑of‑experts architecture, the NoPE design that drops explicit positional encodings in interleaved attention layers, training pipelines, performance benchmarks, and infrastructure improvements for large‑scale AI research.

AI research · Context Window · Llama 4
8 min read
AntTech
Mar 31, 2025 · Artificial Intelligence

Ant Group Papers Accepted at ICLR 2025: Summaries and Links

The article presents the abstracts, publication types, links, and research areas of seventeen Ant Group papers accepted at ICLR 2025, covering topics such as embodied robot co‑design, efficient distributed training for large language models, optimization via LLMs, character animation, interactive frame interpolation, KV‑cache management, and privacy‑preserving Transformers.

AI research · Ant Group · ICLR2025
23 min read
JD Tech
Mar 12, 2025 · Artificial Intelligence

From Low‑Resource Large Model Training to Dynamic Margin Selection: A JD Engineer’s Journey

The article recounts a JD Retail engineer's rapid growth through tackling low‑resource large‑model training, developing DynaMS, a margin‑based dynamic data selection method published at ICLR, and sharing practical insights on aligning business needs with cutting‑edge AI research.

AI research · ICLR · Large Models
11 min read
AntTech
Mar 10, 2025 · Artificial Intelligence

Ant Insurance and Zhejiang University’s AAAI 2025 Papers Tackle Hallucination in Large Vision‑Language and Video Models

Two collaborative papers by Ant Insurance and Zhejiang University were accepted at AAAI 2025, introducing the MoLE decoding framework to reduce hallucination in large vision‑language models and the MHBench benchmark plus Motion Contrastive Decoding to address motion hallucination in video large language models, advancing reliable AI‑driven insurance claim processing.

AAAI 2025 · AI research · hallucination
6 min read
DataFunTalk
Mar 7, 2025 · Artificial Intelligence

DeepSeek R1 Technical Report: Insights into Reasoning Models and Their Impact

This presentation reviews the development, technical details, and societal impact of DeepSeek's R1 model, explaining its reasoning capabilities, training pipeline, comparisons with other models, and future directions for AI research and product applications.

AI research · DeepSeek · R1
53 min read
DataFunTalk
Feb 28, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3) and R1: Architecture, Training Strategies, Evaluation, and Distillation

An in‑depth overview of the DeepSeek LLM series (V1‑V3) and the R1 models, covering their architectures, scaling‑law experiments, data pipelines, and training strategies (including MoE, MLA, FP8 training, multi‑step learning‑rate scheduling, and reinforcement learning), along with extensive evaluation results and knowledge‑distillation techniques.

AI research · Mixture of Experts · Model Distillation
36 min read
AntTech
Feb 27, 2025 · Artificial Intelligence

Entity Contrastive Learning via Multi-Token Parallel Prediction for Knowledge Graph Completion

Researchers from Ant Group and Zhejiang University propose K-ON, a multi-token parallel prediction method that enables large language models to perceive knowledge graph entities through entity-level contrastive learning, achieving superior performance, lower cost, and higher efficiency on KG completion benchmarks.

AI research · K-ON · Multi-Token Prediction
8 min read
Tencent Cloud Developer
Feb 27, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3, R1) Technical Overview and Analysis

The DeepSeek technical overview details the evolution from the dense 67B V1 model through the 236B MoE‑based V2 and the 671B V3 with FP8 training, to the R1 series, which acquires reasoning mainly through reinforcement learning rather than supervised fine‑tuning. It highlights innovations such as Grouped‑Query Attention, Multi‑Head Latent Attention, auxiliary‑loss‑free MoE load balancing, Multi‑Token Prediction, and knowledge distillation, and reports state‑of‑the‑art benchmark results and open‑source reproduction projects.
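
Of the techniques listed, Grouped‑Query Attention is the simplest to show compactly: several query heads share one key/value head, shrinking the KV cache. A minimal sketch with illustrative shapes, not DeepSeek's actual dimensions:

```python
# Grouped-Query Attention (GQA) in miniature: query heads are split
# into groups, and each group shares a single key/value head, cutting
# KV-cache memory. Shapes are illustrative.
import torch
import torch.nn.functional as F

def gqa(q, k, v):
    # q: (batch, n_q_heads, seq, d); k, v: (batch, n_kv_heads, seq, d)
    group = q.shape[1] // k.shape[1]            # query heads per KV head
    k = k.repeat_interleave(group, dim=1)       # share KV across the group
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v)

q = torch.randn(2, 8, 16, 64)                   # 8 query heads
k = v = torch.randn(2, 2, 16, 64)               # 2 shared KV heads
print(gqa(q, k, v).shape)                       # torch.Size([2, 8, 16, 64])
```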

AI research · DeepSeek · Mixture of Experts
37 min read
Architecture Digest
Feb 25, 2025 · Artificial Intelligence

DeepSeek Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

This overview of DeepSeek's distillation technology, which combines data distillation and model distillation to transfer knowledge from large teacher models to compact student models, covers its definitions, principles, key innovations, architecture, training methods, performance gains, and challenges, especially in multimodal contexts.
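
As a general reference for model distillation (not DeepSeek's specific recipe), here is a minimal logit‑distillation loss that blends a temperature‑scaled KL term with ordinary cross‑entropy; the temperature and mixing weight are illustrative.

```python
# Classic logit distillation: the student matches the teacher's softened
# output distribution (KL at temperature T) while also fitting the true
# labels. Hyperparameters are illustrative, not DeepSeek's.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, alpha: float = 0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                              # rescale soft-target gradient
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(4, 10), torch.randn(4, 10)  # batch 4, vocab 10
y = torch.randint(0, 10, (4,))
print(distillation_loss(s, t, y))
```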

AI research · DeepSeek · knowledge distillation
16 min read