Tag: MoE


DataFunTalk
Feb 26, 2025 · Artificial Intelligence

DeepGEMM: An Open‑Source FP8 GEMM Library for Efficient AI Model Training and Inference

DeepGEMM is an open‑source FP8‑precision GEMM library that delivers up to 1350 TFLOPS on NVIDIA Hopper GPUs, offering JIT‑compiled, lightweight code (~300 lines) for dense and MoE matrix multiplication, with easy deployment, configurable environment variables, and performance advantages over CUTLASS for large AI models.
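
To make the fine-grained scaling idea concrete, here is a rough PyTorch sketch of block-wise FP8 (e4m3) quantization followed by an emulated GEMM. It is illustrative only and does not use DeepGEMM's actual API; the 128-element block size and the dequantize-to-bfloat16 matmul are assumptions for the demo, whereas a real kernel fuses the scaling into the FP8 tensor-core multiply.

```python
import torch

# Illustrative sketch only: emulates the per-block FP8 scaling idea behind
# libraries like DeepGEMM. This is NOT DeepGEMM's API; the block size and
# the e4m3 format choice are assumptions for demonstration.
BLOCK = 128
E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantize_fp8_blockwise(x: torch.Tensor):
    """Quantize a 2-D tensor to FP8 (e4m3) with one scale per 1xBLOCK tile."""
    m, k = x.shape
    x_blocks = x.reshape(m, k // BLOCK, BLOCK)
    scales = x_blocks.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / E4M3_MAX
    q = (x_blocks / scales).to(torch.float8_e4m3fn)   # FP8 storage
    return q.reshape(m, k), scales.squeeze(-1)        # scales stay in FP32

def gemm_fp8_emulated(a_q, a_s, b_q, b_s):
    """Dequantize to bfloat16 and multiply; a real kernel would fuse this."""
    m, k = a_q.shape
    n = b_q.shape[0]
    a = (a_q.to(torch.bfloat16).reshape(m, -1, BLOCK)
         * a_s.unsqueeze(-1).to(torch.bfloat16)).reshape(m, k)
    b = (b_q.to(torch.bfloat16).reshape(n, -1, BLOCK)
         * b_s.unsqueeze(-1).to(torch.bfloat16)).reshape(n, k)
    return a @ b.t()                                   # (m, n) in bfloat16

a = torch.randn(256, 512)
b = torch.randn(128, 512)
a_q, a_s = quantize_fp8_blockwise(a)
b_q, b_s = quantize_fp8_blockwise(b)
out = gemm_fp8_emulated(a_q, a_s, b_q, b_s)  # close to a @ b.t(), with FP8 error
```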

AI acceleration · DeepGEMM · FP8
0 likes · 7 min read
Architect
Feb 16, 2025 · Artificial Intelligence

DeepSeek-V3, DeepSeek-R1, and Janus‑Pro: Architecture, Training Techniques, and Performance Insights

This article provides an in‑depth technical overview of the DeepSeek‑V3, DeepSeek‑R1 and Janus‑Pro models, covering their Mixture‑of‑Experts architecture, Multi‑head Latent Attention (MLA), auxiliary‑loss‑free load balancing, multi‑token prediction, FP8 mixed‑precision training, efficient cross‑node communication, reinforcement‑learning pipelines, multimodal modeling strategies, performance comparisons, cost statistics, and current limitations.
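
As a pointer to the core MoE mechanic the article describes, below is a minimal top-k expert-routing layer in PyTorch. It is a generic sketch, not DeepSeek-V3's implementation: the real model adds shared experts, bias-based auxiliary-loss-free balancing, and expert-parallel dispatch across nodes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts layer, for illustration only."""
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # loop form for clarity, not speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

moe = TinyMoE()
y = moe(torch.randn(10, 64))                     # (10, 64)
```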

AI architecture · DeepSeek-V3 · FP8 Training
0 likes · 18 min read
Java Captain
Feb 7, 2025 · Artificial Intelligence

DeepSeek: Disruptive Innovations in Large Language Model Architecture, Efficiency, and Ecosystem

DeepSeek reshapes the AI landscape by replacing brute‑force compute scaling with algorithmic breakthroughs such as a novel MoE architecture, memory compression, active‑learning data pipelines, and open‑source tooling, delivering dramatically lower training and inference costs while enabling edge deployment and a vibrant developer ecosystem.

Algorithmic Efficiency · DeepSeek · Large Language Models
0 likes · 11 min read
Java Tech Enthusiast
Feb 16, 2024 · Artificial Intelligence

Google's Gemini 1.5: Breakthrough in Long-Context Understanding and Multimodal Capabilities

Google’s Gemini 1.5, a new multimodal Mixture‑of‑Experts model, supports a context window of up to one million tokens (10 million in internal research), understands text, video, audio, and code, can learn to translate a new language from reference material supplied in a single prompt, and is already being used by Samsung, Jasper, and Quora, positioning it as a direct challenger to OpenAI’s flagship models.

Gemini 1.5 · Google AI · LLM
0 likes · 7 min read
DataFunTalk
Aug 12, 2022 · Artificial Intelligence

Multi‑Task Learning for Sample Selection Bias in Financial Risk Control

This article presents a comprehensive study on addressing sample selection bias in credit risk modeling by applying multi‑task learning techniques, including MoE/MMoE, ESMM, hierarchical attention, and semi‑supervised loss, and demonstrates their effectiveness through two real‑world application cases and experimental results.
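
For readers unfamiliar with MMoE, the sketch below shows the basic structure in PyTorch: shared experts with one softmax gate per task. It is a generic illustration, not the article's production model, which layers ESMM-style task dependencies, attention, and semi-supervised losses on top.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniMMoE(nn.Module):
    """Bare-bones Multi-gate Mixture-of-Experts (MMoE): shared experts,
    one softmax gate per task. Illustrative only."""
    def __init__(self, d_in=32, d_expert=16, n_experts=4, n_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_expert), nn.ReLU())
            for _ in range(n_experts))
        self.gates = nn.ModuleList(nn.Linear(d_in, n_experts) for _ in range(n_tasks))
        self.towers = nn.ModuleList(nn.Linear(d_expert, 1) for _ in range(n_tasks))

    def forward(self, x):                                   # x: (batch, d_in)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, d)
        outs = []
        for gate, tower in zip(self.gates, self.towers):
            w = F.softmax(gate(x), dim=-1).unsqueeze(-1)    # (B, E, 1) per task
            mixed = (w * expert_out).sum(dim=1)             # task-specific mixture
            outs.append(torch.sigmoid(tower(mixed)))        # per-task probability
        return outs                                         # one prediction per task

model = MiniMMoE()
task_a, task_b = model(torch.randn(8, 32))                  # two task heads
```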

Financial AI · MoE · Multi-task Learning
0 likes · 14 min read