Tag

Model Architecture

1 view collected around this technical thread.

IT Services Circle
May 25, 2025 · Artificial Intelligence

DeepSeek Core Technologies and Model Innovations: DeepSeek‑V3 and DeepSeek‑R1 Technical Overview

The article provides a detailed technical overview of DeepSeek's flagship large language models, DeepSeek‑V3 and DeepSeek‑R1, describing their MoE architecture, training frameworks, reinforcement‑learning‑based fine‑tuning, and inference optimizations, as well as the broader impact of these innovations on the AI landscape. It also promotes related books and resources.

AI · DeepSeek · Mixture of Experts
0 likes · 10 min read
Tencent Technical Engineering
May 12, 2025 · Artificial Intelligence

Comprehensive Summary and Expansion of Andrej Karpathy’s 7‑Hour LLM Lecture

This article provides a detailed Chinese‑to‑English summary of Andrej Karpathy’s 7‑hour LLM tutorial, covering chat process analysis, tokenization, pre‑training data pipelines, model architecture, training strategies, post‑training fine‑tuning, reinforcement learning, chain‑of‑thought reasoning, and current industry applications.

AI · LLM · Model Architecture
0 likes · 25 min read
DataFunSummit
Dec 17, 2024 · Artificial Intelligence

Exploring Baidu PaddlePaddle's Multimodal Large Model Innovations and the PaddleMIX Development Kit

This article presents Baidu's latest advances in multimodal large models, detailing their capabilities, architectural evolution, real‑world applications, and the open‑source PaddleMIX toolkit that streamlines data processing, training, fine‑tuning, and high‑performance inference for developers.

AI applications · Data Processing · Large Models
0 likes · 20 min read
Zhuanzhuan Tech
Nov 6, 2024 · Artificial Intelligence

Multi-Task Learning for E-commerce Search: Overview, Practices, and Model Design in the Zhuanzhuan Scenario

This article reviews the necessity, benefits, and practical implementations of multi-task learning in e‑commerce search, detailing model selection, architecture extensions such as ESMM and ESM², and future directions for handling user behavior sequences and multi‑objective optimization.

ESMM · Model Architecture · Recommendation systems
0 likes · 13 min read
DataFunSummit
Nov 1, 2024 · Artificial Intelligence

Progress in Multimodal Large Language Models: Background, Architecture, Evolution, Team Work, and Future Outlook

This article reviews recent advances in multimodal large language models, covering their background, architectural components, training strategies, application scenarios, evaluation benchmarks, team research on hallucination mitigation and long‑video understanding, and outlines promising future research directions.

Model Architecture · Vision-Language · evaluation benchmarks
0 likes · 15 min read
DataFunSummit
Oct 28, 2024 · Artificial Intelligence

Exploration and Practice of Multimodal Large Models at 360

This article presents 360's comprehensive exploration of image‑text multimodal large models, covering background concepts, research routes, three generations of model development, proprietary architectures like SEEChat, 360VL and Inner‑Adaptor, and real‑world AI applications across various products and services.

AI applications · Model Architecture · Vision-Language
0 likes · 19 min read
DataFunTalk
Aug 7, 2024 · Artificial Intelligence

Multi-Scenario Modeling for NetEase Cloud Music Recommendation: Architecture, Challenges, and Results

This article presents NetEase Cloud Music's multi‑scenario recommendation modeling work, detailing background, overall system architecture, key modules, modeling goals, technical difficulties, performance improvements, future outlook, and a comprehensive Q&A session that addresses practical deployment challenges.

AB testing · AI · Model Architecture
0 likes · 14 min read
Sohu Tech Products
Apr 24, 2024 · Artificial Intelligence

Evolution, Architecture, Training Data, Methods, and Performance of Meta's Llama Series (Llama 1, 2, 3)

Meta's Llama series has progressed from Llama 1 (7B to 65B parameters) in early 2023 to the 8B and 70B Llama 3 models in 2024, scaling training data from about 1T to over 15T tokens. The models are decoder‑only Transformers that use RMSNorm, SwiGLU, RoPE and GQA, and post‑training adds supervised fine‑tuning, RLHF and DPO, resulting in state‑of‑the‑art benchmark performance and a vibrant open‑source ecosystem.

AI · Llama · Model Architecture
0 likes · 25 min read
Bilibili Tech
Mar 1, 2024 · Artificial Intelligence

Bilibili's Self-Developed Video Super-Resolution Algorithm: Background, Optimization Directions, and Implementation Details

Bilibili’s self‑supervised video super‑resolution system upgrades low‑resolution streams to 4K by using three parallel degradation‑branch networks—texture‑enhancing, line‑recovering, and noise‑removing—tailored to anime, game, and real‑world content, delivering sharper edges, finer textures, and measurable quality gains across its online playback pipeline.

AI · Bilibili · Model Architecture
0 likes · 16 min read
DataFunSummit
Jan 15, 2024 · Artificial Intelligence

Financial Large Language Model: Characteristics, Construction, Architecture, and Practical Applications

This article presents a comprehensive overview of financial large language models, covering their unique characteristics, construction methods, layered technical architecture, evaluation strategies, and real‑world use cases such as quality inspection, AIGC‑driven material generation, sales‑lead mining, and knowledge‑graph‑enhanced intelligent Q&A.

Data Engineering · Financial AI · Model Architecture
0 likes · 14 min read
Sohu Tech Products
Dec 27, 2023 · Artificial Intelligence

Analysis of LLaMA Model Architecture in the Transformers Library

This article walks through the core LLaMA implementation in HuggingFace’s Transformers library, detailing the inheritance hierarchy, configuration defaults, model initialization, embedding and stacked decoder layers, the RMSNorm‑based attention and MLP modules, and the forward pass that produces normalized hidden states.

Artificial Intelligence · Llama · Model Architecture
0 likes · 14 min read
Rare Earth Juejin Tech Community
May 5, 2023 · Artificial Intelligence

Limitations of Generative Pre‑trained Transformers: Hallucinations, Memory, Planning, and Architectural Proposals

The article critically examines GPT‑4 and similar transformer models, highlighting persistent hallucinations, outdated knowledge, insufficient domain coverage, lack of planning and memory, and proposes architectural extensions inspired by fast‑slow thinking and differentiable modules to overcome these fundamental constraints.

AI limitations · GPT-4 · Model Architecture
0 likes · 24 min read
DataFunTalk
Mar 6, 2023 · Artificial Intelligence

Explainable Recommendation Algorithms at Alibaba Health: System Design, Feature Engineering, and Experimental Results

This article presents Alibaba Health's exploration of explainable recommendation algorithms, covering business context, data preparation, feature extraction and encoding, a model architecture that combines selection and prediction components, offline and online experimental results, and a detailed Q&A on implementation challenges and future directions.

AI · Alibaba Health · Feature Engineering
0 likes · 12 min read
DataFunTalk
Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Model Architecture · Natural Language Processing · Pretraining
0 likes · 16 min read
Zhuanzhuan Tech
Aug 17, 2022 · Artificial Intelligence

Designing a Scalable Image Classification System for Prohibited Item Detection in a Second‑hand E‑commerce Platform

This article describes how a second‑hand e‑commerce company built a fast, modular image‑classification pipeline using small binary classifiers, EfficientNet‑B0, and active‑learning‑driven data annotation to detect prohibited items while keeping inference latency under 200 ms and sharply reducing labeling costs.

AI · Model Architecture · active learning
0 likes · 10 min read
DataFunTalk
Aug 16, 2021 · Artificial Intelligence

Intelligent Risk Control in Live Streaming: Architecture, Challenges, and Model Evolution at Douyu

This article presents Douyu's intelligent risk‑control system for live streaming, detailing the operational, activity, traffic, account, transaction and content safety challenges, the multi‑layer algorithm architecture, and the evolution of models for spam detection, risk scoring, gang identification, behavior sequencing, device fingerprinting, and interpretability.

Artificial Intelligence · Big Data · Live Streaming
0 likes · 13 min read
JD Tech Talk
Sep 17, 2020 · Artificial Intelligence

Federated Transfer Learning: Concepts, Examples, and Model Structures

This article introduces the fundamentals of transfer learning and federated transfer learning, explains domain adaptation for sentiment analysis, presents two illustrative examples—mid-level image feature transfer and text-to-image transfer—and outlines the model architecture and loss functions of federated transfer learning frameworks.

Federated Learning · Model Architecture · domain adaptation
0 likes · 14 min read
iQIYI Technical Product Team
Nov 22, 2019 · Artificial Intelligence

Analysis of ICCV 2019 Lightweight Face Recognition Challenge Champion Solutions

The ICCV 2019 Lightweight Face Recognition Challenge attracted 292 teams and defined four strict FLOP‑ and size‑limited protocols for image and video recognition, with champions employing near‑30 GFLOP EfficientNet‑style backbones, novel loss functions, frame‑fusion, and knowledge‑distilled VarGNet models to balance accuracy and computational budget.

ICCV Challenge · Lightweight Face Recognition · Model Architecture
0 likes · 8 min read
360 Tech Engineering
May 21, 2019 · Artificial Intelligence

Understanding Residual Networks: Ideas, Mechanisms, Variants, and Insights

This article reviews the concept of residual networks, explains their working principle and data‑flow interpretation, discusses why they improve deep models, analyzes path‑length effects on gradients, and surveys various residual block designs and practical takeaways.

Model Architecture · ResNet · deep learning
0 likes · 9 min read