Tagged articles
1067 articles
Programmer DD
Jun 12, 2023 · Artificial Intelligence

Master Prompt Engineering: Guide ChatGPT to Deliver Precise Answers

This article explains prompt engineering for large language models like ChatGPT, covering its definition, essential techniques such as diverse prompting strategies, problem restatement, background provision, gradient prompting, example inclusion, role‑playing, and the importance of systematic experimentation and quantitative evaluation to achieve high‑quality, task‑specific AI outputs.

AI · ChatGPT · large language models
0 likes · 16 min read
Sohu Tech Products
Jun 7, 2023 · Artificial Intelligence

Multiscale PU Learning for Detecting AI‑Generated Text

Researchers from Peking University and Huawei present a multiscale positive‑unlabeled learning framework that significantly improves detection of AI‑generated short and long texts, addressing the difficulty of distinguishing AI‑written content from human writing and outperforming existing baselines on multiple benchmarks.

AI detection · Pu-Learning · large language models
0 likes · 8 min read
Python Programming Learning Circle
Jun 6, 2023 · Artificial Intelligence

Why ChatGPT Plus Performance Is Dropping and What OpenAI’s Roadmap Reveals

Recent reports indicate a noticeable decline in ChatGPT Plus’s GPT‑4 performance, especially in coding accuracy, prompting speculation about model scaling pain, AI alignment trade‑offs, and OpenAI’s GPU‑limited roadmap that includes cheaper models, longer context windows, finetuning, and multimodal extensions.

AI alignment · ChatGPT · GPT-4
0 likes · 8 min read
OPPO Amber Lab
Jun 5, 2023 · Information Security

How ChatGPT Impacts Security: Key Insights from the CSA Seminar

An online CSA seminar on May 30 examined ChatGPT’s security impact, presenting a whitepaper and four AI‑security interaction dimensions, while experts discussed telecom‑operator security‑GPT models, safe vertical‑domain large‑model training, and future industry implications.

AI governance · AI security · ChatGPT
0 likes · 7 min read
Programmer DD
May 19, 2023 · Artificial Intelligence

Master Advanced Prompt Engineering: Boost LLM Performance with Proven Techniques

This article explains why effective prompt design—covering system messages, few‑shot learning, non‑dialogue scenarios, explicit instructions, output shaping, syntax cues, task decomposition, chain‑of‑thought, and real‑world context—is essential for reliable large language model results and provides practical examples and tips.

AI · System Messages · few-shot learning
0 likes · 8 min read
Rare Earth Juejin Tech Community
May 5, 2023 · Artificial Intelligence

Limitations of Generative Pre‑trained Transformers: Hallucinations, Memory, Planning, and Architectural Proposals

The article critically examines GPT‑4 and similar transformer models, highlighting persistent hallucinations, outdated knowledge, insufficient domain coverage, lack of planning and memory, and proposes architectural extensions inspired by fast‑slow thinking and differentiable modules to overcome these fundamental constraints.

AI limitations · GPT-4 · Model architecture
0 likes · 24 min read
Architect
Apr 27, 2023 · Artificial Intelligence

Survey of Large Language Model Research: From GPT‑1 to ChatGPT and Open‑Source Alternatives

This article provides a comprehensive overview of the development of large language models, reviewing classic papers from GPT‑1 through GPT‑4, discussing open‑source implementations such as LLaMA, Alpaca, GLM, and ChatGLM, and analyzing training methods, datasets, and future research directions.

AI research · GPT · large language models
0 likes · 36 min read
Kuaishou Tech
Apr 23, 2023 · Artificial Intelligence

Kuaishou & Renmin AI Institute: Driving Multimodal Large Model Innovation

The article details how Kuaishou’s multimodal AI research, including its K7 trillion‑parameter model and VLUA algorithm, partners with Renmin University’s Gaoling AI Institute to launch a joint lab, produce cutting‑edge papers such as WebBrain and ChatImg, and advance recommendation and search technologies across the short‑video ecosystem.

AI · Industry collaboration · Recommendation Systems
0 likes · 17 min read
DataFunSummit
Apr 20, 2023 · Artificial Intelligence

Mengzi Lightweight Model Technology System and Advances in Small‑Scale and Retrieval‑Augmented Pretraining

This presentation introduces the Mengzi lightweight model technology stack, covering large‑scale pre‑training, motivations for lightweight models, detailed techniques such as knowledge and sequence‑relation enhancement, training optimization, model compression, retrieval‑augmented pre‑training, multimodal extensions, open‑source releases, and real‑world applications.

Knowledge Distillation · Multimodal · large language models
0 likes · 23 min read
IT Architects Alliance
Apr 20, 2023 · Artificial Intelligence

Overview of Prominent Large Language Models and Instruction‑Finetuned Variants

This article provides a comprehensive overview of major large language models—including GPT series, T5, LaMDA, LLaMA, BLOOM, and others—detailing their architectures, parameter scales, open‑source status, and the evolution of instruction‑fine‑tuning techniques that improve zero‑shot and few‑shot performance.

AI research · Instruction Tuning · LLM comparison
0 likes · 24 min read
Architect
Apr 19, 2023 · Artificial Intelligence

Emergence in Large Language Models: Phenomena, Explanations, and Implications

This article reviews the emergence phenomena observed in large language models, explains how model scale, in‑context learning and chain‑of‑thought prompting contribute to sudden performance gains, discusses small‑model alternatives, and explores the relationship between emergence and the training‑time Grokking effect.

AI research · In-Context Learning · Model Scaling
0 likes · 13 min read
DataFunTalk
Apr 19, 2023 · Artificial Intelligence

Is the Daily Emergence of Large Language Models Beneficial?

The article examines the rapid proliferation of large language models, weighing both the opportunities for experimentation and the drawbacks of noise, and argues that establishing authoritative Chinese LLM evaluation benchmarks is essential to guide meaningful progress in the field.

AI research · LLM evaluation · Open Source
0 likes · 7 min read
Architect
Apr 14, 2023 · Artificial Intelligence

Overview of Prominent Large Language Models and Instruction Fine‑Tuning Techniques

The article surveys major large language models—including GPT‑3, T5, LaMDA, Jurassic‑1, MT‑NLG, Gopher, Chinchilla, PaLM, U‑PaLM, OPT, LLaMA, BLOOM, GLM‑130B, and ERNIE 3.0 Titan—explains their architectures, scaling trade‑offs, and then details instruction‑fine‑tuned variants such as T0, FLAN, GPT‑3.5, ChatGPT, GPT‑4, Alpaca and ChatGLM, providing references for further study.

AI · ChatGPT · GPT-3
0 likes · 27 min read
ITPUB
Apr 14, 2023 · Artificial Intelligence

How Do Generative, Perceptual, and Decision AI Interact? Insights from Jina AI’s Founder

In this interview, Jina AI’s founder Han Xiao examines the relationships among generative, perceptual, and decision AI, compares single‑modal and multimodal approaches, discusses large language model development, and evaluates the impact of ChatGPT on search and future AI commercialization.

AI · AI commercialization · Search
0 likes · 11 min read
Programmer DD
Apr 14, 2023 · Artificial Intelligence

How DeepSpeed-Chat Accelerates ChatGPT‑Style Model Training by 15×

Microsoft open‑sourced DeepSpeed‑Chat, a toolkit that streamlines the end‑to‑end training and inference of ChatGPT‑like large language models using RLHF, delivering up to fifteen‑fold speedups and dramatically lower costs, even on a single GPU.

ChatGPT · DeepSpeed · RLHF
0 likes · 8 min read
Top Architect
Apr 12, 2023 · Artificial Intelligence

Data‑Centric AI Perspective on GPT Models: Training, Inference, and Maintenance

This article examines how large language models such as GPT‑1 through GPT‑4 succeed largely due to high‑quality, large‑scale training data, and explains the Data‑centric AI framework—training data development, inference data development, and data maintenance—while discussing prompt engineering, data‑driven improvements, and future trends in AI.

AI · Data‑Centric AI · GPT
0 likes · 19 min read
Architect
Apr 9, 2023 · Artificial Intelligence

Evaluating the Commonsense Knowledge and Reasoning Capabilities of ChatGPT and Other Large Language Models

This study systematically evaluates ChatGPT and other large language models on three fronts: answering commonsense questions, knowledge awareness, and using generated knowledge for reasoning. It finds strong QA performance but notable gaps in social and temporal commonsense and in leveraging contextual knowledge.

ChatGPT · NLP · commonsense reasoning
0 likes · 20 min read
DataFunSummit
Apr 7, 2023 · Artificial Intelligence

China's AI Startup Landscape: Who Holds the Greatest Advantage in the Large Model Race?

Amid the 2023 AI boom, veteran entrepreneurs, former tech executives, and academic teams in China are racing to launch large‑model ventures, each leveraging distinct funding, talent, and product strategies, while major tech giants simultaneously push their own AI offerings, reshaping the market dynamics.

AI startups · China AI · large language models
0 likes · 12 min read
Xiaohongshu Tech REDtech
Apr 3, 2023 · Industry Insights

What Drives Intelligent Recommendation and Search? Key Takeaways from Xiaohongshu’s CCF C³ Event

The CCF C³ event at Xiaohongshu gathered leading researchers and industry experts to dissect the latest advances, challenges, and future opportunities in intelligent recommendation and search, including multimodal content handling, decentralized distribution, cold‑start solutions, and the impact of large language models.

AI · Industry Insights · Recommendation Systems
0 likes · 11 min read
Baidu Geek Talk
Mar 21, 2023 · Artificial Intelligence

Infrastructure Challenges and Solutions for Large‑Scale AI Model Training

The article explains how the massive compute and storage demands of today’s large language models create a “compute wall” and “storage wall,” and describes Baidu Intelligent Cloud’s four‑layer full‑stack infrastructure—combining advanced parallelism techniques, optimized GPU networking, static‑graph compilation, and cost‑model‑driven placement—to train trillion‑parameter models efficiently.

AI infrastructure · Cost Model · GPU clusters
0 likes · 27 min read
Efficient Ops
Mar 16, 2023 · Artificial Intelligence

What’s New in AI? Baidu’s Wenxin, GPT‑4 Multimodal, Docker’s Policy Shift & More

The article reviews the latest AI breakthroughs—including Baidu’s Wenxin Yiyan launch, OpenAI’s multimodal GPT‑4, Google’s PaLM API, Microsoft’s super‑computer investment—and also covers Docker’s paid‑plan warning, the US demand for TikTok’s sale, and a Stack Overflow technology survey.

Artificial Intelligence · Docker · GPT-4
0 likes · 7 min read
DataFunTalk
Mar 16, 2023 · Artificial Intelligence

Technical Optimizations and Breakthroughs of GPT‑4: Multimodal Capabilities, Alignment Strategies, and Predictable Scaling

The article summarizes the technical innovations behind GPT‑4, highlighting its multimodal abilities, improved alignment methods, scaling‑law‑based performance prediction, and remaining limitations, while referencing the official OpenAI technical report and community analyses.

AI research · GPT-4 · alignment
0 likes · 10 min read
21CTO
Mar 11, 2023 · Artificial Intelligence

Microsoft Announces Multimodal GPT-4: A New ‘iPhone Moment’ for AI

Microsoft Germany's CTO announced the imminent release of a multimodal GPT‑4, highlighting its ability to process text, images and video, while executives liken the breakthrough to an “iPhone moment” for AI, emphasizing new capabilities, industry disruption, and responsible data use.

AI development · GPT-4 · Microsoft
0 likes · 6 min read
DataFunSummit
Feb 28, 2023 · Artificial Intelligence

Baidu Document Intelligence Technology Overview and Applications

This article presents a comprehensive overview of Baidu's document intelligence technologies—including the ERNIE‑Layout multimodal large model, the prompt‑based DocPrompt extraction system, layout and table understanding techniques, and PaddleNLP open‑source integration—detailing their architectures, challenges, solutions, performance benchmarks, and real‑world application cases across multiple industries.

DocPrompt · Document Intelligence · ERNIE-Layout
0 likes · 19 min read
21CTO
Feb 27, 2023 · Artificial Intelligence

What’s Next for Large Language Models? Emerging Trends Shaping AI

The article explores three emerging directions for next‑generation large language models—self‑generated training data, built‑in verification with external retrieval, and massive sparse‑expert architectures—highlighting recent research, practical challenges, and their potential to reshape AI development.

AI research · generative AI · large language models
0 likes · 17 min read
21CTO
Feb 23, 2023 · Artificial Intelligence

How Does ChatGPT Really Work? Inside the RLHF Training Process

This article explains ChatGPT’s architecture, the distinction between model capability and consistency, how next‑token and masked‑language‑model training lead to inconsistencies, and how OpenAI’s supervised fine‑tuning, reward‑model training, and PPO reinforcement learning (RLHF) are combined to improve alignment while highlighting the method’s limitations.

AI alignment · ChatGPT · RLHF
0 likes · 15 min read
21CTO
Feb 20, 2023 · Artificial Intelligence

How the AI Arms Race Between Microsoft and Google Is Reshaping Search

The escalating competition between Microsoft and Google over large language models is driving new search experiences, reshaping AI research, and raising concerns about content quality, privacy, and the future of smaller AI innovators.

AI competition · Bing · ChatGPT
0 likes · 10 min read
dbaplus Community
Feb 18, 2023 · Artificial Intelligence

Why ChatGPT Still Gets It Wrong: Inside RLHF and Model Consistency

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 but uses supervised fine‑tuning and Reinforcement Learning from Human Feedback (RLHF) to improve alignment, yet its training methods still cause consistency issues such as unhelpful responses, hallucinations, bias, and limited explainability.

ChatGPT · PPO · RLHF
0 likes · 17 min read
Architecture Digest
Feb 17, 2023 · Artificial Intelligence

Analyzing the Emergent Abilities of ChatGPT and the Technical Roadmap of GPT‑3.5

This article dissects how ChatGPT acquired its surprising capabilities by tracing the evolution from the original GPT‑3 model through instruction tuning, code‑based pre‑training, and reinforcement learning from human feedback, ultimately presenting a comprehensive technical roadmap for reproducing GPT‑3.5‑scale models.

ChatGPT · GPT-3.5 · Instruction Tuning
0 likes · 26 min read
DataFunTalk
Feb 15, 2023 · Artificial Intelligence

Three Emerging Directions for Next‑Generation Large Language Models

The article outlines three promising research avenues—self‑generated training data, model‑driven fact‑checking, and sparse expert architectures—that could shape the next wave of large language model innovation and address current limitations such as data scarcity and hallucinations.

AI research · large language models · model self‑improvement
0 likes · 14 min read
Open Source Linux
Feb 13, 2023 · Artificial Intelligence

How Does ChatGPT Work? Inside RLHF and Model Consistency

This article explains the inner workings of ChatGPT, detailing its evolution from GPT‑3, the role of reinforcement learning from human feedback (RLHF) in improving consistency, the training pipeline steps, and the limitations and evaluation methods of large language models.

AI · ChatGPT · RLHF
0 likes · 15 min read
DataFunSummit
Feb 12, 2023 · Artificial Intelligence

Claude vs. ChatGPT: Constitutional AI, RLAIF, and the Quest for Safer Large‑Language Models

This article reviews Anthropic's Claude assistant, explains the novel Constitutional AI (RLAIF) approach that replaces costly human‑feedback data with a set of natural‑language principles, compares Claude with ChatGPT across helpfulness and harmlessness, and details the supervision and reinforcement‑learning pipelines, data annotation, and experimental results that demonstrate superior safety performance.

AI safety · Claude · Harmlessness
0 likes · 21 min read
Top Architect
Feb 11, 2023 · Artificial Intelligence

ChatGPT: Technical Overview, Architecture, Training Process, Limitations and Future Directions

This article provides a comprehensive technical overview of ChatGPT, covering its origins, underlying GPT architecture, reinforcement learning from human feedback, training stages, current limitations, and prospective improvements such as model compression, constitutional AI, and integration with AIGC technologies.

AIGC · Artificial Intelligence · ChatGPT
0 likes · 18 min read
Architect
Feb 9, 2023 · Artificial Intelligence

Emergent Abilities of Large Language Models: Complex Reasoning, Knowledge Reasoning, and Out‑of‑Distribution Robustness

This article reviews recent research on the emergent abilities of large language models—such as chain‑of‑thought reasoning, knowledge retrieval without external sources, and robustness to distribution shifts—examining scaling laws, model size thresholds, and the open questions surrounding a potential paradigm shift from fine‑tuning to in‑context learning.

AI research · chain-of-thought prompting · emergent abilities
0 likes · 23 min read
Top Architect
Feb 9, 2023 · Artificial Intelligence

How ChatGPT Works: Training, RLHF, and Consistency Issues

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 and improves performance through supervised fine‑tuning and reinforcement learning from human feedback (RLHF) with PPO optimization. The article covers consistency challenges such as misaligned outputs, bias, and hallucinations, and how outputs are evaluated for helpfulness, truthfulness, and harmlessness.

ChatGPT · RLHF · large language models
0 likes · 15 min read
IT Architects Alliance
Feb 9, 2023 · Artificial Intelligence

How ChatGPT Works: Model Architecture, Training Strategies, and RLHF

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 using supervised fine‑tuning and Reinforcement Learning from Human Feedback (RLHF) with PPO to address consistency issues by aligning model outputs with human preferences. The article also discusses training methods, limitations, and evaluation metrics.

AI alignment · ChatGPT · PPO
0 likes · 15 min read
Top Architect
Feb 8, 2023 · Artificial Intelligence

A Technical Roadmap of GPT‑3.5: From Pre‑training to RLHF and Emerging Capabilities

This article analyses how ChatGPT and the GPT‑3.5 series evolved from the original GPT‑3 through large‑scale pre‑training, code‑based training, instruction tuning, and reinforcement learning from human feedback, identifying the origins of their language generation, in‑context learning, world knowledge, code understanding, chain‑of‑thought reasoning, and alignment capabilities while also outlining current limitations.

ChatGPT · GPT-3.5 · Instruction Tuning
0 likes · 27 min read
DataFunSummit
Feb 7, 2023 · Artificial Intelligence

How to Evaluate OpenAI's Super Conversational Model ChatGPT?

This article compiles three highly upvoted Zhihu answers that examine OpenAI's ChatGPT, discussing its breakthrough impact on NLP, visual in‑context learning, reinforcement‑learning‑from‑human‑feedback, and the broader implications for AI research and development.

AI research · ChatGPT · In-Context Learning
0 likes · 10 min read
Architects' Tech Alliance
Feb 6, 2023 · Artificial Intelligence

What Makes ChatGPT Tick? A Deep Dive into Its Architecture, Limits, and Market Impact

This article provides a comprehensive analysis of ChatGPT, covering its origins within the OpenAI GPT family, core technical features such as RLHF training and model compression, current limitations, future improvement directions, and the broader industry and investment opportunities generated by large‑language‑model AI.

AI industry · ChatGPT · Model Compression
0 likes · 20 min read
DataFunTalk
Jan 15, 2023 · Artificial Intelligence

Advances in Dialogue Systems: Baidu PLATO Large‑Scale Conversational Models

This article reviews the evolution of dialogue systems from modular task‑oriented designs to end‑to‑end large‑scale models, detailing Baidu's PLATO series, their technical innovations, real‑world deployments, challenges such as inference efficiency and safety, and future research directions in conversational AI.

AI safety · Conversational AI · Dialogue Systems
0 likes · 13 min read
DataFunTalk
Jan 10, 2023 · Artificial Intelligence

Paradigm Shifts in Large Language Model Research and Future Directions

The article reviews the evolution of large language models from the pre‑GPT‑3 era to the present, analyzes the conceptual and technical gaps between Chinese and global research, and outlines key future research directions such as scaling laws, prompting techniques, multimodal training, and efficient model architectures.

AI research · ChatGPT · In-Context Learning
0 likes · 73 min read
Alibaba Cloud Big Data AI Platform
Jan 10, 2023 · Artificial Intelligence

How GPT‑MoE Cuts Training Costs: Sparse Transformer Techniques and Performance Insights

This article examines the use of Mixture‑of‑Experts (MoE) sparse training for GPT models, detailing the architecture, training and inference efficiency gains, experimental comparisons with dense models, custom routing algorithms, and step‑by‑step deployment on Alibaba Cloud AI platforms.

AI efficiency · GPT-MoE · Sparse Transformers
0 likes · 26 min read
Xiaohongshu Tech REDtech
Jan 3, 2023 · Artificial Intelligence

Insights into ChatGPT: Capabilities, Limitations, and Implications for AI Research

During Xiaohongshu’s REDtech livestream, AI researchers examined ChatGPT’s rapid adoption, versatile task performance, and underlying large‑scale pre‑training with in‑context learning, while highlighting persistent hallucinations, weak reasoning, high costs, and limited search‑engine replacement potential, and emphasized the importance of RLHF‑driven human feedback for future multimodal AI research.

AI research · ChatGPT · RLHF
0 likes · 14 min read
21CTO
Dec 15, 2022 · Artificial Intelligence

Sam Altman & Reid Hoffman on AI’s Future: Business, Multimodal Models, Society

In a candid conversation, Sam Altman and Reid Hoffman explore the next stage of AI, discussing commercial opportunities of large language models, the rise of AI‑plus applications in science and the metaverse, future directions such as multimodal and continuously learning models, and the societal challenges of AGI, wealth distribution and universal basic income.

AGI · AI · AI commercialization
0 likes · 16 min read
DataFunTalk
Aug 28, 2022 · Artificial Intelligence

Emerging Paths Toward General AI: Trends in Large‑Scale Pretrained Models

The article reviews how the Transformer breakthrough, the rapid scaling of large language models such as GPT‑3, Switch Transformer, and Alibaba's AliceMind and M6, together with multimodal research, are shaping the next phase of artificial intelligence toward more general, collaborative, and open AI systems.

AI trends · Artificial Intelligence · large language models
0 likes · 5 min read
DataFunSummit
Jul 27, 2022 · Artificial Intelligence

DataFun 2022 Natural Language Processing Summit – Leading Experts Discuss Large‑Scale Language Models, Multimodal Understanding, Dialogue Systems and AI Applications

The DataFun 2022 NLP Summit, held on July 30, brings together top researchers and industry leaders from Alibaba, Baidu, Microsoft, Amazon, and more to present the latest advances in large‑scale pre‑training, multimodal perception, information extraction, dialogue interaction, machine translation, and practical AI deployments, with live streaming and free registration via QR code.

AI · Dialogue Systems · Multimodal
0 likes · 44 min read
DataFunSummit
Apr 25, 2022 · Artificial Intelligence

Token‑Level Pipeline Parallelism for Transformer‑based Language Models (TeraPipe)

The article introduces a token‑level pipeline parallelism strategy that splits the sequence‑length dimension of Transformer‑based language models, explains why this approach is feasible, presents a dynamic‑programming formulation for optimal slicing, discusses engineering challenges, and evaluates its performance on large GPT models.

Performance Optimization · Pipeline Parallelism · Token-level
0 likes · 13 min read
Meituan Technology Team
Mar 17, 2022 · Artificial Intelligence

Tsinghua University & Meituan Digital Life Joint Research Institute Academic Salon: Large Model Technologies and Challenges

The Tsinghua‑Meituan Digital Life Joint Research Institute’s Academic Salon on March 23 featured Associate Professor Liu Zhiyuan presenting the latest advances and ten key challenges in large‑model technologies, aiming to foster industry‑academia collaboration and drive innovation in representation learning, knowledge graphs, and social computing.

Academic Seminar · Artificial Intelligence · NLP
0 likes · 4 min read
Volcano Engine Developer Services
Mar 16, 2022 · Artificial Intelligence

How veGiantModel Boosts Large Language Model Training Up to 6.9× Faster

The article introduces Volcano Engine's veGiantModel, a high‑performance large‑model training framework built on PyTorch, Megatron and DeepSpeed, details its distributed parallel strategies, hardware setups, benchmark results showing up to 6.9× speedup over Megatron and DeepSpeed, and provides open‑source links for further use.

ByteCCL · distributed training · large language models
0 likes · 6 min read
DataFunTalk
Mar 16, 2022 · Artificial Intelligence

Parameter-Efficient Sparsity Training for the PLUG Large-Scale Language Model

This article presents PLUG, a 27‑billion‑parameter Chinese language model, and introduces a parameter‑efficient sparsity training (PST) framework that combines unstructured and structured pruning with low‑rank decomposition to dramatically reduce model size while preserving downstream performance.

PLUG · Parameter-Efficient Training · deep learning
0 likes · 13 min read
DataFunTalk
Jan 3, 2022 · Artificial Intelligence

Top AI Stories of 2021: Large‑Scale Pretrained Models, Transformers, Multimodal AI, and Emerging Challenges

The article reviews the 2021 AI landscape, highlighting the race for ever‑larger pretrained models, the dominance of Transformers across modalities, the promise and limits of large models, the rise of multimodal systems, regulatory considerations, and the still‑nascent progress in reinforcement learning.

AI governance · AI industry · Transformers
0 likes · 12 min read
DataFunTalk
Dec 24, 2021 · Artificial Intelligence

Large-Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This article reviews three consecutive works from Alibaba DAMO Academy on compressing and distilling large pretrained language models—AdaBERT, L2A, and Meta‑KD—detailing their motivations, neural‑architecture‑search‑based designs, loss formulations, experimental results, and insights from a Q&A session.

AI · Knowledge Distillation · Model Compression
0 likes · 10 min read
Programmer DD
Aug 29, 2021 · Artificial Intelligence

When AI Code Assistants Leak Fake IDs: What GitHub Copilot’s Slip Reveals

GitHub Copilot, powered by the Codex model, recently generated a seemingly real Chinese ID number for Bilibili CEO Chen Rui, sparking concerns about privacy leaks, model training data, and the broader risks of AI code assistants inadvertently exposing personal information.

AI Code Generation · GitHub Copilot · large language models
0 likes · 6 min read
DataFunTalk
Jul 1, 2021 · Artificial Intelligence

Pre‑Trained Models: Past, Present, and Future – A Comprehensive Survey

This article surveys the evolution of pre‑trained models, covering the origins of transfer and self‑supervised learning, the rise of transformer‑based PTMs such as BERT and GPT, efficient architecture designs, multimodal and multilingual extensions, theoretical analyses, and future research directions for scalable and robust AI systems.

AI research · Multimodal · efficient training
0 likes · 27 min read
DataFunTalk
Jul 7, 2020 · Artificial Intelligence

Optimizing Pretrained Language Model Inference: Lessons from the NLPCC Small Model Competition and Deployment at Xiaomi

This article shares the Xiaomi AI Lab NLP team's experience in the NLPCC lightweight language model competition, discusses efficiency challenges of large pretrained models like BERT, and details practical inference optimizations—including model distillation, batching, FP16 quantization, and FasterTransformer integration—that dramatically reduce latency and hardware costs in production.

AI · BERT · Inference Optimization
0 likes · 15 min read