Tagged articles

1067 articles

Page 11 of 11

Jun 12, 2023 · Artificial Intelligence

Master Prompt Engineering: Guide ChatGPT to Deliver Precise Answers

This article explains prompt engineering for large language models like ChatGPT, covering its definition, essential techniques such as diverse prompting strategies, problem restatement, background provision, gradient prompting, example inclusion, role‑playing, and the importance of systematic experimentation and quantitative evaluation to achieve high‑quality, task‑specific AI outputs.

AI · ChatGPT · large language models

0 likes · 16 min read

DataFunTalk

Jun 9, 2023 · Artificial Intelligence

Expert Roundtable on Causal Inference and Large Language Models: Opportunities and Challenges

Leading experts discuss how causal inference intersects with large language models, exploring opportunities, challenges, industry applications, and future research directions, while sharing personal journeys into causal reasoning and offering practical advice for practitioners.

AI research · causal inference · expert interview

0 likes · 16 min read

Sohu Tech Products

Jun 7, 2023 · Artificial Intelligence

Multiscale PU Learning for Detecting AI‑Generated Text

Researchers from Peking University and Huawei present a multiscale positive‑unlabeled learning framework that significantly improves detection of AI‑generated short and long texts, addressing the difficulty of distinguishing AI‑written content from human writing and outperforming existing baselines on multiple benchmarks.

AI detection · Pu-Learning · large language models

0 likes · 8 min read

Python Programming Learning Circle

Jun 6, 2023 · Artificial Intelligence

Why ChatGPT Plus Performance Is Dropping and What OpenAI’s Roadmap Reveals

Recent reports indicate a noticeable decline in ChatGPT Plus’s GPT‑4 performance, especially in coding accuracy, prompting speculation about model‑scaling pains, AI alignment trade‑offs, and OpenAI’s GPU‑limited roadmap that includes cheaper models, longer context windows, fine‑tuning, and multimodal extensions.

AI alignment · ChatGPT · GPT-4

0 likes · 8 min read

OPPO Amber Lab

Jun 5, 2023 · Information Security

How ChatGPT Impacts Security: Key Insights from the CSA Seminar

An online CSA seminar on May 30 examined ChatGPT’s security impact, presenting a whitepaper and four AI‑security interaction dimensions, while experts discussed telecom‑operator security‑GPT models, safe vertical‑domain large‑model training, and future industry implications.

AI governance · AI security · ChatGPT

0 likes · 7 min read

Programmer DD

May 19, 2023 · Artificial Intelligence

Master Advanced Prompt Engineering: Boost LLM Performance with Proven Techniques

This article explains why effective prompt design—covering system messages, few‑shot learning, non‑dialogue scenarios, explicit instructions, output shaping, syntax cues, task decomposition, chain‑of‑thought, and real‑world context—is essential for reliable large language model results and provides practical examples and tips.

AI · System Messages · few-shot learning

0 likes · 8 min read

Python Crawling & Data Mining

May 7, 2023 · Artificial Intelligence

Why Does ChatGPT Suddenly ‘Think Step‑by‑Step’? Unveiling the Chain‑of‑Thought Emergence

The article explains how ChatGPT’s surprising step‑by‑step reasoning, known as Chain of Thought, emerged as a technical breakthrough, links it to model scaling, cognitive System 1/2 theory, and the influence of code data in training large language models.

AI reasoning · ChatGPT · chain-of-thought

0 likes · 10 min read

Rare Earth Juejin Tech Community

May 5, 2023 · Artificial Intelligence

Limitations of Generative Pre‑trained Transformers: Hallucinations, Memory, Planning, and Architectural Proposals

The article critically examines GPT‑4 and similar transformer models, highlighting persistent hallucinations, outdated knowledge, insufficient domain coverage, lack of planning and memory, and proposes architectural extensions inspired by fast‑slow thinking and differentiable modules to overcome these fundamental constraints.

AI limitations · GPT-4 · Model architecture

0 likes · 24 min read

Architect

Apr 27, 2023 · Artificial Intelligence

Survey of Large Language Model Research: From GPT‑1 to ChatGPT and Open‑Source Alternatives

This article provides a comprehensive overview of the development of large language models, reviewing classic papers from GPT‑1 through GPT‑4, discussing open‑source implementations such as LLaMA, Alpaca, GLM, and ChatGLM, and analyzing training methods, datasets, and future research directions.

AI research · GPT · large language models

0 likes · 36 min read

Kuaishou Tech

Apr 23, 2023 · Artificial Intelligence

Kuaishou & Renmin AI Institute: Driving Multimodal Large Model Innovation

The article details how Kuaishou’s multimodal AI research, including its K7 trillion‑parameter model and VLUA algorithm, partners with Renmin University’s Gaoling AI Institute to launch a joint lab, produce cutting‑edge papers such as WebBrain and ChatImg, and advance recommendation and search technologies across the short‑video ecosystem.

AI · Industry collaboration · Recommendation Systems

0 likes · 17 min read

DataFunSummit

Apr 20, 2023 · Artificial Intelligence

Mengzi Lightweight Model Technology System and Advances in Small‑Scale and Retrieval‑Augmented Pretraining

This presentation introduces the Mengzi lightweight model technology stack, covering large‑scale pre‑training, motivations for lightweight models, detailed techniques such as knowledge and sequence‑relation enhancement, training optimization, model compression, retrieval‑augmented pre‑training, multimodal extensions, open‑source releases, and real‑world applications.

Knowledge Distillation · Multimodal · large language models

0 likes · 23 min read

IT Architects Alliance

Apr 20, 2023 · Artificial Intelligence

Overview of Prominent Large Language Models and Instruction‑Finetuned Variants

This article provides a comprehensive overview of major large language models—including GPT series, T5, LaMDA, LLaMA, BLOOM, and others—detailing their architectures, parameter scales, open‑source status, and the evolution of instruction‑fine‑tuning techniques that improve zero‑shot and few‑shot performance.

AI research · Instruction Tuning · LLM comparison

0 likes · 24 min read

Architect

Apr 19, 2023 · Artificial Intelligence

Emergence in Large Language Models: Phenomena, Explanations, and Implications

This article reviews the emergence phenomena observed in large language models, explains how model scale, in‑context learning and chain‑of‑thought prompting contribute to sudden performance gains, discusses small‑model alternatives, and explores the relationship between emergence and the training‑time Grokking effect.

AI research · In-Context Learning · Model Scaling

0 likes · 13 min read

DataFunTalk

Apr 19, 2023 · Artificial Intelligence

Is the Daily Emergence of Large Language Models Beneficial?

The article examines the rapid proliferation of large language models, weighing both the opportunities for experimentation and the drawbacks of noise, and argues that establishing authoritative Chinese LLM evaluation benchmarks is essential to guide meaningful progress in the field.

AI research · LLM evaluation · Open Source

0 likes · 7 min read

Architect

Apr 14, 2023 · Artificial Intelligence

Overview of Prominent Large Language Models and Instruction Fine‑Tuning Techniques

The article surveys major large language models—including GPT‑3, T5, LaMDA, Jurassic‑1, MT‑NLG, Gopher, Chinchilla, PaLM, U‑PaLM, OPT, LLaMA, BLOOM, GLM‑130B, and ERNIE 3.0 Titan—explains their architectures, scaling trade‑offs, and then details instruction‑fine‑tuned variants such as T0, FLAN, GPT‑3.5, ChatGPT, GPT‑4, Alpaca and ChatGLM, providing references for further study.

AI · ChatGPT · GPT-3

0 likes · 27 min read

ITPUB

Apr 14, 2023 · Artificial Intelligence

How Do Generative, Perceptual, and Decision AI Interact? Insights from Jina AI’s Founder

In this interview, Jina AI’s founder Shao Han examines the relationships among generative, perceptual, and decision AI, compares single‑modal and multimodal approaches, discusses large language model development, and evaluates the impact of ChatGPT on search and future AI commercialization.

AI · AI commercialization · Search

0 likes · 11 min read

Programmer DD

Apr 14, 2023 · Artificial Intelligence

How DeepSpeed-Chat Accelerates ChatGPT‑Style Model Training by 15×

Microsoft open‑sourced DeepSpeed‑Chat, a toolkit that streamlines the end‑to‑end training and inference of ChatGPT‑like large language models using RLHF, delivering up to fifteen‑fold speedups and dramatically lower costs, even on a single GPU.

ChatGPT · DeepSpeed · RLHF

0 likes · 8 min read

Top Architect

Apr 12, 2023 · Artificial Intelligence

Data‑Centric AI Perspective on GPT Models: Training, Inference, and Maintenance

This article examines how large language models such as GPT‑1 through GPT‑4 succeed largely due to high‑quality, large‑scale training data, and explains the Data‑centric AI framework—training data development, inference data development, and data maintenance—while discussing prompt engineering, data‑driven improvements, and future trends in AI.

AI · Data‑Centric AI · GPT

0 likes · 19 min read

Architect

Apr 9, 2023 · Artificial Intelligence

Evaluating the Commonsense Knowledge and Reasoning Capabilities of ChatGPT and Other Large Language Models

This study systematically evaluates ChatGPT and other large language models on their ability to answer commonsense questions, assess their knowledge awareness, and utilize generated knowledge for reasoning, revealing strong QA performance but notable gaps in social and temporal commonsense and in leveraging contextual knowledge.

ChatGPT · NLP · commonsense reasoning

0 likes · 20 min read

DataFunSummit

Apr 7, 2023 · Artificial Intelligence

China's AI Startup Landscape: Who Holds the Greatest Advantage in the Large Model Race?

Amid the 2023 AI boom, veteran entrepreneurs, former tech executives, and academic teams in China are racing to launch large‑model ventures, each leveraging distinct funding, talent, and product strategies, while major tech giants simultaneously push their own AI offerings, reshaping the market dynamics.

AI startups · China AI · large language models

0 likes · 12 min read

DataFunTalk

Apr 6, 2023 · Artificial Intelligence

A Comprehensive Survey of Large Language Models: Background, Capabilities, Key Technologies, and Future Directions

This article reviews the rapid progress of large language models (LLMs), covering their historical development, scaling laws, emergent abilities, core technologies such as training and alignment, resource ecosystems, evaluation methods, safety concerns, and prospective research challenges.

AI research · LLM · Scaling

0 likes · 21 min read

Xiaohongshu Tech REDtech

Apr 3, 2023 · Industry Insights

What Drives Intelligent Recommendation and Search? Key Takeaways from Xiaohongshu’s CCF C³ Event

The CCF C³ event at Xiaohongshu gathered leading researchers and industry experts to dissect the latest advances, challenges, and future opportunities in intelligent recommendation and search, including multimodal content handling, decentralized distribution, cold‑start solutions, and the impact of large language models.

AI · Industry Insights · Recommendation Systems

0 likes · 11 min read

Python Programming Learning Circle

Mar 25, 2023 · Artificial Intelligence

Impact of ChatGPT and Large Language Models on the U.S. Labor Market

A recent OpenAI study estimates that roughly 80% of U.S. workers could see at least 10% of their tasks affected by ChatGPT and similar large language models, with about 19% of workers potentially seeing at least half of their tasks affected, highlighting widespread economic implications across all wage levels.

AI economics · ChatGPT · GPT‑4

0 likes · 9 min read

Baidu Geek Talk

Mar 21, 2023 · Artificial Intelligence

Infrastructure Challenges and Solutions for Large‑Scale AI Model Training

The article explains how the massive compute and storage demands of today’s large language models create a “compute wall” and “storage wall,” and describes Baidu Intelligent Cloud’s four‑layer full‑stack infrastructure—combining advanced parallelism techniques, optimized GPU networking, static‑graph compilation, and cost‑model‑driven placement—to train trillion‑parameter models efficiently.

AI infrastructure · Cost Model · GPU clusters

0 likes · 27 min read

Efficient Ops

Mar 16, 2023 · Artificial Intelligence

What’s New in AI? Baidu’s Wenxin, GPT‑4 Multimodal, Docker’s Policy Shift & More

The article reviews the latest AI breakthroughs, including Baidu’s Wenxin Yiyan launch, OpenAI’s multimodal GPT‑4, Google’s PaLM API, and Microsoft’s supercomputer investment, and also covers Docker’s paid‑plan warning, the US demand for TikTok’s sale, and a Stack Overflow technology survey.

Artificial Intelligence · Docker · GPT-4

0 likes · 7 min read

DataFunTalk

Mar 16, 2023 · Artificial Intelligence

Technical Optimizations and Breakthroughs of GPT‑4: Multimodal Capabilities, Alignment Strategies, and Predictable Scaling

The article summarizes the technical innovations behind GPT‑4, highlighting its multimodal abilities, improved alignment methods, scaling‑law‑based performance prediction, and remaining limitations, while referencing the official OpenAI technical report and community analyses.

AI research · GPT-4 · alignment

0 likes · 10 min read

21CTO

Mar 11, 2023 · Artificial Intelligence

Microsoft Announces Multimodal GPT-4: A New ‘iPhone Moment’ for AI

Microsoft Germany's CTO announced the imminent release of a multimodal GPT‑4, highlighting its ability to process text, images and video, while executives liken the breakthrough to an “iPhone moment” for AI, emphasizing new capabilities, industry disruption, and responsible data use.

AI development · GPT-4 · Microsoft

0 likes · 6 min read

DataFunSummit

Feb 28, 2023 · Artificial Intelligence

Baidu Document Intelligence Technology Overview and Applications

This article presents a comprehensive overview of Baidu's document intelligence technologies—including the ERNIE‑Layout multimodal large model, the prompt‑based DocPrompt extraction system, layout and table understanding techniques, and PaddleNLP open‑source integration—detailing their architectures, challenges, solutions, performance benchmarks, and real‑world application cases across multiple industries.

DocPrompt · Document Intelligence · ERNIE-Layout

0 likes · 19 min read

21CTO

Feb 27, 2023 · Artificial Intelligence

What’s Next for Large Language Models? Emerging Trends Shaping AI

The article explores three emerging directions for next‑generation large language models—self‑generated training data, built‑in verification with external retrieval, and massive sparse‑expert architectures—highlighting recent research, practical challenges, and their potential to reshape AI development.

AI research · generative AI · large language models

0 likes · 17 min read

21CTO

Feb 23, 2023 · Artificial Intelligence

How Does ChatGPT Really Work? Inside the RLHF Training Process

This article explains ChatGPT’s architecture, the distinction between model capability and consistency, how next‑token and masked‑language‑model training lead to inconsistencies, and how OpenAI’s supervised fine‑tuning, reward‑model training, and PPO reinforcement learning (RLHF) are combined to improve alignment while highlighting the method’s limitations.

AI alignment · ChatGPT · RLHF

0 likes · 15 min read

21CTO

Feb 20, 2023 · Artificial Intelligence

How the AI Arms Race Between Microsoft and Google Is Reshaping Search

The escalating competition between Microsoft and Google over large language models is driving new search experiences, reshaping AI research, and raising concerns about content quality, privacy, and the future of smaller AI innovators.

AI competition · Bing · ChatGPT

0 likes · 10 min read

dbaplus Community

Feb 18, 2023 · Artificial Intelligence

Why ChatGPT Still Gets It Wrong: Inside RLHF and Model Consistency

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 but uses supervised fine‑tuning and Reinforcement Learning from Human Feedback (RLHF) to improve alignment, yet its training methods still cause consistency issues such as unhelpful responses, hallucinations, bias, and limited explainability.

ChatGPT · PPO · RLHF

0 likes · 17 min read

Architecture Digest

Feb 17, 2023 · Artificial Intelligence

Analyzing the Emergent Abilities of ChatGPT and the Technical Roadmap of GPT‑3.5

This article dissects how ChatGPT acquired its surprising capabilities by tracing the evolution from the original GPT‑3 model through instruction tuning, code‑based pre‑training, and reinforcement learning from human feedback, ultimately presenting a comprehensive technical roadmap for reproducing GPT‑3.5‑scale models.

ChatGPT · GPT-3.5 · Instruction Tuning

0 likes · 26 min read

DataFunTalk

Feb 15, 2023 · Artificial Intelligence

Three Emerging Directions for Next‑Generation Large Language Models

The article outlines three promising research avenues—self‑generated training data, model‑driven fact‑checking, and sparse expert architectures—that could shape the next wave of large language model innovation and address current limitations such as data scarcity and hallucinations.

AI research · large language models · model self‑improvement

0 likes · 14 min read

Architects' Tech Alliance

Feb 13, 2023 · Artificial Intelligence

Why ChatGPT’s Explosive Growth Is Redefining the AI Landscape

Within five days of its launch, ChatGPT attracted over one million users, prompting massive investment from Microsoft, sparking fierce competition with Google, and raising critical questions about the technology's scalability, reliability, and societal impact.

AI industry · ChatGPT · Microsoft

0 likes · 18 min read

Architects' Tech Alliance

Feb 13, 2023 · Artificial Intelligence

Do Large Language Models Really Have Theory of Mind? Stanford Study Reveals Surprising Results

A recent Stanford paper shows that GPT‑3.5 and its predecessor can pass classic Theory of Mind tests at levels comparable to 7‑9‑year‑old children, sparking debate over whether these abilities are genuine understanding or emergent by‑products of scaling.

AI evaluation · GPT-3.5 · Stanford Research

0 likes · 10 min read

Open Source Linux

Feb 13, 2023 · Artificial Intelligence

How Does ChatGPT Work? Inside RLHF and Model Consistency

This article explains the inner workings of ChatGPT, detailing its evolution from GPT‑3, the role of reinforcement learning from human feedback (RLHF) in improving consistency, the training pipeline steps, and the limitations and evaluation methods of large language models.

AI · ChatGPT · RLHF

0 likes · 15 min read

DataFunSummit

Feb 12, 2023 · Artificial Intelligence

Claude vs. ChatGPT: Constitutional AI, RLAIF, and the Quest for Safer Large‑Language Models

This article reviews Anthropic's Claude assistant, explains the novel Constitutional AI (RLAIF) approach that replaces costly human‑feedback data with a set of natural‑language principles, compares Claude with ChatGPT across helpfulness and harmlessness, and details the supervision and reinforcement‑learning pipelines, data annotation, and experimental results that demonstrate superior safety performance.

AI safety · Claude · Harmlessness

0 likes · 21 min read

Top Architect

Feb 11, 2023 · Artificial Intelligence

ChatGPT: Technical Overview, Architecture, Training Process, Limitations and Future Directions

This article provides a comprehensive technical overview of ChatGPT, covering its origins, underlying GPT architecture, reinforcement learning from human feedback, training stages, current limitations, and prospective improvements such as model compression, constitutional AI, and integration with AIGC technologies.

AIGC · Artificial Intelligence · ChatGPT

0 likes · 18 min read

Laravel Tech Community

Feb 9, 2023 · Artificial Intelligence

Understanding ChatGPT: Architecture, Training Strategies, and Alignment Challenges

This article explains how ChatGPT builds on GPT‑3, describes the supervised‑plus‑reinforcement learning (RLHF) pipeline that fine‑tunes the model, compares model capability with consistency, and discusses the performance evaluation and remaining limitations of large language models.

ChatGPT · RLHF · alignment

0 likes · 15 min read

Architect

Feb 9, 2023 · Artificial Intelligence

Emergent Abilities of Large Language Models: Complex Reasoning, Knowledge Reasoning, and Out‑of‑Distribution Robustness

This article reviews recent research on the emergent abilities of large language models—such as chain‑of‑thought reasoning, knowledge retrieval without external sources, and robustness to distribution shifts—examining scaling laws, model size thresholds, and the open questions surrounding a potential paradigm shift from fine‑tuning to in‑context learning.

AI research · chain-of-thought prompting · emergent abilities

0 likes · 23 min read

Top Architect

Feb 9, 2023 · Artificial Intelligence

How ChatGPT Works: Training, RLHF, and Consistency Issues

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 and improves performance through supervised fine‑tuning, reinforcement learning from human feedback (RLHF), and PPO optimization, addressing consistency challenges such as misaligned outputs, bias, and hallucinations while evaluating helpfulness, truthfulness, and harmlessness.

ChatGPT · RLHF · large language models

0 likes · 15 min read

IT Architects Alliance

Feb 9, 2023 · Artificial Intelligence

How ChatGPT Works: Model Architecture, Training Strategies, and RLHF

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 using supervised fine‑tuning and Reinforcement Learning from Human Feedback (RLHF) with PPO, addressing consistency issues by aligning model outputs with human preferences, while discussing training methods, limitations, and evaluation metrics.

AI alignment · ChatGPT · PPO

0 likes · 15 min read

Top Architect

Feb 8, 2023 · Artificial Intelligence

A Technical Roadmap of GPT‑3.5: From Pre‑training to RLHF and Emerging Capabilities

This article analyses how ChatGPT and the GPT‑3.5 series evolved from the original GPT‑3 through large‑scale pre‑training, code‑based training, instruction tuning, and reinforcement learning from human feedback, identifying the origins of their language generation, in‑context learning, world knowledge, code understanding, chain‑of‑thought reasoning, and alignment capabilities while also outlining current limitations.

ChatGPT · GPT-3.5 · Instruction Tuning

0 likes · 27 min read

DataFunSummit

Feb 7, 2023 · Artificial Intelligence

How to Evaluate OpenAI's Super Conversational Model ChatGPT?

This article compiles three highly upvoted Zhihu answers that examine OpenAI's ChatGPT, discussing its breakthrough impact on NLP, visual in‑context learning, reinforcement‑learning‑from‑human‑feedback, and the broader implications for AI research and development.

AI research · ChatGPT · In-Context Learning

0 likes · 10 min read

Architects' Tech Alliance

Feb 6, 2023 · Artificial Intelligence

What Makes ChatGPT Tick? A Deep Dive into Its Architecture, Limits, and Market Impact

This article provides a comprehensive analysis of ChatGPT, covering its origins within the OpenAI GPT family, core technical features such as RLHF training and model compression, current limitations, future improvement directions, and the broader industry and investment opportunities generated by large‑language‑model AI.

AI industry · ChatGPT · Model Compression

0 likes · 20 min read

DataFunTalk

Jan 15, 2023 · Artificial Intelligence

Advances in Dialogue Systems: Baidu PLATO Large‑Scale Conversational Models

This article reviews the evolution of dialogue systems from modular task‑oriented designs to end‑to‑end large‑scale models, detailing Baidu's PLATO series, their technical innovations, real‑world deployments, challenges such as inference efficiency and safety, and future research directions in conversational AI.

AI safety · Conversational AI · Dialogue Systems

0 likes · 13 min read

21CTO

Jan 10, 2023 · Artificial Intelligence

Why Major Conferences Are Banning ChatGPT-Generated Papers—and What It Means for AI Research

Amid growing concerns over plagiarism, critical thinking, and copyright, leading AI conferences, educational institutions, and developers are imposing bans and detection tools on ChatGPT-generated content, sparking a worldwide debate on the future of academic publishing and AI governance.

AI policy · ChatGPT · academic integrity

0 likes · 8 min read

DataFunTalk

Jan 10, 2023 · Artificial Intelligence

Paradigm Shifts in Large Language Model Research and Future Directions

The article reviews the evolution of large language models from the pre‑GPT‑3 era to the present, analyzes the conceptual and technical gaps between Chinese and global research, and outlines key future research directions such as scaling laws, prompting techniques, multimodal training, and efficient model architectures.

AI research · ChatGPT · In-Context Learning

0 likes · 73 min read

Alibaba Cloud Big Data AI Platform

Jan 10, 2023 · Artificial Intelligence

How GPT‑MoE Cuts Training Costs: Sparse Transformer Techniques and Performance Insights

This article examines the use of Mixture‑of‑Experts (MoE) sparse training for GPT models, detailing the architecture, training and inference efficiency gains, experimental comparisons with dense models, custom routing algorithms, and step‑by‑step deployment on Alibaba Cloud AI platforms.

AI efficiency · GPT-MoE · Sparse Transformers

0 likes · 26 min read

Xiaohongshu Tech REDtech

Jan 3, 2023 · Artificial Intelligence

Insights into ChatGPT: Capabilities, Limitations, and Implications for AI Research

During Xiaohongshu’s REDtech livestream, AI researchers examined ChatGPT’s rapid adoption, versatile task performance, and underlying large‑scale pre‑training with in‑context learning, while highlighting persistent hallucinations, weak reasoning, high costs, and limited search‑engine replacement potential, and emphasized the importance of RLHF‑driven human feedback for future multimodal AI research.

AI research · ChatGPT · RLHF

0 likes · 14 min read

Sohu Tech Products

Dec 28, 2022 · Artificial Intelligence

Reid Hoffman and Sam Altman Discuss the Future Directions and Business Opportunities of AI

In a detailed conversation, LinkedIn co‑founder Reid Hoffman and OpenAI CEO Sam Altman explore the commercial potential of large language models, the emerging AI‑plus era for science and the metaverse, and their vision for the next decade of AI breakthroughs, societal impact, and governance.

AI · AI applications · OpenAI

0 likes · 14 min read

21CTO

Dec 15, 2022 · Artificial Intelligence

Sam Altman & Reid Hoffman on AI’s Future: Business, Multimodal Models, Society

In a candid conversation, Sam Altman and Reid Hoffman explore the next stage of AI, discussing commercial opportunities of large language models, the rise of AI‑plus applications in science and the metaverse, future directions such as multimodal and continuously learning models, and the societal challenges of AGI, wealth distribution and universal basic income.

AGI · AI · AI commercialization

0 likes · 16 min read

DataFunTalk

Aug 28, 2022 · Artificial Intelligence

Emerging Paths Toward General AI: Trends in Large‑Scale Pretrained Models

The article reviews how the Transformer breakthrough, the rapid scaling of large language models such as GPT‑3, Switch Transformer, and Alibaba's AliceMind and M6, together with multimodal research, are shaping the next phase of artificial intelligence toward more general, collaborative, and open AI systems.

AI trends · Artificial Intelligence · large language models

0 likes · 5 min read

DataFunSummit

Jul 27, 2022 · Artificial Intelligence

DataFun 2022 Natural Language Processing Summit – Leading Experts Discuss Large‑Scale Language Models, Multimodal Understanding, Dialogue Systems and AI Applications

The DataFun 2022 NLP Summit, taking place on July 30, brings together top researchers and industry leaders from Alibaba, Baidu, Microsoft, Amazon, and more to present the latest advances in large‑scale pre‑training, multimodal perception, information extraction, dialogue interaction, machine translation, and practical AI deployments, with live streaming and free registration via QR code.

AI · Dialogue Systems · Multimodal

0 likes · 44 min read

DataFunSummit

Apr 25, 2022 · Artificial Intelligence

Token‑Level Pipeline Parallelism for Transformer‑based Language Models (TeraPipe)

The article introduces a token‑level pipeline parallelism strategy that splits the sequence‑length dimension of Transformer‑based language models, explains why this approach is feasible, presents a dynamic‑programming formulation for optimal slicing, discusses engineering challenges, and evaluates its performance on large GPT models.

Performance Optimization · Pipeline Parallelism · Token-level

0 likes · 13 min read

IT Services Circle

Apr 21, 2022 · Artificial Intelligence

Microsoft Introduces Jigsaw: An AI Tool to Boost Large Language Model Code Generation

Microsoft's Jigsaw tool leverages post‑processing and user feedback to improve large language model code synthesis, especially for Python Pandas, achieving up to 80% accuracy and aiming to automate code verification and debugging for developers.

AI · Microsoft Jigsaw · Python Pandas

0 likes · 5 min read

Meituan Technology Team

Mar 17, 2022 · Artificial Intelligence

Tsinghua University & Meituan Digital Life Joint Research Institute Academic Salon: Large Model Technologies and Challenges

The Tsinghua‑Meituan Digital Life Joint Research Institute’s Academic Salon on March 23 featured Associate Professor Liu Zhiyuan presenting the latest advances and ten key challenges in large‑model technologies, aiming to foster industry‑academia collaboration and drive innovation in representation learning, knowledge graphs, and social computing.

Academic Seminar · Artificial Intelligence · NLP

0 likes · 4 min read

Volcano Engine Developer Services

Mar 16, 2022 · Artificial Intelligence

How veGiantModel Boosts Large Language Model Training Up to 6.9× Faster

The article introduces Volcano Engine's veGiantModel, a high‑performance large‑model training framework built on PyTorch, Megatron and DeepSpeed, details its distributed parallel strategies, hardware setups, benchmark results showing up to 6.9× speedup over Megatron and DeepSpeed, and provides open‑source links for further use.

ByteCCL · distributed training · large language models

0 likes · 6 min read

DataFunTalk

Mar 16, 2022 · Artificial Intelligence

Parameter-Efficient Sparsity Training for the PLUG Large-Scale Language Model

This article presents the PLUG 27‑billion‑parameter Chinese language model and introduces a parameter‑efficient sparsity training (PST) framework that combines unstructured and structured pruning with low‑rank decomposition to dramatically reduce model size while preserving downstream performance.

PLUG · Parameter-Efficient Training · deep learning

0 likes · 13 min read

DataFunTalk

Jan 3, 2022 · Artificial Intelligence

Large-Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This article reviews three consecutive works from Alibaba DAMO Academy on compressing and distilling large pretrained language models—AdaBERT, L2A, and Meta‑KD—detailing their motivations, neural‑architecture‑search‑based designs, loss formulations, experimental results, and insights from a Q&A session.

AI · Knowledge Distillation · Model Compression

0 likes · 10 min read

Java Architect Essentials

Sep 5, 2021 · Industry Insights

When AI Code Assistants Leak Fake IDs: GitHub Copilot’s Privacy Risks Explained

GitHub Copilot once auto‑completed a seemingly real Chinese ID number for a public figure, sparking concerns that large language models can unintentionally expose personal data learned from public web sources, highlighting privacy and security challenges in AI‑driven code assistants.

AI privacy · GitHub Copilot · Information Security

0 likes · 8 min read

Java High-Performance Architecture

Sep 3, 2021 · Artificial Intelligence

When AI Code Completion Leaks Fake ID Numbers: Copilot’s Privacy Risks

GitHub Copilot unexpectedly generated a fabricated ID for Bilibili CEO Chen Rui, sparking concerns about how large language models trained on public data can inadvertently expose synthetic personal information and highlighting broader privacy and ethical issues in AI‑driven code assistants.

AI privacy · GitHub Copilot · code generation

0 likes · 6 min read

Programmer DD

Aug 29, 2021 · Artificial Intelligence

When AI Code Assistants Leak Fake IDs: What GitHub Copilot’s Slip Reveals

GitHub Copilot, powered by the Codex model, recently generated a seemingly real Chinese ID number for Bilibili CEO Chen Rui, sparking concerns about privacy leaks, model training data, and the broader risks of AI code assistants inadvertently exposing personal information.

AI Code Generation · GitHub Copilot · large language models

0 likes · 6 min read

DataFunTalk

Jul 1, 2021 · Artificial Intelligence

Pre‑Trained Models: Past, Present, and Future – A Comprehensive Survey

This article surveys the evolution of pre‑trained models, covering the origins of transfer and self‑supervised learning, the rise of transformer‑based PTMs such as BERT and GPT, efficient architecture designs, multimodal and multilingual extensions, theoretical analyses, and future research directions for scalable and robust AI systems.

AI research · Multimodal · efficient training

0 likes · 27 min read

DataFunTalk

Jul 7, 2020 · Artificial Intelligence

Optimizing Pretrained Language Model Inference: Lessons from the NLPCC Small Model Competition and Deployment at Xiaomi

This article shares the Xiaomi AI Lab NLP team's experience in the NLPCC lightweight language model competition, discusses efficiency challenges of large pretrained models like BERT, and details practical inference optimizations—including model distillation, batching, FP16 quantization, and FasterTransformer integration—that dramatically reduce latency and hardware costs in production.

AI · BERT · Inference Optimization

0 likes · 15 min read
