Mastering Prompt Engineering: Few‑Shot, Chain‑of‑Thought, and Self‑Consistency Techniques

This article breaks down three core prompt‑engineering techniques: Few‑Shot prompting for output format stability, Chain‑of‑Thought for multi‑step reasoning, and Self‑Consistency for answer robustness. It shows when to use each and how to combine them in LangChain, with concrete code examples, performance data, and common pitfalls.

James' Growth Diary

May 28, 2026

01 Zero-shot vs Few-shot: The Power of Examples

Zero-shot prompting provides no examples, while few-shot prompting adds a few input → output pairs to teach the model the desired format and reasoning style. In a sentiment‑analysis task, format stability (the share of outputs that match the required format) rises from ~60% with zero‑shot to >95% with just three examples.
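
One way to put a number on format stability is to count how many raw model outputs are exactly one of the allowed labels. The sketch below is a minimal illustration under that assumption; the helper name `format_compliance_rate` and the sample outputs are hypothetical and are not taken from the original benchmark.

```python
from typing import Iterable

# Labels the prompt allows (positive / negative / neutral), as used in this section.
ALLOWED_LABELS = {"积极", "消极", "中性"}


def format_compliance_rate(outputs: Iterable[str], allowed: set[str] = ALLOWED_LABELS) -> float:
    """Return the fraction of outputs that are exactly one allowed label."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    compliant = sum(1 for text in outputs if text.strip() in allowed)
    return compliant / len(outputs)


# Hypothetical model outputs, invented for illustration only.
sample_outputs = ["积极", "这条评论是消极的。", "中性", " 消极 ", "Positive"]
rate = format_compliance_rate(sample_outputs)
print(f"format compliance: {rate:.0%}")
```

Running the same check over zero-shot and few-shot outputs for an identical test set gives a like-for-like comparison of the two prompts.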

LangChain wraps this logic in FewShotPromptTemplate, which can manage static or dynamic example selection.

import { FewShotPromptTemplate, PromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const examples = [
  { review: "这款手机拍照超级清晰，完全超出预期", sentiment: "积极" },
  { review: "快递三天没到，客服没人回，服务太差了", sentiment: "消极" },
  { review: "包装还行，产品质量一般般，没什么特别的", sentiment: "中性" }
];

const examplePrompt = new PromptTemplate({
  inputVariables: ["review", "sentiment"],
  template: "评论：{review}\n情感：{sentiment}"
});

// Static Few-shot (small example set)
const fewShotPrompt = new FewShotPromptTemplate({
  examples,
  examplePrompt,
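  // The prefix is Chinese for: "Classify the review's sentiment; output only 积极/消极/中性 (positive/negative/neutral)".
  prefix: "请判断评论情感倾向，只输出：积极/消极/中性\n\n",
  suffix: "评论：{input}\n情感：",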
  inputVariables: ["input"]
});

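// Dynamic few-shot: meant for larger pools (roughly 20+ examples), from which the 3 most similar are selected.
// The 3-item demo pool above is reused here, so k: 3 simply returns every example.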
const selector = await SemanticSimilarityExampleSelector.fromExamples(
  examples,
  new OpenAIEmbeddings(),
  MemoryVectorStore,
  { k: 3 }
);

const dynamicFewShotPrompt = new FewShotPromptTemplate({
  exampleSelector: selector, // replace examples field
  examplePrompt,
  prefix: "请判断评论情感倾向，只输出：积极/消极/中性\n\n",
  suffix: "评论：{input}\n情感：",
  inputVariables: ["input"]
});

Python equivalent (langchain >=0.2):

from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

examples = [
    {"review": "这款手机拍照超级清晰，完全超出预期", "sentiment": "积极"},
    {"review": "快递三天没到，客服没人回，服务太差了", "sentiment": "消极"},
    {"review": "包装还行，产品质量一般般，没什么特别的", "sentiment": "中性"}
]

example_prompt = PromptTemplate(
    input_variables=["review", "sentiment"],
    template="评论：{review}\n情感：{sentiment}"
)

# Static Few-shot
few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
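    # The prefix is Chinese for: "Classify the review's sentiment; output only 积极/消极/中性 (positive/negative/neutral)".
    prefix="请判断评论情感倾向，只输出：积极/消极/中性\n\n",
    suffix="评论：{input}\n情感：",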
    input_variables=["input"]
)

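# Dynamic few-shot: meant for larger pools (roughly 20+ examples), from which the 3 most similar are selected.
# The 3-item demo pool above is reused here, so k=3 simply returns every example.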
selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    OpenAIEmbeddings(),
    FAISS,
    k=3
)

dynamic_few_shot_prompt = FewShotPromptTemplate(
    example_selector=selector,  # replace examples field
    example_prompt=example_prompt,
    prefix="请判断评论情感倾向，只输出：积极/消极/中性\n\n",
    suffix="评论：{input}\n情感：",
    input_variables=["input"]
)
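
To see what the model actually receives, it can help to render a few-shot prompt by hand. The sketch below is a plain-Python approximation, not LangChain's implementation: `render_few_shot` is a hypothetical helper, the blank-line separator between pieces is an assumption about the default behaviour, and the final review is an invented input.

```python
def render_few_shot(prefix, examples, example_template, suffix, separator="\n\n", **inputs):
    """Join prefix, formatted examples, and the filled-in suffix into one prompt string."""
    rendered_examples = [example_template.format(**example) for example in examples]
    pieces = [prefix, *rendered_examples, suffix.format(**inputs)]
    # Skip empty pieces so an empty prefix does not leave a stray separator.
    return separator.join(piece for piece in pieces if piece)


examples = [
    {"review": "这款手机拍照超级清晰，完全超出预期", "sentiment": "积极"},
    {"review": "快递三天没到，客服没人回，服务太差了", "sentiment": "消极"},
    {"review": "包装还行，产品质量一般般，没什么特别的", "sentiment": "中性"},
]

prompt_text = render_few_shot(
    prefix="请判断评论情感倾向，只输出：积极/消极/中性",
    examples=examples,
    example_template="评论：{review}\n情感：{sentiment}",
    suffix="评论：{input}\n情感：",
    input="物流很快，但是屏幕有划痕",  # hypothetical new review
)
print(prompt_text)
```

The rendered text ends on an open `情感：` line, which is what nudges the model to answer with a single label.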

02 Chain‑of‑Thought (CoT): Writing Out Reasoning Steps

CoT improves reasoning accuracy by forcing the model to generate intermediate steps. Zero‑shot CoT adds a trigger phrase; Few‑shot CoT supplies full question → reasoning → answer examples for more stable formatting.

JavaScript example (LangChain):

import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const llm = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });

// Zero‑shot CoT template
const zeroShotCotTemplate = `请解决以下问题。在给出最终答案之前，请一步一步地展示推理过程。最后一行格式：答案：XXX

问题：{question}

推理过程：`;

// Few‑shot CoT template with example
const fewShotCotTemplate = `以下是推理示例：

问题：小明有24块糖，分给4个朋友每人相等，还剩4块留给自己，对吗？
推理：
1. 总糖数：24块
2. 每人：24÷4=6块，分出24块
3. 剩余：24-24=0块，不是4块
答案：不对，分完恰好没有剩余。

---

请用同样的格式解决：
问题：{question}
推理：`;

const chain = PromptTemplate.fromTemplate(fewShotCotTemplate)
  .pipe(llm)
  .pipe(new StringOutputParser());

const result = await chain.invoke({
  question: "仓库有120件商品，第一天卖出1/4，第二天入库30件，第三天卖出剩余的1/3，最终剩多少件？"
});
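// Expected reasoning: day 1 leaves 90, restocking brings it to 120, day 3 sells 40, leaving 80
// Expected final line: 答案：80件 (Answer: 80 items)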

Python equivalent:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o", temperature=0)

zero_shot_cot_template = """请解决以下问题。在给出最终答案之前，请一步一步地展示推理过程。最后一行格式：答案：XXX

问题：{question}

推理过程："""

few_shot_cot_template = """以下是推理示例：

问题：小明有24块糖，分给4个朋友每人相等，还剩4块留给自己，对吗？
推理：
1. 总糖数：24块
2. 每人：24÷4=6块，分出24块
3. 剩余：24-24=0块，不是4块
答案：不对，分完恰好没有剩余。

---

请用同样的格式解决：
问题：{question}
推理："""

chain = PromptTemplate.from_template(few_shot_cot_template) | llm | StrOutputParser()
result = chain.invoke({"question": "仓库有120件商品，第一天卖出1/4，第二天入库30件，第三天卖出剩余的1/3，最终剩多少件？"})
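# Expected reasoning: day 1 leaves 90, restocking brings it to 120, day 3 sells 40, leaving 80
# Expected final line: 答案：80件 (Answer: 80 items)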
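
The warehouse arithmetic can be checked independently of the model, and the final `答案：` line can be pulled out with a small parser. The sketch below is only an illustration: `warehouse_stock` and `extract_final_answer` are hypothetical helper names, and the sample response is hand-written rather than real model output.

```python
import re
from fractions import Fraction


def warehouse_stock(initial, first_sale, restock, second_sale):
    """Apply the three steps of the warehouse question using exact fractions."""
    stock = Fraction(initial)
    stock -= stock * first_sale   # day 1: sell a fraction of the stock
    stock += restock              # day 2: restock
    stock -= stock * second_sale  # day 3: sell a fraction of what remains
    return stock


def extract_final_answer(text):
    """Return the text after the last line starting with '答案：', or None if absent."""
    matches = re.findall(r"^答案[：:]\s*(.+)$", text, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None


expected = warehouse_stock(120, Fraction(1, 4), 30, Fraction(1, 3))

# Hand-written sample response in the few-shot CoT format.
sample = "推理：\n1. 第一天：120-30=90件\n2. 入库：90+30=120件\n3. 第三天：120-40=80件\n答案：80件"
answer = extract_final_answer(sample)
print(expected, answer)
```

Comparing the parsed answer against a value computed in code is a cheap way to build a regression test for CoT prompts.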

In a LangChain Agent, CoT can be enforced by adding a system prompt that requires a 思考 (thinking) block before any tool call.

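import { ChatOpenAI } from "@langchain/openai";
import { createReactAgent } from "@langchain/langgraph/prebuilt";

// `tools` is assumed to be an array of tool definitions created elsewhere.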

const agentPrompt = `你是一个专业助手。在执行任何操作之前，请先用以下格式思考：

思考：[分析问题，列出解决步骤]
行动：[选择要调用的工具和参数]
观察：[工具返回的结果]
...（重复直到找到答案）
最终答案：[综合所有观察给出结论]

绝对不要跳过「思考」步骤直接给出答案。`;

const agent = createReactAgent({
  llm: new ChatOpenAI({ model: "gpt-4o" }),
  tools,
  messageModifier: agentPrompt
});

Python version (langgraph):

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# `tools` is assumed to be a list of tool definitions created elsewhere.

agent_prompt = """你是一个专业助手。在执行任何操作之前，请先用以下格式思考：

思考：[分析问题，列出解决步骤]
行动：[选择要调用的工具和参数]
观察：[工具返回的结果]
...（重复直到找到答案）
最终答案：[综合所有观察给出结论]

绝对不要跳过「思考」步骤直接给出答案。"""

agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o"),
    tools=tools,
    prompt=agent_prompt,
)
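
If the agent's reasoning is logged as plain text in the 思考/行动/观察 format, a small check can flag runs that skipped the thinking step. This is a sketch under that assumption only: `skipped_thinking` is a hypothetical helper, and agents that emit tool calls as structured messages would need a different check.

```python
import re

# Step labels from the agent prompt: thought, action, observation, final answer.
STEP_PATTERN = re.compile(r"^(思考|行动|观察|最终答案)[：:]", re.MULTILINE)


def skipped_thinking(transcript: str) -> bool:
    """Return True if the transcript does not open with 思考 or has a 行动 not directly preceded by 思考."""
    labels = STEP_PATTERN.findall(transcript)
    if not labels or labels[0] != "思考":
        return True
    return any(
        current == "行动" and previous != "思考"
        for previous, current in zip(labels, labels[1:])
    )


good = "思考：需要查库存\n行动：query_stock(sku=1)\n观察：库存80件\n最终答案：80件"
bad = "行动：query_stock(sku=1)\n观察：库存80件\n最终答案：80件"
print(skipped_thinking(good), skipped_thinking(bad))
```

A check like this can run over logged transcripts to estimate how often the prompt instruction is actually followed.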

03 Self‑Consistency: Voting Across Multiple Reasoning Paths

Even with CoT, the same question can yield different answers when temperature > 0. Self‑Consistency samples several CoT chains in parallel and keeps the majority answer. Wang et al. (2022) report gains of roughly 11 to 18 percentage points on arithmetic benchmarks (GSM8K, SVAMP, AQuA) and smaller gains of about 4 to 6 points on commonsense benchmarks (StrategyQA, ARC‑challenge).

// Simplified illustration
question → [CoT path 1 → answer A]
        → [CoT path 2 → answer A] → vote → answer A (2 votes)
        → [CoT path 3 → answer B]
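
The voting step in the diagram needs no LLM at all. The sketch below isolates it as a pure function; `majority_vote` is a hypothetical name, and the tie flag is an addition for illustration rather than part of the original flow.

```python
from collections import Counter


def majority_vote(answers):
    """Return (winner, vote share, tied) for a list of answer strings."""
    if not answers:
        raise ValueError("need at least one answer")
    ranked = Counter(answers).most_common()
    winner, top_votes = ranked[0]
    # A tie exists when the runner-up has as many votes as the winner.
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    return winner, top_votes / len(answers), tied


winner, share, tied = majority_vote(["A", "A", "B"])
print(winner, tied)
```

When `tied` is true, the first answer seen wins in this sketch; a production system might prefer to sample more paths or return "undecided".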

JavaScript implementation:

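import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

// Sample numPaths independent CoT answers and return the majority answer.
async function selfConsistency(question, numPaths = 5) {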
  const llm = new ChatOpenAI({ model: "gpt-4o", temperature: 0.7 });
  const prompt = PromptTemplate.fromTemplate(`请解决以下问题，先展示推理过程，最后一行格式：答案：XXX

问题：{question}`);
  const chain = prompt.pipe(llm).pipe(new StringOutputParser());

  const responses = await Promise.all(
    Array.from({ length: numPaths }, () => chain.invoke({ question }))
  );

  const answerCounts = new Map();
  for (const response of responses) {
    const match = response.match(/答案[：:]\n?\s*(.+?)(?:\n|$)/);
    const answer = match ? match[1].trim() : "未知";
    answerCounts.set(answer, (answerCounts.get(answer) || 0) + 1);
  }

  const [[bestAnswer, maxVotes]] = [...answerCounts.entries()].sort((a, b) => b[1] - a[1]);
  return { answer: bestAnswer, confidence: maxVotes / numPaths, votes: Object.fromEntries(answerCounts) };
}

const result = await selfConsistency("服务器每秒处理100个任务，高峰时每秒新增150个，持续10分钟后恢复正常。积压了多少任务？", 5);
console.log(`答案：${result.answer}，置信度：${(result.confidence * 100).toFixed(0)}%`);
// Sample output: 答案：30000个，置信度：80% (Answer: 30000, confidence: 80%)

Python version:

import asyncio, re
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

async def self_consistency(question: str, num_paths: int = 5):
    llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
    prompt = PromptTemplate.from_template("请解决以下问题，先展示推理过程，最后一行格式：答案：XXX\n\n问题：{question}")
    chain = prompt | llm | StrOutputParser()
    responses = await asyncio.gather(*[chain.ainvoke({"question": question}) for _ in range(num_paths)])
    answer_counts: dict[str, int] = {}
    for r in responses:
        m = re.search(r"答案[：:]\n?\s*(.+?)(?:\n|$)", r)
        ans = m.group(1).strip() if m else "未知"
        answer_counts[ans] = answer_counts.get(ans, 0) + 1
    best = max(answer_counts, key=answer_counts.get)
    return {"answer": best, "confidence": answer_counts[best] / num_paths, "votes": answer_counts}

async def main():
    res = await self_consistency("服务器每秒处理100个任务，高峰时每秒新增150个，持续10分钟后恢复正常。积压了多少任务？")
    print(f"答案：{res['answer']}，置信度：{res['confidence']*100:.0f}%")

asyncio.run(main())
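
Exact string matching splits the vote whenever paths phrase the same number differently (for example "30000个" versus "30,000 个任务"). One possible mitigation, sketched below, is to normalize numeric answers before counting. `normalize_numeric` is a hypothetical helper, the raw answers are invented, and the approach only suits questions whose answer is a single number.

```python
import re
from collections import Counter


def normalize_numeric(answer):
    """Reduce an answer string to its first number as text, or None if it has no number."""
    cleaned = answer.replace(",", "")  # drop thousands separators
    match = re.search(r"-?\d+(?:\.\d+)?", cleaned)
    if not match:
        return None
    value = float(match.group())
    return str(int(value)) if value.is_integer() else str(value)


# Invented raw answers from five hypothetical reasoning paths.
raw_answers = ["30000个", "30,000 个任务", "积压 30000", "3000个", "无法确定"]
normalized = [normalize_numeric(a) for a in raw_answers]
votes = Counter(n for n in normalized if n is not None)
print(votes.most_common(1)[0])
```

Unparseable answers are dropped here instead of being counted as a shared "未知" (unknown) bucket, so they cannot win the vote.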

04 Combining the Three Techniques into a Production‑Ready Pipeline

Individually each technique is simple; together they form a robust solution for complex problems. The combined flow uses Few‑shot CoT as the base chain, a structured output schema to lock the answer format, and Self‑Consistency to vote across multiple paths.

import { ChatOpenAI } from "@langchain/openai";
import { FewShotPromptTemplate, PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { z } from "zod";

// First layer: Few‑shot CoT base chain
const reasoningExamples = [{
  problem: "计算2026年Q1总营收：1月120万，2月98万，3月145万，比去年同期315万增长多少？",
  reasoning: `1. 今年Q1总营收 = 120+98+145 = 363万
2. 增长额 = 363-315 = 48万
3. 增长率 = 48/315×100% ≈ 15.2%`,
  answer: "今年Q1总营收363万，同比增长15.2%（+48万）"
}];

const examplePrompt = PromptTemplate.fromTemplate(`问题：{problem}
推理：{reasoning}
答案：{answer}`);

const fewShotCotPrompt = new FewShotPromptTemplate({
  examples: reasoningExamples,
  examplePrompt,
  prefix: "你是一个严谨的分析助手。请按照示例的推理格式回答：\n",
  suffix: "\n问题：{input}\n推理：",
  inputVariables: ["input"]
});

// Second layer: Structured output schema
const ReasoningOutput = z.object({
  steps: z.array(z.string()).describe("推理步骤列表"),
  answer: z.string().describe("最终答案"),
  confidence: z.enum(["高", "中", "低"]).describe("答案置信度")
});

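// Note: structuredLlm is not used by the voting wrapper below, which parses plain text;
// swap it in where schema-validated output is needed.
const structuredLlm = new ChatOpenAI({ model: "gpt-4o" }).withStructuredOutput(ReasoningOutput);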

// Third layer: Self‑Consistency wrapper
async function selfConsistencySolve(input, numPaths = 3) {
  const llm = new ChatOpenAI({ model: "gpt-4o", temperature: 0.5 });
  const chain = fewShotCotPrompt.pipe(llm).pipe(new StringOutputParser());
  const paths = await Promise.all(Array.from({ length: numPaths }, () => chain.invoke({ input })));
  const answers = paths.map(p => p.split("\n").filter(l => l.trim()).pop()?.replace(/^答案[：:]/, "").trim() ?? "");
  const counts = new Map();
  answers.forEach(a => counts.set(a, (counts.get(a) || 0) + 1));
  const [[bestAnswer, maxVotes]] = [...counts.entries()].sort((a, b) => b[1] - a[1]);
  return { answer: bestAnswer, confidence: maxVotes / numPaths };
}

Python counterpart:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from pydantic import BaseModel
from typing import Literal

# First layer
reasoning_examples = [{
    "problem": "计算2026年Q1总营收：1月120万，2月98万，3月145万，比去年同期315万增长多少？",
    "reasoning": "1. 今年Q1总营收 = 120+98+145 = 363万\n2. 增长额 = 363-315 = 48万\n3. 增长率 = 48/315×100% ≈ 15.2%",
    "answer": "今年Q1总营收363万，同比增长15.2%（+48万）"
}]

example_prompt = PromptTemplate.from_template("问题：{problem}\n推理：{reasoning}\n答案：{answer}")

few_shot_cot_prompt = FewShotPromptTemplate(
    examples=reasoning_examples,
    example_prompt=example_prompt,
    prefix="你是一个严谨的分析助手。请按照示例的推理格式回答：\n",
    suffix="\n问题：{input}\n推理：",
    input_variables=["input"]
)

# Second layer – structured output
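class ReasoningOutput(BaseModel):
    steps: list[str]  # list of reasoning steps
    answer: str  # final answer
    confidence: Literal["高", "中", "低"]  # answer confidence: high / medium / low

# Note: structured_llm is not used by the voting function below, which parses plain text;
# swap it in where schema-validated output is needed.
structured_llm = ChatOpenAI(model="gpt-4o").with_structured_output(ReasoningOutput)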

# Third layer – Self‑Consistency
import asyncio, re

async def self_consistency_solve(input_text: str, num_paths: int = 3):
    llm = ChatOpenAI(model="gpt-4o", temperature=0.5)
    chain = few_shot_cot_prompt | llm | StrOutputParser()
    paths = await asyncio.gather(*[chain.ainvoke({"input": input_text}) for _ in range(num_paths)])
    answers = []
    for p in paths:
        lines = [l.strip() for l in p.split("\n") if l.strip()]
        last = lines[-1] if lines else ""
        answers.append(re.sub(r"^答案[：:]", "", last).strip())
    counts: dict[str, int] = {}
    for a in answers:
        counts[a] = counts.get(a, 0) + 1
    best = max(counts, key=counts.get)
    return {"answer": best, "confidence": counts[best] / num_paths}
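
The voting wrapper above parses plain text, so the structured schema is never exercised. The sketch below shows one way the two layers could meet: vote on the `answer` field of structured results and keep the steps of a winning path. Everything here is an assumption for illustration: `Reasoning` is a stdlib stand-in for `ReasoningOutput`, `vote_structured` and `make_stub_solver` are hypothetical names, and the stub replaces a real call such as a chain ending in `structured_llm`.

```python
import asyncio
from collections import Counter
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Reasoning:
    """Stdlib stand-in for the structured output schema (steps + answer)."""
    steps: list[str]
    answer: str


async def vote_structured(solve: Callable[[str], Awaitable[Reasoning]], question: str, num_paths: int = 3):
    """Run `solve` num_paths times, vote on the answer field, and keep one supporting path."""
    results = await asyncio.gather(*(solve(question) for _ in range(num_paths)))
    counts = Counter(r.answer.strip() for r in results)
    best, votes = counts.most_common(1)[0]
    supporting = next(r for r in results if r.answer.strip() == best)
    return {"answer": best, "confidence": votes / num_paths, "steps": supporting.steps}


def make_stub_solver(answers):
    """Return an async solver that replays canned answers (stands in for an LLM chain)."""
    canned = iter(answers)

    async def solve(question: str) -> Reasoning:
        return Reasoning(steps=[f"stub reasoning for: {question}"], answer=next(canned))

    return solve


result = asyncio.run(vote_structured(make_stub_solver(["363万", "363万", "360万"]), "Q1 revenue?"))
print(result["answer"])
```

Voting on a dedicated field avoids the brittle "last non-empty line" parsing, at the cost of relying on the model to fill the schema consistently.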

05 Dynamic Routing: Deciding When to Apply CoT

Not every query needs multi‑step reasoning. A lightweight router model (e.g., gpt-4o-mini) decides whether a question warrants CoT; simple queries are answered directly, saving latency and cost.

// Router: does the question need multi‑step reasoning?
// Reuses ChatOpenAI and selfConsistencySolve from the Section 04 snippet.
async function needsCoT(question) {
  const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
  const result = await llm.invoke(`判断以下问题是否需要多步推理。只回答 yes 或 no。
需要多步推理：数学计算、逻辑推断、多条件判断、因果分析
不需要：事实查询、简单翻译、格式转换
问题：${question}
答案（yes/no）：`);
  return result.content.toString().toLowerCase().includes("yes");
}

async function smartSolve(question) {
  const useCot = await needsCoT(question);
  if (useCot) {
    const { answer, confidence } = await selfConsistencySolve(question, 3);
    console.log(`使用 SC 策略，置信度：${(confidence * 100).toFixed(0)}%`);
    return answer;
  }
  const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
  const response = await llm.invoke(question);
  return response.content.toString();
}

const testCases = [
  "把'Hello World'翻译成中文", // simple → zero‑shot
  "仓库120件，卖1/4，入30件，再卖1/3，剩多少？", // complex → CoT + SC
  "北京和上海哪个城市人口更多？" // simple → zero‑shot
];

for (const q of testCases) {
  const ans = await smartSolve(q);
  console.log(`Q: ${q}\nA: ${ans}\n`);
}

Python version:

# Continues the Section 04 module: reuses asyncio, ChatOpenAI, and self_consistency_solve.
async def needs_cot(question: str) -> bool:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    result = await llm.ainvoke(
        f"""判断以下问题是否需要多步推理。只回答 yes 或 no。
需要多步推理：数学计算、逻辑推断、多条件判断、因果分析
不需要：事实查询、简单翻译、格式转换
问题：{question}
答案（yes/no）："""
    )
    return "yes" in result.content.lower()

async def smart_solve(question: str) -> str:
    if await needs_cot(question):
        res = await self_consistency_solve(question, num_paths=3)
        print(f"使用 SC 策略，置信度：{res['confidence']*100:.0f}%")
        return res["answer"]
    llm = ChatOpenAI(model="gpt-4o-mini")
    response = await llm.ainvoke(question)
    return response.content

async def main():
    test_cases = [
        "把'Hello World'翻译成中文",
        "仓库120件，卖1/4，入30件，再卖1/3，剩多少？",
        "北京和上海哪个城市人口更多？"
    ]
    for q in test_cases:
        ans = await smart_solve(q)
        print(f"Q: {q}\nA: {ans}\n")

asyncio.run(main())
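
The router above treats any reply containing "yes" as a yes, which would also match "No, but yes is tempting" or "yesterday". A stricter parse is sketched below; `parse_router_decision` is a hypothetical helper, and defaulting to the CoT path on unclear replies is a design assumption that favours accuracy over cost.

```python
import re


def parse_router_decision(text: str, default: bool = True) -> bool:
    """Map a router reply to True (use CoT) or False, based on its first English word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return default
    if tokens[0] == "yes":
        return True
    if tokens[0] == "no":
        return False
    return default  # unclear reply: fall back to the configured default


for reply in ["Yes", "no.", "No, but yes is tempting", "答案：yes", "yesterday"]:
    print(repr(reply), parse_router_decision(reply))
```

Swapping this in for the substring check keeps the router's prompt unchanged while making its output handling predictable.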

06 Common Pitfalls and How to Avoid Them

Few‑shot example quality: Bad or mismatched examples can degrade performance. Always validate examples against a test set before deployment.

CoT without a fixed answer format: Free‑form reasoning with no fixed final line breaks downstream parsing. Explicitly require the final line to start with "答案：" ("Answer:") or use structured output.

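Too few or even‑numbered Self‑Consistency runs: With an even number of paths, two answers can split the vote evenly. Use an odd N (e.g., 3 or 5) and set temperature around 0.5‑0.8 for diversity. Ties remain possible when three or more distinct answers appear, so decide on a tie‑break rule in advance.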

CoT inside tool‑calling Agents: Mixing reasoning steps with tool parameters confuses the Agent. Separate the "思考" (thought) block from the "行动" (action) block and keep tool arguments strictly structured.

Lack of change tracking: Prompt tweaks without documentation make regressions hard to diagnose. Treat prompts as code and version‑control them.

Insufficient test coverage: Small test suites may not reflect production data distribution. Use at least 100 diverse cases covering edge conditions.
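
The last two pitfalls can be tackled together: tag every evaluation run with a hash of the prompt text so that accuracy changes can be traced to a specific prompt version. The sketch below is a minimal illustration; `prompt_version`, `accuracy`, the stub solver, and the three test cases are all hypothetical, and a real suite would use far more cases.

```python
import hashlib


def prompt_version(template: str) -> str:
    """Short content hash that changes whenever the prompt text changes."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def accuracy(solver, cases):
    """Fraction of (question, expected) pairs that the solver answers exactly."""
    if not cases:
        raise ValueError("cases must not be empty")
    hits = sum(1 for question, expected in cases if solver(question) == expected)
    return hits / len(cases)


PROMPT_V1 = "请判断评论情感倾向，只输出：积极/消极/中性"

# Stub solver with canned replies; stands in for a real chain call.
CANNED_REPLIES = {"很好用": "积极", "太差了": "消极", "一般般": "积极"}


def stub_solver(question):
    return CANNED_REPLIES.get(question, "")


cases = [("很好用", "积极"), ("太差了", "消极"), ("一般般", "中性")]
report = {"prompt_version": prompt_version(PROMPT_V1), "accuracy": accuracy(stub_solver, cases)}
print(report)
```

Storing such reports alongside the prompt files in version control makes it easier to spot which edit caused a regression.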

Conclusion

The three "prompt engineering swords" address distinct failure modes, and dynamic routing keeps their combined cost in check:

Few‑shot stabilizes output format, especially when the example pool is dynamically selected to stay within token budgets.

CoT forces the model to articulate reasoning, boosting accuracy for multi‑step tasks.

Self‑Consistency aggregates multiple reasoning paths to improve answer stability, at the cost of additional API calls.

Dynamic routing directs the majority of simple queries to a cheap zero‑shot path while reserving the full Few‑shot + CoT + Self‑Consistency pipeline for the remaining complex cases, cutting overall cost by ~75% and latency by ~60%.
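
How much routing saves depends mainly on the share of queries that still need the full pipeline. The back-of-the-envelope model below makes that dependency explicit. It is a sketch only: the function names are hypothetical, and the input values are placeholders in arbitrary relative units, not measurements from this article.

```python
def expected_cost_per_query(share_complex, cost_full, cost_simple, cost_router):
    """Average cost per query when a router sends only complex queries to the full pipeline."""
    if not 0.0 <= share_complex <= 1.0:
        raise ValueError("share_complex must be between 0 and 1")
    # Every query pays for the router; then it takes either the full or the simple path.
    return cost_router + share_complex * cost_full + (1.0 - share_complex) * cost_simple


def savings_vs_always_full(share_complex, cost_full, cost_simple, cost_router):
    """Relative saving compared with running the full pipeline on every query."""
    routed = expected_cost_per_query(share_complex, cost_full, cost_simple, cost_router)
    return 1.0 - routed / cost_full


# Placeholder inputs (relative units), chosen only to illustrate the formula.
savings = savings_vs_always_full(share_complex=0.2, cost_full=1.0, cost_simple=0.05, cost_router=0.02)
print(f"estimated savings: {savings:.0%}")
```

Plugging in measured per-path costs and the observed share of complex queries gives a quick sanity check before committing to a routing design.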

Future articles will explore Agent security, prompt injection, privilege escalation, and data leakage defenses.
