Advanced Prompting — CoT, ToT, Few-Shot & Self-Consistency (2026)

Advanced prompting techniques — Chain-of-Thought, Tree-of-Thought, Few-Shot, Self-Consistency, and ReAct — are the difference between a model that tries its best and one that reliably solves hard problems. This guide covers each technique in depth, with practical examples and the production tradeoffs senior engineers need to know.

Who this is for:

GenAI engineers who have used basic prompting and want to unlock higher accuracy on complex tasks.
Software engineers transitioning into AI roles who need to understand why prompting technique choices matter for production systems.
Candidates preparing for senior GenAI interviews where CoT, Self-Consistency, and ReAct come up regularly.
Applied ML engineers building reasoning pipelines who need a reference on when to use which technique.

1. The Prompting Landscape — From Basic to Advanced

Prompt engineering starts simple: write an instruction, get a response. For a large class of tasks — summarization, translation, classification — that is enough. But as tasks become more complex — multi-step reasoning, ambiguous inputs, problems with multiple valid solution paths — naive prompting breaks down.

The progression from basic to advanced prompting is not about magic words. It is about giving the model the right scaffolding to use its internal representations effectively.

Why Basic Prompting Fails on Hard Tasks

Consider asking a model: “A store has 120 apples. It sells 35% on Monday and 25% of the remainder on Tuesday. How many are left?”

A zero-shot prompt with no structure often gets this wrong — not because the model lacks arithmetic knowledge, but because it jumps to an answer without working through the intermediate steps. The model predicts the next token as “the answer” rather than computing each step in sequence.

The core insight behind advanced prompting: LLMs are autoregressive token predictors. Every token they generate becomes context for the next. If you force the model to generate intermediate reasoning tokens before the final answer, those intermediate tokens improve the quality of what comes next. This is not a hack — it is how the model’s attention mechanism actually works.

Understanding LLM fundamentals — especially how attention mechanisms process context — makes these techniques intuitive rather than mysterious.

The Technique Landscape

Technique	Core idea	Best for	Cost multiplier
Zero-shot CoT	”Think step by step” trigger	Math, logic, multi-step reasoning	~1.5x (longer output)
Few-shot CoT	Provide example reasoning chains	Domain-specific reasoning, consistent format	~2x (examples in prompt)
Tree-of-Thought	Explore multiple reasoning paths	Planning, puzzles, open-ended problems	~3–10x
Self-Consistency	Majority vote over N responses	High-stakes accuracy needs	~5–10x
ReAct	Interleave reasoning and tool use	Agents, tasks requiring external data	Variable

Each technique trades latency and cost for accuracy. The engineering judgment is knowing when the accuracy improvement justifies the cost.

2. Chain-of-Thought (CoT) Prompting

Chain-of-Thought prompting is the foundational advanced technique. It instructs the model to produce intermediate reasoning steps before giving a final answer, and it consistently and measurably improves accuracy on reasoning tasks.

Zero-Shot CoT

The simplest form of CoT requires no examples. You append a trigger phrase to your prompt that activates step-by-step reasoning:

Q: A store has 120 apples. It sells 35% on Monday and 25% of the remainder
   on Tuesday. How many apples remain?

Let's think step by step.

The phrase “Let’s think step by step” is not magic. It works because the model has seen enormous amounts of text where problems are solved by showing work. Triggering that pattern causes the model to generate reasoning tokens before the answer token, and those reasoning tokens constrain the answer toward correctness.

Other effective zero-shot CoT triggers:

“Think through this carefully before answering.”
“Walk through your reasoning.”
“First, let’s identify what we know…”
“Step 1:” (opening a numbered reasoning scaffold)

When zero-shot CoT works well: Math word problems, logical deductions, multi-condition reasoning, any task where the model can generate correct reasoning from its training knowledge. It does not require task-specific examples.

When it falls short: Domain-specific reasoning where the “correct” reasoning pattern is not obvious from the model’s training, or tasks where format consistency is critical.

Few-Shot CoT

Few-shot CoT provides 2–5 complete examples of problem + reasoning chain + answer before presenting the actual task. This teaches the model both the reasoning pattern and the expected output format.

Q: A train leaves Chicago at 8:00 AM traveling at 60 mph toward New York,
   which is 790 miles away. Another train leaves New York at 9:00 AM
   traveling at 80 mph toward Chicago. At what time do they meet?

Reasoning: At 9:00 AM, the first train has traveled 60 miles and is
590 miles from New York. They are now closing the gap at 60 + 80 = 140 mph.
Time to meet: 590 / 140 ≈ 4.21 hours after 9:00 AM.
4.21 hours = 4 hours 13 minutes.
Answer: They meet at approximately 1:13 PM.

---

Q: A store has 120 apples. It sells 35% on Monday and 25% of the remainder
   on Tuesday. How many apples remain?

Reasoning:

The model sees the pattern: decompose, compute each step, state the answer clearly. It follows that pattern for the new question.

Example selection strategy for few-shot CoT:

Choose examples that cover the reasoning patterns needed for the task, not just similar surface topics.
Include at least one example where the naive approach leads to the wrong answer if applied directly — this demonstrates why careful step-by-step reasoning matters.
Order examples from simpler to more complex; the model should build confidence through the sequence.
Keep reasoning chains concise — verbose examples cause the model to over-explain, increasing output length without accuracy benefit.

Format consistency: If your production system needs structured output (JSON, a specific text format), include that structure in your few-shot examples. The model will mirror the format of the examples it sees.

CoT in Production

A few CoT patterns come up repeatedly in production GenAI systems:

Scratchpad pattern: Use a <thinking> XML tag to separate the reasoning from the final answer. This is the pattern Anthropic’s Claude uses natively.

Analyze this customer complaint and extract: sentiment, main issue, urgency.
Think through your analysis in <thinking> tags before giving the final JSON.

Complaint: "I ordered three weeks ago and my package still hasn't arrived.
I needed it for my daughter's birthday last week. Very disappointed."

This separates the model’s reasoning process from the output it returns to users, reducing output length while preserving the accuracy benefits of CoT.

Structured reasoning: For complex classification or extraction tasks, prompt the model to reason through each criterion explicitly before making a final determination. This makes the reasoning auditable and helps with hallucination mitigation.

3. Tree-of-Thought (ToT) Prompting

Tree-of-Thought extends Chain-of-Thought by exploring multiple reasoning paths simultaneously. Instead of committing to a single chain of reasoning, the model generates several candidate approaches, evaluates each, and pursues the most promising one.

The Core Idea

CoT follows one path through a reasoning tree. For problems where the first approach is likely wrong — combinatorial puzzles, planning problems, creative tasks with many valid solutions — committing to one path early leads to local optima.

ToT generates a “tree” of possibilities:

Thought generation: Produce several candidate next steps or complete solution attempts.
State evaluation: For each candidate, evaluate how promising it appears using a scoring prompt.
Search strategy: Use breadth-first or depth-first search to explore the tree, pruning poor candidates.
Answer extraction: Return the highest-scoring path’s conclusion.

When to Use ToT

ToT is expensive — it requires multiple LLM calls per problem. Use it when:

The problem has a well-defined success criterion but an unclear solution path (puzzles, optimization, code debugging).
Initial approaches frequently fail and backtracking is necessary.
You need to verify that an answer is correct before returning it.
Cost is justified by the stakes (e.g., a reasoning step in an agentic pipeline where a wrong decision cascades).

Avoid ToT when:

Tasks are time-sensitive (ToT introduces multi-second latency per query).
The problem is well-structured enough that CoT reliably succeeds.
Cost is a primary constraint.

Implementation Pattern

A practical ToT implementation for agentic systems:

def tree_of_thought(problem: str, model, n_branches: int = 3) -> str:
    # Step 1: Generate multiple candidate approaches
    branches_prompt = f"""
    Problem: {problem}

    Generate {n_branches} distinct approaches to solving this problem.
    For each approach, briefly explain the reasoning strategy.
    Format as:
    Approach 1: [strategy description]
    Approach 2: [strategy description]
    Approach 3: [strategy description]
    """
    branches = model.generate(branches_prompt)

    # Step 2: Evaluate each approach
    eval_prompt = f"""
    Problem: {problem}

    Candidate approaches:
    {branches}

    Evaluate each approach. Which is most likely to reach a correct solution?
    Consider: logical soundness, completeness, edge cases.
    Select the best approach and explain why.
    """
    best_approach = model.generate(eval_prompt)

    # Step 3: Execute the selected approach with full CoT
    solve_prompt = f"""
    Problem: {problem}

    Selected approach: {best_approach}

    Now solve the problem step by step using this approach.
    Show all reasoning. State your final answer clearly.
    """
    return model.generate(solve_prompt)

In production, ToT is often implemented as an orchestration pattern rather than a single prompt — each step is a separate LLM call, and the orchestrator manages branching and selection. See AI agents for how ToT maps onto agentic architectures.

4. Few-Shot Prompting — Example Selection Strategies

Few-shot prompting provides the model with labeled examples before the task. It is one of the most consistently effective techniques across all model families, but the quality of examples matters enormously.

Why Few-Shot Works

LLMs learn in-context: the examples in the prompt temporarily shift the model’s effective behavior without updating its weights. The model observes the pattern demonstrated by the examples and applies it to the new input. This is in-context learning — the model uses its attention mechanism to identify what the examples have in common and generalize.

The implication: poorly chosen examples can hurt performance, not just fail to help. If your examples demonstrate inconsistent reasoning or ambiguous formatting, the model picks up that inconsistency.

Example Selection Strategies

Diversity over similarity. Do not select examples that are all near-identical to the test input. A diverse set of examples teaches the model the underlying pattern more robustly. If classifying customer intent, include examples from multiple product areas, not just the most common one.

Coverage of edge cases. Identify the hardest cases your system will encounter. Include at least one example of each hard case type. The model needs to see how to handle them.

Consistent format. Every example must use exactly the same input-output format. A single formatting inconsistency in your few-shot set introduces ambiguity about what the expected output structure is.

Label balance for classification. If classifying into 3 classes, include roughly equal examples of each. Imbalanced few-shot sets bias the model toward the over-represented class.

Calibrated reasoning length. For CoT few-shot, the length of reasoning in examples sets an implicit norm. Short examples produce short reasoning; long examples produce long reasoning. Calibrate to the task’s actual complexity.

Dynamic Few-Shot (RAG-Based Selection)

For high-volume production systems with diverse inputs, static few-shot examples are often insufficient. Dynamic few-shot retrieves the most relevant examples from a database based on the similarity of the current input:

def dynamic_few_shot_prompt(query: str, example_store, k: int = 3) -> str:
    # Retrieve k most similar examples from the store
    relevant_examples = example_store.similarity_search(query, k=k)

    # Build prompt
    examples_text = ""
    for ex in relevant_examples:
        examples_text += f"Input: {ex.input}\nOutput: {ex.output}\n\n"

    return f"{examples_text}Input: {query}\nOutput:"

This is the same retrieval mechanism used in RAG — except instead of retrieving document chunks, you retrieve prompt examples. The selection quality depends on your embedding model and the diversity of your example store.

5. Prompting Techniques — Progression Diagram

The diagram below shows how prompting techniques build on each other from basic instruction-following through to full agentic reasoning.

📊 Visual Explanation

Advanced Prompting Techniques — Progression

Each technique builds on the previous, trading cost for accuracy

Basic Prompting

Instruction only

Zero-Shot

Direct instruction

Role Assignment

System prompt persona

Format Constraints

Output structure

In-Context Learning

Examples + patterns

Few-Shot

2–5 labeled examples

Dynamic Selection

RAG-based retrieval

Format Mirroring

Consistent structure

Reasoning Chains

Step-by-step thinking

Zero-Shot CoT

Think step by step

Few-Shot CoT

Example reasoning chains

Scratchpad

Hidden reasoning tokens

Multi-Path Reasoning

Explore + verify

Tree-of-Thought

Branch & evaluate

Self-Consistency

Majority vote

ReAct

Reason + act + observe

Idle

6. Self-Consistency — Majority Voting for Higher Accuracy

Self-Consistency is a simple but powerful technique: generate multiple independent responses to the same prompt, then select the answer that appears most frequently across all responses.

Why It Works

For reasoning tasks, correct reasoning paths tend to converge on the same answer. Incorrect reasoning produces diverse wrong answers. If you ask a model the same math problem 10 times using CoT, and 7 of the 10 responses arrive at 42 while the other 3 arrive at different numbers, the probability is high that 42 is correct.

This is majority voting over a sampled distribution of reasoning paths. It does not require a judge model or any additional logic — just counting.

Implementation

from collections import Counter

def self_consistent_answer(
    prompt: str,
    model,
    n_samples: int = 7,
    temperature: float = 0.7
) -> str:
    responses = []
    for _ in range(n_samples):
        response = model.generate(
            prompt + "\nLet's think step by step.",
            temperature=temperature
        )
        # Extract final answer from CoT response
        answer = extract_final_answer(response)
        responses.append(answer)

    # Return the most common answer
    vote_counts = Counter(responses)
    return vote_counts.most_common(1)[0][0]

A non-zero temperature is critical. Self-Consistency requires diverse reasoning paths, not identical ones. Temperature 0 produces the same response every time; temperature 0.5–0.8 provides enough variance for the majority vote to be meaningful.

Cost vs. Accuracy Tradeoff

Self-Consistency multiplies your inference cost by N (the number of samples). For N=7, you pay 7x the token cost and incur roughly 7x the latency if calls are sequential (or ~1x if parallelized).

The accuracy gain is real but not unlimited. Published results from the original Self-Consistency paper (Wang et al., 2022) show 5–15% accuracy improvement on math reasoning benchmarks. Gains diminish after N=10 — the marginal benefit of the 11th sample is small.

When Self-Consistency is worth the cost:

High-stakes decisions where a wrong answer has significant consequences.
Tasks where evaluation is cheap (the model can verify the answer).
Asynchronous batch workloads where latency is not a constraint.
Complementing CoT on math, logic, or multi-step reasoning tasks.

When it is not justified:

Real-time user-facing applications where latency matters.
Tasks where the answer space is too large for majority vote to be meaningful.
Use cases where cost per query is tightly constrained.

Self-Consistency can be combined with evaluation frameworks — using a judge model to score each candidate instead of simple majority vote, which is more robust for open-ended tasks.

7. ReAct — Reasoning and Acting

ReAct (Reasoning + Acting) is the prompting pattern that powers most production AI agents. It interleaves natural language reasoning with concrete actions — tool calls, searches, code execution — in an alternating loop.

The ReAct Pattern

A ReAct agent operates in a continuous thought-action-observation cycle:

Thought: I need to find the current price of AAPL stock.
Action: search("AAPL stock price today")
Observation: AAPL is trading at $187.43 as of March 5, 2026.

Thought: Now I have the current price. The user asked for a percentage
         change from the 52-week low of $164.08.
Action: calculate((187.43 - 164.08) / 164.08 * 100)
Observation: 14.23%

Thought: I have all the information needed to answer.
Final Answer: AAPL is currently trading at $187.43, up 14.23% from its
              52-week low of $164.08.

Each step is transparent and auditable. The model’s reasoning is visible, the actions it took are logged, and the observations are grounded in real data rather than the model’s training knowledge.

Why ReAct Reduces Hallucination

A core failure mode of vanilla prompting is that the model generates plausible-sounding facts from training data, even when those facts are outdated or wrong. ReAct forces the model to fetch fresh data before reasoning about it.

The structure creates a discipline: the model cannot state facts it does not have grounded observations for. When it needs a fact, it takes an action to retrieve it. This is why ReAct is the recommended prompting pattern for any task requiring current information or precise external data. See hallucination mitigation for the broader set of strategies.

ReAct in Production

Tool definition matters. The model’s reasoning quality depends on having clearly defined tools with precise descriptions. Ambiguous tool names or vague parameter descriptions cause the model to misuse tools or pick the wrong one.

tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information. Use for: "
                       "current prices, recent events, real-time data. "
                       "Do NOT use for: historical facts you already know.",
        "parameters": {
            "query": "Specific search query. Be precise."
        }
    }
]

Termination conditions. ReAct loops need explicit stopping criteria. Without them, the model can loop indefinitely taking unnecessary actions. Common patterns: maximum step count (e.g., 10 actions), explicit “Final Answer:” marker that the orchestrator detects, or a separate judge model that evaluates whether the reasoning is complete.

Error handling in observations. When a tool call fails, the observation should convey the error clearly so the model can adapt. “Tool error: rate limit exceeded” is more useful than a generic failure message — the model can choose to wait, retry, or select an alternative tool.

ReAct is the prompting foundation of agentic frameworks like LangGraph, CrewAI, and Claude’s tool use API. Understanding it at the prompt level gives you the foundation to reason about agent architectures at the system design level.

8. Interview Preparation

Advanced prompting is a standard topic in senior GenAI engineering interviews. Questions range from conceptual (explain the technique) to applied (design a system using it) to tradeoff analysis (when would you choose A over B).

Common Interview Questions and Strong Answers

Q: Explain Chain-of-Thought prompting and when you would use it.

Strong answer: Chain-of-Thought prompting instructs the model to generate intermediate reasoning steps before producing a final answer. It exploits the autoregressive nature of LLMs — tokens generated early in the response become context that improves the quality of tokens generated later. Zero-shot CoT uses a trigger phrase like “Let’s think step by step” and requires no examples. Few-shot CoT provides complete reasoning chains as examples, which is better for domain-specific tasks requiring consistent format. I use CoT for any multi-step reasoning task — math, logical deductions, complex classification — where direct prompting has measurably lower accuracy. In production, I typically use the scratchpad pattern: CoT reasoning inside <thinking> tags, clean output visible to users.

Q: What is Self-Consistency and when is it worth the cost?

Strong answer: Self-Consistency generates N independent responses to the same prompt at non-zero temperature, then selects the majority answer. It works because correct reasoning paths tend to converge on the same answer, while errors produce diverse wrong answers. The accuracy gain is typically 5–15% on reasoning benchmarks, but the cost is N times the single-call cost. I use Self-Consistency for high-stakes batch workloads — financial calculations, medical triage, legal analysis — where accuracy justifies cost, and latency is not a hard constraint. For real-time applications, I prefer few-shot CoT or a verification step rather than Self-Consistency.

Q: How does ReAct reduce hallucination in agentic systems?

Strong answer: ReAct forces the model to take an action and receive an observation before reasoning about facts that require current or external data. Without ReAct, the model generates facts from training knowledge, which may be outdated or incorrect. With ReAct, the model cannot state a fact unless it has retrieved it — the thought-action-observation loop grounds the reasoning in actual retrieved data. This does not eliminate hallucination entirely — the model can still misinterpret observations or reason incorrectly — but it addresses the most common hallucination failure mode in production agents: the model confidently stating outdated information as current fact.

Q: When would you choose Tree-of-Thought over Chain-of-Thought?

Strong answer: I choose Tree-of-Thought when the problem space has multiple plausible solution paths and committing to the first one is likely to produce a suboptimal result. Concretely: planning tasks where different orderings of steps lead to different outcomes, complex debugging where the first hypothesis is often wrong, and combinatorial problems where the model should evaluate alternatives before committing. The cost is 3–10x higher than CoT, so ToT is reserved for cases where the accuracy improvement justifies it — typically automated pipeline steps, not real-time user queries. For most production applications with reasoning requirements, well-crafted few-shot CoT with Self-Consistency is a better cost-accuracy tradeoff than full ToT.

Q: Design a prompting strategy for a customer support classification system that must handle 50 intent categories with high accuracy.

Strong answer: I would use dynamic few-shot CoT. Static few-shot would require 50 × 3 = 150+ examples in every prompt, which is expensive and hits context limits. Instead: maintain an example database of 5–10 examples per intent (500+ total), embed them, and retrieve the 5–10 most relevant examples for each incoming query using cosine similarity. Present those examples with CoT reasoning chains that show how to distinguish between similar intents. For the highest-stakes categories — billing disputes, legal complaints — layer in Self-Consistency with N=5 to catch edge cases. Monitor confidence scores and route low-confidence classifications to human review. This approach scales to any number of intents while keeping per-query costs bounded.

9. Summary — Choosing the Right Technique

Advanced prompting is not about applying the most sophisticated technique available. It is about matching technique to task requirements, cost constraints, and latency budgets.

Decision framework:

Start with zero-shot CoT. For most reasoning tasks, it is free (just a trigger phrase) and often sufficient.
Add few-shot examples if zero-shot CoT produces inconsistent format or misses domain-specific patterns.
Use Self-Consistency for high-stakes accuracy requirements where you can afford N × cost.
Use Tree-of-Thought for planning and optimization tasks where the solution space requires exploration.
Use ReAct whenever your prompt needs to reason about external or current data — do not let the model hallucinate facts it should retrieve.

These techniques are not mutually exclusive. Production systems often combine them: dynamic few-shot CoT with Self-Consistency for a classification step, then ReAct for the action step that follows. The evaluation layer tells you whether your chosen combination is actually working.

The deeper you go into AI agent design and system design, the more these techniques appear as building blocks in larger architectures. Understanding them at the prompt level gives you the foundation to reason about where reasoning failures come from and how to fix them.

Prompt Engineering — Foundation techniques: few-shot, CoT, system prompts
Prompt Testing — Evaluate and A/B test prompt changes
Tool Calling — How LLMs invoke functions and APIs
Prompt Management — Versioning, registries, and rollback

Frequently Asked Questions

What is Chain-of-Thought (CoT) prompting?

Chain-of-Thought prompting instructs the model to reason step-by-step before giving a final answer. Instead of jumping to a conclusion, the model shows its intermediate reasoning steps. This dramatically improves accuracy on math, logic, and multi-step reasoning tasks. Zero-shot CoT uses a simple trigger like "Let's think step by step", while few-shot CoT provides example reasoning chains for the model to follow.

What is Tree-of-Thought (ToT) prompting?

Tree-of-Thought prompting extends CoT by exploring multiple reasoning paths simultaneously. Instead of following one chain, the model generates several candidate solutions, evaluates each one, and selects the best path. This is useful for problems where the first reasoning approach might be wrong — puzzles, planning, creative writing, and complex analysis tasks. ToT trades speed for accuracy by exploring the solution space more broadly.

What is the difference between zero-shot and few-shot prompting?

Zero-shot prompting gives the model a task with no examples — it relies entirely on the model's training knowledge. Few-shot prompting provides 2-5 examples of input-output pairs before the actual task, teaching the model the expected format and reasoning pattern by demonstration. Few-shot generally produces more consistent outputs, especially for specialized formats, classification tasks, or domain-specific reasoning.

What is Self-Consistency in prompting?

Self-Consistency generates multiple responses to the same prompt (typically 5-10), then selects the most common answer by majority vote. It works because correct reasoning paths tend to converge on the same answer, while incorrect reasoning produces diverse wrong answers. Self-Consistency improves accuracy by 5-15% on reasoning tasks but increases cost linearly with the number of samples.

What is the ReAct prompting pattern?

ReAct (Reasoning + Acting) interleaves natural language reasoning with concrete actions like tool calls, searches, and code execution in a thought-action-observation loop. It is the prompting pattern that powers most production AI agents. ReAct reduces hallucination by forcing the model to retrieve facts before reasoning about them, rather than generating plausible-sounding facts from training data.

How does the scratchpad pattern work with CoT?

The scratchpad pattern uses XML tags like <thinking> to separate the model's reasoning process from the final output. The model performs Chain-of-Thought reasoning inside the tags, then produces a clean answer outside them. This preserves the accuracy benefits of CoT while keeping the user-facing output concise and focused.

When should you use Tree-of-Thought instead of Chain-of-Thought?

Tree-of-Thought is best for problems where the solution space has multiple plausible paths and committing to the first one is likely to produce a suboptimal result. This includes planning tasks, complex debugging, and combinatorial problems. ToT costs 3-10x more than CoT, so it is reserved for cases where accuracy improvement justifies the cost, typically in automated pipeline steps rather than real-time user queries.

What is dynamic few-shot prompting?

Dynamic few-shot prompting retrieves the most relevant examples from a database based on the similarity of the current input, rather than using the same static examples for every query. It uses the same retrieval mechanism as RAG, except instead of retrieving document chunks, you retrieve prompt examples. This approach scales to any number of intent categories while keeping per-query costs bounded.

How much does Self-Consistency improve accuracy?

Self-Consistency typically improves accuracy by 5-15% on reasoning benchmarks, based on published results from Wang et al. (2022). The cost is N times the single-call cost, where N is the number of samples (typically 5-10). Gains diminish after N=10. A non-zero temperature (0.5-0.8) is critical because Self-Consistency requires diverse reasoning paths to make majority voting meaningful.

How does ReAct reduce hallucination in agentic systems?

ReAct forces the model to take an action and receive an observation before reasoning about facts that require current or external data. Without ReAct, the model generates facts from training knowledge, which may be outdated or incorrect. The thought-action-observation loop grounds reasoning in actual retrieved data, addressing the most common hallucination failure mode: the model confidently stating outdated information as current fact.

Advanced Prompting — CoT, ToT, Few-Shot & Self-Consistency (2026)

1. The Prompting Landscape — From Basic to Advanced

Why Basic Prompting Fails on Hard Tasks

The Technique Landscape

2. Chain-of-Thought (CoT) Prompting

Zero-Shot CoT

Few-Shot CoT

CoT in Production

3. Tree-of-Thought (ToT) Prompting

The Core Idea

When to Use ToT

Implementation Pattern

4. Few-Shot Prompting — Example Selection Strategies

Why Few-Shot Works

Example Selection Strategies

Dynamic Few-Shot (RAG-Based Selection)

5. Prompting Techniques — Progression Diagram

📊 Visual Explanation

6. Self-Consistency — Majority Voting for Higher Accuracy

Why It Works

Implementation

Cost vs. Accuracy Tradeoff

7. ReAct — Reasoning and Acting

The ReAct Pattern

Why ReAct Reduces Hallucination

ReAct in Production

8. Interview Preparation

Common Interview Questions and Strong Answers

9. Summary — Choosing the Right Technique

Related

Frequently Asked Questions