Prompt Engineering Techniques — 10 Named Patterns That Work (2026)
Most prompt engineering guides stop at zero-shot, few-shot, and Chain-of-Thought. Those are foundations, not the full toolkit. This guide catalogs 10 named prompt engineering techniques — each with a specific use case, Python implementation, and model compatibility notes — so you can match the right pattern to the right problem without guesswork.
Who this is for:
- GenAI engineers who know the basics and need a reference catalog of production-ready prompt patterns.
- Software engineers building LLM pipelines who need to reduce hallucination, improve reasoning accuracy, or speed up generation.
- Interview candidates preparing for senior GenAI roles where pattern selection and tradeoff analysis come up regularly.
1. Why Named Prompt Patterns Matter
Prompt engineering has evolved past “write a clear instruction and hope for the best.” Named patterns exist because engineers kept solving the same problems — hallucination, slow generation, inconsistent reasoning — and the solutions crystallized into repeatable templates.
The value of named patterns is threefold:
Shared vocabulary. When a team agrees on “use CoVe for factual queries,” everyone knows the implementation. No ambiguity, no reinvention.
Predictable tradeoffs. Each pattern has a known cost profile. Self-Consistency costs N times a single call. Skeleton-of-Thought reduces perceived latency. Chain-of-Verification roughly doubles output tokens. Knowing these tradeoffs before implementation prevents surprises in production.
Composability. Patterns combine. You can use Meta-Prompting to generate an optimized prompt, apply that prompt with Self-Consistency, and run Chain-of-Verification on the result. Named patterns are building blocks, not one-size-fits-all solutions.
This page assumes you already understand prompt engineering fundamentals — system prompts, few-shot examples, and basic Chain-of-Thought. If those concepts are new, start there first.
2. When to Use Advanced Prompt Techniques
Not every problem needs an advanced technique. The decision depends on the failure mode you are solving.
| Problem | Recommended Pattern | Why |
|---|---|---|
| Model hallucinating facts | Chain-of-Verification (CoVe) | Forces self-fact-checking before final output |
| Unsure how to prompt a novel task | Meta-Prompting | LLM generates the optimized prompt for you |
| Slow response for long-form content | Skeleton-of-Thought (SoT) | Generates structure first, then fills in parallel |
| Inconsistent answers on reasoning tasks | Self-Consistency | Majority vote across multiple reasoning paths |
| Complex multi-step problem | Least-to-Most Prompting | Decomposes into subproblems, solves sequentially |
| Task requires external data | ReAct | Interleaves reasoning with tool calls |
| Model ignoring subtle constraints | Directional Stimulus Prompting | Adds hint keywords to steer output |
| Summarization losing key details | Chain-of-Density | Progressive compression preserves information |
| Ambiguous or poorly worded input | Rephrase and Respond (RaR) | Model rephrases the question before answering |
| Reasoning errors on math/logic | Contrastive Chain-of-Thought | Shows correct AND incorrect examples |
Rule of thumb: Start with the simplest technique that addresses your failure mode. Add complexity only when evaluation shows the simpler approach falls short. Every technique adds tokens and latency.
3. How Prompt Patterns Work — Architecture
Every prompt pattern follows the same structural idea: insert a processing step between the raw user input and the model’s final output. The pattern template shapes how the model reasons before committing to an answer.
Prompt Pattern Pipeline
Named patterns insert structured reasoning between input and output — turning a single LLM call into a multi-stage pipeline.
The pattern template is the differentiator. Without it, the model jumps straight from input to output — fast, but prone to the failure modes each pattern is designed to prevent. With the template, the model generates intermediate tokens that improve the quality of what comes next. This is not magic; it is a direct consequence of how autoregressive models work: every generated token becomes context for subsequent tokens.
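As a sketch, the multi-stage idea reduces to feeding each stage's output forward as context for the next stage. The `pattern_pipeline`, `call_llm`, and `fake_llm` names below are placeholders invented for illustration, not part of any SDK:

```python
from typing import Callable

def pattern_pipeline(user_input: str,
                     stages: list[str],
                     call_llm: Callable[[str, str], str]) -> str:
    """Run a multi-stage prompt pattern: each stage's output
    becomes context for the next stage."""
    context = user_input
    for stage_prompt in stages:
        # Intermediate tokens from earlier stages are carried forward,
        # which is why patterns improve autoregressive output quality.
        context = call_llm(stage_prompt, context)
    return context

# Stubbed model for illustration only; a real call_llm would hit an LLM API.
def fake_llm(system: str, content: str) -> str:
    return f"[{system}] {content}"

result = pattern_pipeline("What is X?", ["outline", "expand"], fake_llm)
# Each stage wraps the previous output, mimicking context accumulation.
```

Swapping `fake_llm` for a real API call turns this skeleton into any of the two-stage patterns below (skeleton-then-expand, draft-then-verify, and so on).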
4. The 10 Prompt Engineering Techniques Catalog
This is the core reference. Each pattern includes what it does, when to use it, a Python implementation using the OpenAI SDK, and which models handle it best.
4.1 Chain-of-Verification (CoVe)
What it does: The model generates an initial response, creates verification questions about its own claims, answers those questions independently, and revises the response based on contradictions.
When to use: Factual queries where hallucination is the primary risk — knowledge base Q&A, data extraction, and claim verification tasks.
```python
from openai import OpenAI

client = OpenAI()

def chain_of_verification(query: str, context: str = "") -> str:
    # Step 1: Generate initial response
    initial = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer the question based on your knowledge."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"}
        ]
    ).choices[0].message.content

    # Step 2: Generate verification questions
    verification_qs = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Generate 3-5 specific factual questions that would verify the claims in this response. Output only the questions, one per line."},
            {"role": "user", "content": f"Original question: {query}\n\nResponse to verify:\n{initial}"}
        ]
    ).choices[0].message.content

    # Step 3: Answer verification questions independently
    verified = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer each question independently. If you are unsure, say 'uncertain'."},
            {"role": "user", "content": verification_qs}
        ]
    ).choices[0].message.content

    # Step 4: Revise based on verification
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Revise the original response. Remove or correct any claims that contradict the verification answers. Keep verified claims intact."},
            {"role": "user", "content": f"Original response:\n{initial}\n\nVerification Q&A:\n{verified}"}
        ]
    ).choices[0].message.content

    return final
```

Best models: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro. Mid-tier models tend to confirm their own errors during verification rather than catching them.
4.2 Meta-Prompting
What it does: Instead of writing the prompt yourself, you ask the LLM to generate or refine the prompt for your task. The model acts as a prompt engineer.
When to use: Novel tasks where you are unsure of the optimal prompt structure, or when systematically improving an existing prompt.
```python
def meta_prompt(task_description: str, examples: str = "") -> str:
    # Step 1: Generate an optimized prompt
    meta = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are an expert prompt engineer. Given a task description, "
                "write an optimized system prompt that will produce the best results. "
                "Include: role definition, explicit constraints, output format, "
                "and 2 few-shot examples."
            )},
            {"role": "user", "content": f"Task: {task_description}\n\nExample inputs/outputs:\n{examples}"}
        ]
    ).choices[0].message.content

    return meta  # Use this as the system prompt for your actual task
```

Best models: Frontier models only (GPT-4o, Claude 3.5 Sonnet). Smaller models produce generic prompts that do not outperform hand-written ones.
4.3 Skeleton-of-Thought (SoT)
What it does: Splits generation into two phases. First, the model produces a skeleton (outline). Then each section is expanded. This reduces perceived latency for long-form content.
When to use: Long-form generation (articles, reports, documentation) where users need to see structure quickly.
```python
def skeleton_of_thought(query: str) -> str:
    # Phase 1: Generate skeleton
    skeleton = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Generate a concise outline for answering this question. "
                "Output 3-7 bullet points, each one sentence. "
                "Do not expand — skeleton only."
            )},
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    # Phase 2: Expand each point
    expanded = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Expand each bullet point into a detailed paragraph. "
                "Maintain the original structure. Add specifics, examples, "
                "and technical depth."
            )},
            {"role": "user", "content": f"Question: {query}\n\nSkeleton:\n{skeleton}"}
        ]
    ).choices[0].message.content

    return expanded
```

Best models: All models benefit. Particularly effective with Claude 3.5 Sonnet and GPT-4o for structured long-form output.
4.4 Self-Consistency
Section titled “4.4 Self-Consistency”What it does: Generates N independent responses at non-zero temperature and selects the most common answer by majority vote.
When to use: Math, logic, and multi-step reasoning tasks where a single reasoning chain may take a wrong turn. See the advanced prompting guide for the theoretical foundation.
```python
from collections import Counter

def self_consistency(query: str, n: int = 5) -> str:
    answers = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=0.7,  # Non-zero is critical
            messages=[
                {"role": "system", "content": (
                    "Solve this step by step. After your reasoning, "
                    "write your final answer on the last line as: "
                    "ANSWER: <your answer>"
                )},
                {"role": "user", "content": query}
            ]
        ).choices[0].message.content

        # Extract final answer
        for line in response.strip().split("\n")[::-1]:
            if line.strip().startswith("ANSWER:"):
                answers.append(line.split("ANSWER:")[-1].strip())
                break

    # Majority vote
    if not answers:
        return "No consistent answer found"
    most_common = Counter(answers).most_common(1)[0][0]
    return most_common
```

Best models: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro. Diminishing returns after N=7. Temperature 0.5-0.8 produces sufficient diversity.
4.5 Least-to-Most Prompting
Section titled “4.5 Least-to-Most Prompting”What it does: Decomposes a complex problem into subproblems, then solves each sequentially — feeding the solution of each subproblem into the next.
When to use: Multi-step problems where the model fails to plan the full solution upfront — compositional generalization, complex reasoning chains, and tasks that require building up from simpler components.
```python
def least_to_most(query: str) -> str:
    # Step 1: Decompose into subproblems
    decomposition = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Break this problem into a sequence of simpler subproblems. "
                "List them in order from simplest to most complex. "
                "Each subproblem should build on the previous one. "
                "Output only the numbered list of subproblems."
            )},
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    # Step 2: Solve sequentially
    solved_so_far = ""
    subproblems = [line.strip() for line in decomposition.strip().split("\n") if line.strip()]

    for sub in subproblems:
        solution = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": (
                    "Solve this subproblem. Use the previously solved "
                    "subproblems as context."
                )},
                {"role": "user", "content": (
                    f"Original problem: {query}\n\n"
                    f"Previously solved:\n{solved_so_far}\n\n"
                    f"Current subproblem: {sub}"
                )}
            ]
        ).choices[0].message.content
        solved_so_far += f"\n{sub}\nSolution: {solution}\n"

    return solved_so_far
```

Best models: All models benefit, but smaller models (Llama 3 8B, Mistral 7B) see the largest relative improvement because they struggle most with complex planning.
4.6 ReAct (Reason + Act)
Section titled “4.6 ReAct (Reason + Act)”What it does: Interleaves reasoning steps with tool calls in a thought-action-observation loop. The model thinks about what to do, executes a tool, observes the result, and continues reasoning.
When to use: Tasks requiring external data — web search, database lookups, API calls, code execution. ReAct is the pattern powering most production AI agents. See the advanced prompting guide for deeper coverage of the reasoning loop.
```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Search the knowledge base for relevant documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }
]

def react_loop(query: str, max_iterations: int = 5) -> str:
    messages = [
        {"role": "system", "content": (
            "You are a research assistant. Think step by step. "
            "Use the search_docs tool when you need factual information. "
            "Do not guess facts — search first, then reason."
        )},
        {"role": "user", "content": query}
    ]

    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        msg = response.choices[0].message
        messages.append(msg)

        if msg.tool_calls:
            for call in msg.tool_calls:
                # Execute tool (implement search_docs separately)
                result = execute_tool(call.function.name, json.loads(call.function.arguments))
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": result
                })
        else:
            return msg.content  # Final answer

    return messages[-1].content
```

Best models: GPT-4o and Claude 3.5 Sonnet handle complex tool orchestration best. Gemini 1.5 Pro is strong for multi-turn tool use. Smaller models often call tools incorrectly or skip tool use when it is needed.
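The `execute_tool` helper referenced in the loop is left unimplemented above; one minimal sketch is a dispatch table mapping tool names to Python functions. The `search_docs` body here is a hypothetical stand-in for real retrieval code:

```python
from typing import Any, Callable

def search_docs(query: str) -> str:
    # Hypothetical placeholder: replace with your real knowledge-base search
    return f"No documents found for: {query}"

TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "search_docs": search_docs,
}

def execute_tool(name: str, arguments: dict[str, Any]) -> str:
    """Dispatch a model tool call to its Python implementation."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        # Return the error as an observation instead of raising,
        # so the ReAct loop can recover and try another action.
        return f"Error: unknown tool '{name}'"
    try:
        return fn(**arguments)
    except TypeError as exc:
        return f"Error: bad arguments for '{name}': {exc}"
```

Returning errors as strings (rather than raising) matters in ReAct: the model sees the failure as an observation and can correct its next action.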
4.7 Directional Stimulus Prompting
Section titled “4.7 Directional Stimulus Prompting”What it does: Adds a hint keyword or phrase to the prompt that steers the model toward a specific aspect of the answer without dictating the full response.
When to use: When the model consistently misses a specific constraint or emphasis — for example, always forgetting to mention security implications, or ignoring edge cases in code generation.
```python
def directional_stimulus(query: str, hints: list[str]) -> str:
    hint_text = ", ".join(hints)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Answer the question thoroughly. "
                f"Pay special attention to these aspects: {hint_text}. "
                "Make sure your response addresses each of these points."
            )},
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    return response

# Example: steer toward security + performance
result = directional_stimulus(
    "How should I implement user authentication in a FastAPI app?",
    hints=["rate limiting", "token rotation", "OWASP top 10"]
)
```

Best models: All models respond to directional hints. This technique is lightweight and model-agnostic — effective even with smaller models like GPT-4o-mini and Llama 3 8B.
4.8 Chain-of-Density (Summarization)
Section titled “4.8 Chain-of-Density (Summarization)”What it does: Progressively compresses a summary over multiple iterations. Each iteration adds missing entities while keeping the summary length roughly constant, increasing information density.
When to use: Summarization tasks where initial summaries are too vague or miss key details. Produces summaries that are both concise and information-rich.
```python
def chain_of_density(text: str, iterations: int = 3) -> str:
    summary = ""
    for i in range(iterations):
        if i == 0:
            prompt = (
                f"Write a concise summary of this text in 3-4 sentences:\n\n{text}"
            )
        else:
            prompt = (
                f"Here is a text and its current summary. The summary is missing "
                f"key entities and details. Rewrite the summary to be equally concise "
                f"but more information-dense. Add missing entities without increasing "
                f"length significantly.\n\n"
                f"Text:\n{text}\n\n"
                f"Current summary:\n{summary}\n\n"
                f"Write an improved, denser summary:"
            )

        summary = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a precise summarizer. Every word must earn its place."},
                {"role": "user", "content": prompt}
            ]
        ).choices[0].message.content

    return summary
```

Best models: GPT-4o and Claude 3.5 Sonnet produce the most balanced density increases. Mid-tier models sometimes over-compress or lose coherence after 3+ iterations.
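To confirm that each iteration actually gets denser, a rough metric helps. The capitalized-token proxy below is my own simplification for illustration, not part of the Chain-of-Density method; a production pipeline would use a real NER model (spaCy or similar) to count entities:

```python
import re

def entity_density(summary: str) -> float:
    """Naive information-density proxy: unique capitalized tokens
    (a crude stand-in for named entities) divided by word count.
    Swap in a real NER model for anything beyond a sanity check."""
    words = summary.split()
    if not words:
        return 0.0
    entities = set(re.findall(r"\b[A-Z][a-zA-Z]+\b", summary))
    return len(entities) / len(words)
```

Logging this value per iteration makes it easy to stop early once density plateaus instead of always running the full iteration count.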
4.9 Rephrase and Respond (RaR)
Section titled “4.9 Rephrase and Respond (RaR)”What it does: Before answering, the model first rephrases the question in its own words, then answers the rephrased version. This surfaces ambiguities and forces the model to fully understand the query before responding.
When to use: Ambiguous queries, user-facing systems with varied input quality, and tasks where misinterpretation is common.
```python
def rephrase_and_respond(query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Before answering, first rephrase the question in your own words "
                "to ensure you understand it correctly. Format:\n\n"
                "REPHRASED QUESTION: [your rephrasing]\n\n"
                "ANSWER: [your detailed answer]"
            )},
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    return response
```

Best models: All models benefit. Particularly effective with mid-tier models (GPT-4o-mini, Claude 3.5 Haiku) that are more prone to misinterpreting complex or multi-part questions.
4.10 Contrastive Chain-of-Thought
Section titled “4.10 Contrastive Chain-of-Thought”What it does: Provides the model with both a correct reasoning example and an incorrect example for the same type of problem. The contrast helps the model identify and avoid common reasoning errors.
When to use: Math, logic, and classification tasks where specific error patterns are predictable — for example, models consistently confusing correlation with causation, or making off-by-one errors in counting problems.
```python
def contrastive_cot(query: str, correct_example: str, incorrect_example: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You will see a correct and an incorrect reasoning example "
                "for a similar problem. Study both to understand the right "
                "approach and the common mistake. Then solve the new problem "
                "using the correct reasoning pattern."
            )},
            {"role": "user", "content": (
                f"CORRECT EXAMPLE:\n{correct_example}\n\n"
                f"INCORRECT EXAMPLE (common mistake):\n{incorrect_example}\n\n"
                f"NOW SOLVE THIS PROBLEM:\n{query}\n\n"
                "Show your reasoning step by step, then give your final answer."
            )}
        ]
    ).choices[0].message.content

    return response

# Usage
result = contrastive_cot(
    query="A train travels 120 km in 2 hours, stops for 30 min, then travels 90 km in 1.5 hours. What is the average speed for the entire journey?",
    correct_example=(
        "Q: A car drives 100 km in 2 hours, stops for 1 hour, then drives 50 km in 1 hour.\n"
        "Average speed = total distance / total time = 150 km / 4 hours = 37.5 km/h\n"
        "(Stop time IS included in total time for average speed)"
    ),
    incorrect_example=(
        "Q: A car drives 100 km in 2 hours, stops for 1 hour, then drives 50 km in 1 hour.\n"
        "Average speed = total distance / driving time = 150 km / 3 hours = 50 km/h\n"
        "(WRONG: this ignores stop time, giving an inflated average speed)"
    )
)
```

Best models: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro. Even smaller models show measurable improvement when given contrastive examples — the explicit “do not do this” signal is a strong teaching pattern.
5. Pattern Selection Framework
Choosing the right technique depends on your problem type, accuracy requirements, and budget constraints. Work through these layers from top to bottom.
Pattern Selection Decision Layers
Start at the top. Each layer narrows your options until you arrive at the right technique for your problem.
Quick-start recommendations:
- Just need better accuracy on reasoning? Start with Self-Consistency (N=5).
- Hallucination is the problem? Apply Chain-of-Verification.
- Building a new prompt from scratch? Use Meta-Prompting to generate your first draft.
- Users ask ambiguous questions? Add Rephrase and Respond as a preprocessing step.
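The decision table from Section 2 can be encoded directly as a routing map. The failure-mode keys below are hypothetical names chosen for this sketch; in a real system they would come from your error taxonomy or classifier:

```python
# Maps a diagnosed failure mode to the recommended named pattern
# (mirrors the decision table in Section 2).
PATTERN_BY_FAILURE_MODE: dict[str, str] = {
    "hallucination": "Chain-of-Verification",
    "novel_task": "Meta-Prompting",
    "slow_longform": "Skeleton-of-Thought",
    "inconsistent_reasoning": "Self-Consistency",
    "complex_multistep": "Least-to-Most",
    "needs_external_data": "ReAct",
    "missed_constraints": "Directional Stimulus",
    "vague_summaries": "Chain-of-Density",
    "ambiguous_input": "Rephrase and Respond",
    "predictable_reasoning_errors": "Contrastive Chain-of-Thought",
}

def select_pattern(failure_mode: str) -> str:
    # Default to plain prompting when no known failure mode matches:
    # the simplest technique that addresses the problem wins.
    return PATTERN_BY_FAILURE_MODE.get(failure_mode, "baseline prompt")
```

Making the mapping explicit in code keeps pattern selection reviewable and testable, rather than a tribal-knowledge decision made per prompt.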
6. Implementation Examples
Three complete examples showing how to combine patterns in realistic production scenarios.
Example 1: Factual Q&A Pipeline (CoVe + RaR)
Combine Rephrase and Respond for input clarity with Chain-of-Verification for output accuracy.
```python
def factual_qa_pipeline(query: str, context: str) -> dict:
    # Step 1: Rephrase for clarity
    rephrased = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Rephrase this question to be clear and unambiguous. Output only the rephrased question."},
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    # Step 2: Generate answer with CoVe
    initial_answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer based on the provided context. Cite specific passages."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {rephrased}"}
        ]
    ).choices[0].message.content

    # Step 3: Verify
    verification = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "List 3 claims made in this answer. For each, state whether it is supported by the context."},
            {"role": "user", "content": f"Context:\n{context}\n\nAnswer:\n{initial_answer}"}
        ]
    ).choices[0].message.content

    # Step 4: Revise if needed
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Revise the answer, removing any claims not supported by the context."},
            {"role": "user", "content": f"Answer:\n{initial_answer}\n\nVerification:\n{verification}"}
        ]
    ).choices[0].message.content

    return {"original_query": query, "rephrased": rephrased, "answer": final}
```

Example 2: Code Generation with Contrastive CoT
Use contrastive examples to prevent common code generation errors.
```python
def safe_code_generation(task: str, language: str = "python") -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                f"Generate {language} code for the given task. "
                "Study the correct and incorrect examples to avoid common mistakes."
            )},
            {"role": "user", "content": (
                "CORRECT PATTERN:\n"
                "```python\n"
                "# Always validate input before processing\n"
                "def process_data(items: list[dict]) -> list[dict]:\n"
                "    if not items:\n"
                "        return []\n"
                "    return [transform(item) for item in items if is_valid(item)]\n"
                "```\n\n"
                "INCORRECT PATTERN (common mistake):\n"
                "```python\n"
                "# Missing input validation — crashes on None or empty input\n"
                "def process_data(items):\n"
                "    return [transform(item) for item in items]  # No type check, no validation\n"
                "```\n\n"
                f"TASK: {task}"
            )}
        ]
    ).choices[0].message.content

    return response
```

Example 3: Research Summarization (SoT + Chain-of-Density)
Generate a structured outline first, then compress each section for maximum information density.
```python
def research_summary(text: str) -> str:
    # Phase 1: Skeleton
    skeleton = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Create a 5-point outline summarizing the key findings. One sentence per point."},
            {"role": "user", "content": text}
        ]
    ).choices[0].message.content

    # Phase 2: Expand each point
    expanded = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Expand each outline point into a detailed paragraph with specific data and findings."},
            {"role": "user", "content": f"Source text:\n{text}\n\nOutline:\n{skeleton}"}
        ]
    ).choices[0].message.content

    # Phase 3: Density compression (2 rounds)
    dense = expanded
    for _ in range(2):
        dense = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Rewrite this summary to be equally concise but more information-dense. Add missing key entities without increasing length."},
                {"role": "user", "content": f"Source:\n{text}\n\nCurrent summary:\n{dense}"}
            ]
        ).choices[0].message.content

    return dense
```

7. Prompt Patterns vs Prompt Engineering Fundamentals
Understanding how named patterns relate to the prompt engineering fundamentals helps you decide when to reach for an advanced technique.
Fundamentals vs Named Patterns

Fundamentals:
- System prompt design — role, constraints, format
- Zero-shot and few-shot examples
- Output format specification (JSON, markdown)
- Single LLM call per request — minimal latency
- Sufficient for 70-80% of production tasks

Named patterns:
- Reusable templates for specific failure modes
- Multi-call pipelines with intermediate reasoning
- Composable — patterns combine into complex pipelines
- Higher token cost and latency per request
- Required for the remaining 20-30% of hard problems
8. Interview Questions on Prompt Engineering Techniques
Senior GenAI interviews frequently test pattern selection and tradeoff analysis — not just knowledge of individual techniques.
Q: Your RAG system hallucinates facts in 12% of responses. Which prompt pattern would you apply and why?
Chain-of-Verification (CoVe) targets this directly. After the model generates its response from retrieved context, CoVe forces it to generate verification questions about its own claims, answer them independently by re-reading the context, and revise any contradictions. The key is that verification questions are answered in a separate call — the model cannot simply confirm its own hallucination if it re-examines the source material independently. For a RAG system, you would also check whether the retrieval step is returning relevant context, since CoVe cannot fix answers based on irrelevant documents.
Q: When would you choose Self-Consistency over Chain-of-Thought, and what is the cost tradeoff?
Chain-of-Thought generates one reasoning path. Self-Consistency generates N paths (typically 5-7) and takes a majority vote. Choose Self-Consistency when accuracy on reasoning tasks justifies the cost — it multiplies inference cost by N. The technique works because correct reasoning paths converge while errors produce diverse wrong answers. Diminishing returns kick in around N=7. A practical production pattern is to use Chain-of-Thought by default and escalate to Self-Consistency only for queries flagged as high-stakes or where the model’s confidence score is low.
Q: A product manager asks you to reduce response time from 8 seconds to under 3 seconds for a documentation assistant. What patterns help?
Skeleton-of-Thought addresses perceived latency — the skeleton arrives fast, giving users immediate structure while details fill in. For actual latency reduction, you would combine this with model routing: use a smaller, faster model (GPT-4o-mini or Claude 3.5 Haiku) for the skeleton generation and a frontier model only for expanding complex sections. Rephrase and Respond can also help indirectly — by clarifying the query upfront, the model spends fewer tokens on hedging and off-topic content.
Q: How do you evaluate whether a prompt pattern is actually helping?
Run a controlled comparison against your evaluation dataset. Measure the target metric (accuracy, hallucination rate, user satisfaction) with and without the pattern on the same test set. Track token cost and latency alongside accuracy. A pattern that improves accuracy by 3% but doubles cost may not be worth it for most queries. The evaluation should include both average-case and worst-case analysis — some patterns help the median case but do not affect the failure modes you care about most.
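A minimal version of that controlled comparison can be scripted. This harness assumes each prompt variant is a callable and that correctness is a plain string match, which is a deliberate simplification of real LLM evaluation (where you would also track token cost, latency, and use fuzzier scoring):

```python
from typing import Callable

def compare_variants(testset: list[tuple[str, str]],
                     baseline: Callable[[str], str],
                     with_pattern: Callable[[str], str]) -> dict[str, float]:
    """Run both prompt variants over the same test set and report
    accuracy for each, so the pattern's lift is measured head-to-head."""
    def accuracy(fn: Callable[[str], str]) -> float:
        correct = sum(1 for q, expected in testset if fn(q) == expected)
        return correct / len(testset)

    return {"baseline": accuracy(baseline), "with_pattern": accuracy(with_pattern)}

# Stubbed variants for illustration; real ones would wrap LLM calls
tests = [("2+2", "4"), ("3+3", "6")]
report = compare_variants(tests, lambda q: "4", lambda q: str(eval(q)))
```

Running both variants over the identical test set is the key point: comparing a new pattern against historical metrics from a different query mix tells you nothing.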
9. Prompt Patterns in Production — Cost Impact
Every advanced technique adds tokens and latency. Plan for this in your budget and architecture.
| Pattern | Extra LLM Calls | Token Multiplier | Latency Impact | Best For |
|---|---|---|---|---|
| Chain-of-Verification | 3 additional | ~2-3x output tokens | +3-5s per request | Async fact-checking pipelines |
| Meta-Prompting | 1 additional (one-time) | ~1.5x for prompt generation | One-time setup cost | Prompt development, not per-request |
| Skeleton-of-Thought | 1 additional | ~1.2x total | Reduces perceived latency | User-facing long-form generation |
| Self-Consistency | N-1 additional | Nx total cost | Nx latency (parallel: ~1x) | High-stakes reasoning, parallelizable |
| Least-to-Most | K additional (K = subproblems) | ~1.5-2x | +2-4s per subproblem | Complex sequential reasoning |
| ReAct | Variable (1-10 tool calls) | Variable | +1-3s per tool call | Agent tasks requiring external data |
| Directional Stimulus | 0 | ~1x | Negligible | Always-on, no cost penalty |
| Chain-of-Density | 2-3 additional | ~2x | +2-3s per iteration | Batch summarization pipelines |
| Rephrase and Respond | 0 (same call) | ~1.1x | Negligible | Always-on, minimal cost penalty |
| Contrastive CoT | 0 | ~1.3x (longer prompt) | Negligible | Tasks with predictable error patterns |
Cost optimization strategies:
- Route by complexity. Use Directional Stimulus and RaR (zero extra calls) for most queries. Escalate to Self-Consistency or CoVe only for queries flagged as high-risk.
- Parallelize where possible. Self-Consistency samples can run in parallel, reducing latency from Nx to ~1x while keeping the same cost.
- Cache pattern outputs. Meta-Prompting generates prompts once — cache and reuse. Chain-of-Density summaries of static documents can be precomputed.
- Use smaller models for intermediate steps. CoVe’s verification questions can be generated by GPT-4o-mini. Only the final revision needs the frontier model.
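The parallelization strategy above can be sketched with a thread pool. Here `sample_fn` stands in for one temperature-0.7 API call; this is a generic sketch, not tied to any particular SDK:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def self_consistency_parallel(sample_fn: Callable[[], str], n: int = 5) -> str:
    """Draw n samples concurrently and return the majority answer.
    Wall-clock latency approaches a single call; total token cost stays Nx."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda _: sample_fn(), range(n)))
    return Counter(answers).most_common(1)[0][0]
```

Because each sample is independent, threads (or async calls) are safe here; the only shared state is the collected answer list, which `pool.map` handles for you.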
10. Summary and Key Takeaways
The 10 prompt engineering techniques in this catalog address specific, known failure modes: hallucination (CoVe), slow generation (SoT), inconsistent reasoning (Self-Consistency), ambiguous input (RaR), and missed constraints (Directional Stimulus). Each is a named, reusable pattern — not ad hoc prompt tweaking.
Key principles:
- Start simple. Prompt fundamentals solve 70-80% of problems. Apply named patterns only when evaluation shows they help.
- Match pattern to failure mode. The decision table in Section 2 maps problems to solutions directly.
- Measure everything. Run your evaluation pipeline before and after applying a pattern. If the metric does not improve, remove the pattern.
- Budget for tokens. Every technique except Directional Stimulus and RaR adds at least one extra LLM call. Plan your cost model accordingly.
- Compose deliberately. Patterns combine (RaR + CoVe, SoT + Chain-of-Density), but each addition multiplies cost. Use the prompt testing framework to validate combinations.
Related
- Prompt Engineering Fundamentals — System prompts, few-shot, structured output, and the basics this guide builds on
- Advanced Prompting — CoT, ToT, Self-Consistency — Deeper coverage of Chain-of-Thought and Tree-of-Thought theory
- Prompt Testing Guide — How to build evaluation datasets and test prompt changes systematically
- Prompt Management — Versioning, deployment, and lifecycle management for production prompts
- LLM Evaluation Guide — RAGAS, LLM-as-judge, and A/B testing for measuring prompt quality
Frequently Asked Questions
What are prompt engineering techniques?
Prompt engineering techniques are named, reusable patterns for structuring LLM inputs to improve output quality on specific problem types. Unlike basic prompting (writing a good instruction), techniques like Chain-of-Verification, Meta-Prompting, and Skeleton-of-Thought provide repeatable templates that address known failure modes such as hallucination, slow generation, and inconsistent reasoning.
What is Chain-of-Verification (CoVe)?
Chain-of-Verification is a prompt pattern where the model generates an initial response, then produces verification questions about its own claims, answers those questions independently, and revises the original response based on any contradictions found. CoVe reduces hallucination rates by forcing the model to fact-check itself before delivering a final answer.
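The four CoVe steps can be sketched as a small pipeline. This is a minimal sketch, not a definitive implementation: `call_llm` is a hypothetical stand-in for any chat-completion client, stubbed here so the control flow is runnable.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call your model's API here.
    return f"[model response to: {prompt[:40]}...]"

def chain_of_verification(question: str) -> str:
    # 1. Draft an initial answer.
    draft = call_llm(f"Answer concisely: {question}")
    # 2. Ask the model to list verification questions about its own claims.
    checks = call_llm(f"List fact-check questions for this answer:\n{draft}")
    # 3. Answer each verification question independently, without the draft
    #    in context, so the checks are not biased toward agreeing with it.
    verified = call_llm(f"Answer each question independently:\n{checks}")
    # 4. Revise the draft against the verified facts.
    return call_llm(
        "Revise the draft so it agrees with the verified facts.\n"
        f"Draft: {draft}\nVerified: {verified}"
    )
```

Step 3 is the key design choice: answering the verification questions in a fresh context prevents the model from simply rationalizing its original claims.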
What is Meta-Prompting and when should you use it?
Meta-Prompting asks the LLM to generate or refine a prompt before executing the actual task. You describe what you need, and the model writes an optimized prompt for that purpose. Use it when you are unsure how to structure a prompt for a novel task, or when you want to systematically improve an existing prompt by having the model analyze its own instruction-following patterns.
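The two-step flow is simple to wire up. A hedged sketch, with `call_llm` as a hypothetical placeholder for your model client:

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call.
    return f"[model output for: {prompt[:30]}...]"

def meta_prompt(task_description: str) -> str:
    # Step 1: have the model write an optimized prompt for the task.
    generated_prompt = call_llm(
        "Write a precise, well-structured prompt for this task. "
        "Include a role, constraints, and an output format.\n"
        f"Task: {task_description}"
    )
    # Step 2: execute the generated prompt as the actual instruction.
    return call_llm(generated_prompt)
```

In practice you would log `generated_prompt` and promote good ones into your prompt library rather than regenerating on every request.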
How does Skeleton-of-Thought speed up LLM responses?
Skeleton-of-Thought splits generation into two phases. First, the model produces a skeleton — a list of bullet points outlining the structure of the answer. Then each skeleton point is expanded in parallel (or sequentially). This reduces perceived latency because the skeleton arrives fast and gives users an immediate sense of the answer structure while details fill in progressively.
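The skeleton-then-expand flow maps naturally onto a thread pool. A minimal sketch with both model calls stubbed (a real version would issue one API call for the skeleton and one per point for expansion):

```python
from concurrent.futures import ThreadPoolExecutor

def generate_skeleton(question: str) -> list[str]:
    # Placeholder: a real call would ask the model for 3-5 outline bullets.
    return [f"Point {i} about {question}" for i in range(1, 4)]

def expand_point(point: str) -> str:
    # Placeholder: a real call would expand one bullet into a paragraph.
    return f"{point} - expanded with supporting detail."

def skeleton_of_thought(question: str) -> str:
    skeleton = generate_skeleton(question)   # fast first call, stream to user
    with ThreadPoolExecutor() as pool:       # expand all points concurrently
        expansions = list(pool.map(expand_point, skeleton))
    return "\n".join(expansions)
```

Streaming the skeleton to the user before expansion finishes is where the perceived-latency win comes from.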
What is the difference between Self-Consistency and Chain-of-Thought?
Chain-of-Thought generates one reasoning path and one answer. Self-Consistency generates multiple independent reasoning paths (typically 3-7) at non-zero temperature, then selects the most common final answer by majority vote. Self-Consistency improves accuracy on reasoning tasks by filtering out reasoning errors that produce diverse wrong answers, while correct reasoning tends to converge on the same result.
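The majority vote at the heart of Self-Consistency is a few lines. In this sketch `sample_answer` is a hypothetical stub standing in for one sampled CoT completion (temperature ~0.7) with the final answer parsed out:

```python
from collections import Counter

def sample_answer(question: str, seed: int) -> str:
    # Placeholder: a real call would sample a fresh chain-of-thought
    # completion and extract only the final answer. Stubbed answers
    # simulate one divergent wrong path among mostly-converging paths.
    return ["42", "42", "41", "42", "40"][seed]

def self_consistency(question: str, n_samples: int = 5) -> str:
    answers = [sample_answer(question, i) for i in range(n_samples)]
    # Majority vote over the parsed final answers.
    return Counter(answers).most_common(1)[0][0]
```

Note the vote is over final answers only, not reasoning text; two paths with different wording but the same answer count as agreement.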
What is Contrastive Chain-of-Thought?
Contrastive Chain-of-Thought provides the model with both a correct reasoning example and an incorrect reasoning example for the same problem. By seeing what right and wrong look like side by side, the model learns to avoid common reasoning mistakes. This technique is effective for math, logic, and classification tasks where specific error patterns are predictable.
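The pattern is purely a prompt template: one correct and one flawed worked example for the same problem, then the new problem. A sketch (the parameter names are illustrative, not from any library):

```python
def contrastive_cot_prompt(
    example_problem: str,
    correct_reasoning: str,
    incorrect_reasoning: str,
    new_problem: str,
) -> str:
    # Pair a correct and an incorrect worked example so the model sees
    # the error pattern it should avoid, then pose the new problem.
    return (
        f"Problem: {example_problem}\n"
        f"Correct reasoning: {correct_reasoning}\n"
        f"Incorrect reasoning (avoid this): {incorrect_reasoning}\n\n"
        "Solve the next problem using correct reasoning only.\n"
        f"Problem: {new_problem}"
    )
```

The incorrect example works best when it demonstrates the specific mistake your evaluations show the model actually making, such as sign errors or unit confusion.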
How do prompt engineering techniques affect token cost?
Most advanced techniques increase token usage. Self-Consistency multiplies cost by the number of samples (3-7x). Chain-of-Verification roughly doubles output tokens. Meta-Prompting adds an extra LLM call. Skeleton-of-Thought can reduce perceived latency but uses similar total tokens. The cost increase is justified when accuracy improvements prevent downstream failures that are more expensive than the additional tokens.
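The cost arithmetic is worth making explicit. A sketch with purely illustrative per-1K-token prices (not any vendor's real rates):

```python
def call_cost(input_tokens: int, output_tokens: int,
              price_in_per_1k: float, price_out_per_1k: float) -> float:
    # Cost of a single LLM call in dollars.
    return (input_tokens * price_in_per_1k
            + output_tokens * price_out_per_1k) / 1000

# Illustrative prices and token counts only.
base = call_cost(500, 300, 0.005, 0.015)          # plain single call
sc_cost = 5 * base                                 # Self-Consistency, 5 samples
cove_cost = call_cost(500, 600, 0.005, 0.015)      # CoVe: ~2x output tokens
```

With these numbers, Self-Consistency is 5x the base cost while CoVe is well under 2x total, because only output tokens double.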
Which prompt engineering technique reduces hallucination the most?
Chain-of-Verification (CoVe) is specifically designed to reduce hallucination. By forcing the model to generate verification questions about its own claims and answer them independently, CoVe catches factual errors before they reach the user. ReAct also reduces hallucination by grounding responses in retrieved data rather than relying on the model's parametric knowledge, but ReAct requires tool access while CoVe works with any LLM.
Can you combine multiple prompt engineering techniques?
Yes. Combining techniques is common in production systems. For example, you can use Meta-Prompting to generate an optimized prompt, then apply that prompt with Self-Consistency for higher accuracy, and finally run Chain-of-Verification on the winning answer to catch hallucinations. The key constraint is cost — each additional technique adds latency and token usage, so combine only when the accuracy gain justifies the overhead.
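That Meta-Prompting → Self-Consistency → CoVe composition can be sketched as a single pipeline. All calls are stubbed via a hypothetical `call_llm`; a real version would vary temperature per sample and parse answers properly:

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call.
    return f"[response: {prompt[:30]}]"

def composed_pipeline(task: str, n_samples: int = 3) -> str:
    # 1. Meta-Prompting: have the model write the working prompt.
    prompt = call_llm(f"Write an optimized prompt for: {task}")
    # 2. Self-Consistency: sample several answers, keep the majority.
    answers = [call_llm(prompt) for _ in range(n_samples)]
    best = max(set(answers), key=answers.count)
    # 3. Chain-of-Verification on the winning answer.
    return call_llm(f"Verify each claim, then revise:\n{best}")
```

Each stage multiplies calls, so this three-pattern stack costs at least `n_samples + 2` calls per request; reserve it for tasks where a wrong answer is far more expensive than the tokens.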
Which models support advanced prompt engineering techniques best?
Frontier models (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) support all 10 techniques reliably. Mid-tier models (GPT-4o-mini, Claude 3.5 Haiku, Gemini 1.5 Flash) work well with most techniques but may struggle with Meta-Prompting and complex multi-step patterns. Smaller open-source models (Llama 3 8B, Mistral 7B) benefit most from Least-to-Most and basic Chain-of-Thought but are less reliable with Self-Consistency and CoVe.