Prompt Engineering Techniques — 10 Named Patterns That Work (2026)
Most prompt engineering guides stop at zero-shot, few-shot, and Chain-of-Thought. Those are foundations, not the full toolkit. This guide catalogs 10 named prompt engineering techniques — each with a specific use case, Python implementation, and model compatibility notes — so you can match the right pattern to the right problem without guesswork.
Who this is for:
- GenAI engineers who know the basics and need a reference catalog of production-ready prompt patterns.
- Software engineers building LLM pipelines who need to reduce hallucination, improve reasoning accuracy, or speed up generation.
- Interview candidates preparing for senior GenAI roles where pattern selection and tradeoff analysis come up regularly.
1. Why Named Prompt Patterns Matter
Prompt engineering has evolved past “write a clear instruction and hope for the best.” Named patterns exist because engineers kept solving the same problems — hallucination, slow generation, inconsistent reasoning — and the solutions crystallized into repeatable templates.
The value of named patterns is threefold:
Shared vocabulary. When a team agrees on “use CoVe for factual queries,” everyone knows the implementation. No ambiguity, no reinvention.
Predictable tradeoffs. Each pattern has a known cost profile. Self-Consistency costs N times a single call. Skeleton-of-Thought reduces perceived latency. Chain-of-Verification roughly doubles output tokens. Knowing these tradeoffs before implementation prevents surprises in production.
Composability. Patterns combine. You can use Meta-Prompting to generate an optimized prompt, apply that prompt with Self-Consistency, and run Chain-of-Verification on the result. Named patterns are building blocks, not one-size-fits-all solutions.
This page assumes you already understand prompt engineering fundamentals — system prompts, few-shot examples, and basic Chain-of-Thought. If those concepts are new, start there first.
2. When to Use Advanced Prompt Techniques
Not every problem needs an advanced technique. The decision depends on the failure mode you are solving.
| Problem | Recommended Pattern | Why |
|---|---|---|
| Model hallucinating facts | Chain-of-Verification (CoVe) | Forces self-fact-checking before final output |
| Unsure how to prompt a novel task | Meta-Prompting | LLM generates the optimized prompt for you |
| Slow response for long-form content | Skeleton-of-Thought (SoT) | Generates structure first, then fills in parallel |
| Inconsistent answers on reasoning tasks | Self-Consistency | Majority vote across multiple reasoning paths |
| Complex multi-step problem | Least-to-Most Prompting | Decomposes into subproblems, solves sequentially |
| Task requires external data | ReAct | Interleaves reasoning with tool calls |
| Model ignoring subtle constraints | Directional Stimulus Prompting | Adds hint keywords to steer output |
| Summarization losing key details | Chain-of-Density | Progressive compression preserves information |
| Ambiguous or poorly worded input | Rephrase and Respond (RaR) | Model rephrases the question before answering |
| Reasoning errors on math/logic | Contrastive Chain-of-Thought | Shows correct AND incorrect examples |
Rule of thumb: Start with the simplest technique that addresses your failure mode. Add complexity only when evaluation shows the simpler approach falls short. Every technique adds tokens and latency.
3. How Prompt Patterns Work — Architecture
Every prompt pattern follows the same structural idea: insert a processing step between the raw user input and the model’s final output. The pattern template shapes how the model reasons before committing to an answer.
Prompt Pattern Pipeline
Named patterns insert structured reasoning between input and output — turning a single LLM call into a multi-stage pipeline.
The pattern template is the differentiator. Without it, the model jumps straight from input to output — fast, but prone to the failure modes each pattern is designed to prevent. With the template, the model generates intermediate tokens that improve the quality of what comes next. This is not magic; it is a direct consequence of how autoregressive models work: every generated token becomes context for subsequent tokens.
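As a sketch, the multi-stage idea reduces to feeding each stage's output forward as context for the next stage. The `pattern_pipeline`, `call_llm`, and `fake_llm` names below are placeholders invented for illustration, not part of any SDK:

```python
from typing import Callable

def pattern_pipeline(user_input: str,
                     stages: list[str],
                     call_llm: Callable[[str, str], str]) -> str:
    """Run a multi-stage prompt pattern: each stage's output
    becomes context for the next stage."""
    context = user_input
    for stage_prompt in stages:
        # Intermediate tokens from earlier stages are carried forward,
        # which is why patterns improve autoregressive output quality.
        context = call_llm(stage_prompt, context)
    return context

# Stubbed model for illustration only; a real call_llm would hit an LLM API.
def fake_llm(system: str, content: str) -> str:
    return f"[{system}] {content}"

result = pattern_pipeline("What is X?", ["outline", "expand"], fake_llm)
# Each stage wraps the previous output, mimicking context accumulation.
```

Swapping `fake_llm` for a real API call turns this skeleton into any of the two-stage patterns below (skeleton-then-expand, draft-then-verify, and so on).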
4. The 10 Prompt Engineering Techniques Catalog
This is the core reference. Each pattern includes what it does, when to use it, a Python implementation using the OpenAI SDK, and which models handle it best.
4.1 Chain-of-Verification (CoVe)
What it does: The model generates an initial response, creates verification questions about its own claims, answers those questions independently, and revises the response based on contradictions.
When to use: Factual queries where hallucination is the primary risk — knowledge base Q&A, data extraction, and claim verification tasks.
```python
from openai import OpenAI

client = OpenAI()

def chain_of_verification(query: str, context: str = "") -> str:
    # Step 1: Generate initial response
    initial = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer the question based on your knowledge."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"}
        ]
    ).choices[0].message.content

    # Step 2: Generate verification questions
    verification_qs = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Generate 3-5 specific factual questions that would verify the claims in this response. Output only the questions, one per line."},
            {"role": "user", "content": f"Original question: {query}\n\nResponse to verify:\n{initial}"}
        ]
    ).choices[0].message.content

    # Step 3: Answer verification questions independently
    verified = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer each question independently. If you are unsure, say 'uncertain'."},
            {"role": "user", "content": verification_qs}
        ]
    ).choices[0].message.content

    # Step 4: Revise based on verification
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Revise the original response. Remove or correct any claims that contradict the verification answers. Keep verified claims intact."},
            {"role": "user", "content": f"Original response:\n{initial}\n\nVerification Q&A:\n{verified}"}
        ]
    ).choices[0].message.content

    return final
```

Best models: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro. Mid-tier models tend to confirm their own errors during verification rather than catching them.
4.2 Meta-Prompting
What it does: Instead of writing the prompt yourself, you ask the LLM to generate or refine the prompt for your task. The model acts as a prompt engineer.
When to use: Novel tasks where you are unsure of the optimal prompt structure, or when systematically improving an existing prompt.
```python
def meta_prompt(task_description: str, examples: str = "") -> str:
    # Step 1: Generate an optimized prompt
    meta = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are an expert prompt engineer. Given a task description, "
                "write an optimized system prompt that will produce the best results. "
                "Include: role definition, explicit constraints, output format, "
                "and 2 few-shot examples."
            )},
            {"role": "user", "content": f"Task: {task_description}\n\nExample inputs/outputs:\n{examples}"}
        ]
    ).choices[0].message.content

    return meta  # Use this as the system prompt for your actual task
```

Best models: Frontier models only (GPT-4o, Claude 3.5 Sonnet). Smaller models produce generic prompts that do not outperform hand-written ones.
4.3 Skeleton-of-Thought (SoT)
What it does: Splits generation into two phases. First, the model produces a skeleton (outline). Then each section is expanded. This reduces perceived latency for long-form content.
When to use: Long-form generation (articles, reports, documentation) where users need to see structure quickly.
```python
def skeleton_of_thought(query: str) -> str:
    # Phase 1: Generate skeleton
    skeleton = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Generate a concise outline for answering this question. "
                "Output 3-7 bullet points, each one sentence. "
                "Do not expand — skeleton only."
            )},
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    # Phase 2: Expand each point
    expanded = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Expand each bullet point into a detailed paragraph. "
                "Maintain the original structure. Add specifics, examples, "
                "and technical depth."
            )},
            {"role": "user", "content": f"Question: {query}\n\nSkeleton:\n{skeleton}"}
        ]
    ).choices[0].message.content

    return expanded
```

Best models: All models benefit. Particularly effective with Claude 3.5 Sonnet and GPT-4o for structured long-form output.
4.4 Self-Consistency
Section titled “4.4 Self-Consistency”What it does: Generates N independent responses at non-zero temperature and selects the most common answer by majority vote.
When to use: Math, logic, and multi-step reasoning tasks where a single reasoning chain may take a wrong turn. See the advanced prompting guide for the theoretical foundation.
```python
from collections import Counter

def self_consistency(query: str, n: int = 5) -> str:
    answers = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=0.7,  # Non-zero is critical
            messages=[
                {"role": "system", "content": (
                    "Solve this step by step. After your reasoning, "
                    "write your final answer on the last line as: "
                    "ANSWER: <your answer>"
                )},
                {"role": "user", "content": query}
            ]
        ).choices[0].message.content

        # Extract final answer
        for line in response.strip().split("\n")[::-1]:
            if line.strip().startswith("ANSWER:"):
                answers.append(line.split("ANSWER:")[-1].strip())
                break

    # Majority vote
    if not answers:
        return "No consistent answer found"
    most_common = Counter(answers).most_common(1)[0][0]
    return most_common
```

Best models: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro. Diminishing returns after N=7. Temperature 0.5-0.8 produces sufficient diversity.
4.5 Least-to-Most Prompting
Section titled “4.5 Least-to-Most Prompting”What it does: Decomposes a complex problem into subproblems, then solves each sequentially — feeding the solution of each subproblem into the next.
When to use: Multi-step problems where the model fails to plan the full solution upfront — compositional generalization, complex reasoning chains, and tasks that require building up from simpler components.
```python
def least_to_most(query: str) -> str:
    # Step 1: Decompose into subproblems
    decomposition = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Break this problem into a sequence of simpler subproblems. "
                "List them in order from simplest to most complex. "
                "Each subproblem should build on the previous one. "
                "Output only the numbered list of subproblems."
            )},
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    # Step 2: Solve sequentially
    solved_so_far = ""
    subproblems = [line.strip() for line in decomposition.strip().split("\n") if line.strip()]

    for sub in subproblems:
        solution = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": (
                    "Solve this subproblem. Use the previously solved "
                    "subproblems as context."
                )},
                {"role": "user", "content": (
                    f"Original problem: {query}\n\n"
                    f"Previously solved:\n{solved_so_far}\n\n"
                    f"Current subproblem: {sub}"
                )}
            ]
        ).choices[0].message.content
        solved_so_far += f"\n{sub}\nSolution: {solution}\n"

    return solved_so_far
```

Best models: All models benefit, but smaller models (Llama 3 8B, Mistral 7B) see the largest relative improvement because they struggle most with complex planning.
4.6 ReAct (Reason + Act)
Section titled “4.6 ReAct (Reason + Act)”What it does: Interleaves reasoning steps with tool calls in a thought-action-observation loop. The model thinks about what to do, executes a tool, observes the result, and continues reasoning.
When to use: Tasks requiring external data — web search, database lookups, API calls, code execution. ReAct is the pattern powering most production AI agents. See the advanced prompting guide for deeper coverage of the reasoning loop.
```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Search the knowledge base for relevant documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }
]

def react_loop(query: str, max_iterations: int = 5) -> str:
    messages = [
        {"role": "system", "content": (
            "You are a research assistant. Think step by step. "
            "Use the search_docs tool when you need factual information. "
            "Do not guess facts — search first, then reason."
        )},
        {"role": "user", "content": query}
    ]

    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        msg = response.choices[0].message
        messages.append(msg)

        if msg.tool_calls:
            for call in msg.tool_calls:
                # Execute tool (implement search_docs separately)
                result = execute_tool(call.function.name, json.loads(call.function.arguments))
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": result
                })
        else:
            return msg.content  # Final answer

    return messages[-1].content
```

Best models: GPT-4o and Claude 3.5 Sonnet handle complex tool orchestration best. Gemini 1.5 Pro is strong for multi-turn tool use. Smaller models often call tools incorrectly or skip tool use when it is needed.
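The `execute_tool` helper referenced in the loop is left unimplemented above; one minimal sketch is a dispatch table mapping tool names to Python functions. The `search_docs` body here is a hypothetical stand-in for real retrieval code:

```python
from typing import Any, Callable

def search_docs(query: str) -> str:
    # Hypothetical placeholder: replace with your real knowledge-base search
    return f"No documents found for: {query}"

TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "search_docs": search_docs,
}

def execute_tool(name: str, arguments: dict[str, Any]) -> str:
    """Dispatch a model tool call to its Python implementation."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        # Return the error as an observation instead of raising,
        # so the ReAct loop can recover and try another action.
        return f"Error: unknown tool '{name}'"
    try:
        return fn(**arguments)
    except TypeError as exc:
        return f"Error: bad arguments for '{name}': {exc}"
```

Returning errors as strings (rather than raising) matters in ReAct: the model sees the failure as an observation and can correct its next action.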
4.7 Directional Stimulus Prompting
Section titled “4.7 Directional Stimulus Prompting”What it does: Adds a hint keyword or phrase to the prompt that steers the model toward a specific aspect of the answer without dictating the full response.
When to use: When the model consistently misses a specific constraint or emphasis — for example, always forgetting to mention security implications, or ignoring edge cases in code generation.
```python
def directional_stimulus(query: str, hints: list[str]) -> str:
    hint_text = ", ".join(hints)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Answer the question thoroughly. "
                f"Pay special attention to these aspects: {hint_text}. "
                "Make sure your response addresses each of these points."
            )},
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    return response

# Example: steer toward security + performance
result = directional_stimulus(
    "How should I implement user authentication in a FastAPI app?",
    hints=["rate limiting", "token rotation", "OWASP top 10"]
)
```

Best models: All models respond to directional hints. This technique is lightweight and model-agnostic — effective even with smaller models like GPT-4o-mini and Llama 3 8B.
4.8 Chain-of-Density (Summarization)
Section titled “4.8 Chain-of-Density (Summarization)”What it does: Progressively compresses a summary over multiple iterations. Each iteration adds missing entities while keeping the summary length roughly constant, increasing information density.
When to use: Summarization tasks where initial summaries are too vague or miss key details. Produces summaries that are both concise and information-rich.
```python
def chain_of_density(text: str, iterations: int = 3) -> str:
    summary = ""
    for i in range(iterations):
        if i == 0:
            prompt = (
                f"Write a concise summary of this text in 3-4 sentences:\n\n{text}"
            )
        else:
            prompt = (
                f"Here is a text and its current summary. The summary is missing "
                f"key entities and details. Rewrite the summary to be equally concise "
                f"but more information-dense. Add missing entities without increasing "
                f"length significantly.\n\n"
                f"Text:\n{text}\n\n"
                f"Current summary:\n{summary}\n\n"
                f"Write an improved, denser summary:"
            )

        summary = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a precise summarizer. Every word must earn its place."},
                {"role": "user", "content": prompt}
            ]
        ).choices[0].message.content

    return summary
```

Best models: GPT-4o and Claude 3.5 Sonnet produce the most balanced density increases. Mid-tier models sometimes over-compress or lose coherence after 3+ iterations.
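To confirm that each iteration actually gets denser, a rough metric helps. The capitalized-token proxy below is my own simplification for illustration, not part of the Chain-of-Density method; a production pipeline would use a real NER model (spaCy or similar) to count entities:

```python
import re

def entity_density(summary: str) -> float:
    """Naive information-density proxy: unique capitalized tokens
    (a crude stand-in for named entities) divided by word count.
    Swap in a real NER model for anything beyond a sanity check."""
    words = summary.split()
    if not words:
        return 0.0
    entities = set(re.findall(r"\b[A-Z][a-zA-Z]+\b", summary))
    return len(entities) / len(words)
```

Logging this value per iteration makes it easy to stop early once density plateaus instead of always running the full iteration count.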
4.9 Rephrase and Respond (RaR)
Section titled “4.9 Rephrase and Respond (RaR)”What it does: Before answering, the model first rephrases the question in its own words, then answers the rephrased version. This surfaces ambiguities and forces the model to fully understand the query before responding.
When to use: Ambiguous queries, user-facing systems with varied input quality, and tasks where misinterpretation is common.
```python
def rephrase_and_respond(query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Before answering, first rephrase the question in your own words "
                "to ensure you understand it correctly. Format:\n\n"
                "REPHRASED QUESTION: [your rephrasing]\n\n"
                "ANSWER: [your detailed answer]"
            )},
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    return response
```

Best models: All models benefit. Particularly effective with mid-tier models (GPT-4o-mini, Claude 3.5 Haiku) that are more prone to misinterpreting complex or multi-part questions.
4.10 Contrastive Chain-of-Thought
Section titled “4.10 Contrastive Chain-of-Thought”What it does: Provides the model with both a correct reasoning example and an incorrect example for the same type of problem. The contrast helps the model identify and avoid common reasoning errors.
When to use: Math, logic, and classification tasks where specific error patterns are predictable — for example, models consistently confusing correlation with causation, or making off-by-one errors in counting problems.
```python
def contrastive_cot(query: str, correct_example: str, incorrect_example: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You will see a correct and an incorrect reasoning example "
                "for a similar problem. Study both to understand the right "
                "approach and the common mistake. Then solve the new problem "
                "using the correct reasoning pattern."
            )},
            {"role": "user", "content": (
                f"CORRECT EXAMPLE:\n{correct_example}\n\n"
                f"INCORRECT EXAMPLE (common mistake):\n{incorrect_example}\n\n"
                f"NOW SOLVE THIS PROBLEM:\n{query}\n\n"
                "Show your reasoning step by step, then give your final answer."
            )}
        ]
    ).choices[0].message.content

    return response

# Usage
result = contrastive_cot(
    query="A train travels 120 km in 2 hours, stops for 30 min, then travels 90 km in 1.5 hours. What is the average speed for the entire journey?",
    correct_example=(
        "Q: A car drives 100 km in 2 hours, stops for 1 hour, then drives 50 km in 1 hour.\n"
        "Average speed = total distance / total time = 150 km / 4 hours = 37.5 km/h\n"
        "(Stop time IS included in total time for average speed)"
    ),
    incorrect_example=(
        "Q: A car drives 100 km in 2 hours, stops for 1 hour, then drives 50 km in 1 hour.\n"
        "Average speed = total distance / driving time = 150 km / 3 hours = 50 km/h\n"
        "(WRONG: this ignores stop time, giving an inflated average speed)"
    )
)
```

Best models: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro. Even smaller models show measurable improvement when given contrastive examples — the explicit “do not do this” signal is a strong teaching pattern.
5. Pattern Selection Framework
Choosing the right technique depends on your problem type, accuracy requirements, and budget constraints. Work through these layers from top to bottom.
Pattern Selection Decision Layers
Start at the top. Each layer narrows your options until you arrive at the right technique for your problem.
Quick-start recommendations:
- Just need better accuracy on reasoning? Start with Self-Consistency (N=5).
- Hallucination is the problem? Apply Chain-of-Verification.
- Building a new prompt from scratch? Use Meta-Prompting to generate your first draft.
- Users ask ambiguous questions? Add Rephrase and Respond as a preprocessing step.
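The decision table from Section 2 can be encoded directly as a routing map. The failure-mode keys below are hypothetical names chosen for this sketch; in a real system they would come from your error taxonomy or classifier:

```python
# Maps a diagnosed failure mode to the recommended named pattern
# (mirrors the decision table in Section 2).
PATTERN_BY_FAILURE_MODE: dict[str, str] = {
    "hallucination": "Chain-of-Verification",
    "novel_task": "Meta-Prompting",
    "slow_longform": "Skeleton-of-Thought",
    "inconsistent_reasoning": "Self-Consistency",
    "complex_multistep": "Least-to-Most",
    "needs_external_data": "ReAct",
    "missed_constraints": "Directional Stimulus",
    "vague_summaries": "Chain-of-Density",
    "ambiguous_input": "Rephrase and Respond",
    "predictable_reasoning_errors": "Contrastive Chain-of-Thought",
}

def select_pattern(failure_mode: str) -> str:
    # Default to plain prompting when no known failure mode matches:
    # the simplest technique that addresses the problem wins.
    return PATTERN_BY_FAILURE_MODE.get(failure_mode, "baseline prompt")
```

Making the mapping explicit in code keeps pattern selection reviewable and testable, rather than a tribal-knowledge decision made per prompt.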
6. Implementation Examples
Three complete examples showing how to combine patterns in realistic production scenarios.
Example 1: Factual Q&A Pipeline (CoVe + RaR)
Combine Rephrase and Respond for input clarity with Chain-of-Verification for output accuracy.
```python
def factual_qa_pipeline(query: str, context: str) -> dict:
    # Step 1: Rephrase for clarity
    rephrased = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Rephrase this question to be clear and unambiguous. Output only the rephrased question."},
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    # Step 2: Generate answer with CoVe
    initial_answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer based on the provided context. Cite specific passages."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {rephrased}"}
        ]
    ).choices[0].message.content

    # Step 3: Verify
    verification = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "List 3 claims made in this answer. For each, state whether it is supported by the context."},
            {"role": "user", "content": f"Context:\n{context}\n\nAnswer:\n{initial_answer}"}
        ]
    ).choices[0].message.content

    # Step 4: Revise if needed
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Revise the answer, removing any claims not supported by the context."},
            {"role": "user", "content": f"Answer:\n{initial_answer}\n\nVerification:\n{verification}"}
        ]
    ).choices[0].message.content

    return {"original_query": query, "rephrased": rephrased, "answer": final}
```

Example 2: Code Generation with Contrastive CoT
Use contrastive examples to prevent common code generation errors.
```python
def safe_code_generation(task: str, language: str = "python") -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                f"Generate {language} code for the given task. "
                "Study the correct and incorrect examples to avoid common mistakes."
            )},
            {"role": "user", "content": (
                "CORRECT PATTERN:\n"
                "```python\n"
                "# Always validate input before processing\n"
                "def process_data(items: list[dict]) -> list[dict]:\n"
                "    if not items:\n"
                "        return []\n"
                "    return [transform(item) for item in items if is_valid(item)]\n"
                "```\n\n"
                "INCORRECT PATTERN (common mistake):\n"
                "```python\n"
                "# Missing input validation — crashes on None or empty input\n"
                "def process_data(items):\n"
                "    return [transform(item) for item in items]  # No type check, no validation\n"
                "```\n\n"
                f"TASK: {task}"
            )}
        ]
    ).choices[0].message.content

    return response
```

Example 3: Research Summarization (SoT + Chain-of-Density)
Generate a structured outline first, then compress each section for maximum information density.
```python
def research_summary(text: str) -> str:
    # Phase 1: Skeleton
    skeleton = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Create a 5-point outline summarizing the key findings. One sentence per point."},
            {"role": "user", "content": text}
        ]
    ).choices[0].message.content

    # Phase 2: Expand each point
    expanded = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Expand each outline point into a detailed paragraph with specific data and findings."},
            {"role": "user", "content": f"Source text:\n{text}\n\nOutline:\n{skeleton}"}
        ]
    ).choices[0].message.content

    # Phase 3: Density compression (2 rounds)
    dense = expanded
    for _ in range(2):
        dense = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Rewrite this summary to be equally concise but more information-dense. Add missing key entities without increasing length."},
                {"role": "user", "content": f"Source:\n{text}\n\nCurrent summary:\n{dense}"}
            ]
        ).choices[0].message.content

    return dense
```

7. Prompt Patterns vs Prompt Engineering Fundamentals
Understanding how named patterns relate to the prompt engineering fundamentals helps you decide when to reach for an advanced technique.
Fundamentals vs Named Patterns

Fundamentals:
- System prompt design — role, constraints, format
- Zero-shot and few-shot examples
- Output format specification (JSON, markdown)
- Single LLM call per request — minimal latency
- Sufficient for 70-80% of production tasks

Named patterns:
- Reusable templates for specific failure modes
- Multi-call pipelines with intermediate reasoning
- Composable — patterns combine into complex pipelines
- Higher token cost and latency per request
- Required for the remaining 20-30% of hard problems
8. Interview Questions on Prompt Engineering Techniques
Senior GenAI interviews frequently test pattern selection and tradeoff analysis — not just knowledge of individual techniques.
Q: Your RAG system hallucinates facts in 12% of responses. Which prompt pattern would you apply and why?
Chain-of-Verification (CoVe) targets this directly. After the model generates its response from retrieved context, CoVe forces it to generate verification questions about its own claims, answer them independently by re-reading the context, and revise any contradictions. The key is that verification questions are answered in a separate call — the model cannot simply confirm its own hallucination if it re-examines the source material independently. For a RAG system, you would also check whether the retrieval step is returning relevant context, since CoVe cannot fix answers based on irrelevant documents.
Q: When would you choose Self-Consistency over Chain-of-Thought, and what is the cost tradeoff?
Chain-of-Thought generates one reasoning path. Self-Consistency generates N paths (typically 5-7) and takes a majority vote. Choose Self-Consistency when accuracy on reasoning tasks justifies the cost — it multiplies inference cost by N. The technique works because correct reasoning paths converge while errors produce diverse wrong answers. Diminishing returns kick in around N=7. A practical production pattern is to use Chain-of-Thought by default and escalate to Self-Consistency only for queries flagged as high-stakes or where the model’s confidence score is low.
Q: A product manager asks you to reduce response time from 8 seconds to under 3 seconds for a documentation assistant. What patterns help?
Skeleton-of-Thought addresses perceived latency — the skeleton arrives fast, giving users immediate structure while details fill in. For actual latency reduction, you would combine this with model routing: use a smaller, faster model (GPT-4o-mini or Claude 3.5 Haiku) for the skeleton generation and a frontier model only for expanding complex sections. Rephrase and Respond can also help indirectly — by clarifying the query upfront, the model spends fewer tokens on hedging and off-topic content.
Q: How do you evaluate whether a prompt pattern is actually helping?
Run a controlled comparison against your evaluation dataset. Measure the target metric (accuracy, hallucination rate, user satisfaction) with and without the pattern on the same test set. Track token cost and latency alongside accuracy. A pattern that improves accuracy by 3% but doubles cost may not be worth it for most queries. The evaluation should include both average-case and worst-case analysis — some patterns help the median case but do not affect the failure modes you care about most.
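A minimal version of that controlled comparison can be scripted. This harness assumes each prompt variant is a callable and that correctness is a plain string match, which is a deliberate simplification of real LLM evaluation (where you would also track token cost, latency, and use fuzzier scoring):

```python
from typing import Callable

def compare_variants(testset: list[tuple[str, str]],
                     baseline: Callable[[str], str],
                     with_pattern: Callable[[str], str]) -> dict[str, float]:
    """Run both prompt variants over the same test set and report
    accuracy for each, so the pattern's lift is measured head-to-head."""
    def accuracy(fn: Callable[[str], str]) -> float:
        correct = sum(1 for q, expected in testset if fn(q) == expected)
        return correct / len(testset)

    return {"baseline": accuracy(baseline), "with_pattern": accuracy(with_pattern)}

# Stubbed variants for illustration; real ones would wrap LLM calls
tests = [("2+2", "4"), ("3+3", "6")]
report = compare_variants(tests, lambda q: "4", lambda q: str(eval(q)))
```

Running both variants over the identical test set is the key point: comparing a new pattern against historical metrics from a different query mix tells you nothing.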
9. Prompt Patterns in Production — Cost Impact
Every advanced technique adds tokens and latency. Plan for this in your budget and architecture.
| Pattern | Extra LLM Calls | Token Multiplier | Latency Impact | Best For |
|---|---|---|---|---|
| Chain-of-Verification | 3 additional | ~2-3x output tokens | +3-5s per request | Async fact-checking pipelines |
| Meta-Prompting | 1 additional (one-time) | ~1.5x for prompt generation | One-time setup cost | Prompt development, not per-request |
| Skeleton-of-Thought | 1 additional | ~1.2x total | Reduces perceived latency | User-facing long-form generation |
| Self-Consistency | N-1 additional | Nx total cost | Nx latency (parallel: ~1x) | High-stakes reasoning, parallelizable |
| Least-to-Most | K additional (K = subproblems) | ~1.5-2x | +2-4s per subproblem | Complex sequential reasoning |
| ReAct | Variable (1-10 tool calls) | Variable | +1-3s per tool call | Agent tasks requiring external data |
| Directional Stimulus | 0 | ~1x | Negligible | Always-on, no cost penalty |
| Chain-of-Density | 2-3 additional | ~2x | +2-3s per iteration | Batch summarization pipelines |
| Rephrase and Respond | 0 (same call) | ~1.1x | Negligible | Always-on, minimal cost penalty |
| Contrastive CoT | 0 | ~1.3x (longer prompt) | Negligible | Tasks with predictable error patterns |
Cost optimization strategies:
- Route by complexity. Use Directional Stimulus and RaR (zero extra calls) for most queries. Escalate to Self-Consistency or CoVe only for queries flagged as high-risk.
- Parallelize where possible. Self-Consistency samples can run in parallel, reducing latency from Nx to ~1x while keeping the same cost.
- Cache pattern outputs. Meta-Prompting generates prompts once — cache and reuse. Chain-of-Density summaries of static documents can be precomputed.
- Use smaller models for intermediate steps. CoVe’s verification questions can be generated by GPT-4o-mini. Only the final revision needs the frontier model.
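The parallelization strategy above can be sketched with a thread pool. Here `sample_fn` stands in for one temperature-0.7 API call; this is a generic sketch, not tied to any particular SDK:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def self_consistency_parallel(sample_fn: Callable[[], str], n: int = 5) -> str:
    """Draw n samples concurrently and return the majority answer.
    Wall-clock latency approaches a single call; total token cost stays Nx."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda _: sample_fn(), range(n)))
    return Counter(answers).most_common(1)[0][0]
```

Because each sample is independent, threads (or async calls) are safe here; the only shared state is the collected answer list, which `pool.map` handles for you.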
10. Summary and Key Takeaways
The 10 prompt engineering techniques in this catalog address specific, known failure modes: hallucination (CoVe), slow generation (SoT), inconsistent reasoning (Self-Consistency), ambiguous input (RaR), and missed constraints (Directional Stimulus). Each is a named, reusable pattern — not ad hoc prompt tweaking.
Key principles:
- Start simple. Prompt fundamentals solve 70-80% of problems. Apply named patterns only when evaluation shows they help.
- Match pattern to failure mode. The decision table in Section 2 maps problems to solutions directly.
- Measure everything. Run your evaluation pipeline before and after applying a pattern. If the metric does not improve, remove the pattern.
- Budget for tokens. Every technique except Directional Stimulus and RaR adds at least one extra LLM call. Plan your cost model accordingly.
- Compose deliberately. Patterns combine (RaR + CoVe, SoT + Chain-of-Density), but each addition multiplies cost. Use the prompt testing framework to validate combinations.
Related
- Prompt Engineering Fundamentals — System prompts, few-shot, structured output, and the basics this guide builds on
- Advanced Prompting — CoT, ToT, Self-Consistency — Deeper coverage of Chain-of-Thought and Tree-of-Thought theory
- Prompt Testing Guide — How to build evaluation datasets and test prompt changes systematically
- Prompt Management — Versioning, deployment, and lifecycle management for production prompts
- LLM Evaluation Guide — RAGAS, LLM-as-judge, and A/B testing for measuring prompt quality
Frequently Asked Questions
What are prompt engineering techniques?
Prompt engineering techniques are named, reusable patterns for structuring LLM inputs to improve output quality on specific problem types. Unlike basic prompting (writing a good instruction), techniques like Chain-of-Verification, Meta-Prompting, and Skeleton-of-Thought provide repeatable templates that address known failure modes such as hallucination, slow generation, and inconsistent reasoning.
What is Chain-of-Verification (CoVe)?
Chain-of-Verification is a prompt pattern where the model generates an initial response, then produces verification questions about its own claims, answers those questions independently, and revises the original response based on any contradictions found. CoVe reduces hallucination rates by forcing the model to fact-check itself before delivering a final answer.
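The four CoVe steps can be sketched as a small pipeline. This is a minimal sketch, not a definitive implementation: `call_llm` is a hypothetical stand-in for any chat-completion client, stubbed here so the control flow is runnable.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call your model's API here.
    return f"[model response to: {prompt[:40]}...]"

def chain_of_verification(question: str) -> str:
    # 1. Draft an initial answer.
    draft = call_llm(f"Answer concisely: {question}")
    # 2. Ask the model to list verification questions about its own claims.
    checks = call_llm(f"List fact-check questions for this answer:\n{draft}")
    # 3. Answer each verification question independently, without the draft
    #    in context, so the checks are not biased toward agreeing with it.
    verified = call_llm(f"Answer each question independently:\n{checks}")
    # 4. Revise the draft against the verified facts.
    return call_llm(
        "Revise the draft so it agrees with the verified facts.\n"
        f"Draft: {draft}\nVerified: {verified}"
    )
```

Step 3 is the key design choice: answering the verification questions in a fresh context prevents the model from simply rationalizing its original claims.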
What is Meta-Prompting and when should you use it?
Meta-Prompting asks the LLM to generate or refine a prompt before executing the actual task. You describe what you need, and the model writes an optimized prompt for that purpose. Use it when you are unsure how to structure a prompt for a novel task, or when you want to systematically improve an existing prompt by having the model analyze its own instruction-following patterns.
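The two-step flow is simple to wire up. A hedged sketch, with `call_llm` as a hypothetical placeholder for your model client:

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call.
    return f"[model output for: {prompt[:30]}...]"

def meta_prompt(task_description: str) -> str:
    # Step 1: have the model write an optimized prompt for the task.
    generated_prompt = call_llm(
        "Write a precise, well-structured prompt for this task. "
        "Include a role, constraints, and an output format.\n"
        f"Task: {task_description}"
    )
    # Step 2: execute the generated prompt as the actual instruction.
    return call_llm(generated_prompt)
```

In practice you would log `generated_prompt` and promote good ones into your prompt library rather than regenerating on every request.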
How does Skeleton-of-Thought speed up LLM responses?
Skeleton-of-Thought splits generation into two phases. First, the model produces a skeleton — a list of bullet points outlining the structure of the answer. Then each skeleton point is expanded in parallel (or sequentially). This reduces perceived latency because the skeleton arrives fast and gives users an immediate sense of the answer structure while details fill in progressively.
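The skeleton-then-expand flow maps naturally onto a thread pool. A minimal sketch with both model calls stubbed (a real version would issue one API call for the skeleton and one per point for expansion):

```python
from concurrent.futures import ThreadPoolExecutor

def generate_skeleton(question: str) -> list[str]:
    # Placeholder: a real call would ask the model for 3-5 outline bullets.
    return [f"Point {i} about {question}" for i in range(1, 4)]

def expand_point(point: str) -> str:
    # Placeholder: a real call would expand one bullet into a paragraph.
    return f"{point} - expanded with supporting detail."

def skeleton_of_thought(question: str) -> str:
    skeleton = generate_skeleton(question)   # fast first call, stream to user
    with ThreadPoolExecutor() as pool:       # expand all points concurrently
        expansions = list(pool.map(expand_point, skeleton))
    return "\n".join(expansions)
```

Streaming the skeleton to the user before expansion finishes is where the perceived-latency win comes from.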
What is the difference between Self-Consistency and Chain-of-Thought?
Chain-of-Thought generates one reasoning path and one answer. Self-Consistency generates multiple independent reasoning paths (typically 3-7) at non-zero temperature, then selects the most common final answer by majority vote. Self-Consistency improves accuracy on reasoning tasks by filtering out reasoning errors that produce diverse wrong answers, while correct reasoning tends to converge on the same result.
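The majority vote at the heart of Self-Consistency is a few lines. In this sketch `sample_answer` is a hypothetical stub standing in for one sampled CoT completion (temperature ~0.7) with the final answer parsed out:

```python
from collections import Counter

def sample_answer(question: str, seed: int) -> str:
    # Placeholder: a real call would sample a fresh chain-of-thought
    # completion and extract only the final answer. Stubbed answers
    # simulate one divergent wrong path among mostly-converging paths.
    return ["42", "42", "41", "42", "40"][seed]

def self_consistency(question: str, n_samples: int = 5) -> str:
    answers = [sample_answer(question, i) for i in range(n_samples)]
    # Majority vote over the parsed final answers.
    return Counter(answers).most_common(1)[0][0]
```

Note the vote is over final answers only, not reasoning text; two paths with different wording but the same answer count as agreement.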
What is Contrastive Chain-of-Thought?
Contrastive Chain-of-Thought provides the model with both a correct reasoning example and an incorrect reasoning example for the same problem. By seeing what right and wrong look like side by side, the model learns to avoid common reasoning mistakes. This technique is effective for math, logic, and classification tasks where specific error patterns are predictable.
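The pattern is purely a prompt template: one correct and one flawed worked example for the same problem, then the new problem. A sketch (the parameter names are illustrative, not from any library):

```python
def contrastive_cot_prompt(
    example_problem: str,
    correct_reasoning: str,
    incorrect_reasoning: str,
    new_problem: str,
) -> str:
    # Pair a correct and an incorrect worked example so the model sees
    # the error pattern it should avoid, then pose the new problem.
    return (
        f"Problem: {example_problem}\n"
        f"Correct reasoning: {correct_reasoning}\n"
        f"Incorrect reasoning (avoid this): {incorrect_reasoning}\n\n"
        "Solve the next problem using correct reasoning only.\n"
        f"Problem: {new_problem}"
    )
```

The incorrect example works best when it demonstrates the specific mistake your evaluations show the model actually making, such as sign errors or unit confusion.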
How do prompt engineering techniques affect token cost?
Most advanced techniques increase token usage. Self-Consistency multiplies cost by the number of samples (3-7x). Chain-of-Verification roughly doubles output tokens. Meta-Prompting adds an extra LLM call. Skeleton-of-Thought can reduce perceived latency but uses similar total tokens. The cost increase is justified when accuracy improvements prevent downstream failures that are more expensive than the additional tokens.
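The cost arithmetic is worth making explicit. A sketch with purely illustrative per-1K-token prices (not any vendor's real rates):

```python
def call_cost(input_tokens: int, output_tokens: int,
              price_in_per_1k: float, price_out_per_1k: float) -> float:
    # Cost of a single LLM call in dollars.
    return (input_tokens * price_in_per_1k
            + output_tokens * price_out_per_1k) / 1000

# Illustrative prices and token counts only.
base = call_cost(500, 300, 0.005, 0.015)          # plain single call
sc_cost = 5 * base                                 # Self-Consistency, 5 samples
cove_cost = call_cost(500, 600, 0.005, 0.015)      # CoVe: ~2x output tokens
```

With these numbers, Self-Consistency is 5x the base cost while CoVe is well under 2x total, because only output tokens double.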
Which prompt engineering technique reduces hallucination the most?
Chain-of-Verification (CoVe) is specifically designed to reduce hallucination. By forcing the model to generate verification questions about its own claims and answer them independently, CoVe catches factual errors before they reach the user. ReAct also reduces hallucination by grounding responses in retrieved data rather than relying on the model's parametric knowledge, but ReAct requires tool access while CoVe works with any LLM.
Can you combine multiple prompt engineering techniques?
Yes. Combining techniques is common in production systems. For example, you can use Meta-Prompting to generate an optimized prompt, then apply that prompt with Self-Consistency for higher accuracy, and finally run Chain-of-Verification on the winning answer to catch hallucinations. The key constraint is cost — each additional technique adds latency and token usage, so combine only when the accuracy gain justifies the overhead.
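That Meta-Prompting → Self-Consistency → CoVe composition can be sketched as a single pipeline. All calls are stubbed via a hypothetical `call_llm`; a real version would vary temperature per sample and parse answers properly:

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call.
    return f"[response: {prompt[:30]}]"

def composed_pipeline(task: str, n_samples: int = 3) -> str:
    # 1. Meta-Prompting: have the model write the working prompt.
    prompt = call_llm(f"Write an optimized prompt for: {task}")
    # 2. Self-Consistency: sample several answers, keep the majority.
    answers = [call_llm(prompt) for _ in range(n_samples)]
    best = max(set(answers), key=answers.count)
    # 3. Chain-of-Verification on the winning answer.
    return call_llm(f"Verify each claim, then revise:\n{best}")
```

Each stage multiplies calls, so this three-pattern stack costs at least `n_samples + 2` calls per request; reserve it for tasks where a wrong answer is far more expensive than the tokens.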
Which models support advanced prompt engineering techniques best?
Frontier models (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) support all 10 techniques reliably. Mid-tier models (GPT-4o-mini, Claude 3.5 Haiku, Gemini 1.5 Flash) work well with most techniques but may struggle with Meta-Prompting and complex multi-step patterns. Smaller open-source models (Llama 3 8B, Mistral 7B) benefit most from Least-to-Most and basic Chain-of-Thought but are less reliable with Self-Consistency and CoVe.