
LangChain LCEL Tutorial — Build Chains the Modern Way (2026)

LangChain LCEL is the modern way to build chains in LangChain v0.3+. If you have used LangChain before 2024, you probably wrote LLMChain or SequentialChain code that no longer works. LCEL (LangChain Expression Language) replaced all of that with a single concept: pipe Runnables together using the | operator. This tutorial walks you through 6 runnable Python examples — from basic chains to parallel execution and fallbacks.

For the broader LangChain ecosystem (RAG, agents, tool calling), see the full LangChain tutorial. This page focuses specifically on LCEL composition patterns.


1. Why LangChain LCEL Replaced Legacy Chains


LangChain LCEL exists because the old chain API was a dead end. Here is what happened and why it matters.

The Problem with LLMChain and SequentialChain


Before LCEL, LangChain had dedicated chain classes for every pattern:

  • LLMChain — prompt + LLM in one object. Removed because it could not stream and could not easily compose with other chains.
  • SequentialChain — multiple chains in sequence. Removed because it required explicit input_keys / output_keys mapping, which was brittle.
  • TransformChain — custom transformation step. Removed because its API was separate from LLMChain's, so the two were not interchangeable.
  • RouterChain — conditional routing. Removed because it needed a complex setup for a simple if/else.

Every chain type had its own API. Streaming required special methods. Combining chains meant learning each class’s unique interface. The cognitive overhead was enormous.

LCEL unified everything under one interface: Runnable. Every component — prompts, models, parsers, retrievers, custom functions — implements the same three methods: .invoke(), .stream(), .batch(). You compose them with the | operator:

# Old way (deprecated)
chain = LLMChain(llm=llm, prompt=prompt)
# New way (LCEL)
chain = prompt | llm | parser

The | operator calls Python’s __or__ method, creating a RunnableSequence that passes output from one step as input to the next. Streaming works automatically through every step in the chain.


2. When LCEL Matters — Real-World Patterns


LCEL shines when you need to compose multiple steps. Here are the patterns you will actually use in production.

  • Simple generation — prompt | llm | parser. Chatbots, text generation, summarization.
  • RAG retrieval chain — {"context": retriever, "question": passthrough} | prompt | llm | parser. Q&A over documents and knowledge bases.
  • Multi-step reasoning — chain_1 | chain_2 | chain_3. Extract → analyze → summarize pipelines.
  • Parallel tool calls — RunnableParallel(a=chain_a, b=chain_b). Run sentiment, keyword, and summary analysis simultaneously.
  • Streaming responses — chain.stream(input). Real-time UX in web applications.
  • Fallback chains — chain.with_fallbacks([backup]). GPT-4o primary, Claude fallback on failure.

If your use case is a single LLM API call with no composition, skip LCEL and use the raw SDK directly. LCEL adds value when you chain 2+ components together.


3. How the LCEL Pipe Operator Works

The pipe operator is the core of LCEL. Understanding how data flows through a chain unlocks every pattern in this tutorial.

Every LCEL component implements the Runnable interface:

class Runnable:
    def invoke(self, input): ...    # Single input → single output
    def stream(self, input): ...    # Single input → streamed output chunks
    def batch(self, inputs): ...    # Multiple inputs → multiple outputs (concurrent)
    async def ainvoke(self, input): ...  # Async version of invoke
    async def astream(self, input): ...  # Async version of stream

When you write prompt | llm | parser, Python creates a RunnableSequence. Calling .invoke() on the sequence passes data through each component in order. Calling .stream() streams tokens through the entire pipeline.

Each Runnable declares its input and output schemas. The pipe operator connects them:

  1. PromptTemplate — Input: dict with template variables. Output: ChatPromptValue (formatted messages).
  2. ChatModel — Input: messages. Output: AIMessage with content and metadata.
  3. OutputParser — Input: AIMessage. Output: str, dict, or Pydantic object.

If the output of step N does not match what step N+1 expects, the chain raises a clear error naming the mismatched step as soon as it runs, rather than failing silently deep inside your application.

[Diagram: LCEL pipe operator data flow. An input dict such as {topic: 'RAG'} is passed to the PromptTemplate, which renders messages; the ChatModel generates a response; the OutputParser extracts the result as a string, dict, or Pydantic object ready for your application.]

4. Getting Started with LCEL in 6 Steps

Follow these 6 steps to go from zero to a working LCEL chain with streaming.

pip install langchain langchain-openai langchain-core

Pin versions in production: langchain==0.3.x and langchain-core==0.3.x. LangChain releases weekly and occasionally introduces breaking changes.

from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a senior engineer who explains concepts clearly."),
    ("user", "Explain {topic} in 3 bullet points. Be specific.")
])

The template declares {topic} as an input variable. When invoked, it produces formatted chat messages.

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", temperature=0)
chain = prompt | llm

This two-step chain renders the prompt, then sends it to GPT-4o. The output is an AIMessage object.

from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()
chain = prompt | llm | parser

Now the chain returns a plain string instead of an AIMessage. For structured output, use JsonOutputParser or PydanticOutputParser.

from langchain_core.runnables import RunnablePassthrough
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | parser
)

RunnablePassthrough() forwards the user question unchanged. The retriever fetches relevant documents in parallel. Both feed into the prompt template. See the RAG architecture guide for the full pattern.

for chunk in chain.stream({"topic": "LCEL in LangChain"}):
    print(chunk, end="", flush=True)

Every component in the chain supports streaming. The prompt renders instantly, the LLM streams tokens, and the parser yields chunks as they arrive. No extra configuration needed.


5. LCEL Architecture — Runnable Composition

Section titled “5. LCEL Architecture — Runnable Composition”

LCEL’s power comes from composing Runnables in two patterns: sequential chains and parallel branches.

[Diagram: LCEL composition patterns. A sequential chain (prompt | llm | parser) flows PromptTemplate → ChatModel → OutputParser and produces a single output. A parallel chain (RunnableParallel(a=..., b=...)) splits the input into branches, such as a summary chain and a keywords chain, and produces {summary: ..., keywords: ...}.]

RunnableParallel takes a dictionary of named chains and runs them all concurrently:

from langchain_core.runnables import RunnableParallel
parallel = RunnableParallel(
    summary=prompt_summary | llm | parser,
    keywords=prompt_keywords | llm | parser,
)
# Both chains run at the same time
results = parallel.invoke({"text": "Your input here..."})
# results = {"summary": "...", "keywords": "..."}

The total latency equals the slowest branch, not the sum of all branches. For 3 chains that each take 2 seconds, sequential execution takes 6 seconds while parallel execution takes 2 seconds.

You do not always need to import RunnableParallel explicitly. A plain dictionary in an LCEL chain creates one automatically:

# This is equivalent to RunnableParallel
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | parser
)

The dictionary {"context": ..., "question": ...} runs both branches concurrently and passes the combined result to the prompt.


6. LCEL Code Examples

Six runnable examples covering the patterns you will use most. Each example is self-contained.

The simplest possible chain — prompt, model, parser:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful coding assistant."),
    ("user", "Write a Python function that {task}")
])
chain = prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()
result = chain.invoke({"task": "reverses a linked list"})
print(result)

A retrieval-augmented generation chain that answers questions from your documents:

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(embedding_function=embeddings)
vectorstore.add_texts([
    "LCEL uses the pipe operator for composition.",
    "RunnableParallel runs chains concurrently.",
    "Every LCEL component implements invoke, stream, and batch.",
])
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based only on this context:\n{context}"),
    ("user", "{question}")
])
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)
answer = rag_chain.invoke("How does LCEL composition work?")
print(answer)

Example 3: RunnablePassthrough for Data Forwarding


RunnablePassthrough forwards input unchanged. Use it when one branch processes data while another passes it through:

from langchain_core.runnables import RunnablePassthrough
# RunnablePassthrough.assign() adds new keys alongside existing ones
chain = RunnablePassthrough.assign(
    word_count=lambda x: len(x["text"].split())
)
result = chain.invoke({"text": "LCEL makes LangChain chains composable"})
# result = {"text": "LCEL makes LangChain chains composable", "word_count": 6}

Example 4: RunnableParallel for Concurrent Execution


Run 3 analysis chains simultaneously on the same input:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel
llm = ChatOpenAI(model="gpt-4o")
parser = StrOutputParser()
summary_prompt = ChatPromptTemplate.from_template("Summarize in 1 sentence: {text}")
keywords_prompt = ChatPromptTemplate.from_template("Extract 5 keywords from: {text}")
sentiment_prompt = ChatPromptTemplate.from_template("Is this positive, negative, or neutral? {text}")
analysis = RunnableParallel(
    summary=summary_prompt | llm | parser,
    keywords=keywords_prompt | llm | parser,
    sentiment=sentiment_prompt | llm | parser,
)
results = analysis.invoke({"text": "LCEL is a game-changer for LangChain development."})
print(results["summary"])
print(results["keywords"])
print(results["sentiment"])

Stream tokens to the user in real time:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
prompt = ChatPromptTemplate.from_messages([
    ("system", "You explain technical concepts for senior engineers."),
    ("user", "Explain {topic} with a concrete code example.")
])
chain = prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()
# Synchronous streaming
for chunk in chain.stream({"topic": "LCEL Runnables"}):
    print(chunk, end="", flush=True)

# Async streaming (for FastAPI, Django, etc.)
# async for chunk in chain.astream({"topic": "LCEL Runnables"}):
#     yield chunk

Use .with_fallbacks() to add resilience when the primary model fails:

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
prompt = ChatPromptTemplate.from_template("Explain {topic} in 3 sentences.")
parser = StrOutputParser()
primary = prompt | ChatOpenAI(model="gpt-4o") | parser
fallback = prompt | ChatAnthropic(model="claude-sonnet-4-20250514") | parser
# If GPT-4o fails (rate limit, outage), automatically try Claude
resilient_chain = primary.with_fallbacks([fallback])
result = resilient_chain.invoke({"topic": "LangChain LCEL"})
print(result)

7. When Not to Use LCEL

LCEL is not always the right tool. Here are the cases where you should reach for something else.

  • Single API call — If you just need openai.chat.completions.create(), the raw SDK is simpler. LCEL adds value only when you compose 2+ steps.
  • Simple function calls — A Python function that calls an API and returns results does not need to be a Runnable. Regular Python is fine.
  • Prototyping — When you are exploring an idea, write plain Python first. Convert to LCEL only when you need streaming, batching, or composition.

LCEL handles linear pipelines where data flows in one direction. LangGraph handles everything else:

  • Linear chain (A → B → C) — LCEL: yes. LangGraph: overkill.
  • Conditional routing (if X, do A; else do B) — LCEL: possible but awkward. LangGraph: built-in.
  • Loops (retry until success) — LCEL: no. LangGraph: yes.
  • State persistence across requests — LCEL: no. LangGraph: yes.
  • Human-in-the-loop approval — LCEL: no. LangGraph: yes.

Most production agent systems use LangGraph for orchestration with LCEL chains inside individual graph nodes. See the LangChain vs LangGraph comparison for a detailed breakdown.

The pipe operator syntax is elegant for short chains but degrades quickly:

# Readable (3 pipes)
chain = prompt | llm | parser
# Still OK (with named dictionary)
chain = {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | parser
# Getting hard to follow (nested parallels)
chain = RunnableParallel(a=p1 | llm | pa, b=p2 | llm | pb) | merge_prompt | llm | parser

If you catch yourself writing deeply nested LCEL, refactor into named sub-chains:

extract_chain = prompt_extract | llm | parser_extract
analyze_chain = prompt_analyze | llm | parser_analyze
final_chain = extract_chain | analyze_chain # Much clearer

8. LCEL Interview Questions

These questions test whether you understand LCEL’s mechanics or just memorized the syntax.

Q1: “What is LCEL and why did LangChain adopt it?”


What they test: Do you understand the architectural motivation, not just the syntax?

Strong answer: “LCEL is LangChain’s declarative composition syntax built on the Runnable interface. Every component implements invoke, stream, and batch. The pipe operator connects them: prompt | llm | parser. LangChain adopted it because the legacy chain classes (LLMChain, SequentialChain) each had separate APIs, did not support streaming natively, and could not be easily composed. LCEL unified everything under one interface.”

Q2: “When would you use RunnableParallel vs a sequential chain?”


Strong answer: “Sequential for dependent steps where step N needs step N-1’s output. Parallel for independent steps that can run simultaneously — like extracting keywords, generating a summary, and scoring sentiment from the same input. Parallel reduces total latency from the sum of all steps to the duration of the slowest step.”

Q3: “Your LCEL chain is 8 pipes deep. How do you refactor?”


Strong answer: “Break it into named sub-chains of 2-3 pipes each, then compose those. If the chain has conditional logic or loops, migrate to LangGraph where each node is a short LCEL chain. Name each sub-chain descriptively so the pipeline reads like a recipe.”

Q4: “How does LCEL handle errors in a chain?”


Strong answer: “By default, an exception in any step propagates and stops the chain. You can add .with_fallbacks([backup_chain]) to try alternatives on failure, and .with_retry(stop_after_attempt=3) for transient errors like rate limits. For fine-grained error handling, wrap individual steps in try/except inside a RunnableLambda.”


9. LCEL in Production — Streaming and Tracing


Running LCEL chains in production requires observability, async execution, and cost awareness.

Enable LangSmith to trace every chain execution:

export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your-key
export LANGCHAIN_PROJECT=my-project

With tracing enabled, every .invoke() and .stream() call creates a trace span. You see the exact input and output of each step, latency per component, and token usage. Without this, debugging a 4-step chain that returns wrong answers is guesswork.

In FastAPI or Django, always use async methods to avoid blocking the event loop:

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
app = FastAPI()
@app.post("/chat")
async def chat(question: str):
    async def generate():
        async for chunk in chain.astream({"question": question}):
            yield chunk
    return StreamingResponse(generate(), media_type="text/plain")

Using chain.invoke() in an async web server blocks the entire event loop for the duration of the LLM call. A single slow request blocks all other requests. Always use chain.ainvoke() or chain.astream().

Production chains need resilience against transient failures:

# Retry on rate limits (up to 3 attempts with exponential backoff)
resilient = chain.with_retry(stop_after_attempt=3)
# Fallback to a different model on any error
from langchain_anthropic import ChatAnthropic
backup = prompt | ChatAnthropic(model="claude-sonnet-4-20250514") | parser
resilient = chain.with_fallbacks([backup])

Log token usage per request to catch cost spikes early. A RAG chain that retrieves too many chunks can cost 10-50x more than an optimized one:

from langchain_core.callbacks import BaseCallbackHandler
class CostTracker(BaseCallbackHandler):
    def on_llm_end(self, response, **kwargs):
        # llm_output can be None for some providers, so guard before .get()
        usage = (response.llm_output or {}).get("token_usage", {})
        print(f"Tokens: {usage.get('total_tokens', 0)}")

For prompt engineering best practices that reduce token costs, see the dedicated guide.


  • LCEL is the current LangChain API — if you see LLMChain or SequentialChain in a tutorial, it is outdated
  • The pipe operator composes Runnables — prompt | llm | parser creates a streamable, batchable chain
  • RunnablePassthrough forwards input unchanged — essential for RAG chains where the question must reach the prompt
  • RunnableParallel runs branches concurrently — total latency equals the slowest branch, not the sum
  • Streaming works automatically — call .stream() instead of .invoke() on any chain
  • Use .with_fallbacks() for resilience — primary model fails, backup model takes over
  • Know when to stop — if your chain exceeds 4 pipes or needs loops, switch to LangGraph

Frequently Asked Questions

What is LCEL in LangChain?

LCEL (LangChain Expression Language) is the declarative composition syntax in LangChain v0.3+. You chain components together using the pipe operator (|), connecting prompts, models, output parsers, and retrievers into a runnable pipeline. LCEL replaced the legacy LLMChain and SequentialChain APIs and handles streaming, batching, and async execution automatically.

How does the pipe operator work in LangChain LCEL?

The pipe operator (|) connects Runnable components in sequence. Each component takes the output of the previous one as input. For example, prompt | llm | parser creates a chain where a prompt template renders variables, the LLM generates a response, and the parser extracts structured output. Under the hood, the | operator calls the __or__ method on the Runnable interface.

What is a Runnable in LangChain?

A Runnable is LangChain's universal interface. Every component — prompts, models, parsers, retrievers, and custom functions — implements the Runnable protocol with three methods: invoke() for single inputs, stream() for token-by-token output, and batch() for processing multiple inputs concurrently. This uniform interface is what makes the pipe operator composition possible.

What is RunnablePassthrough in LangChain?

RunnablePassthrough passes its input through unchanged to the next step in the chain. It is most commonly used in RAG chains to forward the user question alongside retrieved context. For example, you use RunnablePassthrough() as the question key in a dictionary so the original input flows through while the retriever processes the context key separately.

What is RunnableParallel in LangChain?

RunnableParallel runs multiple chains simultaneously and collects their results into a dictionary. You pass a dictionary of named chains, and LangChain executes them all concurrently. This is useful when you need to generate a summary, extract keywords, and analyze sentiment from the same input in parallel rather than running each chain sequentially.

How do I stream responses with LCEL?

Call .stream() instead of .invoke() on any LCEL chain. Every component in the chain supports streaming natively. The prompt renders instantly, the LLM streams tokens as they are generated, and the parser yields chunks as they arrive. Use async streaming with async for chunk in chain.astream(input) in web frameworks like FastAPI.

Should I use LCEL or LangGraph?

Use LCEL for linear pipelines where data flows in one direction — prompt to model to parser. Use LangGraph when you need cycles, conditional routing, state persistence, or human-in-the-loop workflows. Most production agents use LangGraph for orchestration while using LCEL chains inside individual graph nodes for the actual LLM calls.

How does LCEL replace LLMChain?

The legacy LLMChain combined a prompt and model into a single object. LCEL replaces this with prompt | llm | parser, which is more composable and supports streaming natively. LLMChain required special methods for streaming and could not be easily extended with new components. LCEL chains are just Runnables, so any component can be added or swapped without changing the chain structure.

Can I add error handling to LCEL chains?

Yes. LCEL provides .with_fallbacks() to specify backup chains that run when the primary chain fails. You can also use .with_retry() to automatically retry on transient errors like rate limits or timeouts. For example, chain.with_fallbacks([fallback_chain]) tries the main chain first and falls back to the alternative if it raises an exception.

What are the best practices for LCEL in production?

Pin your langchain and langchain-core versions explicitly. Use async methods (ainvoke, astream) in web frameworks to avoid blocking the event loop. Enable LangSmith tracing with LANGCHAIN_TRACING_V2=true for observability. Add .with_fallbacks() for resilience and .with_retry() for transient failures. Keep chains short and composable rather than building deeply nested pipelines.

Last updated: March 2026 | LangChain v0.3+ / Python 3.10+