LangChain Tutorial: Build Your First LLM App (2026)

LangChain is the most popular framework for building LLM-powered applications in Python — and this tutorial gets you from zero to a working app in 20 minutes. You’ll build three things: a simple chain, a RAG pipeline, and a tool-calling agent. By the end, you’ll understand LangChain’s core abstractions and when to use each one.

Who this is for:

  • Junior engineers: You’re new to LLM development and want a structured path through LangChain’s components
  • Senior engineers: You want a quick refresher on LangChain’s current API (LCEL) before deciding if it fits your project

You need to build an LLM-powered feature. The raw OpenAI SDK works for simple chat, but production apps need more:

What you need, how it looks with the raw SDK, and how it looks with LangChain:

  • Combine a prompt template + LLM + output parser. Raw SDK: manual string formatting + API call + JSON parsing. LangChain: prompt | llm | parser
  • Search your docs and answer questions (RAG). Raw SDK: build the embedding pipeline, vector search, and context injection from scratch. LangChain: create_retrieval_chain() — 5 lines
  • Let the LLM call external tools. Raw SDK: parse tool_calls JSON, match function names, handle results manually. LangChain: llm.bind_tools([...]) — auto-dispatch
  • Swap between GPT-4o, Claude, and Gemini. Raw SDK: different SDKs, different response formats. LangChain: change one import, same interface
  • Stream responses token by token. Raw SDK: custom SSE handling per provider. LangChain: .stream() — unified across all models

Over 40 million monthly downloads make LangChain the most-used LLM framework. Even if you choose a different tool for production, understanding LangChain’s abstractions is essential for GenAI interviews.

The value isn’t magic — it’s standardization. LangChain gives you a common interface across LLM providers, a composable chain syntax, and pre-built components for common patterns like RAG.


Think of LangChain like UNIX pipes. In UNIX, you chain commands: cat file | grep pattern | sort. Each command takes input, transforms it, and passes output to the next.

LangChain works the same way. You chain components with the | operator:

prompt | llm | parser

Each component in the chain implements the Runnable interface — meaning it has .invoke(), .stream(), and .batch() methods. This is called LCEL (LangChain Expression Language).
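To build intuition for how the pipe operator composes components, here is a minimal pure-Python sketch. This is a mental model only, not LangChain's actual implementation (the real Runnable interface also provides .stream() and .batch(), plus async variants):

```python
# Simplified mental model of LCEL composition -- NOT LangChain's real code.
class Runnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # `a | b` returns a new Runnable that feeds a's output into b
        return Runnable(lambda value: other.invoke(self.invoke(value)))

# Toy stand-ins for a prompt template, a model, and a parser
prompt = Runnable(lambda d: f"Explain {d['topic']} simply.")
llm = Runnable(lambda text: f"(model answer to: {text})")
parser = Runnable(lambda msg: msg.strip())

chain = prompt | llm | parser
print(chain.invoke({"topic": "RAG"}))  # (model answer to: Explain RAG simply.)
```

Each stage takes the previous stage's output, exactly like a UNIX pipe.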

  1. Chat Models — Wrappers around LLM APIs (OpenAI, Anthropic, Google). They take messages in, return messages out. All implement the same interface.

  2. Prompt Templates — Reusable templates with variables. ChatPromptTemplate.from_messages([("system", "..."), ("user", "{input}")]) produces a formatted prompt.

  3. Output Parsers — Transform LLM text output into structured data. StrOutputParser gives you a plain string. JsonOutputParser gives you a dict. PydanticOutputParser gives you a validated object.

  4. Retrievers — Fetch relevant documents from a vector store, database, or API. Used in RAG pipelines to give the LLM context.

  5. Tools — Python functions the LLM can call. You define the function, LangChain generates the schema, and the LLM decides when to call it.
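The parser step (component 3) can be pictured with plain json from the standard library. This is a simplification: LangChain's real parsers also handle details like markdown code fences around the JSON and partial output during streaming.

```python
import json

# What a JSON output parser does, reduced to its essence:
# raw LLM text in, Python dict out.
raw_llm_output = '{"name": "vector database", "difficulty": "beginner"}'

def parse_json_output(text: str) -> dict:
    # LangChain's JsonOutputParser additionally strips code fences
    # and tolerates streaming partial JSON; this sketch does not.
    return json.loads(text)

parsed = parse_json_output(raw_llm_output)
print(parsed["difficulty"])  # beginner
```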


Each of the three builds introduces a progressively more powerful LangChain pattern: a simple chain, a RAG pipeline, and a tool-calling agent.

Build 1: A Simple Chain

Install LangChain and a model provider:

pip install langchain langchain-openai
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# 1. Create the model
llm = ChatOpenAI(model="gpt-4o")

# 2. Create a prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains tech concepts simply."),
    ("user", "Explain {topic} in 3 sentences for a beginner."),
])

# 3. Create an output parser
parser = StrOutputParser()

# 4. Chain them together with LCEL
chain = prompt | llm | parser

# 5. Run it
result = chain.invoke({"topic": "vector databases"})
print(result)

That’s it. The | operator connects the prompt → model → parser into a single runnable chain. Call .invoke() with your variables, get your result.

Build 2: A RAG Pipeline

RAG (Retrieval-Augmented Generation) lets the LLM answer questions about your documents:

pip install langchain-openai chromadb langchain-chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# 1. Create embeddings and vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(embedding_function=embeddings)

# 2. Add your documents
docs = [
    "Our API rate limit is 100 requests per minute per key.",
    "Enterprise plans support up to 10,000 requests per minute.",
    "Rate limit errors return HTTP 429 with a Retry-After header.",
]
vectorstore.add_texts(docs)

# 3. Create a retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# 4. Build the RAG chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based only on this context:\n{context}"),
    ("user", "{question}"),
])

def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

# 5. Ask a question
answer = rag_chain.invoke("What happens when I hit the rate limit?")
print(answer)

The retriever finds the 2 most relevant docs, the prompt injects them as context, and the LLM answers using only that context. This is the foundational pattern behind every AI-powered search and Q&A feature.
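What the retriever does under the hood can be sketched in a few lines of pure Python: score each document's embedding against the query embedding by cosine similarity and keep the top k. The 2-D vectors below are toy stand-ins for real embeddings; this is an illustration of the idea, not Chroma's implementation.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" -- real ones come from a model like text-embedding-3-small
docs = {
    "rate limit is 100 rpm": [0.9, 0.1],
    "enterprise plans: 10,000 rpm": [0.8, 0.3],
    "429 with Retry-After header": [0.7, 0.6],
}

def retrieve(query_vec, k=2):
    # Rank all docs by similarity to the query, keep the k best
    ranked = sorted(docs, key=lambda d: cosine(docs[d], query_vec), reverse=True)
    return ranked[:k]

print(retrieve([0.85, 0.2]))  # the two docs whose vectors sit closest to the query
```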

Build 3: A Tool-Calling Agent (10 minutes)


Agents let the LLM decide which tools to call based on the user’s question:

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage

# 1. Define tools
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Replace with a real API call in production
    weather_data = {"London": "12C, cloudy", "Tokyo": "22C, sunny", "NYC": "8C, rain"}
    return weather_data.get(city, f"Weather data not available for {city}")

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    return str(eval(expression))  # Use a safe evaluator in production

# 2. Bind tools to the model
llm = ChatOpenAI(model="gpt-4o")
llm_with_tools = llm.bind_tools([get_weather, calculate])

# 3. Call the model — it decides whether to use tools
response = llm_with_tools.invoke([
    HumanMessage(content="What's the weather in Tokyo?")
])

# 4. Check if it wants to call a tool
if response.tool_calls:
    for tc in response.tool_calls:
        print(f"Tool: {tc['name']}, Args: {tc['args']}")
        # Execute the tool
        tool_fn = {"get_weather": get_weather, "calculate": calculate}[tc["name"]]
        result = tool_fn.invoke(tc["args"])
        print(f"Result: {result}")

The LLM sees the tool schemas, decides get_weather is the right tool, and returns a structured tool_calls list. You execute the tools and can feed results back for a full ReAct loop.
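The calculate tool above uses eval() as a placeholder, which is unsafe on model-generated input. One way to build the "safe evaluator in production" the comment asks for is an ast-based walker; this minimal sketch permits only numbers and basic arithmetic operators and rejects everything else:

```python
import ast
import operator

# Whitelist of permitted operations -- anything else raises ValueError
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)

print(safe_eval("2 + 3 * 4"))  # 14
```

A call like safe_eval("__import__('os').system('ls')") raises ValueError instead of executing, because function calls are not in the whitelist.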


LCEL chains execute by passing data through each component in sequence, with the LCEL runtime managing streaming, batching, and error propagation automatically.

LangChain LCEL — Chain Execution Flow (data flows left to right through the pipe operator):

  1. Input: a dict with the template variables, e.g. {"topic": "RAG"}
  2. Prompt: ChatPromptTemplate renders the messages, injecting variables into the system/user slots
  3. Model: the LLM API call (ChatOpenAI, ChatAnthropic, etc.) returns an AIMessage with content
  4. Parser: output transformation. StrOutputParser → plain string, JsonOutputParser → dict, PydanticOutputParser → typed object

LangChain Architecture Layers (from your application code down to the LLM provider APIs):

  • Your Application: chains, agents, RAG pipelines
  • LCEL Runtime: pipe operator, invoke/stream/batch
  • Core Components: prompts, parsers, retrievers, tools
  • Provider Packages: langchain-openai, langchain-anthropic, etc.
  • LLM APIs: OpenAI, Anthropic, Google, Ollama

Every LangChain component implements the Runnable interface. This means any component can be swapped, composed, or parallelized without changing the rest of the chain. The LCEL runtime handles streaming, batching, and error propagation automatically.


These patterns cover the most common LangChain use cases beyond the basic tutorial builds: streaming, model switching, and parallel execution.

For real-time UX, stream tokens as the LLM generates them:

chain = prompt | llm | parser

for chunk in chain.stream({"topic": "neural networks"}):
    print(chunk, end="", flush=True)

Every component in the chain supports streaming. The prompt renders instantly, the LLM streams tokens, and the parser yields chunks as they arrive.

LangChain’s unified interface makes model switching trivial:

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
# Same chain, different models
chain_gpt = prompt | ChatOpenAI(model="gpt-4o") | parser
chain_claude = prompt | ChatAnthropic(model="claude-sonnet-4-20250514") | parser
chain_gemini = prompt | ChatGoogleGenerativeAI(model="gemini-2.0-flash") | parser
# Same invoke interface
result = chain_gpt.invoke({"topic": "transformers"})

Run multiple chains simultaneously:

from langchain_core.runnables import RunnableParallel

parallel = RunnableParallel(
    summary=prompt_summary | llm | parser,
    keywords=prompt_keywords | llm | parser,
    sentiment=prompt_sentiment | llm | parser,
)

# All three run concurrently
results = parallel.invoke({"text": "The product launch was a massive success..."})
print(results["summary"])
print(results["keywords"])
print(results["sentiment"])
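As a rough analogy for what "run concurrently" means here, the same fan-out/fan-in shape can be written with the standard library's ThreadPoolExecutor. The three lambdas are stand-ins for chains; RunnableParallel manages its own concurrency internally:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the three chains; each would normally call an LLM
branches = {
    "summary": lambda text: f"summary of {len(text)} chars",
    "keywords": lambda text: ["launch", "success"],
    "sentiment": lambda text: "positive",
}

def parallel_invoke(text):
    # Fan out: submit every branch; fan in: collect results by name
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, text) for name, fn in branches.items()}
        return {name: f.result() for name, f in futures.items()}

results = parallel_invoke("The product launch was a massive success...")
print(results["sentiment"])  # positive
```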

LangChain’s abstraction layer accelerates prototyping but introduces debugging overhead, rapid API churn, and higher complexity than raw SDK approaches for simple use cases.

LangChain vs Raw SDK vs Pydantic AI

LangChain: batteries-included framework with 700+ integrations
  • Composable chains with LCEL pipe syntax
  • Pre-built RAG, agents, and retriever components
  • Unified interface across all LLM providers
  • Steep learning curve — many abstractions to learn
  • Debugging through abstraction layers is painful
  • Rapid API changes — code breaks across versions

Raw SDK / Pydantic AI: minimal abstractions, maximum control
  • Full control — no hidden behavior
  • Easier to debug — you wrote all the code
  • Stable APIs — fewer breaking changes
  • No pre-built components — build everything yourself
  • Provider switching requires code changes (raw SDK)
  • No streaming/batching infrastructure (raw SDK)

Verdict: Use LangChain when you need pre-built components and rapid prototyping. Use raw SDKs or Pydantic AI when you need full control, debuggability, and simpler code.

Common mistakes to avoid:

  • Over-abstracting simple tasks — If you just need to call GPT-4o with a prompt, use the OpenAI SDK directly. LangChain adds value when you need composition, retrieval, or tool calling. Don’t import a framework for a single API call.
  • Ignoring token costs in RAG — Retrieving 10 chunks and stuffing them all into the prompt can cost 5-10x more than retrieving 2-3 focused chunks. Always set search_kwargs={"k": 3} and measure cost per query.
  • Not pinning versions — LangChain releases frequently and sometimes introduces breaking changes. Pin your langchain and langchain-core versions in requirements.txt.
  • Forgetting async — In web apps (FastAPI, Django), use await chain.ainvoke() instead of chain.invoke(). The sync version blocks the event loop.
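The token-cost bullet can be made concrete with back-of-envelope arithmetic. The per-token price and chunk size below are illustrative assumptions, not current provider pricing:

```python
# Assumed numbers for illustration only -- check your provider's price sheet
price_per_1k_input_tokens = 0.0025   # hypothetical $/1K input tokens
chunk_tokens = 500                   # hypothetical tokens per retrieved chunk

def prompt_cost(k_chunks, question_tokens=50):
    # Cost of one RAG prompt: retrieved chunks plus the user question
    total_tokens = k_chunks * chunk_tokens + question_tokens
    return total_tokens * price_per_1k_input_tokens / 1000

cost_k10 = prompt_cost(10)  # 10 chunks stuffed into the prompt
cost_k3 = prompt_cost(3)    # 3 focused chunks
# With these assumed numbers, k=10 costs roughly 3x k=3 per query;
# the ratio grows with chunk count and chunk size.
print(f"k=10: ${cost_k10:.5f}  k=3: ${cost_k3:.5f}")
```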

These four questions cover the LangChain concepts that consistently come up in GenAI engineering interviews, from LCEL mechanics to framework selection trade-offs.

Q1: “What is LCEL and why does LangChain use it?”


What they’re testing: Do you understand the current API, or are you stuck on the legacy chain API?

Strong answer: “LCEL is LangChain Expression Language — a declarative way to compose components using the pipe operator. Every component implements the Runnable interface with invoke, stream, and batch methods. The pipe operator connects them: prompt | llm | parser. It replaced the old LLMChain/SequentialChain API because it’s more composable and supports streaming natively.”

Weak answer: “LCEL is LangChain’s way of building chains.” (Too vague — doesn’t show understanding)

Q2: “Walk me through how you’d build a RAG pipeline with LangChain.”


What they’re testing: Can you build the most common LLM application pattern?

Strong answer: “First, I’d chunk and embed documents into a vector store like Chroma or Pinecone. Then create a retriever with vectorstore.as_retriever(). The RAG chain combines the retriever output (formatted as context) with the user question in a prompt template, passes it to the LLM, and parses the response. The key decisions are chunk size, embedding model, number of retrieved chunks, and whether to add a reranking step.”

Q3: “When would you not use LangChain?”


What they’re testing: Critical thinking — can you identify when the framework hurts more than it helps?

Strong answer: “I’d skip LangChain for simple single-model API calls — the raw SDK is cleaner. I’d also avoid it when I need fine-grained control over the request/response cycle, like custom retry logic or streaming protocols. And for teams that value type safety, Pydantic AI gives better structured outputs with less complexity.”

Q4: “What’s the difference between LangChain and LangGraph?”


Strong answer: “LangChain is for linear pipelines — data flows in one direction. LangGraph is for stateful workflows with cycles — the agent can loop back, retry, and maintain state across process restarts. Most production systems use LangChain components inside LangGraph nodes.”


At scale, LangChain production deployments follow these patterns:

Version management: Teams pin langchain==0.3.x and langchain-core==0.3.x explicitly. LangChain’s release cadence is fast — weekly updates with occasional breaking changes. Unpinned dependencies in production cause mysterious failures.

Observability: Integrate LangSmith for tracing. Set LANGCHAIN_TRACING_V2=true and every chain execution becomes a traceable span. Without this, debugging a RAG pipeline that returns wrong answers is nearly impossible.
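A typical LangSmith environment setup looks like the following. The key and project name are placeholders; the variable names follow the LangSmith docs:

```shell
# Enable LangSmith tracing for every chain execution
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="<your-langsmith-api-key>"   # placeholder
export LANGCHAIN_PROJECT="my-rag-app"                 # placeholder project name
```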

RAG optimization: Production RAG pipelines rarely use the basic retriever. Teams add reranking (Cohere Rerank, cross-encoder models), hybrid search (combining vector + keyword), and chunk strategies optimized for their content. The initial retriever is just the starting point.

Cost monitoring: Log token usage per request. A poorly configured RAG chain that retrieves too many chunks or uses a system prompt that’s too long can cost 10-50x more than an optimized one. The langchain callbacks system makes it easy to capture token counts.

LangGraph migration: Teams that start with LangChain chains often migrate agent workflows to LangGraph when they need cycles, persistence, or human-in-the-loop. The migration path is smooth because LangChain components work inside LangGraph nodes.


  • LangChain standardizes LLM development — one interface across OpenAI, Anthropic, Google, and local models
  • LCEL (pipe syntax) is the current API — ignore tutorials using LLMChain or SequentialChain
  • Three core patterns: simple chains (prompt | llm | parser), RAG pipelines (retriever + LLM), and tool-calling agents
  • Every component is a Runnable — supports .invoke(), .stream(), and .batch() uniformly
  • RAG is the killer use case — LangChain’s retriever + vector store integrations make it the fastest path to a working RAG prototype
  • Pin your versions — LangChain releases frequently; unpinned deps cause production surprises
  • Know when not to use it — for simple API calls, the raw SDK is simpler. For type-safe agents, consider Pydantic AI

Frequently Asked Questions

What is LangChain and what is it used for?

LangChain is the most popular Python framework for building LLM-powered applications. It provides composable abstractions for chains (sequential operations), RAG pipelines (retrieval-augmented generation), and tool-calling agents. Its core abstraction is LCEL (LangChain Expression Language) which lets you compose prompts, models, and output parsers using the pipe operator: prompt | llm | parser.

What is LCEL in LangChain?

LCEL (LangChain Expression Language) is LangChain's declarative syntax for composing pipelines. You chain components together using the pipe operator (|), connecting prompts, language models, output parsers, retrievers, and other components into a runnable pipeline. LCEL handles streaming, batching, and async execution automatically, making it the recommended way to build LangChain applications since version 0.3.

How do I build a RAG pipeline with LangChain?

Install LangChain with a vector store (like ChromaDB). Create embeddings from your documents using OpenAIEmbeddings, store them in the vector database, and create a retriever. Build a RAG chain that takes a user question, retrieves relevant document chunks via similarity search, formats them into a prompt with the question, sends it to the LLM, and parses the grounded response.

How does tool calling work in LangChain?

Define Python functions as tools using the @tool decorator with a docstring describing when to use the tool. Bind tools to the LLM with llm.bind_tools(tools). When invoked, the model decides whether to call a tool based on the user query. If it does, it returns a structured tool call with the function name and arguments. You execute the function and return the result for the model to incorporate into its response.

What are the five core components of LangChain?

The five core components are Chat Models (wrappers around LLM APIs like OpenAI and Anthropic), Prompt Templates (reusable templates with variables), Output Parsers (transform LLM text into structured data like strings, dicts, or Pydantic objects), Retrievers (fetch relevant documents from vector stores for RAG), and Tools (Python functions the LLM can call). Every component implements the Runnable interface with invoke, stream, and batch methods.

How do I stream responses in LangChain?

Use the .stream() method on any LCEL chain instead of .invoke(). Every component in the chain supports streaming: the prompt renders instantly, the LLM streams tokens as they are generated, and the parser yields chunks as they arrive. This enables real-time UX where users see the response being generated token by token.

Can I switch between different LLM providers in LangChain?

Yes. LangChain provides a unified interface across all LLM providers, so switching models is trivial. You change one import and model name while keeping the same chain structure. For example, you can swap ChatOpenAI for ChatAnthropic or ChatGoogleGenerativeAI and use the exact same invoke and stream interface without rewriting your pipeline.

What is RunnableParallel in LangChain?

RunnableParallel lets you run multiple LCEL chains simultaneously and collect their results. You pass a dictionary of named chains, and LangChain executes them all concurrently. This is useful when you need to generate a summary, extract keywords, and analyze sentiment from the same input in parallel rather than sequentially.

Should I use LangChain or the raw OpenAI SDK?

Use the raw OpenAI SDK for simple single-model API calls where LangChain adds unnecessary complexity. Use LangChain when you need composition (chaining multiple steps), retrieval (RAG pipelines), tool calling, or provider switching. LangChain adds value through its pre-built components and unified interface, but for a single API call, the raw SDK is simpler and more direct.

What is the difference between LangChain and LangGraph?

LangChain is for linear pipelines where data flows in one direction through the chain. LangGraph is for stateful workflows with cycles, where the agent can loop back, retry, and maintain state across process restarts. Most production systems use LangChain components inside LangGraph nodes when they need agent orchestration with persistence and conditional routing.

Last updated: February 2026 | LangChain v0.3+ / Python 3.10+