LangChain Tutorial: Build Your First LLM App (2026)

LangChain is the most popular framework for building LLM-powered applications in Python — and this tutorial gets you from zero to a working app in 20 minutes. You’ll build three things: a simple chain, a RAG pipeline, and a tool-calling agent. By the end, you’ll understand LangChain’s core abstractions and when to use each one.

Who this is for:

  • Junior engineers: You’re new to LLM development and want a structured path through LangChain’s components
  • Senior engineers: You want a quick refresher on LangChain’s current API (LCEL) before deciding if it fits your project

You need to build an LLM-powered feature. The raw OpenAI SDK works for simple chat, but production apps need more:

What you need, how it looks with the raw SDK, and how it looks with LangChain:

  • Combine a prompt template + LLM + output parser. Raw SDK: manual string formatting + API call + JSON parsing. LangChain: prompt | llm | parser
  • Search your docs and answer questions (RAG). Raw SDK: build the embedding pipeline, vector search, and context injection from scratch. LangChain: create_retrieval_chain() — 5 lines
  • Let the LLM call external tools. Raw SDK: parse tool_calls JSON, match function names, handle results manually. LangChain: llm.bind_tools([...]) — auto-dispatch
  • Swap between GPT-4o, Claude, and Gemini. Raw SDK: different SDKs, different response formats. LangChain: change one import, same interface
  • Stream responses token by token. Raw SDK: custom SSE handling per provider. LangChain: .stream() — unified across all models

Over 40 million monthly downloads make LangChain the most-used LLM framework. Even if you choose a different tool for production, understanding LangChain’s abstractions is essential for GenAI interviews.

The value isn’t magic — it’s standardization. LangChain gives you a common interface across LLM providers, a composable chain syntax, and pre-built components for common patterns like RAG.


Think of LangChain like UNIX pipes. In UNIX, you chain commands: cat file | grep pattern | sort. Each command takes input, transforms it, and passes output to the next.

LangChain works the same way. You chain components with the | operator:

prompt | llm | parser

Each component in the chain implements the Runnable interface — meaning it has .invoke(), .stream(), and .batch() methods. This is called LCEL (LangChain Expression Language).
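To build intuition for how the pipe operator composes components, here is a minimal pure-Python sketch. This is a mental model only, not LangChain's actual implementation (the real Runnable interface also provides .stream() and .batch(), plus async variants):

```python
# Simplified mental model of LCEL composition -- NOT LangChain's real code.
class Runnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # `a | b` returns a new Runnable that feeds a's output into b
        return Runnable(lambda value: other.invoke(self.invoke(value)))

# Toy stand-ins for a prompt template, a model, and a parser
prompt = Runnable(lambda d: f"Explain {d['topic']} simply.")
llm = Runnable(lambda text: f"(model answer to: {text})")
parser = Runnable(lambda msg: msg.strip())

chain = prompt | llm | parser
print(chain.invoke({"topic": "RAG"}))  # (model answer to: Explain RAG simply.)
```

Each stage takes the previous stage's output, exactly like a UNIX pipe.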

  1. Chat Models — Wrappers around LLM APIs (OpenAI, Anthropic, Google). They take messages in, return messages out. All implement the same interface.

  2. Prompt Templates — Reusable templates with variables. ChatPromptTemplate.from_messages([("system", "..."), ("user", "{input}")]) produces a formatted prompt.

  3. Output Parsers — Transform LLM text output into structured data. StrOutputParser gives you a plain string. JsonOutputParser gives you a dict. PydanticOutputParser gives you a validated object.

  4. Retrievers — Fetch relevant documents from a vector store, database, or API. Used in RAG pipelines to give the LLM context.

  5. Tools — Python functions the LLM can call. You define the function, LangChain generates the schema, and the LLM decides when to call it.
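The parser step (component 3) can be pictured with plain json from the standard library. This is a simplification: LangChain's real parsers also handle details like markdown code fences around the JSON and partial output during streaming.

```python
import json

# What a JSON output parser does, reduced to its essence:
# raw LLM text in, Python dict out.
raw_llm_output = '{"name": "vector database", "difficulty": "beginner"}'

def parse_json_output(text: str) -> dict:
    # LangChain's JsonOutputParser additionally strips code fences
    # and tolerates streaming partial JSON; this sketch does not.
    return json.loads(text)

parsed = parse_json_output(raw_llm_output)
print(parsed["difficulty"])  # beginner
```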


Each of the three builds introduces a progressively more powerful LangChain pattern: a simple chain, a RAG pipeline, and a tool-calling agent.

Build 1: A Simple Chain

Install LangChain and a model provider:

pip install langchain langchain-openai
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# 1. Create the model
llm = ChatOpenAI(model="gpt-4o")

# 2. Create a prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains tech concepts simply."),
    ("user", "Explain {topic} in 3 sentences for a beginner."),
])

# 3. Create an output parser
parser = StrOutputParser()

# 4. Chain them together with LCEL
chain = prompt | llm | parser

# 5. Run it
result = chain.invoke({"topic": "vector databases"})
print(result)

That’s it. The | operator connects the prompt → model → parser into a single runnable chain. Call .invoke() with your variables, get your result.

Build 2: A RAG Pipeline

RAG (Retrieval-Augmented Generation) lets the LLM answer questions about your documents:

pip install langchain-openai chromadb langchain-chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# 1. Create embeddings and vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(embedding_function=embeddings)

# 2. Add your documents
docs = [
    "Our API rate limit is 100 requests per minute per key.",
    "Enterprise plans support up to 10,000 requests per minute.",
    "Rate limit errors return HTTP 429 with a Retry-After header.",
]
vectorstore.add_texts(docs)

# 3. Create a retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# 4. Build the RAG chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based only on this context:\n{context}"),
    ("user", "{question}"),
])

def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

# 5. Ask a question
answer = rag_chain.invoke("What happens when I hit the rate limit?")
print(answer)

The retriever finds the 2 most relevant docs, the prompt injects them as context, and the LLM answers using only that context. This is the foundational pattern behind every AI-powered search and Q&A feature.
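What the retriever does under the hood can be sketched in a few lines of pure Python: score each document's embedding against the query embedding by cosine similarity and keep the top k. The 2-D vectors below are toy stand-ins for real embeddings; this is an illustration of the idea, not Chroma's implementation.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" -- real ones come from a model like text-embedding-3-small
docs = {
    "rate limit is 100 rpm": [0.9, 0.1],
    "enterprise plans: 10,000 rpm": [0.8, 0.3],
    "429 with Retry-After header": [0.7, 0.6],
}

def retrieve(query_vec, k=2):
    # Rank all docs by similarity to the query, keep the k best
    ranked = sorted(docs, key=lambda d: cosine(docs[d], query_vec), reverse=True)
    return ranked[:k]

print(retrieve([0.85, 0.2]))  # the two docs whose vectors sit closest to the query
```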

Build 3: A Tool-Calling Agent (10 minutes)


Agents let the LLM decide which tools to call based on the user’s question:

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage

# 1. Define tools
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Replace with a real API call in production
    weather_data = {"London": "12C, cloudy", "Tokyo": "22C, sunny", "NYC": "8C, rain"}
    return weather_data.get(city, f"Weather data not available for {city}")

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    return str(eval(expression))  # Use a safe evaluator in production

# 2. Bind tools to the model
llm = ChatOpenAI(model="gpt-4o")
llm_with_tools = llm.bind_tools([get_weather, calculate])

# 3. Call the model — it decides whether to use tools
response = llm_with_tools.invoke([
    HumanMessage(content="What's the weather in Tokyo?")
])

# 4. Check if it wants to call a tool
if response.tool_calls:
    for tc in response.tool_calls:
        print(f"Tool: {tc['name']}, Args: {tc['args']}")
        # Execute the tool
        tool_fn = {"get_weather": get_weather, "calculate": calculate}[tc["name"]]
        result = tool_fn.invoke(tc["args"])
        print(f"Result: {result}")

The LLM sees the tool schemas, decides get_weather is the right tool, and returns a structured tool_calls list. You execute the tools and can feed results back for a full ReAct loop.
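The calculate tool above uses eval() as a placeholder, which is unsafe on model-generated input. One way to build the "safe evaluator in production" the comment asks for is an ast-based walker; this minimal sketch permits only numbers and basic arithmetic operators and rejects everything else:

```python
import ast
import operator

# Whitelist of permitted operations -- anything else raises ValueError
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)

print(safe_eval("2 + 3 * 4"))  # 14
```

A call like safe_eval("__import__('os').system('ls')") raises ValueError instead of executing, because function calls are not in the whitelist.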


LCEL chains execute by passing data through each component in sequence, with the LCEL runtime managing streaming, batching, and error propagation automatically.

LangChain LCEL — Chain Execution Flow (data flows left to right through the pipe operator):

  1. Input: a dict with the template variables, e.g. {"topic": "RAG"}
  2. Prompt: ChatPromptTemplate renders the messages, injecting variables into the system/user slots
  3. Model: the LLM API call (ChatOpenAI, ChatAnthropic, etc.) returns an AIMessage with content
  4. Parser: output transformation. StrOutputParser → plain string, JsonOutputParser → dict, PydanticOutputParser → typed object

LangChain Architecture Layers (from your application code down to the LLM provider APIs):

  • Your Application: chains, agents, RAG pipelines
  • LCEL Runtime: pipe operator, invoke/stream/batch
  • Core Components: prompts, parsers, retrievers, tools
  • Provider Packages: langchain-openai, langchain-anthropic, etc.
  • LLM APIs: OpenAI, Anthropic, Google, Ollama

Every LangChain component implements the Runnable interface. This means any component can be swapped, composed, or parallelized without changing the rest of the chain. The LCEL runtime handles streaming, batching, and error propagation automatically.


These patterns cover the most common LangChain use cases beyond the basic tutorial builds: streaming, model switching, and parallel execution.

For real-time UX, stream tokens as the LLM generates them:

chain = prompt | llm | parser

for chunk in chain.stream({"topic": "neural networks"}):
    print(chunk, end="", flush=True)

Every component in the chain supports streaming. The prompt renders instantly, the LLM streams tokens, and the parser yields chunks as they arrive.

LangChain’s unified interface makes model switching trivial:

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
# Same chain, different models
chain_gpt = prompt | ChatOpenAI(model="gpt-4o") | parser
chain_claude = prompt | ChatAnthropic(model="claude-sonnet-4-20250514") | parser
chain_gemini = prompt | ChatGoogleGenerativeAI(model="gemini-2.0-flash") | parser
# Same invoke interface
result = chain_gpt.invoke({"topic": "transformers"})

Run multiple chains simultaneously:

from langchain_core.runnables import RunnableParallel

parallel = RunnableParallel(
    summary=prompt_summary | llm | parser,
    keywords=prompt_keywords | llm | parser,
    sentiment=prompt_sentiment | llm | parser,
)

# All three run concurrently
results = parallel.invoke({"text": "The product launch was a massive success..."})
print(results["summary"])
print(results["keywords"])
print(results["sentiment"])
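As a rough analogy for what "run concurrently" means here, the same fan-out/fan-in shape can be written with the standard library's ThreadPoolExecutor. The three lambdas are stand-ins for chains; RunnableParallel manages its own concurrency internally:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the three chains; each would normally call an LLM
branches = {
    "summary": lambda text: f"summary of {len(text)} chars",
    "keywords": lambda text: ["launch", "success"],
    "sentiment": lambda text: "positive",
}

def parallel_invoke(text):
    # Fan out: submit every branch; fan in: collect results by name
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, text) for name, fn in branches.items()}
        return {name: f.result() for name, f in futures.items()}

results = parallel_invoke("The product launch was a massive success...")
print(results["sentiment"])  # positive
```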

LangChain’s abstraction layer accelerates prototyping but introduces debugging overhead, rapid API churn, and higher complexity than raw SDK approaches for simple use cases.

LangChain vs Raw SDK vs Pydantic AI

LangChain: batteries-included framework with 700+ integrations
  • Composable chains with LCEL pipe syntax
  • Pre-built RAG, agents, and retriever components
  • Unified interface across all LLM providers
  • Steep learning curve — many abstractions to learn
  • Debugging through abstraction layers is painful
  • Rapid API changes — code breaks across versions

Raw SDK / Pydantic AI: minimal abstractions, maximum control
  • Full control — no hidden behavior
  • Easier to debug — you wrote all the code
  • Stable APIs — fewer breaking changes
  • No pre-built components — build everything yourself
  • Provider switching requires code changes (raw SDK)
  • No streaming/batching infrastructure (raw SDK)

Verdict: Use LangChain when you need pre-built components and rapid prototyping. Use raw SDKs or Pydantic AI when you need full control, debuggability, and simpler code.

Common mistakes to avoid:

  • Over-abstracting simple tasks — If you just need to call GPT-4o with a prompt, use the OpenAI SDK directly. LangChain adds value when you need composition, retrieval, or tool calling. Don’t import a framework for a single API call.
  • Ignoring token costs in RAG — Retrieving 10 chunks and stuffing them all into the prompt can cost 5-10x more than retrieving 2-3 focused chunks. Always set search_kwargs={"k": 3} and measure cost per query.
  • Not pinning versions — LangChain releases frequently and sometimes introduces breaking changes. Pin your langchain and langchain-core versions in requirements.txt.
  • Forgetting async — In web apps (FastAPI, Django), use await chain.ainvoke() instead of chain.invoke(). The sync version blocks the event loop.
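The token-cost bullet can be made concrete with back-of-envelope arithmetic. The per-token price and chunk size below are illustrative assumptions, not current provider pricing:

```python
# Assumed numbers for illustration only -- check your provider's price sheet
price_per_1k_input_tokens = 0.0025   # hypothetical $/1K input tokens
chunk_tokens = 500                   # hypothetical tokens per retrieved chunk

def prompt_cost(k_chunks, question_tokens=50):
    # Cost of one RAG prompt: retrieved chunks plus the user question
    total_tokens = k_chunks * chunk_tokens + question_tokens
    return total_tokens * price_per_1k_input_tokens / 1000

cost_k10 = prompt_cost(10)  # 10 chunks stuffed into the prompt
cost_k3 = prompt_cost(3)    # 3 focused chunks
# With these assumed numbers, k=10 costs roughly 3x k=3 per query;
# the ratio grows with chunk count and chunk size.
print(f"k=10: ${cost_k10:.5f}  k=3: ${cost_k3:.5f}")
```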

These four questions cover the LangChain concepts that consistently come up in GenAI engineering interviews, from LCEL mechanics to framework selection trade-offs.

Q1: “What is LCEL and why does LangChain use it?”


What they’re testing: Do you understand the current API, or are you stuck on the legacy chain API?

Strong answer: “LCEL is LangChain Expression Language — a declarative way to compose components using the pipe operator. Every component implements the Runnable interface with invoke, stream, and batch methods. The pipe operator connects them: prompt | llm | parser. It replaced the old LLMChain/SequentialChain API because it’s more composable and supports streaming natively.”

Weak answer: “LCEL is LangChain’s way of building chains.” (Too vague — doesn’t show understanding)

Q2: “Walk me through how you’d build a RAG pipeline with LangChain.”


What they’re testing: Can you build the most common LLM application pattern?

Strong answer: “First, I’d chunk and embed documents into a vector store like Chroma or Pinecone. Then create a retriever with vectorstore.as_retriever(). The RAG chain combines the retriever output (formatted as context) with the user question in a prompt template, passes it to the LLM, and parses the response. The key decisions are chunk size, embedding model, number of retrieved chunks, and whether to add a reranking step.”

Q3: “When would you not use LangChain?”


What they’re testing: Critical thinking — can you identify when the framework hurts more than it helps?

Strong answer: “I’d skip LangChain for simple single-model API calls — the raw SDK is cleaner. I’d also avoid it when I need fine-grained control over the request/response cycle, like custom retry logic or streaming protocols. And for teams that value type safety, Pydantic AI gives better structured outputs with less complexity.”

Q4: “What’s the difference between LangChain and LangGraph?”


Strong answer: “LangChain is for linear pipelines — data flows in one direction. LangGraph is for stateful workflows with cycles — the agent can loop back, retry, and maintain state across process restarts. Most production systems use LangChain components inside LangGraph nodes.”


At scale, LangChain production deployments follow these patterns:

Version management: Teams pin langchain==0.3.x and langchain-core==0.3.x explicitly. LangChain’s release cadence is fast — weekly updates with occasional breaking changes. Unpinned dependencies in production cause mysterious failures.

Observability: Integrate LangSmith for tracing. Set LANGCHAIN_TRACING_V2=true and every chain execution becomes a traceable span. Without this, debugging a RAG pipeline that returns wrong answers is nearly impossible.
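A typical LangSmith environment setup looks like the following. The key and project name are placeholders; the variable names follow the LangSmith docs:

```shell
# Enable LangSmith tracing for every chain execution
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="<your-langsmith-api-key>"   # placeholder
export LANGCHAIN_PROJECT="my-rag-app"                 # placeholder project name
```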

RAG optimization: Production RAG pipelines rarely use the basic retriever. Teams add reranking (Cohere Rerank, cross-encoder models), hybrid search (combining vector + keyword), and chunk strategies optimized for their content. The initial retriever is just the starting point.

Cost monitoring: Log token usage per request. A poorly configured RAG chain that retrieves too many chunks or uses a system prompt that’s too long can cost 10-50x more than an optimized one. The langchain callbacks system makes it easy to capture token counts.

LangGraph migration: Teams that start with LangChain chains often migrate agent workflows to LangGraph when they need cycles, persistence, or human-in-the-loop. The migration path is smooth because LangChain components work inside LangGraph nodes.


  • LangChain standardizes LLM development — one interface across OpenAI, Anthropic, Google, and local models
  • LCEL (pipe syntax) is the current API — ignore tutorials using LLMChain or SequentialChain
  • Three core patterns: simple chains (prompt | llm | parser), RAG pipelines (retriever + LLM), and tool-calling agents
  • Every component is a Runnable — supports .invoke(), .stream(), and .batch() uniformly
  • RAG is the killer use case — LangChain’s retriever + vector store integrations make it the fastest path to a working RAG prototype
  • Pin your versions — LangChain releases frequently; unpinned deps cause production surprises
  • Know when not to use it — for simple API calls, the raw SDK is simpler. For type-safe agents, consider Pydantic AI

Frequently Asked Questions

What is LangChain and what is it used for?

LangChain is the most popular Python framework for building LLM-powered applications. It provides composable abstractions for chains (sequential operations), RAG pipelines (retrieval-augmented generation), and tool-calling agents. Its core abstraction is LCEL (LangChain Expression Language) which lets you compose prompts, models, and output parsers using the pipe operator: prompt | llm | parser.

What is LCEL in LangChain?

LCEL (LangChain Expression Language) is LangChain's declarative syntax for composing pipelines. You chain components together using the pipe operator (|), connecting prompts, language models, output parsers, retrievers, and other components into a runnable pipeline. LCEL handles streaming, batching, and async execution automatically, making it the recommended way to build LangChain applications since version 0.3.

How do I build a RAG pipeline with LangChain?

Install LangChain with a vector store (like ChromaDB). Create embeddings from your documents using OpenAIEmbeddings, store them in the vector database, and create a retriever. Build a RAG chain that takes a user question, retrieves relevant document chunks via similarity search, formats them into a prompt with the question, sends it to the LLM, and parses the grounded response.

How does tool calling work in LangChain?

Define Python functions as tools using the @tool decorator with a docstring describing when to use the tool. Bind tools to the LLM with llm.bind_tools(tools). When invoked, the model decides whether to call a tool based on the user query. If it does, it returns a structured tool call with the function name and arguments. You execute the function and return the result for the model to incorporate into its response.

What are the five core components of LangChain?

The five core components are Chat Models (wrappers around LLM APIs like OpenAI and Anthropic), Prompt Templates (reusable templates with variables), Output Parsers (transform LLM text into structured data like strings, dicts, or Pydantic objects), Retrievers (fetch relevant documents from vector stores for RAG), and Tools (Python functions the LLM can call). Every component implements the Runnable interface with invoke, stream, and batch methods.

How do I stream responses in LangChain?

Use the .stream() method on any LCEL chain instead of .invoke(). Every component in the chain supports streaming: the prompt renders instantly, the LLM streams tokens as they are generated, and the parser yields chunks as they arrive. This enables real-time UX where users see the response being generated token by token.

Can I switch between different LLM providers in LangChain?

Yes. LangChain provides a unified interface across all LLM providers, so switching models is trivial. You change one import and model name while keeping the same chain structure. For example, you can swap ChatOpenAI for ChatAnthropic or ChatGoogleGenerativeAI and use the exact same invoke and stream interface without rewriting your pipeline.

What is RunnableParallel in LangChain?

RunnableParallel lets you run multiple LCEL chains simultaneously and collect their results. You pass a dictionary of named chains, and LangChain executes them all concurrently. This is useful when you need to generate a summary, extract keywords, and analyze sentiment from the same input in parallel rather than sequentially.

Should I use LangChain or the raw OpenAI SDK?

Use the raw OpenAI SDK for simple single-model API calls where LangChain adds unnecessary complexity. Use LangChain when you need composition (chaining multiple steps), retrieval (RAG pipelines), tool calling, or provider switching. LangChain adds value through its pre-built components and unified interface, but for a single API call, the raw SDK is simpler and more direct.

What is the difference between LangChain and LangGraph?

LangChain is for linear pipelines where data flows in one direction through the chain. LangGraph is for stateful workflows with cycles, where the agent can loop back, retry, and maintain state across process restarts. Most production systems use LangChain components inside LangGraph nodes when they need agent orchestration with persistence and conditional routing.

Last updated: February 2026 | LangChain v0.3+ / Python 3.10+