LangChain Tutorial: Build Your First LLM App (2026)
LangChain is the most popular framework for building LLM-powered applications in Python — and this tutorial gets you from zero to a working app in 20 minutes. You’ll build three things: a simple chain, a RAG pipeline, and a tool-calling agent. By the end, you’ll understand LangChain’s core abstractions and when to use each one.
Who this is for:
- Junior engineers: You’re new to LLM development and want a structured path through LangChain’s components
- Senior engineers: You want a quick refresher on LangChain’s current API (LCEL) before deciding if it fits your project
Real-World Problem Context
You need to build an LLM-powered feature. The raw OpenAI SDK works for simple chat, but production apps need more:
| What You Need | Raw SDK | With LangChain |
|---|---|---|
| Combine a prompt template + LLM + output parser | Manual string formatting + API call + JSON parsing | `prompt \| llm \| parser` — one line |
| Search your docs and answer questions (RAG) | Build embedding pipeline, vector search, prompt injection from scratch | `create_retrieval_chain()` — 5 lines |
| Let the LLM call external tools | Parse `tool_calls` JSON, match function names, handle results manually | `llm.bind_tools([...])` — auto-dispatch |
| Swap between GPT-4o, Claude, and Gemini | Different SDKs, different response formats | Change one import, same interface |
| Stream responses token by token | Custom SSE handling per provider | `.stream()` — unified across all models |
Over 40 million monthly downloads make LangChain the most-used LLM framework. Even if you choose a different tool for production, understanding LangChain’s abstractions is essential for GenAI interviews.
The value isn’t magic — it’s standardization. LangChain gives you a common interface across LLM providers, a composable chain syntax, and pre-built components for common patterns like RAG.
LangChain Tutorial: Core Concepts
Think of LangChain like UNIX pipes. In UNIX, you chain commands: `cat file | grep pattern | sort`. Each command takes input, transforms it, and passes output to the next.
LangChain works the same way. You chain components with the | operator:
```python
prompt | llm | parser
```

Each component in the chain implements the Runnable interface — meaning it has `.invoke()`, `.stream()`, and `.batch()` methods. This composition syntax is called LCEL (LangChain Expression Language).
The Five Core Components
1. Chat Models — Wrappers around LLM APIs (OpenAI, Anthropic, Google). They take messages in, return messages out. All implement the same interface.
2. Prompt Templates — Reusable templates with variables. `ChatPromptTemplate.from_messages([("system", "..."), ("user", "{input}")])` produces a formatted prompt.
3. Output Parsers — Transform LLM text output into structured data. `StrOutputParser` gives you a plain string. `JsonOutputParser` gives you a dict. `PydanticOutputParser` gives you a validated object.
4. Retrievers — Fetch relevant documents from a vector store, database, or API. Used in RAG pipelines to give the LLM context.
5. Tools — Python functions the LLM can call. You define the function, LangChain generates the schema, and the LLM decides when to call it.
Step-by-Step: Three Builds
Each of the three builds introduces a progressively more powerful LangChain pattern: a simple chain, a RAG pipeline, and a tool-calling agent.
Build 1: A Simple Chain (5 minutes)
Install LangChain and a model provider:

```bash
pip install langchain langchain-openai
```

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# 1. Create the model
llm = ChatOpenAI(model="gpt-4o")

# 2. Create a prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains tech concepts simply."),
    ("user", "Explain {topic} in 3 sentences for a beginner."),
])

# 3. Create an output parser
parser = StrOutputParser()

# 4. Chain them together with LCEL
chain = prompt | llm | parser

# 5. Run it
result = chain.invoke({"topic": "vector databases"})
print(result)
```

That's it. The `|` operator connects the prompt → model → parser into a single runnable chain. Call `.invoke()` with your variables, get your result.
Build 2: A RAG Pipeline (10 minutes)
RAG (Retrieval-Augmented Generation) lets the LLM answer questions about your documents:

```bash
pip install langchain-openai chromadb langchain-chroma
```

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# 1. Create embeddings and vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(embedding_function=embeddings)

# 2. Add your documents
docs = [
    "Our API rate limit is 100 requests per minute per key.",
    "Enterprise plans support up to 10,000 requests per minute.",
    "Rate limit errors return HTTP 429 with a Retry-After header.",
]
vectorstore.add_texts(docs)

# 3. Create a retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# 4. Build the RAG chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based only on this context:\n{context}"),
    ("user", "{question}"),
])

def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

# 5. Ask a question
answer = rag_chain.invoke("What happens when I hit the rate limit?")
print(answer)
```

The retriever finds the 2 most relevant docs, the prompt injects them as context, and the LLM answers using only that context. This is the foundational pattern behind every AI-powered search and Q&A feature.
Build 3: A Tool-Calling Agent (10 minutes)
Agents let the LLM decide which tools to call based on the user's question:

```python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage

# 1. Define tools
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Replace with a real API call in production
    weather_data = {"London": "12C, cloudy", "Tokyo": "22C, sunny", "NYC": "8C, rain"}
    return weather_data.get(city, f"Weather data not available for {city}")

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    return str(eval(expression))  # Use a safe evaluator in production

# 2. Bind tools to the model
llm = ChatOpenAI(model="gpt-4o")
llm_with_tools = llm.bind_tools([get_weather, calculate])

# 3. Call the model — it decides whether to use tools
response = llm_with_tools.invoke([
    HumanMessage(content="What's the weather in Tokyo?")
])

# 4. Check if it wants to call a tool
if response.tool_calls:
    for tc in response.tool_calls:
        print(f"Tool: {tc['name']}, Args: {tc['args']}")
        # Execute the tool
        tool_fn = {"get_weather": get_weather, "calculate": calculate}[tc["name"]]
        result = tool_fn.invoke(tc["args"])
        print(f"Result: {result}")
```

The LLM sees the tool schemas, decides `get_weather` is the right tool, and returns a structured `tool_calls` list. You execute the tools and can feed results back for a full ReAct loop.
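The `calculate` tool above uses `eval()` only as a placeholder. One way to make it safe — a minimal plain-Python sketch supporting just `+ - * /`, extend as needed — is to walk the expression's AST and allow only arithmetic nodes:

```python
import ast
import operator

# Allowed arithmetic operations; everything else is rejected
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a pure arithmetic expression; raise ValueError otherwise."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_eval("2 + 3 * 4"))  # 14
```

Function calls, attribute access, and names never match an allowed node type, so `__import__('os')`-style payloads raise `ValueError` instead of executing.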
LangChain Architecture and Design
LCEL chains execute by passing data through each component in sequence, with the LCEL runtime managing streaming, batching, and error propagation automatically.
📊 How LCEL Chains Execute
[Diagram: LangChain LCEL chain execution flow — data flows left to right through the pipe operator]
📊 The LangChain Component Stack
[Diagram: LangChain architecture layers — from your application code to the LLM provider APIs]
Every LangChain component implements the Runnable interface. This means any component can be swapped, composed, or parallelized without changing the rest of the chain. The LCEL runtime handles streaming, batching, and error propagation automatically.
LangChain Tutorial Code Examples
These patterns cover the most common LangChain use cases beyond the basic tutorial builds: streaming, model switching, and parallel execution.
Streaming Responses
For real-time UX, stream tokens as the LLM generates them:

```python
chain = prompt | llm | parser

for chunk in chain.stream({"topic": "neural networks"}):
    print(chunk, end="", flush=True)
```

Every component in the chain supports streaming. The prompt renders instantly, the LLM streams tokens, and the parser yields chunks as they arrive.
Switching Models
LangChain's unified interface makes model switching trivial:

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI

# Same chain, different models
chain_gpt = prompt | ChatOpenAI(model="gpt-4o") | parser
chain_claude = prompt | ChatAnthropic(model="claude-sonnet-4-20250514") | parser
chain_gemini = prompt | ChatGoogleGenerativeAI(model="gemini-2.0-flash") | parser

# Same invoke interface
result = chain_gpt.invoke({"topic": "transformers"})
```

Parallel Execution with RunnableParallel
Run multiple chains simultaneously:

```python
from langchain_core.runnables import RunnableParallel

parallel = RunnableParallel(
    summary=prompt_summary | llm | parser,
    keywords=prompt_keywords | llm | parser,
    sentiment=prompt_sentiment | llm | parser,
)

# All three run concurrently
results = parallel.invoke({"text": "The product launch was a massive success..."})
print(results["summary"])
print(results["keywords"])
print(results["sentiment"])
```

Trade-offs and Pitfalls
LangChain's abstraction layer accelerates prototyping but introduces debugging overhead, rapid API churn, and higher complexity than raw SDK approaches for simple use cases.
LangChain vs Raw SDK vs Pydantic AI
Section titled “LangChain vs Raw SDK vs Pydantic AI”📊 Visual Explanation
LangChain strengths:
- Composable chains with LCEL pipe syntax
- Pre-built RAG, agents, and retriever components
- Unified interface across all LLM providers

LangChain weaknesses:
- Steep learning curve — many abstractions to learn
- Debugging through abstraction layers is painful
- Rapid API changes — code breaks across versions

Raw SDK strengths:
- Full control — no hidden behavior
- Easier to debug — you wrote all the code
- Stable APIs — fewer breaking changes

Raw SDK weaknesses:
- No pre-built components — build everything yourself
- Provider switching requires code changes
- No streaming/batching infrastructure
Where Engineers Get Burned
Common mistakes to avoid:
- Over-abstracting simple tasks — If you just need to call GPT-4o with a prompt, use the OpenAI SDK directly. LangChain adds value when you need composition, retrieval, or tool calling. Don’t import a framework for a single API call.
- Ignoring token costs in RAG — Retrieving 10 chunks and stuffing them all into the prompt can cost 5-10x more than retrieving 2-3 focused chunks. Always set `search_kwargs={"k": 3}` and measure cost per query.
- Not pinning versions — LangChain releases frequently and sometimes introduces breaking changes. Pin your `langchain` and `langchain-core` versions in `requirements.txt`.
- Forgetting async — In web apps (FastAPI, Django), use `await chain.ainvoke()` instead of `chain.invoke()`. The sync version blocks the event loop.
Interview Questions
These four questions cover the LangChain concepts that consistently come up in GenAI engineering interviews, from LCEL mechanics to framework selection trade-offs.
Q1: “What is LCEL and why does LangChain use it?”
What they’re testing: Do you understand the current API, or are you stuck on the legacy chain API?
Strong answer: “LCEL is LangChain Expression Language — a declarative way to compose components using the pipe operator. Every component implements the Runnable interface with invoke, stream, and batch methods. The pipe operator connects them: prompt | llm | parser. It replaced the old LLMChain/SequentialChain API because it’s more composable and supports streaming natively.”
Weak answer: “LCEL is LangChain’s way of building chains.” (Too vague — doesn’t show understanding)
Q2: “Walk me through how you’d build a RAG pipeline with LangChain.”
What they’re testing: Can you build the most common LLM application pattern?
Strong answer: “First, I’d chunk and embed documents into a vector store like Chroma or Pinecone. Then create a retriever with vectorstore.as_retriever(). The RAG chain combines the retriever output (formatted as context) with the user question in a prompt template, passes it to the LLM, and parses the response. The key decisions are chunk size, embedding model, number of retrieved chunks, and whether to add a reranking step.”
Q3: “When would you not use LangChain?”
What they’re testing: Critical thinking — can you identify when the framework hurts more than it helps?
Strong answer: “I’d skip LangChain for simple single-model API calls — the raw SDK is cleaner. I’d also avoid it when I need fine-grained control over the request/response cycle, like custom retry logic or streaming protocols. And for teams that value type safety, Pydantic AI gives better structured outputs with less complexity.”
Q4: “What’s the difference between LangChain and LangGraph?”
Strong answer: “LangChain is for linear pipelines — data flows in one direction. LangGraph is for stateful workflows with cycles — the agent can loop back, retry, and maintain state across process restarts. Most production systems use LangChain components inside LangGraph nodes.”
Production Deployment Tips
At scale, LangChain production deployments follow these patterns:
Version management: Teams pin langchain==0.3.x and langchain-core==0.3.x explicitly. LangChain’s release cadence is fast — weekly updates with occasional breaking changes. Unpinned dependencies in production cause mysterious failures.
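A pinned requirements file might look like this (version numbers are illustrative — pin whatever combination you have actually tested together):

```text
langchain==0.3.14
langchain-core==0.3.29
langchain-openai==0.2.14
```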
Observability: Integrate LangSmith for tracing. Set LANGCHAIN_TRACING_V2=true and every chain execution becomes a traceable span. Without this, debugging a RAG pipeline that returns wrong answers is nearly impossible.
RAG optimization: Production RAG pipelines rarely use the basic retriever. Teams add reranking (Cohere Rerank, cross-encoder models), hybrid search (combining vector + keyword), and chunk strategies optimized for their content. The initial retriever is just the starting point.
Cost monitoring: Log token usage per request. A poorly configured RAG chain that retrieves too many chunks or uses a system prompt that’s too long can cost 10-50x more than an optimized one. The langchain callbacks system makes it easy to capture token counts.
LangGraph migration: Teams that start with LangChain chains often migrate agent workflows to LangGraph when they need cycles, persistence, or human-in-the-loop. The migration path is smooth because LangChain components work inside LangGraph nodes.
Summary and Key Takeaways
- LangChain standardizes LLM development — one interface across OpenAI, Anthropic, Google, and local models
- LCEL (pipe syntax) is the current API — ignore tutorials using LLMChain or SequentialChain
- Three core patterns: simple chains (`prompt | llm | parser`), RAG pipelines (retriever + LLM), and tool-calling agents
- Every component is a Runnable — supports `.invoke()`, `.stream()`, and `.batch()` uniformly
- RAG is the killer use case — LangChain’s retriever + vector store integrations make it the fastest path to a working RAG prototype
- Pin your versions — LangChain releases frequently; unpinned deps cause production surprises
- Know when not to use it — for simple API calls, the raw SDK is simpler. For type-safe agents, consider Pydantic AI
Related
- LangChain vs LangGraph — Key Differences — When to use chains vs state machines
- LangGraph Tutorial — Build stateful agents with LangGraph
- RAG Architecture — Deep dive into retrieval-augmented generation
- Agentic Frameworks Compared — LangChain vs CrewAI vs AutoGen
- LangSmith vs Langfuse — Observability for your LangChain apps
- Pydantic AI Tutorial — The type-safe alternative to LangChain
Frequently Asked Questions
What is LangChain and what is it used for?
LangChain is the most popular Python framework for building LLM-powered applications. It provides composable abstractions for chains (sequential operations), RAG pipelines (retrieval-augmented generation), and tool-calling agents. Its core abstraction is LCEL (LangChain Expression Language) which lets you compose prompts, models, and output parsers using the pipe operator: prompt | llm | parser.
What is LCEL in LangChain?
LCEL (LangChain Expression Language) is LangChain's declarative syntax for composing pipelines. You chain components together using the pipe operator (|), connecting prompts, language models, output parsers, retrievers, and other components into a runnable pipeline. LCEL handles streaming, batching, and async execution automatically, making it the recommended way to build LangChain applications since version 0.3.
How do I build a RAG pipeline with LangChain?
Install LangChain with a vector store (like ChromaDB). Create embeddings from your documents using OpenAIEmbeddings, store them in the vector database, and create a retriever. Build a RAG chain that takes a user question, retrieves relevant document chunks via similarity search, formats them into a prompt with the question, sends it to the LLM, and parses the grounded response.
How does tool calling work in LangChain?
Define Python functions as tools using the @tool decorator with a docstring describing when to use the tool. Bind tools to the LLM with llm.bind_tools(tools). When invoked, the model decides whether to call a tool based on the user query. If it does, it returns a structured tool call with the function name and arguments. You execute the function and return the result for the model to incorporate into its response.
What are the five core components of LangChain?
The five core components are Chat Models (wrappers around LLM APIs like OpenAI and Anthropic), Prompt Templates (reusable templates with variables), Output Parsers (transform LLM text into structured data like strings, dicts, or Pydantic objects), Retrievers (fetch relevant documents from vector stores for RAG), and Tools (Python functions the LLM can call). Every component implements the Runnable interface with invoke, stream, and batch methods.
How do I stream responses in LangChain?
Use the .stream() method on any LCEL chain instead of .invoke(). Every component in the chain supports streaming: the prompt renders instantly, the LLM streams tokens as they are generated, and the parser yields chunks as they arrive. This enables real-time UX where users see the response being generated token by token.
Can I switch between different LLM providers in LangChain?
Yes. LangChain provides a unified interface across all LLM providers, so switching models is trivial. You change one import and model name while keeping the same chain structure. For example, you can swap ChatOpenAI for ChatAnthropic or ChatGoogleGenerativeAI and use the exact same invoke and stream interface without rewriting your pipeline.
What is RunnableParallel in LangChain?
RunnableParallel lets you run multiple LCEL chains simultaneously and collect their results. You pass a dictionary of named chains, and LangChain executes them all concurrently. This is useful when you need to generate a summary, extract keywords, and analyze sentiment from the same input in parallel rather than sequentially.
Should I use LangChain or the raw OpenAI SDK?
Use the raw OpenAI SDK for simple single-model API calls where LangChain adds unnecessary complexity. Use LangChain when you need composition (chaining multiple steps), retrieval (RAG pipelines), tool calling, or provider switching. LangChain adds value through its pre-built components and unified interface, but for a single API call, the raw SDK is simpler and more direct.
What is the difference between LangChain and LangGraph?
LangChain is for linear pipelines where data flows in one direction through the chain. LangGraph is for stateful workflows with cycles, where the agent can loop back, retry, and maintain state across process restarts. Most production systems use LangChain components inside LangGraph nodes when they need agent orchestration with persistence and conditional routing.
Last updated: February 2026 | LangChain v0.3+ / Python 3.10+