OpenAI Codex Guide — AI Coding Agent in ChatGPT (2026)
OpenAI Codex (2026) is an autonomous cloud-based coding agent built directly into ChatGPT. You describe a task in plain English, Codex clones your GitHub repository into a sandboxed environment, writes code across multiple files, runs your test suite, iterates on failures, and opens a pull request — all without you touching the keyboard again. This guide covers how Codex works, when to use it over Claude Code or Cursor, and how to write tasks that produce high-quality PRs.
Who this is for:
- GenAI engineers evaluating whether Codex fits their development workflow alongside or instead of Claude Code
- Team leads exploring asynchronous, cloud-based code generation for batch tasks and routine maintenance
- Developers already using ChatGPT who want to understand what Codex unlocks beyond conversation-level code generation
The Evolution: 2021 Codex vs 2026 Codex Agent
The name “Codex” refers to two completely different things separated by five years of AI development. Understanding the distinction matters if you’re reading older documentation or job postings that reference “Codex.”
The 2021 Codex Model
The original Codex was a code-specialized language model released in August 2021 as part of the GitHub Copilot beta. It was based on GPT-3, fine-tuned on publicly available GitHub code. You called it via API — you sent a prompt, it returned a code completion. That was the entire interaction.
The 2021 Codex model was deprecated in March 2023 as GPT-3.5 and GPT-4 surpassed it on coding benchmarks. It is no longer available.
The 2026 Codex Agent
The 2026 Codex is architecturally unrelated to the 2021 model. It is an agentic system powered by codex-1, a model derived from the o3 reasoning family and fine-tuned specifically for software engineering tasks — writing clean code, following repository conventions, running commands, and iterating on test failures.
What changed is not just the underlying model. The entire interaction pattern changed:
| Dimension | 2021 Codex (model) | 2026 Codex (agent) |
|---|---|---|
| Interaction | API completion — prompt in, code out | Autonomous agent — task in, PR out |
| Context | Single prompt window | Full repository clone |
| Execution | Stateless — model only | Stateful — runs commands, installs deps, executes tests |
| Output | Code text | Pull request with diffs, test results, and explanation |
| Pricing | Pay-per-token API | Included in ChatGPT Pro / Team / Enterprise |
| Duration | Milliseconds | Minutes to hours (async) |
The 2026 Codex is better understood as OpenAI’s answer to Claude Code — not a successor to the 2021 API model.
Real-World Problem Context
Every engineering team has a backlog of tasks that are clearly scoped but time-consuming to execute: updating dependencies, adding test coverage to legacy modules, migrating from one library version to another, fixing linting violations across a codebase. These tasks are well-defined enough that a skilled developer could complete them in a few hours — but skilled developers have better uses for their time.
This is the problem Codex is designed to solve. It is optimized for asynchronous, batch-style coding work where you describe the goal, step away, and review the result later.
When Codex Is the Right Tool
- Dependency upgrades — “Upgrade all lodash calls to use the native ES2023 equivalents and run the test suite”
- Test coverage expansion — “Add unit tests for every public function in src/lib/ that currently has <50% coverage”
- Code style migrations — “Convert all callback-style async functions in this codebase to async/await”
- Documentation generation — “Add JSDoc comments to all exported functions in src/api/”
- Bug fixes from issue descriptions — Paste a GitHub issue, Codex reads the code and opens a PR
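To make the code style migration item concrete, here is a hedged before/after sketch of the callback-to-async/await conversion such a task would request. The `readConfig` functions are invented for illustration, not taken from any real codebase:

```typescript
import { readFile } from "node:fs";
import { readFile as readFileAsync } from "node:fs/promises";

// Before: callback style (hypothetical example function)
function readConfigCallback(
  path: string,
  cb: (err: Error | null, config?: Record<string, unknown>) => void
): void {
  readFile(path, "utf8", (err, data) => {
    if (err) return cb(err);
    try {
      cb(null, JSON.parse(data) as Record<string, unknown>);
    } catch (parseErr) {
      cb(parseErr as Error);
    }
  });
}

// After: the async/await equivalent a migration task would produce
async function readConfig(path: string): Promise<Record<string, unknown>> {
  const data = await readFileAsync(path, "utf8");
  return JSON.parse(data) as Record<string, unknown>; // I/O and parse errors propagate as rejections
}
```

The migration removes the error-first callback plumbing entirely: callers switch from nested callbacks to `await readConfig(path)` inside a try/catch.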
When Codex Is the Wrong Tool
- Interactive debugging — If you need to step through code with breakpoints, use your IDE
- Exploratory prototyping — When you’re not sure what you want yet, the back-and-forth of Cursor or Claude Code is more effective
- Local-only codebases — Codex requires GitHub access; it cannot operate on purely local repos
- Real-time pair programming — There is no inline autocomplete; it’s a task-oriented agent, not a keystroke assistant
How Codex Works: Sandboxed Execution and the Async Task Model
Understanding Codex’s architecture explains both its strengths and its limitations.
The Sandboxed Execution Environment
When you give Codex a task, it provisions a fresh microVM — an isolated cloud environment with no access to the internet or external services. Inside this sandbox:
- Your GitHub repository is cloned at the specified branch or commit
- Codex reads the repository structure and relevant files to build context
- If an AGENTS.md file exists at the repo root, Codex reads it as its operational instructions — analogous to Claude Code’s CLAUDE.md
- Code is written, tests are run, and failures are iterated on — all inside the sandbox
- The sandbox is destroyed when the task completes
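The steps above can be sketched as a minimal lifecycle function. Every type and method here is invented to mirror the description; Codex exposes no programmatic API, so this is purely conceptual:

```typescript
// Conceptual sketch of the sandbox lifecycle; all names are hypothetical
// stand-ins for what the hosted agent does internally.
interface Sandbox {
  clone(repoUrl: string, ref: string): void;
  readFileOrNull(path: string): string | null;
  writeCodeAndRunTests(task: string, instructions: string | null): { testsPassed: boolean; diff: string };
  destroy(): void;
}

function runTask(
  sandbox: Sandbox,
  repoUrl: string,
  ref: string,
  task: string
): { testsPassed: boolean; diff: string } {
  try {
    sandbox.clone(repoUrl, ref);                              // clone at the specified branch/commit
    const instructions = sandbox.readFileOrNull("AGENTS.md"); // operational instructions, if present
    return sandbox.writeCodeAndRunTests(task, instructions);  // write, test, iterate inside the sandbox
  } finally {
    sandbox.destroy();                                        // sandbox is always torn down
  }
}
```

The `finally` mirrors the ephemerality guarantee: the sandbox is destroyed whether the task succeeds or fails.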
The network-isolated sandbox is a deliberate security design. Codex cannot exfiltrate your code, call external APIs without explicit configuration, or introduce supply chain attacks through unexpected network calls. This makes it substantially safer to grant GitHub write access than a general-purpose agent with internet access.
The Asynchronous Task Model
Unlike Claude Code, which runs interactively in your terminal, Codex is explicitly asynchronous. You submit a task and Codex works independently in the background — for minutes or hours depending on complexity. This means:
- You can queue multiple tasks simultaneously and review all PRs at once
- You do not need to babysit the agent or approve intermediate steps
- Tasks run even when your laptop is closed
- Complex multi-step tasks can run to completion without hitting session timeouts
The tradeoff is that you lose the ability to redirect mid-task. If Codex goes down the wrong path, you find out when you review the PR — not in real-time. This makes clear, specific task descriptions even more critical than in interactive agents.
GitHub Integration
Codex integrates with GitHub at the PR level. After completing a task, it creates a branch, pushes its changes, and opens a pull request with a description of what it did, why, and what tests it ran. All of your standard code review, CI/CD, and branch protection rules apply — Codex operates as a contributor to your repository, not an override of your workflow.
Codex Workflow: Task to Pull Request
Codex handles the entire path from a plain-English task description to a reviewed, testable pull request — without human hand-holding at each step.
📊 Visual Explanation
OpenAI Codex: Task to Pull Request Workflow
Codex provisions a sandboxed VM, clones your repo, executes the task autonomously, and delivers a reviewed PR — all asynchronously.
Key Features
Codex’s value comes from five capabilities that work together to turn a natural language description into a reviewed, tested pull request.
Multi-File Editing with Full Repository Context
Codex does not operate on a single file or a limited context window of recently opened files. It clones the entire repository and reads the files relevant to your task. When you ask it to refactor an authentication module, it reads the module, its tests, the files that import it, and the relevant documentation before writing a single line.
This full-repository context is what separates Codex from simpler AI code generation. A model generating code in isolation may write something that works syntactically but conflicts with your existing patterns. Codex understands what your project already does and writes code that fits.
Test Execution and Iteration
Codex does not just write code and stop. It runs your test suite. If tests fail, it reads the failure output, diagnoses the cause, and modifies the code to fix it — iterating until tests pass or it reaches a confidence threshold where it reports the remaining failures in the PR description.
This test-driven iteration loop is one of the strongest differentiators from simple code generation. You are not reviewing code that “looks right” — you are reviewing code that passed tests.
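The iterate-until-pass loop can be sketched generically. The fake test runner below is a stand-in: in the real agent, running tests means shelling out to your suite (e.g. `npm run test`), and applying a fix means the model editing code based on the failure output:

```typescript
// Minimal sketch of a test-driven iteration loop, assuming simulated
// runTests/applyFix stand-ins rather than a real test suite.
type TestReport = { passed: boolean; failures: string[] };

function iterateUntilPassing(
  runTests: () => TestReport,
  applyFix: (failures: string[]) => void,
  maxAttempts = 5
): { passed: boolean; attempts: number; remaining: string[] } {
  let report = runTests();
  let attempts = 1;
  while (!report.passed && attempts < maxAttempts) {
    applyFix(report.failures); // diagnose and patch based on failure output
    report = runTests();       // re-run the suite
    attempts++;
  }
  // If still failing after maxAttempts, the remaining failures are
  // surfaced (in the PR description) rather than looped on forever.
  return { passed: report.passed, attempts, remaining: report.failures };
}

// Simulated run: the "codebase" needs two fixes before the suite passes.
let bugsLeft = 2;
const result = iterateUntilPassing(
  () => (bugsLeft === 0
    ? { passed: true, failures: [] }
    : { passed: false, failures: [`bug #${bugsLeft}`] }),
  () => { bugsLeft--; }
);
// result: { passed: true, attempts: 3, remaining: [] }
```

Capping attempts mirrors the confidence threshold described above: the loop terminates with a report instead of retrying indefinitely.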
AGENTS.md: Your Instructions to Codex
The AGENTS.md file is Codex’s equivalent of Claude Code’s CLAUDE.md. Placed at your repository root, it tells Codex how your project works:
```
## Test commands
npm run test       # Unit tests
npm run test:e2e   # End-to-end tests (skip for non-UI tasks)
npm run lint       # ESLint — must pass before any PR

## Code conventions
- TypeScript strict mode — no `any` types
- Prefer named exports over default exports
- API routes follow REST conventions in src/api/
- Database queries go through the service layer, never directly in route handlers

## What to avoid
- Do not modify package-lock.json manually
- Do not add dependencies without noting them in the PR description
- Do not disable TypeScript errors with @ts-ignore
```
Codex reads AGENTS.md before starting work. Without it, Codex relies on inference — it reads your code to figure out conventions. With it, you get consistent, convention-respecting output from the first task.
Dependency Management
Codex can install and manage dependencies within its sandbox. If your task requires a new library, Codex can install it, use it, and document the addition in the PR description. The sandbox isolation means these installs never affect your local environment — they exist only within Codex’s working context until you merge the PR.
Parallel Task Execution
Because Codex runs asynchronously in the cloud, you can queue multiple tasks simultaneously. Three different engineers can each submit a Codex task at the same time, each operating in its own independent sandbox. Results arrive as PRs — standard GitHub code review handles the rest.
Codex vs Claude Code
Both Codex and Claude Code are autonomous coding agents capable of multi-file editing, test execution, and codebase-wide changes. But they are built for different workflows.
📊 Visual Explanation
OpenAI Codex vs Claude Code
Codex:
- Runs asynchronously — queue and review later
- Network-isolated sandbox — strong security model
- Parallel task execution — multiple PRs simultaneously
- Full GitHub PR integration — fits existing review workflows
- No real-time interaction — cannot redirect mid-task
- Requires GitHub access — no local-only codebase support
- Included in ChatGPT Pro ($200/mo) — no per-task billing
Claude Code:
- Real-time interaction — redirect at any step
- Direct local filesystem and shell access
- CLAUDE.md persistent project memory across sessions
- Hooks for automated linting and testing on every edit
- Requires your terminal to stay open — not fully async
- Usage-based API billing — costs grow with session length
- Works on any local repo — no GitHub requirement
The Practical Choice
For most engineering teams, the decision is not either/or. Codex and Claude Code occupy different positions in the workflow:
- Codex handles the repeatable, well-scoped work that fills your maintenance backlog — the tasks you’d assign to a junior developer if you had the bandwidth
- Claude Code handles the exploratory, iterative, or time-sensitive work where real-time interaction matters — feature development, debugging, one-off transformations
If you use ChatGPT Pro already, Codex comes included. If you’re primarily working in the terminal with a local codebase, Claude Code’s interactive model may fit better. If your team uses GitHub as the hub of your development workflow, Codex’s PR-based output integrates with zero friction.
Best Practices
Task description quality is the single biggest lever on output quality — the practices below maximize how often Codex delivers a mergeable PR on the first attempt.
Writing Effective Codex Task Descriptions
The quality of Codex’s output is directly proportional to the quality of your task description. Vague tasks produce vague PRs. Specific tasks produce specific PRs.
Weak task description:
“Fix the auth module”
Strong task description:
“The JWT token refresh logic in src/lib/auth.ts does not handle clock skew. Add a 30-second leeway to the expiry check at line 47. Run `npm run test:unit` to verify existing tests still pass, and add a test case that simulates a 15-second clock drift.”
The strong description specifies:
- What file to look at
- What behavior to fix
- What the correct behavior should be
- Which test command to run
- What new test to add
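A hypothetical sketch of the change this example task would produce. The function name, claim shape, and constant are invented for illustration, not taken from a real auth module:

```typescript
// Hypothetical fix: expiry check with clock-skew leeway. `exp` follows the
// JWT convention of seconds since the Unix epoch; all names are illustrative.
const CLOCK_SKEW_LEEWAY_SECONDS = 30;

function isTokenExpired(exp: number, nowMs: number = Date.now()): boolean {
  const nowSeconds = Math.floor(nowMs / 1000);
  // Before the fix: `return nowSeconds >= exp;` meant a client whose clock
  // runs a few seconds fast would reject tokens that are still valid.
  return nowSeconds >= exp + CLOCK_SKEW_LEEWAY_SECONDS;
}

// The test case the task asks for: a 15-second clock drift must not
// invalidate the token.
const exp = 1_700_000_000;              // token expiry (epoch seconds)
const driftedNowMs = (exp + 15) * 1000; // client clock 15s past expiry
// isTokenExpired(exp, driftedNowMs) === false (within the 30s leeway)
```

Because the task named the file, the behavior, and the test command, reviewing the resulting PR reduces to checking a small, well-scoped diff.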
Configure AGENTS.md Before Your First Task
The single highest-leverage action you can take before using Codex is writing an AGENTS.md file. Even a minimal file — your test command, your linting command, and three or four code conventions — dramatically improves output consistency.
A useful AGENTS.md template:
```
## Setup
[How to install dependencies if needed]

## Test and lint commands
[Exact commands Codex should run before completing a task]

## Code conventions
[3-5 most important patterns in your codebase]

## Things to never do
[Common mistakes or anti-patterns specific to your project]
```
Size Tasks for the Right Scope
Codex performs best on tasks that are:
- Clearly bounded — a well-defined start and end state
- Independently testable — existing or new tests can verify correctness
- Not dependent on external state — no manual database seeding, no third-party API calls required
Tasks that span multiple independent concerns (“refactor auth AND update the dashboard AND fix the signup form”) are better split into separate Codex submissions. Each PR is easier to review, and failures in one task don’t block the others.
Review Every PR
Codex is a tool, not a replacement for engineering judgment. Review every PR it opens:
- Verify that the changes match what you asked for
- Check that test coverage is meaningful, not just coverage-percentage padding
- Confirm that no unintended files were modified
- Look at the PR description — Codex documents its reasoning, which helps you catch misunderstandings before they reach main
Interview Preparation
Agentic coding tools like Codex are increasingly present in senior engineering interviews. Here are the questions you’re likely to encounter and what strong answers look like.
Q: “What is OpenAI Codex and how is it different from GitHub Copilot?”
Strong answer: “GitHub Copilot is a keystroke-level assistant — it provides inline autocomplete as you type within your IDE. OpenAI Codex is an autonomous agent — you describe a complete task, it clones your GitHub repository into a sandboxed environment, writes code across multiple files, runs your tests, and opens a pull request. Copilot augments your editing; Codex replaces the editing session entirely for well-scoped tasks. They operate at completely different levels of abstraction.”
Q: “What are the security considerations when using an AI agent with GitHub write access?”
Strong answer: “The key controls are scope and isolation. Codex runs in a network-isolated sandbox — no internet access — which prevents supply chain attacks and code exfiltration during execution. GitHub branch protection rules still apply, so Codex cannot push directly to main without review. I’d also recommend granting Codex access to specific repositories rather than your entire GitHub organization, and auditing its PRs the same way you’d audit any external contributor’s work. For sensitive repositories, AGENTS.md can explicitly restrict which files Codex is allowed to modify.”
Q: “How would you integrate Codex into a team’s CI/CD workflow?”
Strong answer: “I’d treat Codex as a contributor, not an administrator. It opens PRs that go through the same review and CI gates as human contributions. The practical integration is defining AGENTS.md to specify test commands and code conventions, then using Codex for maintenance-category work: dependency updates, test coverage, code style migrations. I’d avoid giving Codex access to production infrastructure configuration or secrets — its strength is application code, not operations. Teams that use it well tend to create a ‘Codex queue’ of well-specified issues and batch-submit them at the start of the week.”
Q: “When would you choose Codex over Claude Code for a development task?”
Strong answer: “The decision comes down to interactivity and location. If I need real-time feedback — trying multiple approaches, redirecting based on intermediate results, or working with a local-only codebase — I’d use Claude Code in my terminal. If the task is well-specified, independently testable, and I want to work on something else while the agent runs, I’d use Codex. In practice, I’d use both: Claude Code for active feature development and exploratory work, Codex for the maintenance backlog — the tasks that accumulate but never get prioritized because they’re not urgent.”
Summary and Key Takeaways
- OpenAI Codex (2026) is an autonomous coding agent, not the 2021 code completion API — it clones your repo, runs tests, and opens PRs
- The sandboxed execution model provides strong security: network-isolated, ephemeral, and scoped to your GitHub repository
- AGENTS.md is your primary configuration lever — write it before your first task to establish conventions and test commands
- Asynchronous execution enables parallel task queues but removes real-time interaction — task clarity is everything
- Codex and Claude Code are complementary: Codex for async batch work and PR-based workflows, Claude Code for interactive terminal-based development
- Available to ChatGPT Pro subscribers ($200/month) — no per-task billing, no separate API setup
Related
- AI Code Editors Comparison — Full landscape of Cursor, Claude Code, Copilot, Windsurf, and Codex
- Claude Code Guide — Terminal-based AI coding agent with local filesystem access
- Cursor AI Guide — AI-enhanced IDE for real-time, visual coding
- Cursor vs Claude Code — Head-to-head comparison of the two dominant coding agent workflows
- AI Agents Fundamentals — How agentic systems like Codex are architected
- Evaluating AI Agents — How to measure and assess autonomous agent output quality
Frequently Asked Questions
What is OpenAI Codex in 2026?
OpenAI Codex (2026) is an autonomous coding agent built into ChatGPT. Unlike the original Codex model from 2021, the new Codex is a cloud-based agent powered by codex-1 (an o3-derivative model). It can read entire repositories, write code across multiple files, run tests in a sandboxed environment, and create pull requests — all from a ChatGPT conversation. It runs asynchronously, handling tasks that take minutes to hours.
How does OpenAI Codex compare to Claude Code?
Both are autonomous coding agents, but they run in different environments. Claude Code runs locally in your terminal with direct access to your filesystem, shell, and git. OpenAI Codex runs in a cloud sandbox — it clones your repo, works in isolation, and pushes results via PR. Claude Code is better for interactive development; Codex is better for batch tasks you can queue and review later.
Is OpenAI Codex free?
OpenAI Codex is available to ChatGPT Pro, Team, and Enterprise subscribers. Pro costs $200/month and includes generous Codex usage. Team plans start at $25/user/month with limited Codex access. There is no standalone Codex API — it is accessed exclusively through the ChatGPT interface.
What can OpenAI Codex do that ChatGPT cannot?
While ChatGPT can generate code snippets in conversation, Codex operates as a full agent: it clones your GitHub repository into a sandboxed environment, reads the entire codebase for context, writes and modifies multiple files, installs dependencies, runs your test suite, iterates on failures, and creates a pull request with its changes. It turns natural language task descriptions into verified, tested code changes.
What is AGENTS.md and why does it matter for Codex?
AGENTS.md is a configuration file placed at your repository root that tells Codex how your project works — test commands, linting commands, code conventions, and things to avoid. Without it, Codex infers conventions by reading your code. With it, you get consistent, convention-respecting output from the first task. Writing an AGENTS.md is the single highest-leverage action before using Codex.
How does the Codex sandboxed execution environment work?
When you give Codex a task, it provisions a fresh microVM — an isolated cloud environment with no internet access. Inside this sandbox, your GitHub repo is cloned, Codex reads the codebase and AGENTS.md for context, writes code, runs tests, and iterates on failures. The sandbox is destroyed when the task completes. This network isolation prevents code exfiltration and supply chain attacks.
What types of tasks is OpenAI Codex best suited for?
Codex is optimized for asynchronous, batch-style coding work: dependency upgrades, test coverage expansion, code style migrations, documentation generation, and bug fixes from issue descriptions. It performs best on tasks that are clearly bounded, independently testable, and not dependent on external state. Tasks requiring interactive debugging or exploratory prototyping are better suited to agentic IDEs or Claude Code.
Can I run multiple Codex tasks at the same time?
Yes. Because Codex runs asynchronously in the cloud, you can queue multiple tasks simultaneously. Each task operates in its own independent sandbox. Multiple engineers on a team can each submit Codex tasks at the same time, and results arrive as separate pull requests. This parallel execution model is one of Codex's key advantages over interactive coding agents.
How is the 2026 Codex different from the original 2021 Codex?
The 2021 Codex was a code-completion model based on GPT-3 that returned code snippets via API — it was deprecated in March 2023. The 2026 Codex is an autonomous agentic system powered by codex-1 (derived from o3). It clones entire repositories, writes multi-file changes, runs tests, iterates on failures, and opens pull requests. The interaction pattern changed from prompt-in-code-out to task-in-PR-out.
What are the security considerations when using Codex with GitHub?
Codex runs in a network-isolated sandbox with no internet access, which prevents supply chain attacks and code exfiltration during execution. GitHub branch protection rules still apply, so Codex cannot push directly to main without review. Best practices include granting Codex access to specific repositories rather than your entire GitHub organization, auditing its PRs like any external contributor, and using AGENTS.md to restrict which files Codex may modify.