OpenAI Codex Guide — AI Coding Agent in ChatGPT (2026)
OpenAI Codex (2026) is an autonomous cloud-based coding agent built directly into ChatGPT. You describe a task in plain English, Codex clones your GitHub repository into a sandboxed environment, writes code across multiple files, runs your test suite, iterates on failures, and opens a pull request — all without you touching the keyboard again. This guide covers how Codex works, when to use it over Claude Code or Cursor, and how to write tasks that produce high-quality PRs.
Who this is for:
- GenAI engineers evaluating whether Codex fits their development workflow alongside or instead of Claude Code
- Team leads exploring asynchronous, cloud-based code generation for batch tasks and routine maintenance
- Developers already using ChatGPT who want to understand what Codex unlocks beyond conversation-level code generation
The Evolution: 2021 Codex vs 2026 Codex Agent
The name “Codex” refers to two completely different things separated by five years of AI development. Understanding the distinction matters if you’re reading older documentation or job postings that reference “Codex.”
The 2021 Codex Model
The original Codex was a code-specialized language model released in August 2021 as part of the GitHub Copilot beta. It was based on GPT-3, fine-tuned on publicly available GitHub code. You called it via API — you sent a prompt, it returned a code completion. That was the entire interaction.
The 2021 Codex model was deprecated in March 2023 as GPT-3.5 and GPT-4 surpassed it on coding benchmarks. It is no longer available.
The 2026 Codex Agent
The 2026 Codex is architecturally unrelated to the 2021 model. It is an agentic system powered by codex-1, a model derived from the o3 reasoning family and fine-tuned specifically for software engineering tasks — writing clean code, following repository conventions, running commands, and iterating on test failures.
What changed is not just the underlying model. The entire interaction pattern changed:
| Dimension | 2021 Codex (model) | 2026 Codex (agent) |
|---|---|---|
| Interaction | API completion — prompt in, code out | Autonomous agent — task in, PR out |
| Context | Single prompt window | Full repository clone |
| Execution | Stateless — model only | Stateful — runs commands, installs deps, executes tests |
| Output | Code text | Pull request with diffs, test results, and explanation |
| Pricing | Pay-per-token API | Included in ChatGPT Pro / Team / Enterprise |
| Duration | Milliseconds | Minutes to hours (async) |
The 2026 Codex is better understood as OpenAI’s answer to Claude Code — not a successor to the 2021 API model.
Real-World Problem Context
Every engineering team has a backlog of tasks that are clearly scoped but time-consuming to execute: updating dependencies, adding test coverage to legacy modules, migrating from one library version to another, fixing linting violations across a codebase. These tasks are well-defined enough that a skilled developer could complete them in a few hours — but skilled developers have better uses for their time.
This is the problem Codex is designed to solve. It is optimized for asynchronous, batch-style coding work where you describe the goal, step away, and review the result later.
When Codex Is the Right Tool
- Dependency upgrades — “Upgrade all lodash calls to use the native ES2023 equivalents and run the test suite”
- Test coverage expansion — “Add unit tests for every public function in src/lib/ that currently has <50% coverage”
- Code style migrations — “Convert all callback-style async functions in this codebase to async/await”
- Documentation generation — “Add JSDoc comments to all exported functions in src/api/”
- Bug fixes from issue descriptions — Paste a GitHub issue, Codex reads the code and opens a PR
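To make the code style migration item concrete, here is a hedged before/after sketch of the callback-to-async/await conversion such a task would request. The `readConfig` functions are invented for illustration, not taken from any real codebase:

```typescript
import { readFile } from "node:fs";
import { readFile as readFileAsync } from "node:fs/promises";

// Before: callback style (hypothetical example function)
function readConfigCallback(
  path: string,
  cb: (err: Error | null, config?: Record<string, unknown>) => void
): void {
  readFile(path, "utf8", (err, data) => {
    if (err) return cb(err);
    try {
      cb(null, JSON.parse(data) as Record<string, unknown>);
    } catch (parseErr) {
      cb(parseErr as Error);
    }
  });
}

// After: the async/await equivalent a migration task would produce
async function readConfig(path: string): Promise<Record<string, unknown>> {
  const data = await readFileAsync(path, "utf8");
  return JSON.parse(data) as Record<string, unknown>; // I/O and parse errors propagate as rejections
}
```

The migration removes the error-first callback plumbing entirely: callers switch from nested callbacks to `await readConfig(path)` inside a try/catch.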
When Codex Is the Wrong Tool
- Interactive debugging — If you need to step through code with breakpoints, use your IDE
- Exploratory prototyping — When you’re not sure what you want yet, the back-and-forth of Cursor or Claude Code is more effective
- Local-only codebases — Codex requires GitHub access; it cannot operate on purely local repos
- Real-time pair programming — There is no inline autocomplete; it’s a task-oriented agent, not a keystroke assistant
How Codex Works: Sandboxed Execution and the Async Task Model
Understanding Codex’s architecture explains both its strengths and its limitations.
The Sandboxed Execution Environment
When you give Codex a task, it provisions a fresh microVM — an isolated cloud environment with no access to the internet or external services. Inside this sandbox:
- Your GitHub repository is cloned at the specified branch or commit
- Codex reads the repository structure and relevant files to build context
- If an AGENTS.md file exists at the repo root, Codex reads it as its operational instructions — analogous to Claude Code’s CLAUDE.md
- Code is written, tests are run, and failures are iterated on — all inside the sandbox
- The sandbox is destroyed when the task completes
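The steps above can be sketched as a minimal lifecycle function. Every type and method here is invented to mirror the description; Codex exposes no programmatic API, so this is purely conceptual:

```typescript
// Conceptual sketch of the sandbox lifecycle; all names are hypothetical
// stand-ins for what the hosted agent does internally.
interface Sandbox {
  clone(repoUrl: string, ref: string): void;
  readFileOrNull(path: string): string | null;
  writeCodeAndRunTests(task: string, instructions: string | null): { testsPassed: boolean; diff: string };
  destroy(): void;
}

function runTask(
  sandbox: Sandbox,
  repoUrl: string,
  ref: string,
  task: string
): { testsPassed: boolean; diff: string } {
  try {
    sandbox.clone(repoUrl, ref);                              // clone at the specified branch/commit
    const instructions = sandbox.readFileOrNull("AGENTS.md"); // operational instructions, if present
    return sandbox.writeCodeAndRunTests(task, instructions);  // write, test, iterate inside the sandbox
  } finally {
    sandbox.destroy();                                        // sandbox is always torn down
  }
}
```

The `finally` mirrors the ephemerality guarantee: the sandbox is destroyed whether the task succeeds or fails.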
The network-isolated sandbox is a deliberate security design. Codex cannot exfiltrate your code, call external APIs without explicit configuration, or introduce supply chain attacks through unexpected network calls. This makes it substantially safer to grant GitHub write access than a general-purpose agent with internet access.
The Asynchronous Task Model
Unlike Claude Code, which runs interactively in your terminal, Codex is explicitly asynchronous. You submit a task and Codex works independently in the background — for minutes or hours depending on complexity. This means:
- You can queue multiple tasks simultaneously and review all PRs at once
- You do not need to babysit the agent or approve intermediate steps
- Tasks run even when your laptop is closed
- Complex multi-step tasks can run to completion without hitting session timeouts
The tradeoff is that you lose the ability to redirect mid-task. If Codex goes down the wrong path, you find out when you review the PR — not in real-time. This makes clear, specific task descriptions even more critical than in interactive agents.
GitHub Integration
Codex integrates with GitHub at the PR level. After completing a task, it creates a branch, pushes its changes, and opens a pull request with a description of what it did, why, and what tests it ran. All of your standard code review, CI/CD, and branch protection rules apply — Codex operates as a contributor to your repository, not an override of your workflow.
Codex Workflow: Task to Pull Request
Codex handles the entire path from a plain-English task description to a reviewed, testable pull request — without human hand-holding at each step.
📊 Visual Explanation
OpenAI Codex: Task to Pull Request Workflow
Codex provisions a sandboxed VM, clones your repo, executes the task autonomously, and delivers a reviewed PR — all asynchronously.
Key Features
Codex’s value comes from five capabilities that work together to turn a natural language description into a reviewed, tested pull request.
Multi-File Editing with Full Repository Context
Codex does not operate on a single file or a limited context window of recently opened files. It clones the entire repository and reads the files relevant to your task. When you ask it to refactor an authentication module, it reads the module, its tests, the files that import it, and the relevant documentation before writing a single line.
This full-repository context is what separates Codex from simpler AI code generation. A model generating code in isolation may write something that works syntactically but conflicts with your existing patterns. Codex understands what your project already does and writes code that fits.
Test Execution and Iteration
Codex does not just write code and stop. It runs your test suite. If tests fail, it reads the failure output, diagnoses the cause, and modifies the code to fix it — iterating until tests pass or it reaches a confidence threshold where it reports the remaining failures in the PR description.
This test-driven iteration loop is one of the strongest differentiators from simple code generation. You are not reviewing code that “looks right” — you are reviewing code that passed tests.
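The iterate-until-pass loop can be sketched generically. The fake test runner below is a stand-in: in the real agent, running tests means shelling out to your suite (e.g. `npm run test`), and applying a fix means the model editing code based on the failure output:

```typescript
// Minimal sketch of a test-driven iteration loop, assuming simulated
// runTests/applyFix stand-ins rather than a real test suite.
type TestReport = { passed: boolean; failures: string[] };

function iterateUntilPassing(
  runTests: () => TestReport,
  applyFix: (failures: string[]) => void,
  maxAttempts = 5
): { passed: boolean; attempts: number; remaining: string[] } {
  let report = runTests();
  let attempts = 1;
  while (!report.passed && attempts < maxAttempts) {
    applyFix(report.failures); // diagnose and patch based on failure output
    report = runTests();       // re-run the suite
    attempts++;
  }
  // If still failing after maxAttempts, the remaining failures are
  // surfaced (in the PR description) rather than looped on forever.
  return { passed: report.passed, attempts, remaining: report.failures };
}

// Simulated run: the "codebase" needs two fixes before the suite passes.
let bugsLeft = 2;
const result = iterateUntilPassing(
  () => (bugsLeft === 0
    ? { passed: true, failures: [] }
    : { passed: false, failures: [`bug #${bugsLeft}`] }),
  () => { bugsLeft--; }
);
// result: { passed: true, attempts: 3, remaining: [] }
```

Capping attempts mirrors the confidence threshold described above: the loop terminates with a report instead of retrying indefinitely.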
AGENTS.md: Your Instructions to Codex
The AGENTS.md file is Codex’s equivalent of Claude Code’s CLAUDE.md. Placed at your repository root, it tells Codex how your project works:
```
## Test commands
npm run test       # Unit tests
npm run test:e2e   # End-to-end tests (skip for non-UI tasks)
npm run lint       # ESLint — must pass before any PR

## Code conventions
- TypeScript strict mode — no `any` types
- Prefer named exports over default exports
- API routes follow REST conventions in src/api/
- Database queries go through the service layer, never directly in route handlers

## What to avoid
- Do not modify package-lock.json manually
- Do not add dependencies without noting them in the PR description
- Do not disable TypeScript errors with @ts-ignore
```
Codex reads AGENTS.md before starting work. Without it, Codex relies on inference — it reads your code to figure out conventions. With it, you get consistent, convention-respecting output from the first task.
Dependency Management
Codex can install and manage dependencies within its sandbox. If your task requires a new library, Codex can install it, use it, and document the addition in the PR description. The sandbox isolation means these installs never affect your local environment — they exist only within Codex’s working context until you merge the PR.
Parallel Task Execution
Because Codex runs asynchronously in the cloud, you can queue multiple tasks simultaneously. Three different engineers can each submit a Codex task at the same time, each operating in its own independent sandbox. Results arrive as PRs — standard GitHub code review handles the rest.
Codex vs Claude Code
Both Codex and Claude Code are autonomous coding agents capable of multi-file editing, test execution, and codebase-wide changes. But they are built for different workflows.
📊 Visual Explanation
OpenAI Codex vs Claude Code
Codex:
- Runs asynchronously — queue and review later
- Network-isolated sandbox — strong security model
- Parallel task execution — multiple PRs simultaneously
- Full GitHub PR integration — fits existing review workflows
- No real-time interaction — cannot redirect mid-task
- Requires GitHub access — no local-only codebase support
- Included in ChatGPT Pro ($200/mo) — no per-task billing
Claude Code:
- Real-time interaction — redirect at any step
- Direct local filesystem and shell access
- CLAUDE.md persistent project memory across sessions
- Hooks for automated linting and testing on every edit
- Requires your terminal to stay open — not fully async
- Usage-based API billing — costs grow with session length
- Works on any local repo — no GitHub requirement
The Practical Choice
For most engineering teams, the decision is not either/or. Codex and Claude Code occupy different positions in the workflow:
- Codex handles the repeatable, well-scoped work that fills your maintenance backlog — the tasks you’d assign to a junior developer if you had the bandwidth
- Claude Code handles the exploratory, iterative, or time-sensitive work where real-time interaction matters — feature development, debugging, one-off transformations
If you use ChatGPT Pro already, Codex comes included. If you’re primarily working in the terminal with a local codebase, Claude Code’s interactive model may fit better. If your team uses GitHub as the hub of your development workflow, Codex’s PR-based output integrates with zero friction.
Best Practices
Task description quality is the single biggest lever on output quality — the practices below maximize how often Codex delivers a mergeable PR on the first attempt.
Writing Effective Codex Task Descriptions
The quality of Codex’s output is directly proportional to the quality of your task description. Vague tasks produce vague PRs. Specific tasks produce specific PRs.
Weak task description:
“Fix the auth module”
Strong task description:
“The JWT token refresh logic in src/lib/auth.ts does not handle clock skew. Add a 30-second leeway to the expiry check at line 47. Run `npm run test:unit` to verify existing tests still pass, and add a test case that simulates a 15-second clock drift.”
The strong description specifies:
- What file to look at
- What behavior to fix
- What the correct behavior should be
- Which test command to run
- What new test to add
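A hypothetical sketch of the change this example task would produce. The function name, claim shape, and constant are invented for illustration, not taken from a real auth module:

```typescript
// Hypothetical fix: expiry check with clock-skew leeway. `exp` follows the
// JWT convention of seconds since the Unix epoch; all names are illustrative.
const CLOCK_SKEW_LEEWAY_SECONDS = 30;

function isTokenExpired(exp: number, nowMs: number = Date.now()): boolean {
  const nowSeconds = Math.floor(nowMs / 1000);
  // Before the fix: `return nowSeconds >= exp;` meant a client whose clock
  // runs a few seconds fast would reject tokens that are still valid.
  return nowSeconds >= exp + CLOCK_SKEW_LEEWAY_SECONDS;
}

// The test case the task asks for: a 15-second clock drift must not
// invalidate the token.
const exp = 1_700_000_000;              // token expiry (epoch seconds)
const driftedNowMs = (exp + 15) * 1000; // client clock 15s past expiry
// isTokenExpired(exp, driftedNowMs) === false (within the 30s leeway)
```

Because the task named the file, the behavior, and the test command, reviewing the resulting PR reduces to checking a small, well-scoped diff.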
Configure AGENTS.md Before Your First Task
The single highest-leverage action you can take before using Codex is writing an AGENTS.md file. Even a minimal file — your test command, your linting command, and three or four code conventions — dramatically improves output consistency.
A useful AGENTS.md template:
```
## Setup
[How to install dependencies if needed]

## Test and lint commands
[Exact commands Codex should run before completing a task]

## Code conventions
[3-5 most important patterns in your codebase]

## Things to never do
[Common mistakes or anti-patterns specific to your project]
```
Size Tasks for the Right Scope
Codex performs best on tasks that are:
- Clearly bounded — a well-defined start and end state
- Independently testable — existing or new tests can verify correctness
- Not dependent on external state — no manual database seeding, no third-party API calls required
Tasks that span multiple independent concerns (“refactor auth AND update the dashboard AND fix the signup form”) are better split into separate Codex submissions. Each PR is easier to review, and failures in one task don’t block the others.
Review Every PR
Codex is a tool, not a replacement for engineering judgment. Review every PR it opens:
- Verify that the changes match what you asked for
- Check that test coverage is meaningful, not just coverage-percentage padding
- Confirm that no unintended files were modified
- Look at the PR description — Codex documents its reasoning, which helps you catch misunderstandings before they reach main
Interview Preparation
Agentic coding tools like Codex are increasingly present in senior engineering interviews. Here are the questions you’re likely to encounter and what strong answers look like.
Q: “What is OpenAI Codex and how is it different from GitHub Copilot?”
Strong answer: “GitHub Copilot is a keystroke-level assistant — it provides inline autocomplete as you type within your IDE. OpenAI Codex is an autonomous agent — you describe a complete task, it clones your GitHub repository into a sandboxed environment, writes code across multiple files, runs your tests, and opens a pull request. Copilot augments your editing; Codex replaces the editing session entirely for well-scoped tasks. They operate at completely different levels of abstraction.”
Q: “What are the security considerations when using an AI agent with GitHub write access?”
Strong answer: “The key controls are scope and isolation. Codex runs in a network-isolated sandbox — no internet access — which prevents supply chain attacks and code exfiltration during execution. GitHub branch protection rules still apply, so Codex cannot push directly to main without review. I’d also recommend granting Codex access to specific repositories rather than your entire GitHub organization, and auditing its PRs the same way you’d audit any external contributor’s work. For sensitive repositories, AGENTS.md can explicitly restrict which files Codex is allowed to modify.”
Q: “How would you integrate Codex into a team’s CI/CD workflow?”
Strong answer: “I’d treat Codex as a contributor, not an administrator. It opens PRs that go through the same review and CI gates as human contributions. The practical integration is defining AGENTS.md to specify test commands and code conventions, then using Codex for maintenance-category work: dependency updates, test coverage, code style migrations. I’d avoid giving Codex access to production infrastructure configuration or secrets — its strength is application code, not operations. Teams that use it well tend to create a ‘Codex queue’ of well-specified issues and batch-submit them at the start of the week.”
Q: “When would you choose Codex over Claude Code for a development task?”
Strong answer: “The decision comes down to interactivity and location. If I need real-time feedback — trying multiple approaches, redirecting based on intermediate results, or working with a local-only codebase — I’d use Claude Code in my terminal. If the task is well-specified, independently testable, and I want to work on something else while the agent runs, I’d use Codex. In practice, I’d use both: Claude Code for active feature development and exploratory work, Codex for the maintenance backlog — the tasks that accumulate but never get prioritized because they’re not urgent.”
Summary and Key Takeaways
- OpenAI Codex (2026) is an autonomous coding agent, not the 2021 code completion API — it clones your repo, runs tests, and opens PRs
- The sandboxed execution model provides strong security: network-isolated, ephemeral, and scoped to your GitHub repository
- AGENTS.md is your primary configuration lever — write it before your first task to establish conventions and test commands
- Asynchronous execution enables parallel task queues but removes real-time interaction — task clarity is everything
- Codex and Claude Code are complementary: Codex for async batch work and PR-based workflows, Claude Code for interactive terminal-based development
- Available to ChatGPT Pro subscribers ($200/month) — no per-task billing, no separate API setup
Related
- AI Code Editors Comparison — Full landscape of Cursor, Claude Code, Copilot, Windsurf, and Codex
- Claude Code Guide — Terminal-based AI coding agent with local filesystem access
- Cursor AI Guide — AI-enhanced IDE for real-time, visual coding
- Cursor vs Claude Code — Head-to-head comparison of the two dominant coding agent workflows
- AI Agents Fundamentals — How agentic systems like Codex are architected
- Evaluating AI Agents — How to measure and assess autonomous agent output quality
Frequently Asked Questions
What is OpenAI Codex in 2026?
OpenAI Codex (2026) is an autonomous coding agent built into ChatGPT. Unlike the original Codex model from 2021, the new Codex is a cloud-based agent powered by codex-1 (an o3-derivative model). It can read entire repositories, write code across multiple files, run tests in a sandboxed environment, and create pull requests — all from a ChatGPT conversation. It runs asynchronously, handling tasks that take minutes to hours.
How does OpenAI Codex compare to Claude Code?
Both are autonomous coding agents, but they run in different environments. Claude Code runs locally in your terminal with direct access to your filesystem, shell, and git. OpenAI Codex runs in a cloud sandbox — it clones your repo, works in isolation, and pushes results via PR. Claude Code is better for interactive development; Codex is better for batch tasks you can queue and review later.
Is OpenAI Codex free?
OpenAI Codex is available to ChatGPT Pro, Team, and Enterprise subscribers. Pro costs $200/month and includes generous Codex usage. Team plans start at $25/user/month with limited Codex access. There is no standalone Codex API — it is accessed exclusively through the ChatGPT interface.
What can OpenAI Codex do that ChatGPT cannot?
While ChatGPT can generate code snippets in conversation, Codex operates as a full agent: it clones your GitHub repository into a sandboxed environment, reads the entire codebase for context, writes and modifies multiple files, installs dependencies, runs your test suite, iterates on failures, and creates a pull request with its changes. It turns natural language task descriptions into verified, tested code changes.
What is AGENTS.md and why does it matter for Codex?
AGENTS.md is a configuration file placed at your repository root that tells Codex how your project works — test commands, linting commands, code conventions, and things to avoid. Without it, Codex infers conventions by reading your code. With it, you get consistent, convention-respecting output from the first task. Writing an AGENTS.md is the single highest-leverage action before using Codex.
How does the Codex sandboxed execution environment work?
When you give Codex a task, it provisions a fresh microVM — an isolated cloud environment with no internet access. Inside this sandbox, your GitHub repo is cloned, Codex reads the codebase and AGENTS.md for context, writes code, runs tests, and iterates on failures. The sandbox is destroyed when the task completes. This network isolation prevents code exfiltration and supply chain attacks.
What types of tasks is OpenAI Codex best suited for?
Codex is optimized for asynchronous, batch-style coding work: dependency upgrades, test coverage expansion, code style migrations, documentation generation, and bug fixes from issue descriptions. It performs best on tasks that are clearly bounded, independently testable, and not dependent on external state. Tasks requiring interactive debugging or exploratory prototyping are better suited to agentic IDEs or Claude Code.
Can I run multiple Codex tasks at the same time?
Yes. Because Codex runs asynchronously in the cloud, you can queue multiple tasks simultaneously. Each task operates in its own independent sandbox. Multiple engineers on a team can each submit Codex tasks at the same time, and results arrive as separate pull requests. This parallel execution model is one of Codex's key advantages over interactive coding agents.
How is the 2026 Codex different from the original 2021 Codex?
The 2021 Codex was a code-completion model based on GPT-3 that returned code snippets via API — it was deprecated in March 2023. The 2026 Codex is an autonomous agentic system powered by codex-1 (derived from o3). It clones entire repositories, writes multi-file changes, runs tests, iterates on failures, and opens pull requests. The interaction pattern changed from prompt-in-code-out to task-in-PR-out.
What are the security considerations when using Codex with GitHub?
Codex runs in a network-isolated sandbox with no internet access, which prevents supply chain attacks and code exfiltration during execution. GitHub branch protection rules still apply, so Codex cannot push directly to main without review. Best practices include granting Codex access to specific repositories rather than your entire GitHub organization, auditing its PRs like any external contributor, and using AGENTS.md to restrict which files Codex may modify.