Claude Code Codex AI Developer Tools Terminal

Claude Code vs OpenAI Codex: Which One Lives in My Terminal

Sanin Mulić May 9, 2026 7 min read

I gave both terminal agents real client work for a month. The winner wasn't who I expected, and the deciding factor wasn't the model.

I'm a terminal person. My editor matters less than my shell. A tool that lives next to git and pnpm gets used a hundred times more than one that lives in a sidebar. So when both Anthropic and OpenAI shipped serious terminal-resident agents (Claude Code and OpenAI Codex), I gave both a real four-week trial on actual client work. Here's what stuck.

The setup

Both ran against the same Astro 5 + Strapi + Svelte 5 monorepo (this site, actually). Same tasks: ship a feature branch, fix a flaky test, refactor a Svelte 5 component to use runes properly, debug a production issue from a stack trace. I switched between them daily for a month.

What Claude Code does well

Claude Code's killer feature is that it reads the codebase first. When I tell it "fix the broken project detail page," it goes and looks at the page, the layout, the related components, the GraphQL query, and the Strapi schema before touching anything. That sounds obvious. It isn't. Most coding agents, Codex included, start typing within seconds.

The result is a slower first iteration but dramatically fewer follow-ups. A typical Claude Code session looks like 30 seconds of "I'm reading X, Y, Z" then a focused diff that does what I asked, no more. I rarely need a second round. The tool earns its longer think time.

The other thing it nails: respecting project conventions. The first time I worked on this portfolio with Claude Code, it noticed the path aliases (@components, @lib), the cn() utility for class merging, and the apps/web/src/api-strapi/queries/* pattern within the first task. Subsequent edits used those conventions without me asking. That's senior-engineer behavior.

What Codex does well

Codex feels like a power user's tool. The defaults are aggressive. It will happily run shell commands, modify multiple files, and assume you have your back covered with version control. For exploratory work that speed is intoxicating. "Add a webhook endpoint that does X" gets you a 95% complete result in under a minute.

It's also good at narrow code transformations. Need to convert 40 files from one import style to another? Codex will rip through that with confidence. The structural pattern recognition is excellent.

Where Codex struggles, in my experience, is in big-context multi-step reasoning. The Astro/Strapi GraphQL stack has a couple of non-obvious failure modes (component fragments collide on differently-required fields, API tokens cache schema scope at issuance, and so on). Codex tends to power through these without noticing them. Claude Code tends to stop, read more, and ask.

The deciding factor: handling its own mistakes

The week I made up my mind was when both tools introduced the same regression: a Strapi query missing an explicit pagination: { limit: N }, which silently caps results at 10. Both shipped working-looking code. Both pretended everything was fine until the build failed.

When I fed each the failure, they responded differently. Codex jumped to a fix. The fix was wrong. It added a workaround that masked the symptom but kept the underlying bug. Claude Code, given the same input, said: "I think the underlying issue is the default pagination cap. Can you confirm by showing me the original query?"

That second move, refusing to fix the wrong thing, is the muscle that matters when an agent is operating autonomously.

What I run today

Claude Code is my default. Codex is on standby for narrow mechanical refactors where I want raw speed and don't care if I have to push back twice. The split is roughly 80/20 in Claude Code's favor.

Two specific reasons it stuck:

1. Skills. Claude Code's plugin/skill system means I can hand it a pdf skill, a docx skill, a project-specific skill bundle, and it composes them automatically. My Cowork setup ships a portfolio-specific skill plugin that knows our schema, our conventions, and our deployment targets. Codex doesn't have an equivalent that's matured to this level yet.

2. Hooks and slash commands. Wiring /security-review and /init and project-specific commands into the agent loop, plus pre-commit hooks the agent respects, means Claude Code can actually be trusted with --auto-edit or --allow-write over a long session. With Codex I still feel the need to babysit.

The honest caveat

Both tools improve fast enough that any specific comparison ages in months, not years. What I'd bet on long-term is the philosophy each carries. Anthropic's tool reflects Anthropic's "ask before acting" alignment posture in real, observable ways. OpenAI's tool reflects OpenAI's "ship fast, optimize later" product instinct. If you like the former, you'll probably stay with Claude Code. If you like the latter, Codex will feel native.

For me, building portfolio pieces and shipping client work where mistakes cost real money, Claude Code with claude on my PATH wins. Codex sits in the back pocket for when I need a hammer instead of a scalpel.