How OpenAI Engineers use Codex to Tackle Big Projects with Rigor

Written by Chris Nicholson, Member of Global Affairs Staff at OpenAI

December 4, 2025

Chris Nicholson (Global Affairs) hosted Romain Huet (Head of Developer Experience) and Aaron Friel (Member of Technical Staff) to explore how software engineering is changing in real time as Codex shifts from “code generator” to true teammate. The theme was “vibe engineering”: moving faster with agents across planning, architecture, debugging, and documentation, while keeping humans fully accountable for every line that ships.

Instead of choosing a quick and easy demo, Friel picked a task designed to outlast the talk: rewriting Bazel Diff, a Kotlin tool used to decide what to build in continuous integration, as a Rust implementation targeting 100% compatibility. Starting from an empty directory, Codex spun up sub-agents, maintained a long-horizon exec plan, and created a “watchdog” agent to keep the work aligned with requirements. The real run, which Friel started before the talk, took about 12 hours. Along the way, Codex researched upstream code, tested assumptions such as Bazel 8 versus Bazel 9 differences, and continuously updated a plan file so progress stayed legible to both people and models.
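One common way to make a compatibility target like this checkable is differential testing: run the reference implementation and the rewrite on the same inputs and report every case where they disagree. The sketch below is illustrative only and is not taken from the talk. The dependency graph, `reference_impacted`, and `candidate_impacted` are hypothetical toy stand-ins, not Bazel Diff's actual algorithm or API.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical toy dependency graph: target -> direct dependencies.
DEPS = {
    "//app:bin": {"//lib:core"},
    "//lib:core": {"//lib:util"},
    "//lib:util": set(),
}


def reference_impacted(changed: str) -> FrozenSet[str]:
    """Stand-in for the reference tool: the changed target plus every
    target that depends on it, directly or transitively."""
    impacted = {changed}
    grew = True
    while grew:
        grew = False
        for target, deps in DEPS.items():
            if target not in impacted and deps & impacted:
                impacted.add(target)
                grew = True
    return frozenset(impacted)


def candidate_impacted(changed: str) -> FrozenSet[str]:
    """Stand-in for a rewrite with a deliberate bug: it only follows
    direct dependents and misses transitive ones."""
    direct = {t for t, deps in DEPS.items() if changed in deps}
    return frozenset({changed} | direct)


def find_mismatches(
    reference: Callable[[str], FrozenSet[str]],
    candidate: Callable[[str], FrozenSet[str]],
    cases: Iterable[str],
) -> List[Tuple[str, FrozenSet[str], FrozenSet[str]]]:
    """Run both implementations on every case and collect disagreements
    as (case, expected, actual) tuples."""
    mismatches = []
    for case in cases:
        expected = reference(case)
        actual = candidate(case)
        if expected != actual:
            mismatches.append((case, expected, actual))
    return mismatches


if __name__ == "__main__":
    for case, expected, actual in find_mismatches(
        reference_impacted, candidate_impacted, DEPS
    ):
        print(f"{case}: expected {sorted(expected)}, got {sorted(actual)}")
```

An empty mismatch list over a broad set of cases is evidence of compatibility, not proof of it, which is one reason human review of the result still matters.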

One surprise was how much rigor can emerge during a long run. Friel described an earlier project where Codex worked for seven hours to produce a roughly 500-line diff, with more than 200 turns spent iterating on tests. Huet framed this as the new unit of progress: fewer mistakes, better reviews, and higher confidence, even when the final patch is compact. Inside OpenAI, Huet noted that Codex adoption among technical staff is above 92%, Codex reviews every internal pull request, and engineers using it ship about 70% more merged pull requests, with real bugs caught before production.

The conversation also spotlighted who gets empowered next. Non-technical teams can use a Codex Slack integration to ask questions about systems and unblock themselves without adding meetings to engineering calendars. Designers can begin pulling Figma components toward code via Model Context Protocol integrations, and everyday edits like website copy updates become easier to locate and change safely. The skill bottleneck is shifting toward domain expertise about the problem to be solved, discernment regarding the solution, clear communication, and a habit of producing artifacts that humans enjoy reading, since those same artifacts make agents more reliable.

We are grateful to Romain Huet, Aaron Friel, and the OpenAI Forum community for showing what human-led, production-grade AI collaboration looks like in practice!
