OpenAI Forum

Event Replay: Vibe Engineering with OpenAI’s Codex

#Codex #VibeCoding #Developers

SPEAKERS

Romain Huet
Head of Developer Experience @ OpenAI

Romain Huet is a French entrepreneur and engineer with a passion for developer platforms. He currently leads Developer Experience at OpenAI, inspiring and supporting founders and builders to integrate AI into their applications, and directing the creation of elegant and powerful tools for all developers.

Aaron Friel
Member of Technical Staff @ OpenAI

Aaron Friel is pushing the frontier of AI adoption and developer productivity on OpenAI's technical staff. A passionate open source contributor, he has contributed to compilers, web frameworks, and cryptography libraries. Friel believes we should make systems work for users, make software development easier, faster, and safer, and apply theory in the service of accessibility. In his free time, he enjoys Dungeons & Dragons and exploring the applications of large language models, sometimes at the same time.

Chris Nicholson
Member of Global Affairs Staff @ OpenAI

Chris V. Nicholson serves on OpenAI’s Global Affairs team, where he uses data and storytelling to document major AI use cases and support the company’s economic research. He co-founded the deep learning company Skymind (Y Combinator W16), which created the open-source AI framework Eclipse Deeplearning4j. He previously reported for the New York Times and Bloomberg News. Born in Montana, he now lives in the San Francisco Bay Area with his family.


SUMMARY

This Forum was a glimpse into how engineering is changing in real time. Chris, Romain, and Friel didn’t just talk about coding with AI—they showed what it looks like when AI acts as a true teammate, helping engineers plan, build, and debug while people stay firmly in charge of the results. They demonstrated Codex running a long, complex build that not only produced code but also managed sub-agents, tested ideas, and documented its work like a seasoned collaborator.

Nearly every OpenAI engineer uses Codex, leading to better reviews, fewer bugs, and faster progress on meaningful projects. But it’s not just for experts—non-technical teams are using it too, asking questions about code or making small changes on their own.

These tools aren’t replacing creativity or judgment—they’re freeing people up to focus on what really matters. Codex isn’t just “AI writing code”; it’s a way for teams to work smarter, move faster, and actually enjoy the process again.


TRANSCRIPT

[00:00:00] Hi, everybody. [00:00:12] Welcome to OpenAI Forum. My name is Chris Nicholson. I'm on the Global Affairs team. I'm an active Codex user and I'm very glad you joined us today. I'll bet a lot of software engineers watching now have faced brutal deadlines and wished they had a colleague who never got tired. At OpenAI, Codex sits in that seat. So today we're going to talk about vibe engineering, using AI to build real, production-grade software faster while keeping humans responsible for every line that ships. It can be tempting to let a model pour out code and then just cross your fingers that the test and code review will catch any mistakes. But the more interesting part is using AI in design and architecture and debugging even as you use it for long multi-step projects. [00:01:12] So before we start, let me just say that I hope every single person here will take away at least one concrete change in your workflow to try before the end of the year. [00:01:21] And I'm actually here with two people who live this. So today, we'll hear from Romain Huet. [00:01:23] He leads developer experience at OpenAI. Romain is a French entrepreneur and engineer who spent years building developer platforms and communities at great companies like Twitter and Stripe. [00:01:38] So now, he helps engineering teams turn raw AI capability into products that launch and actually scale. [00:01:45] We're also here with Aaron Friel, who goes by Friel, a software engineer who's pushing the frontier of developer productivity on OpenAI's technical staff. [00:01:49] Friel started coding in grade school, and he's a longtime contributor to open source. [00:02:00] He's worked on compilers, web frameworks, and cryptography libraries, the other crypto. [00:02:05] At Dev Day this fall, Friel showed how Codex helps him run multi-step projects from idea to working system. [00:02:13] And since then, I've seen it, the projects have only gotten longer, more complex. So these two are both obsessed with making development easier, safer, and more impactful. [00:02:26] We've chosen a demo, a project that takes longer than this talk. So what we're going to show you is the live setup, and then we're going to skip to the results later on. [00:02:35] In fact, Friel is going to share the results and the prompts with you in a GitHub repo. [00:02:40] As Friel walks through his live Codex workflow, and later as Romain and I unpack what it means for teams, you will be able to see how the engineer stays in control while AI does more and more of the legwork. [00:02:52] And at the end of this talk, Natalie Cone of the Forum will open this up for Q&A between you and us. [00:02:58] So now Friel, please show us how you kick off one of these projects.

[00:03:03] Yeah, thanks Chris. [00:03:04] So here at OpenAI, I work on developer systems and help accelerate all of our engineers. [00:03:11] And so often my tasks have me looking at systems that we need to improve; maybe we need to improve performance, maybe we need to improve the ergonomics of some tool for our engineers. [00:03:20] And sometimes it involves rewriting, sometimes it involves making patches to an upstream project or building something in-house. [00:03:27] We find that it's very helpful that there's a wonderful open-source ecosystem where they've built essentially specifications for how software should work in various ways. [00:03:35] And we use those specifications, essentially the working software that already exists, to drive Codex towards successful results every day. [00:03:43] What I have up here on the screen is a pretty substantial task. [00:03:47] So here we use Bazel, as some may know, and we use a tool called Bazel Diff in order to understand what projects we need to build in our CI system. [00:04:01] And there's a great upstream project by The Match Group to do this. It's written in Kotlin. [00:04:07] It supports all of the Bazel features that we want, but we would like maybe some additional features or maybe some extra performance. [00:04:15] And so this is a pretty substantial task. [00:04:17] And so I'm telling it here that we're going to go ahead and rewrite this entire project from Kotlin to Rust, starting from scratch. [00:04:24] So this is an empty directory. There's nothing in here except this prompt. [00:04:28] And I'm going to tell it to use that repo as a submodule and write all the tests against it. [00:04:35] So we're going to have a test harness where we can compare what good currently looks like against what this new project looks like. [00:04:43] I'm going to tell it also to use an exec plan. [00:04:45] And I won't go over all of that in detail. You can find in our OpenAI Cookbook a little session about how we use exec plans, which is a planning document that enables these agents to run for longer tasks and keep track of a long-horizon goal.
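
For readers who want to reproduce the shape of this setup, here is a minimal Python sketch: an empty project directory, the reference implementation vendored as a git submodule so there is something to test against, and a single long-horizon prompt handed to Codex. The upstream URL, the project name, the PLAN.md convention, and the `codex exec` non-interactive invocation are assumptions for illustration, not the exact prompt, cookbook exec plan, or internal fork used in the demo.

```python
"""Minimal sketch of kicking off a long-horizon rewrite task (assumptions noted inline)."""
import subprocess
from pathlib import Path

UPSTREAM = "https://github.com/Tinder/bazel-diff"  # assumed upstream repo URL
WORKDIR = Path("bazel-diff-rust")                  # hypothetical project name

PROMPT = """\
Rewrite bazel-diff from Kotlin to Rust, starting from scratch in this directory.
The original project is vendored as a git submodule at ./upstream; treat it as the
reference implementation and build a test harness that compares your output against
it on the same inputs. The goal is 100% behavioral compatibility.
Maintain an exec plan in PLAN.md: record requirements, decisions, and progress,
and keep it updated as you work.
"""

def main() -> None:
    WORKDIR.mkdir(exist_ok=True)
    subprocess.run(["git", "init"], cwd=WORKDIR, check=True)
    # Vendor the reference implementation so the agent can test against it.
    subprocess.run(["git", "submodule", "add", UPSTREAM, "upstream"], cwd=WORKDIR, check=True)
    # Hand the long-horizon prompt to Codex non-interactively (assumed `codex exec` subcommand).
    subprocess.run(["codex", "exec", PROMPT], cwd=WORKDIR, check=True)

if __name__ == "__main__":
    main()
```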

[00:05:02] So I'm going to tell it to use an exec plan, and it's the exact same prompt as in our OpenAI Cookbook. It's going to use that to track its work and complete all of these requirements. I'm just telling it its goal is 100% compatibility between BazelDiff and this new project, BazelDifferis. I'm giving it a lot of requirements here, and this would take an engineer maybe weeks to build and validate. But we're actually going to see that it's going to be able to start on the task. It's probably not going to finish within our time frame today.

[00:05:33] That's a good thing, though. Overnight I ran this and it took about 12 hours to complete this task. So this is fairly standard; we're able to see these long-horizon tasks. I'll also caveat this and say I'm using an internal forked version of Codex that I've been hacking on myself. This is just some of the ideas that we're thinking about in order to make Codex able to solve these long problems. It's not a promise from Romain and folks that we're going to be delivering these as exactly the features you see today.

[00:06:04] Sound good? Sounds great. Let's do it.

[00:06:06] Okay, so I'm just gonna launch Codex in my terminal here. I'm gonna paste in my prompt. So I think the first thing we're likely to see here is that it's going to kick off a sub-agent. So it's going to think about what we need to do and how we keep track of this work as it goes. So the first thing it did was it created a watchdog.

[00:06:26] So in thinking about how we can use sub-agents to keep a Codex session on task for a long period of time, it's created this watchdog, which is a sub-agent that will just remind the Codex session: here's what our overarching goal is, here's what the user's requirements are. And it will continuously remind the root agent of these goals, using the root agent's context. We're also going to see it's already spawned a few interesting sub-agents as well, so it's using a mix of models, like GPT-5.1-Codex-Mini, to go ahead and do some research on what these upstream projects are, what a good structure for this project is, or to investigate the difference between Bazel 8 and Bazel 9.
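
To make the watchdog idea concrete, here is a conceptual Python sketch. The real watchdog lives in Friel's internal Codex fork, so everything below is hypothetical: the names, the reminder interval, and the `send_to_root_agent` callback are placeholders that only show the shape of the pattern, a side process that periodically re-injects the goal and requirements into the root agent's context.

```python
"""Conceptual sketch of a 'watchdog' reminder loop; names and wiring are hypothetical."""
import threading
import time
from typing import Callable

def start_watchdog(goal: str,
                   requirements: list[str],
                   send_to_root_agent: Callable[[str], None],
                   interval_s: float = 600.0) -> threading.Event:
    """Remind the root agent of its goal every `interval_s` seconds until stopped."""
    stop = threading.Event()
    reminder = (
        "Watchdog check-in. Overarching goal: " + goal + "\n"
        "User requirements:\n" + "\n".join(f"- {r}" for r in requirements) + "\n"
        "Re-read PLAN.md, confirm you are still on track, and keep working."
    )

    def loop() -> None:
        # Event.wait returns False on timeout, so this fires once per interval.
        while not stop.wait(interval_s):
            send_to_root_agent(reminder)

    threading.Thread(target=loop, daemon=True).start()
    return stop  # caller sets this event to stop the reminders

# Example wiring (prints instead of talking to a real agent):
if __name__ == "__main__":
    stop = start_watchdog(
        goal="100% compatibility between bazel-diff and the Rust rewrite",
        requirements=["keep PLAN.md up to date", "all parity tests must pass"],
        send_to_root_agent=print,
        interval_s=1.0,
    )
    time.sleep(3.5)
    stop.set()
```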

[00:07:15] I'll be the first to admit that these tasks are a little out of domain. There's not a lot of Bazel in our training data, there's just frankly not a lot of Bazel open source available on the internet and so this is a pretty challenging task. Now it's going to go ahead and continue to read this and we can see now that it's actually got these agents running in parallel and it's continuing to work. It's going to be adding the sub module and this will take quite a while. I don't expect we're going to see any code for a little bit so let's let this cook for a while.

[00:07:44] All right, awesome. Great, all right, well, guys, we can start with my questions then, and let's start with you, Romain. So Simon Willison talks about vibe engineering; that's probably a phrase that many people in the audience have heard. If you haven't, it's different than vibe coding; it's the serious end of AI-powered software development, where senior engineers stay fully accountable while using agents and models everywhere they can. So I'd like to ask you both, you first please, Romain: how does that phrase land with you, and to what extent does it overlap with real work, with what you'd like to see?

[00:08:13] For sure, yeah. I think it's a natural evolution of what we've seen the models do, right? Maybe if you rewind just even one year, the models got pretty good at coding, as in outputting code. And so that's why we saw a lot of people being very excited about this vibe coding term and actually building things. I remember when we launched o1, for instance, the first reasoning model, one of my first demos was to build an iPhone app from scratch in zero shot, or even programming a mini drone.

[00:08:51] But that was kind of already pushing the frontier at the time. Now the models have become so capable that they've effectively become teammates, and as we see here live on this demo running in the background, they can work for minutes or hours at a time on very complex tasks. And so back to that term, from coding to engineering: it now encompasses everything an engineer does in their real work, right? It's not just writing code; it's also the model being able to make a plan, make architecture decisions, write some tests, verify the work. And we've seen that when the models are able to check their own work, they also perform better. So this is part of this agentic coding set of capabilities, and yeah, I think it's effectively becoming a teammate that helps you do your work faster.

[00:09:38] Very cool. How does the term Vibe Engineering land for you and how are you living it?

[00:09:42] Yeah, I mean, I think that as an individual contributor and engineer for most of my career, it's been, you know, software has always been a trade-off of deadlines and the many, many approaches we can take to solve a problem. [00:09:59] And often, we spend a lot of time thinking about what is the right sort of solution to take, [00:10:03] and there's so many doors that we don't go down. [00:10:07] With the advent of agentic coding tools, I find in my day to day – for example, with [00:10:13] this demo I'm doing today, I will have agents take these tasks on multiple times and I'll [00:10:19] compare the results. I will ask them to take different approaches. [00:10:22] And so I think we're actually getting closer to maybe, I hope I'm not overstepping here, [00:10:27] but closer to sort of real quote unquote engineering work. Where we build and actually [00:10:31] test something, and we try something and maybe we back out and go to a different approach. [00:10:35] But we're able to iterate so much faster because these agents can do a lot of this [00:10:41] groundwork for us.

[00:10:44] I really resonate with this vibe engineering idea. With this particular task, [00:10:49] we have a very specific problem we're trying to solve and we have very specific [00:10:53] criteria for success. I think I would be remiss not to give credit to the authors [00:11:03] of Bazel Diff for doing the hard work of specification. I think [00:11:05] that's still integral to building great software, but then I can use the agent [00:11:10] to accelerate the development.

[00:11:12] That's interesting. So it sounds like this makes it possible to base architectural [00:11:19] decisions much more empirically on what happened when you built it, because you can [00:11:23] build many more things. [00:11:24] Yeah. I've run this particular experiment a few times in preparation for today, and I'll [00:11:29] have it run benchmarks and compare its results and make sure that I'm choosing the [00:11:33] right one that we would potentially want to use. [00:11:34] That's super cool.

[00:11:36] Okay. Right, Romain. You've led developer platforms at Twitter and Stripe, and now here. [00:11:43] So what feels different about this technology? How would you characterize, in a nutshell, [00:11:54] those previous platforms, and what does that do to the community? [00:11:56] Yeah. I mean, it's an interesting question, because I think for a while, [00:12:02] we've known what great API design and great developer tools look like. [00:12:05] And typically, it used to mean having great primitives, great documentation, [00:12:12] and basically the first step that you obsessed about was onboarding developers for them [00:12:17] to learn about your API and learn about the capabilities.

[00:12:20] But now with tools like Codex and agentic coding, we're moving from capabilities to [00:12:26] collaborators. So it's almost like the onboarding has been reversed. When you think about [00:12:29] exposing capabilities to the world as a platform, you want to make sure the models [00:12:33] understand these capabilities well. [00:12:36] And I think what's very exciting to me is that developers now can go from an idea to [00:12:41] a prototype and ultimately to production so much faster. [00:12:45] I don't have to educate myself in detail about how the Stripe API works. Most likely, [00:12:50] I have an idea for a business that Codex is going to help me build and ultimately [00:12:54] monetize without me spending a ton of time on the Stripe documentation, for instance. [00:12:59] So it's very different in terms of how you go from idea to actually something you can [00:13:03] show and ship to the world.

[00:13:06] And of course, we, ourselves, also have our own capabilities on the API. [00:13:10] It's also the ability to reinvent how your users interact with software. [00:13:15] Speech to speech I think now is at a moment where developers should look into it and build [00:13:20] more with it. I think last year, for instance, the quality of the voices was there, but [00:13:26] the price point was still a bit high. Now I think it's an amazing time to build voice [00:13:31] applications. So it's both the coding tools, but also the capabilities of AI, [00:13:37] letting you reinvent and build faster.

[00:13:38] Totally agree. I think we're in a moment where you can really get ahead of the curve on [00:13:42] voice, because the change in pricing will [00:13:46] make so many things possible. [00:13:47] Totally agree. [00:13:48] And I guess there's something unique about this tool because it can teach users how to [00:13:52] use it. And at the same time, weirdly, I see people using our tools in ways that I hadn't [00:13:59] imagined, which is to say, our users are teaching me what our tool does. [00:14:04] So that's another form of reverse onboarding. [00:14:06] Yeah. [00:14:07] Yeah. It's a good collaborator idea. [00:14:09] Yeah. I think it'd be great to show what we've got on the screen here. [00:14:12] Speaking of what Romain just said about looking at the Stripe documentation, I find that [00:14:18] these tools, Codex in particular, make for an incredible researcher. [00:14:23] You give it a source code repository to look at, and so actually, just here, it looks [00:14:28] like it was considering how to architect some input, and it's waiting on research into [00:14:32] this new thing called Bzlmod, introduced recently in Bazel. [00:14:40] As part of this, I've seen the agent frequently clone down repositories in order to [00:14:44] actually figure out what's ground truth here. [00:14:47] It can sometimes be really hard. [00:14:49] Stripe has incredible developer documentation, but we want to look at maybe [00:14:55] the ground truth of what is the SDK that we're working with? What is the actual tool that we're using?

[00:14:59] The ability for Codex to actually do that research on its own, come to its own conclusions, and figure out the right way to use things.

[00:15:06] Then I can interrogate it and ask it questions, and I've learned a lot about the systems that we use here at OpenAI by forking a repository or checking it in, adding it as an environment in the Codex web interface, and then asking it questions.

[00:15:19] I found that would be an incredible tool for learning, and we can also see at this point in the repository, it actually looks like it's already scaffolded out a project and I have a plan file here.

[00:15:29] So if we want to take a look at that real quick, great, it has sort of a big picture description of what we're working on here, and it's already accomplished several of our tasks.

[00:15:43] But it's got a long ways to go, so let's let it keep working. Great. It's really amazing to me how you can have this task run for a long time, but also the model being able to kind of fetch documentation when need be, clone code when need be to kind of verify its work and actually iterate on this plan.

[00:16:02] And how do you keep up with it? This is so much that is being done right here.

[00:16:07] Yeah, I find sometimes there's an inference mismatch between what I can process and what the models can do. How do you keep up?

[00:16:12] Oh, yeah. I mean, I think that's where having a plan file or some sort of artifact that it maintains is helpful.

[00:16:19] As it's working for this sustained period of time, it's helpful not just for the model. I try to reinforce to the engineers here at OpenAI that there are many things that we wish we could do if we had more time.

[00:16:36] I would document this better, I would add more tests here. These are things that are helpful not just for the agents, but also for us. The agents give us the ability to spend more time on those tasks, and those benefits accrue, right?

[00:16:50] The better the tests are, the more likely we're going to stay on track for this project and deliver something successful.

[00:16:56] Right, the better our documentation, the easier it is for me to onboard the next engineer on my team or the next agent that works on this code base. And so I think that's really helpful.

[00:17:05] So you're saying creating human-readable artifacts actually improves the performance of agents on our code bases dramatically?

[00:17:14] Yes.

[00:17:17] Okay, question for Romain first. You have an overview of how these powerful tools are being used internally.

[00:17:27] What are some of the coolest ways we're using them? I mean, we've seen a huge inflection point in the summer, when I think the model capabilities became so high that you could do a meaningful amount of work.

[00:17:37] Now we see internally, for instance, virtually every technical staff person is using Codex—like it's over 92 percent, if I recall correctly, but that number might be higher now.

[00:17:50] Another thing that I think is very powerful for us and for so many companies is code reviews. Like, all of our PRs internally at OpenAI are reviewed by Codex, and we've really caught some complex bugs that could have hit production.

[00:18:05] Thanks to Codex catching them, the product got better—catching those early. We've also seen that the engineers using Codex actually produce, on average, 70% more PRs.

[00:18:18] But these are real PRs that get merged in production—like real features, not extra noise.

[00:18:24] So we already see these trends, and they are only going up. Again, this vision that the models are no longer just outputting code; they're taking on some tasks that engineers would have had to do by hand.

[00:18:38] It's effectively like teammate vision coming to life.

[00:18:44] Are we using Codex to build Codex?

[00:18:45] Of course! I think that's why you might have seen that we cut releases almost every couple of days, and the pace of change is dramatically high, thanks to Codex helping build Codex.

[00:18:58] Recursive self-improvement starts with Codex.

[00:19:02] Amazing.

[00:19:06] So Aaron, I'd like to ask you the same question. What are some of the coolest use cases you've seen or participated in?

[00:19:12] Yeah, just to reiterate what Romain said: again, it's not just technical users. Sitting where I sit in acceleration,

[00:19:18] I also work with non-technical or less technical folks, right? I'm teaching them to use things like our Codex Slack integration to be able to ask Codex questions about our code base.

[00:19:32] This is sort of empowerment across all departments here at OpenAI, where if you have a question about how a feature works, maybe the first step is to ask Codex.

[00:19:43] And you can get an answer, and often I'll admit it does a better job of answering these questions than I do.

[00:19:52] So what are some examples of non-technical teams who need to know things about our code? [00:19:57] Yeah, I'm sure Romain has more examples than I do, but whether it's a product manager, or someone working in sales, or maybe an on-site engineer working with another team, maybe one of our many customers, being able to use Codex as sort of a first stop to ask these technical questions can lighten the load, right? Being able to just do that initial search can often answer that first question and maybe give them the opportunity to dive deeper on their own.

[00:20:31] And has that been a big bottleneck in the past when those, say, non-technical or non-engineering teams want answers and the engineers just don't have time? [00:20:41] It's definitely been a bottleneck in the past. I think that resonates with most engineers: you have a finite amount of time in the day, and we love to bemoan the number of meetings we have. We would love to have zero meetings and be able to just focus on coding. And I think Codex helps take some of that load off.

[00:21:02] So this is maybe not the favorite activity of engineers answering questions about their code. Is this a universal problem for companies running software? [00:21:11] I feel like that does resonate as fairly universal. Some engineers love talking about their software a little bit more, so I enjoy chatting with these folks and doing that. I think my manager would prefer that I spend a little more time heads down.

[00:21:24] Yeah, what I find really cool also with agentic coding and tools like Codex is that even if not everyone has to become a software engineer, everybody is becoming more technical. You take designers, for instance, if they live in Figma every day, well now thanks to Codex talking to the MCP server of Figma, you can pull your Figma components into actual code. That's pretty powerful.

[00:21:48] Or imagine you want to change a marketing copy on the website. You can ask Codex to find the right place to change that copy. Everybody is becoming more technical and as a result, you give everyone in the company more superpowers to ship. [00:22:03] I agree with that. I think a lot of non-technical people are almost non-technical by accident. They had trouble finding someone to answer their questions because anybody who could answer their questions would probably be more valuable to the company building something. But now that they get those answers, they're building too, which is very encouraging.

[00:22:25] Friel, at DevDay, you showed us how you did some infrastructure work. What was the most surprising way that Codex changed how problems are framed for you? [00:22:41] Yeah, so at DevDay I spoke about how the now-infamous claim of making Codex run for seven hours originated with a run that I did. So I was working on a problem, sitting on the couch, watching TV with my wife, and just hacking on some things on the side. I happened to set my laptop so that it wouldn't go to sleep, and I had it go run on some problem.

[00:23:08] And to my surprise, I woke up the next morning and I was like, okay, it looks like it's still running. And I'm like, wait, it's still running! It's actually still working on the problem. I think, so that was sort of the first surprise, right? And I think a lot of engineers might hear that and think, oh, no, it's written 100,000 lines of slop, right?

[00:23:30] But that 7-hour run actually only produced a diff of about 500 lines. It was a very complex change that it worked on, and it worked on the tests for, I think, over 200 turns of iterating on tests, running them, and then iterating again. And that final change is what I ended up merging. And so I think that was kind of shocking to me: given sort of this concrete objective, it actually could run for a really long period of time.

[00:23:59] And this was before Codex Max, right? [00:24:01] That's right. Which now takes on a longer task. But by the way, to your point, one thing that resonated with me at an event recently with some technical leaders is that oftentimes they ask the question to us, like, how many lines of code is AI writing for you? But I think it might've been the right question a year ago. It's no longer quite the right question, because now it's about the quality of the output, even though you may be changing just a few hundred lines of code, as you described in your example, right?

[00:24:30] I think the thinking and the checking of the quality of the work is actually the meat of what's happening. [00:24:37] That story reminds me of the guy who discovered X-rays by accident, leaving some film in a room overnight. So Romain, you're in touch with a lot of leaders, a lot of teams. How do you see their expectations changing as these tools become more capable? Like, are they aspiring to new goals?

[00:24:57] Yeah, I think like the people who master these tools the most really embrace this like teammate vision. The same way you would wanna onboard like a human teammate and make them as successful as possible, you wanna give them context, you wanna give them like access, you wanna give them trust, and that's where you get the most out of these AI models, especially like GPT 5.1 Codex Max.

[00:25:22] And I think the expectation is also that like you can do so much more. Like now when I talk to like startup founders, like they can get so much more done in a short period of time. You know, it reminds me of my own startup many years back when these tools did not exist, like sometimes like a net new feature could take us like four to six weeks to implement, and now is like, is that even possible these days to even think about a feature that would take like a month to build? Most likely not.

[00:25:51] Like if you have these tools, like maybe in a couple days you can get a V1 of something, you know? So that I think is quite profound. And you know, like in startup land these days, like what matters is like how much can you learn from your customers when your product enters reality and they start using it? And that iteration speed and this velocity, that's what you can get with Codex. So that I think to me is very exciting.

[00:26:18] Yeah, that's really interesting, just getting to that reality check of having a real product and in a real human's hand.

[00:26:24] Yeah. So important. Okay, so people are using these tools in a lot of different ways. Sometimes I think they might stumble or explore and find better ways to use them. When you're talking to somebody who's maybe new to the tools or maybe hasn't fully explored everything that's possible, what's your litmus test to figure out if you think they're realizing the potential, if they're avoiding certain mistakes? What do you look for?

[00:26:50] What are the signs of healthy, impactful Codex use?

[00:26:55] Yeah, I mean I think like with a lot of software or a lot of even use of any of these tools, it's like are you building something that you'd want to use? Are you building something that you would love as a tool? And thinking about it and framing it in that way, you know, people talk about AI slop, right? And what we don't, we don't want to produce that, we wanna produce things that we care about, we enjoy using, that improve our day to day.

[00:27:21] And so I think the key test to me of good versus bad LLM use is: what is really bringing more human fruition into the world, what is saving me from toil. Then I can focus on the things that matter: maybe building a specification for a project, or thinking through some of the harder problems of a project, which is what I enjoy, or solving a customer's need. That's really what is valuable. A lot of the boilerplate of writing software is just that.

[00:27:50] Yeah, so it sounds to me like a key question is: do the users of Codex actually enjoy reading its output? Is its output an invitation to them to collaborate, understand more deeply, appreciate, say, the elegance of the code base? And why is that important?

[00:28:10] Yeah, I think it goes back to producing artifacts that are valuable for humans and agents. If output is not something that you would personally enjoy reading or would not be beneficial to you, I don't know if it'll be beneficial to the agents that we're building. And so that's sort of the test that I have is we wanna be sort of human-centered as we move towards this AGI vision of producing things that are great for people.

[00:28:39] Cool, so Romain, you talk to a lot of teams and you probably are in a position to instruct them and give them nudges. For that litmus test, what are some signs that you might want to give them a nudge?

[00:28:49] Yeah, that's a good point. I think it goes back to how much everything has changed in the summer. And sometimes, when I hear teams telling me, oh, I tried Codex, and I was like, I tried to dig in, like when was that, like six months ago. Like, you have to check again. Like, the world has changed since. It's no longer the same kind of product.

[00:29:11] And if there is any skepticism from a large company, let's say, about adopting these tools, the approach I've found the most exciting is to not try to convince them that our tools or some other tool is better than what their setup looks like today. It's more like: take a look at things like code reviews. Use exactly the tools you have today, but turn on Codex code review and see what happens.

[00:29:35] I've had some conversations where, just a couple of weeks ago in London, we were sitting next to a CTO. He turned on code review, and we tried to review one random PR they had recently merged.

[00:29:48] And we found two major bugs that even the most senior engineers had missed. And they were chasing that bug in production. And so that to me is really profound. It's like seeing the glimpses of having this very senior teammate that can be omnipresent to really empower your team. And when you start seeing that, you start building more trust in the models. And you want to bring them closer to you. Like the engineers will be like, wow, OK, if I'm not using these tools, I'm going to fall behind. Because clearly, they are doing amazing work. So, yeah, that's been one of the stories recently.

[00:30:24] Cool. So a lot of people are software engineers because they like to code. They love solving problems. And I've heard on the internets, some people are saying, I'm not able to solve problems in the old way that really brought me into this profession. So I want to ask you guys, I'll start with you, Aaron. Where is your excitement moving? What are you excited about when you're using these tools?

[00:30:43] Yeah, I think at the end of the day, I'm excited about unblocking. And actually, if we could segue and look at the screen real quick here. So actually, we just saw the model produce some output. And it said, here's what I've accomplished. And here's what the next steps are. We can see that watchdog, that subagent kick in and say, hey, here's what your goal is. Let's keep driving towards that, and keep it on task. So I think it just ran for about 20 minutes. I wasn't keeping track of when we started. And it's now going to, OK, it's just continuing to start working on the code again. So it's cleared its obstacle, and it's proceeding. It's pretty amazing to see.

[00:31:28] Well, let's keep that running. And, sorry, if you could re-say your question. So as these tools step in and really master certain tasks, what are you excited about? What are you excited about doing with code? How is that evolving?

[00:31:39] Yeah, for me personally, I really enjoy the face time with other engineers, or product managers, or helping people with their problems. And I'm excited that I can set Codex on a task, and I can walk away from my laptop, and talk with someone, and think about what's the next problem that we want to work on.

[00:32:11] Cool. How do you see that evolving? For me, it's always been about creative tools, and how do you let creativity flow? And how are you able to test your ideas faster? For instance, one of my favorite features of Codex is what we call best of N. Even if you have a very nebulous idea, you can let Codex take four different approaches in parallel for that idea. And based on that, you see screenshots of the different approaches. And you're like, you know what? One and two don't really resonate with me, but three is very interesting. And then you start using that as a starting point, and you start iterating even more on the tasks.

[00:32:51] I think to me, it's like, how do you get this creativity flowing? And how do you make sure that everything is scaffolding, testing? You can offload that work to the models. But once again, I think what's very critical in this new era is how do you bring your taste, your vantage point? Again, anyone could build a solution to any problem, but I think if you are someone who is obsessed about that problem for months or years, you have a very unique vantage point as to what the solution looks like. And that's where your value comes in. But then you have the model as a teammate to help offload everything else. So I think that combination to me is what's exciting.

[00:33:33] Yeah. I can attest to the value of that best of N using the Codex Cloud interface to have it try and solve a problem in four different ways in a single prompt, and then just look at the results. When you combine it with this approach of producing a living document or a plan, it lets me really quickly survey what did each of these agents sort of, how did they tackle this problem? And I find it really actually very helpful to give it my end goal. What do I actually want to accomplish? And I leave the middle part of, how are we going to solve that problem ambiguous. And I like to see how the different agents tackle a problem. It's like you're embodying an evolutionary algorithm with your code, right? You're trying a lot of things, you're selecting the winners, you're going forward with the winners, maybe spawning some new variants and moving on.

[00:34:20] Absolutely.

[00:34:22] Okay, are you running several of these at once?

[00:34:26] Yeah, I frequently do. And so actually, when I was preparing for this, I was running, I think, four or five of these in parallel. And actually, the repository that I ended up uploading is the first one, so I felt pretty pleased with that result. [00:34:46] This is a fairly complex task, and I found that the success rate for completing it in a single prompt was over 75% in my tests here, so you can imagine just firing off a few of these and you're going to have some very good success rates.
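
For readers who want to mimic this best-of-N pattern outside of Codex Cloud, where it is a built-in option, here is a rough Python sketch: fire off several independent attempts at the same prompt and compare the results, including each attempt's plan file, by hand. The prompt text, the per-attempt directories, and the `codex exec` call are assumptions for illustration.

```python
"""Rough sketch of local best-of-N: N independent Codex attempts at the same prompt."""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

PROMPT = "Rewrite bazel-diff in Rust; track your work in PLAN.md."  # hypothetical shared prompt
N = 4

def run_attempt(i: int) -> Path:
    workdir = Path(f"attempt-{i}")
    workdir.mkdir(exist_ok=True)
    # Each attempt gets its own directory so the agents can't step on each other.
    subprocess.run(["codex", "exec", PROMPT], cwd=workdir, check=False)
    return workdir

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=N) as pool:
        results = list(pool.map(run_attempt, range(1, N + 1)))
    print("Compare PLAN.md and the diffs in:", *results)
```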

[00:34:58] Okay, and how about several different projects that don't have a lot to do with each other? Are you tackling several disparate projects at once with different agents?

[00:35:03] Yeah, so I'm sort of a little bit limited in my own context window and my ability to switch between tasks, but I do find it very helpful for whether I'm talking to another engineer about a problem to fire off a Codex task and give me some background information. I can sort of file that away and I treat the Codex cloud interface as an inbox for me to look at later.

[00:35:29] That gives me the ability to parallelize a number of tasks throughout the day, which I can then come back to and review. I can say, okay, I really like this approach. I need to expand on this one; I'll file that for tomorrow. I really need to get to inbox zero in my Codex cloud. I'm still working on that.

[00:35:51] Yeah. Romain, what do you see in terms of people parallelizing their agents in multiple directions at once?

[00:35:56] Yeah, I mean, we are also doing it ourselves on my team because we support an increasingly larger set of platforms, right? Like we have ChatGPT about to become an app ecosystem with the Apps SDK, we have obviously the API capabilities that keep on getting larger, the Agents SDK, AgentKit, and so on. We use Codex, of course, to accelerate this work ourselves.

[00:36:20] Now it's pretty amazing to see how you can get a lot more done by getting organized with all of your Codex teammates. For instance, if you say, hey, I have all of these docs to rewrite in this particular way, I can offload this task, it's probably gonna take half an hour, and I'll check back on that. In the meantime, I want to focus on these other things.

[00:36:41] It's almost like you keep increasing the possibility of parallelizing, because beforehand, without these tools, you really had to be focused to go to completion. But now you can really trust Codex to take a very strong pass at a problem before coming back to verify it.

[00:37:01] So yeah, we use it a ton on our side and I think talking to also founders and builders, the very best ones or the most curious ones with these tools tend to lean into not having much of a backlog anymore. Because what is a backlog? It's a set of ideas you'd like to get to but probably won't until some time later. Well, Codex can probably take a first pass at many of these.

[00:37:30] I think when you start living in that future, almost every task is something you can take a V0 pass at, you know? So it's very interesting.

[00:37:34] Yeah, I think something we're really excited about in the developer productivity team is doing things like every time a test is slow in our CI system, let's spawn a Codex task to see if we can make that faster. Every time a test fails and an engineer says this was a flake, let's try and have Codex fix it.

[00:37:52] We get this sort of automatic triage of every incoming issue, and that's something that we are seeing. We're already working on things like this internally and having Codex automatically fix issues in my pull requests as I make them. It's not just going from review but it's going to automatic triage and resolution.

[00:38:05] If I can sketch out 99% of a solution but there's like a linter error, I don't want to have to spend time and come back to this, check out the code, work on it, and make sure that it passes. It'd be really great if an agent would just close that gap for me, and that's what we're seeing.
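
As a rough illustration of that automatic-triage idea, here is a small Python sketch of a CI hook that files a Codex task whenever a test is flagged as flaky or slow. The test-report format, the thresholds, and the `codex exec` call are all assumptions for illustration, not OpenAI's internal setup.

```python
"""Sketch of a CI triage hook: flaky or slow tests each spawn a Codex task."""
import json
import subprocess
import sys

SLOW_SECONDS = 60.0  # hypothetical threshold for a "slow" test

def triage(report_path: str) -> None:
    # Assumed report format: [{"name": ..., "status": ..., "seconds": ...}, ...]
    with open(report_path) as f:
        cases = json.load(f)
    for case in cases:
        if case["status"] == "flaky":
            prompt = f"Test {case['name']} is flaky in CI. Find the race or ordering bug and fix it."
        elif case["seconds"] > SLOW_SECONDS:
            prompt = f"Test {case['name']} takes {case['seconds']:.0f}s in CI. Profile it and make it faster."
        else:
            continue
        # Fire-and-forget: each issue becomes its own Codex run, reviewed by a human later.
        subprocess.Popen(["codex", "exec", prompt])

if __name__ == "__main__":
    triage(sys.argv[1])
```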

[00:38:25] I have a feature request. I need Codex to help me get better at context switching.

[00:38:31] Yeah, can we do that? We'll see if we can work on that.

[00:38:41] We're unlocking productivity gains and people are really working through their backlogs. What are the key skills? What is the bottleneck that people can get better at to maximize what these tools can offer?

[00:38:53] Yeah, I think what remains critical is design, taste, and point of view. Ultimately, you're going to be in the driver's seat to decide what is a great feature versus what is good enough, and if you want to shoot for outstanding, you're the one that has to bring the taste for the problem you're trying to solve for your customers.

[00:39:18] So, I think that's still critical. To your point of multitasking, you still have to collaborate with your human teammates, and now you have those AI teammates into the mix.

[00:39:30] I think being crisp in terms of communication with your team, making sure you have a great separation of concerns, those things remain true. It's just that you now embrace more potential with these AI teammates that take on tasks and then come back to check in when they're ready.

[00:39:52] Yeah, so I'm hearing taste, kind of a knowledge of what's needed. It's how we create a feedback loop with these agentic tools. Like, we're the supervisor, in the sense of supervised learning, which is probably hard to replicate. I think it's always been true in software that just getting to clarity on what ought to be built is hard, right? What are your guys' best tips on doing that?

[00:40:17] Yeah, I joked the other day, demoing some of these things to some folks internally, that all of us engineers are sort of becoming managers in a way. And with sub-agents and tasks that spawn more tasks, we're all sort of becoming directors. We've all got a little bit of a promotion there.

[00:40:38] And I think that in terms of making sure that you're adopting these coding tools, or even LLMs and AI tools in general, the key is to be flexible and keep learning, right? These things are changing very quickly, I think we all acknowledge that. And the capabilities are expanding year over year.

[00:40:56] So I think being willing to have a strategy that works, experiment and try things out, and then to say, okay, the new version, the next GPT, can solve actually even more problems than before, and I don't need to tell it necessarily how to do this and this. I think that's very exciting.

[00:41:18] So it seems to me that one of the great things about this tool is that it allows humanity to explore the space of possible solutions. Many new products that haven't been imagined or only dimly imagined are possible now.

[00:41:35] Okay, quickly, last question. What do you wish somebody would build that hasn't been built yet?

[00:41:40] Oh, yeah. You play Dungeons and Dragons, maybe it has something to do with the game. You know, I love the art and artistry that goes into making Dungeons and Dragons modules, so I don't want to besmirch the wonderful people that create all of that work.

[00:41:58] But I think for me, there are so many little things I would like to do to organize my own life, and maybe Codex can help me build these things too, help me keep track of where all the stuff is in my home, or to help me organize my closet and actually make it sensible for me.

[00:42:15] I'm really excited for a future in which software is very cheap, if not sort of free.

[00:42:23] Yeah, I don't have one specific idea that comes to mind, but it reminds me of John Collison's tweet a while ago, which I really loved, about the world being like a museum of passion projects. I'm like, how many of these passion projects actually should exist, but we don't get to see, just because life gets in the way and they don't have the time to build them, or they don't have the skills to build them.

[00:42:49] And hopefully tools like Codex can be the bridge, like all of these passion projects that only live in your head, hopefully can actually now become real.

[00:42:58] Yes, we're heading into a flourishing, I totally agree.

[00:43:01] Okay, with that, Natalie Cone is going to open this up for Q&A. So people who have questions for Friel and Romain, please shoot them into the forum and we are here to answer what's on your mind.

[00:43:13] Absolutely, and if we could show real quick, this task is still working and it's continuing to write tests and iterate on the plan. We kind of expected this, again, took many hours last night, but I've pushed up a version of this to GitHub and we can see that, for example, it has produced passing CI checks, really thorough formatting, linting, testing, checking parity with the upstream repository, and some really great documentation as well.

[00:43:43] From the README file to, sorry for the scrolling here, I know that's always a little distracting, to architecture documents that I can then look at and say, okay, this is actually how this project is laid out, this is a great starting point for me to understand what it did, and I think it goes back to what we were talking about earlier, is have the model output something that you would personally enjoy reading, learning about, and using.
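
The parity checks mentioned above are the part that makes a long autonomous run trustworthy. As a rough Python sketch of what such a harness could look like, the script below runs the reference Kotlin tool and the Rust rewrite on the same workspace and requires identical output. The binary names, subcommand, flags, and paths are hypothetical placeholders, not the actual harness Codex generated.

```python
"""Sketch of a parity harness: reference tool vs. rewrite on the same inputs."""
import subprocess
import sys

def run(cmd: list[str]) -> str:
    """Run a command and return its stdout, failing loudly on errors."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def check_parity(workspace: str) -> bool:
    # Hypothetical invocations: produce hashes for the workspace with each implementation.
    reference = run(["java", "-jar", "upstream/bazel-diff.jar", "generate-hashes", "-w", workspace])
    candidate = run(["./target/release/bazel-diff-rs", "generate-hashes", "-w", workspace])
    if reference != candidate:
        print(f"Mismatch for {workspace}")
        return False
    return True

if __name__ == "__main__":
    workspaces = sys.argv[1:] or ["testdata/simple_workspace"]
    ok = all(check_parity(w) for w in workspaces)
    sys.exit(0 if ok else 1)
```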

[00:44:07] Awesome, thank you.

[00:44:08] Okay Natalie, over to you.

[00:44:13] Hey guys, that was awesome. Friel, I did not know that you were a Dungeons and Dragons fan.

[00:44:20] So I've actually been wanting to curate a small Dungeons and Dragons, what would you call it? Not a competition, but you know.

[00:44:26] A campaign?

[00:44:29] A campaign, a gathering for forum members. So I'll definitely tap you for that in-person event. And if anyone here watching is interested, please raise your hand and let us know so we can invite you.

[00:44:42] And one thing that I've noticed over the years of hosting this community, fellas, is that people in our community, or just in the public, are often surprised by who we are at OpenAI. They only know us by our products. So I just wanted to highlight a few fun facts. So Dungeons and Dragons for Friel, and Chris, Chris and I are on the same team. Chris is actually a journalist. But Chris, you just shared with us that you lived in France for 14 years. And we know that Romain is also French. And so we're hoping that the two of you, let's bring you back and do something for the developers in French, in your native language. Would you guys be willing to do that?

[00:45:27] Oh boy, that should be a lot easier for Romain than for me, Natalie. I'll give it my best shot.

[00:45:30] I totally believe in you, Chris. Okay, let's dive into the questions. But that was really beautiful. We loved it. And there are a lot of people live here today. So I'm so excited we did this. This is from a PhD researcher, Madina Akhan. And she asked, is Codex good for beginners in coding? How does Codex support beginners? Maybe Friel, we'll first kick that to you. And then I'm sure Romain might have something to say about it.

[00:45:59] Yeah, I think I mentioned earlier that I've been supporting all of the engineers here who work on a myriad of different products internally or externally. I've had to learn a lot of frameworks and toolkits that I've never seen before. And being able to use Codex to ask questions about how does this work, or how should I do this thing? I think it's an incredible tool. And it's really vital, I think, as we think about building software, making sure that we understand what it's doing. And so I think, yeah, Codex is an incredible tool for people just learning to code. We've optimized the models, too, as we saw on your screen, to go for very, very long, complex tasks. And obviously for professional engineers.

[00:46:45] But I think, as we alluded to in the conversation, it gives superpowers to very different kind of profiles. Imagine you're not an engineer, but you're a product manager or a product thinker, now you get more technical. Thanks to Codex in the cloud and Best of N, you can actually start creating prototypes of features. And I think that's really powerful.

[00:47:05] Yeah, I think it's important to highlight that with GPT 5.1 Codex, we have this adaptive thinking where it spends the right amount of time essentially working on a problem. And so if you're asking it a quick question, you'll typically get a quick answer. And so not every use of Codex is starting a task and walking away for a few hours.

[00:47:26] And Natalie, if I could jump in.

[00:47:28] Of course.

[00:47:29] I've seen example after example of people bootstrapping their own ability to code with LLMs in general, let alone Codex, which actually has its hands in the code, so to speak. And I think the crucial feature there is that LLMs are an infinitely patient tutor that doesn't judge you. So people actually end up asking the real questions that they might be embarrassed to ask a skilled engineer.

[00:47:52] Yeah, awesome. And I think that ties us back. Chris and Romain, we hosted the CEO of Stack Overflow not too long ago, Prashanth Chandrasekar. And Stack Overflow is weaving in AI, building upon OpenAI's APIs, to kind of give newer engineers that freedom, or lack of judgment, to ask questions and learn. So Chris, that really resonates with what Prashanth was telling us is kind of a new feature at Stack Overflow.

[00:48:17] And Chris, you're not necessarily an engineer by trade, but you are quite technical. You're the most technical person on our team. Do you use Codex?

[00:48:24] Oh yeah, I'd say almost every day. And that, in fact, is what led me to Romain and to Friel: it was questions I had about better ways to use Codex.

[00:48:34] Awesome, yeah, and just so everyone knows here today, actually, this was all, we wouldn't be here if not for Chris. This talk was Chris's idea. So thank you so much for bringing this to the table, Chris.

[00:48:48] This next question is from Svetlana Romanava. She is a machine learning engineer and asks, in your own day-to-day work, what concrete practices have turned LLMs into a reliable part of your development workflow? For example, in terms of testing, code review, or architecture, rather than just an occasional productivity boost? Maybe Friel, like you're the... oh, Romain, you've got something.

[00:49:12] No, I was just thinking about my own experience, for instance. On our team, we try to make sure the onboarding for developers is very, you know, easy and all of that. I think there are many times where we discuss ideas or features that, without tools like Codex, we would probably put on the back burner, and now there's no reason to do that. It's like, if we have an idea, great, let's actually experiment with it right away, because it's pretty cheap to actually build those ideas. Codex can take the majority of the work. So I think it's actually changed the way we perceive work and difficulty, and we can achieve just so much more.

[00:49:54] Yeah, I think, yeah, reducing the friction to get started on something and to just try something out, that's been incredible. I think having these plan files or some sort of living document that the model produces and works on and produces a sort of a testament to what it's been working on has been incredibly valuable and probably more valuable than any specific focus area on testing or any other aspect of architecture in building a product.

[00:50:22] Awesome. And Romain, your team has been growing. Speaking of collaborating with your team and experimenting with projects, do you wanna give a shout out? I'm curious, who's working on your team now? And yeah, let us know, who is the developer experience team at OpenAI?

[00:50:34] Well, there's a handful of us, but I'm very excited about new teammates joining in the coming weeks. But yeah, we now have a presence in SF with people like Dominic and Corey. They're kind of like the experts on things like Codex or the ChatGPT Apps SDK coming up soon. But we also wanna be meeting developers where they are and be present not just in the SF Bay Area, but also where they build and where they need support from OpenAI.

[00:51:06] So we have teammates in Asia, in Europe, like Katya, for instance, in Paris, Kaz in Japan, and a handful more coming very soon. So what we're very excited about is being that bridge between like the 4 million developers building on OpenAI every day and all our talented researchers, product engineering teams while building things for them.

[00:51:29] Yes, awesome. We can't wait to meet your new teammates. Okay, let's dive into another community question. This is from Daniel Green and I actually wanna give a shout out to Daniel Green because he's been in the chat the whole time answering questions and dropping links. Daniel is really such an awesome community builder.

[00:51:46] And he asks, which non-technical skills have you found the most helpful for engineers shifting from IC dev work to guiding Codex?

[00:51:55] Wow, interesting. Yeah, I think that as an engineer, I've had to put on sort of the product manager hat more and more often. And I find it very helpful to chat with folks who actually do that role, right? And learn from them.

[00:52:15] And to think about, like I said earlier, thinking about sort of the end state of a problem and how we get there, thinking about solutions. And it's just sort of, it's a new sort of skill that I'm picking up, and I found it incredible. And it translates really well to giving Codex instructions.

[00:52:29] Totally, yeah. Comes back to communication and discernment. And I think maybe a couple years ago, we talked a lot about prompting techniques and prompt engineering. Because you had to kind of steer the model in a particular way by doing some tricks.

[00:52:44] But I think what matters more now, the models being so capable, is just the precision in how you describe the problem and where you'd like the solution to go. And I think you would describe such a thing, to your point, like a product strategist or a product thinker. The more precise you are in what you're expecting the model to do, the better the output.

[00:53:07] So I think it puts the spotlight once more on discernment, taste, and communication, in some ways. And, Natalie, if I could jump in, I think something everybody should know about LLMs is that the model can't see the whole world.

[00:53:20] Right, it can't see the world. So as an LLM user or manager, like a question to keep in mind all the time is what is it about the world that the model can't see? It has the intelligence to process data about the world if you can get at that data, but what is it that it hasn't seen yet? Because then your job is to go out and get that data.

[00:53:42] Yeah, actually, that's a really funny point: in several of the iterations of working on the demo for today, I ran into issues where I'm on a Mac and I'm pushing this repository to GitHub and running it on these Linux runners. There are some discrepancies there.

[00:54:00] And we think about like, the question I had sort of coming into this and didn't have time to sort of prepare was if I'd given it the GitHub MCP and told it, oh, you can actually go ahead and look at the test results, would it have just solved that autonomously, right?

[00:54:13] And so when I think about using Codex, MCPs, and connecting it to other systems so that it can do even more work and learn about the world, or learn about what it's actually trying to accomplish, we can do really incredible things.

[00:54:28] For sure. I think that's what's powerful with these coding tools now and with MCPs. Teams and companies, we all tend to live in an ecosystem, right? Where the data lives, where your backlog lives, like the bugs, Slack obviously for team conversations. And now, having a coding teammate that can get all of that context, of course you get so much better results.

[00:54:49] Yeah, yeah. And just to take it back to that, Codex is more powerful than, say, just an LLM you talk to through a GUI, because Codex can see things happening on your computer. So it's already more powerful because it can see more.

[00:55:00] But that final integration test with the user, right, still requires people like Romain and Friel and me to go out and take whatever we've intuited from the world, based on our conversations with real humans who have real problems, and share that in order to create a compass for what Codex ought to do.

[00:55:21] Definitely. So going back to AI is a tool that helps us solve hard problems. I love that, Chris, thank you for clarifying.

[00:55:30] Okay, we have so many good questions. It's really hard to pick, but I'm gonna give us our last one. And this is from Andrew Mendez, a deep learning solutions engineer at HPE. He asks, what would you recommend software engineers do to stay sharp with coding skills when tools like Codex write a lot of the code, so that developers don't lose their sharpness?

[00:55:52] And I just wanted to remind us how Joaquin, our new head of people, who was formerly leading Preparedness, took a few steps back for about four months to become an intern, because he said he wanted to get back into the weeds of coding. And this question just reminds me of that.

[00:56:16] So do you think that software engineers need to keep their coding skills sharp? And if they do, what types of personal practices, routines, and rituals should they be engaging in, fellas?

[00:56:31] Yeah, I think as we move to a world where lines of code are cheap, right, in the end, the proof is in the pudding: does the code do what I want it to do? It's really important to be able to at least read some of that code.

[00:56:48] And I find that I spend more of my time, and this is probably true of a lot of engineers even prior to the advent of LLMs, reading code: reading to understand, reading to debug. Now it's just reading the output of these models and making sure that it aligns.

[00:57:04] You mentioned discernment, and I really love that word here. It's discerning whether or not the output is something that I think is quality. And then, if it's using a technique I'm not familiar with, asking it why, or how, or what is actually going on under the hood here.

[00:57:19] And that all makes me, I think, a better, more discerning software engineer, but it also feeds back into prompting the model the next time. So I agree, I think reading code is actually the bulk of what an engineer does, like, you know, reviewing the PRs from your teammates and so on, while now you just add AI teammates into the mix.

[00:57:41] And I think that's how you stay sharp too, because you still have an understanding of the whole picture and the whole system, even though you collaborate with these AI agents. And I think you can also learn from the mistakes being caught by code review, by Codex.

[00:57:55] And you're like, wow, okay, I didn't think of that. So yeah, that makes so much sense now that I see it.

[00:58:00] Yes, I love this. Fellas, thank you so much. This was wonderful. I can't wait to have you back for multiple different iterations in different languages.

[00:58:10] It was really lovely to have you for the lunchtime talk. Chris, Friel, Romain, such an honor to have you in the forum today. I will let you all know when the recording is ready so that you can share it in your communities.

[00:58:21] Thank you so much, fellas.

[00:58:23] Thank you, Natalie, thank you everyone.

[00:58:23] Thanks, Natalie.

[00:58:24] Yeah, thanks, Chris.

[00:58:25] All right, friends, just to wrap things up, we have a few events in the pipeline that I'd like to make you aware of. We are rapidly approaching the end of 2025, and on the horizon, we have Discoveries Across Disciplines.

[00:58:37] And this is going to be a restream of an in-person event that we hosted a few months back with faculty researchers across disciplines. We specifically want to share with you that we have Catherine Elkins, who is a social science researcher in academia who has been leveraging AI to advance her research and her students' research, and Leonardo Impett, who comes from the Max Planck Institute and Cambridge, and he is actually an art historian.

[00:59:11] So we're gonna see how AI is really moving the ball down the field for researchers outside of STEM with that one. And then on December 15th, we have an in-person event that will also be live streamed to anyone in the world.

[00:59:25] And it is going to be featuring our VP of OpenAI for Science, Kevin Weil, as well as Brian Spears from Lawrence Livermore National Laboratory. [00:59:36] So if you'd like to join us in person, please let me know; you can request an invite. We do have limited seats for that because it's in person.

[00:59:43] But you can also just join the virtual stream if you're interested in that piece or you're coming from another part of the world. Thank you so much, everyone, for joining us today.

[00:59:50] I'm Natalie Cone, the head of the OpenAI Forum, and it is always such a pleasure to have you. See you next week.
