Event Replay: Decoding Biological Intelligence: Building AI Agents for the Brain Genome

Posted Apr 23, 2026 | Views 752

# AI Science

# AI Research

# Healthcare

Speakers

Xin Jin

Co-founder @ PerturbAI

Xin Jin is a neuroscientist and molecular biologist whose work lies at the intersection of in vivo functional genomics and disease biology. She is an Associate Professor in the Department of Neuroscience and the Dorris Neuroscience Center at Scripps Research, and an HHMI Freeman Hrabowski Scholar. Her lab develops scalable in vivo genomic technologies to uncover how genetic programs shape brain circuits across development, homeostasis, and disease. She recently co-founded PerturbAI to help bring this vision into therapeutic discovery.

Her work has pioneered in vivo Perturb-seq, a high-throughput approach that combines pooled CRISPR perturbation with single-cell readouts directly in living tissue. By integrating in vivo CRISPR screening with molecular, spatial, and whole-brain imaging approaches, her research aims to define how genes influence cell types, tissue architecture, homeostasis, and neural circuit function.

Xin’s contributions have been recognized by honors including the HHMI Freeman Hrabowski Scholar Award, Sloan Research Fellowship, Pew Biomedical Scholar Award, McKnight Scholar Award, Peter Gruss Young Investigator Award from the Max Planck Society, and MIT Technology Review’s Innovators Under 35. Before joining Scripps Research, she was a Junior Fellow in the Harvard Society of Fellows. She received her PhD in Biology from The Rockefeller University and her BS in Chemistry from MIT.

Grace Zheng

Co-Founder, CEO @ PerturbAI

Genomics and machine-learning leader with over 13 years of experience developing breakthrough platforms at the intersection of biology, data, and therapeutics.

I am the Co-Founder and CEO of PerturbAI, an AI-native therapeutics company accelerating drug discovery through systems-level understanding of disease. Our platform integrates in vivo Perturb-seq with AI agents and models to map biological circuits directly inside intact organisms, enabling a new generation of biological models and therapeutics grounded in causal, in vivo biology.

Previously, I served as VP of Computational Biology and Machine Learning at ArsenalBio, where I built and led interdisciplinary teams spanning computational biology, machine learning, and software engineering. I also led strategic collaborations with partners including Genentech and NVIDIA to generate large-scale functional genomics datasets and develop AI models for cell therapy discovery.

Earlier in my career, I joined 10x Genomics as one of the company’s first employees, where I helped pioneer and launch several foundational single-cell genomics technologies.

My work focuses on building enabling technologies, scaling interdisciplinary teams, and turning biological data into predictive models that can guide the next generation of medicines.

Joy Jiao

Life Science Research Lead @ OpenAI

Joy Jiao leads the life sciences team at OpenAI. The goal of the team is to accelerate basic research and drug discovery, by operating across the model training stack to improve model capabilities at all levels of biology, from molecules to organisms. At OpenAI, Joy previously worked on model safety, personalization, search, and representation learning. She holds a PhD in Systems Biology from Harvard, where she studied the in-patient evolution of cancer cells during immunotherapy as well as the evolution of antibiotic resistance.

Natalie Cone

Forum Community Architect @ OpenAI

Natalie Cone launched and now manages OpenAI’s interdisciplinary community, the Forum. The OpenAI Forum is a community designed to unite thoughtful contributors from a diverse array of backgrounds, skill sets, and domain expertise to enable discourse related to the intersection of AI and an array of academic, professional, and societal domains. Before joining OpenAI, Natalie managed and stewarded Scale’s ML/AI community of practice, the AI Exchange. She has a background in the arts, with a degree in History of Art from UC Berkeley, has served as Director of Operations and Programs as well as on the board of directors for the radical performing arts center CounterPulse, and led visitor experience at Yerba Buena Center for the Arts.

SUMMARY

This conversation framed biology as a field moving from description to prediction. Grace Zheng emphasized that modern sequencing, imaging, single-cell measurement, and editing tools are making it possible to see biological systems more realistically and model how changes may affect outcomes. Xin Jin described predictive biology as a shift from asking what something is to asking what happens if it changes, whether through a mutation or a drug intervention. Natalie Cone connected that framing to the broader OpenAI science effort, including GPT-Rosalind, which is intended to uplift life science research and accelerate discovery. The discussion repeatedly returned to the idea that biology is too complex for any one lab to measure experimentally in full, which is why AI-assisted prediction can meaningfully change research and medicine.

While we didn’t have time to get through all of the audience questions live, Grace and Xin kindly followed up with written responses linked here. https://tinyurl.com/44jzepa9

TRANSCRIPT

[00:00:00] Speaker 1: Hi, everyone, I'm Natalie Cone, head of the OpenAI Forum Community, and I'm so pleased you were able to join us this afternoon for another expert discussion on how AI is advancing science and laying the groundwork for solving some of humanity's hardest problems. What if we could predict disease before it starts, provide faster treatments to each person, and discover new medicines in a fraction of the time? AI could unlock this future for human health care. Today we're joined by three leaders helping make that future possible. Joy Jiao, life science research lead at OpenAI, will be facilitating our talk today, welcoming Grace Zheng, co-founder and CEO of Perturb AI, and Xin Jin, co-founder of Perturb AI, professor at Scripps Research, and a Freeman Hrabowski Scholar at HHMI. Perturb AI is working to transform biology into a predictive science by combining one of the largest in vivo brain data sets ever assembled with advanced AI systems that can help scientists explore complex biological data in entirely new ways. OpenAI has been proud to collaborate with Perturb AI through our science efforts, reflecting how deeply we believe AI can advance scientific discovery along with world-class research partners. That includes our work on GPT-Rosalind, a new AI system built to advance life science research and speed biomedical discovery. Together these efforts aim to lower barriers to discovery, expand who can participate in research, and accelerate the path from breakthrough science to new medicines and better care. Please join me in welcoming Joy, Grace, and Xin to the OpenAI Forum Community Stage.

[00:01:50] Speaker 2: So good to be in the studio with everyone, thank you so much for the intro, Natalie. Grace and Xin, to start us off, what do you think is changing in biology right now that makes this moment feel different from even just a few years ago?

[00:02:01] Speaker 3: What I'm seeing is that biology is shifting from a descriptive science to a more predictive one. Quite different from math and physics, which are full of theorems, biology is a discipline with few organizing principles but lots of exceptions. I can recall a handful of theorems like the central dogma of biology, but whenever people ask me about biology, the answer is always that it depends. The same mutation can have very different effects in different people. One gene can have dramatically different functions depending on the cell type it's in. For a long time, what people ended up having to do was choose what kind of questions to ask, and then go really deep and measure those. But now, with the technology advances, what I'm seeing is that we can finally start to approach biology as a whole system. Some of the technologies that I have witnessed allow us to measure biology much more realistically. For example, the sequencing technologies and the imaging technologies finally allow us to see what things are and where things are. The single-cell technologies allow us to measure things at single-cell resolution instead of on average. And then the amazing work that is done by the Human Cell Atlas and the Brain Institute is giving us the opportunity to set up reference maps of what's happening in healthy cells and healthy tissues. And last but not least, the editing technologies allow us to make a change and see what happens. So we're at an explosive time of technology, and as a result, biology is changing for the better.

[00:04:01] Speaker 2: Very exciting! What do you mean by predictive biology?

[00:04:04] Speaker 3: Yeah, as we were talking about, biology is experiencing the transition from a descriptive science to a predictive one. So how I think about it is when we talk about descriptive biology, it's talking about what things are, but when we're talking about the predictive one, it's what happens if something changes. For example, what happens if there's a genetic mutation? What happens if I take a medicine? What's going to happen to my body? Fundamentally, medicine is a prediction and a perturbation problem. We want to see what happens if someone takes a drug, and how it's going to affect their outcome.

[00:04:52] Speaker 4: Yeah, I think biology is dynamic. It's like having a map to San Francisco or New York City. These are cartographies and generating a descriptive biology like Grace is talking about.

[00:04:58] Speaker 1: But having the map doesn't mean you know how to enjoy New York City. So I think being able to generate these predictions to some degree, but really understanding the system as a dynamic entity, is really where the therapeutics and very effective treatments might come from. Yeah, and you know, ideally, if we had a way of measuring everything, that would be great. But fortunately or unfortunately, biology doesn't give us a single variable, right? It gives us so many variables in terms of the number of genes and the number of proteins and tissues, and on top of that, the diseases. So, you know, the number of combinations is just astronomical. It's impossible for any one lab to measure experimentally what's going to happen if something changes. And this is where predictive biology comes in, right? Without it, medicine and biology become a trial-and-error problem. Makes sense.

[00:06:01] Speaker 2: Yeah, so biology has historically been very difficult to model and to predict. What do you think has changed now that makes prediction more realistic? Is it, you know, something that happened across data, compute, or AI?

[00:06:11] Speaker 1: Yeah, well, you know, I think it's a combination of all three, if not more, right? We just talked about the data: because of all the technology advancements, we can measure not only DNA and protein, what's happening inside the cell, but also how it changes longitudinally over time, across different spatial contexts, and across different disease contexts. So it gives us a systems view of what's happening in the biology. Then you mentioned compute, and nowadays we can finally train models that we couldn't have imagined a few years ago. That's right. And then lastly, you know, AI, right, it has really just revolutionized our lives.

[00:07:01] Speaker 2: And Joy, maybe you, you know, want to talk about GPT-Rosalind, right?

[00:07:08] Speaker 3: Yeah, so Rosalind is really, I think, kind of the beginning of a series of models that we want to release that is focused on helping uplift the life sciences and kind of accelerating research. I think the way to kind of mentally understand where Rosalind is: it's still a reasoning model. So this is not kind of like a new biology foundation model, but the difference is, let's say you imagine that you have a maybe first-year grad student trying to do a task in biology. So it can still access things like AlphaFold, or it can still do a single-cell RNA-seq analysis, but this first-year grad student kind of still has a lot to learn, right? Even though it kind of has a foundation, there's kind of a lot of tacit, expert knowledge that is kind of the outcome of this process of getting a PhD. And so what we really want to instill into the Rosalind model series is kind of this expert tacit knowledge. And so I think ideally what we want to see is that the model is kind of beyond PhD level at all of these tasks. And it also just has kind of great intuition for when to call these tools, and a really kind of foundational understanding of the biological and chemical world.

[00:08:25] Speaker 1: But yeah, also I think obviously biology foundation models still have a place in the world. And so Grace, I don't know if you want to talk a little bit about that as well.

[00:08:34] Speaker 2: Yeah, there are foundation models, predictive models, and tons of effort going into that from industry and academia. In the work that's done at CZI, the Arc Institute, Xaira, and Genentech, we've seen wonderful examples and lots of improvement. So I can only imagine where we'll be just in the next year with the advancement in predictive biology.

[00:09:04] Speaker 2: Great, all right. Well, thank you so much, Grace and Xin. With that, I'll let you present a few slides that you prepared, and then I'll jump back on in a few minutes to continue our conversation.

[00:09:14] Speaker 1: Great.

[00:09:16] Speaker 2: All right. So we all know that one in every five adults will experience some form of serious mental illness in their life, and that is over 60 million people in the country. And for the longest time, we have known that brain disorders, neurodegenerative diseases, and also other types of chronic disorders are very common and extremely costly. Many of them are also highly genetic. So these conditions impact a lot of people, and we still don't understand their genetic mechanisms well. So the question is, why is this so hard and so difficult?

[00:09:56] Speaker 1: After decades of beautiful human genetics research, we now know that there are long lists of genetic variants, or genes and mutations, that are highly associated with these devastating diseases and disorders. The gene lists are getting longer as gene discovery continues and sequencing gets cheaper, faster, and better. And yet, what we don't fully understand is how each individual genetic variant actually manifests as disease risk or resilience by acting on many different cell types in our body.

In other words, every single gene's job in different cell types in our body is just different. You can think about these different cell types like different citizens in a village, and their jobs can be highly specialized, such as the jobs of a nurse versus a teacher in a village. When disease happens, they also have different jobs—some serve as the first responders and some are compensatory. These jobs sometimes can be really hard to replace. We really need to understand how each individual mutation that contributes to disease risk acts differently across different kinds of cell types.

Putting this together is really a challenging problem because we're asking not just about many of these disease risk genes, but also how each gene's impact manifests in different cell types. What we're really interested in building is this kind of functional analysis. Grace just talked about the difference between descriptive and predictive biology. So rather than just looking at a static picture of the cell types, or the village, we need to learn more than that: to build a functional analysis, we introduce CRISPR mutations.

When disease contexts kick in, and when disease mutations kick in, how do different citizens in the village react, compensate, and cope with the situation? This is obviously a dynamic situation, and that is the picture we really want to build. If we want to study many genes across many different cell types, you will realize that the number of possible combinations becomes enormous quite quickly. Across tens of thousands of different genes in our genome, across hundreds if not thousands of different cell types, we are looking at at least 20 million or so unique combinations to discover how each gene acts differently across cell types.
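The scale of that search space is simple arithmetic. As a rough sketch, taking 20,000 genes and 1,000 cell types as illustrative round numbers (assumptions chosen to match "tens of thousands" and "hundreds if not thousands", not counts from the atlas), and a hypothetical in vitro screen limited to three cell types:

```python
# Rough size of the gene-by-cell-type search space described in the talk.
# Both counts are illustrative round numbers, not figures from the atlas.
n_genes = 20_000        # "tens of thousands" of genes in the genome
n_cell_types = 1_000    # "hundreds if not thousands" of cell types

n_combinations = n_genes * n_cell_types
print(f"{n_combinations:,} gene x cell-type combinations")

# An in vitro screen that covers every gene, but only in the few cell
# types that grow in a dish, samples a thin slice of that space.
n_in_vitro_cell_types = 3  # hypothetical
fraction = (n_genes * n_in_vitro_cell_types) / n_combinations
print(f"in vitro coverage: {fraction:.1%}")
```

With these assumed numbers the product is 20 million pairs, and the three-cell-type screen covers well under one percent of them.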

On the experimental side, there are great advancements in CRISPR technology and genomic phenotypic readouts that allow us to map this space using technologies such as Perturb-seq. However, if we're conducting experiments in vitro, analyzing thousands or tens of thousands of genes across the few cell types that we can grow in a petri dish, we are really sampling a very thin slice of this huge search space, because cell type diversity is very hard to grow or mimic in a petri dish.

[00:12:57] Speaker 2: So at Perturb AI, we're building a new type of training data for biology and medicine. We use CRISPR to introduce perturbations directly in the brain and measure the outcomes at single-cell resolution. Rather than just observing the association between genes and cells, we're now starting to understand the underlying logic. When you introduce a change, which cells will respond and how?

The result is the largest in vivo CRISPR atlas of the brain. It involves hundreds of cell types and covers 2,000 risk genes associated with neurodegenerative, psychiatric, metabolic, and inflammatory diseases. Most of those genes have never been disrupted in the brain, and definitely not studied at cell type resolution. This scale dramatically opens up a new functional search space for predictive biology.

Since we open sourced the data, the response from the community has just been overwhelming. We've had thousands of downloads from academia and industry, reflecting the huge demand for such causal data. We've had lots of collaborations and deep discussions with technology and pharma companies. We’re truly gratified by the level of interest we're getting and very excited about what the entire community will discover. Our excitement comes from the level of resolution and what this data reveals.

What you're seeing is a whole living system of brain architecture interconnected with diverse cell types that can never be recapitulated in vitro, all captured at single-cell resolution. You can even see the pyramidal neuron axons projecting all the way to the striatum in interconnected brain regions in their native context.

[00:14:54] Speaker 1: And context is what matters here, because genes do not act in isolation but function in a cell-type-dependent manner, very dependent on the environment they're in and the biological cell states. This is where we can start to learn the biology. Let me give you the concrete example of the GRIN2A and GRIN2B subunits of the NMDA complex. Before this data, most of the knowledge about these two genes came from gene expression and human genetics. GRIN2B is linked to intellectual disability, GRIN2A to schizophrenia. But their disease mechanism remains mostly a black box. Our data helps shed light on this black box. What we found is that GRIN2B functions across cortical and subcortical neurons, whereas GRIN2A function is much more limited. This is completely consistent with the clinical phenotype.

In addition, just because 2A and 2B are part of the same complex doesn't mean their functions are interchangeable. What we found is that when they're disrupted, they have opposite phenotypes. This underscores the point that those two genes need to be treated and drugged very differently. This speaks broadly to the benefit of functional in vivo data. It provides the missing link from genetics to mechanisms and from mechanisms to drug targets. With this, drug discovery becomes much more informed and deliberate. We believe large-scale causal in vivo data will become the underlying infrastructure for system-level drug discovery and core training data for the next generation of AI models. This infrastructure is only beneficial if we can make sense of it. Within days of generating the data, we were able to get critical insights. This would have been very difficult with traditional bioinformatics approaches, which are slow, tedious, and highly specialized.

What I want to show you today is a different way of interacting with science. For this demo, I used our CRISPR atlas as the input and analyzed it directly with ChatGPT and Codex. There are two parts to this demo. First, [inaudible], and second, can you help us decide what matters? Let me start with the raw data. I uploaded the raw data directly to ChatGPT. For the demo, I used a subset and asked ChatGPT to summarize the data, assess the quality, and suggest the next analysis steps.

[00:18:02] Speaker 2: Right away, it recognized this is not ordinary single-cell data. It's a CRISPR perturbation dataset with both gene expression and CRISPR guides captured in the same cells. It correctly identified 8,000 cells, 19,000 genes, and roughly 2,000 target groups. So immediately, you can tell this is a high-resolution causal dataset. It also recognized that the RNA quality looks strong: high-complexity transcriptomes, lots of genes detected per cell, and very low mitochondrial signal. In other words, the dataset is clean enough to support real biological interpretation.
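The quality checks mentioned in this step (genes detected per cell, mitochondrial signal) can be sketched in a few lines. This is only an illustration on made-up counts, not the analysis ChatGPT ran in the demo; the function name `qc_summary`, the gene list, the `mt-` prefix for mitochondrial genes, and the pass/fail thresholds are all assumptions.

```python
# Hypothetical per-cell QC summary for a cells-by-genes count table.
# Gene names and counts are made up; mitochondrial genes are assumed
# to be identifiable by an "mt-" name prefix.

def qc_summary(counts, gene_names, mito_prefix="mt-"):
    """Return (genes detected, mitochondrial fraction) for each cell."""
    mito_idx = [i for i, g in enumerate(gene_names)
                if g.lower().startswith(mito_prefix)]
    summary = []
    for row in counts:
        total = sum(row)
        detected = sum(1 for c in row if c > 0)
        mito = sum(row[i] for i in mito_idx)
        mito_frac = mito / total if total else 0.0
        summary.append((detected, mito_frac))
    return summary

genes = ["Grin2a", "Grin2b", "Snap25", "Gad1", "mt-Co1"]
counts = [
    [5, 9, 40, 0, 2],   # excitatory-like cell
    [0, 3, 35, 22, 1],  # inhibitory-like cell
    [0, 0, 1, 0, 9],    # low-quality cell: mostly mitochondrial reads
]

for detected, mito_frac in qc_summary(counts, genes):
    # Thresholds are arbitrary and chosen only for this toy table.
    flag = "ok" if mito_frac < 0.2 and detected >= 3 else "check"
    print(detected, f"{mito_frac:.2f}", flag)
```

On a real dataset the same two quantities would be computed over thousands of cells and genes, typically with a dedicated single-cell toolkit rather than plain lists.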

Without me telling it anything, ChatGPT correctly inferred that these are mostly brain cells, primarily excitatory and inhibitory neurons, directly from the marker genes in the data. This is already a meaningful biological readout. From there, it suggested the next set of analyses that a good computational biologist would do. For example, the cell type matters; you shouldn't do any analysis that treats all the cells the same way. The takeaway from the first step is that with one natural language prompt, I can quickly assess the quality of the data and the biology inside it. Good data makes the biology readable, and AI makes it much faster to read. But processing is only half the story. The bigger challenge is turning the output into a hypothesis that actually guides the next experiment. Traditionally, this means reading papers, cross-checking databases, and manually stitching together a story for each gene.
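The point about not treating all cells the same way can be made concrete with a toy comparison. In this sketch the cell types, the expression values, and the `effect` helper are invented for illustration; the values are chosen so that a perturbation lowers a readout gene in one cell type and raises it in another.

```python
from collections import defaultdict
from statistics import mean

# Each record: (cell_type, perturbed?, expression of a readout gene).
# Values are invented so the two cell types respond in opposite directions.
cells = [
    ("excitatory", False, 10.0), ("excitatory", False, 12.0),
    ("excitatory", True, 4.0),   ("excitatory", True, 6.0),
    ("inhibitory", False, 5.0),  ("inhibitory", False, 7.0),
    ("inhibitory", True, 11.0),  ("inhibitory", True, 13.0),
]

def effect(records):
    """Mean expression in perturbed cells minus mean in control cells."""
    perturbed = [x for _, p, x in records if p]
    control = [x for _, p, x in records if not p]
    return mean(perturbed) - mean(control)

# Group cells by type before comparing perturbed and control cells.
by_type = defaultdict(list)
for rec in cells:
    by_type[rec[0]].append(rec)

pooled = effect(cells)
per_type = {ct: effect(recs) for ct, recs in by_type.items()}
print("pooled:", pooled)
print("per cell type:", per_type)
```

Pooling all cells gives an effect of zero here, while the stratified comparison shows a strong decrease in one cell type and a strong increase in the other, which is the kind of signal a cell-type-blind analysis would miss.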

[00:19:52] Speaker 1: It works, but it's slow and honestly pretty painful. This is where AI agents become very useful. Let me show that with GRIN2B. This report is generated by a small team of agents. One looks at the biology: which cell types respond, and what may be going on mechanistically. One looks at the clinical relevance: what has already been tried, and what the risks may be. And then a grader puts it together into a simple decision signal.

[00:20:28] In this case, the biology agent shows something important. GRIN2B is not affecting the whole brain evenly. Its effects are concentrated in a specific set of neurons. This is much more useful than knowing something just happened in the brain. And this is only visible because the data has real cell type resolution. The report also shows what's changing in the cells. When GRIN2B is disrupted, these neurons lose genes involved in synaptic communication, basically genes that help neurons signal and connect. This fits what we know about GRIN2B biology, so the results make sense.

[00:21:12] Then it compares GRIN2B with GRIN2A. These genes are part of the same receptor complex, so you might think they would behave similarly, but they don't. GRIN2A looks more like a counter signal. This is exactly the kind of distinction that matters for drug discovery, because it tells you these genes are not interchangeable. The clinical view makes the biology more practical. GRIN2B has already been tested in the clinic, and the broad attempts to block it have not worked well. So if our data looks like reduced GRIN2B activity, more inhibition may be the wrong direction.

[00:21:56] And this is the kind of insight you really want: not just what to pursue, but what not to pursue, before spending years going down the wrong path. Finally, the grader turns all of that into a simple decision to help us prioritize. That matters even more at atlas scale. With 2,000 genes, the challenge is not finding one interesting story, it's deciding systematically where to focus. That is what this index page is for: turning thousands of reports into something a scientist can navigate. We can filter, we can sort, and we can search.
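The talk does not say how the grader combines the agents' findings or how the index page is built, so the following is only a guess at the general shape. It assumes each report carries two hypothetical 0 to 1 scores, that the grader averages them, and that the index supports a score filter, a name search, and a sort; `GeneReport`, `build_index`, and the placeholder gene names are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class GeneReport:
    gene: str
    biology_score: float   # hypothetical 0-1 score from a biology agent
    clinical_score: float  # hypothetical 0-1 score from a clinical agent

    @property
    def decision_score(self) -> float:
        # Assumed grading rule: a plain average of the two agent scores.
        return (self.biology_score + self.clinical_score) / 2

def build_index(reports, min_score=0.0, query=""):
    """Filter by score and gene-name substring, then sort best first."""
    hits = [r for r in reports
            if r.decision_score >= min_score
            and query.lower() in r.gene.lower()]
    return sorted(hits, key=lambda r: r.decision_score, reverse=True)

# Placeholder reports; names and scores are not from the atlas.
reports = [
    GeneReport("GENE_A", 0.9, 0.7),
    GeneReport("GENE_B", 0.4, 0.2),
    GeneReport("GENE_C", 0.6, 0.9),
]

for r in build_index(reports, min_score=0.5):
    print(r.gene, round(r.decision_score, 2))
```

A real grader would presumably weigh evidence in a more nuanced way than a fixed average; the point of the sketch is only that a single comparable score per gene is what makes filtering and sorting across thousands of reports possible.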

[00:22:38] HTT is a famous Huntington's disease gene and reassuringly scores very high on clinical relevance. I'll stop there, although there's much more we could do: for example, running synthesis agents across all of the reports, generating visualizations, and so on. But I hope this gives you a glimpse of the future: AI agents and reasoning models working together with scientists to turn raw data into high-quality biological insight. The goal is not just faster analysis. It's better science: getting to the biology faster, more systematically, and in a way that's much more accessible. And this is exactly what we're building at Perturb AI.

[00:23:24] Beyond a single data set, beyond a faster workflow, a new discovery engine, combining high throughput in vivo CRISPR with artificial intelligence to decode biological intelligence and build better medicines. And let's bring Joy back home.

[00:23:43] Speaker 2: All right, thank you so much for your presentation. I also just think it's really funny how you can tell when ChatGPT has written an HTML report because it always uses the same color scheme. Yeah, that's like, okay, this is definitely from ChatGPT. Let's kind of just step to human health. How do you think that predictive biology can change what's possible in medicine over time?

[00:24:10] Speaker 1: Well, let me just start by saying I'm not a clinician, I'm a biologist, but from a biologist's perspective, better medicine will come from better biology. And we just talked about diseases that are highly heterogeneous, and every difference has a different underlying biology. Knowing all of those mechanisms will help us get better treatments in terms of precision medicine. Just an example, right? Take diabetes: type one, type two, and even within type two diabetes, there are so many different types. And the better we understand all the underlying biology, the better we can tailor the treatments.

[00:24:50] Speaker 1: Yeah, and I think in the future of medicine, there's also a rather large field focused on AI and health. There are a lot of health records, a lot of information about human biology there. But don't you want the health AI to actually understand the foundations, let's say, of cell division and cell biology, to be able to hopefully integrate this information together and give better advice?

[00:24:15] Speaker 2: Definitely.

[00:25:16] Speaker 1: Do you think there are specific diseases of the brain, for example, neurodegenerative conditions like Alzheimer's where disruption could be especially important?

[00:25:26] Speaker 2: Yeah, definitely. You know, genetics is a huge component of neurodegenerative diseases, and we just talked about GRIN2A and GRIN2B; they play an important role in intellectual disability, but the same thing applies to Alzheimer's. It's such a devastating disease, but it's only recently that we started to appreciate microglia, this immune cell type, and its role in Alzheimer's, right? It just underscores how important it is to look at cell types and combine that with genetics to see how they manifest in diseases.

[00:26:05] Speaker 1: Yeah.

[00:26:06] Speaker 2: And I would say, as powerful as genetics and human genetics can be, oftentimes what they tell you is sort of the focal point to look at. The causal genes themselves often are not really the eventual targets for the therapeutic. We see that in oncology and cancer biology all the time. So genetics is a very effective and strong sledgehammer, but to design medicine out of that, I think there are a lot of examples where people use functional maps to look not at the primary hit but at the secondaries. Really, it's a balance between the effect size and toxicity or tolerability. I think this is where functional data are helpful. Functional data from the right cell type, in the right native context, from the disease environment: that's very helpful. And AI integrated with this large amount of realistic physiology data can really bring about effective treatments and ultimately help us understand the underlying biology.

[00:27:04] Speaker 1: Got it. What do you think it means to move from mapping biology to predicting biological outcomes? For example, what kind of infrastructure, from datasets and compute to different types of collaboration, is needed to make this reliable?

[00:27:22] Speaker 2: Yeah, we just talked about the convergence of data, compute, and AI models. Just to double-click on the AI models, in order for it to move truly to a predictive science, we really need to come up with better evaluations of AI models—probably not only general models but also in different verticals. Those come with different validation datasets and evaluations. On top of that, something that we often ignore, but it's critically important, is the people participating in this whole loop. We want people coming from different perspectives to bring different expertise, and with more collaborations, it will help bring all those three ingredients together.

[00:28:19] Speaker 1: That's right. So speaking of people, do you think this changes who can participate in science?

[00:28:26] Speaker 2: Yeah, I think technologies, and especially in this case AI, are really good at breaking down the barriers for more people to participate in science and enjoy it. You and I have probably both experienced the time when we needed to do a lot of installation troubleshooting before we could ask that burning biological question. AI can take care of a large portion of that and help us focus on these questions sooner and better. Because of that, I think science will be more interdisciplinary than ever, really allowing us biologists, including molecular biologists, neuroscientists, and genomicists, to have a conversation with lots of clinicians, because it can help us translate each other's languages. I think that's a really exciting time for more people to participate, for everyone to work on these ambitious large projects together.

[00:29:25] Speaker 1: I would love the day when I no longer have to install any of my own packages and AI can just do it for me. AI can help us in a lot of different ways. What do you think requires either human judgment or domain expertise or experimental validation in this world?

[00:29:42] Speaker 2: Yeah, well, you know, biology... we just talked about it. It's complex; it's heterogeneous. So in addition to...

[00:29:48] Speaker 1: We can get all the help we can from AI, but fundamentally we have to trust and verify, double-check, and exercise our critical judgment. And designing experiments, interpreting the output, and coming up with better validations all still require human involvement. Right?

[00:30:15] Speaker 2: Yeah, I think that also is sort of consistent with how biologists traditionally do research in the lab. We all know that any tools, whether sequencing or imaging or other types of data, are not perfect. But usually the weaknesses of different technologies do not overlap, and therefore if you see a certain phenomenon using one technology and validate it with another, that intersection is likely to point to something that's conserved.

[00:30:45] Speaker 3: Ladies, I'm just gonna jump in, cause I'm worried that maybe Joy's wifi is having challenges. Sorry, Joy.

[00:30:51] Speaker 4: Hi, I am so sorry. I think I had one of those critical Chrome updates. I just decided to cancel.

[00:30:58] Speaker 3: Oh, no.

[00:31:00] Speaker 4: Of course, of the 24 hours in the day, it'll do it right in the middle of a live stream. But I am back.

[00:31:05] Speaker 1: That has happened to me before, Joy. No problem at all. Thank you so much for being here. You shouldn't even be working right now. We know you were not feeling well, so you didn't want to come into the office. No big deal. And we totally apologize to those in the audience for the technical challenges. We decided to make this work today because we really wanted to host PerturbAI. We made it work. We're all in community here. It's okay. Things don't have to be perfect. They just have to be good. And this is really amazing, awesome research, ladies.

[00:31:32] Speaker 1: So maybe we'll just jump into some audience Q&A because we have so much. I don't even think we can get through them all, but this is so fascinating. And I also wanted to share with you ladies that we have really amazing community members present right now. Dr. Shauram Yazdani, he's faculty and chair of UCLA Pediatrics. We have Sarah Banya, Assistant Professor at Chapman University. Just so many cool people here to listen to your research and your talk, Joy, and Grace, and Xin. So I'm really glad we made it work. It truly is okay that it's not perfect. We are human, and Joy, we also hear your babies in the background. How many babies do you have?

[00:32:15] Speaker 4: It's just one, but she is very loud.

[00:32:20] Speaker 1: I love it. She is one and a half. Awesome. Those are the good days.

[00:32:27] Speaker 1: Okay. Let's jump right into our questions.

[00:32:33] Speaker 1: So this is from Sarah Bonia, Assistant Professor at Chapman University. How can you causally target genes? Maybe my definition of causal is from economics, but I'm wondering what that means more specifically.

[00:32:51] Speaker 2: Yeah. So we leverage the wonderful advances in gene editing. In this case we use CRISPR guides with advanced proprietary delivery technologies to get them to specific parts of the body and tissues.

[00:33:06] Speaker 1: Awesome. Okay. And Jason Deluca asks, what new AI techniques are making it possible to model brain activity as a live system in near real time, and which seem most promising for future cognitive enhancement therapies?

[00:33:21] Speaker 2: I think this is eventually a question for Joy, but I think, you know, the brain is one of the most exciting organs. I'm obviously biased, but I think the complexity really comes from the fact that there are many brain cells and they're intelligently wired together. They are constantly communicating with each other as you're typing the question, as I'm answering questions. So it boils down to this: if we're trying to understand the brain as a data science problem, then what kind of data is sufficient, and at what resolution do we want it? But I'm sure Joy has more thoughts on that.

[00:33:54] Speaker 4: I think it's a really open and exciting question. Yeah, definitely. I mean, I think thinking about actually modeling the brain is probably coming from a reasoning model perspective. I would think of our models more as kind of like a really good ML engineer to actually then build a model of the brain, as opposed to doing it directly inside of like a GPT model. But yeah, I think this is a field in which we're seeing a lot of uplift from the reasoning models. Right now, I think we're seeing kind of like superhuman performance already in this field. So yeah, really excited to see where it goes.

[00:34:29] Speaker 1: Awesome. I also just wanna take a moment to let you ladies know that we have many research scientists from the national labs like Los Alamos here too. So that's super rad.

[00:34:39] Speaker 1: Okay, diving back into the questions. This is from Adam Zimmerman, associate director of analysis and information management at Florida International University.

[00:34:46] Speaker 1: He asks, what's the biggest scientific bottleneck before predictive biology can reliably guide real longevity interventions using blood panels and longitudinal aging biomarkers? Yeah, I think aging is one of the most fascinating questions. It is a phenomenon that's highly conserved in almost all species on this planet. So going back to this descriptive biology versus predictive biology: a lot of things are associated with aging. When I age, my hair goes white. But dyeing or styling my hair is not really rejuvenating my hair follicles, right? So to some degree, these are age-associated changes. And it's much harder, when you have such a long time for the cascades of changes to happen, to think about who are the first responders, who are secondary, and who are compensatory; obviously, changing these downstream factors may not help with aging itself. From the biological understanding, we know so much about aging clocks, chromatin changes, and cellular dynamics. But I think the question here is, when a cell ages or an organism ages, many things happen. So amongst all the things that happen, how do you do causal inference to know, across, let's say, thousands of genes that change, which is the one we should reverse that's sufficient to change the cellular age per se? And there are some really, really exciting examples out there, such as the Yamanaka factors. This is a Nobel-winning cocktail that can bring terminally differentiated cells back to a ground-state stem cell. So that's an extreme example. You probably don't want to do that too often. But I think the question is, can we use AI to understand what drives aging, first of all? What drives aging differently in the skin and the brain and other organs, ultimately, to design that therapeutic mechanism? And this is why we're such fans of causal, in vivo biology. So then we can take a systems view, making all those changes and measuring their impact, and then use AI to help us really understand and then effect change.

[00:37:00] Speaker 2: Awesome. Xin, I also want to raise your awareness that your colleagues are here from Scripps Research. I see Marco Uetiebo in the audience. They're here to support you. Yay, awesome. This is truly a community. So since we can't see their faces, I want them to know that we know that they are here.

[00:37:19] Speaker 1: OK, awesome. This is from Saman Hussain, a scientist at NIH. Do you think such a CRISPR atlas could be built with single-cell imaging data of the brain as opposed to transcriptomics? Would GPT be as useful for image analysis?

[00:37:40] Speaker 3: Yeah, that's a great question. Can we actually start building causal perturbation atlases across the whole brain and ask how neural firing and patterns of activity change? And I think that's definitely one of the ongoing directions that we're working on. But I'm super curious to hear from Joy: if we have this data, let's say tomorrow, how do we analyze the longitudinal data for each perturbation and make sense out of it? See Joy, I'm without you here today.

[00:38:10] Speaker 4: Yeah, this is a good question. I feel like with the model’s current capabilities, if you just gave it a bunch of images of cells, it’s probably not very good at doing that zero shot. This is something we’re looking at improving right now or in the near future, for the model to really start to understand these. But I do think the model would be able to write some really good scripts for analysis. And so I think you can actually get pretty far: even if the model, looking at the images natively, can’t perceive them very well, I think it will still be able to do a lot of really good analysis. So think of this as a really smart human, like a postdoc in the lab or something, analyzing the data. They probably wouldn’t be looking at all the different images individually. They would also probably be scripting. So I think the model will already be able to do this quite well. And I would just say that my experience in ChatGPT with single cell, you know, it’s an example. I can only imagine what ChatGPT will do in a few months or in a short period of time. I would continue to be amazed by its capabilities.

[00:39:19] Speaker 3: Joy, maybe you can take the first stab at this one. This one comes from Oliver Pehde, quant researcher. Do you think general large language model-based AI like GPT-Rosalind will have more impact on biology than modality-specific foundation models?

[00:39:37] Speaker 4: Yeah, that’s interesting. I think they do very different things.

[00:39:44] Speaker 1: The example that I gave a little bit earlier was kind of thinking about what Rosalind does versus a foundation model. I think right now, the way that they interact is mostly in terms of tool use. A common example people use is AlphaFold and how to hook that up to a large language model. Using a tool like AlphaFold or Boltz, etc., correctly, understanding the outputs, and checking whether the output is still a high-quality prediction actually requires a lot of expert knowledge. I think that is where the large language model can come in. So, you really need both to kind of push the frontiers of science.

[00:40:28] Speaker 2: Awesome. Thank you, Joy. So Peter Bryant, adjunct professor at IU University, asks to what degree can your method expose intracellular mechanisms? Or will the method point to the intracellular effects for a specific cell type that require further investigation?

[00:40:41] Speaker 3: Wow, neuroscience, I love it. I think the video that Grace showed is a depiction of the entirety of a full brain. What we’re seeing are individual labeled neurons, and you’ll see that they have different beautiful architectures and connect to different parts of the brain. Each individual labeled neuron actually harbors a different mutation. That’s exactly the type of data that we’re also collecting. We’re looking not only at what happened cell-autonomously, intrinsically, when a mutation occurred, but also at how it changes my ability to talk to the rest of the brain. Because I think ultimately the brain is intelligent because it’s a bunch of cells that are intelligently wired together. So studying how they interact with each other, and how that changes when the environment changes, is definitely something that we are very excited about: generating this kind of imaging data to capture these complex interactions for further analysis.

[00:41:46] Speaker 1: Okay, maybe two more questions, ladies? Great. Let’s take a question from a non-technical research scientist in the audience. We have Nicole, VP of Marketing for S2 Genomics. She says, these are incredible results. I understand that you can quickly cross-check the metrics. Hallucinations come with big consequences when applying AI to biology and medicine. How do we mitigate these risks?

[00:42:11] Speaker 4: Yeah, I think hallucination is still kind of a frontier problem that is difficult to fully solve. But I think from GPT-3 to now, we've made a ton of progress. Especially for this use case, everything that the model concludes is very well-grounded in different citations. Here, the model is pulling actual analysis and patterns out of the data. When it does that, it’s associated with the actual analysis that it’s done. It’s cross-checking different databases and publications, and as it does that, it also cites where these different types of information came from. By forcing the model to be very grounded in this case, we then also have confidence that there’s no hallucination in these conclusions. But of course, for something that’s of high consequence, you want multiple layers of mitigation on top of that. Here too, we have the original agent do the analysis, and we also have a judging model go back over and check its work and so forth.

[00:43:19] Speaker 4: One thing that comes to my mind is this famous story about a talking dog. Natalie and Joy, have you heard of this? It's by this famous theoretical neuroscientist Terry, and what he said when ChatGPT came out was that there’s this talking dog, and people are just so amazed by it that it doesn't really even matter what the dog is saying. I think this applies to this question: just because the language models and AI have been giving us something, it doesn't mean we should suspend our judgment. We need to be critical, think about it, and bring in orthogonal evidence. It’s important to consider what the uncertainties are that we should be careful about.

[00:44:06] Speaker 1: I love that analogy. Okay, last question from the audience, ladies. This is from Medina Akon, PhD researcher at HGU. How can your AI model the cascading ripple effects of a gene modification across a network, rather than just a single target, to predict unintended impacts on overlapping genetic pathways?

[00:44:31] Speaker 3: Yeah, this is one of the wonderful things that came out of our data. I don’t know if you remember the heat map we showed with GRIN2A and B.

[00:44:42] Speaker 1: The heat map is essentially a representation of the network. It emerges from our data rather than us specifically looking for it. What it nicely does is group all the genes that are in a similar network together, and those are colored red. We didn't go into those details earlier, but there are also the ones that are in opposite networks or have opposing effects; they're colored a different color. So it's a really nice way of knowing not just how the genes are playing an important role in an individual cell, but how they're communicating with each other and how cells are communicating with each other. That's one of the core things I love about this kind of data set. This data set is released, as well as our manuscripts, so check it out on our website. Definitely. And maybe we can drop that in the chat.
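
The grouping described in this answer, where genes with similar perturbation effects cluster together and genes with opposing effects separate, can be illustrated with a minimal sketch. This is not the speakers' pipeline: the gene names, the toy effect signatures, the use of Pearson correlation, and the 0.8 threshold are all assumptions made purely for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # would divide by zero for a constant vector

# Hypothetical perturbation signatures: for each perturbed gene, the
# measured change in four downstream readouts (toy numbers, not real data).
signatures = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [-1.0, -2.0, -3.1, -4.0],
    "geneD": [4.0, 3.0, 2.0, 1.0],
}

def classify_pairs(sigs, threshold=0.8):
    """Split gene pairs into 'similar' (strongly positive correlation)
    and 'opposing' (strongly negative correlation) effect groups."""
    genes = sorted(sigs)
    similar, opposing = [], []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(sigs[g], sigs[h])
            if r >= threshold:
                similar.append((g, h))
            elif r <= -threshold:
                opposing.append((g, h))
    return similar, opposing

similar, opposing = classify_pairs(signatures)
print("similar effects:", similar)
print("opposing effects:", opposing)
```

In a heat map of the full correlation matrix, the "similar" pairs would show up as one color block and the "opposing" pairs as the other, which is the kind of structure the answer describes as emerging from the data rather than being searched for.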

[00:45:38] Speaker 1: Well, ladies, that brings our talk to a conclusion. It was wonderful spending time with you in the OpenAI Forum, and Xin and Grace, you're in the San Francisco office at OpenAI. So I think you have some really fun lunch dates after this. So glad you were able to join us in the office. Joy, thank you so much for showing up today. I do not think I could have hosted this without your expertise. I hope you take really good care today. Spend the rest of your day resting. We're going to completely not bug you for the rest of the day, I promise. But it was really, really a pleasure to host you ladies. Thank you so much for being here in the OpenAI Forum.

[00:46:14] Speaker 1: Grace and Xin, I hope to see you soon for another in-person event, a workshop that you're going to be contributing to. And Joy, I just cannot wait to see the incredible work that comes out of your team in the future. We will be in close connection and I hope this is just the very first time the forum gets to host you. For all of you in the audience, thank you so much for being here with us today. This is our last event in April, although who knows, crazy things spring up all the time. Something else will land on our plate tomorrow morning. But we will be publishing all of May's events very soon. So keep an eye on your inbox for the OpenAI Forum newsletter. That's where we post all of those events.

[00:46:59] Speaker 1: And we're gonna have a sneak peek of another in-person event. We have two in-person events this coming month, and one of them is going to be in DC. So if you're in the DC area, especially, please keep an eye on your inbox. Xin, Grace, Joy, have a wonderful rest of your day. Thank you for being here, and to the community members, we'll see you soon.

[00:47:13] Speaker 2: Marco, it was really cool to see you again. Good to see you, bye everyone.

[00:47:20] Speaker 3: Thank you.
