From Microscope to Monitor: Stanford Pathology’s AI Journey

Dr. Rojansky is assistant professor and co-Director for Pathology Informatics at Stanford University. She completed her MD/PhD at UCLA/Caltech and her residency and fellowship training at Stanford University with subspecialty training in hematopathology and molecular genetic pathology. She is a practicing pathologist and leader in the digital pathology transformation at Stanford. Her research spans image analysis and AI for improved medical diagnosis.

I'm currently working with OpenAI's Human Data team, supporting the company’s model training and safety initiatives while partnering with some of the most brilliant domain experts and researchers across many fields.
My background is a blend between academia, the humanities, and AI. I received a PhD in linguistics where I explored the intersection between language, sociology, and technology. Keen on taking my interest in research and data to an applied setting, after my PhD I worked as a data scientist supporting AI for Good projects with foundations and nonprofits, then joined an early stage startup as the second employee.
Also, I love learning and teaching. I currently lecture at Columbia University.
Dr. Rebecca Rojansky, Assistant Professor and Co-Director for Pathology Informatics at Stanford University, discussed the evolving role of AI in pathology and how digital transformation is reshaping the field. She provided an overview of pathology’s place in the medical landscape and highlighted how AI can improve diagnostic accuracy, efficiency, and accessibility. Using a case study of a 30-year-old patient with suspected leukemia, Dr. Rojansky demonstrated how AI-powered tools assist in diagnosis, from blood tests to bone marrow biopsy analysis. She explained how digital pathology has streamlined workflows, reduced turnaround times, and helped address the global shortage of pathologists. However, challenges remain, including high data storage costs, lack of standardized formats, and difficulties in validating AI models. Despite these barriers, AI is already enhancing collaboration among pathologists by automating tedious tasks, such as detecting tumor cells and analyzing blood smears, and is opening new possibilities for integrating pathology data with patient records to enable personalized treatment. Looking ahead, Dr. Rojansky emphasized the need for further collaboration to structure pathology data for AI training, improve multimodal models, and develop cost-effective data storage solutions.
In this event, we are welcoming Rebecca Rojansky. So Dr. Rojansky is an assistant professor and a co-director for pathology informatics at Stanford University. She completed her MD and PhD at UCLA/Caltech, and then later her residency and fellowship training at Stanford University with subspecialty training in molecular genetic pathology and more, which she will discuss during her presentation.
Rebecca is a practicing pathologist and leader in digital pathology transformation at Stanford, and her research touches on AI in many aspects. So she does work in image analysis, AI for improved medical diagnosis, and more. Well, I'm very excited. So please help me welcome Rebecca to the stage.
Welcome, Rebecca.
Hi, thank you so much, Ben, for that great introduction and for the opportunity to have this conversation with you all. I'm really excited to be here and to get a conversation started about AI in pathology. I think pathology is sort of an unsung hero in medicine, so it's a great opportunity to share it with everyone. I'm going to open up my presentation and get started.
All right, hopefully you can all see that. So yeah, once again, I work in informatics at Stanford, and we have been undergoing a journey of transitioning to a place where we can really utilize AI for medical diagnosis. And I think probably we have a mixed audience here, which is fantastic. And I think probably most of you don't think about pathology most of the time. So today, I think what I'd like to do is sort of introduce you to where pathology sits in the medical landscape and really get to understand the kind of data that we have within pathology and some of the things that are necessary to get us to a place where we can really take advantage of AI to bring us new diagnoses, better diagnoses, faster diagnoses for our patients, but also learn more about medicine and the biology of disease. And I think we have a really major opportunity to do that.
I want to make the disclaimer, I'm a pathologist. I'm a hematopathologist. I work on blood cancers and also a molecular pathologist. I'm not a specialist in AI, but I do know an awful lot about pathology data. So I think that the two things together can help inform each other.
So I thought it might be helpful to anchor today's discussion around a patient. So it's really accessible to understand where we are within the patient journey. So we've got a 30-year-old woman, she comes to the emergency department, and she's complaining that she has been very tired and has been short of breath when she's just walking from her car. She's also noticed that she's had a rash and has been bruising easily. So I thought it might be interesting to ask ChatGPT what it would do with this patient. And I was really pleasantly surprised by the answer. Dr. GPT, as I'll call it, would do an initial assessment, which is classically what a physician would do, including airway, breathing, and circulation, and get a history and physical, and then generate a differential diagnosis. And this differential diagnosis here, I won't read it to you, but it's very appropriate. And the next thing that it recommends is a laboratory workup, and that's where pathology comes in, and that's also a very appropriate next step.
So where does all of this happen? Well, it happens in the clinical laboratory. And the clinical laboratory used to look like this. This is probably what you're familiar with from your high school or college chemistry classes, but it has changed a lot in recent times. And in the last five years or so, it's become increasingly prevalent to have this sort of fully automated pipeline, which is a very different approach to the clinical laboratory. And we adopted this a few years back, and I would call this a digital transformation.
If we look at one particular aspect of that pipeline, in this case, the portion that does the sodium testing, what actually happens in that instrument? Well, we take a patient's blood sample, and we run it through an ion-selective electrode. And from that, we can actually get a digital output of the patient's sodium level. And you've probably seen this in your own lab results. And it's not just one lab result that we get. We can do that serially over time, and we can look at multiple different markers over time. So this is a lot of information that we get for the patient. And what's really useful is that we can track this information in the electronic medical record and be able to monitor the patient's progress over time.
This is an example of a patient who developed acute myeloid leukemia in 2022, but you can see that they previously had a surgery in 2016. And when we look at their hematocrit, which is a measure of the oxygen-carrying capacity of their blood, we can see a dip when they had their surgery and a really large dip when they were diagnosed with their AML. But of course, the electronic medical record includes not just one analyte, but many analytes collected over time. And so I sort of call this a spaghetti mess of information. And the electronic health record is really helpful in collating all of that information for us. But there's a lot of rich data there that we could potentially utilize with AI.
Before we get to that, we have more tests to do. So another test that Dr. GPT recommended was a differential. And what that means is getting a count of all of the different blood cells that are in the blood. And we do this under a microscope, typically. In this particular blood smear, you can see that the patient has a lot of atypical cells, which we would call blasts. And what I would call digital transformation number two is that in the last 20 years or so, we began to have tools to look at these blood smears digitally, so to scan the blood smears, and then AI to analyze the cell types and create an automated differential count rather than a human having to evaluate each of these cells. It saves us a lot of time, produces more accurate results, and is more reproducible between pathologists.
There was a study that was done in 2021 on one of these software tools. And while it concludes that the software produces an adequate differential count, so it does a good job at what it was designed for, there was no accepted standard to validate that kind of assay, except for looking at standard manual microscopy. And this is one of the challenges that we have in the field of digital pathology, is that every time that we want to validate something, we have to go back to manual microscopy. So that's a high bar to have to do. And we need to start thinking about if there are other validation methods that we can use, because maybe the output of the AI algorithm isn't exactly the same as what the pathologist would do.
So once this information is digitized, it goes into that same electronic medical record and gets aggregated with all of the information. And that's really great, because now we have this very rich resource that we can use to analyze and to train AI.
So let's step back to our patient for a moment and ask ourselves if we have her diagnosis. We know that her red blood cells are low and her white blood cells are high, her platelets are low, and we saw abnormal cells, which we call blasts, on her peripheral smear. And so we can say that she has acute leukemia. But if we ask Dr. GPT if that's enough to treat the patient, Dr. GPT will tell us that it's not. We actually need to know the leukemia subtype in order to know how to treat the patient. And in order to do that, we have to do another test. And that's a bone marrow biopsy.
So this is what I trained many years to be able to interpret. On the left-hand side is a normal bone marrow biopsy, and on the right-hand side is a bone marrow biopsy with acute leukemia in it. And by looking at the morphology of the cells in combination with some stains that we can do on the tissue to highlight particular protein expression, we can determine that this is acute myeloid leukemia. So in other words, we can subtype this leukemia. But how do we actually get to this picture on the screen? So first, we have to process the tissue. And I really wanted to highlight that this is still somewhat of a manual process. So on the left-hand side, the tissue gets sectioned and what we call gross dissection, and then it gets fixed. And that portion is manual.
Then it goes through a process of conversion from an aqueous solution to a solution that is more amenable to being embedded in a form of wax called paraffin. And that happens on a processor that is fairly automated, but that itself is siloed from the rest of the electronic information about the patient. And finally, it goes through a process of embedding in paraffin wax. And from that point, it's pretty much a manual process. The block has to be sectioned into thin, about four micron slices that get put on a microscope slide. And this is an entirely manual process.
Finally, that specimen gets stained. And nowadays, we use automated staining platforms, which look very whiz-bang and fancy, but again, they're completely siloed from the rest of the information about the patient. And we'll discuss a little bit further on why that matters.
And what I mean by staining is that we expose the tissue on the slide to different antibodies that allow us to identify which proteins are being expressed in those cells. And what's important about this is every time we request a stain to be performed on a specimen, it adds typically another day to the time to get that patient a diagnosis. And that will become important a little bit later.
I would say the third digital transformation, and probably the most important one that we have been doing at Stanford, is transitioning to digital pathology. We now have digital scanners that we can put the slides in, and basically it's a box with an objective in it, just like a microscope, but a detector underneath can convert that image into a digital image that can be viewed on a computer screen. And this has been a huge change in the field over the last about 10 years. We've been scanning our slides for about seven years, and we were early adopters of this technology. And it has had tremendous benefits for us already, even without applying any AI. It has improved our efficiency. It allows us to access experts for consultation, no matter where they are, no matter when. The information is extremely portable and durable. You don't have to worry about glass slides getting lost or broken. And that improves quality and safety of patient care in and of itself, even without any additional AI.
And it allows, though, for discovery. And that's really the part that I think can be driven by AI.
Just to give you a sense of the complexities of working with glass without digital pathology, this is the workflow, this mess of a diagram. The patient will go to the hospital and a sample of biopsy or an excision will be taken from the patient. And that actually has to travel to wherever the pathology laboratory is, and it may not be in the same location. Then we go through the process that I just described to prepare that glass slide. And even then, that glass slide has to be sorted and then distributed to the pathologist, who has to be sitting in a laboratory in a predictable location. They can't be at home. They can't be at another hospital. And if they decide that they need a consultation, that material actually has to be physically moved to the consultant. Typically, it gets mailed. That takes multiple days. And then they get their consult, and then the consultant mails the materials back to the pathologist, et cetera, et cetera.
Digital pathology completely gets rid of all of that. So once you create the glass and then scan the image, that's it. All the information can be moved around digitally. You don't have to worry about losing slides. You don't have the long turnaround times of moving things around. And consultations can happen asynchronously or synchronously, whenever you need them, with anyone around the world. Additionally, you can save that information in an archive indefinitely, and you don't have to worry about the quality of the slide or the stain degrading over time. So it's been a major benefit to digitize these images.
And part of the reason that this matters is that there's actually a pathologist shortage around the world. In the United States, there are about 65 pathologists per million people, but it's not the same if you go elsewhere in the world. And part of the real promise of digital pathology, and in particular of AI, is making expertise available more evenly to places where it's needed.
And one of the issues with that is how do we do that at a reasonable cost so that it really is accessible to the people who need it? But even in the United States, we have a problem that we're facing, which is that the number of pathologists is declining year on year. Between 2007 and 2017, it declined about 17.5%. I can tell you that during the COVID-19 pandemic, there was a mass retirement of pathologists, and so we're facing even more declines now than we were then.
And the result of this is that there's more work for each individual pathologist, and that results in burnout. We have a pandemic of burnout in medicine, and pathology is not isolated from that. And the problem as well is that patients can be waiting long periods of time to begin treatment while they're waiting for their pathology results.
So we need to use AI to facilitate a faster turnaround time for testing and to make it more democratized around the world. So there are a lot of ways that you can imagine using AI for pathology. This is one that came out of our group at Stanford, and I participated in this study.
And this is a pathologist-AI collaboration framework. The idea was, can we make difficult and tedious tasks that pathologists do easier and faster for them and potentially more accurate? So we focused on two particular tasks. One task was identifying tumor cells in lymph nodes. And this is something that we do all the time to find metastases, and it's very time-consuming. And we may be going through 100 lymph nodes in a case and looking for a single cell.
Another application that we looked at was finding plasma cells in endometrial biopsies. And the reason this matters is because the number of plasma cells in an endometrial biopsy is used to diagnose chronic endometritis, which is a cause of infertility. So we have to scour specimens to find one cell that looks very much like all the cells around it. So this is a perfect application for AI. And in this study, what we found was that we could improve the speed that pathologists could review lymph nodes and find metastases by about four seconds per 4x magnified field.
And that in the case of looking at chronic endometritis, that pathologists did not have to rely on immunohistochemical stains in order to highlight those plasma cells. When they were assisted by the AI, they could diagnose chronic endometritis without those immunohistochemical stains, which again, as I mentioned, add another day to the turnaround time for the case. Additionally, we saw improvements in the accuracy and the sensitivity of the diagnosis.
But there are other ways that AI can be useful for pathology. And one of them is in standardization. And this is really a new topic for the anatomic pathology laboratory, in other words, the part of the laboratory that makes the slides. Because historically, as I mentioned, each of these pieces of equipment is really siloed.
But it turns out that digital pathology can kind of connect these pieces together and provide the information to track them. So in this case, we found that one of our automated stainers was performing poorly relative to all the other automated stainers. And you can see that most of the lines are all at one angle, but the one stainer was performing differently. And we were able to quantitate that thanks to digital pathology.
Well, a company called Visiopharm noticed that and decided that this would really be a benefit for pathology laboratories. And they developed a quality assurance application that can actually track the performance of these automated stainers over time and trigger a warning when a stainer is out of range.
And that seems like a very simplistic application, but it's something that has never been done within the field of pathology before and results in better care for patients because we don't miss times when the stainers are performing poorly.
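The idea of tracking a stainer over time and warning when it drifts out of range can be sketched as a simple control chart. This is only an illustration of the general technique, not the vendor's application: the metric (a per-slide mean stain intensity), the three-standard-deviation limit, and all names and values below are assumptions.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Lower and upper limits: baseline mean plus or minus k standard deviations."""
    m = mean(baseline)
    s = stdev(baseline)
    return m - k * s, m + k * s

def out_of_range(readings, limits):
    """Indices of readings that fall outside the control limits."""
    low, high = limits
    return [i for i, value in enumerate(readings) if not low <= value <= high]

# Hypothetical per-slide stain intensity readings from one stainer (illustrative values).
baseline = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48]
new_readings = [0.51, 0.49, 0.40, 0.50]

limits = control_limits(baseline)
for i in out_of_range(new_readings, limits):
    print(f"warning: reading {i} ({new_readings[i]}) is outside {limits}")
```

In practice the baseline would come from slides scanned while the stainer was known to be performing well, and the threshold would be chosen by the laboratory.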
And there are some applications that have actually been FDA cleared in pathology. Probably the most notable one is the Paige Prostate algorithm that identifies and classifies prostatic adenocarcinoma in prostate biopsies. This was cleared by the FDA fairly recently.
There are many, many, many other applications currently under review by the FDA. So there will be probably a panoply of new clearances in the future. One of the other applications that was cleared by the FDA was the Hologic Genius cervical AI tool that identifies abnormal cells on a pap smear. And you can see that this is another one of those tedious tasks where you're looking at a lot of debris and a lot of abnormal cells, and you're looking for one cell that might be cancer.
Additionally, two other applications that have been FDA cleared are CellaVision and Scopio for bone marrow, and these are doing the differential counts that we just looked at in our patient's case. But it's interesting that although pathology had the very first FDA-cleared AI/ML-enabled medical device back in 1995 and two more in 2001 and 2004 (those are the CellaVision for doing the differential counts), radiology by far leads the AI/ML-enabled medical device field. Pathology represents a very small portion of the applications that have actually been cleared by the FDA. And I'm including hematology in that because some of those are related to bone marrow aspirates or peripheral smears, which fall under the category of pathology. So we have a lot of work to catch up to radiology, but also a lot of opportunity because there are many similarities between what we do in pathology and what is done in radiology.
But some of the reasons why we haven't caught up yet are these. Really very few labs are fully digital yet. These scanners are fairly expensive. Some of the ones that we have are about a quarter of a million dollars each.
The images that come off of those scanners are very large, and I'll talk about that a little bit more in a second. And they are in diverse file formats.
There really isn't standardization within the field about the file formats to use for imaging. This is one of the barriers even to using these images for further analysis. Also the laboratory information systems that different labs use are different. There's no standardization there. And so exchange of information is difficult. And that's not even to mention protected health information and data ownership and other information silos within medicine such that we're not fully integrated so that we can really take advantage of our data.
As I mentioned before, there are no accepted validation methods that are standard other than comparing to what a pathologist would do on the microscope. And sometimes that's really just not the appropriate way to evaluate an algorithm. And we will talk a lot more about the lack of structured data in pathology.
So with respect to storage, why is this a problem? Well, think about the canonical medical imaging field, which is radiology. An MRI image is about 50 megabytes, and an MRI study, that is, all the slices of the MRI, may be somewhere around 6 gigabytes. But a single pathology slide image is 1.5 to even up to 100 gigabytes if it's a large piece of tissue. And a case in pathology could be from one to hundreds of slides.
So you can see that this adds up very quickly. And that's not to mention that we also have to deal with the issue of three-dimensional tissue. So what I showed you previously about how we section a paraffin embedded block to get a nice relatively two-dimensional section on a slide is not always true. Sometimes we have three-dimensional material that just gets placed on the slide.
And then we have to deal with this issue of three-dimensional imaging, which effectively is like the MRI in the sense that we slice through the tissue with our scanner, and we create multiple Z-stacks of that image. But of course, that adds to the data storage that we have to deal with.
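To make the storage comparison concrete, here is a back-of-the-envelope sketch using only the figures quoted above (about 6 GB per MRI study, 1.5 to 100 GB per slide image). The function name, the slide counts, and the assumption that each Z-plane costs as much as a full slide image are illustrative simplifications, not measured values.

```python
def case_storage_gb(slides: int, gb_per_slide: float, z_planes: int = 1) -> float:
    """Rough storage estimate for one pathology case, in gigabytes.

    Assumes every slide has the same size and every Z-plane costs as much
    as a full slide image (a simplification for illustration).
    """
    return slides * gb_per_slide * z_planes

MRI_STUDY_GB = 6.0     # figure quoted in the talk for a full MRI study
SLIDE_GB_LOW = 1.5     # low end quoted for a single slide image
SLIDE_GB_HIGH = 100.0  # high end quoted for a large piece of tissue

if __name__ == "__main__":
    for slides in (1, 20, 100):  # hypothetical case sizes
        low = case_storage_gb(slides, SLIDE_GB_LOW)
        high = case_storage_gb(slides, SLIDE_GB_HIGH)
        print(f"{slides:>3} slides: {low:.1f} to {high:.1f} GB "
              f"({low / MRI_STUDY_GB:.1f}x to {high / MRI_STUDY_GB:.1f}x an MRI study)")
```

Even at the low end of the quoted range, a 100-slide case is 25 times the size of the MRI study, before any Z-stacking is considered.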
So some of the ways in which people have been trying to deal with this is doing sparse Z-stacking. So in other words, just imaging with a full Z-stack in particular regions of the image where it's necessary, or stitching together the appropriate focal plane so you get one image that's completely in focus, the whole thing.
But again, how does that compare to glass? That's not the same thing that the pathologist would be seeing under the microscope. So again, we need validation methods that are appropriate to our digital methods.
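The focal-plane stitching idea mentioned above can be sketched as follows: for each tile position, keep only the Z-plane that scores highest on a sharpness measure. This is a minimal sketch of the general technique, not how any particular scanner does it. The sharpness measure (sum of squared differences between horizontally adjacent pixels) and the tiny grayscale tiles are assumptions chosen to keep the example short.

```python
def sharpness(tile):
    """Simple focus measure: sum of squared differences between horizontal neighbors."""
    return sum((row[i + 1] - row[i]) ** 2
               for row in tile
               for i in range(len(row) - 1))

def best_focus_plane(z_stack):
    """Index of the Z-plane with the highest sharpness score."""
    return max(range(len(z_stack)), key=lambda z: sharpness(z_stack[z]))

def stitch_in_focus(tiles):
    """For each tile position, keep only the sharpest plane of its Z-stack.

    `tiles` maps a (row, col) position to a list of Z-planes, where each
    plane is a 2D list of grayscale pixel values.
    """
    return {pos: stack[best_focus_plane(stack)] for pos, stack in tiles.items()}

# Toy data: a flat (blurry) plane and a high-contrast (sharp) plane.
blurry = [[5, 5, 5], [5, 5, 5]]
sharp = [[0, 10, 0], [10, 0, 10]]

stitched = stitch_in_focus({(0, 0): [blurry, sharp], (0, 1): [sharp, blurry]})
print(stitched[(0, 0)] == sharp, stitched[(0, 1)] == sharp)
```

The result stores one plane per tile instead of the whole stack, which is where the storage saving comes from, and also why it is not the same thing a pathologist sees when focusing through glass.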
So back to our patient. Do we finally have a diagnosis? Well, now we can say that she has acute myeloid leukemia after looking at her bone marrow biopsy. And so how do we report that to her primary care physician or whoever ordered the test? Well, unfortunately, for the entire existence of the field of pathology, we have done that through a free text report. Nowadays, that often gets placed in the patient's medical record in digital format, but at the end of the day, it's a lot of free text.
And this is one of the reasons why I think this conversation that we're going to have today is so very important, because one of the major challenges that we have is how to make use of all of the incredibly rich information that is in these reports in a structured way so that we can use it in combination with our images. And we'll talk about that more in a second.
So if we wanted to treat this patient at this point, what would we do? Well, Dr. GPT has bad news for us again, because a bone marrow biopsy is not enough on its own to make a definitive diagnosis. We actually have to further subtype that leukemia in order to know how to treat it.
And how do we do that? Well, luckily, we live in the day and age right now of targeted therapy, which has been a major benefit to patients. And what that means is that we're able to target the particular mutation or pathway that is altered in that patient's disease.
And so we do this in what's called the molecular laboratory or under the header of molecular diagnostics. And that includes everything from hereditary cancer syndrome testing to looking for markers that would indicate whether the patient will respond to a therapy, to monitoring the patient's disease over time using circulating tumor DNA, to helping to clarify what type of tumor a patient has if the other methods of analysis have been unsuccessful in elucidating that.
And so for this patient, we identified what's called an FLT3 internal tandem duplication. And that is a molecular alteration that causes the AML to be particularly aggressive. But what's great about it is that there are selective inhibitors for FLT3 that we can give to the patient that will put that patient in many cases into complete remission.
And at that point, if they receive a stem cell transplant and they continue to be in remission and we continue treating them with those FLT3 inhibitors, they can be leukemia free for the rest of their life, which is really incredible.
And this is a survival curve that shows that the survival probability is on the Y axis and time is on the X axis. And you can see that the patients in the red line are the ones that had a FLT3 mutation that were treated with a FLT3 inhibitor and that maintained their molecular remission, i.e. they didn't have that FLT3 mutation anymore. And they do very, very well. The patients who had the FLT3 and were treated with the FLT3 inhibitor, but were not able to clear their FLT3 mutation do worse. And that's the teal line. And then the purple and green lines are showing the patients who did not get treated with a FLT3 inhibitor. And you can see how much worse that they do.
So this is a very important finding. But how do we report this to the chart again? Well, unfortunately, in many cases, we do this in free text. And that's despite the fact that this information comes as structured data from the beginning.
So the purveyors of electronic medical records have realized that this is a major problem because it's a huge impediment to utilizing this data, both for training AI, et cetera, but also for that individual patient, for triggering alerts, for letting the clinician know what might be good therapies for that patient.
Epic, which is one of the major purveyors of electronic health records, is working on getting that genomic data into their system in a discrete format.
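To illustrate why a discrete format matters, here is a minimal sketch of a molecular result stored as structured fields rather than free text, with a rule that triggers an alert. The record layout, field names, and rule table are hypothetical and do not reflect any vendor's actual schema; the single rule reflects only the FLT3 example from this talk and is not clinical guidance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MolecularFinding:
    """One discrete molecular result (hypothetical field names)."""
    patient_id: str
    gene: str
    alteration: str
    detected: bool

# Hypothetical rule table keyed by (gene, alteration).
ALERT_RULES = {
    ("FLT3", "ITD"): "FLT3 internal tandem duplication detected: "
                     "selective FLT3 inhibitors may be relevant",
}

def alerts_for(findings):
    """Return alert messages for detected findings that match a rule."""
    return [ALERT_RULES[(f.gene, f.alteration)]
            for f in findings
            if f.detected and (f.gene, f.alteration) in ALERT_RULES]

findings = [
    MolecularFinding("patient-001", "FLT3", "ITD", detected=True),
    MolecularFinding("patient-001", "NPM1", "insertion", detected=False),
]
for message in alerts_for(findings):
    print(message)
```

With a free-text report, the same lookup would first require parsing the narrative to find the gene and alteration, which is exactly the impediment described above.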
And we are currently undergoing this, what I would call digital transformation number four. So I really see all of these data sources as an opportunity.
We have the electronic medical record, we have the laboratory information system, we have the clinical genomic test data, and we have medical imaging for pathology.
And when you put all of those things together, you have an incredibly rich data set, but the challenge is to bring it all together in a discrete format so you can actually train AI off of it and use AI on it.
And so I think this is increasingly important as the complexity of our testing goes up and up and up.
And an example of that is spatial transcriptomics, which some of you may be familiar with, which allows you to get the level of any particular transcript at the cellular level in the context of the tissue on a slide.
And as we begin to incorporate that kind of information into the medical laboratory, which is coming very soon, we're gonna have to have a handle on how we handle image data and how we correlate it with all the other medical data in the patient's chart.
The federal government has recognized that there's a need to begin to use our images better. And so ARPA-H launched this program, which they call INDEX, which will be to create a medical imaging data exchange platform, which will incorporate pathology images, radiology images, and surgical videos.
So if any of you are interested in this topic, I would urge you to take a look at this. They're currently taking applications.
But in the meantime, there's a lot of creative things that people have been able to do with not PHI-protected data. And this is a really fun example that came out of a lab at Stanford, and it's actually based on CLIP from OpenAI.
This is a multimodal model that takes in the text from Twitter, as well as pathology images that pathologists who are very active on Twitter have posted, and creates this embedding for both.
And by using that, you can actually communicate with a pathology image.
So you can ask for text-to-image retrieval, or you can show an image and ask for a similar image.
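The retrieval step can be sketched in a few lines once text and images live in the same embedding space: rank the stored image embeddings by cosine similarity to the query embedding. This is a sketch under stated assumptions, not the model from the study. The three-dimensional vectors and file names below are made-up stand-ins for what a trained text encoder and image encoder would produce.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, gallery, top_k=1):
    """Names of the top_k gallery items most similar to the query embedding."""
    ranked = sorted(gallery.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical image embeddings (a real model would output hundreds of dimensions).
gallery = {
    "image_a.png": [1.0, 0.0, 0.0],
    "image_b.png": [0.0, 1.0, 0.0],
    "image_c.png": [0.9, 0.1, 0.0],
}

# Stand-in for the embedding of a text query; an image embedding works the same way.
query = [1.0, 0.05, 0.0]
print(retrieve(query, gallery, top_k=2))
```

Text-to-image and image-to-image retrieval use the same ranking; the only difference is which encoder produced the query vector.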
And you can see how this could begin to provide a very interactive experience for pathologists. It's great for training. It's great for helping us as a consultant on the side.
But unfortunately, most of these kinds of studies have been done with not real clinical data. It's been done with PHI-free sort of datasets, which are fairly limited in scope.
A similar study that came out of Harvard based on a different model added this element of looking at the image tiles.
And by predicting or classifying the image tiles using this model, they were able to actually paint the physical location of tumor on the slide, which is an incredible thing to do without any labeled data to begin with.
So I think you can see that I think that AI has transformative potential in pathology, but I also think that it's dependent on our making our data accessible for model development.
And I think part of that is what we're doing right now, which is training the next generation of pathologists to both be comfortable with digital pathology and also understand the necessity of structured data, so we can start to move our future reporting into a more structured format.
But right now, I think it would be really helpful to us to build more collaborations. And here are some things that I think we need help with.
We need help deciphering our medical notes and reports. Let's take what we already have and turn it into a format that can be used for training.
We need constant development of multimodal models, because I think if there's one thing that I've shown you today, it's that the data that we have within pathology, which I think reflects a large subset of the data that we have within medicine, is very multimodal.
And also, we have an impending issue of data storage. So I think one of the things that people aren't really working on right now, but need to be, is how are we gonna store all of this data in a reasonable way?
And can we do it by reducing complexity and potentially just storing model weights instead of storing the actual images?
And I would love to have a conversation with all of you about what you think about these needs.
So with that, I'd like to thank everybody for your attention and hand this back over to Ben. Feel free to email me at this address if you have any questions.
All right, before you leave, let me just give a few last-minute reminders, because we have at least three or four, if I'm counting correct, upcoming events.
We have an NBA event next week with the San Antonio Spurs. So if you're into sports, if you're not into sports, you get to hear from some folks at the San Antonio Spurs and OpenAI folks who will be leading that.
We're also gonna have a Sora Alpha Artist. So it's preserving the past and shaping the future. It's a little bit of a sneak peek that we'll be hosting on how to get started with Sora events. It'll be in Spanish and French in the global chapters this spring.
So definitely be on the lookout for that. We have Music is Math. I think that's gonna be one of the more exciting ones this spring, that's on February 26th.
And an in-person event on AI economics in the OpenAI Forum. So it'll be with live stream and in-person. You should be getting all these details very soon. And a Paris AI startup that will be coming March 19th.
So TBD on a lot of new events, and hopefully we'll be able to see you there. Thank you again, Rebecca. Thank you for your presentation and I will see you all later. Thank you.
