OpenAI Forum
MEETING
Improving Mathematical Reasoning with Process Supervision

About the Talk:

In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. Given the importance of training reliable models, and given the high cost of human feedback, it is important to carefully compare both methods. Recent work has already begun this comparison, but many questions remain. We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.
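To make the distinction between the two feedback types concrete, here is a minimal toy sketch in Python. It is not the paper's implementation and does not reflect the PRM800K data format. The `Step` class, the function names, and the hand-written labels are all hypothetical, chosen only to show that outcome supervision yields one signal per solution while process supervision yields one signal per reasoning step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One intermediate reasoning step (hypothetical structure)."""
    text: str
    is_correct: bool  # hypothetical human label for this step


def outcome_feedback(final_answer: str, reference_answer: str) -> int:
    """Outcome supervision: a single signal based only on the final result."""
    return int(final_answer.strip() == reference_answer.strip())


def process_feedback(steps: List[Step]) -> List[int]:
    """Process supervision: one signal for each intermediate step."""
    return [int(step.is_correct) for step in steps]


def first_error_index(step_labels: List[int]) -> Optional[int]:
    """Return the index of the first step labeled incorrect, if any."""
    for index, label in enumerate(step_labels):
        if label == 0:
            return index
    return None


# Toy problem (an assumption for illustration): solve 2x + 6 = 10, answer x = 2.
steps = [
    Step("Subtract 6 from both sides: 2x = 4", True),
    Step("Divide both sides by 2: x = 3", False),
]

outcome_label = outcome_feedback("3", "2")      # one label for the whole solution
step_labels = process_feedback(steps)           # one label per step
error_at = first_error_index(step_labels)       # where the reasoning went wrong
```

In this toy case the outcome signal only says that the solution failed, while the step-level signal also identifies which step introduced the mistake.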

Link to the full paper

Full list of Authors: Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

About the Speaker:

Hunter is a researcher at OpenAI focused on improving reasoning and reliability in large language models. His most recent work was on improving mathematical reasoning with process supervision. Prior to OpenAI, Hunter worked on self-driving cars, building data infrastructure for Nuro.

Speakers
Hunter Lightman
Member of Technical Staff @ OpenAI
Natalie Cone
Forum Community @ OpenAI
Agenda
12:00 AM, GMT+1 - 12:40 AM, GMT+1
Stage 1
Presentation
Improving Mathematical Reasoning with Process Supervision

Join OpenAI researcher Hunter Lightman for an in-depth presentation of his OpenAI team's research.

Hunter Lightman
12:40 AM, GMT+1 - 1:00 AM, GMT+1
Stage 1
Panel Discussion
OpenAI Forum Member and Presenter Q&A
Hunter Lightman
Natalie Cone
Event has finished
July 27, 12:00 AM, GMT
Online
Organized by
OpenAI Forum