Train better AI models

OpenAI Alignment team


OpenAI is an artificial intelligence research company best known for its language models. The Alignment team works on making sure models behave in ways humans find desirable. This means that models should never perpetuate systemic racism, harass users, spread disinformation, etc. Sadly, we are not yet as good at preventing these things as we want to be. We believe a big part of the improvement will come from the human data collection side, and that there are many interesting problems at the intersection of sociology and machine learning. We have a few full-time job openings for this work. Example jobs:

Project stages: New, Scoping, Scoping QA, Staffing, In progress, Final QA, Completed

Background and Motivation

See this document:

Project Description

The OpenAI API serves large GPT-3 language models that users can query in natural language. These models are quite good at many language tasks, but there are many ways in which their behavior is not what we want: they can generate biased and toxic text, make up facts, give bad medical advice, etc. This is because they are trained to predict the text that would come next on the Internet.

The Alignment team is working on a new project (which we are calling ‘instruction following (IF)’) to train new versions of GPT-3 that behave more like we want them to: producing less potentially harmful text, following direct instructions in natural language, etc. Our approach relies on human labelers who look at model outputs and evaluate which ones are harmful (biased, toxic, misinformation, etc.). The labeling and evaluation process is extremely important, and we would like to engage with external stakeholders on what it should look like.

Intended Impact

We believe this is critical work for the organization. We already have ~3 full-time people and some part-time people working on the team. We hope the project will directly lead to fewer harms caused by AI systems that are already being deployed today.

Internal Stakeholders

Internal People Available During the Project

Start date: July 14, 2021
End date: Dec. 31, 2021
  • No volunteers in this project yet.

Project tasks

Project scoping 0