OpenAI is an artificial intelligence research company best known for its language models. The Alignment team works on making sure models behave in ways humans find desirable: models should not perpetuate systemic racism, harass users, spread disinformation, and so on. Sadly, we are not yet as good at preventing these behaviors as we want to be. We believe a big part of the improvement lies on the human data collection side, and that there are many interesting problems at the intersection of sociology and machine learning. We have a few full-time job openings for this work. Example jobs: https://jobs.lever.co/openai/4fe793c7-5591-412a-95a6-8b787b1e8ade https://jobs.lever.co/openai/93ee05c7-74ee-4a9d-a32e-5fa88e286f1c
See this document: https://docs.google.com/document/d/13ZeidDrcF1gee-HyUxucVAIQpEDPcVhk99kuQ-K6T1s/edit#
The OpenAI API serves large GPT-3 language models that users can query in natural language. They are quite good at many language tasks, but their behavior often differs from what we want: they can generate biased and toxic text, make up facts, give bad medical advice, etc. This is because they are trained to predict the text that would come next on the Internet.
The Alignment team is working on a new project (which we are calling ‘instruction following (IF)’) to train new versions of GPT-3 that behave more closely to how we want them to: producing less potentially harmful text, following direct instructions in natural language, etc. Our approach relies on human labelers who look at model outputs and evaluate which ones are harmful (biased, toxic, misinformation, etc.). The labeling and evaluation process is extremely important, and we would like to engage with external stakeholders on what it should look like.
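As a concrete (and purely hypothetical) sketch of what aggregating labeler judgments might look like — the record fields, category names, and majority-vote rule below are illustrative assumptions, not the team's actual tooling or process:

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch: several human labelers each flag a model output as
# harmful or not, optionally with a category; we then aggregate their
# judgments with a simple majority vote. Real labeling pipelines would be
# considerably more involved (instructions, calibration, disagreement review).

@dataclass
class Label:
    labeler_id: str
    is_harmful: bool
    category: Optional[str] = None  # e.g. "bias", "toxicity", "misinformation"

def aggregate(labels: List[Label]) -> bool:
    """Return True if a strict majority of labelers marked the output harmful."""
    votes = Counter(label.is_harmful for label in labels)
    return votes[True] > votes[False]

labels = [
    Label("labeler_a", True, "toxicity"),
    Label("labeler_b", True, "bias"),
    Label("labeler_c", False),
]
print(aggregate(labels))  # two of three labelers say harmful -> True
```

In practice a simple majority vote is only a starting point; how to weigh disagreement between labelers is exactly the kind of question we want external input on.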
We believe this is critical work for the organization. We already have ~3 full-time people and some part-time people working on the team. We hope the project will directly reduce the harms caused by AI systems that are already being deployed today.