OpenAI Announces Data Partnership Program for Public and Private Training Datasets

OpenAI has announced a data partnership initiative to collect large‑scale public and private datasets across multiple modalities. The aim is to make AI models safer and more useful by training on diverse, hard‑to‑access human‑generated content, with sensitive or personal information removed where needed.

php Courses

Nov 10, 2023

On November 10, OpenAI announced that it will work with organizations to produce public and private datasets for training AI models. The company says the data partnership is meant to let more organizations help shape the future of AI and benefit from more useful models.

In a blog post, OpenAI wrote: “In order to ultimately make AI safer and beneficial to all of humanity, we want AI models to have deep understanding of all topics, industries, cultures, and languages, which requires training datasets that are as broad as possible.”

As part of the data partnership program, OpenAI says it will collect large‑scale datasets that “reflect human society” and are currently difficult to access online. While the company plans to work across multiple modalities, including images, audio, and video, it is especially seeking data that express human intent across different languages, topics, and formats (such as long‑form writing or dialogue).

OpenAI states that, where necessary, it will work with organizations to digitize training data using optical character recognition and automatic speech recognition tools, and to remove sensitive or personal information.

OpenAI hopes to create two types of datasets: an open‑source dataset that anyone can use for AI model training, and a private dataset for training proprietary AI models.

OpenAI says the private dataset is intended for organizations that wish to keep their data private while improving OpenAI’s models’ understanding of their domain. So far, OpenAI has collaborated with the Icelandic government and Miðeind ehf to improve GPT‑4’s Icelandic capabilities, and with the Free Law Project to improve its models’ comprehension of legal documents.
