How Tech Giants Cut Corners to Harvest Data for A.I.

 

OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems.

In late 2021, OpenAI faced a supply problem.

The artificial intelligence lab had exhausted every reservoir of reputable English-language text on the internet as it developed its latest A.I. system. It needed more data to train the next version of its technology — lots more.

So OpenAI researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, yielding new conversational text that would make an A.I. system smarter.

Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said. YouTube, which is owned by Google, prohibits use of its videos for applications that are “independent” of the video platform.

Ultimately, an OpenAI team transcribed more than one million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI’s president, who personally helped collect the videos, two of the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the world’s most powerful A.I. models and was the basis of the latest version of the ChatGPT chatbot.

Source: nytimes.com

Peter Tolan is a Junior Content Editor for the HIPTHER network, where he has quickly established himself as a versatile voice in the global iGaming and technology sectors. Operating across the network's specialized platforms, Peter leverages a deep understanding of the European and American gaming landscapes to deliver high-impact, B2B intelligence. He is a key contributor to the "Evolution" side of the industry, specializing in the analysis of online gaming trends, the fast-paced world of esports, and the integration of deep-tech innovations. With a sharp eye for emerging technologies, Peter ensures that the HIPTHER community remains at the forefront of the global digital revolution.