April 27, 2024

Google uses the entire internet for AI training

July 6, 2023 – With a new clause in its terms of service, Google grants itself the right to use the entire publicly accessible internet to train artificial intelligence.

Google updated its Terms of Service on July 1, making a small but significant change. The amendment states that Google will from now on use “data available online or in other public sources” to train its AI models. The portal “Heise” discovered this. It concludes, probably rightly, that Google intends to and will use all publicly available data, and therefore in principle the entire indexed internet, to train AI. As an example, the text says that if your business information appears on a website, Google may index and display it on Google services. So the pool of available material is gigantic.

Google could already index any public website and display it in its search results. But unlike with the now official crawling for AI training, website operators can exclude their pages from indexing. Whether data is collected from your website and used to train AI, and whether it ends up in a chatbot's output, cannot be verified, let alone traced.

Also according to the portal, the practice is probably not objectionable under copyright law, but the last word has likely not yet been spoken where data protection law is concerned. (Wins)