Apple Intelligence isn’t entirely Apple’s intelligence. Like so many other artificial intelligence (AI) tools, it leans on the human experience shared across the internet, because all that data informs the AI models the company builds.
When it announced Apple Intelligence last month, the company explained where that information comes from: “We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot.”
Your internet, their product
Apple isn’t alone in this; in mining the public internet, it follows the same approach as others in the business. The problem: that approach is already generating disputes between copyright holders and AI firms, as both sides grapple with questions of copyright, fair use, and the extent to which data shared online is commodified to pour even more cash into the pockets of Big Tech firms.
Getty Images last year sued Stability AI for training its AI using 12 million images from its collection without permission. Individual creatives have also taken a stance against these practices. The concern is the extent to which AI firms are unfairly profiting from the work humans do, without consent, credit, or compensation.
In a small attempt to mitigate such accusations, Apple has told web publishers what they must do to stop their content from being used for Apple product development.
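Per Apple's published guidance, the mechanism is the standard robots.txt file: disallowing the Applebot-Extended user agent opts a site's content out of model training without blocking regular Applebot crawling for search features like Siri and Spotlight. A minimal entry looks like this:

```
# robots.txt: opt this site out of Apple Intelligence training.
# Applebot-Extended doesn't crawl pages itself; disallowing it tells
# Apple not to use already-crawled content for model training, while
# ordinary Applebot indexing for search continues.
User-agent: Applebot-Extended
Disallow: /
```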
Can you unmake an AI model?
What isn’t clear is the extent to which information already scraped by Applebot for use in Apple Intelligence (or any generative AI service) can then be winnowed out of the models Apple has already made. Once the model is created using your data, to what extent can your data be subsequently removed from it? The learning — and potential for copyright abuse — has already been baked in.
But where is the compensation for those who’ve made their knowledge available online?
In most cases, AI firms argue that what they do constitutes fair use rather than a violation of copyright law. But given that what counts as fair use differs from nation to nation, it seems highly probable that the evolving AI industry is heading directly toward regulatory and legal challenges over its use of content.
That certainly seems to be part of the concern coming from regulators in some jurisdictions, and we know the legal framework around these matters is subject to change. This might also be part of what has prompted Apple to say it will not introduce the service in the EU just yet.
Move fast and take things
Right now, AI companies are racing ahead of government regulation. Some in the space are attempting to sidestep such debates by constraining the data on which their models are trained. Adobe, for example, claims to train its imaging models only on legitimately licensed data.
In this case, that means Adobe Stock images, openly licensed content, and older content that is no longer under copyright.
Adobe isn’t just being altruistic in this — it knows customers using its generative AI (genAI) tools will be creating commercial content and recognizes the need to ensure its customers don’t end up being sued for illegitimate use of images and other creative works.
What about privacy?
But when it comes to Apple Intelligence, it looks as if the data you’ve published online has now become part of the company’s product, with one big exception: private data.
“We never use our users’ private personal data or user interactions when training our foundation models, and we apply filters to remove personally identifiable information like social security and credit card numbers that are publicly available on the Internet,” it said.
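Apple hasn't published those filters, but as a rough, hypothetical sketch of what pattern-based PII scrubbing can look like, here is a minimal Python example covering the two cases Apple names. The regex patterns, placeholder strings, and Luhn check are illustrative assumptions, not Apple's implementation:

```python
import re

# Illustrative sketch only: Apple has not disclosed its PII filters.
# Patterns and placeholders below are assumptions for demonstration.

# US-style social security numbers, e.g. 123-45-6789
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13-16 digit sequences, optionally separated by spaces or dashes
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def luhn_valid(candidate: str) -> bool:
    """Check the Luhn checksum that real card numbers satisfy;
    this cuts false positives on arbitrary digit runs."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scrub(text: str) -> str:
    """Replace SSN-like and card-like substrings with placeholders."""
    text = SSN_RE.sub("[REDACTED-SSN]", text)

    def redact(m: re.Match) -> str:
        return "[REDACTED-CARD]" if luhn_valid(m.group()) else m.group()

    return CARD_RE.sub(redact, text)

if __name__ == "__main__":
    sample = "SSN 123-45-6789, card 4111 1111 1111 1111."
    print(scrub(sample))
    # -> SSN [REDACTED-SSN], card [REDACTED-CARD].
```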
Apple deserves credit for its consistent attempts to maintain data privacy and security, but perhaps it should develop a stronger, more public framework for protecting the creative endeavors of its customer base.
Please follow me on Mastodon, or join me in the AppleHolic’s bar & grill and Apple Discussions groups on MeWe.