Apple is developing new techniques to analyse aggregated usage patterns in order to improve its artificial intelligence (AI) features. The Cupertino-based tech giant shared these differential privacy techniques on Monday, stressing that the methods will not breach users' privacy. Instead, the company is gathering signals such as usage trends and data embeddings to measure and improve its text generation tools and Genmoji. Notably, Apple said this information will be collected only from devices that have opted in to share Device Analytics.
Apple Wants to Learn from User Data Without Breaching Privacy
In a post on its Machine Learning Research website, the iPhone maker detailed the new techniques it is developing to improve some of the Apple Intelligence features. The tech giant's AI offerings have been underwhelming so far, and the company suggests that one reason for this is its ethical approach to pretraining and sourcing data for its AI models.
Apple claims that its generative AI models are trained on synthetic data, that is, data created by other AI models or digital sources rather than by humans. While this is still a legitimate way to train large language models (LLMs), since it does provide them with knowledge about the world, models that never learn from human styles of writing and presentation can produce output that comes off as bland and generic, often derided as "AI slop".
To fix these issues and improve the output quality of its AI models, the tech giant is now looking at ways to learn from user data without actually looking at any individual user's private data, using a technique known as "differential privacy".
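The term has a precise meaning in the privacy literature, which is worth stating since it underpins the "mathematical guarantee" Apple refers to. A randomised algorithm $M$ is $\varepsilon$-differentially private if, for any two datasets $D$ and $D'$ that differ only in a single user's data, and for any set of possible outputs $S$:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S]
```

Intuitively, whether or not any one user's data is included barely changes the distribution of outcomes, so an observer of the output cannot infer much about any individual. The parameter $\varepsilon$ (the "privacy budget") controls the strength of the guarantee.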
For Genmoji, Apple will use differentially private methods to identify popular prompts and prompt patterns from users who have opted in to share Device Analytics with the company. The iPhone maker says it will provide a mathematical guarantee that unique or rare prompts will not be discovered and that specific prompts cannot be linked to any individual.
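Apple does not publish the exact mechanism here (its production systems use more sophisticated sketching protocols), but the idea of learning popular prompts from noisy per-device reports can be sketched with classic k-ary randomised response, a simple local differential privacy scheme. The candidate prompts and privacy budget below are hypothetical, for illustration only.

```python
import math
import random
from collections import Counter

# Illustrative values only; Apple's real candidate sets and budget are not public.
EPSILON = 4.0
CANDIDATES = ["dinosaur in a cowboy hat", "smiling cactus", "robot dog"]

def randomize(true_prompt, candidates, epsilon):
    """On-device step (randomized response): keep the true value with
    probability p, otherwise report one of the other candidates uniformly
    at random. No single report reveals the user's actual prompt."""
    k = len(candidates)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p:
        return true_prompt
    return random.choice([c for c in candidates if c != true_prompt])

def estimate_counts(reports, candidates, epsilon):
    """Server-side step: debias the noisy reports to recover approximate
    aggregate frequencies, without trusting any individual report."""
    k = len(reports), len(candidates)
    n, k = len(reports), len(candidates)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)  # chance any one wrong value is reported
    raw = Counter(reports)
    return {c: (raw[c] - n * q) / (p - q) for c in candidates}
```

Because the noise is symmetric and its rate is known, the server can recover which prompts are genuinely popular while rare or unique prompts stay buried in the noise, which matches the guarantee Apple describes.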
Collecting this information will help the company evaluate the types of prompts that are "most representative of a real user engagement." Essentially, Apple will look at the kinds of prompts that lead to satisfactory output, as well as cases where users repeatedly refine their prompts to reach the desired result. One example shared in the post concerned the models' performance when generating images containing multiple entities.
Apple plans to expand this approach for Image Playground, Image Wand, Memories Creation, and Writing Tools in Apple Intelligence, as well as in Visual Intelligence with future releases.
(Image: Differential privacy in Apple Intelligence's text generation feature. Photo Credit: Apple)
Another key area where the tech giant is using this technique is text generation, though the approach differs somewhat from the one used for Genmoji. To assess its tools' capability in email generation, the company created a set of synthetic emails covering common topics. For each topic, it generated multiple variations and then derived representations of the emails capturing key dimensions such as language, topic, and length. Apple calls these representations embeddings.
These embeddings were then sent to a small number of devices whose users have opted in to Device Analytics, where they were matched against a sample of the users' emails. "As a result of these protections, Apple can construct synthetic data that is reflective of aggregate trends, without ever collecting or reading any user email content," the tech giant said.
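The on-device matching step described above can be sketched as a nearest-neighbour comparison over embedding vectors. This is a minimal illustration, not Apple's implementation: the function names and two-dimensional vectors are hypothetical, and in the real pipeline the reported index would itself be privatised before leaving the device.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_matching_synthetic(local_email_embeddings, synthetic_embeddings):
    """On-device step: score each synthetic embedding by its closest match
    among the user's local emails and return only the index of the winner.
    The email content itself never leaves the device."""
    best_index, best_score = -1, -2.0
    for i, synthetic in enumerate(synthetic_embeddings):
        score = max(cosine_similarity(synthetic, e)
                    for e in local_email_embeddings)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

Aggregated across many opted-in devices, the distribution of winning indices tells Apple which synthetic email styles best reflect real usage, which it can then use to refine the synthetic training set.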
In essence, the company never sees the content of any email but can still understand how people prefer their emails to be worded. Apple is currently using this method to improve text generation in emails, and says that in the future it will apply the same approach to email summaries.