Day 78 of 100 Days of AI
Apple will be introducing large language models to their devices later this year, and they have a high-level write-up here on the technical details of how they built them.
The post includes detail on how they conducted pre-training, post-training, optimization, and dynamic model adaptation. Some key bits I took away from reading the post are:
- Size – The models are small enough to fit on a powerful smartphone. For example, one of Apple’s on-device models is a ~3 billion parameter language model; for comparison, Meta’s latest flagship model has 70 billion parameters. (A rough memory sketch follows this list.)
- Fine-tuned – The on-device models are fine-tuned and specialised for a set of common use cases (e.g. text summarisation, image generation, and in-app actions). Because these tasks are narrow and well defined, a supersized general-purpose model isn’t needed for them.
- Smart optimization – Apple has done a lot of smart work to make the on-device models exceptionally efficient. On an iPhone 15 Pro, they got time-to-first-token latency down to just 0.6 milliseconds per prompt token (for comparison, GPT-4 achieves 0.64 and GPT-3.5 Turbo achieves 0.27). A quick worked example of what that figure means for real prompts follows this list.
- Server-based models – For more difficult tasks, the phone can fall back to server-based models that run on “Private Cloud Compute” (a hypothetical routing sketch is below).
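To make the size point concrete, here is a back-of-envelope memory calculation for a ~3 billion parameter model. The bit-widths are illustrative assumptions for the arithmetic, not Apple’s published quantization scheme:

```python
# Rough memory footprint of a ~3B-parameter model at a few illustrative
# weight precisions. These bit-widths are assumptions for the arithmetic,
# not Apple's actual on-device configuration.

PARAMS = 3_000_000_000  # ~3B parameters, as described in Apple's post

for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    gib = PARAMS * bits / 8 / 2**30  # bytes of weights, converted to GiB
    print(f"{label:>5}: ~{gib:.1f} GiB of weights")

# fp16 : ~5.6 GiB
# int8 : ~2.8 GiB
# 4-bit: ~1.4 GiB
```

Even at 8-bit precision, a 3B model’s weights fit in a few gigabytes, which is why this size range is plausible for a flagship phone while a 70B model is not.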
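And to put the 0.6 ms-per-prompt-token figure in perspective, here is the prompt-processing latency it implies for a few hypothetical prompt lengths (the prompt sizes are arbitrary examples, not Apple’s test setup):

```python
# Prompt-processing time implied by a 0.6 ms-per-prompt-token
# time-to-first-token figure. Prompt lengths are arbitrary examples.

MS_PER_PROMPT_TOKEN = 0.6

for prompt_tokens in (100, 500, 2000):
    ttft_s = prompt_tokens * MS_PER_PROMPT_TOKEN / 1000  # total seconds to first token
    print(f"{prompt_tokens:>5}-token prompt -> ~{ttft_s:.2f} s to first token")

#   100-token prompt -> ~0.06 s to first token
#   500-token prompt -> ~0.30 s to first token
#  2000-token prompt -> ~1.20 s to first token
```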
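Finally, a hypothetical sketch of what the on-device/server split could look like from an app’s point of view. The task names, token limit, and routing rule are made up for illustration; Apple hasn’t published its actual routing logic:

```python
# Hypothetical routing between an on-device model and a server model.
# Task names and the token limit are invented for this example only.

ON_DEVICE_TASKS = {"summarise_text", "rewrite_message", "suggest_reply"}

def route(task: str, prompt_tokens: int, on_device_limit: int = 2048) -> str:
    """Pick an execution target: keep small, common tasks local,
    send harder or larger requests to the server."""
    if task in ON_DEVICE_TASKS and prompt_tokens <= on_device_limit:
        return "on-device ~3B model"
    return "server model on Private Cloud Compute"

print(route("summarise_text", 800))     # on-device ~3B model
print(route("summarise_text", 10_000))  # server model on Private Cloud Compute
print(route("complex_reasoning", 300))  # server model on Private Cloud Compute
```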
Here is a sample of the benchmarks Apple shared. It’s impressive that the on-device model beats other, larger models. But of course, these are Apple’s own benchmarks, and it’s possible there was some cherry-picking to produce the best numbers.
Overall, Apple has achieved promising results, and we can expect even better performance from their on-device models in the years ahead.