Apple researchers develop method to run LLMs on iPhones


Large language models like GPT-4 typically require a lot of processing power and memory, but Apple AI researchers say they have found an efficient way to run LLMs on iPhones and other Apple devices with relatively limited internal memory.

The researchers state in a research paper that they have found a method for running large language models that exceed the available DRAM capacity of mobile devices such as an iPhone. This is done by storing the model parameters in flash memory and loading them into DRAM only when they are needed.
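To make the idea concrete, the sketch below is not Apple's code but a minimal illustration of the general approach: NumPy memory-mapping stands in for flash storage, the file name and layer sizes are invented, and only the rows needed at a given moment are copied into RAM.

```python
import numpy as np

ROWS, COLS = 4096, 1024  # made-up layer size; a real LLM layer is far larger

# Write the "weights" to a file once, standing in for the phone's flash storage.
on_flash = np.memmap("layer_weights.bin", dtype=np.float16, mode="w+",
                     shape=(ROWS, COLS))
on_flash[:] = np.random.randn(ROWS, COLS).astype(np.float16)
on_flash.flush()

# At inference time the file is opened read-only; nothing sits in DRAM yet.
weights = np.memmap("layer_weights.bin", dtype=np.float16, mode="r",
                    shape=(ROWS, COLS))

def load_rows(row_ids):
    """Copy only the requested rows from 'flash' into RAM."""
    return np.array(weights[row_ids])  # fancy indexing reads just these rows

# Only a small subset of rows is needed for the current step.
in_dram = load_rows([3, 17, 256, 1023])
print(in_dram.shape)  # (4, 1024) now held in RAM; the rest stays on disk
```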

To maximize throughput, the authors describe a form of 'recycling': data the model has recently processed is reused rather than fetched again, which avoids repeatedly reading from flash and makes inference run more smoothly. In addition, the researchers group data into larger contiguous chunks so it can be read from flash faster, which should also lead to faster processing and responses from the AI model.
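A minimal sketch of both ideas, under the assumption that 'recycling' amounts to caching recently used weight rows in RAM and that grouping means reading consecutive rows as one contiguous slice; the array, class names and sizes here are invented for illustration and are not from the paper.

```python
from collections import OrderedDict
import numpy as np

# Stand-in for flash-resident weights (in a real setup this would be on disk).
flash_weights = np.random.randn(4096, 1024).astype(np.float16)

class RowCache:
    def __init__(self, capacity=512):
        self.capacity = capacity
        self.rows = OrderedDict()              # row id -> row held in RAM

    def fetch(self, row_ids):
        for r in row_ids:                      # refresh recency for reused rows
            if r in self.rows:
                self.rows.move_to_end(r)
        missing = sorted(r for r in row_ids if r not in self.rows)
        for start, stop in contiguous_runs(missing):
            block = flash_weights[start:stop].copy()  # one grouped read per run
            for offset, row in enumerate(block):
                self.rows[start + offset] = row
        while len(self.rows) > self.capacity:  # evict least recently used rows
            self.rows.popitem(last=False)
        return np.stack([self.rows[r] for r in row_ids])

def contiguous_runs(sorted_ids):
    """Yield (start, stop) ranges covering consecutive runs of row ids."""
    if not sorted_ids:
        return
    start = prev = sorted_ids[0]
    for r in sorted_ids[1:]:
        if r != prev + 1:
            yield start, prev + 1
            start = r
        prev = r
    yield start, prev + 1

cache = RowCache()
print(cache.fetch([3, 4, 5, 100]).shape)  # first token: rows read from "flash"
print(cache.fetch([4, 5, 6]).shape)       # next token: rows 4 and 5 reused from RAM
```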

Together, the two methods should make it possible to run AI models up to twice the size of the available DRAM, with inference speeds up to 5 times faster on the CPU and up to 25 times faster on the GPU compared with naive loading.

Running LLMs more efficiently on iPhones could enable more advanced Siri commands, real-time language translation, and AI features in photography. Apple is reportedly already working on its own large language model, which employees are said to refer to internally as 'AppleGPT'. The company also reportedly wants to add generative AI to Siri, Xcode and Keynote.
