Apple’s new LLM Ferret is surprisingly open-source

The company released the model along with its code and weights but restricted usage to research only.

January 08, 2024 03:12 pm | Updated 03:12 pm IST

The Hindu Bureau
FILE PHOTO: | Photo Credit: Reuters

Researchers from Columbia University and Apple quietly released an open-source multimodal large language model called Ferret in October last year. The Cupertino-headquartered company released the model along with its code and weights but restricted usage to research only. The release is a surprise, considering Apple has historically been guarded about its technology.

Ferret can examine a specific region of an image and identify the elements within it to respond to a query. For example, if a user highlights a dog within an image and asks the model what species it is, the model can answer that. It can also take other objects in the image into account to understand what the dog is doing.

Apple AI research scientist Zhe Gan posted on X saying the system can “refer and ground anything anywhere at any granularity.”

Ferret is available in two sizes: a 7-billion-parameter model and a 13-billion-parameter model.

The smaller version is likely tailor-made for iOS devices and adjusted to the limitations of running on mobile hardware.
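As a rough illustration of why model size matters on mobile hardware, the sketch below estimates the memory needed just to store the weights of each model size. The bytes-per-parameter figures are standard for the named number formats, but the choice of formats is an assumption made for illustration; neither Apple nor the paper has said how Ferret would be stored on a device.

```python
# Rough estimate of the memory needed to store model weights alone.
# Ignores activations, the image encoder, and runtime overhead.
# The number formats listed here are illustrative assumptions.

BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in [("7B", 7e9), ("13B", 13e9)]:
        for dtype in BYTES_PER_PARAM:
            print(f"{name} {dtype}: {weight_memory_gb(params, dtype):.1f} GB")
```

Under these assumptions, the 7-billion-parameter model needs about 14 GB for its weights at 16-bit precision and about 3.5 GB at 4-bit precision, which suggests why smaller, compressed models are the more plausible candidates for phones.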

Apple recently released two other research papers that substantiated its efforts to deploy LLMs on phones. One introduced new techniques for 3D avatars and the other suggested a new way to run model inference efficiently. The company is pushing to integrate more AI components onto its devices and use them more effectively.

The Ferret paper also stated that the company has developed Ferret Bench, a benchmarking tool designed specifically for this model, which will help researchers evaluate its efficiency and flexibility in several use cases.

