Microsoft says its AI captions images as accurately as humans do

The captioning model, rolled out through its cloud platform, will let developers use it in their apps.

October 15, 2020 05:35 pm | Updated 05:42 pm IST

Microsoft showed a video of how its AI system describes images better than humans.

Microsoft on Wednesday unveiled an artificial intelligence (AI) system for image captioning that, it says, can describe images as accurately as humans do.

The Redmond-based technology company said the new image captioning system is twice as good as the image captioning model that has been used in Microsoft products and services since 2015.

The captioning model, rolled out through the company's Azure cloud platform, will let developers use it in their own apps.

“Image captioning is one of the core computer vision capabilities that can enable a broad range of services,” Xuedong Huang, Microsoft’s CTO, Azure AI Cognitive Services, said in a statement.
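For a sense of what that could look like in practice, the snippet below is a hypothetical sketch of how a developer might request a caption from Azure's Computer Vision service over its REST interface; the endpoint path, API version, placeholder credentials and response fields are assumptions based on Azure's public documentation, not code from Microsoft.

```python
# A minimal, illustrative sketch (not Microsoft's code) of calling the image
# description endpoint of Azure's Computer Vision service. The resource name,
# key, API version and response layout are assumptions for illustration.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
SUBSCRIPTION_KEY = "<your-subscription-key>"                       # placeholder

def describe_image(image_url: str) -> str:
    """Request a one-sentence caption for the image at image_url."""
    response = requests.post(
        f"{ENDPOINT}/vision/v3.1/describe",
        headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
        json={"url": image_url},
        params={"maxCandidates": 1},
    )
    response.raise_for_status()
    data = response.json()
    # The exact response shape is an assumption; handle both a wrapped
    # "description" object and a flat list of captions defensively.
    captions = data.get("description", data).get("captions", [])
    return captions[0]["text"] if captions else "No caption generated."

if __name__ == "__main__":
    print(describe_image("https://example.com/photo.jpg"))
```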

It is available in Seeing AI, a Microsoft app for blind and visually impaired users, and will start rolling out later this year in Microsoft Word and Outlook for Windows and Mac, and in PowerPoint for Windows, Mac and the web.

The feature will be used to generate alt text, the photo descriptions embedded in a web page or document for people with no or limited eyesight.

Microsoft also aims to improve the image captioning capability of the Seeing AI talking camera app, so that it can describe photos, including those from social media apps, for users who are blind or have low vision.

Microsoft pre-trained a large AI model by pairing images with word tags, each tied to a specific object in the image. Using word tags instead of full captions allowed researchers to feed large amounts of data into the model.

The pre-trained model was then fine-tuned for captioning on a dataset of captioned images. When presented with an image containing novel objects, the AI system leveraged this visual vocabulary to generate an accurate caption.
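In very rough outline, the two-stage recipe described above could be sketched as follows. This is a toy illustration under assumed details, with tiny placeholder networks, random stand-in data and a simple loss, not the architecture or training objective from Microsoft's paper.

```python
# Toy sketch of the two-stage recipe the article describes, not the actual model:
# stage 1 pre-trains on (image, word-tag) pairs, stage 2 fine-tunes on captions.
import torch
import torch.nn as nn

NUM_TAGS, VOCAB, EMB, MAX_LEN = 100, 500, 64, 12

# A small image encoder standing in for the real visual backbone.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, EMB), nn.ReLU())

# Stage 1: learn a "visual vocabulary" by predicting object word tags per image.
tag_head = nn.Linear(EMB, NUM_TAGS)
opt = torch.optim.Adam(list(encoder.parameters()) + list(tag_head.parameters()), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
for _ in range(100):                                    # toy pre-training loop
    images = torch.randn(8, 3, 32, 32)                  # placeholder image batch
    tags = torch.randint(0, 2, (8, NUM_TAGS)).float()   # multi-hot tag labels
    loss = bce(tag_head(encoder(images)), tags)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: keep the encoder and fine-tune a caption decoder on captioned images.
decoder = nn.GRU(EMB, EMB, batch_first=True)
word_head = nn.Linear(EMB, VOCAB)
embed = nn.Embedding(VOCAB, EMB)
params = (list(encoder.parameters()) + list(decoder.parameters())
          + list(word_head.parameters()) + list(embed.parameters()))
opt = torch.optim.Adam(params, lr=1e-4)
ce = nn.CrossEntropyLoss()
for _ in range(100):                                    # toy fine-tuning loop
    images = torch.randn(8, 3, 32, 32)
    captions = torch.randint(0, VOCAB, (8, MAX_LEN))    # placeholder token ids
    h0 = encoder(images).unsqueeze(0)                   # image feature seeds the decoder
    out, _ = decoder(embed(captions[:, :-1]), h0)       # predict each next word
    loss = ce(word_head(out).reshape(-1, VOCAB), captions[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```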

The model was evaluated on nocaps, a benchmark that measures how well AI systems generate captions for objects in images. The AI system produced captions that were more descriptive and accurate than captions written by people for the same images, according to results presented in a research paper titled “VIVO: Surpassing Human Performance in Novel Object Captioning with Visual Vocabulary Pre-Training”.
