In recent years, Facebook has worked to make its platform more accessible to people who are blind or visually impaired by supporting alt text on photos, so that a screen reader can describe the images they come across in their feed. Because not everyone adds alt text, Facebook later introduced automatic alternative text (AAT), which generates photo descriptions on demand. Now it has substantially improved AAT with artificial intelligence, making roughly ten times as many concepts reliably detectable.
This newest iteration of AAT builds on several technological advances made since its launch. Facebook has expanded the number of recognizable concepts from roughly 100 at launch to more than 1,200. Descriptions are also more detailed, as the system can now identify activities, landmarks, types of animals, and more. A description might now read, "Maybe a selfie of two people, outdoors, at the Eiffel Tower."
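The hedged "Maybe …" wording reflects that these descriptions come from a model's predictions rather than a human. A minimal sketch of how such template-based phrasing might be assembled from concept detections is below; the concept names, confidence threshold, and template wording are illustrative assumptions, not Facebook's actual implementation.

```python
# Hypothetical sketch: turn (concept, confidence) predictions into a hedged,
# template-based photo description. Thresholds and wording are assumptions.

CONFIDENCE_THRESHOLD = 0.8  # only describe concepts the model is fairly sure of

def describe(detections):
    """Build a 'Maybe ...' description from (concept, confidence) pairs."""
    confident = [c for c, p in detections if p >= CONFIDENCE_THRESHOLD]
    if not confident:
        return "Maybe a photo."
    return "Maybe " + ", ".join(confident) + "."

detections = [
    ("a selfie of two people", 0.92),
    ("outdoors", 0.88),
    ("at the Eiffel Tower", 0.85),
    ("a dog", 0.12),  # low confidence: omitted from the description
]
print(describe(detections))
# "Maybe a selfie of two people, outdoors, at the Eiffel Tower."
```

Dropping low-confidence concepts and always prefixing "Maybe" keeps the description useful without overstating what the model actually knows.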
AAT can also report the positional location and relative size of elements in a photo, which Facebook says is an industry first. A description can therefore sound like "An image of five people with two at the center and three others scattered on the fringe." Based on size and positioning, the system can also detect and highlight which element is the primary object in the scene.
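The positional reasoning described above can be sketched with simple geometry: given bounding boxes for detected people (here normalized to the range 0–1), bucket each one as "center" or "fringe" by its midpoint, and pick the largest box as the primary element. The box format and the one-third thresholds are assumptions for illustration, not the real system's logic.

```python
# Hypothetical sketch of positional/size reasoning over detected bounding boxes.
# Boxes are (x0, y0, x1, y1) with coordinates normalized to [0, 1].

def position_bucket(box):
    """Classify a box as 'center' or 'fringe' by where its midpoint falls."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    in_center = 1/3 <= cx <= 2/3 and 1/3 <= cy <= 2/3
    return "center" if in_center else "fringe"

def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def summarize(boxes):
    """Count center vs. fringe boxes and pick the largest box as primary."""
    center = sum(1 for b in boxes if position_bucket(b) == "center")
    fringe = len(boxes) - center
    primary = max(range(len(boxes)), key=lambda i: area(boxes[i]))
    return center, fringe, primary

boxes = [
    (0.40, 0.40, 0.60, 0.70),  # large, central -> primary element
    (0.45, 0.35, 0.55, 0.55),  # central
    (0.05, 0.10, 0.15, 0.30),  # left edge
    (0.85, 0.15, 0.95, 0.35),  # right edge
    (0.10, 0.75, 0.20, 0.95),  # bottom-left corner
]
center, fringe, primary = summarize(boxes)
print(f"An image of {center + fringe} people with {center} at the center "
      f"and {fringe} others scattered on the fringe "
      f"(primary element: box {primary}).")
```

Run on the toy boxes above, this produces a description in the same shape as the example quoted in the article: two people at the center, three on the fringe, with the largest central box flagged as the primary element.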
To achieve this, Facebook trained an AI model on weakly supervised data: billions of public Instagram images and their hashtags. It then repurposed that trained model as the starting point for learning new tasks, a technique called transfer learning. Facebook also consulted users who rely on screen readers to determine how much information they want to hear, and when they want to hear it.
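The transfer-learning idea can be illustrated with a minimal, dependency-free sketch: a "pretrained" feature extractor is kept frozen, and only a small new classifier head is trained on the target task. Everything here, the toy features, the tiny dataset, and the training loop, is an illustrative stand-in under stated assumptions, not Facebook's actual pipeline.

```python
# Minimal transfer-learning sketch: freeze a pretrained feature extractor,
# train only a new logistic-regression head on a small labeled dataset.
import math

def pretrained_features(image):
    """Stand-in for a frozen model pretrained on billions of hashtagged images.
    Maps an 'image' (a list of pixel values) to a fixed feature vector."""
    return [sum(image) / len(image), max(image) - min(image)]

def train_head(dataset, lr=0.5, epochs=200):
    """Train only the new head; the feature extractor stays frozen."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for image, label in dataset:
            x = pretrained_features(image)   # frozen: never updated
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1 / (1 + math.exp(-z))       # sigmoid
            g = p - label                    # gradient of the log loss
            w = [w[0] - lr * g * x[0], w[1] - lr * g * x[1]]
            b -= lr * g
    return w, b

def predict(w, b, image):
    x = pretrained_features(image)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Toy target task: classify "bright" (label 1) vs "dark" (label 0) images.
dataset = [([0.9, 0.8, 0.95], 1), ([0.1, 0.2, 0.05], 0),
           ([0.85, 0.9, 0.8], 1), ([0.15, 0.1, 0.2], 0)]
w, b = train_head(dataset)
print(predict(w, b, [0.9, 0.95, 0.85]))  # expected: 1 (bright)
```

The point of the technique is the split: the expensive general-purpose representation is learned once from weakly supervised data, and each new task only needs a small head trained on top of it.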
Facebook also kept the phrasing of the descriptions simple but functional, so that AAT is available in 45 languages worldwide, giving it a far wider and more inclusive reach. Blind and visually impaired people can now experience this much-improved automatic alternative text whenever they browse photos with a screen reader.