This is a quick top-of-my-head post, so it’s not well thought out, and will most probably be badly expressed as well.
In recent years I have increasingly used my smartphone to take photos, which are automatically backed up to Google Photos. I also synchronise photos taken with a ‘proper’ camera via my computer.
I therefore have thousands of images in Google Photos (and elsewhere). Many images already have some useful metadata about where the photo was taken, what camera was used, and the like.
Expert amateurs and professional photographers will of course record far more comprehensive data about individual settings, but that’s not what I’m particularly interested in.
Manually adding Alt Text
Every time I share a photo on social networks like Bluesky or Mastodon, I manually add Alt Text to help describe the image for people who are visually impaired.
AI image tools
AI tools like Google’s Gemini Advanced, Microsoft’s Copilot, and OpenAI’s ChatGPT have got very good at describing images uploaded through their chat interfaces, which is great.
One we prepared earlier
Here’s an example of the sort of thing I mean. In this case I uploaded a photo to Gemini Advanced, and specifically asked it to give me a detailed description of what it saw.
Detailed description from Gemini Advanced
A sleek, black Greyhound dog stands on a sandy beach, its body language suggesting a playful shake after a dip in the water. Water droplets fly off its fur, creating a dynamic sense of movement. The dog’s head is turned to the side, its long ears flapping in the air, adding to the energetic impression. The dog wears a harness with tags, indicating it’s likely someone’s beloved pet enjoying a day out.
The background showcases a vast expanse of light-coloured sand with subtle ripples and patterns. The dog casts a distinct shadow on the sand, emphasising the brightness of the day. The overall atmosphere of the image is one of carefree fun and enjoyment in a natural setting.
That’s great, but
What I’d really like to see is a way to point a tool at photo libraries and automatically add detailed descriptions to every image.
To be honest, the description above is a bit over the top, but having something ready-made and automatically generated, which we could just tweak, would in my opinion be really useful. It’d be useful for sharing, but also when searching the library for a specific photo, a group of photos, or photos with certain characteristics.
Naturally, we’d need to read the description before relying on it, as there’s always a risk of errors or hallucination.
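To make the idea a bit more concrete, here’s a rough sketch of the sort of tool I have in mind, using the Gemini API’s Python client (google-generativeai) to batch-describe the photos in a local folder. The model name, prompt wording and folder layout are just illustrative assumptions on my part; I haven’t built this.

```python
# Rough sketch only: batch-describe the photos in a local folder using the
# Gemini API via the google-generativeai Python client. The model name,
# prompt wording and file handling are illustrative assumptions.
import os
from pathlib import Path

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # hypothetical env var
model = genai.GenerativeModel("gemini-1.5-flash")

PROMPT = (
    "Write concise alt text for this photo: one or two sentences describing "
    "the main subject, the setting, and any visible text."
)


def describe_folder(folder: str) -> dict[str, str]:
    """Return a mapping of filename -> suggested description, for later review."""
    suggestions = {}
    for path in sorted(Path(folder).glob("*.jpg")):
        image = Image.open(path)
        response = model.generate_content([PROMPT, image])
        suggestions[path.name] = response.text.strip()
    return suggestions


if __name__ == "__main__":
    for name, alt in describe_folder("photos").items():
        print(f"{name}: {alt}")
```

The point isn’t this particular script; it’s that the suggestions would land somewhere a human can review them before they’re attached to anything.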
Is it possible?
I put that idea back to Gemini Advanced and asked for suggestions:
Thank you, that is very helpful. Can you think of a way that your capability to give detailed descriptions of uploaded photos could be used to automatically, or semi-automatically, add alt text to photos in Google Photos?
I have included the full reply below. Food for thought.
Gemini’s reply
Absolutely, there’s great potential to leverage AI capabilities for automatic or semi-automatic alt text generation in Google Photos. Here’s a possible implementation:
1. Automatic Alt Text Generation:
- Upon Upload: When a user uploads a photo to Google Photos, the system could utilize an AI model (similar to me) to analyze the image and generate a descriptive alt text.
- Confidence Score: The AI could assign a confidence score to the generated alt text based on the clarity of the image and its ability to recognize objects and scenes.
- User Review: If the confidence score is high, the alt text could be automatically added to the photo. If the confidence score is low, the user could be prompted to review and edit the generated alt text or add their own.
2. Semi-Automatic Alt Text Generation:
- User Prompt: Users could right-click on an image in Google Photos and select an option like “Generate Alt Text”.
- AI Suggestion: The AI would then analyze the image and provide a suggested alt text.
- User Editing: Users could then review and edit the suggested alt text as needed before saving it to the photo.
3. Additional Features
- Object Recognition: The AI could identify specific objects or landmarks in the image and include them in the alt text.
- Contextual Information: If the photo is associated with location data or other metadata, the AI could use this information to enhance the alt text description.
- User Customization: Users could have the option to train the AI model on their specific preferences for alt text descriptions.
Benefits:
- Improved Accessibility: Automatically generating alt text would make a significant number of photos in Google Photos accessible to users with visual impairments.
- Time-Saving: It would save users the time and effort of manually adding alt text to their photos.
- Accuracy: AI models are becoming increasingly accurate in image recognition and description, leading to high-quality alt text.
Important Considerations:
- User Control: Users should always have the option to review, edit, or override the AI-generated alt text.
- Privacy: The image analysis should be done locally on the user’s device or in a secure cloud environment to protect user privacy.
- Transparency: Users should be informed when alt text has been generated automatically.
Such a feature would be a significant step forward in making Google Photos more inclusive and accessible to everyone.
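For what it’s worth, the ‘semi-automatic’ flow Gemini describes, where the AI suggests and a human reviews before anything is saved, could be as simple as a review loop over the suggestions from my earlier sketch. Again, this is purely hypothetical and not something Google Photos actually offers today:

```python
# Sketch of the human-in-the-loop step from Gemini's reply: show each
# suggested description and let me accept, edit or skip it. Where the
# approved text ends up (EXIF/XMP, a sidecar file, Google Photos itself)
# is deliberately left open -- this is just the review loop.
def review_suggestions(suggestions: dict[str, str]) -> dict[str, str]:
    approved = {}
    for name, suggestion in suggestions.items():
        print(f"\n{name}\nSuggested alt text: {suggestion}")
        choice = input("Accept [a], edit [e] or skip [s]? ").strip().lower()
        if choice == "a":
            approved[name] = suggestion
        elif choice == "e":
            approved[name] = input("Your alt text: ").strip()
    return approved
```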
What do you think?
Well, that’s it: the end of my brain dump. If you have any views, insights or suggestions you’d like to share, please feel free to comment.