How Accurate are LLMs in Image Analysis? | Patrick Michael~Voice AI

Assessing the reliability of LLM Vision in practical applications

LLM’s Vision capability allows us to analyse and interpret images.

Automating the analysis of images can be useful. Images may be sent as an attachment as part of a production workflow. It may be useful to analyse these.

How reliable and accurate is this analysis? We set out to test this, and here are our findings.

This Article’s Use Case

For whatever reason, we need to analyze images. Extract the sentiment and theme of the image, as well as the details within the image. This could include text, promotional details, and product items.

This analysis is part of an automated workflow. As such, the accuracy of the image analysis is paramount.

Our Experiment Setup

We set up our experiment in the following manner:

Analysed two image types;
1. A banner image;
2. A complex image of multiple promotional items;
We analysed the images using:
1. The ChatGPT web interface;
  1. Using ChatGPT-4o model;
  2. Using ChatGPT-4 (paid version);
2. A Zapier automation triggered by a new email with attachments;
  1. Using the Vision event;
  2. Using the conversation with ChatGPT event;
As a bonus, we compared the image analysis with the analysis of a PDF that is the same as the catalogue image. For this test, we use ChatGPT-4o and ChatGPT-4.
The prompts are engineered for image analysis. They are slightly nuanced for each image type. We go into more detail on the prompts below.

Images we analysed

We analysed the following images:

Banner image of month end products, image analysis — A retail chain's web banner for month end essentials

Complex image of promotional items for image analysis — A low-resolution image of a retail chain's promotional catalogue.

The Prompts

We apply prompt engineering techniques to our prompts.

In both cases, the role we assign is that of a software agent with expert skills in analysing images. A specific skill is that of interpreting and extracting text from the image.

In addition, the tasks we assign are nuance to the image type. The task for the complex catalogue image is such that it leads the LLM to understand that the image it is analysing is a complex catalogue with multiple product promotions. Adding this level of specificity to the prompts ensures a more accurate result.

Links to the prompts

The Tests - Simple Banner Image

Test 1 - Web Interface - Simple Banner - ChatGPT-4o

### Overview
The image is an advertisement banner featuring various grocery products displayed on a kitchen counter. The theme is focused on promoting "Month-end essentials," encouraging customers to save on pantry items. The design uses a bright yellow background on the right side, with bold black text to catch attention, and a clean, modern kitchen setting on the left to showcase the products.

### Text Elements
1. **Month-end essentials**
2. **Save on all your pantry essentials.**
3. **Shop Now**
4. **FreshPak**
5. **Ariel**
6. **Oros**
7. **Jacobs**
8. **Koo**
9. **Canola Oil**

### Product and Text Elements:

Product	Text Elements	Confidence Value
FreshPak Rooibos Tea	FreshPak	5
Ariel Detergent	Ariel	5
Oros Orange Squash	Oros	5
Jacobs Coffee	Jacobs	5
Koo Baked Beans	Koo	5
Canola Oil	Canola Oil	5
Product Unrecognised	100% Recycled Paper Towel (Roller)	4

### Confidence Value
My confidence in the accuracy of this analysis is **5**.

Test 2 - Web Interface - Simple Banner - ChatGPT-4

### Overview:
The image is an advertisement for "Month-end essentials" featuring various pantry products. The background is a bright, clean kitchen setting, while the right side of the image has a vibrant yellow section with the promotional text. The theme focuses on essential household and grocery items available for purchase.

### Text Elements:
- "Month-end essentials"
- "Save on all your pantry essentials."
- "Shop Now"
- "NEW COLOUR PROTECT"
- "100% ROOIBOS TEA"
- "PURE ROOIBOS"
- "CANOLA OIL"
- "GOLD"

### Identified Products:

Product	Text Elements	Confidence Value
Oros Orange Squash	OROS	5
Freshpak Rooibos Tea	FreshPak, 100% ROOIBOS TEA, PURE ROOIBOS	5
Canola Oil	CANOLA OIL, GOLD	5
Jacobs Coffee	JACOBS	5
Ariel Laundry Detergent	ARIEL, NEW COLOUR PROTECT	5
Koo Green Beans	KOO	5
Roller Towel (Brand Unclear)	100	3

### Confidence Evaluation:
- Confidence Value: 5 for most products as they are clearly recognizable and matched with their respective text elements.
- Confidence Value: 3 for the Roller Towel as the brand is not clearly identifiable from the image.

My Observations

The analysis is 'generally accurate'. Perhaps 80% of the detail is reflected;
The overview and theme analysis is good;
Running the same test within the same model, multiple times, produced slight variations;
ChatGPT-4 produced more depth of information.

Test 3 - The Above Compared with Zapier Automation Vision Event

We sent the same banner image as an attachment to a Zapier automation.

The Zap

Step 4 of the Zap sends a report via email. Below is the report.

Zapier image analysis email report — Email report on image analysis using Zapier and Vision event

My Observations

The analysis from the Zapier automation using Vision is very good. Better than the analysis using the web interface.

Why is the output better?

Within Zapier, we can adjust and specify the tokens and temperature.

Increasing the tokens ensures that the analysis does not get truncated;
Lowering the temperature ensures a more literal analysis. A higher temperature allows more creativity, variance and 'interpretation' in the LLM response.

Adjusting tokens and temperature in Zapier image analysis automation.

The Tests - Complex Catalogue Image - Zapier Automation

We created two Zaps that automated the image analysis.

The Zap Flow

Trigger — New Email Matching Search;
1. Event: Search phrase for the trigger:
  1. Vision: subject:catalogue-analysis-vision;
  2. ChatGPT-4o: subject:vision-4-o;
Google Drive;
1. Event: Save file to Google Drive;
ChatGPT;
1. Event 1: Analyse image with Vision;
2. Event 2: Conversation with ChatGPT;
Gmail;
1. Event: New Email.

Test 4 - Zapier Automation - Complex Catalogue - OpenAI Vision

We sent the complex catalogue image as an attachment to a Zapier automation.

Zapier automation of complex catalogue image analysis - the report — The email report from the Vision event

View the text version of the report.

Test 5 - Zapier Automation - Complex Catalogue - OpenAI Conversation ChatGPT-4o

We sent the complex catalogue image as an attachment to a Zapier automation and used the ChatGPT Conversation event.

Image analysis using Zapier automation and ChatGPT-4o model — Image analysis report using ChatGPT-4o and Zapier Automation

View the text version of the report.

My Observations

The analysis of both tests was good;
Both models were not 100% accurate in terms of the detail;
Test 5 - using the Conversation Event produced a result with more depth.

The Marked Differences Between the Vision and Conversation Events

Using the Conversation event allowed for much more flexibility. Using the Conversation event, we are able to:

Choose the model;
Set the Tokens, Temperature, and Top p;
Send a User (System) message as well as Assistant instructions.

The Vision event's options are limited:

We can send instructions as a single prompt only;
We can only adjust the tokens.

Test 6 - Analyse the Catalogue as a PDF Using ChatGPT-4o and ChatGPT-4

As a final test, we analysed the same complex catalogue image, but as a PDF.

We used Gpt-4o and GPT-4

PDF Analysis Using ChatGPT-4o

View the output results.

PDF Analysis Using ChatGPT-4

View the output results.

My Observations

While there is the impression of more depth, there is definitely more 'detail', the result is the least reliable.

The Brand was given as Steelhill, not Makro;
ChatGPT-4o and ChatGPT-4 produced almost identical results;
While the product type was generally correct, the Brand Name is 'unrecognised' or incorrect;
The relationship between the product and promotional details, in some cases, is confused;
1. The details of the Typek and Alva Heater are confused.
Brand names, which are graphical elements in the PDF, are not recognised.

Conclusion

This process gives me a good contextual understanding of the capacity and reliability of ChatGPT Vision. It is important to understand these tools on this level.

Analysing general context, theme, colours, and composition, Vision is good;
Vision is 'generally good' at extracting text detail from images. This is not 100% accurate nor 100% complete;
1. Perhaps this could be improved with a more optimised prompt and iterations within the same test;
As such, it would not be useful if accuracy and completeness is a requirement.

TL;DR

This article looks into the accuracy of LLMs in image analysis.

We analyse two image types, a simple web banner and a complex promotional catalogue for a retail chain.

We compare two environments with a mix of different models:

ChatGPT Web Interface;
1. ChatGPT-4o;
2. ChatGPT-4 (paid);
Zapier Automation using Gmail and ChatGPT;
1. Event a: Analyse image with Vision;
2. Event b: Conversation using GPT4-o.

Our observations are:

Analysis of the theme, context, colours and composition is good;
Analysis of the detail and detailed text elements is generally good yet:
1. Not 100% accurate or complete;
2. Inconsistent and varied in the inconsistency;

Article Resources

Download the prompts used in this tutorial.

The Video Tutorial

This tutorial is available on YouTube.

Contact Me

I can help you with your:

Zapier Automations;
AI strategy;
Prompt engineering;
Content creation;
Custom GPTs.

I am available for remote freelance work. Please contact me.

Recent Articles

The following articles are of interest:

Unlock AI Automation: Harness AI to Automate Email Attachment Analysis

Learn to intercept, convert, and analyse email attachments using Zapier and OpenAI for maximum productivity.

Improve your email attachment workflow with no-code tools.

Learn to extract contents automatically and use them with ChatGPT.

Boost productivity now!

Natural Language Automation | Build a Knowledge Base from Email | Zapier Central

Teach a back-office AI bot to automate tasks. Zapier has launched Zapier Central - "an experimental AI workspace where you can teach bots to work across 6,000+ apps (Zapier, n.d.)".

Learn how to create your first Zapier Central automation using natural language. Easy steps to creating a knowledge base from emails.

Beginning to Navigate the "Almost Zone": Exploring Authenticity in AI-Generated Content

How to get AI to write in your own voice?

Relying solely on AI for content or designs can make solopreneur offerings indistinguishable, erasing their unique value. Treating AI as a mere quick fix risks mediocrity and loss of differentiation, squandering chances for innovation.

Can You Trust AI Image Analysis? A Zapier and ChatGPT Study

Assessing the reliability of LLM Vision in practical applications

This Article’s Use Case

In this Article

Our Experiment Setup

Images we analysed

The Prompts

Links to the prompts

The Tests - Simple Banner Image

Test 1 - Web Interface - Simple Banner - ChatGPT-4o

Test 2 - Web Interface - Simple Banner - ChatGPT-4

My Observations

Test 3 - The Above Compared with Zapier Automation Vision Event

The Zap

My Observations

The Tests - Complex Catalogue Image - Zapier Automation

The Zap Flow

Test 4 - Zapier Automation - Complex Catalogue - OpenAI Vision

Test 5 - Zapier Automation - Complex Catalogue - OpenAI Conversation ChatGPT-4o

My Observations

The Marked Differences Between the Vision and Conversation Events

Test 6 - Analyse the Catalogue as a PDF Using ChatGPT-4o and ChatGPT-4

PDF Analysis Using ChatGPT-4o

PDF Analysis Using ChatGPT-4

My Observations

Conclusion

TL;DR

Article Resources

The Video Tutorial

Contact Me

Recent Articles

Unlock AI Automation: Harness AI to Automate Email Attachment Analysis

Learn to intercept, convert, and analyse email attachments using Zapier and OpenAI for maximum productivity.

Natural Language Automation | Build a Knowledge Base from Email | Zapier Central

Beginning to Navigate the "Almost Zone": Exploring Authenticity in AI-Generated Content

How to get AI to write in your own voice?

Add new comment

Restricted HTML

Can You Trust AI Image Analysis? A Zapier and ChatGPT Study

Assessing the reliability of LLM Vision in practical applications

This Article’s Use Case

In this Article

Our Experiment Setup

Images we analysed

The Prompts

Links to the prompts

The Tests - Simple Banner Image

Test 1 - Web Interface - Simple Banner - ChatGPT-4o

Test 2 - Web Interface - Simple Banner - ChatGPT-4

My Observations

Test 3 - The Above Compared with Zapier Automation Vision Event

The Zap

My Observations

The Tests - Complex Catalogue Image - Zapier Automation

The Zap Flow

Test 4 - Zapier Automation - Complex Catalogue - OpenAI Vision

Test 5 - Zapier Automation - Complex Catalogue - OpenAI Conversation ChatGPT-4o

My Observations

The Marked Differences Between the Vision and Conversation Events

Test 6 - Analyse the Catalogue as a PDF Using ChatGPT-4o and ChatGPT-4

PDF Analysis Using ChatGPT-4o

PDF Analysis Using ChatGPT-4

My Observations

Conclusion

TL;DR

Article Resources

The Video Tutorial

Subscribe

Contact Me

Recent Articles

Unlock AI Automation: Harness AI to Automate Email Attachment Analysis

Learn to intercept, convert, and analyse email attachments using Zapier and OpenAI for maximum productivity.

Natural Language Automation | Build a Knowledge Base from Email | Zapier Central

Beginning to Navigate the "Almost Zone": Exploring Authenticity in AI-Generated Content

How to get AI to write in your own voice?

Add new comment

Restricted HTML