Tailoring LLM-generated image captions to user needs

Emiel van Miltenburg; Ivonne van der Heiden

Authors

Emiel van Miltenburg, Tilburg University
Ivonne van der Heiden, Tilburg University

Abstract

One of the original motivations for developing image captioning systems is to make visual content accessible to people who are blind or visually impaired. What seemed like a huge challenge fifteen years ago has now made it into consumer products: large language models such as ChatGPT are seemingly able to describe images in fluent natural language. However, it is still unclear to what extent the generated descriptions actually match user needs. This study investigates the quality of LLM-generated image descriptions in the context of Dutch news articles. We operationalise output quality based on earlier user studies and existing image description guidelines, and present an extensive evaluation protocol that may be used in future research to assess the quality of automatically generated image descriptions.
