Evaluating LLM-Generated Topic Names via Text Reconstruction
Abstract
Automatically generating topic names for texts using large language models (LLMs) has become an innovative approach to topic detection. However, evaluating the quality of these LLM-generated topic names remains challenging, particularly in assessing their semantic relevance to the texts and the correctness of the information they convey. To address this gap, we propose a novel evaluation method that uses LLMs to reconstruct the original texts from the generated topic names and then measures the similarity between the reconstructed and original texts. Topic names whose reconstructions are more similar to the original texts convey the original information more faithfully. The method thus favors topic names that preserve essential information, penalizing issues such as incorrectness and irrelevance. Our experiments show that the reconstruction-based evaluation aligns with human evaluation of topic names. The method further demonstrates versatility for evaluating other LLM-generated semantic compressions, such as summaries, headlines, and keywords.
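To make the pipeline described above concrete, the following is a minimal sketch of the reconstruction-based evaluation loop. It assumes a user-supplied LLM call (`reconstruct_fn`), a simple reconstruction prompt, and cosine similarity over sentence embeddings as the similarity measure; these specific choices (prompt wording, embedding model, metric) are illustrative assumptions, not the paper's prescribed implementation.

```python
from typing import Callable, List
from sentence_transformers import SentenceTransformer, util


def reconstruction_score(
    original_texts: List[str],
    topic_names: List[str],
    reconstruct_fn: Callable[[str], str],
    embedder: SentenceTransformer,
) -> float:
    """Average similarity between each original text and the text an LLM
    reconstructs from its topic name alone."""
    # Hypothetical prompt template; the actual prompt is a design choice.
    prompts = [f"Write a short text about the topic: {name}" for name in topic_names]
    reconstructions = [reconstruct_fn(p) for p in prompts]

    orig_emb = embedder.encode(original_texts, convert_to_tensor=True)
    recon_emb = embedder.encode(reconstructions, convert_to_tensor=True)

    # Compare each reconstruction only with its own original text (diagonal
    # of the pairwise cosine-similarity matrix), then average over the corpus.
    sims = util.cos_sim(orig_emb, recon_emb).diagonal()
    return sims.mean().item()
```

Under this sketch, a set of topic names scores higher when the texts an LLM regenerates from those names alone are closer (in embedding space) to the originals, which is the intuition the abstract describes.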