The Riddle Experiment: two groups are trying to solve a Black Story behind a screen, only one group is alive

Nikki S. Rademaker; Linthe van Rooij; Yanna E. Smid; Tessa Verhoef

Authors

Nikki S. Rademaker University of Leiden
Linthe van Rooij University of Leiden
Yanna E. Smid University of Leiden
Tessa Verhoef University of Leiden

Abstract

Investigating the cognitive abilities of large language models (LLMs) can inform theories about both artificial and human intelligence and highlight areas where AI may complement human cognition. This study explores GPT-4’s logical reasoning abilities by comparing its performance in solving Black Story riddles to that of humans. Black Stories are riddles in which players reconstruct a hidden narrative by asking yes-or-no questions to a player who knows the full story. These riddles test the solvers' logical reasoning, creativity, and inference skills in an interactive setting. The study used a set of 12 existing Black Stories, with some details altered. Each Black Story was tested twice in both the human group and the GPT-4 group to reduce the influence of individual differences. The experiment was conducted via text messaging to align the testing set-up for the two groups and eliminate potential non-verbal advantages for the human test group. The primary performance indicator was the number of questions needed to solve the riddle, taking into account the number of hints given along the way. This measure showed no significant difference between the groups, and both groups eventually arrived at the correct answer. However, GPT-4 was significantly more verbose in its questioning than humans. Qualitative results showed that GPT-4 excelled in precise questioning and creativity but often fixated too much on details, which led it to miss the bigger picture and summarize solutions prematurely. Humans, by contrast, covered broader topics and adapted their focus quickly but had more difficulty figuring out uncommon details. This research suggests that the performance of GPT-4 and humans in solving Black Stories is not significantly different, despite the two groups using different approaches to reach a solution.
