AI is surprisingly bad at reading clocks
Artificial intelligence can now generate photorealistic images, write novels, do your homework and even predict protein structures. However, new research reveals that it often fails at a far more basic task: telling time.
Researchers at the University of Edinburgh tested the ability of seven well-known multimodal large language models, types of AI that can interpret and generate different kinds of media, to answer time-related questions based on various images of clocks or calendars. Their study, set to be presented in April and currently hosted on the preprint server arXiv, demonstrates that the LLMs struggle with these basic tasks.
“The ability to interpret and reason about time from visual inputs is critical for many real-world applications, ranging from event scheduling to autonomous systems,” the researchers write in the study. “Despite advances in multimodal large language models (MLLMs), most work has focused on object detection, image captioning or scene understanding, leaving temporal inference underexplored.”
The team tested OpenAI’s GPT-4o and GPT-o1; Google DeepMind’s Gemini 2.0; Anthropic’s Claude 3.5 Sonnet; Meta’s Llama 3.2-11B-Vision-Instruct; Alibaba’s Qwen2-VL-7B-Instruct; and ModelBest’s MiniCPM-V-2.6. They fed the models a variety of analog clock images, some with Roman numerals, different dial colors or even a missing second hand, as well as ten years of calendar images.
For the clock images, the researchers asked the LLMs: What time is shown on the clock in the given image? For the calendar images, they asked simple questions, such as: What day of the week is New Year’s Day? They also posed harder ones, including: What is the 153rd day of the year?
“Analog clock reading and calendar comprehension involve intricate cognitive steps: they demand fine-grained visual recognition (e.g., clock-hand position, day-cell layout) and non-trivial numerical reasoning (e.g., calculating day offsets),” the researchers write.
Overall, the AI systems did not perform well. They read the time on analog clocks correctly less than 25 percent of the time. They struggled with clocks bearing Roman numerals or stylized hands as much as they did with clocks missing a second hand altogether, suggesting the problem may stem from detecting the hands and interpreting angles on the clock face, according to the researchers.
Google’s Gemini 2.0 scored the highest on the team’s clock task, while GPT-o1 was accurate on the calendar task 80 percent of the time, a far better result than its competitors. But even then, the most successful MLLM on the calendar task still made mistakes about 20 percent of the time.
“Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are very basic skills for people,” says study co-author Rohit Saxena, a doctoral student at the University of Edinburgh’s School of Informatics, in a statement. “These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation and assistive technologies.”
So while AI might be able to finish your homework, don’t rely on it to keep track of the time.