Nacho Beites

LLMs CANNOT REASON (YET)

Why AI Language Models Don’t Really “Think” Like Us


AI language models, like those created by OpenAI and Google, can seem incredibly smart. They answer questions, help write essays, create visual art, generate poetry, and even code. But recent research by Apple suggests they’re not actually thinking—at least, not in the way humans do. Let’s break down why these models might not be as “intelligent” as they appear and what it means for the future of AI.


Pattern-Matching, Not Thinking


At their core, large language models (LLMs) are highly advanced pattern-matchers. They’re trained on huge amounts of text to predict what word or phrase should come next in a sentence. This ability to follow patterns makes it seem like they understand questions and give thoughtful answers. However, Apple’s research shows that they’re just recognizing familiar patterns rather than understanding meaning or reasoning logically.
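To make that concrete, here is a minimal sketch of next-word prediction, assuming the Hugging Face transformers library and PyTorch are installed (GPT-2 is used only because it is small and freely available). Given a prompt, the model assigns a probability to every token in its vocabulary, and the highest-scoring continuations win:

```python
# Minimal sketch of what an LLM does at its core: score every possible
# next token for a given prompt. Assumes `transformers` and `torch` are
# installed; GPT-2 is used here only because it is small and public.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# Probability distribution over the token that comes right after the prompt
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")

# The model is not "looking up" the answer; it is ranking tokens by how
# often similar patterns appeared in its training text.
```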


Imagine someone who has memorized a lot of trivia but doesn’t truly understand any of it. When asked a question, they might respond correctly by recalling similar patterns they’ve seen before, but they aren’t actually reasoning about it. That’s essentially how LLMs work.


The GSM-Symbolic Test: Exposing the Limits


Apple designed a test, called GSM-Symbolic, to examine how well LLMs handle questions that require logical thinking, especially with math and problem-solving. They found that when the same question is asked in slightly different ways, these AI models often give inconsistent answers. Why? Because they’re not actually “understanding” the question. Instead, they’re just matching the question to a pattern they’ve seen in their training data.


When the wording or numbers in a question change, the model can easily get confused. It often can’t keep its responses consistent, and adding unrelated information to a question can throw it off even more. This reveals a major limitation: the models don’t genuinely know the concepts they’re talking about.
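As a rough illustration of what GSM-Symbolic does (a sketch inspired by the paper, not Apple’s actual benchmark code), the snippet below takes a single grade-school math problem, swaps the names and numbers, and optionally appends an irrelevant sentence. `ask_model` is a hypothetical placeholder for whichever LLM API you want to test:

```python
# Sketch of the GSM-Symbolic idea: many surface-level variants of one
# problem, all with the same underlying logic. A model that truly reasons
# should solve every variant; a pattern-matcher often does not.
import random

TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{distractor}How many apples does {name} have in total?"
)
NAMES = ["Sophie", "Liam", "Marta", "Kenji"]
# An irrelevant detail in the spirit of the paper's "no-op" clauses
DISTRACTOR = "Three of the apples were slightly smaller than the rest. "


def make_variant(add_distractor: bool = False) -> tuple[str, int]:
    """Build one reworded version of the problem plus its correct answer."""
    a, b = random.randint(3, 20), random.randint(3, 20)
    question = TEMPLATE.format(
        name=random.choice(NAMES),
        a=a,
        b=b,
        distractor=DISTRACTOR if add_distractor else "",
    )
    return question, a + b


def ask_model(question: str) -> int:
    """Hypothetical call to the LLM under test; plug in a real API here."""
    raise NotImplementedError


if __name__ == "__main__":
    for trial in range(4):
        q, expected = make_variant(add_distractor=(trial % 2 == 1))
        print(q, "| expected answer:", expected)
        # In a real run: check whether ask_model(q) == expected for every variant
```

A model that genuinely understands addition should pass every variant; what Apple reports is that answers become inconsistent once the names, numbers, or irrelevant clauses change, which is exactly the fragility described above.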


Why It Matters


The inability to truly reason is a big limitation for LLMs, especially if we hope to use them in areas that require reliable, logical decisions—think of healthcare or finance. Today’s LLMs are great for generating text and following simple instructions, but they’re far from having the consistent, logical thought process that humans do.



Beites' Test


So I decided to create my own test (much simpler than GSM-Symbolic), using rebus puzzles (visual riddles). I chose an easy one to start with:


The solution is: "For ones in my life" -> "For once in my life". Nevertheless, ChatGPT was not able to get it:




To all humans: we are not out of the game yet! :-)

