There are a lot of people out there who think LLMs are somehow reasoning. Even reasoning models aren’t really doing it. It’s important to do demonstrations like this in the hopes that the general public will understand the limitations of this tech.
It is important to do demonstrations like this in the hopes that the general public will understand the limitations of this tech.
THIS is the thing. The general public’s perception of ChatGPT is basically whatever OpenAI’s marketing department tells them to believe, plus their single memory of that one time they tested out ChatGPT and it was pretty impressive. Right now, OpenAI is telling everyone that they are a few years away from Artificial General Intelligence. Tests like this one demonstrate how wrong OpenAI is in that assertion.
I think the problem is that, while the model isn’t actually reasoning, it’s very good at convincing people it actually is.
I see current LLMs kinda like an RPG character build with all ability points put into Charisma. It’s actually not that good at most tasks, but it’s so good at convincing people that they start to think it’s actually doing a great job.
But the general public (myself included) doesn’t really understand how our own reasoning happens.
Does anyone, really? That is, am I merely a meat computer that takes in massive amounts of input over a lifetime, builds internal models of the world, tests those models through trial and error, and outputs novel combinations of data when those combinations are useful to me in a given context in that world?
Is what I do when I “reason” really all that different from what an LLM does, fundamentally? Do I do more than language prediction when I “think”? And if so, what is it?
This is definitely part of the issue; not sure why people are downvoting this. That’s also why tests like this are important: they illustrate that thinking, in the way we know it, isn’t happening in these models.
Anyone who believes that a generic word autocompleter would beat classic algorithms wherever those algorithms can be applied probably belongs in a psychiatric ward.
Downvotes are not allowed on Beehaw, FYI.
Downvotes aren’t federated, but you still see all the downvotes sent from your own instance.
Interesting. I figured since this post is in a Beehaw community they would be invisible to everyone, but good to know.
That’s too much critical thinking for most people.