Using supervised fine-tuning (SFT) to introduce even a small amount of relevant data to the training set can often lead to strong improvements in this kind of “out of domain” model performance. But the researchers say that this kind of “patch” for various logical tasks “should not be mistaken for achieving true generalization. … Relying on SFT to fix every [out of domain] failure is an unsustainable and reactive strategy that fails to address the core issue: the model’s lack of abstract reasoning capability.”

Rather than showing the capability for generalized logical inference, these chain-of-thought models are “a sophisticated form of structured pattern matching” that “degrades significantly” when pushed even slightly outside of its training distribution, the researchers write. Further, the ability of these models to generate “fluent nonsense” creates “a false aura of dependability” that does not stand up to a careful audit.

As such, the researchers warn strongly against “equating [chain-of-thought]-style output with human thinking,” especially in “high-stakes domains like medicine, finance, or legal analysis.” Current tests and benchmarks should prioritize tasks that fall outside of any training set to probe for these kinds of errors, while future models will need to move beyond “surface-level pattern recognition to exhibit deeper inferential competence,” they write.

    • BlameThePeacock@lemmy.ca
      4 days ago

      I’ve met far too many people I wouldn’t trust to give me a reasoned response.

      Some people simply lack that capacity entirely, some just don’t care enough to spend the effort on it, while others are trying to deceive me intentionally.

      • Catoblepas@piefed.blahaj.zoneOP
        4 days ago

        LLMs are incapable of reasoning. There is not a consciousness in there deciding and telling you things. My comment was entirely about whether LLMs can reason, not whether all people reason at the same level or might decide to trick you.

        • BlameThePeacock@lemmy.ca
          4 days ago

          I don’t disagree with you that LLMs don’t reason. I disagree that all Humans can or do reason.

          • TehPers@beehaw.org
            4 days ago

            “I disagree that all Humans can or do reason.”

            Well if we’re talking about all humans…

            But more seriously, it doesn’t take much looking to find someone who doesn’t reason. Just look on the TV during the next major election and you’ll find a bunch.