Using supervised fine-tuning (SFT) to introduce even a small amount of relevant data to the training set can often lead to strong improvements in this kind of “out of domain” model performance. But the researchers say that this kind of “patch” for various logical tasks “should not be mistaken for achieving true generalization. … Relying on SFT to fix every [out of domain] failure is an unsustainable and reactive strategy that fails to address the core issue: the model’s lack of abstract reasoning capability.”

Rather than showing the capability for generalized logical inference, these chain-of-thought models are “a sophisticated form of structured pattern matching” that “degrades significantly” when pushed even slightly outside of its training distribution, the researchers write. Further, the ability of these models to generate “fluent nonsense” creates “a false aura of dependability” that does not stand up to a careful audit.

As such, the researchers warn heavily against “equating [chain-of-thought]-style output with human thinking,” especially in “high-stakes domains like medicine, finance, or legal analysis.” Current tests and benchmarks should prioritize tasks that fall outside of any training set to probe for these kinds of errors, while future models will need to move beyond “surface-level pattern recognition to exhibit deeper inferential competence,” they write.

  • teawrecks@sopuli.xyz

    The analogy I use is: it’s like a magician pulled a coin from behind a CEO’s ear, and the CEO’s response was “That’s incredible! Free money! Let’s go into business together!”

    Literally no one ever claimed it had reasoning capabilities. It is a trick to produce a string of characters that your brain can make sense of. That’s all.

    • anachronist@midwest.social

      “Literally no one ever claimed it had reasoning capabilities”

      Altman and similar grifters were and are absolutely making those claims, but maybe we’re excusing them as obvious liars?

      • TehPers@beehaw.org

        They are obvious liars. Some people are just too invested to see it.

        These models only have reasoning capabilities under the most obscure definitions of “reasoning.” At best, all they’re doing is climbing to local maxima with their so-called “reasoning” on a graph as wavy as the ocean.

        I’ve mentioned this on other posts, but it’s really sad because LLMs have been wildly incredible for certain NLP operations. They are that, though, not AGI or whatever snake oil Altman wants to sell this week.