Using model-generated content in training causes irreversible defects, a team of researchers says. “The tails of the original content distribution disappear,” writes co-author Ross Anderson from the University of Cambridge in a blog post. “Within a few generations, text becomes garbage, as Gaussian distributions converge and may even become delta functions.”

Here’s the study: http://web.archive.org/web/20230614184632/https://arxiv.org/abs/2305.17493
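
For anyone who wants to see the "Gaussians become delta functions" effect for themselves, here is a toy sketch, not the study's actual experiment. It assumes a one-dimensional Gaussian "model" that is refit each generation only on samples drawn from the previous generation's fit. The function name `simulate_collapse` and its parameters (`generations`, `sample_size`, `seed`) are made up for illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian on its own samples and track the fitted std dev.

    Hypothetical toy setup: generation 0 is the "real" distribution
    N(0, 1). Every later generation sees only samples drawn from the
    previous generation's fitted Gaussian.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        # "Train" on data produced by the previous model only.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.mean(samples)
        # Population std dev is the maximum-likelihood estimate; it is
        # biased low, and the errors compound across generations.
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 10, 50, 100, 200):
        print(f"generation {gen:3d}: sigma = {history[gen]:.3e}")
```

With a small sample per generation, the fitted spread tends to shrink toward zero, so the tails vanish first and the distribution narrows toward a spike. Larger samples slow the drift in this toy model but do not remove it.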

  • dan96kid@beehaw.org
    2 years ago

    This reminds me of a saying from my programming classes: garbage in, garbage out. It refers to how feeding a program bad data WILL make it produce even more bad data.