E.g. Ollama / Hugging Face
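To make that concrete, here is a minimal sketch of what the local route looks like in practice: querying a model through Ollama's local HTTP API so the prompt never leaves your machine. It assumes Ollama is running on its default port (11434) and that a model, here "llama3" purely as a placeholder, has already been pulled.

```python
# Minimal sketch: query a locally hosted model via Ollama's HTTP API.
# Assumes Ollama is running on localhost:11434 and a model (e.g. "llama3")
# has already been pulled with `ollama pull llama3`.
import json
import urllib.request


def ask_local_model(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to the local Ollama server and return its reply."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # request one complete JSON reply instead of a stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(ask_local_model("Why does local inference keep data on-device?"))
```

Nothing here is sent to a third-party service; the trade-off is that you're limited to whatever models you can run on your own hardware.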

  • DigitalDilemma@lemmy.ml · 2 days ago

    You can’t trust an inherently untrustworthy industry.

    The problem is that to make a good AI, you need a lot of input, and we know from leaks and reports that many, if not most, of the major players deliberately ignored copyright to train their models. If it was reachable, they used it. Are using it. Will use it. Like Johnny 5, there’s no limit to the data they want, or to what their handlers want to feed them. They’re the Cookie Monster at a biscuit factory.

    So when the question of trust comes up, you’d have to be pretty forgiving to overlook that they’re built on foundations of theft, and pretty naive to assume these companies have suddenly grown ethics and won’t use your data and input to train with, even when you’re using commercial systems that promise they won’t.

    Even in the event that there is an ethical provider that does their utmost to ensure your data doesn’t migrate (these do exist, at least in intention), this is an incredibly fast-moving, ultra-competitive market where huge amounts of data are shifted around constantly, and guardrails are notoriously hard to define accurately, let alone enforce. It’s inevitable that stuff will leak.