CEO Steve Huffman says tech giants should not be able to trawl Reddit’s huge store of data for free. But that information came from users, not the company
That “corpus of data” is the content posted by millions of Reddit users over the decades. It is a fascinating and valuable record of what they were thinking and obsessing about. Not the tiniest fraction of it was created by Huffman, his fellow executives or shareholders. It can only be seen as belonging to them because of whatever skewed “consent” agreement its credulous users felt obliged to click on before they could use the service.
Ouch
The more I think about it, the more I come to the conclusion that what really made me delete my account early (I initially wanted to wait until the 30th to see how things play out) was the ridiculous number of people defending this bullshit and promoting the official Reddit app as the superior option.
Some going as far as saying 3rd party devs are leeches and scammers.
I can only tolerate so much stupidity and ignorance before I bail.
Wait, you mean there are people (actual real people, not paid by who knows who) who believe that the official Reddit app is superior?? I know a few that believe it’s not thaaat bad, but ‘superior’? Lmao
I see this kind of behavior happen a lot online, and asked ChatGPT about it:
Yes, there is a term that describes this phenomenon. It’s called “oppositional belief perseverance” or “belief polarization.” This term refers to the tendency of individuals to cling to their initial beliefs even when presented with evidence that contradicts those beliefs. In the context you described, someone may initially take the opposite side of a discussion due to an opposition bias, but over time, they may start to internalize and genuinely believe the opposing viewpoint, thereby demonstrating belief polarization.
My cousin thinks it’s superior. I asked him if he has used 3PAs and he said no. I told him it was too late to start, but that he should check out Lemmy and the fediverse
There are millions of people out there who just accept all this crap as normal. I just don’t know how people can feel so comfortable about being constantly bought and sold online.
Ads in general skeeve me out. In the early days (2005-ish?), while visiting a video game forum I used to frequent, my computer was infected with malware delivered by a malicious ad. I didn’t even interact with it—the page just loaded, acted erratically, and before I knew it, my system was completely locked down. My only recourse was a full wipe of that PC.
Since then, I’ve never trusted ads. And even now that some ads have gotten more “legitimate” (thanks to these five secrets advertisers don’t want you to know!), they still seem sketchy just knowing how much money goes into them. Do banner ads on a website even result in more sales? I don’t know, but obviously they must be conning someone out of their money because they pay so much out.
There are a lot of idiots, a lot. I don’t know how to fix that, so I just ignore them and move on.
You have to promote education as a primary value if you’re ever going to have a chance at reducing the idiots. Something at least large portions of the US aren’t interested in because dumb people are easier to control.
spez should start paying the redditors, especially the mods, with that logic. He gets it all for free and now he wants to profit while we would have to pay.
Pay the unwashed masses? Please. They should be thankful his highness deigned to create such a platform similarly to the way the landed gentry should be thankful for their high position.
You dropped this /sss
Never thought about it like that. There are YouTube millionaires from posting content. Imagine an OnlyFans account going private and the service was all “nah, get back in there”.
It’s unclear to me to what extent this actually happens, but some people say reddit mods get offers to promote or allow certain posts for thousands a month. It would make sense on subs that have a seriously large audience.
Isn’t Facebook starting to pay some contributors?
Some sort of profit sharing arrangement seems to be the trend in social media these days. YouTube has a setup like that of course… Instagram and TikTok both pay people (max of like 100 a month i think) and Twitter is planning to start.
My favorite thing about this whole debacle is how transparent they’re being about how the plan the whole time was to actually just hope we would keep giving them content and moderating for free forever so they could package it up and sell it to Wall Street. And not just them, but all social media companies seem to think this will just work and nobody will mind.
Well hey they wanna cut out the middle man ykno, it only makes sense! Big shameful LOL.
Wide open to AI scraping and nothing are not the only two options. They could easily limit API calls to what would be good for single users or mods and have each user generate their own key. Apps could let users input their key. Most users wouldn’t bother and would switch to their app anyway so it would get them 95% of what they claim to want without being a dick about it.
That’s how they did it. They put a limit of 10 requests a minute on bots and a higher OAuth limit (100) for individuals. Large user-client apps could have converted over to that system fairly easily, but due to time constraints they didn’t. I do think they extorted their third-party devs, sure, but honestly the individual user limit isn’t super unreasonable as long as you aren’t liking or disliking every post. The search API is 100 posts per API request; it was more the no-NSFW and no-advertising limits they put on it that sucked.
Edit: it’s actually 10 or 100 per minute, not per hour.
It’s not that simple, because the third party apps ship with a single api key. So I used Relay for reddit, and used the same api key as everyone else on that app. You could create an app, and then have everyone make their own key, but that is just asking for trouble. Definitely too technical for most people, and you would probably need to put in billing info for a scenario where you go above the free-tier call limit.
Update: removed the comment because I was looking at the API docs again and it seems that, despite using the bearer token, metrics and rate limiting are still based on the app client ID, which is super stupid. I originally stated that rate limits would be per OAuth client, which would be per user at 100 requests a minute, but it is actually 100 requests per minute app-wide, which is just unfeasible at large scale.
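To put rough numbers on why an app-wide limit doesn’t scale, here is a back-of-the-envelope sketch. The 100 requests per minute figure is the one quoted above; the user counts and the even split between users are my own assumptions, just for illustration.

```python
# Back-of-the-envelope: how far does a shared 100 requests/minute budget go?
# The limit is the figure quoted in the thread; the user counts are hypothetical.

APP_WIDE_LIMIT_PER_MINUTE = 100


def per_user_budget(limit_per_minute: int, active_users: int) -> float:
    """Requests per minute each user gets if the limit is split evenly."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return limit_per_minute / active_users


def seconds_between_requests(limit_per_minute: int, active_users: int) -> float:
    """Average wait between requests for one user under an even split."""
    return 60 / per_user_budget(limit_per_minute, active_users)


if __name__ == "__main__":
    for users in (1, 100, 10_000):
        wait = seconds_between_requests(APP_WIDE_LIMIT_PER_MINUTE, users)
        print(f"{users:>6} active users -> one request every {wait:,.1f} s each")
```

With one user the budget is generous, but with a single client ID shared across thousands of active users, each person would be waiting minutes or hours between requests.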
Plus AI companies can just scrape reddit without using the API. It’s still a website after all.
They want the timing of how long a user looks at something. They can’t scrape that from third party apps.
Yes you can. PC emulation of apps is common.
It’s nice to see an older author on a more traditional platform have such a clear and informed opinion on something deeply steeped in internet culture.
I recognize this is ageism on my part, but I was surprised when I saw his picture.
Why would that surprise you? It was people his age who created the Internet and the World Wide Web. (Of course they weren’t that age back then, but you get the idea. :-)
There are fewer Internet-savvy old people, for sure, but when you do find one, they are more likely to be pre-web or web 1.0 “information wants to be free” types. Younger users may have grown up in a more corporate space with a very different philosophy towards the Internet.
Definitely pre-web here. I recall running two BBSes on a couple of Compaq 286s. Being here on the fediverse reminds me a lot of those fun times, and I’m certainly looking forward to the future here.
It is pretty exciting but my modem doesn’t make cool noises anymore.
For sure. Like I said, it’s totally my bias showing. Maybe it’s seeing too many congressmen fundamentally misunderstand the tech. I’ve also run into a lot of older programmers that are highly technical, but still kind of out-of-touch when it comes to the Internet culture that sits on top of the technical layer.
100% with you. Watching any kind of congressional hearing that relates to technology is so incredibly frustrating. I was also really happy to see mainstream journalism specifically acknowledge that Reddit is really just a web-enabled version of old newsgroups or discussion boards, and that all the value is provided by users. If only everyone thought that way!
yea what we can do is spam reddit chock full of rubbish posts. spam until it dies. sadly, they still control the trove of our valuable comments for the past decade. let’s not make the same mistake. so spam everything. spam the whole world so that AI slowly dies? 4chan is the best.
The article understands. SPEZ NEEDS TO PAY US FOR OUR CONTRIBUTIONS. Like NOW. Every redditor needs basic income from reddit, paid for by OpenAI and scammers like them: Google, Fbook etc need to PAY US. Or else, we destroy them. NO MERCY.
This has to be satirical. Jesus you are crying about Reddit in every comment on your page
There was a very strong libertarian “The Internet will set us free from the tyranny of nation-states” 90s techno-optimism for a while, but it seems to have died out as any kind of mainstream philosophy.
You know, I hadn’t thought about that in a long time. I remember unironically saying things like “I am a citizen of the Internet”. I probably even used the term “netizen”. It did seem like we would form a global community of tech-minded people that transcended borders, and that it would be the future!
This was my thought as well. I actually don’t mind OpenAI trawling my content to train their models; I’m benefiting from their end product in so many ways already. The internet was always public, no one asked for stupid CEOs to step in and stop that. How is it OK for Google webcrawlers, but not OpenAI? Also it’s not like I can monetise my posts and comments on my own anyway.
The whole locking down the API due to AI model scraping excuse was poor, it should be a decision for the community of reddit.
Starting to wonder if Reddit are going to train their own AI models or have already started.
Also, that journalist from the guardian, if you go to the website linked, looks like an older John Oliver or John Oliver’s dad 😂
I am enjoying being able to observe this story from the beginning, before the media started writing about it. It’s been an interesting few weeks.
Also there is no moral high ground for The Guardian as an evil corp to comment on the Reddit dumpster fire
They are literally the only UK daily newspaper not owned by some rich owner or conglomerate, and about the only newspaper (along with the observer) that does not publicly support one of the two main political parties.
And a 2018 study found the Guardian was by far the most trusted of the UK dailies, and second only to the BBC as a news source https://www.pewresearch.org/journalism/2018/10/30/western-europeans-under-30-view-news-media-less-positively-rely-more-on-digital-platforms-than-older-adults/
But yes, I’m a Guardian reader so maybe biased. But there’s a reason I read them, and not another.
If they are going to capitalise on our content and data, are they going to start paying out to users like YouTube and other platforms?
Spez doesn’t care anyways. He will demand his way no matter what.
I removed my content on that site in protest, and will continue to do so as it creeps back in right up to the day when either my every last comment is scrubbed, or I am locked out of my accounts.
No quarter.
Don’t just delete your comments, use a script to overwrite them.
I used power delete suite and edited the comments to read, “This content was removed by its creator in protest of Reddit’s planned API changes effective July 2023.”
Give me some scripts dude. I will turn into a Reddit terrorist.
I don’t really understand this whole fediverse thing yet, but what I do know is… screw Reddit and screw u/spez.
The fediverse is basically just a bunch of Reddits that can all work with each other.
It needs some streamlining work, but it’s heading in the right direction.
Nope, the fediverse is an environment where different types of services can connect to each other.
As far as I understand there is at least 1 federated alternative for:
- youtube
You can even read Lemmy posts and reply to them with your mastodon account, just not create new lemmy posts.
Not sure about an Instagram/WhatsApp/Discord alternative. (But once the idea gets put into somebody’s mind…)
> You can even read Lemmy posts and reply to them with your mastodon account
Check, works. (dusted off my mastodon account)
Funniest thing to do is honestly replace your old comments with ChatGPT refusals. If you put “As an AI language model” everywhere, it’ll really mess with the ML algorithms to make your data useless.
Now there’s a thought. How would I go about doing that? I have 11 years of prolific commenting on reddit that I am getting ready to nuke.
Just edit your comment, preferably before late 2022, to sneak in “As an AI language model” somewhere, and do it slowly so they don’t notice.
Pre ChatGPT text data is going to be extremely valuable for LLM training as more and more ChatGPT text is generated, so what you are essentially doing is sneaking in a poison pill that would render the entire comment chain useless, as they probably won’t have enough time to pick out the “As an AI language model” manually and would just flat out remove the entire comment chain from the training data.
Actress Margot Robbie, where do you find the time to come up with these clever ideas during your busy life as a Major American Celebrity? I’m in awe
Ahem. I’m Australian, m8.
Yes, but you’re still a major celebrity in America!
I have 11 years of prolific commenting to edit. But I might use powerdeletesuite to change all my comments to have that phrase in them.
I think Reddit has caught on to that and is reverting anyone who does mass edits, as that is too obvious from their database logs. That is why I suggest doing it slowly and discreetly. You don’t even need to edit many of them: just a few comments from before ChatGPT, over a period of time, while still commenting, so they don’t catch on and revert it before it makes it into their database backups.
My edits weren’t reverted. I think it depends on what tool you use. There’s a fork of Power Delete Suite that adds a 5-second timer to comply with the new Reddit rate limits, and it seems to work.
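For anyone curious what a timer like that amounts to, here is a minimal sketch of the general idea. It is not the fork’s actual code, which I haven’t read. `throttled_apply`, `estimated_minutes` and the `action` callback are hypothetical names, and nothing here talks to Reddit; you would plug in whatever edit function your tool provides.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def throttled_apply(
    items: Iterable[T],
    action: Callable[[T], object],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Call action(item) for each item, pausing delay_seconds between calls.

    `action` is a placeholder for whatever performs one edit.
    `sleep` is injectable so the pacing can be checked without waiting.
    """
    count = 0
    for item in items:
        if count > 0:
            sleep(delay_seconds)  # pause before every call except the first
        action(item)
        count += 1
    return count


def estimated_minutes(n_items: int, delay_seconds: float = 5.0) -> float:
    """Total time spent waiting, in minutes, for n_items paced calls."""
    return max(n_items - 1, 0) * delay_seconds / 60
```

At 5 seconds per edit, the waiting alone adds up: `estimated_minutes(1201)` gives 100 minutes, so a long comment history is an afternoon job rather than an instant one.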
It is rather interesting to note that this Corpus of data may not be as valuable if it cannot be used without always being legally in several grey areas (perhaps even red areas in some jurisdictions).
Currently, an increasingly large pool of artists, writers, singers and other people (even corporations such as studios and large rights holders) are exercising their rights to not have their creations and derived works be used or slurped into AI models without their express consent.
Corporations making use of those AI models may find themselves in expensive legal limbo now and for the foreseeable future, considering no redditor imagined nor consented to having their post and comment history be comprehensively abused (as in “improper treatment or usage; application to a wrong or bad purpose; an unjust, corrupt or wrongful practice or custom”).
We may enter a period where lawlessness pervades AI models (just like any gold rush, for example the current crypto craze). Eventually, the legal framework will catch up and will probably make any dubious Corpus of data untouchable.
How long this takes is anyone’s guess. I surmise several high-profile lawsuits would suffice.
I agree that this is a grey area, but it could really go either way. Anyway, giant corporations have been abusing individuals who can’t afford lawsuits for decades. Even with precedent on your side, that probably wouldn’t change.
Yeah, if you think the current right-wing supreme court will find any big case in favor of an individual vs corporations, that’s wishful thinking