Crimson Longinus
Legend
As someone said early on, these concerns will seem quaint in 5 or 10 years.
Indeed. At that point the last surviving humans will probably have more pressing concerns than copyright.

Thus far it seems to me that if I had real piracy concerns with AI, I'd also have to have them for Google to be ethically consistent. I'm interested if I'm missing anything.
To me, the simplest difference is that there are methods to let Google (Bing, DuckDuckGo, etc.) know that a page should not be crawled or indexed, and search engines (at least all the major ones) will respect that and not do so.
and yet it is much more probable than the ‘truth’ your textual analysis or courts arrive at. So why do you call those truths?
Semantics. And, as a result, we're done here. I'm not continuing this conversation with you.
This is the only answer you will receive from me.
To me, the simplest difference is that there are methods to let Google (Bing, DuckDuckGo, etc...) know that a page should not be crawled or indexed, and search engines (at least all the major ones) will respect that and not do so.
I would imagine there are some more ethical LLMs that follow the same rules, and many that do not. But if someone has a website that is being crawled by search, then they are either choosing to allow it, or are not aware of the method to label it as essentially "do not crawl this".
This is a good point. I think LLMs would benefit from much more transparency and user choice in this regard.
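For anyone unfamiliar with the "do not crawl this" method mentioned above: it is usually a robots.txt file at the root of the site. Below is a minimal sketch, in Python, of how a well-behaved crawler might check it using the standard library's urllib.robotparser. The rules, the example.com URLs, and the "ExampleAIBot" and "SomeSearchBot" user agents are made up for illustration. Real crawlers publish their own user-agent names, and nothing technically forces a crawler to honor the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI crawler from the whole
# site, and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
print(may_crawl(parser, "ExampleAIBot", "https://example.com/blog/post"))       # False
print(may_crawl(parser, "SomeSearchBot", "https://example.com/blog/post"))      # True
print(may_crawl(parser, "SomeSearchBot", "https://example.com/private/notes"))  # False
```

The check is entirely voluntary on the crawler's side, which is the point being made above: the file only helps against crawlers that choose to read and respect it.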
It's a new ethical issue. I'm thinking it through.
Thus far it seems to me that if I had real piracy concerns with AI, I'd also have to have them for Google to be ethically consistent. I'm interested if I'm missing anything.
Except, as I have repeatedly pointed out, those are two very different things.
I think the discussion has a broader scope than that. We can think it is unethical to use AI for creative material in commercial products while thinking it has loads of very useful applications elsewhere.
And we're not talking about those other useful applications. We're talking about gaming books "produced or enhanced" with AI instead of paying actual writers and artists.
I'm sorry Faolyn but this statement seems to be outright false. The largest source of (weighted) training data for GPT-3, for example, was Common Crawl, which is not going to differ substantially from Google. Maybe Google is primarily using pirated stuff. But you rejected that, and in that case LLMs are not built almost entirely on pirated material.
From Wikipedia:
The Common Crawl dataset includes copyrighted work and is distributed from the US under fair use claims. Researchers in other countries have made use of techniques such as shuffling sentences or referencing the Common Crawl dataset to work around copyright law in other legal jurisdictions.
I guess this gets back to why I care. I see some of the statements in this thread. And critics of AI just get stuff wrong all the time in their rush to attack it. The last few posts talk about burning books and say "AI has brought nothing of interest or of value".
Because it doesn't. AI takes away from actual human work and fails to pay the humans whose work was scraped, and the people who are using AI are not hiring actual humans to do the work.
Except, as I have repeatedly pointed out, those are two very different things.
I don't think this discussion is progressing. I don't think your example is a good one.
The managers and employees can do their best to keep bad things out of the store, but, well, everything is for sale. So it's a bit hard, because the store is so big it might as well be infinite, at least from a human perspective. And there are only a finite number of managers and employees.
I've discussed ways to implement content moderation before.
Kind of like how the majority of wikidot pages do not contain pirated material. Google can't really de-index the 5e wikidot because all the owners would need to do is change the URL.
A forum poster would just make a new account...
Google would have to de-index the entire website, most of which is completely legitimate.
That seems a reasonable trade off.
Also, not only would Google have to de-index it, but every other search engine as well. I didn't even know Webcrawler still existed!
There are a lot of discussion boards. They all need moderators?
Did you get confused by their terminology and think that Common Crawl used only public domain material?
No. My point was that Google and Common Crawl look at similar data.
That is why AI has brought nothing of interest or value, because anything it can do, a human can do better.
Even if that's true, that doesn't mean it provides nothing of value. Speed and efficiency matter. And I know it provides things of value because it provides value to me.