Community
General Tabletop Discussion
AI Echo Cave
AI art bans are going to ruin small 3rd party creators
<blockquote data-quote="Gorgon Zee" data-source="post: 9886157" data-attributes="member: 75787"><p>Well ... no.</p><p></p><p>Dealing with the second point first: the current mainstream view is that people store information as concepts -- we relate concepts to each other, and that is how we build knowledge. GenAI very explicitly does not have concepts -- it deals entirely with expressions. So when you read Lord of the Rings for the first time (you lucky thing!), your mind is changed by creating a new concept, for example, an Ent, which your mind links to other concepts -- trees, Tolkien, fantasy, that story you read in 3rd grade -- a whole host of linkages. An LLM does not do that; it remembers only links between words. So it remembers all the words that Tolkien used in passages about Ents, and those words are linked to other words.</p><p></p><p>This is a crucial difference, because it explains why LLMs are more likely to violate copyright than you are. In fact, there is a good argument that what LLMs store is actually a lossy compression of the materials they train on. They store relationships between words, and so when they produce text, they tend to reproduce the material they were trained on.</p><p></p><p>Here's a relevant article: <a href="https://arxiv.org/abs/2505.12546" target="_blank">Extracting memorized pieces of (copyrighted) books from open-weight language models</a>, with abstract:</p><p><em>... Through thousands of experiments, we show that the extent of memorization varies both by model and by book. With respect to our specific extraction methodology, we find that most LLMs do not memorize most books -- either in whole or in part. However, we also find that Llama 3.1 70B entirely memorizes some books, like the first Harry Potter book and 1984. 
In fact, the first Harry Potter is so memorized that, using a seed prompt consisting of just the first few tokens of the first chapter, we can deterministically generate the entire book near-verbatim. </em></p><p></p><p>Llama 3.1 is an older model, and most modern LLMs have specific safeguards that prevent this from happening. However, those are post-training add-ons that stop the LLMs from doing what they have been trained to do, rather than anything fundamentally different.</p><p></p><p>TLDR: Humans train via concepts; LLMs via words. That is why we are terrible at memorizing words and LLMs are terrible at conceptual thought.</p><p></p><p>As for your first point, this has not been established in the courts as a general rule. I know some courts have said that if you have the rights to a piece of work, then training on that work creates a derivative work that is not subject to copyright laws. However, that decision did not have the evidence we now have that LLMs can reproduce copyrighted material near-verbatim. There are also plenty of other legal opinions. For example, the legal professionals I have worked with do not believe that anyone has the right to train LLMs on Protected Health Information and then use the trained model outside of care for those specific patients. Some vendors have legal experts who disagree and believe that if you use the LLM both for those patients AND others, it's OK. I admit that we are on the conservative end of this debate (and I am happy to be there), but we do have to admit this is not a cut-and-dried question.</p><p></p><p>To be clear, if you believe that no form of training is stealing, you're saying it is OK to train LLMs on your personal data -- financial, medical, and other -- knowing that there is a good chance the LLM can reproduce it on demand for any use by any user. I think most people would prefer that was not the case.</p></blockquote><p></p>
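The "deterministic generation from a seed prompt" effect in the quoted abstract can be illustrated at toy scale. The sketch below is my own illustration, not the paper's methodology: it "trains" a trigram model on a single passage, so the model stores only which word follows each two-word context. Because every context in this short passage is unique, greedy decoding from the first two tokens replays the training text verbatim -- the same memorization-as-lossy-compression effect the paper measures in Llama 3.1 70B at vastly larger scale.

```python
from collections import Counter, defaultdict

# Toy training corpus (a single memorized passage).
passage = ("in a hole in the ground there lived a hobbit not a nasty dirty "
           "wet hole filled with the ends of worms and an oozy smell")
tokens = passage.split()

# "Train": count which word follows each two-word context.
follows = defaultdict(Counter)
for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
    follows[(a, b)][c] += 1

def greedy_decode(seed, max_new=50):
    """Always pick the likeliest next token given the last two -- no sampling."""
    out = list(seed)
    for _ in range(max_new):
        dist = follows.get((out[-2], out[-1]))
        if not dist:
            break  # context never seen in training; stop
        out.append(dist.most_common(1)[0][0])
    return " ".join(out)

# A seed of just the first two tokens regenerates the whole passage verbatim.
print(greedy_decode(tokens[:2]))
```

Real LLMs store soft probabilities over huge contexts rather than exact counts, so reproduction is "near-verbatim" rather than exact, but the mechanism the post describes -- relationships between words wanting to replay the training text -- is the same in kind.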