WotC: 'We made a mistake when we said an image not AI'
Jfdlsjfd said:

You can't really prevent artists from managing their own IP by forbidding deals where they sell all rights to their creations for a lump sum. And even if you could, it wouldn't change the situation much except in the very short term: existing datasets wouldn't be affected (so the Adobe situation wouldn't change), and progress in AI is showing that the size of the dataset matters less than the quality of the captioning. Dall-E 2 used a smaller dataset than Dall-E 1, and while OpenAI didn't publish much information on Dall-E 3, the papers they have released suggest its ability to follow prompts comes from better (AI-generated) captioning.

That is the big problem with datasets scraped from the Internet: they lack good captions, and alt tags aren't sufficient. Let's take an example: look at the image used on this board to illustrate the topic. What do you see? I'd say "a steampunk workshop, with light bulbs, a gauge from an unknown machine and a box on a table in the foreground, a window opening onto a drab street, a bookshelf in the background. At the center of the image are five colourful Magic cards leaning against a cylindrical box. The lighting is warm and comes both from the lamps and from the viewer's side of the image." That's a basic description that could be used in training. If this site were scraped to train an AI, the scraper would get the image plus its alt text, which is currently "screenshot-2024-01-07-at-18-38-32-png". That's useless as an alt (if you're blind, it tells you nothing about what is displayed), but it reflects the poor state of alt text on the Internet: few people know what alt is for -- at best they use it to show a pop-up when the mouse hovers over the image and to add extra information, like "buy one, get one free until January 31st". Most images in scraped datasets are tagged like this one, and they don't help training much. Dall-E 3 was trained on an undisclosed image database, but OpenAI used a process (ChatGPT's vision capability? That's my guess, but they could also have paid for distributed work from a low-wage country) to produce accurate captions, which greatly reduced the amount of data the training needed and improved quality, because even detailed alt text rarely includes positional details or things a human reader takes for granted.

And public artwork databases are being released: Europeana, an EU Commission project, has released a database of 25 million high-quality images of artworks from European museums under a licence that does allow training, and they're aiming for 58 million. That's a quarter of what Adobe has, but you can get a decent experimental model with as few as 14 million images. If a worldwide regulation somehow disallowed selling IP outright for a flat fee, it would just push training toward datasets like this after a captioning pass. There are community efforts to train on such data right now, but the computing costs are high for enthusiasts, so it won't happen overnight, unfortunately. On the other hand, I doubt it would stop Adobe, Microsoft, Amazon or Alibaba from literally doing it overnight.
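To make the alt-text point concrete, here is a minimal sketch of what a scraper actually sees: each image paired with whatever alt string the page author wrote, plus a crude heuristic for spotting alts like "screenshot-2024-01-07-at-18-38-32-png" that carry no training signal. This is illustrative only; the URL, the length threshold, and the filename pattern are assumptions, not anything a real lab's pipeline is known to use.

[CODE=python]
# Sketch: pair images with their alt text and flag alts that are empty,
# very short, or just a filename. Assumes `requests` and `beautifulsoup4`.
import re
import requests
from bs4 import BeautifulSoup

FILENAME_LIKE = re.compile(r"\.(png|jpe?g|gif|webp)$|^screenshot[-_]", re.IGNORECASE)

def collect_image_alt_pairs(url: str) -> list[tuple[str, str]]:
    """Return (image URL, alt text) pairs scraped from a single page."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return [(img.get("src", ""), img.get("alt", "")) for img in soup.find_all("img")]

def needs_recaptioning(alt: str) -> bool:
    """Heuristic: empty, very short, or filename-like alts carry no training signal."""
    alt = alt.strip()
    return len(alt) < 15 or bool(FILENAME_LIKE.search(alt))

pairs = collect_image_alt_pairs("https://example.com/some-forum-thread")  # placeholder URL
poor = [p for p in pairs if needs_recaptioning(p[1])]
print(f"{len(poor)} of {len(pairs)} images have alt text too weak to train on")
[/CODE]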
[QUOTE="Jfdlsjfd, post: 9237068, member: 42856"] You can't really prevent artists from managing their own IP by forbidding deals where they sell all rights to their creations for a lump sum. If you could it wouldn't change the situation a lot except of the very short term: the existing datasets wouldn't be affected (so the Adobe situation wouldn't change) and progress in AI is showing that the size of the dataset matters less than the quality of the captioning. Dall-E 2 used a smaller dataset than Dall-E 1 and while Microsoft didn't pubish a lot of information on Dall-E 3, a few papers from them let think that its ability to follow prompt comes from better (AI-generated) captioning. It is a big problem with dataset scraped from the Internet: they lack a good caption and using the alt tags isn't sufficent. Let's take an example: look at the image used in this board to illustrate the topic. What do you see? I'd say "a steampunk workshop, with lamp bulbs, a gauge from an unknown machine and a box on the table in front of the image, a window opening toward a drab street, a bookshelf in the background. At the center of the image are five colourful cards from Magic, leaning on cylindral box. The lighting is warm and comes both from the lamps and the viewer of the image". This is a basic description that could be used in training. If the site was scraped to train an AI, it would have the site's image and the alt, which is currently "screenshot-2024-01-07-at-18-38-32-png". That's not very useful as an alt (so if you're blind, you've no way of having information of what is displayed) but that reflects the poor state of alts on the Internet, few people know about the use of alt -- at best they use it to have a pop-up text when the mouse is on the image and provide additional information, like "buy one, get one free until January 31st". Most of the images within scraped dataset are tagged like this one, and they won't really help the AI training. D3 worked on an undisclosed database of image, but they used a process (probably using chatgtp-vision? That's my guess but they could have paid for distributed work from a low-wage country) to provide accurate caption, far reducing the size of the database the training process needed and improving quality, because even in detailed alts, it's not common to provide positional details or things that are readily assumed by human readers). And public artworks database are being released (Europeanea, a project from the EU commission, released a 25 million high-quality image database from artworks in European museum with a licence that does allow training) and they aim for 58 millions. That's a fourth of what Adobe has, but you can get a decent experimental model with as low as 14 millions images. If a worldwide regulation somehow disallowed flat selling of IP, it would just prompt training from datasets like this after a captioning pass. There are community efforts to train on this right now, but the computing costs are high for enthusiasts, so it's not happening overnight, unfortunately. On the other hand, I doubt it would stop Adobe, Microsoft, Amazon or Alibaba from literally doing it overnight. [/QUOTE]