So I was looking at Civitai and saw this as part of one model's description: "This new model was fine-tuned using a vast collection of public domain images."
I don't think Civitai is the best place to look, as its models are fine-tunes or merges of existing models, which are themselves accused of being unethically trained. Any model built on top of them would therefore draw the same reaction, I guess, depending on one's view of what is ethical. Microsoft does claim its AI is ethical, and who knows how DALL-E's dataset was obtained, but OpenAI is being sued over its LLMs, so maybe they are not as ethical as they claim. Firefly is undoubtedly clean (because Adobe acquired the rights to the images they used), but since training a base model from scratch is extremely expensive, there will be little effort to build a state-of-the-art model from only public-domain data unless there is a clear ruling that it is illegal to do "business as usual".
Edit: I forgot about the PixArt-alpha model, which was trained on a CC0 dataset: https://mpost.io/akash-networks-mainnet-8-upgrade-boosts-visibility-for-cloud-gpu-operations/
But it's more a demonstration of their training method (which, they say, is very cost-effective, cutting the compute cost to around $30k) than an attempt to provide state-of-the-art generation.