In another thread, I posted the Nature article about the energy question.
As AI systems proliferate, their greenhouse gas emissions are an increasingly important concern for human societies. In this article, we present a comparative analysis of the carbon emissions associated with AI systems (ChatGPT, BLOOM, DALL-E2, Midjourney) and human individuals performing...
www.nature.com
Here's the relevant comparison for image generation:
While the energy an AI uses to create an image is indeed substantial, the energy used to power the computer while a human produces the same image with graphics software is far higher.
The human's total CO2 emissions aren't really something that can be "saved", since we have to assume the human would go on living (and emitting CO2) even if they weren't generating images; the human wasn't created for the purpose of generating images. The relevant part is therefore the comparison between a Midjourney generation and a human producing a single image with computer graphics tools.
If we were really concerned about the energy cost of getting an image, we would have to forbid human illustrators from using a laptop, let alone a desktop, and make them pick up an actual brush and paint on canvas. And even then, it's quite possible that the environmental cost of shipping the brush, paint and canvas from China to the artist, and shipping the result to the customer (possibly with several rounds of back and forth for complex work), would exceed the total cost of AI image generation.
Thank goodness we agree not to worry about the energy our hobbies consume!
How much energy does it take to generate one image?
It depends on the image, the model and the computer you use. Image size matters: generating a 4K image obviously requires more energy than a 1K image, and it tends to be roughly proportional. Model complexity increases the amount of compute needed per image, by a lot. For example, an Nvidia RTX 4090 can output around 30 images per second with a 0.6-billion-parameter model, but the same card takes about 20 seconds to generate one image with a 12-billion-parameter model. The energy also depends on the efficiency of the processor: an RTX 4090 has a peak power draw of 450 W, but more optimized, professional GPUs designed for servers can be considerably more efficient. How long you're willing to wait for the result matters too (accepting slower generation in exchange for energy savings). For example, to run a state-of-the-art model locally, you could use a 4090 for 20 seconds and get an image for about 2.5 Wh, or get the same image for roughly twice the energy on an earlier-generation card, or for about 1.9 Wh on an H100-equipped server (newer chips are more efficient, newer models are more compute-hungry).
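For a rough sense of the arithmetic, here is a minimal Python sketch that turns the power draws and generation times quoted above into watt-hours per image. The function name and scenario labels are mine, and the figures are the post's examples, not measured benchmarks:

```python
# Back-of-the-envelope energy-per-image calculator, using the figures
# quoted in the post above (peak power draw and generation time).

def energy_per_image_wh(power_watts: float, seconds_per_image: float) -> float:
    """Energy in watt-hours: power (W) times time (s), divided by 3600 s/h."""
    return power_watts * seconds_per_image / 3600.0

scenarios = {
    # RTX 4090 (~450 W peak) running a 0.6B-parameter model at ~30 images/s
    "4090, 0.6B model": energy_per_image_wh(450, 1 / 30),
    # RTX 4090 running a 12B-parameter model, ~20 s per image -> ~2.5 Wh
    "4090, 12B model": energy_per_image_wh(450, 20),
}

for name, wh in scenarios.items():
    print(f"{name}: {wh:.3f} Wh per image")
```

Running it gives roughly 0.004 Wh per image for the small model and 2.5 Wh for the large one, which matches the figures above.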
You won't need to ride your bike for long!