On energy use: Training models is super-expensive, no disagreement. But per-token use is not as large as most people think.
Studies of GPT-4o (the model we use most) show that the expected energy use of a 500-token query is about 0.3 watt-hours. To put that in real terms, the average American home uses about 30 kWh a day. So, doing the math, a house uses the same energy in a day as about 100,000 500-token queries.
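The arithmetic can be sketched as a quick sanity check, using only the two figures quoted above (0.3 Wh per query, 30 kWh per home per day):

```python
# Back-of-envelope check of the queries-per-house-day figure.
wh_per_query = 0.3            # estimated energy for one 500-token GPT-4o query, in watt-hours
home_kwh_per_day = 30         # average US household daily electricity use
home_wh_per_day = home_kwh_per_day * 1000

queries_per_home_day = home_wh_per_day / wh_per_query
print(round(queries_per_home_day))  # 100000
```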
Those studies also assume the older H100 chips. As you probably know, the newer chips are more efficient. I’m not sure exactly how much, but based on pricing, probably 2x.
So a better estimate: about 200k queries for a day’s worth of house energy.
Is that a lot? It really doesn’t seem like it to me, but YMMV. If each of those queries saves a nurse 5 minutes of boring data entry, a house-day’s worth of energy saves about 16,700 hours - or roughly 2,000 eight-hour nurse shifts.
So it’s really a question for you - do you think saving 2,000 shifts of tedious data entry is worth a day’s worth of house energy?
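The time-saved side of the trade works out the same way. A minimal sketch, where the 200k figure is the chip-adjusted estimate from above, the 5-minute saving is the assumption in the text, and the 8-hour shift length is my own assumption:

```python
# Time saved per house-day of energy, under the stated assumptions.
queries_per_home_day = 200_000     # chip-adjusted estimate from the text
minutes_saved_per_query = 5        # assumed data-entry time saved per query
shift_hours = 8                    # assumed length of a nurse shift

hours_saved = queries_per_home_day * minutes_saved_per_query / 60
shifts_saved = hours_saved / shift_hours
print(round(hours_saved), round(shifts_saved))  # 16667 2083
```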