I now know that (1+2+3+4+5+6+1+2+3+4+5+6+3+4+5+6)/16 does 3.75, which translates to "all the possible outcomes summed up and divided by their number" which is precisely how you average something.
He actually gave you a weighted arithmetic mean formula, not the "standard" average.
(1+2+3+4+5+6+1+2+3+4+5+6+3+4+5+6)/16
is just
(6*3.5 + 6*35 + 1*3 + 1*4 + 1*5 + 1*6) / (6 + 6 + 1 + 1 + 1 + 1)
In other words, the exact weighted average formula.
If you want a "standard" average, there are actually 36 events to consider, not 16. You can abstractly think of the damage as a 1d6 roll followed by another 1d6 roll, where 3s, 4s, 5s, and 6s in the first roll will do a second roll with 6-sided dice where the faces all the same number. Essentially, you have 6 * 6 rolls, or 36 rolls.
If you roll a 1, you get 1, 2, 3, 4, 5, or 6.
If you roll a 2, you get 1, 2, 3, 4, 5, or 6.
If you roll a 3, you get 3, 3, 3, 3, 3, or 3.
If you roll a 4,... well, etc.
So, if we want the probability of a 3, it's 8/32 or 22.2%, precisely what is expected. The formula you provided says something to the effect of "the probability of rolling a 3 is 3/16ths or 18.75%" which we know is false.
So the average is
(1+2+3+4+5+6 + 1+2+3+4+5+6 + 3+3+3+3+3+3 + 4+4+4+4+4+4 + 5+5+5+5+5+5 + 6+6+6+6+6+6) / (6 + 6 + 6 + 6 + 6 + 6)
or simplified
((1+2+3+4+5+6)/6 + (1+2+3+4+5+6)/6 + 3 + 4 + 5 + 6)/6
which is just algebra for Average(3.5,3.5,4,5,6) like was posted earlier.
This is all just simple math to show the logic behind Average(3.5,3.5,4,5,6). The difference is one formula assumes 16 events of equal probability and the other assumes 36.
Incidentally, I made a model in C++ that does 1 million random dice rolls (uniform distribution, reroll 1s and 2s once), and the results matched, ~5.5% for 1s and 2s, ~22.2% for 3s-6s, and a 4.167 average.
I stopped lurking and made an account for a math post. Dunno what that says about me.