Sorry for using so much mathematical jargon. Let me see if I can translate some of this so that a person without a background in statistics can follow it.
This voting system actually presents an extremely complex multiple-stakeholder decision-making problem. To be fair and accurate, it really requires the data to be analyzed in such a way as to determine a value curve for each voter, then reassess the scores based upon that value curve, then (assuming equal stakeholder weights) decide on a method to weight vote volumes.
This is just trying to define the basic theory that applies. A branch of decision analysis called utility theory examines the consequences of people's choices when they are forced to make decisions based on a deterministic system -- like when assigning scores to a ranking of products. Although ranking systems are intended to be linear (each "step" in the score is worth the same value), people's individual value systems are not: the difference in quality between a "5" and a "6" product that you vote for may not be the same as the difference between a "9" and a "10" product. This is what messes up strictly numerical rating systems, particularly at low numbers -- numerically, a "2" product is exactly twice as good as a "1", but a "3" is only 1.5x as good as a "2". Value theory (and its extension, utility theory, which deals with uncertainty -- but really doesn't apply here, since there aren't any "maybe" answers that involve uncertainty) is designed to correct decision weighting -- the scores that people give -- based on their perceived value of the score. Each individual has a different value curve -- for one person, a "5" might be twice as good as a "4", while for another it might be 4x as good. Value theory enables all those scores to be compared equally -- without the correction, you're comparing apples to oranges, in essence, and it is possible for some people's votes to carry more weight than others'.
But value theory is fairly complicated to apply, because it requires evaluating a set of tradeoffs for each person, so we can't apply it directly here. What we can try to do is come close, with the goal that each person's vote carries essentially equal weight, and that products are ranked by their quality, not just popularity (or why else have the 0-10 rating?).
Central to this concept is the fact that individual rankings of a product don't directly assess the overall quality of the product -- you're actually estimating the quality of the product from a sampling of the people who have used it. Done correctly, you'll estimate the real quality of the product within a certain margin of error (essentially what polls do when they sample X number of people and report an answer +/- a certain amount).
- Delete obvious outliers (someone who votes "1" on everything, then "10" on one product).
First, get rid of anyone trying to screw up the voting system by flagging non-regular voting patterns (which Morrus is already doing). The "average" voter will have a voting distribution that can be described mathematically, within a certain margin of variability. Anyone who falls well outside that range can be assumed to be trying to fix votes, and their ballots should be deleted.
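A rough sketch of that screening step, assuming each voter's ballots are stored as a list of 0-10 scores. The voter names, the sample ballots, and the MAD-based cutoff of 3.5 are illustrative assumptions, not part of the actual system; score spread is just one simple signal of a non-regular pattern.

```python
# Flag voters whose voting pattern is wildly unlike everyone else's,
# using a robust (median-based) measure so the outlier can't hide
# by inflating the overall average.
from statistics import median, stdev

def suspect_voters(ballots, cutoff=3.5):
    """Flag voters whose score spread is far outside the norm."""
    spreads = {v: stdev(s) for v, s in ballots.items() if len(s) > 1}
    med = median(spreads.values())
    mad = median(abs(s - med) for s in spreads.values())  # robust spread
    if mad == 0:
        return []
    return [v for v, s in spreads.items() if abs(s - med) / mad > cutoff]

ballots = {
    "v1": [5, 6, 7, 4, 6], "v2": [6, 5, 7, 6, 5], "v3": [4, 6, 6, 5, 3],
    "v4": [7, 8, 6, 7, 8], "v5": [3, 4, 5, 4, 6], "v6": [5, 5, 6, 7, 6],
    "v7": [1, 1, 1, 1, 10],   # votes "1" on everything, then one "10"
}
print(suspect_voters(ballots))  # only the v7 pattern stands out
```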
- Norm each individual's scores to an N(0,1) distribution before they're summed. Individual values may not actually be normally distributed, so this is a simplification, but when we start adding votes the Central Limit Theorem will kick in and we'll end up with a normal distribution anyway. This norms out individual biases.
This concept is a little difficult to follow if you haven't had stats, but essentially when two people rate a product, their ratings aren't equal. Even if you have a 10-point rating scale, no two people are going to use the entire scale in the same way (because of the value information I presented above). Some are biased toward high scores, some are biased toward low scores, some might have a tight grouping (only score 4-6, for example), others might use the whole range. Differing variance and mean (voting bias) can skew results, and cause certain people's votes to effectively carry more weight. With a large enough sample of votes, this tends to be reduced somewhat -- but why not correct it right off the bat?
We can "correct" everyone's votes so that everyone uses the same distribution -- a normal (Gaussin, bell) curve with a mean (average) score of 0 and a variance of 1 (ie, N(0,1)). If you take all the ratings a person gives, calculate the mean and standard deviation of those scores , you can arrive at a corrected score for each product by taking the individual's score, subtracting the mean score, and dividing by the standard deviation. This generates a set of scores that range from -4 to +4, distributed along a bell curve -- and if done for every individual, their scores will be distributed along the same curve. A -4 correleates to the intended lowest score, +4 to the highest, and the middle value -- 0 -- will now correlate to the intended "average" score: 5.
That way, when you add their scores, you've eliminated individual bias, to ensure that everyone's score means the same thing.
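The norming step above is a one-liner in practice. A minimal sketch, with two hypothetical voters invented for illustration: one who always votes high and one who always votes low.

```python
# Convert each voter's raw 0-10 scores to z-scores (subtract their mean,
# divide by their standard deviation) so every voter's ballots land on
# the same mean-0, sd-1 scale.
from statistics import mean, stdev

def normed_scores(scores):
    """Map one voter's raw scores onto a mean-0, sd-1 scale."""
    m, sd = mean(scores), stdev(scores)
    return [(s - m) / sd for s in scores]

high_biased = [8, 9, 10, 9, 8]   # always votes high
low_biased  = [2, 3, 4, 3, 2]    # always votes low
print([round(z, 2) for z in normed_scores(high_biased)])
print([round(z, 2) for z in normed_scores(low_biased)])
# Both voters end up with identical corrected scores, so their "best"
# and "worst" votes now carry equal weight.
```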
- Sum the normed votes. Calculate the minimum number of votes required to be a statistically representative sample of the population, and use that as a cut-off. A product with fewer votes than that is ineligible to win regardless of score. This ensures that enough votes are gathered to form a representative sample, while still allowing less familiar products a chance to compete.
A sample -- set of votes -- has to be big enough to generate a truly representative sampling. If a product has sold 1 million copies, for example, and you only get 10 ratings, are those ten truly indicative of the quality of the product? Or did only the biggest whiners/fanboys vote?
Morrus has established a cutoff, which is good. You can calculate an exact number needed, based on how accurate you want to be -- but for our purposes a swag estimate will probably work.
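For the curious, here's the back-of-the-envelope version of that calculation, using the standard margin-of-error formula for a sample mean. The target margin of error and the guessed score spread below are assumptions for illustration; the real numbers depend on how accurate you want to be.

```python
# Votes needed so the sample mean lands within +/- margin_of_error of
# the true mean score, at roughly 95% confidence (z = 1.96).
import math

def min_votes(margin_of_error, score_sd, confidence_z=1.96):
    """Minimum sample size: n = (z * sigma / E)^2, rounded up."""
    return math.ceil((confidence_z * score_sd / margin_of_error) ** 2)

# e.g. guess that voters' scores spread about sd = 2 points on the 0-10
# scale, and we want the estimated mean within half a point of the truth:
print(min_votes(margin_of_error=0.5, score_sd=2.0))
```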
- Compare the remaining products in each category via a one-factor analysis of variance, to ensure that the winner in each category is actually the winner (outside of the margin of variance). If so, you have a winner. If not, you'll have to either narrow the voter pool and reanalyze, or else declare a tie.
We want to make sure the winner is really the winner, without question. Say, for example, you have Product #1 that gets 10 votes, all 5's (we'll ignore norming for the moment and just use raw scores). The total score is 50, mean 5. Product #2 gets four 10's, four 2's, and two 1's: total 50, mean 5. Product #3 gets five 7's, a 6, two 2's, and two 1's: total 47, mean 4.7. Who wins? Strictly by total or mean score, #1 and #2 tie, both slightly better than #3 -- but is that how we should judge it? Product #3 has more "above average" scores than either of the other two products, for example. Because of variance, the apparent winner may not be the actual winner.
As sample sizes get very large, it's possible to construct scenarios where widely varying scores are actually the same due to variance. That's the purpose of the ANOVA: to test that the winning score is actually statistically different from the others. There's more to it than that, of course -- I'm trying to avoid any deeper discussion. The point is -- make sure everyone's vote counts equally, and that the winner is really far enough ahead to be the winner.
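The ANOVA check can be sketched with nothing but the standard library, run on the example scores above. This just computes the F statistic; in practice you'd compare it against an F-table critical value (or use something like scipy.stats.f_oneway to get a p-value directly), and this is a simplified sketch of the test, not the full procedure.

```python
# One-factor (one-way) ANOVA: ratio of between-group variance to
# within-group variance. A small F means the group means are not
# statistically distinguishable.
from statistics import mean

def one_way_f(groups):
    """F statistic for a one-way analysis of variance."""
    k = len(groups)                        # number of products
    n = sum(len(g) for g in groups)        # total number of votes
    grand = mean(s for g in groups for s in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

product1 = [5] * 10                              # total 50, mean 5.0
product2 = [10] * 4 + [2] * 4 + [1] * 2          # total 50, mean 5.0
product3 = [7] * 5 + [6] + [2] * 2 + [1] * 2     # total 47, mean 4.7
f = one_way_f([product1, product2, product3])
print(round(f, 3))
# The F value here is tiny -- far below the ~3.35 critical value for
# (2, 27) degrees of freedom at 95% confidence -- so none of the three
# "winners" is statistically ahead of the others: declare a tie.
```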
There's quite an involved science behind ratings and evaluations. Multiple-voter (also known as stakeholder) systems which involve individual rating schemes are among the most complicated systems to get to work in a truly fair manner -- be glad elections are usually held on a "one-man, one-vote" plurality/majority system.
If you're interested in more reading about decision making and value theory, there's a great little book written purely in layman's terms: Smart Choices, by Hammond, Keeney, and Raiffa.
Hope I haven't bored everyone to tears. Thanks for bearing with my pedantry.