INCONSISTENCY: Rating The Rating Systems

How you ask a question can determine the answer you get.

This principle lies at the heart of journalism, science, job interviews, police detective work, opinion polling … and wine ratings.

In many settings — especially those where the vino-cognoscenti gather — the question “What did you think of that wine?” will frequently elicit a number between 70 and 100.


Numerous scholarly articles, as well as those in the general media, have pointed out that even if the 100-point scale were a good numerical system, the actual ratings are so inconsistent among the best experts that the numbers produced are mostly worthless. (Wine-quality scores are mostly random and fail to be repeatable.)

Scores on the same wine by the same taster frequently differ from one tasting to the next: How close are repeated wine-quality scores?

One of the most thorough round-ups of rating flaws was recently published in The Guardian: “Wine-tasting: it’s junk science” which notes that, “Every year Robert Hodgson selects the finest wines from his small California winery and puts them into competitions around the state.

“And in most years, the results are surprisingly inconsistent: some whites rated as gold medalists in one contest do badly in another. Reds adored by some panels are dismissed by others. Over the decades Hodgson, a softly spoken retired oceanographer, became curious. Judging wines is by its nature subjective, but the awards appeared to be handed out at random.”

Hodgson’s most recent study, like his previous ones, was published in Cambridge University Press’s scholarly Journal of Wine Economics: “An Examination of Judge Reliability at a Major U.S. Wine Competition.”
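To see how modest judge-to-judge (or pour-to-pour) noise can scramble medal results, here is a minimal simulation sketch. It is not Hodgson's actual data or method; the noise spread and the medal cutoffs below are assumptions chosen for illustration, loosely echoing the roughly plus-or-minus-four-point variation reported in coverage of his work.

```python
import random

# Illustrative sketch (not Hodgson's data): model a judge's score for a
# wine as a "true" quality blurred by random tasting-to-tasting noise.
random.seed(1)

def judge_score(true_quality, noise_sd=4.0):
    """One blind tasting: true quality plus judge noise, clipped to 50-100."""
    return max(50, min(100, round(random.gauss(true_quality, noise_sd))))

def medal(score):
    """Hypothetical medal thresholds for illustration only."""
    if score >= 95:
        return "Gold"
    if score >= 90:
        return "Silver"
    if score >= 85:
        return "Bronze"
    return "No medal"

# Same wine, same judge, three blind pours in one flight:
true_quality = 90
scores = [judge_score(true_quality) for _ in range(3)]
print(scores, [medal(s) for s in scores])
```

Run it a few times with different seeds: a wine whose "true" quality sits near a cutoff will routinely collect different medals from identical pours, which is exactly the kind of inconsistency Hodgson documented.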

The 100-point wine-rating scale also carries psychological baggage from school, where below 70 is an F and 100 is an A+.

Those rating/grade associations solidify the false perception that a rating is an objective, unassailable judgment of quality.

Beyond that, what is the “meaning” of an 83 versus an 85 or an 87? Is that meaning biased by personal expectations? By grades in school or performance reviews at work? By what a wine “deserves”?


Alternative scales — the substitution of 10- or 5-point scales, as well as stars and other icons — have found widespread acceptance for wine as well as for other consumer products. But they carry their own psychological biases and possibilities for misinterpretation (“what does it mean?”).

Some research shows that when confronted with an odd-numbered scale, respondents tend to cluster around the neutral point, with a bias toward the positive. This reflects anxiety over extreme positions and a tendency to avoid them.

This means that a middle point offers no guidance in a “buy versus not buy” situation.

On the other hand, bias may occur at the top or bottom of the scale depending upon context and whether the rater psychologically wants to reward or punish the product or company. See: The Problems with 5-Star Rating Systems, and How to Fix Them

This link explores the pros and cons of even- versus odd-numbered scales. The text below is excerpted from that page.

Disadvantages of odd-numbered scales

  • People may be less discriminating in response (respondents don’t take time to carefully consider all of the various response categories)
  • May not be collecting accurate responses (the mid-point can mean different things to different people)

Advantages of even-numbered scales

  • People may be more discriminating, be more thoughtful
  • Eliminates possible misinterpretation of mid-point

The biggest problem with rating systems…

…is that they have too many biases, for too many unknown reasons, to be used accurately for recommendations — especially for products “of taste” such as wine, books, and movies. That is why Netflix and other savvy companies have moved to “big data,” which tries to make recommendations on the basis of measurable actions such as watching a movie or buying a book or a bottle of wine.

This moves the process into the realm of the current reigning paradigm: collaborative filtering.
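The core idea of collaborative filtering can be sketched in a few lines: instead of asking anyone for a score, look at which items tend to be bought (or watched) by the same people, and recommend the unowned items most similar to what a user already has. The data and names below are invented for illustration; this is item-based filtering with cosine similarity over implicit 0/1 purchase data, one common variant among many.

```python
import math

# Made-up implicit-feedback data: which wines each (hypothetical) user bought.
purchases = {
    "ana":  {"pinot", "chardonnay"},
    "ben":  {"pinot", "syrah"},
    "cara": {"chardonnay", "riesling"},
    "dan":  {"pinot", "syrah", "riesling"},
}

def item_users(data):
    """Invert the data: for each item, the set of users who bought it."""
    users = {}
    for user, items in data.items():
        for item in items:
            users.setdefault(item, set()).add(user)
    return users

def cosine(a, b):
    """Cosine similarity between two items' buyer sets (0/1 vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(user, data, top_n=2):
    """Rank unowned items by total similarity to the user's owned items."""
    users_by_item = item_users(data)
    owned = data[user]
    scores = {}
    for item, buyers in users_by_item.items():
        if item in owned:
            continue
        scores[item] = sum(cosine(buyers, users_by_item[o]) for o in owned)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ana", purchases))  # → ['riesling', 'syrah']
```

Note that no ratings appear anywhere: the only inputs are the measurable actions themselves, which is exactly the move the paragraph above describes.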

The problem with big data collaborative filtering