How Predictive Big Data Fails

Recommendations from big data predictive methods are a lot like the use of epidemiology to determine the cause of a disease:

  • Both rely on the statistical analysis and correlation of massive amounts of data.
  • Both indicate that something is happening.
  • Both can offer insights into a phenomenon.
  • Both are easily derailed by unknown confounding factors.
  • Both are no stronger than the weakest data set.
  • Both make predictions that — even if totally accurate — may be valid for categories, but not for individuals.
  • Both are valuable in helping select investigative paths that need further work to validate the group decision.

In short, current big data predictive methods can be valuable tools for designing marketing and sales campaigns for targeted sub-groups of consumers, but fail at recommending specific items to individual consumers.
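The gap between a prediction that holds for a category and one that holds for an individual can be shown with a small sketch. Everything in it is an assumption made for illustration: the customer records, the segment names, the 0.5 threshold, and the helper functions `segment_rates` and `individual_hit_rate` are hypothetical and do not come from any real data set.

```python
from collections import defaultdict

# Hypothetical records: (customer_id, segment, bought_product)
customers = [
    ("c1", "millennial", True),
    ("c2", "millennial", True),
    ("c3", "millennial", True),
    ("c4", "millennial", False),
    ("c5", "millennial", False),
    ("c6", "boomer", True),
    ("c7", "boomer", False),
    ("c8", "boomer", False),
    ("c9", "boomer", False),
    ("c10", "boomer", False),
]


def segment_rates(rows):
    """Return the share of buyers in each segment."""
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for _, segment, bought in rows:
        totals[segment] += 1
        buyers[segment] += int(bought)
    return {seg: buyers[seg] / totals[seg] for seg in totals}


def individual_hit_rate(rows, rates, threshold=0.5):
    """Recommend to everyone in a segment whose rate meets the threshold,
    then return the share of those recommendations that reach a real buyer."""
    targeted = [bought for _, seg, bought in rows if rates[seg] >= threshold]
    return sum(targeted) / len(targeted) if targeted else 0.0


rates = segment_rates(customers)
hit_rate = individual_hit_rate(customers, rates)
print(rates)     # per-segment purchase rates
print(hit_rate)  # share of targeted individuals who actually bought
```

In this toy data the group-level statement is correct: the hypothetical millennial segment buys far more often than the boomer segment. Even so, two of the five individuals who receive the recommendation are not buyers, and the one boomer who did buy is never targeted.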

Collateral Damage Recommendations

Much effort has gone into targeting demographic groups: Boomers, Gen X, Millennials, etc. Market research has been good at finding average, aggregate characteristics and differentiating major demographic groups from one another.

But averages and aggregations are the broad side of a barn. No one ever sells a product to the barn. Sales are made to the people inside the barn.

This could be termed “Collaborative Target Over-Aggregation” and is nothing more than collaborative filtering with more data: “Some people inside this barn also bought the same products as some other people inside this barn.”

And no matter how finely data can narrow the focus to smaller and smaller barns and sheds, individual recommendations are just a shot at the siding with hopes of accidentally hitting the individuals inside.
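The "also bought" logic described above can be sketched in a few lines. This is a deliberately simplified co-occurrence count, not the method used by any particular retailer; the baskets, shopper names and the `also_bought` function are hypothetical.

```python
from collections import Counter

# Hypothetical purchase histories: shopper -> set of items bought
baskets = {
    "ann": {"merlot", "cheese", "crackers"},
    "bob": {"merlot", "cheese"},
    "cat": {"merlot", "olives"},
    "dan": {"cheese", "crackers"},
}


def also_bought(baskets, shopper):
    """Rank items the shopper has not bought by how many other shoppers
    with at least one purchase in common have bought them."""
    own = baskets[shopper]
    votes = Counter()
    for other, items in baskets.items():
        if other == shopper or not (own & items):
            continue  # skip the shopper and anyone with no overlap
        votes.update(items - own)
    return votes.most_common()


suggestions = also_bought(baskets, "bob")
print(suggestions)
```

The ranking for the hypothetical shopper "bob" is built entirely from what other people in the same barn did. Nothing in the calculation knows whether he likes crackers or olives.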

Demographic segmentation is valuable when creating mass market promotions and advertising, but fails when companies attempt one-to-one sales and recommendations.

Dirty Data

It is no secret that there is no such thing as clean data. There is dirty data and data that is less dirty.

The dirtiest data for recommendation comes from sources where a consumer expresses a quality rating. Whether the system uses stars, a 100-point scale or some other variation, psychological factors and life experiences alter how people interpret a 92 or two stars.
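A small sketch shows how the same raw score can carry opposite meanings for two raters. The rating histories and the `relative_score` helper are hypothetical, and measuring a score against each rater's own average is just one way to expose the problem, not a claim that it cleans the data.

```python
from statistics import mean, pstdev

# Hypothetical star ratings previously given by two raters
ratings = {
    "generous_rater": [5, 5, 4, 5, 3],
    "harsh_rater": [1, 2, 3, 2, 2],
}


def relative_score(user_ratings, raw):
    """Express a raw rating as standard deviations from that rater's own mean."""
    mu = mean(user_ratings)
    sigma = pstdev(user_ratings)
    if sigma == 0:
        return 0.0  # a rater who always gives the same score carries no signal
    return (raw - mu) / sigma


generous = relative_score(ratings["generous_rater"], 3)
harsh = relative_score(ratings["harsh_rater"], 3)
print(generous, harsh)
```

For the generous rater, three stars falls well below their usual score; for the harsh rater, it is the best rating they have ever given. A system that averages the two as the same "3" is mixing a complaint with a compliment.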

To dirty the data even further, many sites such as Yelp, OpenTable and every available wine app allow consumers to view average and previous ratings before offering their own. Social and peer pressure work to bias data that could have been useful.

In addition, many sites like those previously mentioned are “a mile wide and an inch deep.” They collect many thousands, or even millions, of total users, but very little data from most of them.

Hands-on experience with one prominent and highly visible wine site indicated that more than 93 percent of users were never engaged enough to post more than 10 times. Lack of engagement came primarily from the users’ overall perception that they received no tangible benefits to justify the time and effort (friction) they expended on the ratings process.
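A figure like the one above can be computed from a simple log of posts. The sketch below assumes a hypothetical log with one user ID per post; the IDs, the counts and the `low_engagement_share` function are illustrative only and do not reproduce the site's actual data.

```python
from collections import Counter


def low_engagement_share(post_log, max_posts=10):
    """Return the share of posting users with no more than max_posts posts."""
    counts = Counter(post_log)
    if not counts:
        return 0.0
    low = sum(1 for n in counts.values() if n <= max_posts)
    return low / len(counts)


# Hypothetical log: one entry per post, identified by the posting user
post_log = ["u1"] * 25 + ["u2"] * 3 + ["u3"] * 1 + ["u4"] * 10

share = low_engagement_share(post_log)
print(f"{share:.0%} of posting users made 10 or fewer posts")
```

A log like this only contains users who posted at least once. Registered users who never rated anything would have to be counted from a separate user list, and including them would push the low-engagement share higher.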

… to be continued.