Franco Arda, Frankfurt am Main (Germany)                                                                                                       

Anscombe's Quartet: The danger of relying on numbers alone ...

For Data Science, the famous statistician John Tukey summarized it best:

“The greatest value of a picture [visualization] is when it forces us to notice what we never expected to see.”

Anscombe’s quartet illustrates this nicely.

The danger of relying on numbers or summary metrics alone is that they can be misleading.

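To demonstrate this effect, statistician Francis Anscombe put together four example data sets in the 1970s. Known as Anscombe’s quartet, the four data sets share nearly identical summary statistics: the same mean and variance for x and for y, and the same correlation between x and y. However, when graphed, it’s clear that the data sets are totally different: one looks like a noisy linear trend, one follows a smooth curve, one is a tight line with a single outlier, and one has every x value identical except for a single point. The point Anscombe most likely wanted to make is that the shape of the data is as important as the summary metrics and cannot be ignored in data analytics.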
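To make the claim concrete, here is a minimal sketch in Python that computes the summary metrics for all four data sets. The values are the commonly published ones for the quartet; the names `QUARTET`, `pearson`, `summarize` and `summaries` are my own choices for illustration, and only the standard library is used.

```python
from statistics import mean, variance

# Commonly published values for Anscombe's quartet.
# Data sets I-III share the same x values; data set IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def summarize(xs, ys):
    """Return the summary metrics discussed in the text (sample variance)."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "corr": pearson(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in summaries.items():
    print(
        f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
        f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.2f} corr={s['corr']:.2f}"
    )
```

The four printed rows should agree to roughly two decimal places, which is exactly why the numbers alone cannot tell the data sets apart. Plotting each pair as a scatter plot (for example with matplotlib, if it is available) reveals the differences immediately.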