How to add jitter in R and Tableau?
Overplotting can happen when values are discrete, even in smaller datasets: several observations share the same coordinates, so their points are drawn on top of each other and some of them are hidden. One solution is jitter, which adds a small amount of random noise to each point so that as many points as possible are visible.
In R, we can add jitter with geom_jitter:
The dataset is taken from ModernDive. It is a toy dataset of teachers' beauty scores and teaching scores. In other words, is there a relationship between a teacher's looks and the teaching score they receive? As we want to compare two measures, a scatterplot is a good fit. The beauty score is on the X-axis and the teaching score is on the Y-axis.
In the following ggplot in R, I don't use jitter. Check the upper right corner and how it changes when I add jitter:
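As a minimal sketch of the plot without jitter, assuming the data is the evals data frame from the moderndive package with the columns bty_avg (beauty score) and score (teaching score); the axis labels are my own choice:

```r
library(ggplot2)
library(moderndive)  # assumption: provides the evals data frame

# Plain scatterplot: observations with identical (bty_avg, score)
# pairs are drawn on exactly the same spot and hide each other.
p_plain <- ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(x = "Beauty score", y = "Teaching score")

p_plain
```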
Now I've added jitter. More points become visible:
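A sketch of the jittered version could look like this, again assuming the evals data frame from the moderndive package. The width and height values are illustrative guesses, not the ones used for the figure, and set.seed only makes the random offsets repeatable:

```r
library(ggplot2)
library(moderndive)  # assumption: provides the evals data frame

set.seed(42)  # repeatable jitter; the seed value is arbitrary

# geom_jitter() replaces geom_point() and moves every point by a small
# random amount, at most `width` horizontally and `height` vertically.
p_jitter <- ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_jitter(width = 0.1, height = 0.1) +
  labs(x = "Beauty score", y = "Teaching score")

p_jitter
```

Keeping width and height small relative to the axis scale matters: too much jitter changes the picture of the data instead of just revealing hidden points.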
If we need the visualization in production to be consumed by many users, one great possibility is to use Tableau. Tableau does not have jitter out of the box, so we have to add jitter manually. There are different ways of adding jitter in Tableau. Here's my take. First, we need to create a Calculated Field adding randomness:
RANDOM() is an undocumented function in Tableau. To make the function re-run, we use a trick with RANDOM() * 1 (thanks to Toan!). RANDOM() returns a random number between 0 and 1. To keep the noise small relative to the measure, I've divided it by a constant. The divisor is hardcoded, but we could also parameterize it, even though this might be overkill for some users.
Next, I've created a new Calculated Field where I add the Random Noise to the measure.
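To make the arithmetic behind the two Calculated Fields concrete, here is a small sketch in base R rather than in Tableau. The scores are made-up values, the divisor of 10 is a hypothetical choice, and runif() stands in for Tableau's RANDOM():

```r
set.seed(1)  # repeatable example; the seed value is arbitrary

# Made-up teaching scores with exact duplicates (overplotting candidates)
score <- c(4.7, 4.7, 4.7, 3.5, 3.5)

divisor <- 10  # hypothetical; a larger divisor gives less jitter

# Step 1: the "Random Noise" field, a uniform random number scaled down
random_noise <- runif(length(score)) / divisor

# Step 2: the jittered measure, the original value plus the noise
score_jittered <- score + random_noise

data.frame(score, random_noise, score_jittered)
```

Note that noise built this way is never negative, so every point moves in the same direction by up to 1 / divisor. If that shift matters, subtracting 0.5 from the random number before dividing would center the noise around zero.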
The scatterplot in Tableau without jitter:
The scatterplot in Tableau with jitter: