Franco Arda

Spark + R + Tableau for Big Data

Combining Apache Spark, R and Tableau can be interesting when we face a complex, ad hoc data challenge.

For example, we can connect R to Spark with sparklyr (an amazing package), apply R's statistical tools (e.g., to estimate the probability of a flight delay), and then use Tableau to deliver the analytics to users.
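In the workflow above, the "probability of flight delay" step would be written in R, for instance as a dplyr `group_by()`/`summarise()` that sparklyr runs on Spark. To show what that calculation boils down to, here is a minimal, language-neutral sketch in plain Python. The function name, the `(carrier, arrival_delay_minutes)` input shape and the 15-minute "delayed" threshold are illustrative assumptions, not part of sparklyr or Tableau, and the sample rows are made up.

```python
from collections import defaultdict

# Assumption: a flight counts as "delayed" when it arrives 15 or more minutes late.
DELAY_THRESHOLD_MIN = 15


def delay_probability_by_carrier(flights, threshold=DELAY_THRESHOLD_MIN):
    """Estimate P(delay) per carrier as the share of its flights that arrived late.

    `flights` is an iterable of (carrier, arrival_delay_minutes) pairs;
    negative delays mean the flight arrived early.
    """
    totals = defaultdict(int)   # flights seen per carrier
    delayed = defaultdict(int)  # delayed flights per carrier
    for carrier, arr_delay in flights:
        totals[carrier] += 1
        if arr_delay >= threshold:
            delayed[carrier] += 1
    # Empirical probability = delayed flights / all flights, per carrier
    return {carrier: delayed[carrier] / totals[carrier] for carrier in totals}


if __name__ == "__main__":
    # Hypothetical sample rows, not real flight data
    sample = [("AA", 20), ("AA", 0), ("UA", 30), ("UA", 16), ("DL", -5)]
    print(delay_probability_by_carrier(sample))  # {'AA': 0.5, 'UA': 1.0, 'DL': 0.0}
```

On a real cluster the same group-and-divide logic would run inside Spark, and only the small per-carrier result would travel on to Tableau.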

For R users, the huge advantage of this approach is that it works nicely with the tidyverse family: sparklyr provides a dplyr backend, so familiar dplyr verbs are translated into Spark SQL and run on the cluster. All family members share an underlying design philosophy, grammar, and data structures. This can be incredibly valuable for a Data Scientist.

Combining Spark, R and Tableau empowers us to do large-scale Data Science. In our ad hoc example, Tableau users can dive deeper into analyzing why certain flights are delayed.

The same team that created sparklyr, the Spark package for R, has also recently published a book called "Mastering Spark with R." In my view, the added value from the RStudio team to the Data Science community cannot be overstated. I personally believe that R is better suited than other programming languages for Data Science problems on Spark. The reason is that Spark problems tend to be data wrangling, statistical problems, or statistical Machine Learning challenges. To me, this is where R shines. Granted, if you need to tackle Deep Learning problems, Python is far superior.

R to Tableau Prep Builder for Big Data

My example describes an ad hoc solution. If we need a complex statistical calculation in production, we need to change our approach. Since we're dealing with really big data, one option is to feed the R code into Tableau Prep Builder (not Tableau Desktop!). The ability to run R scripts in Tableau Prep Builder is fairly new. The advantage of feeding R scripts directly into Tableau Prep, for non-live connections, is DASHBOARD PERFORMANCE. Connecting Tableau Desktop directly to programming scripts (R or Python) can slow a dashboard down considerably, because the script is called whenever the view has to be computed. Add big data to the picture, and the dashboard gets really slow.

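By feeding R scripts directly to Tableau Prep Builder, we can keep Tableau dashboards fast: the R output is computed in the flow and stored with the data, so the dashboard does not have to call R while users interact with it.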
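The performance argument is really about *when* the script runs. The toy model below is a minimal sketch of that idea in Python. It is not Tableau's API: the class names, `delay_share` and the sample delays are all hypothetical. One "dashboard" calls the script on every render, like a live script connection. The other runs it once up front, like a prep flow, and then only serves the stored result.

```python
def delay_share(rows):
    """Hypothetical stand-in for an expensive R/Python script: share of delays >= 15 min."""
    return sum(1 for delay in rows if delay >= 15) / len(rows)


class LiveScriptDashboard:
    """Calls the script every time the view renders (live script connection)."""

    def __init__(self, script, rows):
        self.script = script
        self.rows = rows
        self.script_runs = 0

    def render(self):
        self.script_runs += 1  # every interaction pays for a script call
        return self.script(self.rows)


class PrecomputedDashboard:
    """Runs the script once up front (as in a prep flow) and serves the stored result."""

    def __init__(self, script, rows):
        self.result = script(rows)  # computed once, then materialized
        self.script_runs = 1

    def render(self):
        return self.result  # no script call at view time


if __name__ == "__main__":
    rows = [20, 0, 30, 16, -5]  # hypothetical arrival delays in minutes
    live = LiveScriptDashboard(delay_share, rows)
    prepped = PrecomputedDashboard(delay_share, rows)
    for _ in range(100):  # simulate 100 dashboard interactions
        live.render()
        prepped.render()
    print(live.script_runs, prepped.script_runs)  # 100 1
```

Both dashboards show the same number. The difference is that the live one pays the script's cost on every interaction, and that cost grows with the data.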


Franco Arda, Frankfurt am Main (Germany), franco.arda@hotmail.com