Franco Arda

ETL with Tableau Prep and Python. Very exciting?

For tabular data, Pandas is probably the best data wrangling library for those coding in Python. We can mix data types (floats and strings), pivot, merge, concatenate, and add columns based on some logic for algorithms. In short, Pandas rocks! But be warned: there's a steep learning curve with Pandas, and knowing Python (intermediate level) is a must. Let me rephrase that: in order to learn Pandas, you must know Python, but knowing Python doesn't mean you know Pandas.

Allow me a short example of why Pandas is so cool. In (1) we have four lists, which we (2) bring together, and our output (3) is a data frame (tabular data). Pretty cool?
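Here is a minimal sketch of those three steps in code. The list names and values are made up for illustration and are not taken from the original example; only `pd.DataFrame` is the real Pandas call.

```python
import pandas as pd

# (1) Four plain Python lists (hypothetical sample data)
names = ["Anna", "Ben", "Clara"]
ages = [34, 28, 45]
cities = ["Frankfurt", "Zurich", "Vienna"]
salaries = [72000.0, 65000.5, 88000.0]

# (2) Bring the lists together: each list becomes one column
df = pd.DataFrame(
    {"name": names, "age": ages, "city": cities, "salary": salaries}
)

# (3) The output is a data frame mixing strings, integers and floats
print(df)
```

Passing a dictionary of lists is only one of several ways to build a data frame, but it keeps the column names right next to the data.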

For a demo, I wanted to load a CSV file into Tableau Desktop. Unfortunately, the CSV file is huge (taxi fares on Kaggle), at several gigabytes. The file lacks an index, so I can't filter by index, and any other form of filtering would have skewed my data. So I loaded the CSV file into Tableau Prep, added a column "Index", and wrote a short script that creates the index. For a detailed explanation, this post by Joshua Milligan is excellent.
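The script itself can be very short. What follows is a sketch of what such a script might look like, not the exact one used here. It assumes that Tableau Prep hands the flow data to the function as a Pandas data frame and expects a data frame back, and that the "Index" column already exists in the flow, so the schema doesn't change. The function name `add_index` and the sample column `fare_amount` are hypothetical.

```python
import pandas as pd


def add_index(df: pd.DataFrame) -> pd.DataFrame:
    """Fill the existing "Index" column with a running row number.

    Assumes the script step passes the data in as a Pandas data frame
    and expects a data frame in return.
    """
    df = df.copy()  # leave the incoming frame untouched
    df["Index"] = range(1, len(df) + 1)  # 1, 2, 3, ... one per row
    return df


# Local try-out with made-up rows (not the Kaggle data)
sample = pd.DataFrame(
    {"fare_amount": [4.5, 16.9, 5.7], "Index": [None, None, None]}
)
print(add_index(sample))
```

With a running index in place, Tableau Desktop can filter on a range of index values or on every n-th row, without having to filter on the data columns themselves.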

The ability to use Python in combination with Tableau Prep is fairly new; it was released with version 2019.3 in 2019. I strongly believe there are many use cases, and I'll soon tackle more challenging problems.


Franco Arda, Frankfurt am Main (Germany)