Screen Shot 2020-12-27 at 9.37.47 PM.png
Screen Shot 2020-12-27 at 9.40.11 PM.png

Customers:

Screen Shot 2020-12-27 at 9.41.11 PM.png
Screen Shot 2020-12-27 at 9.41.29 PM.png
Screen Shot 2020-12-27 at 9.42.24 PM.png

___________________________________________________________________________________________________________________________________

bmww.png
sie-logo-petrol-rgb.png

In which case do we need custom geocoding in Tableau?
Let's say we have two cities, London and New York. To mark them on a map in Tableau is super simple. However, if we need a spatial function in Tableau, we need to feed Tableau the exact coordinates of latitude and longitude. Why? In one word, precision. Let's say we want to measure the distance between two points in New York, we need the exact latitude and longitude. 

Spatial functions can be used for data analytics in Tableau. As of 2022, we have the following spatial functions:
MAKEPOINT() Converts data from latitude to longitude columns into spacial objects.
MAKELINE() Generates a line mark between two points
DISTANCE() Returns the distance measurement between two points in a specified unit such as miles or kilometers
BUFFER() Returns a circular shape with a radius determined by defined distance and units

In my case, I created the spatial data (city latitude and longitude) in Excel and joined it with the non-spatial data on "City name." For example, in the image below, we can see the use of a BUFFER() function for London to New York color-coded in red.

Can we detect patterns in homicides?

The dashboard above, Baltimore Homicide Analytics, tries to detect patterns in weekdays and timestamps (essentially dividing the day in different bins). Find out more in Text Analytics with Tableau.

GIF.gif

Set Theory in Tableau

Set theory is a branch of mathematical logic that studies sets, which informally are a collection of objects - often used in discrete mathematics books. In Tableau, we have:

- Fixed sets

- Dynamic sets

- Combined sets

"Fixed sets" are kind of useless as they are hardcoded members. "Dynamic sets" are most commonly used with top-N or in combination with parameters.

In the image on the right, we have a set with sales >5,000 and a set with profit >1,500. The "combined set" is highlighted in the red square, where observations fulfill the conditions of both sets.

In other words, the combined sets highlight observations which have sales >5,000 AND profit >1,500.

combined_set2.png

Electric Scooter Accident Analytics

This dashboard shows electric scooter accidents in my city, Frankfurt am Main.

The
data is simulated via randomized latitude and longitude (not real data!).

 

In the dashboard, we get a map with an overview of the accidents, an attempt to forecast the number of accidents over the next 3 months, and most importantly, a heatmap showing patterns in month and day of the week.

The
heatmap shows that warm months such as June and July display a high number of accidents. Additionally, during warm months, Monday, Thursday, and Sunday are high accident days.

Link to Tableau Public.

electric_scooter_accidents.png

Sales Target Analytics

In the Bundesland "Thüringen," we hit the sales target by 33%. So all is great, right?

 

Well, when we aggregate sales and sales targets, we might miss nuances. For example, some Bundesländer might simply be above the sales target because of a single best-seller (high risk).

In the case of Thüringen, we can see easily with a deeper drill down, that
the sales target was achieved mainly due to one best-selling product (Tuc).


Visit Sales Target Analytics for more details.

sales.png

Nested LOD's: inheritance

With a nested LOD function, the inner expression in the LOD inherits its dimensionality from the outer expression.

Due to this inherent logic, the two nested LOD's on the right return the same result.

In other words, with this nested LOD, we get the average sales for each customer name by state, regardless of whether we specify "state" again in the nested part.

inheritance.png

Cardinality of Sets

In discrete mathematics, the cardinality of sets refers to the number of distinct elements within a set. In Tableau, the function ATTR() checks the cardinality or converts a dimension into a measure. 

The function ATTR() converts the result into a measure. 

Without ATTR(), the function would return a dimensions such as "republican."

aaa.png

Referential Integrity

Referential integrity refers to the accuracy and consistency of data whint a replationship.

In the image on the right, the related table contains a foreign key value that doesn't exist in the primary key field of the primary table (i.e., in the "CompanyId" field).

This has resulted in a "orphaned record." In other words, let's do not ASSUME REFERENTIAL INTEGRITY in Tableau, if you're not really sure, otherwise the results may not be reliable.

ref.png

How to write better functions?

(1) Give the function a descriptive name so you or somebody else can find it better ...
(2) Use spacing to amek the code more readable and detect mistakes quicker.
(3) Comment to make it clearer for you and others.
(4) Avoid uncesssary loops.

Screen Shot 2021-12-11 at 9.09.46 PM.png
join.png

SQL join calculation

In a SQL join, if we have have no common column, we can create one with a Join Calculation.

With this dataset, we can join on "First Name" + "Last Name" with "Name."

"INCLUDE and EXCLUDE calculations are always MEASURES and as a consequence are aggregated. However, FIXED can be either a dimension or measure."

case.png

CASE vs. IF-THEN-ELSE

For a porameter control, the CASE statement might feel simpler, especially if you know SQL. However, we could also work with an if-then-else statement.

The if-then-else logic is widely known in many programming languages and can also be used in calculated fields. Whily many (simple) problems can be solved with the CASE statement, using IF-THEN-ELSE lets us include Boolean operators and nested logic.

22.png

IFNULL()

Of course, nothing and zero are not the same. IFNULL() is used to change a null value to something else. It's a shorter way of writing an IF ISNULL() ...END statement.

For example, in the function on the right, we return "Email is not known" if there's no email address.

Forecasting in Tableau (without R/Python)

Tableau's built-in "simple" exponential smoothing for forecasting is a powerful alternative to more advanced models. In my personal opinion, we probably can cover 80% or more of use cases without falling back to more complicated models such as ARIMA or Facebook's Prophet.

forecast.png

The MASE (Mean Absolute Squared Error) is one of the most important indicators for our forecasting model. We can think of MASE as the equivalent of R square for regression models. A MASE value of 1.00 means, that our model is not better than simply taking the last value to forecast (a.k.a., the naive approach to forecasting). 

2020-11-27_14-55-16.png

A MASE value of 0.50 means, that we doubled the prediction accuracy compared to a naive approach. Tableau terms such a prediction power as "OK."

Resources:
- "Forecasting: Principles and Practice" (website) is probably the most well-known book on the topic. Requires R.
- "Forecasting and Time Series Analysis in Tableau" (Udemy) is an amazing video course that uses only a bit of R.

Info icons

Info icons in Tableau are particularly useful in providing detailed complimentary information within a dashboard. Info icons are typically created using a combination of custom shapes and tooltips. How it works; whenever a user hovers over the info icon, a pop-up appears (tooltip). 

hhh.png

Resources:
- "Advanced Tableau for Business Analytics" (Udemy) is an amazing course and covers info icons and many more advanced techniques.

Smart design: think like a designer

Well-designed data visualization - like well-designed objects - is easy to interpret and understand. One theory is the zigzag "z" of taking information on a dashboard, at least in the Western world.

2020-11-30_11-20-44.png

I personally have never been into design. "If I see the key number" it's all I need, was my thinking. Time learnt me another lesson though. Users pay a lot of attention to sometimes pixel-perfect design. In other words, content alone is often not enough. I learned it the hard way. Nowadays, I pay a considerable part of my energy on the layout, low cognitive load, colors, location (x and y-axis), and pixels.

After a while, you don't just have your favorite color, but your favorite color is #5e7986 as a background with a white font.

The 4 most important visualization types?

With the following four visualization types, we probably cover 80% of the visualizations. Those are:

1 BAR CHART (categorical vs. numerical)
2. SCATTER PLOT (numerical vs. numerical)

3. HISTOGRAM (numerical)

4. LINE CHART (numerical vs. time)

aaa.png

Data-driven email alerts

I think email alerts are one of the most underrated tools on Tableau's server. As soon as we have a measure on the y-axis, we can create an email alert. For example, below I set an email alert if the volatility is > 32%. In other words, as soon as the volatility goes above 32%, I get an email alert. 

12.png

Email alerts have so many applications such as in sales or profits, when a certain level is reached (high or low), and in statistical analysis.

What type of ranking?

Tableau offers in the Table Calculation RANK() several options:

- Competition (1,2,2,4)

- Modified Competition (1,3,3,4)
- Unique (1,2,3,4)

And RANK_DENSE() which is illustrated in the image on the right.

I personally don't recommend memorizing the different ranking versions. However, I highly recommend keeping the RANK() function in mind. The RANKE_DENSE() function assigns the identical rank to individual values, but it will not skip the next rank. For example, I had several values "62" in the bin with a rank of #1. The bin "63" was rank #2. Without RANKE_DENSE(), it's impossible to visualize the ranking correctly.

2020-12-15_22-16-35.png

How to read a parameterized LOD?

The LOD (Level of Detail) function below labels rows with an X that fulfill a certain condition:

IF MAD or COUNT based on on a certain aggregate fulfill the conditions, then a row is labeled with X. As we know from formal logic, it's an OR operation, not to confuse with AND or NOR. Lastly, two parameters give the user control over the LOD.

2020-12-16_09-38-32.png

Addressing & Partitioning

With Tableau's table calculations, we can define the addressing fields to change the direction of the (table) calculation.

 

 

Partitioning (a.k.a., scope) defines the level or group at which table calculations are evaluated (i.e., cells, tables, and panes). In other words, partitioning defines what data we use.

Addressing fields (a.k.a., direction) defines the direction in which a calculation (i.e., down, across, across then down) is defined.

2020-12-28_16-11-16.png

Percent Change from a Reference Date

With a reference date parameter, we can compare the performance of several categorical values in a pre-defined range. For example, we can compare the performance of Ford, Toyota Motor, and Testa within a certain time range.

This advanced analytics function is often taught in courses for the preparation of the Tableau Certified Desktop Professional.

2020-12-28_21-11-58 (1).gif

Percent change from a reference date can be used in different business settings. For example, the user wants to compare different sales regions or products based on a reference date. To make this work, we need to follow the following three steps:

img.png

I personally believe it's very hard to memorize those steps. Unlike a traditional programming language such as R or Python, it's challenging to make it reproducible. My strategy is to keep a workbook with each step being annotated. The example is taken from Kirill Eremenko's Certified Professional video training (source).

Common Baseline

To me, a common baseline is a visualization technique that can give powerful insights, unlike any other statistical measure. In other words, a visualization of a common baseline can often show the power of visualizations (which is not always the case). 

Let's take a dataset from the Kickstarter project (Kaggle). Since not all projects start on the same day, we don't care about the date a project is launched, but rather how long it lasted. This creates our x-axis with the number of days. 

On the y-axis, we have the amounts raised for each individual project. Now we can gain potential insights from the behavior of Kickstart projects. 

Creating a common baseline is not so easy in Tableau, R, or Python. 

Source: Kirill Eremenko "Tableau Certified Desktop Professional."

2020-12-29_13-47-50.png

Table Calculation Functions: WINDOW_AVG()

In this example, we're looking at salaries in different industries (source: Glassdoor). Analytically, we want to know whether an industry is above or below average. Because a user might exclude certain industries, we cannot hard code average. In other words, the average calculation must be based on the view (or scope). 

2020-12-29_15-49-07.png

Thus, we need a function that dynamically updates based on the view. In Tableau lingo, this is called a table calculation. In particular, the function in the image is based on "_WINDOW_AVG" which is simply:

WINDOW_AVG(SUM([Meidan Base Pay])

In summary, the table calculation WINDOW_AVG() allows us to calculate the average dynamically, based on the user's preferences. We could not achieve this with a hard-coded average formula

International football results from 1872 - 2020

In team sports, the term home advantage refers to the benefit that the home team has over the visiting team. Thanks to Kaggle, we now have a dataset with over 40,000 international football results.

For the analysis, we ignore all games on neutral ground. Then we grouped the results by:

- Home win 
- Home loss

- Home tie

In a rare instance, we are using a pie-chart. From this simple analysis, we can see immediately, that the home advantage seems to be real. In other words, the probability of losing is only around 25%, if the corresponding team plays at home.

That's a pretty powerful insight gained from data.

Screen Shot 2020-12-30 at 10.57.39 PM.pn

Relationships vs. Joins

Relationships are part of Tableau's new dynamic data model interface, which automatically assigns join conditions between related tables without materializing those tables into a single data source.

 

The advantages? The relationships should (1) reduce the upfront data preparation work by utilizing a flexible relationship join. It should (2) reduce the number of data sources pulled in to answer analytic questions, and potentially most important, (3) relationships ensure measures will not duplicate due to join clauses (a.k.a., join risk).

                                     Relationships                                                                                               Joins

rel_vs_join.png

In other words, relationships (logical layer) keeps tables separate but defines a relationship between them while the joins (physical layer) physically joins tables together.

Optimizing Relationship Queries: CARDINALITY

In mathematics, cardinality means the number of different variables in a set. For example, if we have a column "countries" with the USA, Germany, and Switzerland in it, then the set or column contains 3 elements. A high cardinality can become a nasty problem in Machine Learning and visualizations.

In data modeling, the cardinality of the join between two tables is the numerical relationship between rows of one table and rows in the other. Tableau offers one-to-one, one-to-many, many-to-one, and many-to-many.

card.png

In general, I would not change the default settings unless you have a deep knowledge of your database. The default setting is many-to-many and if "one" is chooses and the records are not unique, duplicates will occur.

Optimizing Relationship Queries: REFERENTIAL INTEGRITY

Referential integrity is a property of data stating that all its references are valid. It requires that if a value of one attribute of relation references a value of another attribute, then the referred value must exist (Wiki). 

Unless you really know what you're doing, I recommend keeping the default setting at "some records match." In other words, we don't assume referential integrity.

ref.png

With the "all records match" option, if all values in selected fields have a match in both tables. Joins are optimized with Inner Joins, but records can be removed(missing due to unmatched values. In other words, "all records match" treats records as if there are no null records in matching fields.

Nested LOD's (Level of Detail)

As in traditional programming languages, we can nest several layers of LOD's to answer more complicated questions. There are a few things to look out for. For example, the inner expression inherits the dimensionality of the outer expression. The relationship between the nested expression and the filters is determined by the parent expression (e.g., INCLUDE and EXCLUDE are evaluated after the dimension filter).

The best way to read a nested LOD is from the inside out.

nested.png

In my experience, I more common but simpler nested LOD is something like AVG({ FIXED [State], [Customer Name] : SUM([Sales])}) where we can wrap almost anything "around" the LOD expression. An amazing course on Udemy is called "Advanced Tableau - Level of Detail Expressions / LOD" which goes meticulously in the details of LOD's and nested LOD's.

"LOD planning"

The general idea of "LOD planning" is that we write a LOD (or any other complex function) and test the results first in a tabular dataset. The idea has been around for a while, but to my knowledge, Kirill Eremenko was the first person to give this concept a name.

plan.png

As we can see, the function above gives us the average of the Week of Date.

Once we have tested the function and now it works properly, we can use it in visualizations. The reason for this cautious approach is that errors in visualization can be hard to spot.  In other words, code in Tableau is hard to "read" unlike in traditional programming languages such as Python.

Visualizing changes in ranking

When we want to visualize changes in ranking, slope and bump charts can be extremely useful. Slope charts are useful when comparing changes in ranking between two points in space and time. Bump charts on the other hand, extend this concept across more than two points.

On the right, we can see a bump charts that shows the sales ranking from some US states.

 

Note the simplicity of the view that allows the user to easily see the degree of change as indicated by the angle of the slope.

For example, we can see that "Idaho" improved its ranking from #7 to #4.

The example if from the amazing book "Learning Tableau 2020."

bump.png

Do we have seasonality?

Seasonality can be hard to visualize (see trend line in the red box). But what is seasonality anyway? First, seasonality is generally caused by weather or humans. Second, seasonality is a characteristic of a time series in which the data experiences regular and predictable changes

Ithat recur yearly. Any predictable fluctuation or pattern that recurs or repeats over a one-year period is said to be seasonal.

A potential solution to visualize seasonality is the cycle plot. For example, in the visualization on the right, we compare each individual month with bins as a year range.

In other words, for "September", we compare the sales numbers for September in 2014, 2015, 2016, and 2017. The average is displayed with an orange dotted line.

We can immediately see that September has the highest average sales. Other strong seasonal months are December and June.

season.png

Algorithmic complexity (Big O notation) in Tableau terms

For Data Scientists and data analysts, the first three kinds of algorithmic complexity are probably enough knowledge. Computer scientists call this the Big O notation which quantifies the performance of various algorithms as the input size grows.

The following are approximations. Let's say you have 10 million rows and it takes your dashboard 10 seconds to load.

O(1)

Stands for a constant time and is the fastest version (though unlikely to see in the real world). In our case, if you have 100 million rows, your dashboard still needs only 10 seconds to render (i.e., no change, thus constant).

O(log(n))
Complexity grows logarithmically. Let's say you change your dataset from 10 million rows to 100 million rows. Now your dashboard takes like 30 seconds to render. It's slower than constant but faster than linear (next).

O(n)

Complexity grows linearly. Loading 10 million rows takes 10 seconds and loading 100 million rows takes 100 seconds.
 

O(n^2)

Anything more complex than linear is probably not going to work in Tableau. For example, quadratic is most likely going to crash your dashboard.  

Can we combine live + extract data in Tableau?

Challenge: you need your data live. However, you also need historical data in your Tableau dashboard. Problem: your historical data is really big (e.g. 20 GB which is big, if needed in memory). One potential solution is to combine live and extract. It's simply a matter of connecting to both, a live connection and a hyper file. 

Table-Scoped LOD (Level of Detail)

Sometimes we need to define a level of detail expression at the table level without using any scoping keywords. For example, in the following view we want to see each country's GDP divided by the total GDP. In the case of China (as of 2018), China contributes around 16% of the total GDP. However, as I'm drilling down on the Asia region (not visible in the image below), I can see that China's contribution is over 50%.

table_scoped_lod.png

Fun fact: LOD's depend on dimension fitlers, but table-scoped LOD's do not. In the case above, my table-scoped LOD completely ignored my date filter, even though it's a dimension filter. 

Extracting email addresses with RegEx

Regular expressions (RegEx) use special characters to define search patterns and find/replace text. One example of such a use case is extracting the email address from the text. RegEx is available in many programming languages and whole books have been written on the subject. 

regex.png

Unless you're using RegEx regularly, you probably don't want to learn RegEx in detail. One nice trick is to go to www.regexr.com and test it our RegEx expression (tip from Dustin's awesome Advanced Tableau Training). The image below shows regexr.com. (1) I past in a sample text. (2) I test the RegEx function and see (3) the highlighted text (the correct one).

regexr.png

Regular expressions (RegEx) use special characters to define search patterns and find/replace text. One example of such a use case is extracting the email address from the text. RegEx is available in many programming languages and whole books have been written on the subject. 

Boolean Algebra

I think that the study of Boolean algebra can help tremendously in thinking smarter. In essence: "The idea of Boolean algebra is to formalize a mathematical approach to logic." In Boolean algebra, values are either true or false (1 or 0).

There is a very important relationship between Boolean algebra and set theory. If we interpret A and B as sets, then we can think of "A and B" as the set of elements that are in A and are in B, i.e, the intersection A ∩ B.

In Tableau, we have three Boolean operators:
- AND
- OR
- NOT

Because there are only two possible values (true and false), we can record all Boolean operations in a table. We can use Booleans for simple visualizations (e.g., highlighting) to algorithms (e.g., decision tree).

Cannot mix aggregate ...

 

In Tableau, one of the most annoying errors is "Cannot mix aggregate and non-aggregate arguments with this function."

Sometimes, a potential solution is to add a Fixed LOD to get around the error message.

On the right is a potential fix where we add a Fixed LOD in order to make the calculation valid.

Picture1.png

Scatter plot:  country vs. country

Visually, this scatter plot with X vs. Y can be pretty useful in a drill-down.

Technically, it's "simply" a combined set.

Country Set 1 and Country Set 2 are filtered with a Combined Set (full join). The part I spent most time is that those two sets are "Show set" and NOT "Show Filter." The X vs. Y approach does not work if they are set as filters.

show_set.png
comb_3.png

Special thanks to Tableau Zen Master Toan Hoang from Tableau Magic.

Filter Menus

Filter menus enable users to show/hide filters on a dashboard. Using filter menus helps conserve dashboard space, enhance usability, and create a more interactive user experience. We can use the Tableau default icons (pretty nice) or create or own designs. 

Important: in the dashboard, we can only drag the filter to a FLOATING CONTAINER.

filter_menus.png

Dynamic Sorting (not to be confused with parameter sorting)

Dynamic sorting allows us to sort by any variable.

Attention: in case we're using ATTR(), we're losing the dynamic sort feature.

dyn_sorting.png

The TOTAL() function:

The TOTAL() function can help us getting totals. For example, TOTAL(AVG(Sales)) gives us the average based on the selected partitioning. 

In my sample case, the average was calculated based on the category. Attention, the average for each sub-category would be different.

An awesome video tutorial offers Tableau Tim.

RANK(INDEX())

 

 

 

A great idea from Tableau Tim: let's say we always want to show the last 6 months. We can do this with RANK(INDEX()).

 

total.png
rank-index.png

Nothing and zero are NOT the same

If we have 100 rows at 5 each, 10 NANs (no value), and want to take the average, then Tableau calculates the averaged based on the available data. In other words, 90 x 5 / 90 =  5. However, if we replace the NAN's with 0, then we get an average of 90 x 5 + 10 x 0 / 100 = 4.5. 

We can achieve this with Tableau's function ZN(Variable). It all depends on what we want to achieve.

 

END

geo.png