Franco Arda, Frankfurt am Main (Germany)                                                                                                                 franco.arda@hotmail.com

Search
  • Franco Arda

Coding in Tableau: 50 Questions & Answers

What’s the difference between TODAY() and NOW()? TODAY() returns a date such as 01.01.2020 NOW() returns date and time such as 01.01.2020 8:52:09 PM You have 100 days of IoT machine [Temperature] data. You want to get the maximum temperature for the last 10 days. How can you write it?

The following function returns the maximum of the temperature over the last 10 days including today: WINDOW_MAX(MAX[Temperature], -10,0)

You create an Inner Join between two tables. What’s a potential risk? The risk of joins is called “join risk” and refers mostly to creating duplicates. What’s the difference between a Dimension (categorical data) and a Measure (continuous data)?

Measures are numerical values that mathematical functions work on. For example, a sales revenue column is a measure because you can find out a total or average of the data. Dimensions are qualitative and do not total a sum (e.g. countries, gender, customer ID...).

Which LOD functions are always a measure? Only EXCLUDE and INCLUDE are always a measure. You have two measures: age and weight. What chart type makes the most sense? Scatter plot is an essential visualization type for understanding the relationship between two measures. In the Scatterplot, we plot an X value against a Y value. It shows the dependency of the Y variable on the X variable and correlation. Regression lines can be applied to show trends in the data. You want to add a numeric goal to a visualization – what can you do?

The Constant Line is an excellent way of representing a goal.

What are your favorite color combinations for visualizations?

Mine's e gray, orange and blue.

In your opinion, why are bar charts so popular? It’s quick to compare information-revealing highs and lows at a glance. Bar chars are especially effective when you have numerical data that splits nicely into different categories, so you can quickly see trends within your data.

I have a [Date] column with 01August2018 and I want I new column only with the year. How can I do that? Right([Date], 4) which gets us 2018 in this case.

Can you remove join duplicates with an LOD? Yes. The function could be like {FIXED [Dimension]: MIN[Measure]} Can you explain data densification to me? It’s essentially turning sparse data into dense data, but I’ve never seen a use case.

What does the following function return REPLACE ([Customer Name], “PhD”, “Ph.D.”)? Ph.D. What’s your first goal with a Dashboard? A good dashboard should be interactive and/or support fast decision making. Let’s say you have a column [Month] such as 5,6,7 and a column [Year] such as 2018, 2019, 2020 …both columns are Strings. How can yon concatenate them with a Calculated Field? First I use the DATE() function and then concatenate both STR(). This could look like: DATE(STR([Month]) + “ -1-” + STR([Year]).

If today was 2/3/2000 in the column [Date], then DATEPART (‘month’, [Date]) would return what? DATEPART() returns an integer. In this case, it returns 2 as in the 2nd month of the year. If today was 2/3/2000 in the column [Date], then DATENAME (‘month’, [Date]) would return what?

DATENAME() returns a string. In this case, it returns February, as the month of the year.

Does the Dimension Filter influence the Exclude or Include LOD? Of course, because Exclude and Include are both based on measures. Thus, the Dimension filter affects both LODs.

Do you use Containers in Dashboards? If yes, why? I hope you do. I believe it’s a sign of Tableau proficiency. It’s the first thing I do when I start building a Dashboard. The purpose of containers is to group elements inside a shared space and allow me to have better control over those elements.

How does the color gradient change for a dimension vs. a measure? Finally a good question again. For a measure, Tableau gives a divergent palette. For a dimension, it’s different colors. When are hierarchies in dimensions important? It’s pretty important for geographical roles or dates.

In a SQL join, if we have no common column, what can we do? We might be able to create a Join Calculation. For example, if we have in one table “First name” and “Last Name” and in the other table “Name (first and last)”, we could create a Join Calculation such as [First Name] + “ “ + [Last Name]. Are FIXED LODs impacted by the Dimension filter? I believe you’ve already asked a similar question, but the answer is no. Why? Because a FIXED LOD can be a dimension or measure.

How to read a Nested LOD Expression? { FIXED [State] : AVG ({FIXED [Customer Name] : SUM([Sales])})} Calculate the FIXED SUM of Sales by Customer 2. Take the average of #1 at the level of State.

I want to calculate the number of days between [Shipe Date] and [Order Date]?

DATEDIFF('day', [Order Date], [Ship Date]). This gives me the number of days such as 12 (days). You have a [Postal Code] of 98103. What does the function "LEFT([Postal Code],2)" return? It returns the first two digits from the left. i.e. 98. What does the function CONTAINS([Product Name], “Phone”) return, if the [Product Name] is “Apple iPhone” or “Apple iPad.” TRUE and FALSE. What does the following function return? DATEADD(‘month’, 5, #08.08.2018#)

It adds 5 months to the date which gives 08.01.2019 (or in the USA 01.08.2019).


The user wants a filter to see the last 12 months or shorter – how can you achieve that? I can add a function called LAST(), set it to filter and set ‘at most’ at 12.


You have a string 9.6.98 [Original_Date] and create a function DATEPARSE(‘d.M.yy’, [Original_Date]) with the name New_Date. What does New_Date return in this case? New_Date creates a date 03/06/1998 (in Europe 06/03/1998).


When would you use a CASE statement instead of an IF-THEN-ELSE Logic? The CASE statement is particularly handy for parameterized functions. In this case, a CASE statement is much simpler than an IF-THEN-ELSE logic. Of course, the IF-THEN-ELSE logic is much more powerful and allows for Boolean operators and nested logic.

What’s the risk of selecting “Assume Referential Integrity”? If your data doesn’t meet the referential integrity condition, your results might not be accurate.

I suspect that in creating a join that I created duplicates (a.k.a. join risk). I use the following LOD to remove duplicates: { FIXED [Category] : MIN([Target])}. Could that work? Potentially as a fixed LOD allows you to remove duplicates. How to add an index in Tableau?

INDEX() and added to rows.

You want to rank SUM[Profit] in descending order. How do you do this? RANK(SUM([Profit]), 'desc') How do you write a Fixed LOD which counts the distinct orders per customer ID?

{FIXED [Customer ID]:COUNTD([Order ID])} Write a fixed LOD that returns the first order date. Use [Customer Name] and [Oder Date]

{FIXED [Customer Name]:MIN([Order Date])} What does the following function return? { COUNTD ([Customer ID])} This is called a tabled scoped LOD and returns the count of distinct customers. Scoped LODs use always FIXED. What does the following function return? [Profit] / SUM([Sales])

An error. We can't mix non aggregated with aggregated data. In other words, we are trying to compare each row of profits with the sum of profits. That doesn't work. What does the following function calculate? AVG[Profit] / [Sales] First calculates the ratio of sales to profit for each row in your data, and then take the average for all rows. What does the following function calculate? Sum([Profit]) / Sum([Sales]) The function first calculates the total profits and total sales and then divides the first value by the second value.

Let's say you have [Meat Consumption] in kg/year/per capita. You want to simulate different scenarios based on demand. So add a Parameter [Reduction Rate]. How do you write the Calculated Field? [Meat Consumption] * 1 * (1- [Reduction Rate])

I can add this function (Calculated Field) as a new column. In the superstore dataset, I have [City] as a Dimension (or categorical data). There are hundreds of cities. If I drag [City] to rows and add COUNTD([Cities]), what do I get? It returns the distinct number of cities which is 1 for each city name.

In the superstore dataset, I have [City] as a Dimension (or categorical data). There are hundreds of cities. If I drag [City] to rows and add COUNT([Cities]), what do I get? It returns how often the city has been in a row. For example, Los Angeles 747. Which of the following aggregation types cannot be applied to a dimension? COUNTD(), COUNT(), AVG(), MAX(), MIN().

As categorical data is not countable, we can’t take the average (AVG) of categorical data. Write an efficient Calculated Field which returns: AVG[Shipping Cost] if below 25, then it's "Low Costs, If between 25-150 then "Mid Cost", and if it's above 150 then "High Cost."

IF AVG[Shipping Cost] <25 THEN "Low Costs" ELSEIF AVG[Shipping Cost] >150 THEN "High Cost" ELSE "Mid Cost" END I want to write a KPI. What's a potential problem with the following SUM([Profit] / [Sales]) You calculate first the row-based KPI and then the sum. That's wrong. What does the following function return, if Email is "NAN"? IFNULL([Email], "Email not known") That's a cheeky question. It's "NAN". Can you write a fixed LOD that takes the average of [City], [Customer Name] divided by the sum of [Sales]?

AVG({ FIXED [City], [Customer Name] : SUM(Sales)}) You have [Age] between 20 - 80. Convert it with a function to a dimension?

IF [Age] > 60 THEN '60 - 80' ELSEIF [Age] <60 AND [Age] >35 THEN '35-60' ELSEIF [Age] <35 THEN '20-35' END How are LOD expressions different from Calculated Fields? With regular Calculated Fields (a.k.a. functions), measures are always aggregated to match the view. With level of detail expressions (LODs), a scope for the calculation can be specified. For example, we can include measures that are not in the view or exclude measures from the view. You want to join two tables. [Table_1] values are “AA1234” and [Table_2] values are “1234”. How can you join them?

We could create a “join calculation” such as RIGHT([Table_1],4) in [Table_2].

Why would you add a Parameter in a Custom SQL?

By passing parameters to a Custom SQL query, you can solve issues such as joining two different SQL data sources which could not be simply joined.

What's ordinal data? Categorical data that is ordered: S/M/L/XL, mild/hot/very hot ... You bin age into groups 20-40, 40-60 and >60. What do you get, a measure or dimension?

When we bin a measure, we create a dimension. This is also known as discretization.

You need a function that returns "OFFICE HOURS" or "NOT OFFICE HOURS." Office hours are between 07:00 - 19:00. Use [DATE] as a dimension. IF DATEPART(‘hour’, [DATE]) > 7 AND DATEPART(‘hour’, [DATE]) < 19 THEN “OFFICE HOURS” ELSE “ NOT OFFICE HOURS” END You need a FIXED LOD that shows true, if [Customer Name]'s sum of [PROFIT] is bigger than 0.

{ FIXED [Customer Name] : SUM( [Profit] ) } > 0 A fixed LOD can be a measure or dimension. True?

Yes. A fixed LOD is impacted by (1) the context filter, (2) measure filter and/or (3) dimension filter?

Only by the context filter. You put the following function on color. What do you get? SUM([Sales]) - WINDOW_MAX(SUM([Sales])

Max sales in the window in color.

You have customer age 20 years - 80 years. Age is obviously a measure. How could you make it discrete? Create age bins such as 20-40, 40-60 and 60-80. Can you explain to me the difference between numerical and categorical data? Numerical data can be measured while categorical data can only be counted.

In the column [Name], you have "Abraham Jacobs." How do you extract the first name? SPLIT([Name], " ", 1) How can you fix "18-09-04 8:42:15" to "09/04/2018 08:42.15"? DATEPARSE(yy-MM-dd h:m:s',[Original Date]

How can you fix "6.2.98" to "03/06/1998"? DATEPARSE('d.M.yy', [Original Date]

Solving the problem of missing data is a core skill for data science. What is the technical word?

The filling is known as imputation.

Do you know a mathematical formula which can describe the growth of a product, that starts slowly, then has a rapid phase, followed by a smooth tapering off as the final size is reached? Sigmoid function.

Can you give me an example where you would apply a cross-database join? Let’s say your budget numbers are in Excel and your sales data are in an SQL database. In such a case, a cross-database join can help to combine the data from two completely different data sources. Can you create a set from measures, dimensions or both?

We can only create sets from measures such as companies with a ROI >10% or below.

When would you use a line chart? This is useful for seeing trends and evolution. When you have multiple lines, you can easily compare the values at each time point. Ideally, we have one or more measures and a date (or alternatively an index). Which is probably the most common chart type? The bar chart. It’s perfect for comparing values between multiple columns.

When would you use an area chart? An area chart is useful for seeing a global trend and the proportion at each time point.

What’s a heat map? A heat map is a table, but potentially better due to its color-coding. Think of an Excel file with color filling.

What’s a treemap?

A treemap is a hierarchical representation, with nested rectangles, that gives us a quick idea of the number of values and the proportion of each of them.

When would you use a pie chart? If we must use a pie chart, at least we have to sort the values and only use a limited number of measures (max 5?). Often, a donut chart is a better choice though.

Uber tells you to visualize the best places to wait for passengers based on day of the week, seasonality and special events. How would you visualize this? A density map (heat map) might help us identifying locations with greater or fewer numbers of data points.

How can you extract the year from a column [Date]? Let’s say the date format is “14Feburary2020”? RIGHT([Date],4) extracts the last four digits.

You want to write a Boolean function that classifies ‘Phones’ and ‘Art’ as interesting and ‘Storage’ as boring. The dimension is [Sub-Category]. How would you write this function?

IF ([Sub-Category] = ‘Phones’) OR ([Sub-Category] = ‘Art’)

THEN “interesting”

ELSEIF ([Sub-Category] = ‘Storage’)

THEN “boring”

END

What’s the advantage of LOD expressions?

Level of detail expressions allow us to calculate aggregations that are not at the level of details in the visualization.

You want to create an email alert if the Standard Deviation is >28.50%. On which axis must Standard Deviation be placed, in order to be able to create an email alert? In order to create an email alert, we need a measure on the Y-axis.

What’s a density chart? A density chart shows us the density of our marks. The superposition of multiple marks determines the color intensity. In business settings, I’ve rarely seen it though. What’s a special about a “group”? One is either in or out. A group is static. A group can be nationality or gender.

What’s special about a ‘set’? Sets are dynamic. I can be in a set of BMI >25. After losing some weight, I might be in a different set such as BMI <25. Can INCLUDE and EXCLUDE also be dimensions?

No. They are always measures and as a consequence are aggregated. If your data has a positive skew, the mean is usually greater that the median. True or false? With a positive skew, the extreme values are to the right while most of the data is to the left. So the answer is true. How would you describe 1, 2 and 3 standard deviations in terms of likelihood? - LIKELY to be within 1 standard deviation (68 out of 100) - VERY LIKELY to be within 2 standard deviations (95 out of 100) - ALMOST CERTAINLY to be within 3 standard deviations (99 out of 100) Discrete data is counted and continuous data is measured. True or false? True. Can you explain the difference between univariate and multivariate? Univariate is one variable measured over time. Multivariate measures multiple variables over time. Can you explain geocoding to me? Geocoding is the process of converting addresses (like a street address) into geographic coordinates (like latitude and longitude), which you can use to place markers on a map, or position the map. Reverse geocoding is the process of converting geographic coordinates into a human-readable address.


What defines a good graph? - Self-explanatory: a good graph doesn’t require much explanation from the analyst nor much thinking from the user. - Contains only the necessary information - Focuses only on 1-2 problems - Supports fast understanding and fast decision making What is so cool about Parameters? Parameters are dynamic values, which can inject into calculations or filters. Parameters allow us to avoid hard coding values so that the end-user can interact with the data. The median is the middle of a sorted list. True or false? True.


How do you visualize important and less important data? As a rule of thumb is to put the most important data on the X- or Y- axis and less important data on color, size, or shape. For Tableau, what’s better; a column named year and the individual years in the rows or years as columns? For Tableau, tall and narrow data, which often results in more rows and fewer columns, works better. For example, we want a column “Year” with rows as 2000, 2001, 2002 instead of columns as 2000, 2001, 2002… Let’s say you have a Dimension with countries. Why would you use a set instead of filters? Filters are solely based on trimming data from the view whereas sets allow us to find insights in our data by allowing us to see members that are inside the set.


Do you know what a helper function is in Tableau? Helper functions to provide information about the relative position or size of an item within the current scope and direction. Helper functions are INDEX(), FIRST(), LAST() and SIZE(). In terms of Tableau performance, what’s better; TODAY() or NOW()? We should use the NOW() function only if we really the timestamp, otherwise it’s better to use TODAY(). But that’s only really relevant for seriously big data. Let’s say I add ATTR([Country]) and there is USA and Germany in Country. What’s the result of the function? This function is hardly ever used. In this case we get a ‘ * ’. ATTR() can be used as a precaution if the data changes. Is the following function a recursion or iteration? IF LAST() = 0

THEN (SUM([Sales]) - LOOKUP(SUM([Sales]), -1)) / LOOKUP(SUM([Sales]), -1)

END Are you running out of simple questions? We can’t write an iteration, such as a for loop, in Tableau. A recursion in computer science is a method of solving a problem where the solution depends on solutions to smaller instances of the same problem (as opposed to iteration). Thus, this is recursion. How many measures can we have in a combined axis and a dual-axis? In a combined axis, we can have several measures while in a dual-axis, well, the dual says it all, we can only have two measures. In other words, in a combined axis, all measures are on the left Y-axis. On the dual-axis, one measure is on the left Y-axis while the other is on the right Y-axis. Kind of weird to talk about a left and right Y-axis. Now if we synchronize the axis in the dual-axis depends on the case. With a combined chart, we simply drag and drop a new measure over the existing axis. Dual-axis charts allow us to build visualizations such as the donut chart, dual-axis map (bar and line graph combined) and lollipop charts. We add the second measure by dragging it to the right Y-axis. What’s the default filter hierarchy? None. The logical operator is ‘and’ such as A and B. In a context filter, this would be B given A or B | A. In other words, the context filter simply gives a hierarchy. What’s the global filter hierarchy in Tableau? Extract filters -> Data source filters -> Context filters -> Dimension filters -> Measure filters -> Table Calculation filters.


What’s the difference between a Primary Key and a Foreign Key? A FOREIGN KEY is a key used to link two tables together. A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY KEY in another table. The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table.


What’s cool about tabular data? That you can add new columns with functions. In Tableau, it’s called Calculated Fields. With Python, it’s mostly pandas. What Tableau Server Approach do you recommend? In most environments, a TDSX-centric architecture should be the approach. Even the one disadvantage, not real-time, can be mitigated with frequent updates. How could you bin “hour”? Sweet question! This is called “time discretization”. We could segregate in the morning, afternoon, evening and night. Fun fact: did you know that there are discretization algorithms? What’s a .tds file? The .tds file stores the metadata and data connection information. More formally, a .tds data source file is a published data source that stores data connection information and any metadata changes you’ve made to it (e.g. calculated fields, aliases). The data connection information is all the information required to make a live connection to an external.


You have a string 9.6.98 [Original_Date] and create a function DATEPARSE(‘d.M.yy’, [Original_Date]) with the name New_Date. What does New_Date return in this case? New_Date creates a date 03/06/1998 (in Europe 06/03/1998).


How does the average line work?

The average line works the same way as a constant line, however, we don’t have to enter a value. Tableau automatically calculates the average of the measure. The average can be calculated based on the table, pane or cell. In general, I like the average line to be in color (such as orange) and use a dotted line. What does the Average with 95% CI (Confidence Interval) present?

The CI quantifies the uncertainty of the average in the future. The formula for 95% confidence is 1.96*Standard Deviation / √N. Thus, the larger our sample (N) the higher our confidence and therefore the small our range. For example, for our sales numbers we might get an upper confidence of 80,000 and a lower confidence of 40,000. With 95% confidence, we expect our average sales number to be between those two numbers. You have a sales graph. If the [Sales] are >12,000 you want to show a Mark “Bravo!!”. If below 0, then “Horrible!!” How can you achieve that? That’s called dynamic labeling. I create a Calculated Field and add it to Marks: IF SUM([Sales]) > 12,000 THEN “Bravo!!” ELSEIF SUM([Sales]) < 0 THEN “Horrible!” ELSE NULL END How would you write “is not” in Tableau? Similarly to Python, we can write “ != ”. Use cases are tough though. One might be a IF-THEN-ELSE and a LOOKUP as on page 244 in “Mastering Tableau” by David Baldwin. Alternatively, we can write “<>”, which I find weird though.


How have a huge dataset that is extremely slow. What can you do? We can edit the extract and “hide all unused fields” or “extract a limited number of rows.” Another trick is to use incremental refreshes. If historical data changes, you might consider incremental refreshes during the week and a full refresh on the weekend.


If [Sales] are above [Quota], you want to display thumbs up, otherwise thumbs down. How can you achieve that? We can simply copy and paste Unicode. For example: IF SUM([Sales]) > [Quota] THEN “thumbs up”

ELSE “thumbs down”

END Can you save a password as part of a data source connection? Let’s say you share a workbook using a live connection with someone else, they will need to re-enter the password to access the data.


You create a Left Join. What happens to data which does not match the table on the left? As all records from the table on the left will be kept, unmatched records will contain NULL values for all fields from the table on the right.


Boolean calculations are faster than numeric or string calculations. True or False?

False. Boolean and numeric calculations are faster than string calculations. Can you tell me what Boolean logic is?

Boolean logic can be used to compare and manipulate sets using just three operators: AND, OR and NOT. For example: - You can have Strawberry OR Vanilla OR Chocolate. - You can have Strawberry AND Vanilla AND Chocolate. - You can have Strawberry OR Vanilla NOT Chocolate.


With Table calculations, what’s meant by scope and direction?

The scope option refers to table, pane or cell. The direction option refers to down, across, down then across and across then down. Let’s say you have an [Postal_Code] of “98103”. What does the following Calculated Field LEFT([Postal_Code],2) return? The Calculated Field returns the first 2 numbers from the left, i.e. “98”.






5 views1 comment