Updated: Jun 25
This blog post focuses on how to deploy an AI model in Tableau, and in particular on whether you should use Tableau's "Table Extensions" or the "Analytics Extensions API." We prefer the latter approach, as it doesn't require TabPy (a pain on the server). From a customer perspective, the Analytics Extensions API is super simple: (1) add a Calculated Field and (2) give the Calculated Field access to the web API (hostname, port, username, password). In this post, we'll cover:
- Why the "Analytics Extensions API" is our preferred solution
- The difference between "Table Extensions" and the "Analytics Extensions API"
- The exception: for time series data, the “Table Extension” is your only solution.
Attention: This post covers deploying your AI model – i.e., an AI model you developed.
This blog assumes that you or your team can develop an AI model and create an API. Our focus is the Tableau developer's perspective: a blueprint for deploying an AI model in Tableau. As we'll see, that alone is already quite technical.
What qualifies us to write about it? We've done all the steps and deployed an AI model in production (i.e., on the Tableau server). We've built a dataset consisting of seven billion transactions, developed an AI model in Google's TensorFlow, created an API in PythonAnywhere, and deployed the model as an "Analytics Extensions API" on the Tableau server.
We tried both the "Table Extensions" and the "Analytics Extensions API" approaches, and each has advantages and disadvantages. The "Table Extension" is a pain for an AI model deployment, as it requires a host that the Tableau server can access. The "Analytics Extensions API" is much simpler to deploy on the Tableau server. However, if your data is granular, such as time series data, the "Analytics Extensions API" sadly doesn't work.
We hope this blog post helps you avoid major headaches in deploying an AI model in Tableau. Trust us; you want to remove every obstacle possible. Developing and deploying an AI model is challenging.
Let’s get started …
Programming languages in Tableau
Let's make this short: you can use three different programming languages in Tableau (Python, R, and MATLAB), but only Python covers everything we need end to end. Thus, we only use Python.
Using Python in Tableau
Technically, there are three ways of using Python in Tableau. The old way is to deploy a Python script to TabPy and call it from your local environment, but calling it from a Calculated Field is clunky and complicated. As "Table Extensions" effectively replace this old approach, we ignore it here.
As of today (2023), we have two ways of calling a Python script within Tableau Desktop:
1: Table Extensions
2: Analytics Extensions API
Let's cover both briefly before we dive into deploying an AI model.
A Short Introduction to “Table Extensions”
"Table Extensions" allow us to create new data tables directly in an analytics extension script. Unlike the old TabPy solution, we enter the Python code directly in Tableau Desktop. In the image below, I've connected to my local TabPy. Once I'm connected, Tableau's Python "IDE" renders the code with full syntax highlighting. It's gorgeous!
In the example above, we connect Tableau Desktop to an external API and return a table. Our simple demo sends a person's first name, e.g., Elon, and the API tries to predict the age of a person with that name. Of course, it's a silly example, but it's a good way to try "Table Extensions" for the first time. Why? The API calls are free, and you probably need to get used to the way things work. For example, we must drag the "Table Extension" in before we drag in the data, which differs from how Tableau typically works.
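A Table Extension script for this kind of demo could look like the sketch below. We assume the free agify.io API, which guesses an age from a first name; the `fetch` parameter is our own addition so the network lookup can be stubbed out when testing. TabPy passes the input table to the script as `_arg1`, a dictionary mapping column names to lists, and expects a dictionary of columns back:

```python
import json
import urllib.parse
import urllib.request

def predict_age(name, fetch=None):
    """Return a predicted age for a first name.

    By default this calls the free agify.io API (an assumption for this
    demo); pass `fetch` to stub out the network call in tests.
    """
    if fetch is None:
        def fetch(n):
            url = "https://api.agify.io?name=" + urllib.parse.quote(n)
            with urllib.request.urlopen(url, timeout=10) as resp:
                return json.load(resp).get("age")
    return fetch(name)

# TabPy hands the Table Extension's input table to the script as `_arg1`,
# a dict mapping column names to lists of values. The script must return
# a dict of columns, which Tableau exposes as the "Output Table."
def output_table(_arg1, fetch=None):
    names = _arg1["Name"]
    return {"Name": names, "Age": [predict_age(n, fetch) for n in names]}
```

Inside Tableau, only the body of `output_table` would live in the Table Extension pane; the function wrapper here just makes the sketch testable on its own.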
Once you apply the model, Tableau creates a new table called "Output Table." In our case, the name is "Age."
There's nothing more to do in Tableau, such as creating a new Calculated Field. In our case, Tableau made a new measure called "Age," and we can use it like any other measure.
Use cases for Table Extensions:
- Call a function not available in native Tableau
- Call an API such as weather, geocoordinates, or custom script
- Call an AI model via an API
“Table Extensions” requirements:
To run "Table Extensions" successfully, we need to have Python and TabPy installed and connected, whether we run it locally or on a server.
Depending on the Tableau developer's Python skills, running "Table Extensions" locally can already be somewhat of a challenge. Running them on a Tableau server is a more significant challenge, though: the server needs a host running Python and TabPy that is always reachable, and that can't simply be our local environment. In other words, to bring "Table Extensions" into production, we need a dedicated host (with Python and TabPy) that the Tableau server can access.
I think "Table Extensions" in production are cumbersome, and I would avoid them if possible. There's a more straightforward approach for many cases, namely the "Analytics Extensions API."
A Short Introduction to the “Analytics Extensions API”
As we will see shortly, deploying the "Analytics Extensions API" is much simpler than the "Table Extensions." If you don't shy away from creating your API, deploying the "Analytics Extensions API" is always easier. However, regarding deploying an AI model, we'll cover a pesky restriction shortly.
The most significant advantage of the "Analytics Extensions API" over the "Table Extension" is that the "Analytics Extensions API" does not require a local host. In other words, with the "Analytics Extensions API," you are not required to have Python or TabPy installed locally or on a server.
To run, the "Analytics Extensions API" requires only two things:
1 – Similar to the "Table Extensions," we need a connection. However, instead of connecting to a local TabPy host, we connect to an external API.
2 – A new calculated field.
Let's start with step (1). Similar to "Table Extensions," we connect to an API. However, with the "Analytics Extensions API," we don't connect to a local API on our machine or within our network. Instead, we connect to a REST API hosted on a server that is accessible via the Internet.
As with "Table Extensions," we must go to Manage Analytics Extensions Connections (under Help | Settings and Performance). This time, however, we connect to the Analytics Extensions API and not to TabPy.
In the Analytics Extensions API dialog, we can connect to our own API. In our case, we've built the API using Flask, a popular Python framework for web development, and hosted it on PythonAnywhere, a cloud platform for Python applications.
Now that we are connected to our API, the next step is creating a custom Calculated Field related to our “Analytics Extensions API.”
Let's assume we've trained a Machine Learning or AI model to predict the probability of getting type II diabetes based on age, weight, and height. It's a simplified example to show the workings of the Calculated Field.
The Calculated Field in Tableau could look like the following:
SCRIPT_REAL("Diabetes", ATTR([Age]), ATTR([Weight]), ATTR([Height]))
The Calculated Field would send the information about age, weight, and height to the API. The API would return the probability of getting diabetes type II as a new variable (or measure in Tableau lingo). For example, the probability might be 10%.
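On the API side, a minimal Flask sketch of the two endpoints Tableau talks to could look as follows. The `/info` and `/evaluate` routes follow the TabPy-style protocol that the Analytics Extensions API speaks; the scoring formula is a made-up stand-in for your trained model, not a real diabetes model:

```python
import math
from flask import Flask, jsonify, request

app = Flask(__name__)

def score(age, weight, height):
    # Stand-in logistic formula -- replace with your model's predict().
    z = 0.04 * age + 0.02 * weight - 2.5 * height
    return 1.0 / (1.0 + math.exp(-z))

@app.route("/info", methods=["GET"])
def info():
    # Tableau calls /info when you test the connection.
    return jsonify({"description": "Diabetes risk model",
                    "creation_time": "0", "state_path": "",
                    "server_version": "1.0", "name": "diabetes-api",
                    "versions": {"v1": {"features": {}}}})

@app.route("/evaluate", methods=["POST"])
def evaluate():
    # Tableau posts {"script": "...", "data": {"_arg1": [...], ...}},
    # one list entry per row in the visualization.
    payload = request.get_json(force=True)
    data = payload["data"]
    results = [score(a, w, h) for a, w, h in
               zip(data["_arg1"], data["_arg2"], data["_arg3"])]
    return jsonify(results)
```

The `"Diabetes"` string from the Calculated Field arrives in the `script` field of the payload, so a single API can dispatch to several models if needed.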
That's it with the "Analytics Extensions API." There's no need to install Python or TabPy, either locally or on your server.
However, for all the awesomeness with "Analytics Extensions API," there's a weakness related to AI models. Why? Looking at the Calculated Field above, you see that the script only takes aggregated or constant data.
To understand the challenge with aggregated vs. granular data, let's explore it from first principles (i.e., down to the point where we cannot deduce any further).
Machine Learning vs. AI
The definitions of Machine Learning and AI are a bit vague and confusing. In my view, a Machine Learning algorithm is based on statistical techniques such as Logistic Regression, Linear Regression, Random Forest, or XGBoost, whereas anything built on a neural network counts as AI.
I'm emphasizing "my view," as some experts might disagree.
Why is this distinction important? It's crucial due to the nature of our data. In Tableau, we mainly deal with tabular data, which tends to be smaller. On smaller datasets, Machine Learning models tend to outperform AI models. Let's see that in a practical example.
Most classification datasets are in tabular format, such as the example below. Each row represents a customer. Data is either discrete (a dimension in Tableau lingo) or aggregated. The last column has a label for customers having churned or not churned (1/0) to train an algorithm.
Source: Kaggle (Bank Customer Churn Dataset)
The dataset is perfect for the “Analytics Extensions API” as the data is either constant or aggregated.
However, for such a dataset (tabular data), a traditional classification Machine Learning algorithm (e.g., XGBoost) is sufficient. Why? A Deep Learning model requires a large amount of data (think millions of rows). In general, those tabular datasets are small. For example, the customer churn dataset above, published on Kaggle, has only 10,000 rows. Few companies have millions of customers, and thus, those datasets tend to be small.
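To make that concrete, here's a minimal sketch of training a traditional classifier on a small tabular dataset with scikit-learn. The rows, columns, and labels are invented for illustration (not the Kaggle churn data), and we use a Random Forest as the stand-in algorithm:

```python
from sklearn.ensemble import RandomForestClassifier

# Toy churn-like table: each row is a customer [age, balance, num_products].
# Rows and labels are invented for illustration -- not the Kaggle dataset.
X = [[42, 1200.0, 1], [35, 80000.0, 2], [58, 500.0, 1],
     [29, 45000.0, 3], [61, 300.0, 1], [33, 62000.0, 2]]
y = [1, 0, 1, 0, 1, 0]  # 1 = churned, 0 = stayed

# On a dataset this small, a classical model is the right tool; a deep
# neural network would have far too little data to learn from.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
prob_churn = model.predict_proba([[55, 700.0, 1]])[0][1]
```

A model like this, wrapped behind an `/evaluate` endpoint, is exactly what the "Analytics Extensions API" setup above would call.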
Everything changes with time series data.
Time series data is a whole different beast. The critical difference is that multiple time series are typically stored in a long format: values from distinct time series (or from different dimensions of the same series) share a single column, and an identifier column provides the context for the value in each row.
Let's take the following fake example. Each row is a transaction. In many industries, this can result in millions of rows (i.e., the data is on a granular level):
Many industries deal with time series data to forecast, detect anomalies, or predict fraud.
The "Analytics Extensions API" cannot deal with time series data, as the data needs to be aggregated or constant. One workaround might be to bin your time series data in some cases. By binning, you can present your time series data as bins. With bins, you can pivot your data to a tabular format and deliver the bins as columns. In this case, you might end up with a high number of columns (100s or even 1000s). If you can maintain a high number of rows (e.g., >1,000,000) combined with a high number of columns (e.g., >100), you end up with an ideal dataset for an AI. Why? In most cases, AI models outperform traditional Machine Learning algorithms on high-dimensionality datasets.
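The binning workaround can be sketched with pandas: bucket each customer's transactions by month, then pivot those buckets into columns. The column names and toy rows below are assumptions for illustration:

```python
import pandas as pd

# Toy transactions in long format: one row per transaction. The column
# names and values are made up for illustration.
df = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2023-01-02", "2023-01-15", "2023-02-03",
                            "2023-01-20", "2023-02-10"]),
    "amount": [120.0, 80.0, 45.0, 300.0, 10.0],
})

# Bin each transaction into a monthly bucket, then pivot so that each
# customer becomes one row and each bin one column -- the tabular shape
# the "Analytics Extensions API" can handle.
wide = (df.assign(bin=df["date"].dt.to_period("M").astype(str))
          .pivot_table(index="customer", columns="bin",
                       values="amount", aggfunc="sum", fill_value=0.0))
```

With daily or weekly bins over a long history, this pivot quickly produces the hundreds of columns mentioned above.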
However, if binning is not an option, the "Analytics Extensions API" is not your solution for time series data. In that (rare) case, the "Table Extension" is your solution: with a "Table Extension," you can process time series data, deliver it to the API, and return the result.
Configuring Tableau Cloud with the Analytics Extensions API
Last but not least, we need to give permission to run the API in Tableau Cloud. That's pretty straightforward for the Analytics Extensions API. In Tableau Cloud, under Settings | Extensions, we can configure the connection to the Analytics Extensions API (i.e., our PythonAnywhere API).
As mentioned before, the steps would be slightly different for a Table Extension. For a Table Extension, we would need a local host that is always reachable (not your local host) for the Tableau server (see Diego Martinez’s excellent description) and connect to TabPy (not Analytics Extensions API).
Deploying an AI model is not simple. However, in Tableau, we now have two great options, "Table Extensions" and the "Analytics Extensions API," to deploy an AI model. Both approaches have advantages and disadvantages. However, for deploying complex models, we strongly favor the "Analytics Extensions API."