Python, Rodeo, and Tableau Data Extract API
Many moons ago, I wrote some code to build a Tableau Data Extract from the work that I had munged together in python
. I figured it was time to update the code since I recently discovered that the Tableau API has changed.
For a link to that old code, refer to the Jupyter Notebook in this repo.
Assumptions and Requirements
First off, I am using a Macbook, and while I believe things are getting easier on Windows machines with respect to coding, I prefer to write Terminal commands over point-and-click installs.
For tools, we will be using:
- A light-weight python instance from Anaconda called Miniconda. This lets us create environments that house the versions of the packages that we need, and it just works. Installation should be easy and is well documented on their site. The beauty of
conda
is that the tool handles all of the version dependencies for us! Prior, getting packages to work could be a nightmare at times. - The Tableau Python SDK. This was surprisingly harder to find than I would have guessed. Simply download the SDK, unpack it, and follow the instructions at the link. We will do this below, however.
- While not necessary, I am going to use Yhat’s Rodeo IDE and list how to configure the editor to use a specific conda environment. This is also easy to install, and assumes that you have already done so.
- Tableau, of course.
1. Create our Python environment
Assuming you have miniconda installed properly, we are going to create a Python 2.7 environment for our work with the Tableau API.
First, let’s explore what I have for environments on my system.
$ conda info --envs
yields
# conda environments:
#
chatbot /Users/btibert/miniconda3/envs/chatbot
ipy /Users/btibert/miniconda3/envs/ipy
tableau /Users/btibert/miniconda3/envs/tableau
root * /Users/btibert/miniconda3
Above are the environments that I have set up on my machine. If you are not familiar with environments, think of them as isolated installs of python with various packages for your use-case. Environments are a great way to isolate the versions and dependencies relative to a project, and allows you to avoid collisions if you just manage a single python environment.
Now, lets create our environment. In this case, I am going to call it tableau-post
.
conda create -n tableau-post python=2.7 pandas jupyter matplotlib
This will install python 2.7
, pandas
, matplotlib
and jupyter
(formerly ipython).
These packages and version are what you need to:
- Run Rodeo
- Use the Tableau Data Extract API
To activate the environment, simply use:
source activate tableau-post
The code above will activate our environment.
To confirm that you are good to go:
source activate tableau-post
which python
In my case, this yields:
/Users/btibert/miniconda3/envs/tableau-post/bin/python
We will use this when we configure Rodeo later on.
From this point forward, I am going to assume that you have tableau-post
activated.
2. Get the Tableau SDK
The code below jumps into Downloads
and will get the Tableau SDK from the web.
cd ~/Downloads/
wget https://downloads.tableau.com/tssoftware/Tableau-SDK-Python-OSX-64Bit-10-0-2.tar.gz
tar -xvzf Tableau-SDK-Python-OSX-64Bit-10-0-2.tar.gz
This will download and unpack the file, which will yield a directory that we need for installation.
cd TableauSDK-10000.16.1004.1720/
The command above will navigate into the folder.
sudo python setyp.py install
The command above will install the Tableau SDK for you. For more on this, go to this link
3. Fire up Rodeo and Configure it
With Rodeo open, navigate to Preferences
. You will see a dialog box similar to the one below.
You can see above that for the Python Command I am using the same location that was printed from the which python
command within the tableau-post
environment.
It’s that simple; now you can use isolated environments when developing with Rodeo!
The majority of the code that I write is in R
using the fantastic Rstudio IDE, and while the Rodeo tool is not as feature-rich (yet), it’s great for the way I develop. The fact that there is “type-ahead” functionality (read: you write less code), and the ability to commit the lines of my script 1x1, is extremely helpful for the way I work through my tasks.
I hardly debug in an efficient way, but I also tend to solve smaller problems and do not require test suites.
4. Build a Basic Extract
Below are a few helpful resources on how to use python to build out a Tableau Data Extract:
- Tableau Community Post
- Building Tableau Data Extract files with Python in Tableau 8
- My old notebook
- Build Extracts out of CSVs
The code below builds out a basic extract from a sample dataset available on the web.
## import the libraries
import tableausdk.Extract as tde
import pandas as pd
import os
## bring in a sample Graduate School Admissions datasets
= "http://www.ats.ucla.edu/stat/data/binary.csv"
file_name = pd.read_csv(file_name)
df
df.head()
df.shape
## create the extract name, but remove the extract if it already exists
= "example.tde"
fname try:
= tde.Extract(fname)
tdefile except:
'del ' + fname)
os.system('del DataExtract.log')
os.system(= tde.Extract(fname)
tdefile
## define the table definition
= tde.TableDefinition()
tableDef
## create a list of column names and types
= df.columns
colnames = df.dtypes
coltypes
## create a dict for the field maps
## Caveat: I am not including all of the possibilities below
= {
fieldMap 'float64' : tde.Types.Type.DOUBLE,
'float32' : tde.Types.Type.DOUBLE,
'int64' : tde.Types.Type.DOUBLE,
'int32' : tde.Types.Type.DOUBLE,
'object': tde.Types.Type.DOUBLE,
'bool' : tde.Types.Type.DOUBLE
}
## for each column, add the appropriate info the Table Definition
for i in range(0, len(colnames)):
= colnames[i]
cname = fieldMap.get(str(coltypes[i]))
ctype
tableDef.addColumn(cname, ctype)
## create the extract from the Table Definition
## Super Hacky, but legible
## for each row, add the data to the table
## Again, not accounting for every type or errors
with tdefile as extract:
= extract.addTable("Extract", tableDef)
table for r in range(0, df.shape[0]):
= tde.Row(tableDef)
row for c in range(0, len(coltypes)):
if str(coltypes[c]) == 'float64':
row.setDouble(c, df.iloc[r,c])elif str(coltypes[c]) == 'float32':
row.setDouble(c, df.iloc[r,c])elif str(coltypes[c]) == 'int64':
row.setDouble(c, df.iloc[r,c]) elif str(coltypes[c]) == 'int32':
row.setDouble(c, df.iloc[r,c])elif str(coltypes[c]) == 'object':
row.setString(c, df.iloc[r,c])elif str(coltypes[c]) == 'bool':
row.setBoolean(c, df.iloc[r,c])else:
row.setNull(c)# insert the row
table.insert(row)
## close the file
tdefile.close()
Summary
Some notes on the process:
- I am attempting to build-in some “smarts” as to how you define the column types by using the pandas data type and mapping that to a Tableau data type.
- The process defines the table, and for each row, and each column in that row, loops through the data 1x1 to define each value, and then commits the data to the row. NOT ALL TYPES ARE BEING ACCOUNTED FOR
- Each row is being processed in a serial fashion, so this might take some time depending on the size of your dataset.
I hope that this helps you as you develop with Python and Tableau. I recently have been trying to define a workflow locally to prototype my data needs, and since my dataset has millions of rows, using CSVs and local database tables just isn’t cutting it.