RapidMiner and Tableau

Data Science
Author

Brock Tibert

Published

August 20, 2021

I am a huge fan of incorporating Tableau into my data analytics projects. While some may use either R/Python or Tableau, I use both: Tableau lets us rapidly explore our data and find errors that need to be addressed before moving on to downstream modeling and reporting tasks.

As you can imagine, I was thrilled when I recently noticed that RapidMiner has a Tableau Writer Extension. In short, it aims to do exactly what it says: export our ExampleSet to a Tableau extract file for use directly in Tableau.

However, when you dive into the documentation, the setup process is somewhat complex. Unfortunately, I was unsuccessful and could not get the extension to work.

Not a problem. Conda environments and Python to the rescue!

RapidMiner allows us to call R and Python from within the tool, and even lets us manage virtual environments.

You can access these settings by navigating to RapidMiner > Preferences.

The key thing to note in the preferences is that we can specify our package manager and environment. We are going to use that in this post to create a Conda environment for our RapidMiner work.

To get started, I assume that you already have Conda installed. I prefer Miniconda, which you can install here.

Let’s create the environment. I am a Mac user, so the commands below will be entered into Terminal. Windows users can absolutely perform the same actions, though the syntax may be slightly different.

conda create -n rapidminer python=3.7 pandas scikit-learn

Above, we created a new conda environment called rapidminer, which uses Python version 3.7 and includes pandas and scikit-learn out of the box. When prompted, enter ‘y’ to install the necessary tooling.

One more step. We need to activate the environment to install pantab via pip.

conda activate rapidminer
pip install pantab

Last but not least, go back to RapidMiner > Preferences and select our newly created rapidminer environment. Note that you may need to refresh the interface for it to become available.

That is all we need for setup!
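Before wiring anything into a process, it can help to confirm that the environment really contains what we installed. The following is a minimal sketch, assuming it is run with the interpreter from the rapidminer environment; the helper name missing_packages and the REQUIRED list are my own, and the only non-obvious detail is that scikit-learn is imported under the name sklearn.

```python
import importlib.util
import sys

# Import names (not pip names) of the packages installed above.
# scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "sklearn", "pantab"]


def missing_packages(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Print which interpreter is running and which required packages are missing."""
    print("Interpreter:", sys.executable)
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Calling report() from a Python session inside the activated environment prints the interpreter path, which should point into the rapidminer environment, followed by any packages that still need to be installed.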

From here, let’s do a basic test. We will use the included Golf dataset and write it to a Tableau Hyper file.

Note that the dataset is connected to the input port of the Python scripting operator.

The only other thing that we need to do is create a simple script to run. In this case, there are some details specific to me and my machine, but you can easily change these as needed.

import pandas
import os
import pantab
import shutil

# rm_main is a mandatory function, 
# the number of arguments has to be the number of input ports (can be none),
#     or the number of input ports plus one if "use macros" parameter is set
# if you want to use macros, use this instead and check "use macros" parameter:
#def rm_main(data,macros):
def rm_main(data):
    # SETUP VARS
    FNAME = "brock.hyper"
    TNAME = "brock"
    FPATH = "/Users/btibert/Downloads/" + FNAME

    # checking
    print('Hello, world!')
    # output can be found in Log View
    print(type(data))
    # where are we (os.curdir is just the string ".", so ask for the actual working directory)
    print(os.getcwd())
    print(data.shape)
    ## ^^ TO SEE ABOVE, add log as a VIEW from the top ribbon

    # the dir for the data to start
    os.chdir("/tmp")

    # write the file to a hyper
    pantab.frame_to_hyper(data, FNAME, table=TNAME)

    # move the file
    shutil.move(FNAME, FPATH)

    return data

This script is included within the script portion of the operator.
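The script hard-codes /tmp as the staging directory, which will not exist on Windows. Here is a minimal sketch of a more portable variant that stages the file in a temporary directory chosen by Python and then moves it into place. The helper name export_via_temp and its writer parameter are hypothetical, added so the staging logic can be tried without Tableau tooling; by default the helper calls pantab.frame_to_hyper exactly as the script does.

```python
import shutil
import tempfile
from pathlib import Path


def export_via_temp(data, target_path, table_name, writer=None):
    """Write `data` into a temporary directory, then move the file to `target_path`.

    `writer` is a hypothetical hook taking (data, path, table_name).
    When omitted, pantab.frame_to_hyper is used.
    """
    if writer is None:
        import pantab  # imported lazily so the helper loads without pantab

        def writer(frame, path, table):
            pantab.frame_to_hyper(frame, path, table=table)

    target = Path(target_path)
    # The temporary directory is removed automatically when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        staged = Path(tmp_dir) / target.name
        writer(data, str(staged), table_name)
        shutil.move(str(staged), str(target))
    return target


def rm_main(data):
    # Path and table name taken from the script above; change them for your machine.
    export_via_temp(data, "/Users/btibert/Downloads/brock.hyper", "brock")
    return data
```

The target directory is assumed to exist already. The input is still returned unchanged from rm_main so the data remains available on the operator's output port.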

Based on my setup, when I run the process, I now have a file called brock.hyper in my Downloads folder: our Golf dataset written to a Tableau extract file via the excellent pantab library for Python.
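If you want to confirm the export without opening Tableau, pantab can also read the file back into a pandas DataFrame. The following is a minimal sketch, assuming the file and table names from the script; the function name hyper_shape is my own.

```python
def hyper_shape(path, table):
    """Read `table` from the Hyper file at `path` and return its (rows, columns)."""
    import pantab  # imported lazily; requires the rapidminer environment from above

    frame = pantab.frame_from_hyper(path, table=table)
    return frame.shape
```

Calling hyper_shape("/Users/btibert/Downloads/brock.hyper", "brock") should return the same shape that the script printed for data in the Log View.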