Posts

I am a huge fan of incorporating Tableau into my data analytics projects. While some may use either R/python or Tableau, I use both; Tableau allows us to rapidly explore our data and find errors that need to be addressed before moving on to downstream modeling and reporting tasks. As you can imagine, I was thrilled when I recently noticed that RapidMiner has a Tableau Writer Extension. In short, it does exactly what it says: export our ExampleSet to a Tableau extract file for use directly in Tableau.

CONTINUE READING

This notebook aims to show the basics of:

- TensorFlow 2.0
- Shooter embedding estimation for NHL player evaluation
- Evaluating the feasibility of generating a post that switches between R and python via reticulate
- Demonstrating code similarity/approach in both languages side-by-side

TL;DR: Combine TensorFlow/Keras with R and NHL data to estimate shooter player embeddings, then export to Tableau for exploration (yes, we could use ggplot et al., but this highlights that we have other options, especially for those new to the language).

R setup:

    # packages
    library(keras)
    suppressPackageStartupMessages(library(tidyverse))
    library(reticulate)
    suppressPackageStartupMessages(library(caret))

    # options
    options(stringsAsFactors = FALSE)
    use_condaenv("tensorflow")

Python setup:

    # imports
    import pandas as pd
    import numpy as np
    from sklearn.

CONTINUE READING

I have been diving back into python a bit lately, and admittedly, I have yet to find a tool that fits my workflow the way R and RStudio do. There are all sorts of tools out there, but in the end, it feels like I am fighting the tool, not my code. To be honest, I really like using VSCode for other projects, but I feel like that product is aimed more at developers working on large applications than at data scientists.

CONTINUE READING

I learned today of carbon, and it is absolutely fantastic. In my own words, carbon provides terminal-like formatting for your code snippets, which can be included in blog posts and the like. It just makes things easier to read, in my opinion. Where my head goes is taking a snippet that looks like this:

    options(stringsAsFactors = FALSE)

    ## load the packages
    library(wakefield)

    ## generate a dataset of random users
    users = r_data_frame(
      n = 500,
      id,
      state,
      date_stamp(name = "registration_date"),
      dob,
      language
    )
    users$ID = as.

CONTINUE READING

Below is a post aimed at my future self. Be forewarned. The idea is to take an R data frame and convert it to a JSON object where each entry in the JSON is a row from my dataset, and the entry has key/value (k/v) pairs where each column is a key. Finally, if the value is missing for an arbitrary key, remove that k/v pair from the JSON entry.
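As a minimal sketch of that idea (the toy data frame and the jsonlite-based approach here are illustrative, not necessarily what the full post uses):

    library(jsonlite)

    ## toy data with some missing values
    df <- data.frame(id = 1:3,
                     name = c("amy", NA, "cal"),
                     score = c(10, 20, NA))

    ## one JSON entry per row; drop any k/v pair whose value is NA
    rows <- lapply(seq_len(nrow(df)), function(i) {
      row <- as.list(df[i, ])
      row[!is.na(row)]
    })

    toJSON(rows, auto_unbox = TRUE, pretty = TRUE)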

CONTINUE READING

In this post, I am going to walk through some issues that I recently encountered when attempting to get up and running with the Rasa stack. I am a big fan of the work they are doing; by and large, it makes a complex problem, chatbot development, accessible and leverages machine learning under the hood. This is in contrast to tools that leverage simple rule-based approaches. Below we will be using conda to manage our python environments and ensure that the package dependencies align.

CONTINUE READING

Many moons ago, I wrote some code to build a Tableau Data Extract from the work that I had munged together in python. I figured it was time to update the code since I recently discovered that the Tableau API has changed. For a link to that old code, refer to the Jupyter Notebook in this repo.

Assumptions and Requirements

First off, I am using a MacBook, and while I believe things are getting easier on Windows machines with respect to coding, I prefer to write Terminal commands over point-and-click installs.

CONTINUE READING

If you have skimmed through some of my other posts on this blog, it’s probably not surprising that I love using Neo4j in my projects. While you certainly can develop and work through your ideas locally, if you are like me, you probably have a few pet projects going at once, some of which you might want to share publicly. This post aims to highlight how quickly you can get up and running with Cloud9, a cloud-based development environment.

CONTINUE READING

Below is a quick writeup on how I use R and RNeo4j to munge my data and throw “larger” datasets into Neo4j. In short, I am fairly capable in R, so I prefer to use it to do the heavy lifting. All I am doing is calling the neo4j-shell tool via the system() command. This post runs through how I have used this approach in some of my recent projects, including one I am currently working on at work, where I loaded 3+ million nodes and nearly 9 million relationships.
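As a minimal sketch of that pattern (the data, file paths, and Cypher here are illustrative, and it assumes neo4j-shell is on your PATH):

    ## toy data frame standing in for the munged dataset
    users <- data.frame(id = 1:3, state = c("OH", "MI", "NY"))
    write.csv(users, "/tmp/users.csv", row.names = FALSE)

    ## write a Cypher script that bulk-loads the CSV
    cypher <- "USING PERIODIC COMMIT 10000
    LOAD CSV WITH HEADERS FROM 'file:///tmp/users.csv' AS row
    CREATE (:User {id: row.id, state: row.state});"
    writeLines(cypher, "/tmp/load_users.cql")

    ## let neo4j-shell do the heavy lifting from within R
    system("neo4j-shell -file /tmp/load_users.cql")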

CONTINUE READING

I have been watching the DiagrammeR package for a while now, and at this stage, it’s pretty impressive. I encourage you to take a look at what is possible, but rest assured the framework is there to do some really awesome things. One use case that applies to me is data modeling an app within Neo4j. There are already some tools out there, namely:

- Arrows
- Graphgen by GraphAware

And you can always use graphgists. The last link above is a sample graph gist that is a decent overview.
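For a flavor of what that can look like, here is a minimal sketch that mocks up a toy (User)-[:RATED]->(Movie) model with DiagrammeR's grViz(); the node and relationship labels are illustrative:

    library(DiagrammeR)

    ## render a tiny property-graph data model as a Graphviz diagram
    grViz('
    digraph datamodel {
      node [shape = circle, fontsize = 12]
      User -> Movie [label = "RATED"]
    }
    ')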

CONTINUE READING