Python Development with RStudio using Reticulate
I have been diving back into Python a bit lately, and admittedly, I have yet to find a tool that fits my workflow the way R and RStudio do. There are all sorts of tools out there, but in the end, it feels like I am fighting the tool, not my code.
To be honest, I really like using VSCode for other projects, but I feel like this product is aimed more at developers working on large applications, not data scientists.
My teaching tool of choice is Google Colaboratory, but the lack of a dedicated R runtime is really brutal. There are workarounds, but they render many of the features that I love about Colab useless. For example, you can’t connect your session to Google Drive. For me, especially when I teach in class, this is a deal breaker.
RStudio and Reticulate
There has been plenty written about reticulate, so I will let you dive into the tutorials and background. It is not without its quirks, but by and large, the combo works really well, especially for a younger solution. Historically there have been other attempts to bridge the gap, but from a feature and usability perspective, this is by far the most robust offering if you ask me.
Over the last year, I have been (slowly) working on a Python package to help facilitate the collection and analysis of datasets that are openly available within higher education. I mention the tools above because my development has been really slow outside of RStudio. Last week, I got fed up and came back to RStudio. If I must say, the experience has been really pleasant, but more importantly, I am writing code at a much faster rate. Is it because I am more comfortable with the RStudio interface? Perhaps, but I really do believe RStudio could be THE data science IDE of the future.
With that said, here are a few things that tripped me up along the way. This is not meant to call out the quirks of developing Python packages using RStudio and reticulate, but is a note to my future self about the tricks necessary to work around some pretty annoying issues.
1. Restarting R sessions
Reticulate is fantastic and can hook into environments on our machine. For my package above, I manage my environment using conda. Here is the thing. If I make a change to my package and need to retest the code locally, things start to get hairy. You may properly uninstall and reinstall your work locally, even in the conda environment, but you won’t see the changes in RStudio.
The flow below solves this issue, and represents the process by which I have been editing and seamlessly testing my code, all without leaving RStudio.
- Open the RStudio project for my Python package
- Load the reticulate package, select the conda environment with use_condaenv(), and then start the Python REPL with repl_python()
- Make edits to my functions, methods, whatever. The trick here is that we now have an interactive Python REPL, which is very helpful as I step through the development of methods and classes.
- With the changes made, go to the Terminal tab within RStudio, uninstall the package with pip uninstall <packagename>, and then run pip install . to reinstall it
- Once the package is updated, you must restart your R session via Session > Restart R. If you skip this step, the changes will not be picked up, even though the package is now updated within your conda environment.
- After restarting R, repeat step 2.
- Voila, it works!
It’s not the worst workflow, but without it, you will inevitably be banging your head against the wall wondering why your Python package updates didn’t hold.
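A likely reason the restart is needed: Python caches every imported module in sys.modules, and the Python interpreter that reticulate starts lives as long as the R session, so a repeated import keeps returning the code that was loaded first. The sketch below reproduces that caching in plain Python, outside RStudio. The module name mypkg_demo and its greet function are hypothetical stand-ins, not part of any real package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_module_caching():
    """Show that a re-import reuses the cached module until it is reloaded."""
    with tempfile.TemporaryDirectory() as tmp:
        # "mypkg_demo" is a hypothetical stand-in for the package under development.
        source = Path(tmp) / "mypkg_demo.py"
        source.write_text("def greet():\n    return 'v1'\n")
        sys.path.insert(0, tmp)
        try:
            import mypkg_demo
            first = mypkg_demo.greet()

            # Simulate editing and reinstalling the package on disk.
            source.write_text("def greet():\n    return 'version 2'\n")

            # A second import is served from sys.modules, so the old code remains.
            import mypkg_demo
            cached = mypkg_demo.greet()

            # Reloading re-executes the module source in the running interpreter.
            importlib.invalidate_caches()
            reloaded = importlib.reload(mypkg_demo).greet()
        finally:
            # Clean up so the demo leaves the interpreter as it found it.
            sys.path.remove(tmp)
            sys.modules.pop("mypkg_demo", None)
    return first, cached, reloaded


if __name__ == "__main__":
    print(demo_module_caching())
```

importlib.reload() is part of the standard library and might save a restart for small, single-module changes. It does not refresh objects that were created from the old code or names pulled in with from ... import, so treat it as something to try rather than a replacement for Session > Restart R.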
If there is a flaw above, or an easier way to address my issue, please reach out. As I noted above, RStudio is my preferred solution for Python package development at the moment.
2. History doesn’t work
The section heading says it all. While commands are logged to the History tab, if you try to send a selected entry to the console, the Python REPL will break.
3. Incomplete function runs
This isn’t the worst, but it tripped me up once or twice. When I select the full block of code for a function or method definition, the code will bulk run in the REPL, but I have to click into the REPL and hit Enter for the code to fully execute. It’s as if the REPL doesn’t know that we are done with our code run.
This may only happen when I select just the function and nothing after it, but either way, it does appear to occur here and there.
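One possible explanation, assuming the reticulate REPL follows the same convention as the standard interactive Python prompt: a multi-line block such as a def is only considered finished once an empty line arrives. The sketch below shows that convention using the standard library's code.InteractiveConsole, not reticulate itself, and the add function is purely illustrative.

```python
import code


def push_lines(lines):
    """Feed lines to an interactive console and record whether each one left it waiting."""
    console = code.InteractiveConsole()
    waiting = [console.push(line) for line in lines]
    return console, waiting


function_source = ["def add(a, b):", "    return a + b"]

# Without a trailing blank line the console is still waiting for more input.
_, waiting = push_lines(function_source)
still_open = waiting[-1]

# An empty line closes the block, and the function becomes available.
console, waiting = push_lines(function_source + [""])
closed = not waiting[-1]
result = console.locals["add"](2, 3)
```

If the same rule applies in RStudio, including the blank line after the function in the selection may avoid the extra Enter.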
4. Environment
I feel like this works from time to time, but most of the time the objects in my Python session are not shown within the Environment tab.
To me, this would be a game changer, and would bring feature parity with Spyder.