Recently I took a job with the Department of Fisheries and Oceans translating some statistical software from python into R. The package uses kernel smoothing to estimate the probability distribution of a variable of interest relative to one or more covariates. I’m employed for the next month or so, and hopefully by the end we’ll be submitting the package to CRAN. I’ll write a little more about that as it gets closer to happening.

I’ve gotten to the point in the development process where I need to think hard about optimization and benchmarking. I’m in the habit of keeping my bosses/colleagues up to date on my work through html files I make using R markdown (the same way I write this blog). R markdown has drastically changed the way I work, allowing me to weave narrative and code together to make a more compelling exploration of my progress. I was hoping to benchmark my code against the original python code to see how they stack up speedwise and thought it would be great to have the python benchmarks right in there with my R code. I knew knitr provided facillities for including non-R code in documents so I set to work figuring our how to do it.


Back to basics

For those of you who don’t know the fantastic one-two punch that is R markdown and Knitr let me tell you a little bit about it. R, since at least the early 2000’s, has provided facillities for (s)weaving code and documents together. The aim was to mesh analysis and reporting into one cohesive action in the name of reproducibility. Though there is still much progress to be made moving toward this ideal the tools have gotten better.

In 2012, Yihui Xie introduced the world to knitr, a neater knitter of code and documentation. It has (almost?) completely supplanted sweave as the primary tool to acheive this goal. The benefits of knitr are many, it allows you to create html and pdfs, include code, figures, and LaTeX all in the same document, and rerun the analysis any time the included code changes (but only if you want it to). Needless to say I’m a big fan.

The next major innovation in the reproducible research game was R Markdown. Markdown is family of markup languages introduced by John Gruber in 2004 designed to facillitate writing html documents in plain text. The goal was to have readable plain text documents where the html tags could be generated on demand. R markdown is variant designed to work primarily with R, created by the folks at R Studio. If you use R Studio, the integration of R markdown is seamless, you write an R markdown formatted file, the documents is converted to a format knitr understands, and then knitr makes you a beautiful document with all of your work ready to be shared with the world. I highly recommend using these tools for just about any authoring job short of manuscript preparation.

Now that I’ve given you a brief introduction, lets dive into my oddessy making a knitr engine.


The problem

As I mentioned in the preamble, I wanted to run R and python code side-by-side in my document so that both sets of benchmarks could be readily compared. Knitr comes pre-loaded with a quite a few engines for other languages:

library(knitr)
names(knit_engines$get())
##  [1] "awk"       "bash"      "coffee"    "gawk"      "groovy"   
##  [6] "haskell"   "lein"      "mysql"     "node"      "octave"   
## [11] "perl"      "psql"      "Rscript"   "ruby"      "sas"      
## [16] "scala"     "sed"       "sh"        "stata"     "zsh"      
## [21] "highlight" "Rcpp"      "tikz"      "dot"       "c"        
## [26] "fortran"   "fortran95" "asy"       "cat"       "asis"     
## [31] "stan"      "block"     "block2"    "js"        "css"      
## [36] "sql"       "go"        "python"    "julia"

A quick scan of the available engines shows that knitr should be perfectly capable of executing python code. So I tried:

```{r, engine = “python”}
#My python code
```

Which sadly did not work as I had hoped. The python interpreter used by knitr in that situation was the stock python interpreter. It failed to properly import the modules I needed and was untenable for my objectives. “Alas, if only there was an ipython engine” I thought to myself, and then promptly decided I should probably build one.


Taking apart the engine

To figure out how to build my own engine, I decided to have a look at the engine provided for me. A convenience of R that I think more people should take advantage of is the ability to get code for any function you like. Provided it’s written in R in many cases you can figure out how it works. You can skim over this code, I’ll get back to the important bits.

knit_engines$get("python")
## function (options) 
## {
##     if (isFALSE(options$python.reticulate)) {
##         eng_interpreted(options)
##     }
##     else {
##         if (!loadable("reticulate")) 
##             warning2("The 'python' engine in knitr requires the reticulate package. ", 
##                 "If you do not want to use the reticulate package, set the chunk option ", 
##                 "python.reticulate = FALSE.")
##         reticulate::eng_python(options)
##     }
## }
## <environment: namespace:knitr>

First good sign, all the code is R with no references to byte-code and it’s short enough to not take all day to figure out. First thing to note about the code is it’s a function, and not only that it’s a function of one argument, options. The first line of code takes a named element of options called engine (so options is a named list containing things like the name of the engine). My intuition was that options was a list of every component of chunk, and the rest of the function largely confirmed that intuition. Other named elements of options used in the function include:

The shape of the code the begins to come into focus. The function primarily takes your code out of the chunk (element code), converts it into shell command that runs engine with some flags and a quoted argument containing your code. This gets executed by the function system2 which is used to execute shell commands and retrieve output. This output is then passed to a function engine_output along with the code and options, the results of which are used to build your document. So from this point it seamed obvious what needed to happen.

  1. Concatenate engine.path and engine to create the command that needs to be run
  2. Convert code into a string appropriate for use in shell commands
  3. Paste flags onto code so that the engine knows to expect a string of code
  4. Run the command and the modified code through the engine with system2, saving the output
  5. Pass everything to engine_output

Pretty straight forward really, and except for some simple housekeeping that’s all I needed to write.


Writing the new engine

Now I began writing my own engine (and it didn’t work the first time, some of the housekeeping I didn’t figure out right away, but lets pretend it did). I called my engine ipythonKnitEngine but you can call your engine whatever you want.

ipythonKnitEngine <- function(options){
  
  enginePath <- options$engine.path
  if(grepl("/$", enginePath)) enginePath <- paste0(enginePath, "/")
  
  engine <- paste0(enginePath, options$engine)
  
  code <- paste("-c", shQuote(paste(options$code,
                                    collapse = "\n")))
  
  out <- system2(engine, code, stdout = TRUE, stderr = TRUE)
  
  engine_output(options, options$code, out)
  } 

As of writing this, the engine requires you specify the path to your ipython executable, its a kink I’ll probably iron out in the near future, but this was a quick fix to finish a document, so it still needs some polish.

For those who aren’t familiar shQuote converts a character vector to a string useable in your shell. The other potentially obtuse bit is the collapse = "\n" to paste. When paste takes a character vector of length n, by default it makes a vector of length n, when collapse is not NULL, all of those elements are concatenated together separated by the collapse argument. Otherwise pretty standard stuff.

The last necessary little bit of funny business is engine_output isn’t actually available from the knitr package to users, but you can force your engine to use knitr’s namespace by setting your engine’s environment:

environment(ipythonKnitEngine) <- environment(knit_engines$set)

Now the environment of the function is set to the knitr namespace and it will be able to find engine_output.

And finally we can set the new engine and begin using it!

knit_engines$set(ipython = ipythonKnitEngine)


Using the engine

To use the engine in your R Markdown / Knitr documents just run library(knitr) in your first code chunk, and include the function declaration for your engine along with the environment and knit_engines$set statements above and then you can use your engine in your document.

Now we can run

```{r, engine = “ipython”, engine.path = “path/to/ipython/”}
#Python code to execute
```
with impunity and it works like a charm!.


Wrap-Up

I hope you liked this little tutorial. If you like to say “nuts to writing my own, I’ll just use yours” you can find all this code in a convenient format on github.

Chris