Here at rexy.ai we use the R programming language a lot - from exploratory data analysis all the way to deploying fully fledged backend applications. While one can find tons of articles about the former, the latter topic is less well known.
First we will introduce RestRserve - a framework we’ve developed to build HTTP APIs. Then we will discuss the current state of the tools within the R ecosystem aimed at similar goals.
We started RestRserve in 2018 as a framework for building REST APIs in R which can handle requests in parallel (targeting the Rserve backend). It has since evolved into a solid, flexible piece of software with pluggable backends which can be useful for the broad R community. Please find out more at https://restrserve.org/.
- high test coverage
- versioned automated Docker builds available from Docker Hub. Alpine-based images are under 30 MB.
- robust to bugs in user code - it gracefully generates appropriate HTTP error codes and logs stack traces for debugging
- maturing API. As we run RestRserve in production, we care about backward compatibility and try to minimize breaking changes
- it is fast
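To give a flavour of the API, here is a minimal sketch of a RestRserve application. The endpoint path, handler body and port below are our own arbitrary choices for illustration:

```r
library(RestRserve)

# create an application and register a GET endpoint
app = Application$new()
app$add_get(path = "/ping", FUN = function(request, response) {
  response$set_body("pong")
})

# requests can be simulated in-process, which is handy for testing
request = Request$new(path = "/ping", method = "GET")
response = app$process_request(request)
response$body

# to serve real traffic, start the Rserve backend:
# backend = BackendRserve$new()
# backend$start(app, http_port = 8080)
```

Note that the application can be exercised with `process_request()` without starting a server at all, which makes unit testing straightforward.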
Exposing R programs to the world
There are several common outcomes of data scientist/statistician work:
- static document - a scientific paper or report
- interactive application/visualization/dashboard
- backend application
And the R ecosystem provides best-in-class tooling for the first two:
- R Markdown is an excellent tool to generate any kind of report in any format
- Shiny or Dash allow you to build beautiful web applications without leaving the comfortable R environment. With the help of shinyproxy you can add enterprise-grade authentication and horizontally scale your applications.
However, many (if not most) people in the R community struggle to integrate the results of their work into other systems. The popular opinion that “R is not suitable for production. You need to rewrite your program in a normal programming language like Java/Python/Go/C++/…” is only partially true. Let’s dive deeper into the problem.
R is a weird language
Yes! Even though we love R, we have to admit that it is a weird language. It is weird in the sense that it comes from the Lisp family, while most of the mainstream programming languages which everybody studies at university are from the C family.
Few large companies use R “in production”. By “in production” we mean that R applications communicate with other systems and run 24/7 in a non-interactive mode. The reason is very simple - it is much easier and cheaper to hire an engineer with knowledge of a general-purpose programming language. The small community around “R in production” brings us to the next challenge - tooling.
What does typical system integration need?
In the modern world it is a common design pattern to organize a system as a set of loosely coupled microservices which communicate over the network. So your R application should be able to communicate with other systems via the network.
Since R is not extensively used to build backend applications, the number of packages addressing common DevOps/engineering needs is limited. However, it would be incorrect to assume that such tools don’t exist.
There are different protocols for communication over the network. Of course, one can talk to other systems via plain sockets, but since this is very low-level, developing such applications is error-prone. So most people use higher-level solutions.
Some of these protocols are fast, binary and complicated. Others, like HTTP, are text-based and much simpler. Let’s briefly review what the R ecosystem can offer.
We are aware of the following options:
- The most feature-rich solution is Rserve. It is developed by Simon Urbanek, an R-core member, and has been around since 2003(!) - see the DSC-2003 Proceedings. Rserve provides a fast binary transport protocol on top of TCP and multiple client libraries for popular languages like C++, Java and Python. On top of that, it creates a separate workspace for every connection and handles them concurrently! As far as we understand (we could be wrong!), Rserve is extensively used at AT&T - one of the largest US telecom providers.
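As a sketch of how this works from R itself, here is a minimal client/server round trip using the RSclient package (the default host and port are shown; the evaluated expression is our own example):

```r
# server side: start an Rserve daemon (listens on TCP port 6311 by default)
library(Rserve)
Rserve(args = "--no-save")

# client side - possibly a different process or even a different machine
library(RSclient)
conn <- RS.connect(host = "localhost", port = 6311)
res <- RS.eval(conn, sum(1:10))  # evaluated on the server side
RS.close(conn)
```

The same server can be reached from Java, Python or C++ through the corresponding Rserve client libraries.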
Let us know if we missed something here.
At the moment the most common approach is to use the HTTP protocol and the REST architectural style. Although HTTP is a plain-text protocol (and thus far from optimal), it has practically proven to be good enough for most cases while being very simple.
Generally you can’t expect to find as many HTTP servers as in, for example, Python or Java. However, the R ecosystem offers several choices.
- There is the excellent httpuv, an HTTP and WebSocket server library developed by RStudio. This library is quite low-level and generally is not used directly to develop applications.
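To illustrate just how low-level it is, here is roughly what a bare httpuv application looks like - a Rook-style handler function returning a response list (the handler body and port are our own minimal sketch):

```r
library(httpuv)

# a Rook-style application: a function of the request environment
# that returns a list with status, headers and body
app <- list(
  call = function(req) {
    list(
      status  = 200L,
      headers = list("Content-Type" = "text/plain"),
      body    = paste("you requested", req$PATH_INFO)
    )
  }
)

server <- startServer(host = "127.0.0.1", port = 8000, app = app)
# runServer() or repeated service() calls would process requests here
stopServer(server)
```

Everything else - routing, body parsing, content negotiation, error handling - is left to you, which is why frameworks are usually layered on top.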
- plumber, which is built on top of httpuv, is probably the most popular option to create an HTTP API. It utilizes special code annotations to do so. It is very easy to start building a basic API using plumber since it magically does a lot of things for you behind the scenes (encoding and decoding the HTTP body, query parameter parsing, etc.). However, in our experience it quickly falls short as you start building something less trivial - annotations are cool, but it is much more flexible to control the application programmatically.
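For reference, the annotation style looks like this - an echo endpoint along the lines of the standard plumber example (served with `plumber::plumb("plumber.R")$run(port = 8000)`; the port is an arbitrary choice):

```r
# plumber.R - annotation style
#* Echo back the input
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}
```

In plumber 1.0+ the same endpoint can also be registered programmatically via `pr()` and `pr_get()`, without annotations.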
- fiery is another package built on top of httpuv.
- Rook was the first package which tried to define a web server interface for R (in fact httpuv follows the Rook specification). Additionally, you can build an HTTP API using R’s built-in Rhttpd web server as a backend.
- FastRWeb allows you to use Rserve to build high-performance HTTP APIs. It is somewhat low-level for an average developer - you will need to handle many things yourself.
- opencpu by Jeroen Ooms, which aims to provide a simple interface to R. But you can use it to develop a REST API as well.
What are the limits of the solutions above?
A typical web application usually has a high chance of receiving multiple simultaneous requests.
- httpuv-based solutions mitigate this by queueing requests, which effectively means handling one request at a time. Of course, one can put a load balancer such as HAProxy or nginx in front of the application (and usually this is done in production environments anyway!). But this makes things unnecessarily complicated for simple setups (especially considering the skill set of an average R user).
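As a sketch of what such a setup involves, a minimal nginx configuration balancing requests across several identical single-threaded R processes might look like this (the upstream name and ports are arbitrary placeholders):

```nginx
upstream r_api {
    # identical R application processes, each listening on its own port
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
}

server {
    listen 80;

    location / {
        proxy_pass http://r_api;
    }
}
```

Note that you now also need a supervisor to start and restart those R processes - extra moving parts for what should be a simple deployment.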
- opencpu requires additional installations if you want to handle requests in parallel (correct us if we’re wrong). It also introduces quite significant latency (around 100ms) which might be critical for fast API calls.
- FastRWeb, being based on Rserve, can handle requests truly in parallel on POSIX systems - each connection is served in a separate forked process. But as mentioned before, FastRWeb is a quite low-level library and requires a significant amount of additional code to build a robust web application.
Need support? Just let us know - email@example.com.
Found a bug? Please report on github rexyai/RestRserve.
Modern large-scale applications are not limited to synchronous REST-style communication. More and more companies are moving towards asynchronous architectures where the central part is a distributed queue or Pub/Sub. Keep an eye on our blog to learn more about distributed queues.
Take care in our challenging times and stay tuned!