Scheduling R Tasks with Crontabs to Conserve Memory

One of R’s biggest pitfalls is that eats up memory without letting it go.  This can be a huge problem if you are running really big jobs, have a lot of tasks  to run, or there are multiple users on your local computer or r server.  When I run huge jobs on my mac, I can pretty much forget doing anything else like watching a movie or ram intensive gaming.  For my work, Kwelia, I run a few servers with a couple dedicated solely to R jobs with multiple users, and I really don’t want to up the size of the server just for the few times that memory is exhausted by multiple large jobs or all users on at the same time.  To solve this problem, I borrowed a tool, crontab, from the linux (we use an ubuntu server but works on my mac as well) folks to schedule my Rscripts to run at off hours (between 2am-8am), and the result is that I can almost cut the size of the server in half.

Installing Crontabs is easy (I used this tutorial and this video) in a linux environment but should be similar for mac and windows. From the command line enter the following to install:

sudo apt-get install gnome-schedule

Then to create a new task for any user on the system enter if you are the root user or admin:

sudo crontab -e

or as a specific user:

crontab -u yourusername -e

You must then choose your preferred text editor. I chose nano, but the vim works just as well. This will create a file that looks like this:
Screen Shot 2013-09-03 at 5.01.19 PM

The cron job is laid out in this format:minute (0-59), hour (0-23, 0 = midnight), day (1-31), month (1-12), weekday (0-6, 0 = Sunday), command. To run an rscript in the command just put the “Rscript” and then the file path name. An example:

0 0 * * * Rscript Dropbox/rstudio/dbcode/loop/loop.R

This runs the loop.R file at midnight (zero minute of the zero hour) every day of every week of every month because the stars mean all.  I have run endless repeat loops before in previous posts, but R consumes the memory and never free it.  However, running  cron jobs is like opening and closing R every time so the memory is freed (probably not totally) after the job is done.

As an example, I ran the same job in a repeat every twelve hours on the left side of the black vertical line, and on the right is the same job being called at 8pm and 8am.  Here’s the memory usage as seen through munin:

Screen Shot 2013-09-03 at 5.10.41 PM Screen Shot 2013-09-03 at 5.11.09 PM

I don’t have to worry nearly as much about my server overloading now, and I could actually downsize the server.

QED

About these ads

One thought on “Scheduling R Tasks with Crontabs to Conserve Memory

  1. Pingback: Scheduling R scripts to run on a regular basis | Thiago G. Martins

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s