#set SLURM binaries PATH so that RSW Launcher jobs work
slurm_bin_path <- "/opt/slurm/bin"
curr_path <- strsplit(Sys.getenv("PATH"), ":")[[1]]
if (!(slurm_bin_path %in% curr_path)) {
  if (length(curr_path) == 0) {
    Sys.setenv(PATH = slurm_bin_path)
  } else {
    Sys.setenv(PATH = paste0(Sys.getenv("PATH"), ":", slurm_bin_path))
  }
}
FAQ
Important things to be aware of
- Global data can be exported via the export parameter as a list.
- Packages needed on the worker nodes can equally be exported as a character vector via pkgs.
- Additional values in the respective clustermq template can be set via the template option as a list, e.g. template = list(memory = 1024, cores = 1).
- Worker log files can be switched on by setting log_worker = TRUE.
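Taken together, these options map onto arguments of a single Q call. A minimal sketch, assuming clustermq is installed, a scheduler template is configured, and using a hypothetical worker function fx:

```r
library(clustermq)

# hypothetical global value shared with all workers via `export`
base_offset <- 100

# hypothetical function run on the workers; `x` is iterated over
fx <- function(x) x * 2 + base_offset

res <- Q(fx, x = 1:3,
         n_jobs = 1,
         export = list(base_offset = base_offset),  # global data as a list
         pkgs = c("stats"),                         # packages attached on the workers
         template = list(memory = 1024, cores = 1), # values filled into the scheduler template
         log_worker = TRUE)                         # write one log file per worker
```

Here export, pkgs, template and log_worker correspond directly to the bullet points above; the memory and cores values are placeholders to adapt to your cluster.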
How does this all fit into the RStudio IDE?
The usage of clustermq and its Q function opens possibilities for both the RStudio IDE Open Source and professional versions. Irrespective of the version, you can always run the Q function on the R console, which will then use the HPC environment your R session is running on and spawn the required jobs. Additionally, you can use the “Background jobs” feature in Open Source or the “Workbench jobs” feature in the professional version to farm out a possibly long-running Q call into a non-interactive job. Please note that the resources required for such a non-interactive job are very minimal (1 core), as it only runs the master process - all the workers will be spawned by this master process into separate jobs.
RStudio Workbench jobs & SLURM
Using the Workbench jobs feature can be a bit cumbersome if the main clustermq process is run in such a Workbench job. This is due to the PATH environment variable not being set to contain the location of the SLURM binaries. As a workaround, we recommend adding the PATH snippet shown at the top of this document to your .Rprofile in your home directory (alternatively, get your IT admin to add it to Rprofile.site within your R installation).
What happens if I don’t have any HPC cluster available to run my clustermq based code?
This is no problem at all: you can simply remove the clustermq.template option and clustermq will fall back to local execution without any further code changes. You can also still parallelize locally by using multicore or multiprocess as the clustermq.scheduler.
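As a sketch of the local fallback (assuming only that the clustermq package is installed), the scheduler can be switched to multiprocess before the first Q call:

```r
library(clustermq)

# run workers as local background processes instead of HPC jobs
options(clustermq.scheduler = "multiprocess")

res <- Q(function(x) x^2, x = 1:4, n_jobs = 2)
unlist(res)  # 1 4 9 16
```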
What happens if the HPC cluster I would like to work on has no RStudio installation?
Not all hope is lost in this case either. clustermq also supports the ssh connector. This allows you to run your R code remotely on any host you are allowed to log in to via ssh. If you set up a passwordless ssh connection to the login node of your HPC cluster, for example, you can set
options(
clustermq.scheduler = "ssh",
clustermq.ssh.host = "user@hpclogin", # use your user and login node
clustermq.ssh.log = "~/cmq_ssh.log" # log for easier debugging
)
Depending on the overall setup of the RStudio Server and the HPC cluster (e.g. R versions and installation directories, home-directory location) you may need to tweak the provided default ssh template. Note: clustermq will use the ssh connection and, once on the HPC cluster, will detect and use the appropriate scheduler for submitting jobs.
Measuring code execution time in R
For the purposes of this document, we use the microbenchmark package. It allows the execution of selected code chunks a number of times to average over typical OS jitter. An example would be
library(microbenchmark)
func <- function(x) x * x
microbenchmark(func(10))
Unit: nanoseconds
expr min lq mean median uq max neval
func(10) 885 893 16892.86 901 1006 1576467 100
Note: By default the function call is evaluated 100 times. This can be changed by adding the optional parameter times = X, where X is the desired number of evaluations.
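For example, reducing the number of evaluations for a slower function (a sketch assuming the microbenchmark package is installed; slow_func is a hypothetical function to time):

```r
library(microbenchmark)

# a hypothetical, somewhat slower function to time
slow_func <- function(n) sum(sqrt(seq_len(n)))

# evaluate only 10 times instead of the default 100
mb <- microbenchmark(slow_func(1e6), times = 10)
mb
```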