Further testing of quickRareCurve

After yesterday's post documenting quickRareCurve, a faster, parallelised version of the rarecurve function, I realised it'd be good to show a real-world example using it on a reasonably large OTU table, to prove that it really is quicker than the original function. So, here we go.
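If you missed that post, the core idea behind a parallelised rarefaction curve can be sketched like this. Note this is an illustrative reimplementation built from vegan's rarefy and the parallel package's mclapply, not the actual quickRareCurve code, and parallelRarefaction is a hypothetical name:

```r
library(vegan)     # rarefy()
library(parallel)  # mclapply(), detectCores()

# Illustrative sketch: compute one rarefaction curve per sample,
# farming the samples out across CPU cores.
# mclapply() forks worker processes, so on Windows use cores = 1.
parallelRarefaction <- function(x, step = 100,
                                cores = max(1, detectCores() - 1)) {
  mclapply(seq_len(nrow(x)), function(i) {
    n <- sum(x[i, ])
    # Subsample sizes from 1 up to the full sample depth
    sizes <- unique(c(seq(1, n, by = step), n))
    # Expected species richness at each subsample size
    rarefy(x[i, , drop = FALSE], sample = sizes)
  }, mc.cores = cores)
}
```

Each list element returned holds the expected richness values for one sample; the speed-up comes simply from samples being rarefied on separate cores at once.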

# Starting with an OTU table in which rows are samples
# and columns are OTUs/species
dim(otuTable)
## [1]   48 5382
# The first column of my data holds the sample names,
# so remember [, -1] to exclude it
# Let's inspect the sample sizes with a simple histogram
hist(rowSums(otuTable[, -1]), xlab = "Sample sizes")

In this example, our OTU table contains 48 samples and 5381 OTUs, plus a column containing the sample names.
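If you don't have a large OTU table to hand, you can simulate one of the same shape to follow along. The Poisson counts here are purely made up for illustration:

```r
# Simulated stand-in: 48 samples x 5381 OTUs of Poisson counts,
# with a first column of sample names, matching the dim() output above
set.seed(1)
counts <- matrix(rpois(48 * 5381, lambda = 2), nrow = 48)
otuTable <- data.frame(sample = paste0("S", 1:48), counts)
dim(otuTable)
## [1]   48 5382
```

Real OTU tables are far sparser and more uneven than Poisson counts, so expect the absolute timings below to differ, but the relative comparison still holds.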

Now we can use microbenchmark again to compare the performance of the original rarecurve function with our faster parallel version, quickRareCurve. We will then plot the results using ggplot2. Warning: the code below will take some time to run!

library(vegan)  # provides rarecurve()
library(microbenchmark)

# Benchmark the two functions with 3 replicates each
testResult <- microbenchmark(
  rarecurve(otuTable[, -1]),
  quickRareCurve(otuTable[, -1]),
  times = 3)
library(ggplot2)

# Convert from nanoseconds to minutes by dividing
# time by 6e+10 (60 s per minute x 1e9 ns per second)
ggplot(testResult, aes(x = expr, y = time/6e+10)) +
  geom_boxplot() +
  labs(x = "Function", y = "Time (minutes)") +
  theme(axis.text = element_text(size = 16),
    axis.title = element_text(size = 18))

There you go: on a typical OTU table, quickRareCurve is far quicker, cutting the processing time from ~25 minutes to under 5 minutes. The more samples in your OTU table, and the more CPU cores available, the greater the speed-up.

Dave Clark
Post-doctoral researcher in microbial ecology