Using ggplot2 Inside data.table

John · 2019/10/24 · 1 minute read

I’m learning data.table. Recently I needed to plot a histogram of some particular columns. I already learned to use .SDcols and .SD to get at particular columns and subsets, but I couldn’t figure out how to refer to the column’s name in the plot itself.

Eventually I settled on the approach of mapping over the indices of the data set and getting the correct column data and name during each lapply iteration.

Let me know your ideas for improvement. Like I said, I’m still learning data.table, so I’m sure I have much to improve. Here’s my code:

histogram_my_columns <- function(data, columns = NULL) {
  require(data.table)
  require(ggplot2)
  if (!is.data.table(data)) {
    data <- as.data.table(data)
  }
  if (is.null(columns)) {
    columns <- names(data)
  }
  stopifnot(all(columns %in% names(data)))
  plotter <- function(index) {
    column <- data[[index]]
    column_name <- columns[index]
    p <- 
      ggplot() +
      geom_histogram(aes(column)) +
      ggtitle(paste0('Histogram of ', column_name))
  suppressMessages(print(p))
  TRUE
  }
  res <- data[, lapply(seq_along(.SD), plotter), .SDcols = columns]
}
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(ggplot2))
histogram_my_columns(mtcars, c('mpg', 'cyl', 'disp'))