What methods can we use to reshape VERY large data sets?

If your real data is as regular as your sample data, we can be quite efficient by noticing that reshaping a matrix is really just changing its dim attribute. First, on very small data:

    library(data.table)
    library(microbenchmark)
    library(tidyr)

    matrix_spread <- function(df1, key, value){
      unique_ids <- unique(df1[[key]])
      mat <- matrix(df1[[value]], ncol = length(unique_ids), byrow = TRUE)
      df2 <- …
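To illustrate the claim that reshaping is just a change of the dim attribute, here is a minimal sketch in base R. The data frame `long` and its columns `id`, `key` and `value` are hypothetical, not taken from the original post, and the sketch assumes the data is perfectly regular: every id has every key, and rows are sorted by id and then by key.

```r
# Hypothetical long-format data: 3 ids, 2 keys each, sorted by id then key.
long <- data.frame(
  id    = rep(1:3, each = 2),
  key   = rep(c("a", "b"), times = 3),
  value = c(10, 20, 30, 40, 50, 60)
)

keys <- unique(long$key)
ids  <- unique(long$id)

# Fill row by row: each consecutive run of length(keys) values becomes one row.
wide <- matrix(long$value, ncol = length(keys), byrow = TRUE,
               dimnames = list(as.character(ids), keys))

# Equivalent view: set dim directly (R fills column-major), then transpose.
m <- long$value
dim(m) <- c(length(keys), length(ids))
stopifnot(all(t(m) == wide))
```

If ids are missing some keys or the rows are not sorted, the values land in the wrong cells without any warning, so the regularity assumption should be checked before relying on this shortcut.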

Subsetting R data frame results in mysterious NA rows

Wrap the condition in which():

    df[which(df$number1 < df$number2), ]

How it works: which() returns the row numbers where the condition is TRUE, and the data frame is subset on those rows. Say that which(df$number1 < df$number2) returns row numbers 1, 2, 3, 4 and 5. As such, writing: df[which(df$number1 < df$number2), …
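The difference between the two forms of subsetting can be seen in a small sketch. The data frame below is made up for illustration; only the column names `number1` and `number2` come from the excerpt. It assumes the mysterious rows arise because the comparison evaluates to NA for some rows.

```r
# Made-up data with missing values in both columns.
df <- data.frame(number1 = c(1, 5, NA, 2),
                 number2 = c(3, 4, 6, NA))

# Comparing against NA yields NA, so cond is TRUE, FALSE, NA, NA.
cond <- df$number1 < df$number2

# Logical indexing keeps one all-NA row for every NA in the index.
plain <- df[cond, ]

# which() returns only the positions that are TRUE, so NA entries are dropped.
wrapped <- df[which(cond), ]
```

Here `plain` has three rows, two of them entirely NA, while `wrapped` has only the single row where the condition actually holds.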
