How can i do FAST/ADVANCE data manipulation in nested data.table – data.table within data.table

I am going to try answering it myself. I am seeing Warning message in a hope to understand it better for any limitations. However, for me it works fine (ignoring the warning message).

On a side note, data.table breaks all the limitation of R that stops it to do Big Data processing, and lest i forget my own research would like it to be documented.

Meanwhile let us create a function that breaks up the route in legs:

construct.legs <- function(ro) {
      node_vector <- unlist(strsplit(ro, ">"))
      d_nodes <- node_vector[!node_vector %in% node_vector[1]]
      o_nodes <- node_vector[!node_vector %in% node_vector[length(node_vector)]]
      legs <- paste(o_nodes,d_nodes, sep = ">")

Now create nested leg_table for each route containing legs of the route. Of course using the function construct.legs that was defined above:

route_data[, leg_data := .(list(data.table(leg = construct.legs(route)))), by = list(row.names(route_data))]

How does our route_data look like now?

                                       route      travel_type leg1_time_hr leg2_time_hr leg3_time_hr leg4_time_hr     leg_data
1:           Seattle>NewDelhi>Patna>Motihari business_meeting           18         2.00         4.00           NA <data.table>
2:                 Seattle>NewDelhi>Motihari      casual_trip           18        18.00           NA           NA <data.table>
3: Seattle>Hyderabad>NewDelhi>Patna>Motihari   office_meeting           18         2.25         1.75            4 <data.table>

Let’s take a look what is inside if the nested data.table in 3rd row of route_data

route_data$leg_data[3]  #Access the leg_table like we do in data.frame. But this returns leg_data as a list
route_data$leg_data[[3]]  #This returns leg_data as a data.table
route_data[3, leg_data] #Access the leg_table like we do in data.table. This returns leg_data as a list
route_data[3, leg_data[[1]]] #This returns leg_data as a data.table

data.table stored in the 3rd row of route_data

1:  Seattle>Hyderabad
2: Hyderabad>NewDelhi
3:     NewDelhi>Patna
4:     Patna>Motihari

Let me add row number in route_data tha i will use later in populating transit time within nested table leg_data

route_data[, route_num := seq_len(.N)]

Similarly add row number in nested table leg_Table

route_data[, leg_data := .(list(leg_data[[1]][, leg_num := seq_len(.N)])), by = list(row.names(route_data))]

You see a Warning message that says there was invalid internal self reference that has been fixed by shallow copying. So, i am going to ignore this as of now. I would need help here from someone who can help me understand if it breaks anything. Anyway, lets proceed.

Why do we have [[1]]? This is to ensure that sub_table values are returned as data.table, not as list. Try running route_data[3, leg_data[[1]]] and route_data[3, leg_data] to see the difference.

Now finally add the transit time in nested leg_data from route_data

route_data[, leg_data := .(list(leg_data[[1]][, leg_transit_time_hr := sapply(leg_num, function(x) {route_data[[route_num, 2+x, with = FALSE]]})])), by = list(row.names(route_data))]

What did we do here?

We just looped in row number leg_num of leg_data via sapply by passing it as vector and utilized the row number route_num of route_data to identify right column of transit time to extract from the route_data.

Why did we place double [[]] on the [[route_num, 2+x, with = FALSE]]?

Double braces ensure it returns value as vector not as data.table

And, finally, let’s take a look into the nested data.table leg_data of 3rd row of route_data

route_data[3, leg_data[[1]]]
        leg             leg_num       leg_transit_time_hr
1:  Seattle>Hyderabad       1               18.00
2: Hyderabad>NewDelhi       2                2.25
3:     NewDelhi>Patna       3                1.75
4:     Patna>Motihari       4                4.00

Let’s see how 2nd row nested table looks like:

         leg            leg_num       leg_transit_time_hr
1:  Seattle>NewDelhi       1                  18
2: NewDelhi>Motihari       2                  18

Leave a Comment