By using the merge function and its optional parameters:
Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you are matching on only the fields you want. You can also use the by.x and by.y parameters if the matching variables have different names in the two data frames.
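A minimal sketch of the inner join, using small hypothetical data frames (the values are made up for illustration). Because CustomerId is the only column name the two frames share here, the implicit and explicit calls return the same rows; the explicit form simply makes the key visible.

```r
# Hypothetical sample data: six customers, three of whom have a State record
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2L, 4L, 6L),
                  State = c("Alabama", "Alabama", "Ohio"))

# Implicit key: merge() matches on every column name the frames share
inner_implicit <- merge(df1, df2)

# Explicit key: matching is restricted to CustomerId
inner <- merge(df1, df2, by = "CustomerId")

print(inner)
```

Customers 1, 3 and 5 have no row in df2, so an inner join drops them.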
Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
Left outer join: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)
Right outer join: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)
Cross join: merge(x = df1, y = df2, by = NULL)
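To see how these variants differ, here is a sketch on hypothetical data in which df2 contains a customer (7) that df1 lacks, so the full, left and right outer joins all return different row counts. Rows with no match on the other side get NA in that side's columns, and the cross join pairs every row of df1 with every row of df2.

```r
# Hypothetical data: customer 7 exists only in df2; customers 1, 3, 5, 6 only in df1
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2L, 4L, 7L),
                  State = c("Alabama", "Alabama", "Ohio"))

full_outer  <- merge(x = df1, y = df2, by = "CustomerId", all = TRUE)    # customers 1..7
left_outer  <- merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)  # all of df1
right_outer <- merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)  # all of df2
cross       <- merge(x = df1, y = df2, by = NULL)                        # 6 x 3 row pairs

sapply(list(full = full_outer, left = left_outer,
            right = right_outer, cross = cross), nrow)
```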
Just as with the inner join, you would probably want to pass "CustomerId" to R explicitly as the matching variable for the outer, left, and right joins (the cross join has no matching variable). I think it's almost always best to state the identifiers you want to merge on explicitly; it's safer if the input data.frames change unexpectedly, and it's easier to read later on.
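The risk of relying on the implicit key is easiest to see when both frames happen to share a column name that is not meant to be a key. In this hypothetical example, both frames have a Date column with different meanings (order date versus signup date), so the implicit merge silently matches on it too.

```r
# Hypothetical data: "Date" means something different in each frame
orders <- data.frame(CustomerId = c(1L, 2L, 3L),
                     Date = c("2021-01-05", "2021-02-10", "2021-03-15"))
customers <- data.frame(CustomerId = c(1L, 2L, 3L),
                        Date = c("2020-06-01", "2020-07-01", "2020-08-01"))

# Implicit key: matches on CustomerId AND Date, so no rows agree
implicit <- merge(orders, customers)

# Explicit key: matches on CustomerId only; the clashing column gets suffixes
explicit <- merge(orders, customers, by = "CustomerId")

nrow(implicit)   # empty result
names(explicit)  # Date.x and Date.y kept side by side
```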
You can merge on multiple columns by giving by a vector, e.g., by = c("CustomerId", "OrderId").
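A sketch of a two-column key using hypothetical orders and shipments frames (the data are invented; OrderId is the column name used in the text). A row matches only when both CustomerId and OrderId agree; matching on CustomerId alone pairs every order with every shipment for the same customer.

```r
# Hypothetical data keyed by (CustomerId, OrderId)
orders <- data.frame(CustomerId = c(1L, 1L, 2L),
                     OrderId = c(10L, 11L, 10L),
                     Amount = c(25, 40, 15))
shipments <- data.frame(CustomerId = c(1L, 2L, 2L),
                        OrderId = c(11L, 10L, 12L),
                        Carrier = c("UPS", "DHL", "FedEx"))

# Both key columns must agree: only (1, 11) and (2, 10) match
by_both <- merge(orders, shipments, by = c("CustomerId", "OrderId"))

# For contrast, a single-column key produces every same-customer pairing
by_customer_only <- merge(orders, shipments, by = "CustomerId")

nrow(by_both)
nrow(by_customer_only)
```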
If the column names to merge on are not the same, you can specify, e.g., by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2", where CustomerId_in_df1 is the name of the column in the first data frame and CustomerId_in_df2 is the name of the column in the second data frame. (These can also be vectors if you need to merge on multiple columns.)
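A sketch of by.x / by.y using the placeholder column names from the text, with hypothetical data. In the result, the key column keeps the name it had in the first (x) data frame.

```r
# Hypothetical frames whose key columns have different names
df1 <- data.frame(CustomerId_in_df1 = 1:4,
                  Product = c("Toaster", "Toaster", "Radio", "Radio"))
df2 <- data.frame(CustomerId_in_df2 = c(2L, 4L),
                  State = c("Alabama", "Ohio"))

# Match df1$CustomerId_in_df1 against df2$CustomerId_in_df2
matched <- merge(df1, df2,
                 by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2")

names(matched)
```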