Subset data frame based on number of rows per group

Using the dplyr package:

library(dplyr)

# Keep only rows that belong to a group with fewer than 4 rows
df %>%
  group_by(name) %>%
  filter(n() < 4)

# A tibble: 5 x 2
# Groups:   name [2]
  name      x
  <fct> <int>
1 a         1
2 a         2
3 a         3
4 b         4
5 b         5

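n() returns the number of rows in the current group. After group_by(name), the condition n() < 4 is evaluated once per group, so filter() keeps every row of a group that has fewer than 4 rows and drops larger groups entirely. The result is still grouped by name (see the "Groups" line in the output); add ungroup() if later steps should not operate per group.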
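For a version you can run end to end, here is a minimal sketch. The question's df is not shown in this answer, so the data frame below is an assumption: hypothetical values chosen so that groups a and b match the output above, plus a group c with 4 rows that gets dropped. The names small_groups and small_groups_base are illustrative only. The last statement shows one possible base R equivalent using ave(), in case dplyr is not available.

```r
library(dplyr)

# Hypothetical input (an assumption, not the asker's data):
# group "a" has 3 rows, "b" has 2 rows, "c" has 4 rows.
df <- data.frame(
  name = factor(c("a", "a", "a", "b", "b", "c", "c", "c", "c")),
  x    = 1:9
)

# dplyr: n() is the size of the current group, so whole groups are kept or dropped.
small_groups <- df %>%
  group_by(name) %>%
  filter(n() < 4) %>%
  ungroup()  # drop the grouping so later steps work on the whole table

# Base R alternative: ave() returns each row's group size, aligned with the rows of df.
group_size <- ave(seq_along(df$name), df$name, FUN = length)
small_groups_base <- df[group_size < 4, ]
```

Both results should contain only the rows of groups a and b. The threshold is the literal 4 in each condition, so changing it in one place is enough to keep groups of a different maximum size.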
