Filter Spark DataFrame based on another DataFrame that specifies denylist criteria

You’ll need to use a left_anti join in this case.

The left anti join is the opposite of a left semi join.

It filters out data from the right table in the left table according to a given key :

largeDataFrame
   .join(smallDataFrame, Seq("some_identifier"),"left_anti")
   .show
// +---------------+----------+
// |some_identifier|first_name|
// +---------------+----------+
// |            222|      mary|
// |            111|       bob|
// +---------------+----------+

Leave a Comment