Difference between df.repartition and DataFrameWriter partitionBy?

Watch out: I believe the accepted answer is not quite right! I’m glad you asked this question, because the behavior of these similarly named functions differs in important and unexpected ways that are not well documented in the official Spark documentation. The first part of the accepted answer is correct: calling df.repartition(k, COL) will create a …
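To make the distinction concrete, here is a toy model in plain Python. It is not Spark code, just a sketch of the two mechanisms as I understand them. df.repartition with a column assigns each row to one of k in-memory partitions by hashing the column value. df.write.partitionBy(COL) does not shuffle anything: each in-memory partition writes its own file into the COL=value directory for every distinct value it holds. The function names, the sample rows, and the use of crc32 (standing in for Spark's own hash function) are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def hash_repartition(rows, key, k):
    """Toy model of repartitioning by a column into k partitions.

    Each row goes to partition hash(value) % k. crc32 is only a
    deterministic stand-in here; Spark uses its own hash function.
    """
    partitions = [[] for _ in range(k)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % k
        partitions[idx].append(row)
    return partitions


def files_written_by_partition_by(partitions, key):
    """Toy model of df.write.partitionBy(key).

    Every in-memory partition writes one file into the directory of
    each distinct key value it contains. Returns files per directory.
    """
    files = defaultdict(int)
    for part in partitions:
        for value in {row[key] for row in part}:
            files[f"{key}={value}"] += 1
    return dict(files)


# Made-up sample data: six rows, three distinct countries.
countries = ["US", "DE", "FR", "US", "DE", "US"]
rows = [{"country": c, "id": i} for i, c in enumerate(countries)]

# Rows spread over 3 partitions without regard to the key.
round_robin = [rows[i::3] for i in range(3)]
print(files_written_by_partition_by(round_robin, "country"))

# Rows hashed by the key first: each country lives in exactly one partition.
hashed = hash_repartition(rows, "country", 3)
print(files_written_by_partition_by(hashed, "country"))
```

In this model, writing the round-robin partitions puts two files in the country=US directory, because US rows sit in two different in-memory partitions. After hashing by country, every directory gets exactly one file. That is the reason the two calls are often combined in practice.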