databricks – Make Me Engineer

Spark dataframe save in single file on hdfs location [duplicate]

September 3, 2022 by Tarik

It’s not possible using standard spark library, but you can use Hadoop API for managing filesystem – save output in temporary directory and then move file to the requested path. For example (in pyspark): df.coalesce(1) \ .write.format(“com.databricks.spark.csv”) \ .option(“header”, “true”) \ .save(“mydata.csv-temp”) from py4j.java_gateway import java_import java_import(spark._jvm, ‘org.apache.hadoop.fs.Path’) fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration()) file = fs.globStatus(sc._jvm.Path(‘mydata.csv-temp/part*’))[0].getPath().getName() fs.rename(sc._jvm.Path(‘mydata.csv-temp/’ … Read more

How to install a library on a databricks cluster using some command in the notebook?

databricks configure using cmd and R

Spark dataframe save in single file on hdfs location [duplicate]