For everyone hitting this in PySpark: it happened to me even after renaming the columns. One way I got it to work, after some iterations, is to read the file once, strip the spaces out of the column names, and then re-read the file with the cleaned schema:
file = "/opt/myfile.parquet"
df = spark.read.parquet(file)
for c in df.columns:
df = df.withColumnRenamed(c, c.replace(" ", ""))
df = spark.read.schema(df.schema).parquet(file)
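If your column names contain any of the other characters Parquet rejects (Spark's error message lists " ,;{}()\n\t="), the same pattern generalizes. Here is a minimal sketch of that idea; the helper name read_parquet_clean and the regex are mine, not part of the original workaround:

import re
from pyspark.sql import SparkSession

def read_parquet_clean(spark, path):
    # Assumption: the forbidden set is the one Spark's error message lists
    bad = re.compile(r"[ ,;{}()\n\t=]")
    df = spark.read.parquet(path)
    for c in df.columns:
        df = df.withColumnRenamed(c, bad.sub("", c))
    # Re-read with the sanitized schema, as in the snippet above
    return spark.read.schema(df.schema).parquet(path)

spark = SparkSession.builder.getOrCreate()
df = read_parquet_clean(spark, "/opt/myfile.parquet")

Note this reads the file's footer twice, but since the second read drives all the actual work, the overhead is negligible.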