Why do I get “Unable to find encoder for type stored in a Dataset” when creating a Dataset of a custom case class?

Spark Datasets require an Encoder for the type of data about to be stored. For common types (atomics, product types) a number of predefined encoders are available, but you have to import them first from SparkSession.implicits to make it work:

```scala
val sparkSession: SparkSession = ???
import sparkSession.implicits._

val dataset = sparkSession.createDataset(dataList)
```

Alternatively you can provide … Read more
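A minimal runnable sketch of the common fix, assuming a toy Person case class and a local SparkSession (the Person class, dataList contents, and object name are illustrative). Note that the case class must be defined at the top level, not inside another class or method, or Spark cannot derive its product encoder:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Defined at top level so Spark can derive a product encoder for it.
case class Person(name: String, age: Int)

object EncoderExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("encoder-example")
      .master("local[*]")
      .getOrCreate()

    // Brings the implicit Encoder[Person] into scope.
    import spark.implicits._

    val dataList = Seq(Person("Alice", 29), Person("Bob", 31))
    val dataset: Dataset[Person] = spark.createDataset(dataList)
    dataset.show()

    spark.stop()
  }
}
```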

Encoder error while trying to map a DataFrame row to an updated row

There is nothing unexpected here. You’re trying to use code that was written for Spark 1.x and is no longer supported in Spark 2.0:

- in 1.x, DataFrame.map is ((Row) ⇒ T)(ClassTag[T]) ⇒ RDD[T]
- in 2.x, Dataset[Row].map is ((Row) ⇒ T)(Encoder[T]) ⇒ Dataset[T]

To be honest, it didn’t make much sense in 1.x either. Independent … Read more
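A minimal sketch of the 2.x signature in practice, assuming a toy single-column DataFrame (the column name and the doubling logic are illustrative): map on a Dataset[Row] now needs an implicit Encoder for the result type, which spark.implicits._ supplies for common types.

```scala
import org.apache.spark.sql.{Dataset, Row, SparkSession}

object MapExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("map-example")
      .master("local[*]")
      .getOrCreate()

    // Provides the implicit Encoder[Int] required by Dataset.map in 2.x.
    import spark.implicits._

    val df = Seq(1, 2, 3).toDF("value")

    // In 2.x this is (Row => T)(implicit Encoder[T]) => Dataset[T];
    // without an encoder in scope the line below does not compile.
    val doubled: Dataset[Int] = df.map((row: Row) => row.getInt(0) * 2)
    doubled.show()

    spark.stop()
  }
}
```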

How to store custom objects in a Dataset?

Update: This answer is still valid and informative, although things are now better since 2.2/2.3, which add built-in encoder support for Set, Seq, Map, Date, Timestamp, and BigDecimal. If you stick to building types out of case classes and the usual Scala types, you should be fine with just the implicits in SQLImplicits. Unfortunately, virtually … Read more
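For types that fall outside the built-in support, one commonly shown workaround is to register a generic Kryo-based binary encoder explicitly. A sketch, assuming a plain non-case class MyObj as a stand-in for your custom type (the class and object names are illustrative):

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// A plain class (not a case class), so Spark cannot derive a product encoder.
class MyObj(val i: Int)

object CustomObjectExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("custom-object-example")
      .master("local[*]")
      .getOrCreate()

    // Fall back to a Kryo-serialized encoder for the custom type.
    implicit val myObjEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]

    val ds: Dataset[MyObj] = spark.createDataset(Seq(new MyObj(1), new MyObj(2)))

    // The Kryo encoder stores each object as a single binary column,
    // so you lose column-level access to its fields.
    ds.printSchema()

    spark.stop()
  }
}
```

The trade-off is visible in the printed schema: the Dataset has one opaque binary column rather than one column per field, which rules out most SQL-style operations on the object's contents.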