Why do I get “Unable to find encoder for type stored in a Dataset” when creating a Dataset of a custom case class?

Spark Datasets require an Encoder for the data type which is about to be stored. For common types (atomic types, product types) a number of predefined encoders are available, but you first have to import them from SparkSession.implicits to make them work:

import org.apache.spark.sql.SparkSession

val sparkSession: SparkSession = ???
import sparkSession.implicits._  // brings the predefined implicit Encoders into scope
val dataset = sparkSession.createDataset(dataList)
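For reference, here is a minimal self-contained sketch; the SimpleTuple fields and the sample data are assumptions for illustration:

import org.apache.spark.sql.SparkSession

case class SimpleTuple(id: Int, desc: String)

object EncoderExample {
  def main(args: Array[String]): Unit = {
    // The case class is defined at the top level, outside this object
    // (see the last bullet under "Further reading" below)
    val sparkSession = SparkSession.builder()
      .master("local[*]")
      .appName("encoder-example")
      .getOrCreate()
    import sparkSession.implicits._

    val dataList = Seq(SimpleTuple(1, "a"), SimpleTuple(2, "b"))
    val dataset = sparkSession.createDataset(dataList)
    dataset.show()

    sparkSession.stop()
  }
}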

Alternatively you can directly provide an explicit

import org.apache.spark.sql.{Encoder, Encoders}

val dataset = sparkSession.createDataset(dataList)(Encoders.product[SimpleTuple])

or implicit

implicit val enc: Encoder[SimpleTuple] = Encoders.product[SimpleTuple]
val dataset = sparkSession.createDataset(dataList)

Encoder for the stored type.

Note that the Encoders object also provides a number of predefined Encoders for atomic types, and Encoders for complex ones can be derived with ExpressionEncoder.
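For illustration, a sketch of deriving an Encoder for a nested product type with ExpressionEncoder; the Person and Address classes are hypothetical, and note that ExpressionEncoder lives in an internal (catalyst) package:

import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// Hypothetical nested case classes
case class Address(city: String, zip: String)
case class Person(name: String, address: Address)

// Derive an Encoder for the nested product type from its compile-time TypeTag
implicit val personEncoder: Encoder[Person] = ExpressionEncoder[Person]()

In practice Encoders.product[Person] would work here as well; reaching for ExpressionEncoder directly is mainly useful when you need the derivation machinery itself.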

Further reading:

  • For custom objects which are not covered by built-in encoders, see How to store custom objects in Dataset? (a Kryo-based sketch follows this list)
  • For Row objects you have to provide the Encoder explicitly, as shown in Encoder error while trying to map dataframe row to updated row
  • The case class must be defined outside of the object containing the main method; see https://stackoverflow.com/a/34715827/3535853
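As a sketch of the custom-object case from the first bullet, one option is to fall back to a binary Kryo encoder; the Point class here is hypothetical:

import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical class that is not a Product, so no built-in encoder applies
class Point(val x: Double, val y: Double)

implicit val pointEncoder: Encoder[Point] = Encoders.kryo[Point]

val sparkSession: SparkSession = ???
val points: Dataset[Point] = sparkSession.createDataset(Seq(new Point(0.0, 1.0)))

Note that a Kryo-encoded Dataset is stored as a single binary column, so per-field operations and Catalyst optimizations on its contents are not available.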
