Easy idiomatic way to define Ordering for a simple case class

My personal favorite method is to make use of the provided implicit ordering for Tuples, as it is clear, concise, and correct:

    case class A(tag: String, load: Int) extends Ordered[A] {
      // Required as of Scala 2.11 for reasons unknown – the companion to Ordered
      // should already be in implicit scope
      import scala.math.Ordered.orderingToOrdered
      def …
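The tuple-based idea can be sketched as follows, assuming the goal is to sort by `tag` and then by `load`. This is a variant, not the excerpt's exact code: it defines an `Ordering[A]` in the companion object with `Ordering.by` instead of mixing in `Ordered[A]`. The names `byTagThenLoad` and `OrderingDemo` are illustrative.

```scala
// Sketch: derive an Ordering for A from the implicit Ordering for tuples.
case class A(tag: String, load: Int)

object A {
  // Ordering.by maps each A to a (String, Int) tuple and reuses the
  // standard lexicographic tuple ordering: tag first, then load.
  implicit val byTagThenLoad: Ordering[A] =
    Ordering.by((a: A) => (a.tag, a.load))
}

object OrderingDemo {
  val items: List[A] = List(A("b", 1), A("a", 2), A("a", 1))

  // The implicit in A's companion is found automatically by sorted.
  val sorted: List[A] = items.sorted
  // sorted == List(A("a", 1), A("a", 2), A("b", 1))
}
```

Placing the implicit in the companion object keeps it in implicit scope wherever `A` is used, so callers need no extra import.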

Scala case class inheritance

My preferred way of avoiding case class inheritance without code duplication is somewhat obvious: create a common (abstract) base class:

    abstract class Person {
      def name: String
      def age: Int
      // address and other properties
      // methods (ideally only accessors since it is a case class)
    }
    case class Employer(val name: String, val age: Int, …
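A minimal, self-contained sketch of this pattern is shown below. The excerpt is cut off before the remaining constructor parameters, so the `taxNo` and `salary` fields, the `Employee` class, and the `PersonDemo` helper are hypothetical names chosen for illustration.

```scala
// Common abstract base: declares the shared accessors once.
abstract class Person {
  def name: String
  def age: Int
}

// Each case class implements the abstract accessors through its own
// constructor parameters (case class parameters are already vals).
// taxNo and salary are hypothetical extra fields.
case class Employer(name: String, age: Int, taxNo: Int) extends Person
case class Employee(name: String, age: Int, salary: Int) extends Person

object PersonDemo {
  // Code that only needs the shared properties can accept a Person.
  def greeting(p: Person): String = s"${p.name} (${p.age})"

  // Pattern matching still works on the concrete case classes.
  def role(p: Person): String = p match {
    case Employer(_, _, _) => "employer"
    case Employee(_, _, _) => "employee"
    case _                 => "person"
  }
}
```

No case class extends another case class here, so the generated `equals`, `hashCode`, and `copy` methods stay well-defined for each type.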

How to define schema for custom type in Spark SQL?

Spark 2.0.0+: UserDefinedType was made private in Spark 2.0.0 and, as of now, has no Dataset-friendly replacement. See SPARK-14155 (Hide UserDefinedType in Spark 2.0). Most of the time, a statically typed Dataset can serve as a replacement. There is a pending Jira, SPARK-7768, to make the UDT API public again with target version 2.4. See …

What is the difference between Scala’s case class and class?

Case classes can be seen as plain, immutable data-holding objects that should depend exclusively on their constructor arguments. This functional concept allows us to use a compact initialization syntax (Node(1, Leaf(2), None)), decompose them using pattern matching, and have equality comparisons implicitly defined. In combination with inheritance, case classes are used to mimic algebraic data types. …
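The three properties listed above can be illustrated with a small sketch. The `Tree`, `Node`, and `Leaf` definitions are assumptions modeled on the `Node(1, Leaf(2), None)` expression in the text; `PlainLeaf` and `CaseClassDemo` are hypothetical names added for contrast.

```scala
// A small algebraic data type built from case classes.
sealed trait Tree
final case class Leaf(value: Int) extends Tree
final case class Node(value: Int, left: Tree, right: Option[Tree]) extends Tree

// An ordinary class with the same single field, for comparison.
class PlainLeaf(val value: Int)

object CaseClassDemo {
  // Compact initialization: no `new` needed for case classes.
  val tree: Tree = Node(1, Leaf(2), None)

  // Decomposition via pattern matching on constructor arguments.
  def sum(t: Tree): Int = t match {
    case Leaf(v)       => v
    case Node(v, l, r) => v + sum(l) + r.map(sum).getOrElse(0)
  }

  // Case classes compare by structure; plain classes by reference.
  val caseEqual: Boolean  = Leaf(2) == Leaf(2)                   // true
  val plainEqual: Boolean = new PlainLeaf(2) == new PlainLeaf(2) // false
}
```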

Case class equality in Apache Spark

This is a known issue with the Spark REPL. You can find more details in SPARK-2620. It affects multiple operations in the Spark REPL, including most transformations on PairwiseRDDs. For example:

    case class Foo(x: Int)
    val foos = Seq(Foo(1), Foo(1), Foo(2), Foo(2))
    foos.distinct.size // Int = 2
    val foosRdd = sc.parallelize(foos, 4)
    foosRdd.distinct.count // Long …