How to suppress info and success messages in sbt?

sbt 1.x and sbt 0.13.13+: use -warn or -error. See "Fixes with compatibility implications" in the sbt 0.13.13 release notes: it is strongly encouraged to migrate to the single-hyphen options -error, -warn, -info, and -debug. sbt 0.13.1: to disable info messages, run sbt with the --warn or --error command line option. To disable [success] messages set showSuccess to … Read more
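Both knobs can also be set permanently in the build instead of on the command line. A minimal sketch of a build.sbt using the standard logLevel and showSuccess setting keys (both exist in sbt 0.13+ and 1.x):

// build.sbt
// Show only warnings and errors, suppressing [info] output
logLevel := Level.Warn

// Suppress the [success] line printed after each task
showSuccess := false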

Spark UDF with varargs

UDFs don't support varargs* but you can pass an arbitrary number of columns wrapped using an array function: import org.apache.spark.sql.functions.{udf, array, lit} val myConcatFunc = (xs: Seq[Any], sep: String) => xs.filter(_ != null).mkString(sep) val myConcat = udf(myConcatFunc) An example usage: val df = sc.parallelize(Seq( (null, "a", "b", "c"), ("d", null, null, "e") )).toDF("x1", "x2", "x3", … Read more
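Put together as a self-contained sketch (assuming a modern SparkSession entry point, and narrowing the UDF input to Seq[String] so the input schema is unambiguous; the myConcat names follow the excerpt above):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, lit, udf}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// The UDF takes one array column plus a separator, not varargs
val myConcatFunc = (xs: Seq[String], sep: String) => xs.filter(_ != null).mkString(sep)
val myConcat = udf(myConcatFunc)

val df = Seq(
  (null, "a", "b", "c"),
  ("d", null, null, "e")
).toDF("x1", "x2", "x3", "x4")

// array(...) wraps an arbitrary number of columns into a single column
val allCols = array(df.columns.map(col): _*)
df.select(myConcat(allCols, lit("-")).alias("concatenated")).show()
// Nulls are dropped before joining, giving "a-b-c" and "d-e"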

Apache Spark, add a "CASE WHEN … ELSE …" calculated column to an existing DataFrame

This is supported in the upcoming Spark 1.4.0 release (due out in the next couple of days) via the when/otherwise syntax: // Create the dataframe val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color") // Use when/otherwise syntax val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0)) If you are using Spark 1.3.0 you can choose to use a … Read more
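As a runnable sketch against current Spark (the version caveats in the answer date from 2015; when/otherwise has been a stable part of the API since 1.4):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color")

// Equivalent to SQL: CASE WHEN color = 'Green' THEN 1 ELSE 0 END
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))
df1.show()
// Red -> 0, Green -> 1, Blue -> 0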

How to compare two dataframes and print the columns that are different in Scala

From the scenario described in the question, it looks like the difference has to be found between columns, not rows. So we need to apply a selective difference, which will give us the columns that have different values, along with those values. Now, to apply the selective difference we … Read more
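The excerpt stops before the implementation, but one way to realize a column-wise selective difference looks like the sketch below: join the two frames on a shared key (a hypothetical id column here) and, for every other column, keep the rows where the two sides disagree. All names in this sketch are illustrative, not the answer's own.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Two hypothetical frames sharing a key column "id"
val left  = Seq((1, "a", 10), (2, "b", 20)).toDF("id", "name", "qty")
val right = Seq((1, "a", 10), (2, "c", 20)).toDF("id", "name", "qty")

val joined = left.as("l").join(right.as("r"), "id")

// For each non-key column, report the rows where the values differ
for (c <- left.columns.filter(_ != "id")) {
  val diffs = joined
    .filter(col(s"l.$c") =!= col(s"r.$c"))
    .select(col("id"), col(s"l.$c").as(s"${c}_left"), col(s"r.$c").as(s"${c}_right"))
  if (!diffs.isEmpty) diffs.show() // here: name differs for id 2 ("b" vs "c")
}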

Implementing yield (yield return) using Scala continuations

Before we introduce continuations we need to build some infrastructure. Below is a trampoline that operates on Iteration objects. An iteration is a computation that can either Yield a new value or be Done. sealed trait Iteration[+R] case class Yield[+R](result: R, next: () => Iteration[R]) extends Iteration[R] case object Done extends Iteration[Nothing] def … Read more
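The excerpt cuts off at the trampoline itself; a minimal sketch of the consuming side (the run driver and the hand-built generator are my names, not the answer's) shows how the next() thunks keep the stack flat:

sealed trait Iteration[+R]
case class Yield[+R](result: R, next: () => Iteration[R]) extends Iteration[R]
case object Done extends Iteration[Nothing]

// Trampoline: repeatedly force the next() thunk, so arbitrarily long
// sequences of yields run in constant stack space
@annotation.tailrec
def run[R](it: Iteration[R])(f: R => Unit): Unit = it match {
  case Yield(result, next) =>
    f(result)
    run(next())(f)
  case Done => ()
}

// What the continuation machinery would build for us, written by hand:
val gen: Iteration[Int] =
  Yield(0, () => Yield(1, () => Yield(2, () => Done)))

run(gen)(println) // prints 0, 1, 2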

Left Anti join in Spark?

You can use the "left anti" join type – either with DataFrame API or with SQL (DataFrame API supports everything that SQL supports, including any join condition you need): DataFrame API: df.as("table1").join( df2.as("table2"), $"table1.name" === $"table2.name" && $"table1.age" === $"table2.howold", "leftanti" ) SQL: sqlContext.sql( """SELECT table1.* FROM table1 | LEFT ANTI JOIN table2 | ON … Read more
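A self-contained sketch of both variants (assuming Spark 2+ with SparkSession and temp views; the column names follow the excerpt):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df  = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
val df2 = Seq(("alice", 30)).toDF("name", "howold")

// DataFrame API: keep the rows of df that have no match in df2
val anti = df.as("table1").join(
  df2.as("table2"),
  $"table1.name" === $"table2.name" && $"table1.age" === $"table2.howold",
  "leftanti"
)
anti.show() // only ("bob", 25) survives

// SQL equivalent
df.createOrReplaceTempView("table1")
df2.createOrReplaceTempView("table2")
spark.sql(
  """SELECT table1.*
    |FROM table1
    |LEFT ANTI JOIN table2
    |ON table1.name = table2.name AND table1.age = table2.howold
    |""".stripMargin).show()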

Accessing the underlying ActorRef of an akka stream Source created by Source.actorRef

You need a Flow: import akka.stream.OverflowStrategy.fail import akka.stream.scaladsl.Source import akka.stream.scaladsl.{Sink, Flow} case class Weather(zip: String, temp: Double, raining: Boolean) val weatherSource = Source.actorRef[Weather](Int.MaxValue, fail) val sunnySource = weatherSource.filter(!_.raining) val ref = Flow[Weather] .to(Sink.ignore) .runWith(sunnySource) ref ! Weather("02139", 32.0, true) Remember this is all experimental and may change!
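Filled out into a runnable sketch, assuming the classic pre-Akka-2.6 Source.actorRef(bufferSize, overflowStrategy) overload used in the answer (Akka 2.6 replaced it with a variant taking completion and failure matchers, and no longer needs an ActorMaterializer); the element sent here has raining = false so it actually survives the filter:

import akka.actor.{ActorRef, ActorSystem}
import akka.stream.ActorMaterializer
import akka.stream.OverflowStrategy.fail
import akka.stream.scaladsl.{Flow, Sink, Source}

case class Weather(zip: String, temp: Double, raining: Boolean)

object WeatherMain extends App {
  implicit val system: ActorSystem = ActorSystem("weather")
  implicit val mat: ActorMaterializer = ActorMaterializer()

  val weatherSource = Source.actorRef[Weather](Int.MaxValue, fail)
  val sunnySource = weatherSource.filter(!_.raining)

  // Running the Sink with the Source materializes the Source's ActorRef
  val ref: ActorRef = Flow[Weather]
    .to(Sink.foreach(w => println(s"sunny: $w")))
    .runWith(sunnySource)

  ref ! Weather("02139", 32.0, raining = false)
}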