How to calculate the size of a DataFrame in bytes in Spark?

Using `spark.sessionState.executePlan(df.queryExecution.logical).optimizedPlan.stats(spark.sessionState.conf).sizeInBytes` we can get the size of the actual DataFrame once it’s loaded into memory. Check the code below.

```scala
scala> val df = spark.read.format("orc").load("/tmp/srinivas/")
df: org.apache.spark.sql.DataFrame = [channelGrouping: string, clientId: string ... 75 more fields]

scala> import org.apache.commons.io.FileUtils
import org.apache.commons.io.FileUtils

scala> val bytes = spark.sessionState.executePlan(df.queryExecution.logical).optimizedPlan.stats(spark.sessionState.conf).sizeInBytes
bytes: BigInt = 763275709

scala> FileUtils.byteCountToDisplaySize(bytes.toLong)
res5: String = 727 MB
```

… Read more
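Note that on newer Spark versions (roughly 2.3 onward) `stats` no longer takes a `SQLConf` argument, so the call is slightly shorter. A minimal sketch under that assumption, reusing the ORC path from the excerpt and assuming an active `SparkSession` named `spark`:

```scala
import org.apache.commons.io.FileUtils

// Assumes an active SparkSession `spark`; the path is the one from the excerpt.
val df = spark.read.format("orc").load("/tmp/srinivas/")

// Spark 2.3+: stats is parameterless; the SQLConf comes from the session state.
val bytes = spark.sessionState
  .executePlan(df.queryExecution.logical)
  .optimizedPlan
  .stats
  .sizeInBytes

// Pretty-print the raw byte count, e.g. "727 MB".
println(FileUtils.byteCountToDisplaySize(bytes.toLong))
```

Keep in mind this is the optimizer’s size estimate for the plan, not a byte-exact measurement of the cached data.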

Size of an array in C

C arrays don’t store their own sizes anywhere, so `sizeof` only works the way you expect if the size is known at compile time. The compiler treats `malloc()` like any other function, so `sizeof` can’t tell that `arr` points to the first element of an array, let alone how big that array is. If … Read more
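A short illustration of the difference, using a hypothetical stack array `a` and heap buffer `p` (not from the original answer):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int a[10];                        // size known at compile time
    int *p = malloc(10 * sizeof *p); // size known only at run time

    // sizeof a is the whole array: 10 * sizeof(int).
    printf("sizeof a = %zu\n", sizeof a);

    // sizeof p is just the pointer itself, NOT the 10 ints it points to.
    printf("sizeof p = %zu\n", sizeof p);

    // The element-count idiom therefore only works for real arrays:
    printf("elements in a = %zu\n", sizeof a / sizeof a[0]);

    free(p);
    return 0;
}
```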

Determine the size of an InputStream

This is a REALLY old thread, but it was still the first thing to pop up when I googled the issue, so I just wanted to add this:

```java
InputStream inputStream = conn.getInputStream();
int length = inputStream.available();
```

Worked for me, and it is MUCH simpler than the other answers here. Warning: this solution does not provide reliable results … Read more
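The warning matters: `available()` only reports an estimate of how many bytes can be read without blocking, not the total stream length. A reliable count requires draining the stream; a minimal sketch, assuming you can afford to consume it fully:

```java
import java.io.IOException;
import java.io.InputStream;

public final class StreamLength {
    // Counts the total number of bytes by reading the stream to the end.
    // Note: this consumes the stream; re-open it if you also need the data.
    static long lengthOf(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }
}
```

If `conn` is a `java.net.URLConnection`, `conn.getContentLengthLong()` reads the Content-Length header without consuming the stream, though it only helps when the server actually sends that header.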