Hive Explode / Lateral View multiple arrays

I found a good solution to this problem that does not need any UDF: posexplode.

    SELECT COOKIE, ePRODUCT_ID, eCAT_ID, eQTY
    FROM TABLE
    LATERAL VIEW posexplode(PRODUCT_ID) ePRODUCT_ID AS seqp, ePRODUCT_ID
    LATERAL VIEW posexplode(CAT_ID) eCAT_ID AS seqc, eCAT_ID
    LATERAL VIEW posexplode(QTY) eQTY AS seqq, eQTY
    WHERE seqp = seqc AND seqc = …
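To see why the position columns matter, here is a minimal self-contained sketch. The inline row, the CTE name `t`, and the aliases `p`, `c`, `q`, `product`, `cat`, and `quantity` are all hypothetical, chosen so that table aliases and column aliases do not collide. It assumes a Hive version that supports CTEs, `SELECT` without `FROM`, and `posexplode` (0.13 or later, as far as I know).

```sql
-- Hypothetical inline data: one row holding three parallel arrays of length 2.
WITH t AS (
  SELECT 'c1'              AS cookie,
         array('p1', 'p2') AS product_id,
         array('a', 'b')   AS cat_id,
         array(1, 2)       AS qty
)
SELECT t.cookie, p.product, c.cat, q.quantity
FROM t
-- Each posexplode emits (position, value) pairs for one array.
LATERAL VIEW posexplode(t.product_id) p AS seqp, product
LATERAL VIEW posexplode(t.cat_id)     c AS seqc, cat
LATERAL VIEW posexplode(t.qty)        q AS seqq, quantity
-- Keep only the combinations where all three positions agree.
WHERE p.seqp = c.seqc AND c.seqc = q.seqq;
```

Without the `WHERE` clause the three lateral views would produce every combination of elements (2 x 2 x 2 = 8 rows here). Filtering on equal positions keeps only the rows whose elements sat at the same index in each array.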

Create Table in Hive with one file

There are several possible solutions: 1) Add DISTRIBUTE BY on the partition key at the end of your query. It may be that each reducer handles many partitions and creates a file for each of them; distributing by the partition key can reduce both the number of files and the memory consumption. The hive.exec.reducers.bytes.per.reducer setting defines how much data each reducer will process. …
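A rough sketch of the first suggestion, under stated assumptions: `target_table`, `source_table`, `col1`, `col2`, and the partition column `dt` are hypothetical names, and the reducer size is only an illustrative value, not a recommendation.

```sql
-- Allow the partition value to come from the data (dynamic partitioning).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Illustrative value only: roughly 1 GB of input per reducer.
SET hive.exec.reducers.bytes.per.reducer=1073741824;

-- Hypothetical tables: target_table is partitioned by dt,
-- and source_table carries dt as an ordinary column.
INSERT OVERWRITE TABLE target_table PARTITION (dt)
SELECT col1, col2, dt
FROM source_table
DISTRIBUTE BY dt;  -- all rows with the same dt go to the same reducer
```

Because every row of a given partition lands on a single reducer, that partition is written by one task rather than by many, which is what keeps the file count down. The trade-off is that a very large partition is then processed by one reducer.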

How to set variables in HIVE scripts

You need to use the special hiveconf namespace for variable substitution, e.g.

    hive> set CURRENT_DATE='2012-09-16';
    hive> select * from foo where day >= ${hiveconf:CURRENT_DATE};

Similarly, you can pass the value on the command line:

    % hive -hiveconf CURRENT_DATE='2012-09-16' -f test.hql

Note that there are env and system variables as well, so you can reference ${env:USER}, for example. To see …
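One caveat worth sketching, assuming a POSIX-style shell: quotes typed on the command line are consumed by the shell, so the value may reach Hive unquoted. Since substitution is plain text replacement, a script can quote the reference itself. The contents of `test.hql` below are hypothetical; `foo` and `day` are taken from the example above, and the alias `run_by` is made up for illustration.

```sql
-- test.hql (hypothetical contents)
-- Invoked as: hive -hiveconf CURRENT_DATE=2012-09-16 -f test.hql

-- The variable is replaced textually before the query is parsed,
-- so wrapping the reference in quotes yields a string literal.
SELECT * FROM foo WHERE day >= '${hiveconf:CURRENT_DATE}';

-- Environment variables are referenced the same way through the env namespace.
SELECT '${env:USER}' AS run_by;
```

With the quotes placed in the script, the same file behaves consistently whether the value is set inside the Hive session or passed from the shell.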