Create Table in Hive with one file

There are many possible solutions: 1) Add DISTRIBUTE BY on the partition key at the end of your query. Possibly each reducer handles many partitions and creates a separate file for every partition it writes. Distributing by the partition key sends all rows of a given partition to the same reducer, which may reduce both the number of files and memory consumption. The hive.exec.reducers.bytes.per.reducer setting defines how much data each reducer processes. …
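A minimal sketch of option 1, assuming a dynamic-partition insert. The table and column names (target_table, source_table, col1, col2, part_col) are hypothetical, and the bytes-per-reducer value is only an illustration to tune for your data:

```sql
-- Hypothetical names: target_table, source_table, col1, col2, part_col.
-- Dynamic partitioning must be enabled for PARTITION (part_col) without a value.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Illustrative value: about 256 MB of input per reducer.
set hive.exec.reducers.bytes.per.reducer=256000000;

-- The dynamic partition column must be the last column in the SELECT list.
-- DISTRIBUTE BY part_col routes every row of a partition to a single reducer,
-- so each partition tends to be written as one file.
INSERT OVERWRITE TABLE target_table PARTITION (part_col)
SELECT col1, col2, part_col
FROM source_table
DISTRIBUTE BY part_col;
```

The trade-off is that a very large partition is then written by a single reducer, so this works best when partitions are of similar size.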

Hive: select count(*) with a non-null filter returns a higher value than select count(*)

Most probably your query without the WHERE clause is answered from table statistics (which may be stale) rather than by scanning the data, because this parameter is set: set hive.compute.query.using.stats=true; Try setting it to false and running the query again. Alternatively, recompute the statistics on the table; see the ANALYZE TABLE syntax. It is also possible to gather statistics automatically during INSERT OVERWRITE: set hive.stats.autogather=true;
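The three remedies above can be sketched as the following Hive session. my_table is a hypothetical table name; for a partitioned table, ANALYZE TABLE may need a PARTITION (...) clause depending on your Hive version:

```sql
-- my_table is a hypothetical table name.

-- 1) Make count(*) scan the data instead of reading metastore statistics.
set hive.compute.query.using.stats=false;
SELECT count(*) FROM my_table;

-- 2) Or refresh the stored statistics so metadata-only answers are accurate again.
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- 3) Keep statistics up to date on future INSERT OVERWRITE statements.
set hive.stats.autogather=true;
```

Option 1 costs a full scan on every count, while options 2 and 3 keep the fast metadata path but rely on the statistics being maintained.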