Set hadoop system user for client embedded in Java webapp

Finally I stumbled on the constant static final String HADOOP_USER_NAME = "HADOOP_USER_NAME"; in the UserGroupInformation class. Setting a variable of that name, either as an environment variable, as a Java system property on startup (using -DHADOOP_USER_NAME=hduser), or programmatically with System.setProperty("HADOOP_USER_NAME", "hduser");, makes Hadoop use whatever username you want for connecting to the remote Hadoop cluster.
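A minimal sketch of the programmatic variant follows. The class name HadoopUserExample and the helper configureUser are hypothetical, and the constant is redeclared locally so the snippet runs without Hadoop on the classpath. It assumes simple (non-Kerberos) authentication, that the environment variable is consulted before the system property, and that the property is set before the first Hadoop client call, since the login user is typically resolved once and then cached.

```java
public class HadoopUserExample {
    // Same string value as the UserGroupInformation constant quoted above,
    // redeclared here so the sketch has no Hadoop dependency.
    static final String HADOOP_USER_NAME = "HADOOP_USER_NAME";

    /**
     * Hypothetical helper: returns the user name from the environment if one
     * is set there, otherwise sets the system property to fallbackUser and
     * returns that value.
     */
    static String configureUser(String fallbackUser) {
        String fromEnv = System.getenv(HADOOP_USER_NAME);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            // Assumption: an environment variable wins over the property.
            return fromEnv;
        }
        System.setProperty(HADOOP_USER_NAME, fallbackUser);
        return System.getProperty(HADOOP_USER_NAME);
    }

    public static void main(String[] args) {
        // Call this early in webapp startup, before any FileSystem or
        // UserGroupInformation access.
        System.out.println("Hadoop user: " + configureUser("hduser"));
    }
}
```

In a webapp, a call like this would probably belong in a startup hook such as a ServletContextListener, so it runs before any code touches the cluster.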

Spark spark-submit --jars arguments wants comma list, how to declare a directory of jars?

This way it worked easily, instead of specifying each jar with its version separately:

    #!/bin/sh
    # build all other dependent jars in OTHER_JARS
    JARS=`find ../lib -name '*.jar'`
    OTHER_JARS=""
    for eachjarinlib in $JARS ; do
      if [ "$eachjarinlib" != "APPLICATIONJARTOBEADDEDSEPERATELY.JAR" ]; then
        OTHER_JARS=$eachjarinlib,$OTHER_JARS
      fi
    done
    echo ---final list of jars are : $OTHER_JARS
    echo $CLASSPATH
    spark-submit … Read more
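The same idea, collecting every jar under a directory into one comma-separated string while leaving out the application jar, can be sketched in Java. The class name JarList, the method commaSeparatedJars, and its parameters are hypothetical. Unlike the script above, this version compares file names only (not full paths) against the excluded jar, and it sorts the result so the output is stable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JarList {
    /**
     * Walks libDir recursively (like `find`) and joins the paths of all
     * *.jar files with commas, skipping the file named excludedFileName.
     */
    static String commaSeparatedJars(Path libDir, String excludedFileName) throws IOException {
        try (Stream<Path> entries = Files.walk(libDir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .filter(p -> !p.getFileName().toString().equals(excludedFileName))
                .map(Path::toString)
                .sorted()
                .collect(Collectors.joining(","));
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults mirror the script: look in ../lib, exclude nothing.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "../lib");
        String excluded = args.length > 1 ? args[1] : "";
        System.out.println(commaSeparatedJars(libDir, excluded));
    }
}
```

The printed string has no trailing comma and can be passed as the value of the --jars option.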

MPI: blocking vs non-blocking

Blocking communication is done using MPI_Send() and MPI_Recv(). These functions do not return (i.e., they block) until the communication is finished. Simplifying somewhat, this means that the buffer passed to MPI_Send() can be reused, either because MPI saved it somewhere, or because it has been received by the destination. Similarly, MPI_Recv() returns when the receive … Read more

Scaling solutions for MySQL (Replication, Clustering)

I’ve been doing A LOT of reading on the available options. I also got my hands on High Performance MySQL 2nd edition, which I highly recommend. This is what I’ve managed to piece together:

Clustering

Clustering in the general sense is distributing load across many servers that appear to an outside application as one server. … Read more