Download large data for Hadoop [closed]

I would suggest you downloading million songs Dataset from the following website:

http://labrosa.ee.columbia.edu/millionsong/

The best thing with Millions Songs Dataset is that you can download 1GB (about 10000 songs), 10GB, 50GB or about 300GB dataset to your Hadoop cluster and do whatever test you would want. I love using it and learn a lot using this data set.

To start with you can download dataset start with any one letter from A-Z, which will be range from 1GB to 20GB.. you can also use Infochimp site:

http://www.infochimps.com/collections/million-songs

In one of my following blog I showed how to download 1GB dataset and run Pig scripts:

http://blogs.msdn.com/b/avkashchauhan/archive/2012/04/12/processing-million-songs-dataset-with-pig-scripts-on-apache-hadoop-on-windows-azure.aspx

Leave a Comment