I want to run an experiment on join algorithms in Hadoop MapReduce, and one of the factors I am varying is the number of records.
Could you please point me to a freely available dataset that is suitable for testing join algorithms and has a large, variable number of records (from 0.5 million to 500 million)?