To generate the input files for the grep workload:
time hadoop jar /opt/hadoop/hadoop-1.2.1/hadoop-examples-1.2.1.jar randomtextwriter -Ddfs.block.size=1073741824 -Dtest.randomtextwrite.bytes_per_map=19478485 -Dtest.randomtextwrite.maps_per_host=16 /Workloads/grep/datagrep_1
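As a rough sanity check on the sizes requested by this command, the per-host data volume can be worked out with shell arithmetic. This is only a sketch: it assumes every host runs all 16 maps and that each map writes close to its byte target. The variable names are illustrative and are not Hadoop settings.

```shell
# Values copied from the randomtextwriter command.
bytes_per_map=19478485
maps_per_host=16

# Approximate bytes written per host (assumes every map reaches its target).
bytes_per_host=$((bytes_per_map * maps_per_host))
mib_per_host=$((bytes_per_host / 1024 / 1024))   # integer division, rounds down

echo "per host: ${bytes_per_host} bytes (~${mib_per_host} MiB)"
```

Under these assumptions each host writes about 297 MiB, and each map's output (about 18.6 MiB) is far smaller than the 1 GiB block size requested with dfs.block.size.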
To search the generated files for the string "toto":
time hadoop jar /opt/hadoop/hadoop-1.2.1/hadoop-examples-1.2.1.jar grep /Workloads/grep/datagrep11 /Workloads/grep/datagrep-out "toto"
To generate the input data for TeraSort:
time hadoop jar /opt/hadoop/hadoop-1.2.1/hadoop-examples-1.2.1.jar teragen -Ddfs.block.size=536870912 -Dmapred.map.tasks=32 -Dmapred.reduce.tasks=16 -Dmapred.map.tasks.speculative.execution=true -Dmapred.compress.map.output=true 10000000 /Workloads/teragen/ter_in.4967
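To see how much data this teragen run produces, the row count can be converted to bytes. A minimal sketch, assuming the standard 100-byte TeraGen row, that the 32 requested map tasks are honored, and that rows are split evenly among them. The variable names are illustrative.

```shell
# Values copied from the teragen command.
rows=10000000
map_tasks=32
block_size=536870912

# ASSUMPTION: each TeraGen row is 100 bytes.
row_bytes=100

total_bytes=$((rows * row_bytes))
bytes_per_map=$((total_bytes / map_tasks))

# Ceiling division: HDFS blocks needed for each map's output file.
blocks_per_file=$(( (bytes_per_map + block_size - 1) / block_size ))

echo "total: ${total_bytes} bytes, per map file: ${bytes_per_map} bytes, blocks per file: ${blocks_per_file}"
```

With these numbers the run writes roughly 1 GB in total, and each map's output file fits inside a single 512 MiB block.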
To sort the generated data:
time hadoop jar /opt/hadoop/hadoop-1.2.1/hadoop-examples-1.2.1.jar terasort -Ddfs.block.size=536870912 -Dio.file.buffer.size=524288 -Dmapred.map.tasks=32 -Dmapred.reduce.tasks=16 -Dio.sort.factor=48 -Dio.sort.mb=650 -Dio.sort.record.percent=0.138 -Dio.sort.spill.percent=1.0 /Workloads/teragen/ter_in.4967 /Workloads/terasort/ter_out.4967
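The value io.sort.record.percent=0.138 appears to follow from the record size. The sketch below reproduces that arithmetic under two assumptions: TeraSort records are 100 bytes, and the Hadoop 1.x map-side sort buffer keeps 16 bytes of accounting metadata per record. The second figure should be checked against the MapTask source of the version in use. The variable names are illustrative.

```shell
record_bytes=100   # TeraGen/TeraSort record size
acct_bytes=16      # ASSUMPTION: per-record accounting overhead in Hadoop 1.x
io_sort_mb=650     # from the terasort command

# Fraction of the sort buffer needed for accounting so that the metadata
# and data regions fill up at the same time.
record_percent=$(awk -v a="$acct_bytes" -v r="$record_bytes" 'BEGIN { printf "%.3f", a / (a + r) }')

# Portion of io.sort.mb that this fraction reserves for accounting.
acct_mb=$(awk -v m="$io_sort_mb" -v p="$record_percent" 'BEGIN { printf "%.1f", m * p }')

echo "io.sort.record.percent ~ ${record_percent}, accounting space ~ ${acct_mb} MB of ${io_sort_mb} MB"
```

Because io.sort.mb=650 is allocated inside the map task JVM, the child heap (mapred.child.java.opts) needs to be comfortably larger than 650 MB for this configuration to run.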