
Hadoop Tera benchmark suite

Teragen

Command line

./hadoop jar ../hadoop-examples-1.2.1.jar teragen \
    -Ddfs.block.size=536870912 \
    -Dmapred.map.tasks=32 \
    -Dmapred.reduce.tasks=16 \
    -Dmapred.map.tasks.speculative.execution=true \
    -Dmapred.compress.map.output=true \
    1000000 \
    /Workloads/teragen.data

Parameters

dfs.block.size=536870912: HDFS block size for files written by the job, in bytes (here 512 MB)
mapred.map.tasks=32: Number of map tasks per job; the requested rows are divided evenly among them, so each mapper writes 512 MB only if the row count is chosen to match
mapred.reduce.tasks=16: Number of reduce tasks per job (TeraGen runs as a map-only job, so this setting has no effect here)
mapred.map.tasks.speculative.execution=true: Allows multiple instances of some map tasks to be executed in parallel
mapred.compress.map.output=true: Compresses the map outputs before they are sent across the network (uses SequenceFile compression)
1000000: Number of 100-byte rows to generate (here about 100 MB in total)
/Workloads/teragen.data: Path of the output to write (a directory of part files)
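
The data volume implied by these values can be checked with a little shell arithmetic. The sketch below is illustrative only: the variable names are made up for this example, and it assumes the usual TeraGen row size of 100 bytes and an even split of rows across mappers.

```shell
#!/bin/sh
# Sizing arithmetic for the teragen command above.
ROW_BYTES=100            # TeraGen rows are 100 bytes each
rows=1000000             # row-count argument from the command line
maps=32                  # mapred.map.tasks
block_bytes=536870912    # dfs.block.size (512 MB)

total_bytes=$((rows * ROW_BYTES))
bytes_per_map=$((total_bytes / maps))

# Rows needed so that each mapper fills one whole block (rounded down).
rows_for_full_blocks=$((maps * block_bytes / ROW_BYTES))

echo "total bytes:          $total_bytes"
echo "bytes per mapper:     $bytes_per_map"
echo "rows for 512 MB each: $rows_for_full_blocks"
```

With 1000000 rows, the whole data set is about 100 MB, or roughly 3 MB per mapper. Having each of the 32 mappers write a full 512 MB block would take about 172 million rows.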

Terasort

Command line

./hadoop jar ../hadoop-examples-1.2.1.jar terasort \
    -Ddfs.block.size=536870912 \
    -Dio.file.buffer.size=32768 \
    -Dmapred.map.tasks=32 \
    -Dmapred.reduce.tasks=16 \
    -Dio.sort.factor=48 \
    -Dio.sort.record.percent=0.138 \
    /Workloads/teragen.data \
    /Workloads/terasort.data

Parameters

dfs.block.size=536870912: HDFS block size for files written by the job, in bytes (here 512 MB)
io.file.buffer.size=32768: Buffer size, in bytes, for use in sequence files (here 32 KB); it determines how much data is buffered during read and write operations
mapred.map.tasks=32: Hint for the number of map tasks per job; the actual number follows from the input splits
mapred.reduce.tasks=16: Number of reduce tasks per job
io.sort.factor=48: Number of streams to merge at once while sorting files; this determines the number of open file handles
io.sort.record.percent=0.138: Fraction of io.sort.mb dedicated to tracking record boundaries (here 13.8%)
/Workloads/teragen.data: Path of the input to read (the TeraGen output)
/Workloads/terasort.data: Path of the sorted output to write
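
The page does not say where the value 0.138 comes from. One plausible derivation, offered as an assumption and not as the author's stated method: if the Hadoop 1.x sort buffer keeps 16 bytes of bookkeeping per record and each TeraSort record is 100 bytes, the share of the buffer that bookkeeping needs is 16 / (100 + 16).

```shell
#!/bin/sh
# Assumed sizes, not taken from this page:
#   100 bytes of data per TeraSort record
#    16 bytes of per-record accounting in the Hadoop 1.x map-side sort buffer
record_bytes=100
accounting_bytes=16

# Fraction of io.sort.mb that record tracking would need for such records.
ratio=$(awk -v r="$record_bytes" -v a="$accounting_bytes" \
    'BEGIN { printf "%.3f", a / (r + a) }')

echo "io.sort.record.percent ~ $ratio"
```

Under these assumptions the result rounds to the 0.138 used in the command line, which would let the data and accounting regions of the buffer fill at about the same rate.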

Teravalidate

Command line

./hadoop jar ../hadoop-examples-1.2.1.jar teravalidate \
    /Workloads/terasort.data \
    /Workloads/teravalidate.data

Parameters

/Workloads/terasort.data: Path of the sorted output to read and check
/Workloads/teravalidate.data: Path of the validation report to write
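
The three stages form a chain: each one reads what the previous one wrote. The following is a minimal dry-run sketch of that chain, reusing the paths from this page. The `run_stage` and `run_benchmark` helpers and the `RUN` switch are hypothetical names introduced for illustration, and the `-D` tuning options are left out for brevity.

```shell
#!/bin/sh
# Prints the three benchmark commands in order.
# Set RUN=1 in the environment to execute them instead of printing.
HADOOP=./hadoop
EXAMPLES_JAR=../hadoop-examples-1.2.1.jar
ROWS=1000000
GEN_DIR=/Workloads/teragen.data
SORT_DIR=/Workloads/terasort.data
CHECK_DIR=/Workloads/teravalidate.data

run_stage() {
    if [ "${RUN:-0}" = "1" ]; then
        "$HADOOP" jar "$EXAMPLES_JAR" "$@"
    else
        echo "$HADOOP jar $EXAMPLES_JAR $*"
    fi
}

# Stop at the first failing stage: later stages depend on earlier output.
run_benchmark() {
    run_stage teragen "$ROWS" "$GEN_DIR" &&
    run_stage terasort "$GEN_DIR" "$SORT_DIR" &&
    run_stage teravalidate "$SORT_DIR" "$CHECK_DIR"
}

run_benchmark
```

Chaining with `&&` keeps a failed TeraGen or TeraSort run from being validated against stale data.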
tutorial/terabenchmark.txt · Last modified: 2015/01/09 15:14 by bouge