Visualize results with chained tests

Chained tests

The script chain-test.sh automates the execution of a set of tests with incremental concurrency. As before, the -h option shows the help message:

Usage of input parameters:
-t, --type:
    Type of benchmark should be run. Currently supported test types: 
    - app-wordcount
    - app-sort
    - app-FASTA
    - bench-wdf (Write different files)
    - bench-rsf (Read same file)
    - bench-rdf (Read different files)
    - bench-asf (Append to same file)
-m, --mnumber:
    Number of meta-data storage providers.
-p, --pnumber:
    Number of storage providers.
-s, --snumber:
    Number of Hadoop slaves.
-c, --concurrency:
    Maximum number of nodes concurrently run the test for benchmark tests.
    Maximum number of map tasks in power of 2 for application tests. Valid value 0:7.
-d, --data-size:
    Total data size for Read (different files), Write and Append test in power of 2. Default value is 30 (1GB).
-b, --block-size:
    Data Block size for Read same file test in power of 2. Default value is 26 (64MB).
-g, --granu
    Augmentation granularity in term of concurrency.
-r, --run:
    Number of repeat times for each test.
-h, -?, --help:
    Display this help message.

Note that in chained tests, the concurrency input parameter is not the concurrency of a single test but the maximum concurrency of a set of tests. For application tests, the number of map tasks ranges from 1 to 128. For benchmark tests, the number of nodes ranges from 2 to the number of reserved resources. The -g option sets the increment in concurrency between two consecutive rounds. To reduce the effect of random variation, the user can set the number of runs in each round with the -r option.
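Several options in the help message take an exponent of 2 rather than a plain value: -c for application tests, -d and -b. The following is a minimal sketch of that conversion using shell arithmetic. The helper name pow2 is hypothetical and is not part of chain-test.sh.

```shell
#!/bin/sh
# pow2 is a hypothetical helper, not part of the BlobSeer demo scripts.
# It prints 2 raised to the given non-negative integer exponent.
pow2() {
    echo $((1 << $1))
}

pow2 0    # lowest valid -c value for application tests: 1 map task
pow2 7    # highest valid -c value for application tests: 128 map tasks
pow2 30   # default -d value: 1073741824 bytes (1GB)
pow2 26   # default -b value: 67108864 bytes (64MB)
```

With this reading, -c 6 for an application test corresponds to a maximum of 64 map tasks.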

To launch a set of wordcount tests with at most 64 map tasks (-c 6, that is, 2 to the power 6), a granularity of 2 between two consecutive rounds (-g 2), and 3 runs in each round (-r 3), the user can type:

./chain-test.sh -t app-wordcount -m 2 -p 6 -s 6 -c 6 -g 2 -r 3

Required test is app-wordcount.
Meta-data storage provider number is set to 2
Data storage provider number is set to 6
Hadoop slave number is set to 6
Concurrency is set to 6.
The augmentation granularity is 2
Number of runs for each test is 3

***** Check input parameters *****
Required test is wordcount application.


***** Chain test log management *****
Logging directory: /home/zhli/BlobSeer-Demo/logs
There are 1 existing log file(s).
There are 0 existing output file(s).
There are 0 existing result file(s).
There are 1 existing chain test result directory(ies).

***** Start evaluation *****
Test will run first over BSFS, then HDFS.

Chained test running over BSFS.

***** Execute round 1 *****

...
...

***** RESULT: Execution duration is 26 second. *****

Test finished.

1 result file(s) generated.
Moved to the chain test result directory.

***** One set of test is finished. *****

All tests finished. Program terminates.

The script first runs the application over BSFS for all settings, then over HDFS. The results can be visualized while the tests are still running.

Visualize the results

Since Grid5000 does not provide a graphical interface for displaying a PDF file, the visualization has to be done on the user's local machine. As mentioned in the installation phase, a user who wants to visualize results must already have both the Demo and gnuplot correctly installed on the local machine.
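Before going further, it can help to confirm that gnuplot is reachable on the local machine. The check below is only a sketch: require_tool is a hypothetical helper name, and it relies on nothing beyond the standard `command -v` shell built-in.

```shell
#!/bin/sh
# require_tool is a hypothetical helper, not part of the BlobSeer demo.
# It succeeds when the named program is found on PATH.
require_tool() {
    command -v "$1" >/dev/null 2>&1
}

if require_tool gnuplot; then
    echo "gnuplot found: $(command -v gnuplot)"
else
    echo "gnuplot not found; install it before running the visualization" >&2
fi
```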

Before using this function, the user should enter the ~/BlobSeer-Demo/figure-scripts directory. This folder contains three sub-directories:

  1. data directory stores the parsed results as the input of plot
  2. gnu-scripts directory keeps the gnuplot script for each plot
  3. figures directory saves all the output figures
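The layout listed above can be checked with a few lines of shell. This is a sketch only: missing_dirs is a hypothetical helper, and the base path is the one given in this tutorial, which may differ on another installation.

```shell
#!/bin/sh
# missing_dirs is a hypothetical helper, not part of the BlobSeer demo.
# It prints each expected sub-directory that is absent under the given base.
missing_dirs() {
    base=$1
    for d in data gnu-scripts figures; do
        [ -d "$base/$d" ] || echo "$d"
    done
}

# The path follows the tutorial; adjust it if the demo lives elsewhere.
missing=$(missing_dirs "$HOME/BlobSeer-Demo/figure-scripts")
if [ -n "$missing" ]; then
    echo "Missing sub-directories:" $missing >&2
fi
```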

The same folder contains the script inc-figure.sh, which is the main program of the function. It takes two input arguments: the name of the Grid5000 user account and the site where the current chained test is running. Simply enter the command

./inc-figure.sh <username> <site>

The program starts to retrieve the results from the corresponding site and draws the figure. In the following example, a chained Read different files test with a maximum concurrency of 7 and an incremental granularity of 3 is running on the nancy site.

./inc-figure.sh xxxxx nancy
 
***** Check input arguments. *****

Grid5000 site running the chained test is nancy.
Grid5000 user account is zhli.
***** Current active chain test is chain-7-3-2-bench-rdf-13-10-16-21-42 *****

***** Get chain test parameters *****
The maximum concurrency is 7.
The incremental granularity is 3.
The number of runs for each test.
The chained test is Read different files benchmark test.

***** Write Gnuplot script *****
Write file header.
Figure title: Read different files
xlabel: Number of nodes
ylabel: Throughput MB/s
Ouput figure file name: chain-7-3-2-bench-rdf-13-10-16-21-42.eps
y axis range: [0:500]
x axis range: [2:7]

***** Retrieve results. *****
Create data file
16 result files will be generated when the chained test is finished.
x Axis points are 2 3 6 7

Drawing figure...

...
...
1 tests have been finished.
Writing data...
...
...
All results are generated, finilize drawing figure.
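
The name of the active chain test in the output above appears to encode its parameters. The sketch below reads it that way. Everything in it is inferred from this single sample output, not from the inc-figure.sh source:

- The third number in the name is assumed to be the number of runs.
- The rule used to derive the x axis points is an assumption.
- The factor of 2 stands for the two file systems (BSFS, then HDFS).
- Both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the field meanings are inferred from the sample
# output, not taken from the inc-figure.sh source.

# Print "max granularity runs type" for a chain test name.
parse_chain_name() {
    old_ifs=$IFS
    IFS=-
    set -- $1          # split the name on "-"
    IFS=$old_ifs
    # $1 is the literal "chain"; $5 and $6 form the test type.
    echo "$2 $3 $4 $5-$6"
}

# Print the x axis points for a benchmark chain: the minimum of 2 nodes,
# every multiple of the granularity below the maximum, then the maximum.
# This rule is an assumption that happens to reproduce "2 3 6 7".
x_points() {
    max=$1
    step=$2
    if [ "$step" -lt 1 ]; then
        return 1       # a non-positive step would never terminate
    fi
    points=2
    k=$step
    while [ "$k" -lt "$max" ]; do
        if [ "$k" -gt 2 ]; then
            points="$points $k"
        fi
        k=$((k + step))
    done
    if [ "$max" -gt 2 ]; then
        points="$points $max"
    fi
    echo "$points"
}

set -- $(parse_chain_name chain-7-3-2-bench-rdf-13-10-16-21-42)
runs=$3
pts=$(x_points "$1" "$2")
echo "x axis points: $pts"                       # x axis points: 2 3 6 7
set -- $pts
# points x runs x 2 file systems (BSFS and HDFS)
echo "expected result files: $(($# * runs * 2))" # expected result files: 16
```

Under these assumptions the sketch gives the same x axis points and the same count of 16 result files as the sample output. That agreement does not confirm that the script computes them this way.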

Meanwhile, the user can open the corresponding .eps figure in the figures directory. The curves in the figure are updated in real time as the chained test produces new results.
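
For readers unfamiliar with gnuplot, the header values reported in the output above (title, axis labels, output file, axis ranges) map onto gnuplot commands roughly as sketched below. The function write_gnuplot_header is hypothetical, and the script that inc-figure.sh actually writes may differ. Only standard gnuplot `set` commands are used.

```shell
#!/bin/sh
# write_gnuplot_header is a hypothetical helper; the script that
# inc-figure.sh actually generates may look different.
# Arguments: output .eps file, figure title, x axis minimum, x axis maximum.
write_gnuplot_header() {
    cat <<EOF
set terminal postscript eps color
set output "$1"
set title "$2"
set xlabel "Number of nodes"
set ylabel "Throughput MB/s"
set xrange [$3:$4]
set yrange [0:500]
EOF
}

# Values taken from the sample output of inc-figure.sh.
write_gnuplot_header chain-7-3-2-bench-rdf-13-10-16-21-42.eps \
    "Read different files" 2 7
```

A plot command that reads the data file would follow this header. The data file format is not described here, so it is left out.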

Another example that produces a throughput figure:

  • ~/BlobSeer-Demo/demo-scripts/benchmark-test.sh -f BSFS -m 1 -p 3 -s 3 -a 1 -t rsf -c 3 -b 27
  • ~/BlobSeer-Demo/demo-scripts/chain-test.sh -f BSFS -m 1 -p 3 -s 3 -a 1 -t bench-rsf -c 3 -b 27

[Diagram: how the curves are generated]

Experiments

Here you can find results obtained with different benchmarks and different parameters.

These tests were run on the Rennes site:

—– Grid'5000 - Rennes - frennes.rennes.grid5000.fr —–

This site has 4 clusters (see: https://api.grid5000.fr/3.0/ui/visualizations/nodes.html)

- paradent : 64 nodes (2 CPUs Intel@2.5GHz, 4 cores/CPU, 31GB RAM, 298GB DISK)

- paranoia : 8 nodes (2 CPUs Intel@2.2GHz, 10 cores/CPU, 126GB RAM, 558GB DISK)

- parapide : 25 nodes (2 CPUs Intel@2.93GHz, 4 cores/CPU, 23GB RAM, 465GB DISK)

- parapluie: 40 nodes (2 CPUs AMD@1.7GHz, 12 cores/CPU, 47GB RAM, 232GB DISK)

And on the Sophia site:

—– Grid'5000 - Sophia - fsophia.sophia.grid5000.fr —–

This site has 3 clusters (see: https://api.grid5000.fr/3.0/ui/visualizations/nodes.html)

- helios: 56 nodes (2 CPUs AMD@2.2GHz, 2 cores/CPU, 3GB RAM, 135GB DISK)

- sol : 50 nodes (2 CPUs AMD@2.6GHz, 2 cores/CPU, 3GB RAM, 232GB DISK)

- suno : 45 nodes (2 CPUs Intel@2.26GHz, 4 cores/CPU, 31GB RAM, 557GB DISK)

tutorial/visualize_results_with_chained_tests.txt · Last modified: 2014/12/17 09:29 (external edit)