
Play with single test

We provide two kinds of tests to compare the performance of BSFS and HDFS: benchmark tests and application tests.

Benchmark Tests

The goal of the microbenchmarks is to evaluate the throughput achieved by BSFS and HDFS when multiple, concurrent clients access the file systems, under the following four scenarios:

  1. Write different files
  2. Read same file
  3. Read different files
  4. Append data to same file

The benchmark test is launched by the script benchmark-test.sh. Again, the -h option shows the parameters needed to run the test.

~/BlobSeer-Demo/demo-scripts/benchmark-test.sh -h
-f, --file-system		: Set it to 'HDFS' to deploy Hadoop File System. Default setting is BSFS.
-m, --m-number			: Number of meta-data storage providers.
-p, --p-number			: Number of storage providers.
-s, --s-number			: Number of Hadoop slaves.
-a, --nmanager			: Specify the Namespace Manager.
				  (-a 5 means 5th node in available nodes list, default is the first node)
-t, --test-type			: Type of benchmark should be run: Read/Write/Append (*obligatory*).
   				  Currently supported test type: 
					- wdf (Write different files)
					- rsf (Read same file, only for BSFS)
					- rdf (Read different files)
					- asf (Append to same file, only for HDFS)
-c, --concurrency		: Number of nodes concurrently run the test.
-d, --data-size			: Total data size for Read (different files), Write and Append test in power of 2. Default value is 30 (1GB).
-b, --block-size		: Data Block size for Read same file test in power of 2. Default value is 26 (64MB).
-h, -?, --help			: Display this help message.
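
Note that the -d and -b values are exponents, not byte counts: a value of n means 2^n bytes. A quick shell sanity check of the two defaults (plain arithmetic, not output of the demo scripts):

echo $((2**30))   # 1073741824 bytes = 1 GB, default for -d
echo $((2**26))   # 67108864 bytes = 64 MB, default for -b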

The concurrency is the number of nodes that concurrently execute one of the four types of test. For example, to run the Read same file test over BSFS with 3 nodes and a chunk size of 64MB (2^26 bytes), the command is:

~/BlobSeer-Demo/demo-scripts/benchmark-test.sh -f BSFS -m 1 -p 3 -s 3 -a 1 -t rsf -c 3 -b 26

In this example, the number of meta-data providers is set to 1, the number of data providers to 3, and the number of Hadoop slaves to 3; the namespace manager of BSFS is the first available node on Grid5000.
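
For comparison, a run of the Write different files test over HDFS with the same cluster shape could look as follows (an illustrative command assembled from the options documented above, not taken from a recorded run):

~/BlobSeer-Demo/demo-scripts/benchmark-test.sh -f HDFS -s 3 -t wdf -c 3 -d 30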

Here is the equivalent deployment scheme (figure): benchmark-deployment-over-5-nodes

And here is the sequence diagram of benchmark-test, explaining each step in a few words:

+++++ Dispatch jobs to nodes 
Version manager is:
parapide-4.rennes.grid5000.fr

Provider manager is:
parapide-5.rennes.grid5000.fr

Meta data storage providers are:
parapide-6.rennes.grid5000.fr

Data storage providers are:
parapide-6.rennes.grid5000.fr
parapide-7.rennes.grid5000.fr
parapide-9.rennes.grid5000.fr

After entering the command, the program executes the following steps:

The required file system is BSFS.
Meta-data storage provider number is set to 1.
Data storage provider number is set to 3.
Hadoop slave number is set to 3.
Namespace Manager is the 1th node.
Required test is rsf.
Concurrency is set to 3.
Block size is set to 26.

***** Check input parameters *****
File system set to BSFS.
Required test is Read same file.

----------------------------
----- Start deployment -----
----------------------------

Data storage provider number is set to 3
Meta-data storage provider number is set to 1
Hadoop slave number is set to 3
Namespace Manager is the 1th node.
---------------------------------
----- Start deploy BlobSeer -----
---------------------------------
...
...
-------------------------------
----- Start deploy Hadoop -----
-------------------------------
...
...
--------------------------
----- Log management -----
--------------------------
...
...
--------------------------------
----- Start benchmark test -----
--------------------------------
...
...
-----------------------------------
----- Clean up the deployment -----
-----------------------------------
...
...  
------------------------
----- Parse result -----
------------------------

Job start time is 4949129080771.
Job complete time is 4949327476029.

***** RESULT: Throughput is 528 MB/s. *****

Test finished.
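
The reported throughput is derived from the two timestamps printed by the result parser. A minimal sketch of the arithmetic (the tutorial does not state the clock unit, so treating the timestamps as opaque ticks is an assumption):

echo $((4949327476029 - 4949129080771))   # 198395258 elapsed ticks
# throughput (MB/s) = bytes_moved / 2^20 / elapsed_seconds, where
# converting ticks to seconds depends on the undocumented tick unit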

Then, the logs, output, and results of the test can be found in the directory ~/BlobSeer-Demo/demo-scripts/logs. If the user runs several tests, all the logging information is kept in ~/BlobSeer-Demo/demo-scripts/logs/histories.
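
To see what a run left behind, list those directories (plain shell):

ls ~/BlobSeer-Demo/demo-scripts/logs
ls ~/BlobSeer-Demo/demo-scripts/logs/histories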

Application Tests

Application tests investigate the performance of Hadoop-BSFS/HDFS by running Hadoop applications such as wordcount and sort. Unlike benchmark tests, application tests measure the execution time of the application job instead of its throughput. Moreover, here the concurrency is the number of Mappers that concurrently work on one job.

~/BlobSeer-Demo/demo-scripts/application-test.sh -h
 
-f, --file-system:
    Set it to 'HDFS' to deploy Hadoop File System. Default setting is BSFS.
-m, --m-number:
    Number of meta-data storage providers.
-p, --p-number:
    Number of storage providers.
-s, --s-number:
    Number of Hadoop slaves.
-a, --nmanager:
    Specify the Namespace Manager. (-a 5 means 5th node in available nodes list, default is the first node)
-t, --test-type:
    Type of application should be run (*obligatory*). Currently supported test type: 
	  - wordcount
	  - sort
	  - FASTA
-c, --concurrency:
     Number of concurrent Map tasks (between 2 and 128). The default value is 2. Max 16 for FASTA.
-h, -?, --help:
    Display this help message.

To run the sort application over HDFS with 99 mappers, simply type:

~/BlobSeer-Demo/demo-scripts/application-test.sh -f HDFS -s 3 -t sort -c 99
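
A wordcount run over BSFS differs only in the file system and test type (an illustrative command built from the documented options, not taken from a recorded run):

~/BlobSeer-Demo/demo-scripts/application-test.sh -f BSFS -m 1 -p 3 -s 3 -t wordcount -c 16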

Here is the sequence diagram of application-test, explaining each step in a few words:

The process is almost the same as for the benchmark test, except that the parsed result is the execution time:

The required file system is HDFS.
Hadoop slave number is set to 2.
Required application is sort.
Number of mapper is set to 99.

***** Check input parameters *****
File system set to HDFS.

***** Set FS block size according to the concurrency. *****
Block size is 4096
 
 
----------------------------
----- Start deployment -----
----------------------------

Slave number is set to 2
File system block size is set to 4096
...
...
---------------------------------
----- Start Map Reduce Task -----
---------------------------------

sort test running on HDFS.

...
...
------------------------
----- Parse result -----
------------------------

Job start time is 1381088712
Job complete time is 1381088872

***** RESULT: Execution duration is 160 second. *****

Test finished.
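
The duration is simply the difference of the two job timestamps, which appear to be Unix epoch seconds (an inference; the tutorial does not name the unit):

echo $((1381088872 - 1381088712))   # 160 seconds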

If everything is OK, please go to the next step: Visualize results with chained tests
