Deployment

BlobSeer is designed for large-scale environments. It is commonly deployed on a local cluster; for a test environment, however, the local machine is more than enough.

How to deploy on localhost

Deploying on the local machine means running all BlobSeer components on localhost. At a minimum, you need to launch a version manager, a provider manager, a provider and a DHT service provider. Each of them reads a configuration file supplied as its first parameter. Here is a sample configuration file:

# Version manager configuration
vmanager: {
    # The host name of the version manager
    host = "localhost";
    # The name of the service (tcp port number) to listen to
    service = "2222";
};
 
# Provider manager configuration
pmanager: {
    host = "localhost";
    service = "1111";
};
 
# Provider configuration
provider: {
    service = "1235";
    # Maximal number of pages to be cached (make sure slots * expected_page_size fits in RAM)
    cacheslots = 1024;
    # Update rate: when reaching this number of updates report to provider manager
    urate = 100;
    # Use the specified Berkeley DB database to store the pages
    dbname = "/tmp/blobseer/provider/db/provider.db";
    # No persistence: store in RAM only
    #dbname = "";
    # Total space available to store pages, in MB (64GB here)    
    space = 65536;
    # How often (in secs) to sync stored pages
    sync = 10;
    # Activate de-duplication engine?
    deduplication = false;
};
 
# Built in DHT service configuration
sdht: {
    # Maximal number of metadata entries to be cached
    cacheslots = 1000000;
    # Use the specified Berkeley DB database to store the metadata entries
    dbname = "/tmp/blobseer/sdht/db/sdht.db";
    # No persistence: store in RAM only
    #dbname = "";
    # Total space available to store metadata entries values, in MB (1GB here)
    space = 1024;
    # How often (in secs) to sync metadata entries
    sync = 10;
};
 
# Client side DHT access interface configuration
dht: {
    # The service name of the DHT service (currently the TCP port number the metadata provider listens on)
    service = "1234";
    # List of machines running the builtin dht (sdht)
    gateways = (
        "localhost"
    );
    # How many times to replicate metadata on different providers for fault-tolerance
    replication = 1;
    # How many seconds to wait for response
    timeout = 10;
    # How big the client's cache for dht entries is
    cachesize = 1048576;
};
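The provider comment above asks that cacheslots * expected page size fit in RAM. As a worked example, with the sample value cacheslots = 1024 and an assumed page size of 64 KB (the real page size depends on how clients write their blobs, so substitute your own), the cache needs 1024 * 64 KB = 65536 KB = 64 MB. The bash helper below is a hypothetical convenience, not part of BlobSeer, that performs the same calculation:

```bash
#!/bin/bash
# Hypothetical helper (not part of BlobSeer): estimate the RAM used by the
# provider page cache as cacheslots * page size.
# $1 = cacheslots, $2 = expected page size in KB.
# Prints the estimate in MB, rounded down (integer arithmetic).
cache_footprint_mb() {
  local slots="$1" page_kb="$2"
  echo $(( slots * page_kb / 1024 ))
}
```

For example, cache_footprint_mb 1024 64 prints 64.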

A template configuration file, blobseer-template.cfg, is provided in the scripts directory. The same directory contains two predefined scripts that automate deploying and terminating a BlobSeer instance on localhost. On Grid5000 (g5k), running them without a password requires copying into /home/USER/.ssh the private key, the public key and the authorized_keys file used to connect to the nodes without a password. The two scripts are:

local-deploy.sh
local-kill.sh

For a manual deployment, build a configuration file from the template configuration file or the sample configuration file provided above, then run each of the required processes:

$INSTALL_DIR/bin/vmanager localhost-config-file.cfg
$INSTALL_DIR/bin/pmanager localhost-config-file.cfg
$INSTALL_DIR/bin/provider localhost-config-file.cfg
$INSTALL_DIR/bin/sdht localhost-config-file.cfg
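The manual steps above can be wrapped in a pair of shell functions. This is a minimal sketch under stated assumptions, not the content of local-deploy.sh or local-kill.sh: the function names, the PID file and the per-daemon log files are hypothetical, and creating the database directories from the sample configuration is a precaution in case the daemons do not create them themselves.

```bash
#!/bin/bash
# Hypothetical sketch (not the shipped local-deploy.sh/local-kill.sh):
# start the four BlobSeer daemons on localhost with one configuration file
# and record each PID so the instance can be stopped later.

# $1 = install directory, $2 = configuration file, $3 = PID file to write
start_local_blobseer() {
  local install_dir="$1" cfg="$2" pidfile="$3" daemon logdir
  logdir="$(dirname "$pidfile")"
  # Database directories used by the sample configuration (assumption:
  # the daemons may not create them on their own).
  mkdir -p /tmp/blobseer/provider/db /tmp/blobseer/sdht/db
  : > "$pidfile"
  for daemon in vmanager pmanager provider sdht; do
    # Each daemon takes the configuration file as its first parameter;
    # its output goes to <daemon>.log next to the PID file.
    "$install_dir/bin/$daemon" "$cfg" > "$logdir/$daemon.log" 2>&1 &
    echo "$daemon $!" >> "$pidfile"
  done
}

# $1 = PID file written by start_local_blobseer
stop_local_blobseer() {
  local pidfile="$1" daemon pid
  while read -r daemon pid; do
    if kill "$pid" 2>/dev/null; then
      echo "stopped $daemon"
    fi
  done < "$pidfile"
}
```

For example, start_local_blobseer "$INSTALL_DIR" localhost-config-file.cfg /tmp/blobseer/pids starts the instance and stop_local_blobseer /tmp/blobseer/pids terminates it. If a daemon fails because another one is not yet listening, adding a short sleep between starts is a simple workaround.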

How to deploy in a cluster

To deploy BlobSeer in a cluster, use scripts/blobseer-deploy.py. This script assumes that the machine from which the deployment is performed has ssh access to all machines involved in the deployment.

Step 1: Adjust the configuration template

The script uses a template, blobseer-template.cfg, to generate the full configuration file described in the previous section. It replaces three variables, ${vmanager}, ${pmanager} and ${gateways}, with the address of the version manager, the address of the provider manager and the list of addresses of the metadata providers, respectively. Adjust all other BlobSeer configuration options in this file before deployment.
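To illustrate the substitution, here is a minimal bash sketch. It is not the code of blobseer-deploy.py; the function name render_config is hypothetical, and it assumes the template contains the literal strings ${vmanager}, ${pmanager} and ${gateways}, with ${gateways} placed inside a list such as gateways = ( ${gateways} ); as in the sample configuration.

```bash
#!/bin/bash
# Hypothetical sketch of the template substitution (not blobseer-deploy.py).
# Assumes the template contains the literal placeholders ${vmanager},
# ${pmanager} and ${gateways}, the latter inside a list like:
#   gateways = ( ${gateways} );
# $1 = template, $2 = version manager address,
# $3 = provider manager address, $4 = metadata provider list (one per line)
render_config() {
  local template="$1" vmgr="$2" pmgr="$3" dht_file="$4" gateways
  # Skip blank lines, quote each address and join with commas:
  # "host1","host2",...
  gateways="$(grep -v '^[[:space:]]*$' "$dht_file" | sed 's/.*/"&"/' | paste -sd, -)"
  # "\${...}" reaches sed as a literal ${...}, which a BRE matches verbatim.
  sed -e "s|\${vmanager}|$vmgr|g" \
      -e "s|\${pmanager}|$pmgr|g" \
      -e "s|\${gateways}|$gateways|g" \
      "$template"
}
```

A usage example: render_config blobseer-template.cfg 10.0.0.1 10.0.0.2 dht.txt > blobseer.cfg writes the generated configuration to blobseer.cfg.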

Step 2: Specify the machines involved in the deployment

Create two text files: dht.txt, which holds the IP addresses of all machines that will run a metadata provider, and providers.txt, which holds the addresses of the data providers (one address per line). An address can appear in both files, which co-deploys a metadata provider and a data provider on the same machine. Reserve one machine for the version manager and one machine for the provider manager.
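For example, the two files can be written with here-documents. The addresses below are placeholders from the 192.0.2.0/24 documentation range, not real machines; in this layout 192.0.2.11 co-hosts a metadata provider and a data provider.

```bash
#!/bin/bash
# Placeholder addresses: replace them with the machines of your deployment.
# Metadata providers (sdht), one address per line.
cat > dht.txt <<'EOF'
192.0.2.10
192.0.2.11
EOF

# Data providers; 192.0.2.11 also appears in dht.txt (co-deployment).
cat > providers.txt <<'EOF'
192.0.2.11
192.0.2.12
192.0.2.13
EOF

# The version manager and the provider manager get two further machines
# (for example 192.0.2.1 and 192.0.2.2), kept out of both files here.
```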

Step 3: Invoke the script

In order to deploy BlobSeer, invoke the script as follows:

 blobseer-deploy.py -v <vmgr_address> -m <pmgr_address> -d dht.txt -p providers.txt --launch

To check the status of the deployment, use:

 blobseer-deploy.py -v <vmgr_address> -m <pmgr_address> -d dht.txt -p providers.txt --status

No output means everything is OK; otherwise, the unresponsive BlobSeer processes are listed. To undeploy BlobSeer, use:

 blobseer-deploy.py -v <vmgr_address> -m <pmgr_address> -d dht.txt -p providers.txt --kill

Again, no output means success; otherwise, the processes that could not be killed are listed.
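Because --status and --kill report success by printing nothing, a small wrapper can turn that convention into an exit code for use in scripts. This is a hypothetical helper, not part of BlobSeer; it also captures stderr in case problems are reported there, and $VMGR and $PMGR in the usage comment are placeholder variables.

```bash
#!/bin/bash
# Hypothetical wrapper (not part of BlobSeer) for the "no output means OK"
# convention of blobseer-deploy.py --status and --kill.
# Runs the given command: succeeds if it printed nothing, otherwise
# shows what it printed and returns 1. stderr is captured too.
quiet_ok() {
  local out
  out="$("$@" 2>&1)"
  if [ -z "$out" ]; then
    echo "OK"
  else
    printf 'Reported by %s:\n%s\n' "$1" "$out"
    return 1
  fi
}

# Usage (placeholder addresses in $VMGR and $PMGR):
#   quiet_ok blobseer-deploy.py -v "$VMGR" -m "$PMGR" -d dht.txt -p providers.txt --status
```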

How to deploy on Grid5000

To deploy BlobSeer on Grid5000, we provide two scripts, bs-single-clustest.sh and bs-multi-clustest.sh; a third script, bs-cleanup.sh, cleans up the deployment. They are temporarily available here and will be integrated into the next release. Which script to use depends on the type of job reserved on Grid5000: for a job reserved with oarsub on a single cluster, use bs-single-clustest.sh; for a job reserved with oargridsub across multiple clusters, use bs-multi-clustest.sh. Each has its merits and drawbacks.

Please note: this tutorial assumes that BlobSeer has already been built successfully in the user's Grid5000 account. It also assumes that the user has already reserved a Grid5000 job on at least three nodes: one for the version manager, one for the provider manager, and the remaining node(s) for the metadata and data providers.

Single Cluster Job

When reserving a job on a single cluster, specify the job type -t allow_classic_ssh to ease communication among the BlobSeer components. Also name the job with the -n option, for example -n "BlobSeer". With a job name, there is no need to remember the Grid5000 $JOB_ID, which is different for each job.

Before launching bs-single-clustest.sh, create a small script that exports the environment variables required for the deployment of BlobSeer. The name of this script is up to you; in this tutorial we call it env.sh. It contains:

#!/bin/bash
export BLOBSEER_HOME=your_BlobSeer_home_directory
export LD_LIBRARY_PATH=your_libraries_home_directory
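Before launching a deployment, it can help to confirm that env.sh actually sets these variables. The function below is a hypothetical pre-flight check, not part of the BlobSeer scripts. Extra variable names, such as JOB_ID for a multi-cluster job, can be passed as arguments.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of the BlobSeer scripts):
# verify that the variables exported by env.sh are set.
# Extra variable names (e.g. JOB_ID) can be passed as arguments.
check_blobseer_env() {
  local status=0 var
  for var in BLOBSEER_HOME LD_LIBRARY_PATH "$@"; do
    # ${!var} is bash indirect expansion: the value of the variable named $var.
    if [ -z "${!var:-}" ]; then
      echo "not set: $var" >&2
      status=1
    fi
  done
  # BLOBSEER_HOME should point to an existing directory.
  if [ -n "${BLOBSEER_HOME:-}" ] && [ ! -d "$BLOBSEER_HOME" ]; then
    echo "not a directory: $BLOBSEER_HOME" >&2
    status=1
  fi
  return "$status"
}

# Usage: source ./env.sh && check_blobseer_env
```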

Now we are ready to launch the deployment. bs-single-clustest.sh offers various deployment options. To display a description of each option, use the -h option:

$./bs-single-clustest.sh -h

Check input parameters

Usage of input parameters:
-e, --env-file        : File set environment variables (*obligatory*).
-n, --job-name        : Name of the Grid5000 job (*obligatory*).
-m, --mnumber         : Number of meta-data storage providers.
-p, --pnumber         : Number of storage providers.
-t, --test            : If automatic test is required, set this option to true.
-c, --cleanup         : If automatic clean up is required, set this option to true.
-h, -?, --help        : Display this help message.

Two parameters are mandatory: -e, the path to the script that exports the environment variables, and -n, the name of the Grid5000 job. The numbers of metadata storage providers and storage providers can be set with -m and -p; if either is not defined, or is set to a number less than one, it defaults to one. The automatic test and clean-up options default to false. If they are set to "true", the program automatically runs a basic test of BlobSeer and cleans up the deployment at the end of the test. To deploy, enter:

$./bs-single-clustest.sh -e env.sh -n BlobSeer-14 -m 2 -p 3 -t true -c true
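The default rule for -m and -p described above can be illustrated with a short bash function. This is a sketch of the documented behavior, not the actual code of bs-single-clustest.sh; treating non-numeric input like a missing value is an assumption.

```bash
#!/bin/bash
# Illustration of the documented default for -m and -p (not the actual code
# of bs-single-clustest.sh): a missing value, or one below 1, becomes 1.
# Treating non-numeric input the same way is an assumption.
normalize_count() {
  local n="${1:-}"
  if [[ "$n" =~ ^[0-9]+$ ]] && (( 10#$n >= 1 )); then
    # Force base 10 so that a value like "08" is not parsed as octal.
    echo "$(( 10#$n ))"
  else
    echo 1
  fi
}
```

With the command above, -m 2 and -p 3 are kept as given; omitting -m would yield a single metadata storage provider.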

If the -c option is not set to true, clean up BlobSeer manually with bs-cleanup.sh:

$./bs-cleanup.sh -e env.sh

Multi Cluster Job

The type -t allow_classic_ssh is also required when reserving a multi-cluster job. The advantage of a multi-cluster job is that the number of reserved nodes is not limited by the number of nodes in a single cluster. However, the job name option is not available in this case, so the user must note the assigned JOB_ID and export it as an environment variable, together with the other two:

export BLOBSEER_HOME=your_BlobSeer_home_directory
export LD_LIBRARY_PATH=your_libraries_home_directory
export JOB_ID=your_job_id

bs-multi-clustest.sh accepts the same options as bs-single-clustest.sh, except -n. The deployment command is therefore similar:

$./bs-multi-clustest.sh -e env.sh -m 2 -p 3 -t true -c true

To clean up, use the same bs-cleanup.sh script.

Some tips

For Mac users

On most Macs, localhost does not resolve to 127.0.0.1, so you may need to use 127.0.0.1 instead of localhost in the configuration file.
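One way to do this is to derive a Mac-specific copy of the configuration file. The helper below is hypothetical; it only rewrites quoted "localhost" values, as used in the sample configuration, and writes to a new file rather than using sed -i, whose syntax differs between GNU sed and the BSD sed shipped with Mac OS X.

```bash
#!/bin/bash
# Hypothetical helper: print a copy of a configuration file with every
# quoted "localhost" replaced by "127.0.0.1". Redirecting to a new file
# avoids sed -i, whose syntax differs between GNU and BSD sed.
to_loopback_ip() {
  sed 's/"localhost"/"127.0.0.1"/g' "$1"
}

# Usage: to_loopback_ip localhost-config-file.cfg > mac-config-file.cfg
```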

If you installed dependencies in a directory that is not part of the library path, set the corresponding Mac OS X environment variable, for example:

  export DYLD_LIBRARY_PATH=~/deploy/lib

Done

BlobSeer is now ready to process client requests. Let's test the deployment.

tutorial/deployment.txt · Last modified: 2014/12/17 09:29 (external edit)