BlobSeer can work as backend storage for various applications such as Hadoop MapReduce or virtual machine image storage and deployment.
MapReduce is a programming model that lets users run a task across many machines, for example a “grep” command over a very large data set (more than a terabyte). When a problem arrives on a node (i.e. a physical machine), the principle is to use two functions: map and reduce.
A good blog post to complement the MapReduce tutorial.
The map phase cuts the input data into fixed-size chunks and delegates them to DataNodes, which run the job and return their results for the reduce phase.
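As a purely local illustration of the map and reduce idea (this is not Hadoop and does not use BlobSeer), the sketch below splits a small file into fixed-size chunks, counts the matching lines in each chunk (map), then sums the per-chunk counts (reduce). The file names, the chunk size of two lines, and the search pattern are assumptions made for this example.

```shell
# Local sketch of map/reduce with standard POSIX tools.
# All names and sizes below are hypothetical, chosen for illustration.
workdir=$(mktemp -d)
mkdir "$workdir/counts"

# Sample input: six lines, three of which contain "error".
printf 'error: disk\nok\nerror: net\nok\nok\nerror: cpu\n' > "$workdir/input.txt"

# Cut the input into fixed-size chunks of 2 lines each.
split -l 2 "$workdir/input.txt" "$workdir/chunk_"

# Map: each chunk is processed independently (here, sequentially).
for chunk in "$workdir"/chunk_*; do
  # grep -c prints the number of matching lines; it exits non-zero
  # when that number is 0, hence the "|| true".
  grep -c 'error' "$chunk" > "$workdir/counts/$(basename "$chunk")" || true
done

# Reduce: combine the partial results into one answer.
total=0
for count_file in "$workdir"/counts/*; do
  total=$((total + $(cat "$count_file")))
done

echo "$total"   # prints 3
rm -rf "$workdir"
```

On a real cluster the map step would run on different machines in parallel; the loop here only shows that each chunk can be handled without looking at the others.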
Here is the architecture of Grid5000: you can see that the machines are distributed across many sites in France. Note that you have to request an account to use it.
You can follow four tutorials to learn about Grid5000; here is a brief description of each:
To install BlobSeer, besides CMake, the following three libraries are required:
A quick setup guide is given in the tutorial.
If you run into compilation problems related to Boost, please make sure that your Boost version is newer than 1.51.0.
BlobSeer can be deployed either manually or automatically.
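One way to check the installed Boost version is to read the BOOST_VERSION macro from Boost's version.hpp header, which encodes the version as major * 100000 + minor * 100 + patch. The sketch below is a minimal helper under assumptions: the header path /usr/include/boost/version.hpp is only a common default and may differ on your system, and the function names are hypothetical.

```shell
# Decode a BOOST_VERSION integer (e.g. 105100) into major.minor.patch.
boost_version_string() {
  v=$1
  echo "$((v / 100000)).$((v / 100 % 1000)).$((v % 100))"
}

# Extract the BOOST_VERSION integer from a version.hpp header.
# The default path is an assumption; pass another path as $1 if needed.
boost_version_number() {
  header=${1:-/usr/include/boost/version.hpp}
  sed -n 's/^#define BOOST_VERSION \([0-9][0-9]*\).*/\1/p' "$header"
}

header=/usr/include/boost/version.hpp
if [ -r "$header" ]; then
  n=$(boost_version_number "$header")
  if [ -n "$n" ]; then
    echo "Found Boost $(boost_version_string "$n")"
    # 105100 corresponds to Boost 1.51.0.
    [ "$n" -gt 105100 ] || echo "Boost is not newer than 1.51.0" >&2
  fi
fi
```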
For a manual deployment, build a configuration file from the template configuration file or the sample configuration file provided above, then run each of the required processes:
$INSTALL_DIR/bin/vmanager localhost-config-file.cfg
$INSTALL_DIR/bin/pmanager localhost-config-file.cfg
$INSTALL_DIR/bin/provider localhost-config-file.cfg
$INSTALL_DIR/bin/sdht localhost-config-file.cfg
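The four launch commands can be wrapped in a small helper, sketched below under assumptions. The default INSTALL_DIR value, the function names, the per-process log files, and the one-second pause between launches are all hypothetical and not part of BlobSeer; only the four binary names and the configuration file name come from the commands above.

```shell
# Assumptions: INSTALL_DIR is the BlobSeer install prefix and CONFIG is a
# configuration file built from the template. Both defaults are hypothetical.
INSTALL_DIR=${INSTALL_DIR:-$HOME/blobseer}
CONFIG=${CONFIG:-localhost-config-file.cfg}

# Print one launch command per process, in the order used in this guide.
blobseer_commands() {
  for proc in vmanager pmanager provider sdht; do
    printf '%s/bin/%s %s\n' "$INSTALL_DIR" "$proc" "$CONFIG"
  done
}

# Start each process in the background and report its PID.
# This guide does not say whether the processes may start concurrently,
# so a short pause is left between launches as a precaution.
start_blobseer() {
  for proc in vmanager pmanager provider sdht; do
    "$INSTALL_DIR/bin/$proc" "$CONFIG" > "$proc.log" 2>&1 &
    echo "$proc started with PID $!"
    sleep 1
  done
}

# Dry run: only prints the commands, nothing is launched.
blobseer_commands
```

Calling start_blobseer instead of blobseer_commands would actually launch the processes; check the generated log files if one of them exits immediately.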
For automatic deployment, we provide several scripts to deploy BlobSeer on a local machine, on the Grid5000 platform, or on a general cluster. More information about how to use the scripts can be found in the following tutorial.
On certain operating systems, such as Fedora and Red Hat, you may encounter “locale name” problems when launching data providers. The Boost library may throw an exception such as:
terminate called after throwing an instance of 'std::runtime_error'
  what(): locale::facet::_S_create_c_locale name not valid
This can be fixed by setting LC_CTYPE to empty: