Virtual machine image storage and deployment

BlobSeer includes a FUSE module written specifically to add dedicated support for virtual machine image storage and deployment in IaaS clouds. In short, this FUSE module exposes a BLOB (in which the VM image was previously stored) as a locally modifiable copy-on-write snapshot through the standard POSIX access interface. From the viewpoint of the hypervisor, the BLOB behaves like an independent raw virtual disk image that is fully available locally. To maintain this illusion, the FUSE module gradually fetches the BLOB contents on demand during reads and caches them locally, while all modifications are written locally. The effect is the same as sharing a backing file through a parallel file system and creating a local qcow2 image on top of it, but fully transparent: there is no need to create or manage any qcow2 files explicitly. Besides easier management, this approach can provide much better performance (especially when deploying the same image on a large number of nodes, thanks to optimizations such as adaptive prefetching), as well as several features not available in standard hypervisors, such as live snapshotting (taking a virtual disk snapshot without interrupting the VM) and high-performance incremental block storage migration.

How to use

This tutorial focuses on the features presented above. It is assumed that a working BlobSeer deployment is already available. How to achieve this is described here.

Uploading the backing file to BlobSeer

Upload the backing file as a BLOB. The backing file must be in raw format. You can do this with the test program file_uploader that comes with BlobSeer (located in the test directory). The maximum chunk size is given as a base-2 exponent: for example, 20 means 2^20 bytes, i.e. 1 MB:

./file_uploader <raw_image> <blobseer_cfg_file> <max_chunk_size> <replication_factor>
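Because the chunk size is passed as an exponent rather than in bytes, it is easy to pass 1048576 where 20 is expected. The sketch below converts a byte count to the exponent and assembles the file_uploader argument list in the order documented above. The helper names chunk_exponent and upload_command are hypothetical and not part of BlobSeer.

```python
def chunk_exponent(chunk_bytes: int) -> int:
    """Return n such that chunk_bytes == 2**n (file_uploader expects n, not bytes)."""
    # A power of two has exactly one bit set, so x & (x - 1) == 0.
    if chunk_bytes <= 0 or chunk_bytes & (chunk_bytes - 1):
        raise ValueError(f"chunk size must be a power of two, got {chunk_bytes}")
    return chunk_bytes.bit_length() - 1


def upload_command(raw_image: str, cfg_file: str,
                   chunk_bytes: int, replication: int) -> list:
    """Build the file_uploader argument list (hypothetical helper)."""
    return ["./file_uploader", raw_image, cfg_file,
            str(chunk_exponent(chunk_bytes)), str(replication)]
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the directory containing file_uploader.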

Launching the FUSE module

Launch the FUSE module. You need to be root to do this, and by default only root can access the mount point. Enabling access for other users is beyond the scope of this tutorial; see the FUSE documentation. The FUSE module has only one specific option, -C <blobseer_cfg_file>, which must point to the configuration file of the BlobSeer deployment. All other valid FUSE options can be used as well; run blob-fuse with --help for details. For example, to mount the BLOBs in /mnt/blobs, use:

./blob-fuse -C <blobseer_cfg_file> -o big_writes /mnt/blobs

All BLOBs are exposed as regular files using the naming convention blob-id/blob-version. To check this, list the contents of the mount point with ls: it should contain a single BLOB, blob-1/version-1.
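When scripting against the mount point, it can help to map between (BLOB id, version) pairs and paths. The following sketch assumes the layout suggested by the example above, a blob-<id> directory holding version-<n> files; blob_path and list_blobs are hypothetical helpers, and the layout should be checked against an actual mount.

```python
import os
import re

# Assumed layout, inferred from the example blob-1/version-1.
_BLOB_RE = re.compile(r"blob-(\d+)")
_VERSION_RE = re.compile(r"version-(\d+)")


def blob_path(mountpoint: str, blob_id: int, version: int) -> str:
    """Path of one BLOB version under the FUSE mount point."""
    return os.path.join(mountpoint, f"blob-{blob_id}", f"version-{version}")


def list_blobs(mountpoint: str) -> list:
    """Return the sorted (blob_id, version) pairs exposed under the mount point."""
    found = []
    for blob_dir in os.listdir(mountpoint):
        m = _BLOB_RE.fullmatch(blob_dir)
        dir_path = os.path.join(mountpoint, blob_dir)
        if not m or not os.path.isdir(dir_path):
            continue  # skip anything that is not a blob-<id> directory
        for entry in os.listdir(dir_path):
            v = _VERSION_RE.fullmatch(entry)
            if v:
                found.append((int(m.group(1)), int(v.group(1))))
    return sorted(found)
```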

Live snapshotting

At any moment during the VM's execution, you can take a live snapshot using two special ioctl calls: CLONE and COMMIT. Their opcodes are defined in blob-fuse/blob_ioctl.hpp and are also listed by blob-fuse --help.

CLONE is needed only when multiple VMs share the same BLOB: it switches the VM to a private BLOB where its local modifications can be persisted without conflicting with the other VMs. It needs to be called only once, and it returns the id of the new BLOB.

You don't have to CLONE if the backing file serves a single VM instance or if you don't plan to COMMIT. However, CLONE is a very cheap operation and it makes BLOB management much easier.

The live snapshot itself is triggered by COMMIT, which returns the version of the BLOB where the snapshot will be found. The (blob-id, blob-version) pair can later be used to access the snapshot from any node that has access to BlobSeer.

An example of how to do this in Python:

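import fcntl
# CLONE and COMMIT: opcodes from blob-fuse/blob_ioctl.hpp (also listed by blob-fuse --help)
with open("blob-id/blob-version") as img:
    blob_id = fcntl.ioctl(img, CLONE)        # switch to a private BLOB; returns its id
    blob_version = fcntl.ioctl(img, COMMIT)  # trigger the snapshot; returns its version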

The snapshot is persisted in the background while the VM keeps running. There is currently no notification when this process has finished; however, the FUSE module displays a message that can be parsed if necessary. If the previous snapshot has not finished when you call COMMIT, the call blocks until it does.
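Since CLONE needs to be called only once per VM while COMMIT may be called repeatedly, a small wrapper can keep track of whether the clone has already happened. The sketch below is one possible way to do this, not part of BlobSeer: the class name Snapshotter is hypothetical, and the CLONE and COMMIT opcodes are passed in as parameters because their values come from blob-fuse/blob_ioctl.hpp. The ioctl function is injectable so the logic can be exercised without a FUSE mount.

```python
import fcntl


class Snapshotter:
    """Take repeated live snapshots of one mounted BLOB (hypothetical helper).

    clone_op and commit_op are the CLONE/COMMIT opcodes from
    blob-fuse/blob_ioctl.hpp; they are not hard-coded here.
    """

    def __init__(self, path, clone_op, commit_op, ioctl=fcntl.ioctl):
        self._file = open(path, "rb")
        self._clone_op = clone_op
        self._commit_op = commit_op
        self._ioctl = ioctl   # injectable, e.g. a fake for testing
        self.blob_id = None   # set by the first (and only) CLONE

    def snapshot(self):
        """Return the (blob_id, version) pair identifying the new snapshot."""
        if self.blob_id is None:
            # CLONE once: switch to a private BLOB to avoid conflicts with other VMs.
            self.blob_id = self._ioctl(self._file, self._clone_op)
        # COMMIT returns the snapshot version; persistence continues in the
        # background, and a COMMIT issued while the previous one is still
        # being persisted blocks until it finishes.
        version = self._ioctl(self._file, self._commit_op)
        return self.blob_id, version

    def close(self):
        self._file.close()
```

For example, snap = Snapshotter("/mnt/blobs/blob-1/version-1", CLONE, COMMIT) can be created once, and snap.snapshot() called each time a snapshot is needed.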

Incremental block migration

To perform a live migration, you first need to set up a destination node. This involves two steps: (1) mount the BLOBs using the FUSE module; (2) initialize the hypervisor for migration. Step two depends on the hypervisor; for example, with QEMU/KVM it can be done as follows:

kvm .... (network config, etc) .... -incoming tcp:0:<kvm_migr_port> /blob-id/blob-version

Both the source and the destination must share the same BLOB. You can't CLONE or COMMIT while the live migration is in progress.

Incremental block migration is initiated using the MIGRATE ioctl (see the previous section on how to obtain the opcode). This ioctl must be called on the source before initiating the hypervisor-driven live migration to the destination. Its parameter is a fixed-size 64-byte string of the form <destination_ip>:<fuse_migr_port>, where fuse_migr_port is the migration port used by the FUSE module, configured in the BlobSeer configuration file. This port must differ from the port the hypervisor uses for migration.

Here's an example of how to do this in Python:

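import fcntl, socket
# MIGRATE: opcode from blob_ioctl.hpp; dest_host, fuse_migr_port: destination node and FUSE migration port
with open("blob-id/blob-version") as img:
    fcntl.ioctl(img, MIGRATE, f"{socket.gethostbyname(dest_host)}:{fuse_migr_port}".encode().ljust(64, b"\0"))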

Due to limitations of the ioctl syscall, you must supply a string of exactly 64 bytes (padded with zeros or spaces), otherwise MIGRATE will fail.
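The padding rule is easy to get wrong, so it can be factored into a helper that always returns exactly 64 bytes. This is a sketch that relies on the text above stating that zero or space padding is accepted; migrate_arg is a hypothetical name, and it pads with NUL bytes by default.

```python
import socket

MIGRATE_ARG_LEN = 64  # MIGRATE expects exactly 64 bytes


def migrate_arg(dest_host: str, fuse_migr_port: int, pad: bytes = b"\0") -> bytes:
    """Build the fixed-size "<destination_ip>:<fuse_migr_port>" MIGRATE argument."""
    dest_ip = socket.gethostbyname(dest_host)  # resolve a host name to an IPv4 address
    arg = f"{dest_ip}:{fuse_migr_port}".encode("ascii")
    if len(arg) > MIGRATE_ARG_LEN:
        raise ValueError("destination string longer than 64 bytes")
    return arg.ljust(MIGRATE_ARG_LEN, pad)  # pad with NUL bytes (or b" ")
```

It would then be used as fcntl.ioctl(img, MIGRATE, migrate_arg(dest_host, fuse_migr_port)).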

Once the block migration has been initiated, use the live migration feature of the hypervisor to complete the migration. With QEMU/KVM, this is done from the monitor:

migrate tcp:<destination_ip>:<kvm_migr_port>

The live migration performed by the hypervisor should transfer only memory and device state. Do not use any block migration feature (incremental or not) that comes with the hypervisor.
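On the source node, the order of the two steps matters: MIGRATE must be issued before the hypervisor migration starts. The sketch below only encodes that ordering. Both the ioctl function (for example fcntl.ioctl) and the monitor callback, a hypothetical function that sends one command to the QEMU monitor, are supplied by the caller; how to reach the monitor is outside the scope of this sketch.

```python
from typing import Callable


def migrate_vm(image_file, migrate_op: int, fuse_arg: bytes,
               dest_ip: str, kvm_migr_port: int,
               ioctl: Callable, monitor: Callable[[str], None]) -> None:
    """Run the source-side migration steps in the required order (hypothetical helper).

    fuse_arg is the 64-byte "<destination_ip>:<fuse_migr_port>" string.
    """
    # Step 1: start the incremental block migration in the FUSE module first.
    ioctl(image_file, migrate_op, fuse_arg)
    # Step 2: hypervisor migration of memory and device state only,
    # without any of the hypervisor's own block-migration options.
    monitor(f"migrate tcp:{dest_ip}:{kvm_migr_port}")
```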

There is currently no notification mechanism to signal when the migration has completed. However, this event is logged by the FUSE module and can be parsed if necessary.

tutorial/vm.txt · Last modified: 2014/12/17 09:29 (external edit)