This model was originally presented in Andrew's LHCb clouds/VMs talk at GridPP30 in Glasgow. This page has more details of the vac implementation and testbed work at Manchester with LHCb.

The basic idea

The "vacuum" model is a way of operating computing nodes at a site using virtual machines, in which the virtual machines are created and contextualised for virtual organisations (VOs) by each physical machine of the site. For the VO, these virtual machines appear to be produced spontaneously "in the vacuum" rather than in response to requests by the VO. This model takes advantage of the pilot job frameworks adopted by many VOs, in which pilot jobs submitted via the grid infrastructure in turn start job agents which fetch the real jobs from the VO's central task queue. In the vacuum model, the contextualisation process starts a job agent within the virtual machine and real jobs are fetched from the central task queue as normal. This is similar to ongoing cloud work where the job agents are also run inside virtual machines.

The vac implementation

Andrew McNab is working on vac, an implementation of this scheme in which a VM factory runs on each physical worker node to create and contextualise virtual machines. Each node's VM factory decides which VO's virtual machines to run, based on site-wide target shares and on a peer-to-peer protocol in which the site's VM factories query each other to discover which virtual machine types they are running. From the replies, a factory can identify which virtual organisations' virtual machines should be started as nodes become available again. As a result, there is no gatekeeper service, head node, or batch system accepting jobs and directing them to particular worker nodes, which avoids several central points of failure.
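The share-based decision described above can be illustrated with a minimal sketch. This is not vac's actual code: the function names, the dictionary layout, and the "largest deficit wins" rule are assumptions chosen for illustration, using the dummy lhcb1 and lhcb2 VM types as example keys.

```python
def aggregate_running(peer_reports):
    """Sum the per-type VM counts reported by each factory (hypothetical layout)."""
    totals = {}
    for report in peer_reports:
        for vm_type, count in report.items():
            totals[vm_type] = totals.get(vm_type, 0) + count
    return totals


def choose_vm_type(target_shares, running_counts):
    """Return the VM type furthest below its target share, or None.

    target_shares:  {vm_type: relative share}, e.g. {"lhcb1": 1, "lhcb2": 3}
    running_counts: {vm_type: VMs currently running across the vac space}
    The selection rule is an assumption, not taken from vac itself.
    """
    total_share = sum(s for s in target_shares.values() if s > 0)
    if total_share <= 0:
        return None
    total_running = sum(running_counts.get(t, 0) for t in target_shares)

    best_type, best_deficit = None, None
    for vm_type in sorted(target_shares):  # sorted for a deterministic tie-break
        share = target_shares[vm_type]
        if share <= 0:
            continue
        target_fraction = share / total_share
        if total_running:
            actual_fraction = running_counts.get(vm_type, 0) / total_running
        else:
            actual_fraction = 0.0
        deficit = target_fraction - actual_fraction
        if best_deficit is None or deficit > best_deficit:
            best_type, best_deficit = vm_type, deficit
    return best_type
```

In this sketch a factory with a free VM slot would first combine the replies from its peers with `aggregate_running`, then call `choose_vm_type` to pick the type that is most under-represented relative to the site-wide target shares.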

The peer-to-peer protocol uses UDP packets containing JSON representations of Python dictionaries, which list the types of VM currently running on the sending factory machine. Each factory machine has quasi-static configuration files, which list the other factory hosts in the same vac space at the site, information about the VM types to run (including contextualisation hooks and VM images to use), target shares for each VM type (roughly, each experiment), and details of the VM hosts assigned to that factory machine by the site. It is assumed the site will create static DNS/DHCP entries for the VMs, and will push out updates to the factory machines' configuration files using its normal procedure (Puppet, CFEngine, etc.). These updates include the overall target shares list, on the assumption that it will change on a timescale of days rather than minutes. The transient VM instances themselves are created by vac from scratch each time, including a copy-on-write copy of the disk image, the CD-ROM image currently used for contextualisation of standard CERNVM images, and the NFS-exported directories used to provide the machinefeatures and jobfeatures directories and values described in the HEPiX VM working group's protocol.
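As a rough illustration of the status messages described above, the sketch below serialises a Python dictionary to JSON and sends it in a single UDP datagram using only the standard library. The field names ("factory", "space", "running"), the host names, and the loopback demonstration are assumptions made for this example and are not vac's actual wire format.

```python
import json
import socket


def encode_status(factory_name, space, running):
    """Serialise a status dictionary to UTF-8 JSON (field names are hypothetical)."""
    message = {"factory": factory_name, "space": space, "running": running}
    return json.dumps(message).encode("utf-8")


def decode_status(datagram):
    """Parse a received datagram, rejecting anything that is not a status dict."""
    message = json.loads(datagram.decode("utf-8"))
    if not isinstance(message, dict) or "running" not in message:
        raise ValueError("not a status message")
    return message


def demo_round_trip():
    """Send one status message to a local UDP socket and decode it again."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # UDP is unreliable, so never block forever
        payload = encode_status("factory01.example.org", "vac01.example.org",
                                {"lhcb1": 2, "lhcb2": 1})
        sender.sendto(payload, receiver.getsockname())
        datagram, _address = receiver.recvfrom(65535)
        return decode_status(datagram)
    finally:
        sender.close()
        receiver.close()
```

Because UDP gives no delivery guarantee, a factory using this kind of exchange has to tolerate missing or malformed replies, which is why the sketch sets a receive timeout and validates each datagram before using it.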

Testing with LHCb

Andrew has been testing this using jobs from the central LHCb development task queue, using the same contextualisation procedure that has already been developed for running LHCb virtual machines in Clouds. (These are visible to LHCb users in the Development grid, running at So far he has used dummy lhcb1 and lhcb2 architectures to test the target shares mechanism, and is now extending this work to a small testbed of a dozen machines at Manchester.

The LHCb JobAgent has been patched to implement the extended HEPiX shutdown command protocol, which allows the JobAgent to shut the VM down from inside, and to communicate to the factory machine why this has been done.