Vac Site Admin Guide 0.14

Andrew McNab <Andrew.McNab AT cern.ch>

Quick start

By following this quick start recipe you can verify that your installation will work with Vac and see it creating and destroying virtual machines. You will almost certainly want to start again from scratch by following the step-by-step part of the Admin Guide, so don't invest a lot of time here. If you're already familiar with VMs, you could skip straight there, but it's safest to go through the quick start first to make sure all the requirements are in place.

To follow the quick start, you need an x86_64 Intel or AMD machine with hardware virtualization (Intel VT-x or AMD-V) enabled in its BIOS, installed with a version of Scientific Linux 6 and with libvirt installed and enabled. (In particular, install the packages libvirt, libvirt-client, libvirt-python, qemu-kvm, and qemu-kvm-tools, and then run "service libvirtd restart" to make sure the libvirtd daemon is running.)

Install the vac RPM, and download a recent compressed kvm-style CernVM batch image from the CernVM downloads page to /tmp. Use zcat to extract the image itself to a file in /tmp. It will be a large file: more than 9GB.

Copy the file /var/lib/vac/doc/testkvm.xml to /tmp as well, and edit the "source file=" path and filename to point to your CernVM image.
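
The disk definition in testkvm.xml will contain something along these lines; the target device and exact layout may differ in your copy, and only the source file path needs editing:

  <disk type='file' device='disk'>
    <source file='/tmp/cernvm-batch-node-2.6.0-4-1-x86_64.hdd'/>
    <target dev='hda' bus='ide'/>
  </disk>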

At the command line, execute

virsh create --console testkvm.xml
You should see the familiar Linux boot messages and eventually a login prompt as the virtual machine boots using the CernVM image file as its root filesystem. If this doesn't happen, then something is wrong with your installation, or hardware virtualization isn't enabled. Please check the libvirt documentation to try to identify where the problem is.

To get out of the login prompt, just press Ctrl and ] and then use the command

virsh destroy testkvm
to kill the VM. We're now ready to set up Vac itself.

To create another pristine disk image, use zcat again to make another copy in /tmp and then use cp to make a sparse copy of this image in Vac's area (you may need to change the exact version numbers in these file names):

cd /tmp
zcat cernvm-batch-node-2.6.0-4-1-x86_64.hdd.gz >cernvm-batch-node-2.6.0-4-1-x86_64.hdd
cp --sparse=always cernvm-batch-node-2.6.0-4-1-x86_64.hdd /var/lib/vac/images
We do this because zcat (and gunzip) won't create sparse files, but cp can, and sparse files are much quicker to read and copy around.

Vac uses NFS to share some directories from the factory machine to its virtual machines, and needs to have the standard NFS server installed and running. It's not necessary to configure the NFS server, as Vac uses exportfs commands to create and destroy exports dynamically. If you have any iptables rules blocking NFS, you must disable them before starting vacd, to avoid having to reset iptables after libvirtd has made its additions.
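
If the factory machine is dedicated to Vac and you simply want iptables out of the way, one way to do that on SL6 is:

service iptables stop
chkconfig iptables off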

Now you need to create the /etc/vac.conf configuration file. Copy /var/lib/vac/doc/example.vac.conf to /etc/vac.conf and read through its comments. The settings listed below need to be checked and probably changed; an example putting them together is given after the list.

vac_space = in [settings]
Set this to vac01 in your site's domain. So if your site is .example.com then set it to vac01.example.com . A Vac space is a group of factory machines that communicate with each other, and is equivalent to a subcluster or subsite. A space's name is a fully qualified domain name (FQDN), and can be used as a virtual CE name where necessary in other systems.
names = in [factories]
Since we're creating a space that contains a single factory machine, set this to be the FQDN of the factory machine you're working on.
total_machines
Set this to the number of VMs to create and manage on this factory. Vac will create hostnames for the VMs from the factory name. For example, factory1.example.com will lead to factory1-00.example.com, factory1-01.example.com, ...
root_image = in [vmtype example]
The path and filename given in this setting must point to the CernVM image you created for Vac in /var/lib/vac/images. Double-check that the version numbers are correct.
rootpublickey = in [vmtype example]
This setting is not strictly necessary but is very useful. By copying an RSA key pair to /root/.ssh on the factory machine, or creating one there with ssh-keygen, you will be able to ssh into the VM as root and see how it is laid out and how it is running. If you don't place a public key at the location given in this setting, comment the setting out.
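
Putting these together, the edited parts of /etc/vac.conf might look like the following sketch. The hostnames, machine count, image version, and key path are placeholders, and it assumes total_machines lives in [settings]; follow the comments in example.vac.conf for the authoritative layout.

[settings]
vac_space = vac01.example.com
total_machines = 14

[factories]
names = factory1.example.com

[vmtype example]
root_image = /var/lib/vac/images/cernvm-batch-node-2.6.0-4-1-x86_64.hdd
rootpublickey = /root/.ssh/id_rsa.pub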

The files needed for the example vmtype are installed by the RPM in /var/lib/vac/vmtypes/example and with /etc/vac.conf done and the CernVM image in place you're ready to go. Just do service vacd restart to make sure vacd is running and look in the log files.

When vacd starts it forks a factory process that watches the VMs and creates or destroys them as necessary; and a responder process that replies to queries from factories about what is running on this host. These two processes have separate log files as /var/log/vacd-factory and /var/log/vacd-responder .

In its log file, you should be able to see the factory daemon trying to decide what to do and then creating the example VM, which runs for 10 minutes and then shuts itself down. When deciding what to do, the factory queries its own responder via UDP, and this should be visible in the responder's log file.

You should also be able to see the state of the VM using the command vac scan, where vac is a command line tool that the RPM installs in /usr/sbin.

Configuration step-by-step

This part of the guide covers the same ground as the quick start guide but in a lot more detail. It's intended to help you choose how best to configure your site.

The configuration file /etc/vac.conf uses the Python ConfigParser syntax, which is similar to MS Windows INI files. The file is divided into sections, with each section name in square brackets. For example: [settings]. Each section contains a series of option=value pairs.

For convenience, two more optional files are read if they exist: /etc/vac-factories.conf and /etc/vac-targetshares.conf.

These files are named after the sections they normally hold ([factories] and [targetshares]), but sections can be placed in any of the three files, or all placed in /etc/vac.conf.

Xen vs kvm

We recommend that hardware virtualization (e.g. Intel VT-x features) with kvm is used for production. Vac also supports Xen paravirtualization, which can run on older machines without hardware virtualization, but this is not supported on RHEL6/SL6. (Xen RPMs for 6.x are available from Steven Haigh.)

CernVM images

Vac currently requires the use of CernVM images with HEPiX contextualization based on EC2/ISO ("CD-ROM") images.

You can download a recent compressed kvm-style CernVM batch image from the CernVM downloads page. You can use gunzip or zcat to extract the image itself. It will be a large file: more than 9GB. gunzip creates non-sparse files but you can convert the image to sparse with cp:

cd /tmp
zcat cernvm-batch-node-2.6.0-4-1-x86_64.hdd.gz >cernvm-batch-node-2.6.0-4-1-x86_64.hdd
cp --sparse=always cernvm-batch-node-2.6.0-4-1-x86_64.hdd /var/lib/vac/images
Sparse files not only use less disk space and are quicker to copy, but they are also quicker to read.

The cernvm-batch-node-2.6.0-4-1-x86_64 image was distributed with a filesystem check interval of 6 months, so after 3rd April 2013 a check automatically happens when it is used as a Linux root partition. You can avoid this by disabling the checks with tune2fs:

losetup /dev/loop7 cernvm-batch-node-2.6.0-4-1-x86_64.hdd   # attach the image to a free loop device
kpartx -a /dev/loop7                                        # create /dev/mapper/loop7p* partition mappings
tune2fs -i 0 -c -1 /dev/mapper/loop7p1                      # disable time- and mount-count-based checks
kpartx -d /dev/loop7                                        # remove the partition mappings
losetup -d /dev/loop7                                       # detach the loop device
If loop7 is in use on your system, replace it with a different number. For different partition layouts, you will need to adapt the use of kpartx and the partition numbers.

DNS, IP, MAC

Vac uses a private NAT network for the set of virtual machines on each factory. Vac creates the VM FQDNs from the factory name by adding -00, -01, ..., so factory1.example.com has factory1-00.example.com, factory1-01.example.com, ... as its VMs. The total number of virtual machines on the factory is specified by total_machines. Vac assigns IP addresses starting at 169.254.169.0 for VM 0, 169.254.169.1 for VM 1, and so on, and a unique MAC address is also assigned to each VM.

Using libvirt's NAT machinery means this network is hidden from the rest of the LAN and only visible from the factory and its VMs. libvirt configures the dnsmasq server to run dedicated DNS and DHCP servers on this private network. The factory's address in this private network is 169.254.169.254, the so-called Magic IP used by some Cloud systems for a local configuration service which VMs look for.

To use IP addresses in the 169.254.0.0 network, you must ensure you are using a recent version of dnsmasq. For SL6, dnsmasq-2.48-13.el6.x86_64, available as part of SL6 updates, is suitable.
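
For example, you can check which dnsmasq version is installed, and update it if necessary, with the standard tools:

rpm -q dnsmasq
yum update dnsmasq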

The 169.254.0.0 network should not be configured on the factory machine before you start Vac. In particular, Zeroconf support should be disabled by adding NOZEROCONF=yes to /etc/sysconfig/network and restarting networking.

You can check the vac private network exists with the command virsh net-list which should list the vac_169.254.0.0 network (and probably the "default" network defined by libvirtd.)

Vac will print the error "Failed to create NAT network vac_169.254.0.0 (Need dnsmasq RPM >= 2.48-13? Did you disable Zeroconf? Does virbr1 already exist?)" if it cannot use libvirtd to create that network. Check dnsmasq is installed, that Zeroconf is disabled, and that virbr1 does not already exist with the commands ifconfig and brctl show. If virbr1 is already there, use ifconfig virbr1 down and brctl delbr virbr1 to remove it completely. brctl is from the RPM bridge-utils.

Logical volumes

Vac virtual machines can use logical volumes that exist on the factory machine to provide additional, faster disk space. Normally these will be mounted at /scratch by the VM, as they are on conventional grid worker nodes.

By default, the block device associated with the logical volume is made available to the VM as hdb, but this can be changed with the scratch_device option in a [vmtype ...] section.

For each virtual machine Vac will look for a logical volume to use with it. The global volume_group option in [settings] (default vac_volume_group) and the virtual machine's name are used to construct the logical volume paths to try. For example, /dev/vac_volume_group/factory1-01.example.com

You must create the volume group and logical volumes to be used by Vac, with something similar to this, where /dev/sda9 is a physical volume (partition):

vgcreate vac_volume_group /dev/sda9
lvcreate --name factory1-00.example.com -L 25G vac_volume_group
lvcreate --name factory1-01.example.com -L 25G vac_volume_group
lvcreate --name factory1-02.example.com -L 25G vac_volume_group
...
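
If you are creating one logical volume per VM, a short shell loop saves some typing. This sketch assumes 14 machines and the hostname pattern used above:

for i in $(seq -w 0 13); do
  lvcreate --name factory1-${i}.example.com -L 25G vac_volume_group
done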

During the creation of each virtual machine instance, Vac will identify the logical volume it has been told to use, (re)format it, and provide it to the VM as hdb by default.

Installation: tar vs RPM

RPM is the recommended installation procedure, and RPMs are available from the Downloads directory on the Vac website.

It is possible to install Vac from a tar file, using the install Makefile target.
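
If you do install from the tar file, the procedure is along these lines (the exact file and directory names depend on the version you download):

tar xvzf vac-0.14.tar.gz
cd vac-0.14
make install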

Configuration of the Vac space

The [settings] section must include a vac_space name, which is also used as the virtual CE name.

A separate [factories] section contains the single required option names, which holds a space-separated list of the fully qualified domain names of all the factories in this Vac space, including this factory. The factories are queried using UDP when a factory needs to decide which vmtype to start. The Vac responder process on each factory replies to these queries with a summary of the VMs it is running and the outcome of recent attempts to run a VM of each vmtype.
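
For example, a space made of three factory machines might be described like this (the hostnames are placeholders):

[factories]
names = factory1.example.com factory2.example.com factory3.example.com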

For ease of management, the [factories] section can be placed in /etc/vac-factories.conf which could be automatically generated and maintained from another source, such as the site's assets database.

Setting up vmtypes

One [vmtype ...] section must exist for each vmtype in the system, with the name of the vmtype given in the section name, such as [vmtype example]. A vmtype name must only consist of lowercase letters, numbers, periods, underscores, and hyphens. The vac.conf(5) man page lists the options that can be given for each vmtype.

The [targetshares] section contains a list of vmtype=share pairs giving the desired share of the total VMs available in this space for each vmtype. The shares do not need to add up to 1.0, and if a share is not given for a vmtype, then it is set to 0. The creation of new VMs can be completely disabled by setting all shares to 0. Vac factories consult these shares when deciding which vmtype to start as VMs become available.
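
For example, to aim for twice as many VMs of one vmtype as of another (the second vmtype name here is a placeholder):

[targetshares]
example = 2.0
anothervmtype = 1.0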

For ease of management, the [targetshares] section can be placed in a separate file, /etc/vac-targetshares.conf, which is convenient if it is generated automatically or frequently edited by hand and pushed out to the factory machines.

The experiment or VO responsible for each vmtype should supply step-by-step instructions on how to set up the rest of the [vmtype ...] section and how to configure the files to be placed in its subdirectory of /var/lib/vac/vmtypes .

Starting and stopping vacd

The Vac daemon, vacd, is started and stopped by /etc/rc.d/init.d/vacd in conjunction with the usual service and chkconfig commands. As the configuration files are reread at the start of each cycle (by default, one cycle per minute), it is not necessary to restart vacd after changing the configuration.
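
So the usual SL6 service management applies, for example:

chkconfig vacd on
service vacd restart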

Furthermore, as vacd rereads the current state of the VMs from status files and the hypervisor at the start of each cycle, vacd can be restarted without disrupting running VMs or losing information about their state. In most cases it will even be possible to upgrade vacd from one patch level to another within the same minor release without having to drain the factory of running VMs.

Using vac command

The vac(1) man page explains how the vac command can be used to scan the current Vac space and display the VMs running, along with statistics about their CPU load and wall clock time.

Setting up Nagios

The check-vacd script installed in /var/lib/vac/bin can be used with Nagios to monitor the state of the vacd on a factory node.

It can be run from the local Nagios nrpe daemon with a line like this in its configuration file:

command[check-vacd]=/var/lib/vac/bin/check-vacd 600
which raises an alarm if the vacd heartbeat wasn't updated in the last 600 seconds.

APEL accounting

When Vac detects that a VM has finished, it writes one line with accounting information to two daily log files in /var/log/vacd-accounting . One log file is in PBS format, and one in BLAHP format. Vac uses the UUID of the VM as the local and grid job ID, and the vmtype name as the local user and group. A unique user DN is constructed from the components of the Vac space name. For example, vac01.example.com becomes /DC=com/DC=example/DC=vac01 . If the accounting_fqan option is present in a [vmtype ...] section, then for VMs of that type the value of that option is included in the log files as the user FQAN, which indicates the VO associated with the VM.
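
So if the VMs of a vmtype should be attributed to a particular VO, its section might include something like this (the FQAN shown is a placeholder):

[vmtype example]
accounting_fqan = /example.vo/Role=NULL/Capability=NULL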

The format of these accounting log files is designed to work with the standard APEL PBS log file parser, which can be run on each factory machine from cron to publish accounting records into the site's APEL publisher database.

For APEL 2, a configuration file modelled on this example can be placed in /etc/glite-apel-pbs/parser-config-vac.xml:

<?xml version="1.0" encoding="UTF-8"?>
<ApelConfiguration enableDebugLogging="yes">
  <SiteName>EXAMPLE-COM</SiteName>
  <DBURL>jdbc:mysql://apel.example.com:3306/accounting</DBURL>
  <DBUsername>accounting</DBUsername>
  <DBPassword>REPLACE-WITH-PASSWORD</DBPassword>
  <DBProcessor inspectTables="no"/>

  <EventLogProcessor>
    <Logs searchSubDirs="yes"
     reprocess="no"><Dir>/var/log/vacd-accounting</Dir></Logs>
    <Timezone>UTC</Timezone>
    <SubmitHost>vac01.example.com</SubmitHost>
  </EventLogProcessor>

  <BlahdLogProcessor>
    <BlahdLogPrefix>blahp.log-</BlahdLogPrefix>
    <Logs searchSubDirs="yes" reprocess="no">
      <Dir>/var/log/vacd-accounting</Dir>
    </Logs>
  </BlahdLogProcessor>
</ApelConfiguration>

The SiteName is the EGI/WLCG site name used by the GOCDB, and the other example.com values should be replaced with your chosen names as before.

The parser can be run once a day by placing the file glite-apel-pbs-parser-cron in /etc/cron.d:

25 01 * * * root env APEL_HOME=/ /usr/bin/apel-pbs-log-parser  -f /etc/glite-apel-pbs/parser-config-vac.xml >> /var/log/apel.log 2>&1

A similar approach can be taken for APEL 3 based sites, using the new APEL configuration file format and scripts.

In either case, it is necessary to ensure that HEPSPEC06 benchmark values are present in the publisher database, either by placing them in the BDII and using the APEL SpecUpdater, or by inserting them by hand in the database. New rows only need to be inserted if the benchmark changes for some reason.