RALPP Work List Areca Disk Servers

SL4 Versions

We're running SL4 in 64-bit mode on these servers. We have tried SL42 and SL43: SL42 installs and configures successfully, but SL43 hangs during the install (after formatting the disks), and it does the same under the 32-bit version.

Building Driver Disks

(Note: although this talks about driver disks, we never actually use a floppy, only .img files accessed over NFS.)

You need driver kernel modules built for the same kernel version as the install kernel of the SL release you are using. Unfortunately the 64-bit versions of SL42 and SL43 use a different kernel version from the corresponding RHEL releases, so the Areca driver disk will not work as supplied. It can, however, be used as a base.

To find the install kernel version, start an install and let it run to the point where it complains that it cannot find any disks; pressing [F2] there drops you to a shell where you can run uname -a.
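
The release string is the part you need; on the SL42 x86_64 install image it should come out as 2.6.9-22.0.1.EL (running just uname -r prints only that string):

uname -r
2.6.9-22.0.1.EL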

So I downloaded the RHEL4u2 driver disk from Areca,

unpacked the zip file:

unzip 1.20.0X.13Beta.zip

and then the install.zip inside that:

cd 1.20.0X.13
unzip install.zip

This got me to driver.img

I then mounted that:

mkdir /mnt/loop
mount -o loop ./driver.img /mnt/loop
ls /mnt/loop
modinfo  modules.cgz  modules.dep  patchinstall  pcitable  rhdd

It's the modules.cgz we want to alter, but first we need to unpack it:

mkdir modules
cd modules
gzip -dc /mnt/loop/modules.cgz > modules.cpio
cpio -idv < modules.cpio
ls
2.6.9-34.EL  2.6.9-34.ELhugemem  2.6.9-34.ELlargesmp  2.6.9-34.ELsmp

Now we want to add our kernel versions. Note that you'll need drivers for both the install kernel and the installed kernel (the one it boots to after the install). For SL42 these are 2.6.9-22.0.1.EL (install) and 2.6.9-22.0.1.ELsmp (installed).

You can extract the arcmsr.ko files from rpms downloaded from bodgit-n-scraper or from rpms you've built yourself (next section), or build them from the Areca sources.

Extracting the arcmsr.ko file from an rpm

Use rpm2cpio.

mkdir /tmp/arcmsr
cd /tmp/arcmsr
rpm2cpio /path/to/rpm/kernel-module-arcmsr-2.6.9-22.0.1.EL-1.20.00.07-1.x86_64.rpm | cpio -d -i
rpm2cpio /path/to/rpm/kernel-smp-module-arcmsr-2.6.9-22.0.1.EL-1.20.00.07-1.x86_64.rpm | cpio -d -i

Back in the modules directory I created subdirectories for the kernel versions/architectures I'm interested in and copied the arcmsr.ko files in:

mkdir -p 2.6.9-22.0.1.EL/x86_64
cp /tmp/arcmsr/lib/modules/2.6.9-22.0.1.EL/kernel/drivers/scsi/arcmsr.ko 2.6.9-22.0.1.EL/x86_64
mkdir -p 2.6.9-22.0.1.ELsmp/x86_64
cp /tmp/arcmsr/lib/modules/2.6.9-22.0.1.ELsmp/kernel/drivers/scsi/arcmsr.ko 2.6.9-22.0.1.ELsmp/x86_64

Finally bundle it back up and put it back in the image:

find . -depth -print | cpio -ov -H crc | gzip -c9 > /mnt/loop/modules.cgz
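# Optional sanity check before unmounting: list the archive just written and
# confirm the newly added kernel directories are in it (version as used above)
gzip -dc /mnt/loop/modules.cgz | cpio -it | grep 2.6.9-22.0.1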
umount /mnt/loop

I cannot see any reason why you could not keep doing this to add more and more versions as the install and installed kernels get updated.

Chris brew 17:19, 11 Jul 2006 (BST)

Building areca driver rpms

Got the Areca driver source rpm from bodgit-n-scraper

This required the kernel-devel rpms to be installed to build:

yum install kernel-devel.x86_64 kernel-smp-devel.x86_64

All that needed to be done then was to edit the kernel-module-arcmsr.spec file to change the kernel version:

...
%define kernel 2.6.9-34.0.1.EL
...

and run rpmbuild:

rpmbuild -ba kernel-module-arcmsr.spec
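
Unless %_topdir has been redefined, the binary rpms land in the default rpmbuild output tree, so roughly:

ls /usr/src/redhat/RPMS/x86_64/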

At this point I was able to copy:

kernel-module-arcmsr-2.6.9-34.0.1.EL-1.20.00.07-1.x86_64.rpm
kernel-smp-module-arcmsr-2.6.9-34.0.1.EL-1.20.00.07-1.x86_64.rpm

to our SL4 yum repository, install them along with the 2.6.9-34.0.1.EL kernel and reboot to the new kernel.
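
On the disk servers that install step is then just a yum call along these lines (assuming the smp kernel flavour; adjust the package names to the flavours you actually run):

yum install kernel-smp.x86_64 kernel-smp-module-arcmsr-2.6.9-34.0.1.EL.x86_64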

Chris brew 17:19, 11 Jul 2006 (BST)

Testing these during installs and upgrades of the server, it turns out that there is now a yum helper that works out which extra kernel modules you have installed and upgrades them automatically when you upgrade the kernel. Unfortunately, it requires the rpm names to be of the form kernel-module-modulename-`uname -r`, i.e. kernel-module-arcmsr-2.6.9-34.0.1.ELsmp-1.20.00.07-1.x86_64.rpm rather than kernel-smp-module-arcmsr-2.6.9-34.0.1.EL-1.20.00.07-1.x86_64.rpm, and it requires the packages to Provide kernel-module.

I've rewritten the arcmsr spec file (with chunks ripped out of the xfs spec file) to change the naming convention and add the Provides.
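
For illustration, the shape of the change is roughly this (the macro names are my shorthand, not necessarily what the rewritten spec uses): the package Name carries the full `uname -r` string and the package explicitly Provides kernel-module.

%define kernel   2.6.9-34.0.1.EL
%define kvariant smp

Name:     kernel-module-arcmsr-%{kernel}%{kvariant}
Provides: kernel-module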

Now when I yum update the kernel it automatically pulls in the new versions of the areca and xfs kernel module rpms.

Chris brew 09:02, 13 Jul 2006 (BST)

Building xfs kernel modules

This one was fairly simple:

Install the source rpm from Scientific Linux

rpm -ivh ftp://ftp.scientificlinux.org/linux/scientific/42/x86_64/contrib/SRPMS/xfs/kernel-module-xfs-2.6.9-22.EL-0.1-1.src.rpm

and build it for each kernel version you require:

rpmbuild -ba kernel-module-xfs.spec --define "kernel_topdir /lib/modules/2.6.9-34.0.1.ELsmp/build"
rpmbuild -ba kernel-module-xfs.spec --define "kernel_topdir /lib/modules/2.6.9-34.0.1.EL/build"

Chris brew 17:19, 11 Jul 2006 (BST)

Installing and Testing dCache

Installed a basic version of dCache using YAIM with a cut-down copy of my normal site-info.def script that only supports dteam. I installed all the main services on heplnx173 along with 4 pools, and 4 pools on each of heplnx170-172.

Here are the commands used for the install (after configuring the raid partition):

scp heplnx173:/root/yaim-conf/* /root/yaim-conf/
mkdir /etc/grid-security/
mkdir /raid/data1 /raid/data2 /raid/data3 /raid/data4
openssl pkcs12 -in heplnx170-2007.p12 -clcerts -nokeys -out /etc/grid-security/hostcert.pem
openssl pkcs12 -in heplnx170-2007.p12 -nocerts -nodes -out /etc/grid-security/hostkey.pem
scp heplnx173:/etc/yum.repos.d/* /etc/yum.repos.d/
yum remove perl-Net-LDAP  perl-XML-SAX postgresql-libs.i386
yum install glite-SE_dcache lcg-CA
#/opt/glite/yaim/scripts/configure_node /root/yaim-conf/site-info.def SE_dcache # (I think this step failed and I cancelled it)
/opt/glite/yaim/scripts/run_function /root/yaim-conf/site-info.def config_sedcache
vi /raid/data?/pool/setup # (This was to set the "max diskspace" of the pool - see the note below)
service dcache-pool restart
service iptables stop
chkconfig iptables off
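
The vi step above just sets the pool size; the line being edited in each /raid/data?/pool/setup file is the max diskspace one, something like this (the size and units shown are illustrative, follow the format of the existing line):

set max diskspace 1500g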

I then added the head node into the bdii-update.conf file on my site bdii.
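
The exact format of the entries in bdii-update.conf is from memory, but it is roughly a label and an LDAP URL per line, along the lines of (the label is arbitrary and the hostname/port here are assumptions, check what the head node's info provider actually listens on):

RALPP-DCACHE-TEST ldap://heplnx173.<domain>:2135/mds-vo-name=local,o=grid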

Once the top-level BDIIs had updated I was able to use the lcg-* commands against the test setup. I then had the system added to the RAL FTS server so I could use that to test the setup.
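
For the lcg-* tests a copy in and back out is enough to exercise things, something like the following (the LFN and local file names are just examples, and the -d argument assumes the SE publishes itself under the head node's full hostname):

lcg-cr --vo dteam -d heplnx173.pp.rl.ac.uk -l lfn:/grid/dteam/ralpp-dcache-test file:/tmp/testfile
lcg-cp --vo dteam lfn:/grid/dteam/ralpp-dcache-test file:/tmp/testfile.back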

Using the filetransfer.py script I was able to transfer hundreds of gigabytes of files into and out of the test setup without more unexplained errors than usual, achieving rates of up to 800Mb/s on some tests.

On 29/09/2006 I had to decommission two of the pool nodes (heplnx170-1) to put them into service elsewhere. To do this without losing the data on those pools I used the copy manager to copy the files from the pools being removed to those remaining.

First I had to configure and start the copy manager. Following the instructions in the dCache book I created /opt/d-cache/config/copy.batch containing:

#
set printout default 3
set printout CellGlue none
onerror shutdown
#
check -strong setupFile
#
copy file:${setupFile} context:setupContext
#
#  import the variables into our $context.
#  don't overwrite already existing variables.
#
import context -c setupContext
#
#   Make sure we got what we need.
#
check -strong serviceLocatorHost serviceLocatorPort
#
create dmg.cells.services.RoutingManager  RoutingMgr
#
#   The LocationManager Part
#
create dmg.cells.services.LocationManager lm \
       "${serviceLocatorHost} ${serviceLocatorPort} "
#
#
#
create diskCacheV111.replicaManager.CopyManager copy0 \
       "default -export"
#

Then I initialised and started the copy manager:

cd /opt/d-cache/jobs
./initPackage.sh
/opt/d-cache/jobs/copy start
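
All of the pool, PoolManager and copy manager commands below are run from the dCache admin interface. As a rough sketch of getting there (port 22223 and the blowfish cipher are the usual dCache 1.x admin ssh defaults; the pool cell name is just an example):

ssh -c blowfish -p 22223 admin@heplnx173
(local) admin > cd heplnx171_1
(heplnx171_1) admin > ..
(local) admin > logoff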

Then for each destination pool I had to set:

pp set pnfs timeout 300
save

and in each source pool:

p2p set max active 10

Then from the poolManager I disabled each of the source pools with:

psu set disabled <poolname>

Then from the copyManager cell it was just a case of running:

copy <sourcePool> <destinationPool> -max=5

For each pair of pools, the info command lets you see the progress of the transfers and gives an average speed, which for these transfers was about 110MB/s, very close to line speed.

Finally I ran rep ls in some source/destination pairs just to check that the copies seemed to have worked before shutting down the dCache services on the nodes to be removed.