RALPP Work List Areca Disk Servers
SL4 Versions
We're running SL4 in 64-bit mode on these servers. We've tried SL42 and SL43: we can successfully install and configure SL42, but SL43 hangs during the install (after formatting the disks); this is the case under the 32-bit version as well.
Building Driver Disks
(Note: although this talks about driver disks we never actually use a floppy, only .img files accessed over NFS.)
You need to have the same version of the driver kernel modules as the install kernel of the version of SL you are using. Unfortunately the 64-bit versions of SL42 and SL43 use a different kernel version from the corresponding RHEL version, so the RHEL driver disk will not work as-is. It can however be used as a base.
To find out the install kernel version: start an install up to the point where it complains that it cannot find any disks; pressing [F2] at that point will drop you to a shell where you can run uname -a
So I downloaded the RHEL4u2 driver disk from Areca
unpacked the zip file:
unzip 1.20.0X.13Beta.zip
and then the install.zip inside that:
cd 1.20.0X.13
unzip install.zip
This got me to driver.img
I then mounted that:
mkdir /mnt/loop
mount -o loop ./driver.img /mnt/loop
ls /mnt/loop
modinfo  modules.cgz  modules.dep  patchinstall  pcitable  rhdd
It's the modules.cgz we want to alter, but first we need to unpack it:
mkdir modules
cd modules
gzip -dc /mnt/loop/modules.cgz > modules.cpio
cpio -idv < modules.cpio
ls
2.6.9-34.EL  2.6.9-34.ELhugemem  2.6.9-34.ELlargesmp  2.6.9-34.ELsmp
Now we want to add our kernel versions. Note you'll need drivers for both the install kernel and the installed kernel (the one it boots to after the install). For SL42 that is 2.6.9-22.0.1.EL (install) and 2.6.9-22.0.1.ELsmp (installed).
You can extract the arcmsr.ko files from rpms downloaded from bodgit-n-scraper, from rpms you've made yourself (next section), or build them from the Areca sources.
Extracting the arcmsr.ko file from an rpm
Use rpm2cpio:
mkdir /tmp/arcmsr
cd /tmp/arcmsr
rpm2cpio /path/to/rpm/kernel-module-arcmsr-2.6.9-22.0.1.EL-1.20.00.07-1.x86_64.rpm | cpio -d -i
rpm2cpio /path/to/rpm/kernel-smp-module-arcmsr-2.6.9-22.0.1.EL-1.20.00.07-1.x86_64.rpm | cpio -d -i
Back in the modules directory I created subdirectories for the kernel versions/architectures I'm interested in and copied the arcmsr.ko files in:
mkdir -p 2.6.9-22.0.1.EL/x86_64
cp /tmp/arcmsr/lib/modules/2.6.9-22.0.1.EL/kernel/drivers/scsi/arcmsr.ko 2.6.9-22.0.1.EL/x86_64
mkdir -p 2.6.9-22.0.1.ELsmp/x86_64
cp /tmp/arcmsr/lib/modules/2.6.9-22.0.1.ELsmp/kernel/drivers/scsi/arcmsr.ko 2.6.9-22.0.1.ELsmp/x86_64
Finally bundle it back up and put it back in the image:
find . -depth -print | cpio -ov -H crc | gzip -c9 > /mnt/loop/modules.cgz
umount /mnt/loop
I cannot see any reason why you could not keep doing this to add more and more versions as the install and installed kernels get updated.
Chris brew 17:19, 11 Jul 2006 (BST)
Building areca driver rpms
Got the Areca driver source rpm from bodgit-n-scraper
This required the kernel-devel rpms to be installed to build:
yum install kernel-devel.x86_64 kernel-smp-devel.x86_64
All that needed to be done then was to edit the kernel-module-arcmsr.spec file to change the kernel version:
...
%define kernel 2.6.9-34.0.1.EL
...
and run rpmbuild
rpmbuild -ba kernel-module-arcmsr.spec
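The spec edit itself is easy to script if you rebuild often; a sketch assuming the spec contains a single `%define kernel` line (the file below is a stand-in fragment, not the real spec):

```shell
# Stand-in spec fragment; the real kernel-module-arcmsr.spec has much more in it.
spec=$(mktemp)
printf '%%define kernel 2.6.9-34.EL\n' > "$spec"
# bump the kernel version the module will be built against
sed -i 's/^%define kernel .*/%define kernel 2.6.9-34.0.1.EL/' "$spec"
newkernel=$(awk '/^%define kernel/ {print $3}' "$spec")
echo "$newkernel"
rm -f "$spec"
```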
At this point I was able to copy:
kernel-module-arcmsr-2.6.9-34.0.1.EL-1.20.00.07-1.x86_64.rpm
kernel-smp-module-arcmsr-2.6.9-34.0.1.EL-1.20.00.07-1.x86_64.rpm
to our SL4 yum repository, install them along with the 2.6.9-34.0.1.EL kernel, and reboot into the new kernel.
Chris brew 17:19, 11 Jul 2006 (BST)
Testing these during installs and upgrades of the server, it turns out that there is now a yum helper that tries to work out what extra kernel modules you have installed and upgrade those automatically when you upgrade the kernel. Unfortunately, it requires the rpm names to be of the form kernel-module-modulename-`uname -r`, i.e. kernel-module-arcmsr-2.6.9-34.0.1.ELsmp-1.20.00.07-1.x86_64.rpm, not kernel-smp-module-arcmsr-2.6.9-34.0.1.EL-1.20.00.07-1.x86_64.rpm, and to Provide kernel-module.
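To make the difference concrete, the expected name can be assembled from the module name and the running kernel version; the values below are hard-coded stand-ins for what `uname -r` and the driver version would supply:

```shell
# The helper wants kernel-module-<name>-<`uname -r`>-<version>-<release>.<arch>.rpm:
# the smp flavour lives in the kernel version string, not in an -smp- infix.
module=arcmsr
kver=2.6.9-34.0.1.ELsmp   # stand-in for `uname -r`
drvver=1.20.00.07
release=1
arch=x86_64
wanted="kernel-module-${module}-${kver}-${drvver}-${release}.${arch}.rpm"
echo "$wanted"
```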
I've rewritten the arcmsr spec file (with chunks ripped out of the xfs spec file) to change the naming convention and add the Provides.
Now when I yum update
the kernel it automatically pulls in the new versions of the areca
and xfs
kernel module rpms.
Chris brew 09:02, 13 Jul 2006 (BST)
Building xfs kernel modules
This one was fairly simple:
Install the source rpm from Scientific Linux
rpm -ivh ftp://ftp.scientificlinux.org/linux/scientific/42/x86_64/contrib/SRPMS/xfs/kernel-module-xfs-2.6.9-22.EL-0.1-1.src.rpm
and build it for each kernel version you require:
rpmbuild -ba kernel-module-xfs.spec --define "kernel_topdir /lib/modules/2.6.9-34.0.1.ELsmp/build"
rpmbuild -ba kernel-module-xfs.spec --define "kernel_topdir /lib/modules/2.6.9-34.0.1.EL/build"
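With more kernel versions installed, the two rpmbuild lines generalise to a loop; a sketch with the commands echoed rather than executed, so it is safe to run without rpmbuild present:

```shell
# Build the xfs module rpm for each installed kernel flavour.
count=0
for kver in 2.6.9-34.0.1.EL 2.6.9-34.0.1.ELsmp; do
  echo rpmbuild -ba kernel-module-xfs.spec --define "kernel_topdir /lib/modules/$kver/build"
  count=$((count + 1))
done
echo "$count builds"
```

Dropping the echo turns the sketch into the real build loop.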
Chris brew 17:19, 11 Jul 2006 (BST)
Installing and Testing dCache
I installed a basic version of dCache using YAIM, with a cut-down copy of my normal site-info.def script that only supports dteam. I installed all the main services on heplnx173 along with 4 pools, and 4 more pools on each of heplnx170-172.
Here are the commands used for the install (after configuring the raid partition):
scp heplnx173:/root/yaim-conf/* /root/yaim-conf/
mkdir /etc/grid-security/
mkdir /raid/data1 /raid/data2 /raid/data3 /raid/data4
openssl pkcs12 -in heplnx170-2007.p12 -clcerts -nokeys -out /etc/grid-security/hostcert.pem
openssl pkcs12 -in heplnx170-2007.p12 -nocerts -nodes -out /etc/grid-security/hostkey.pem
scp heplnx173:/etc/yum.repos.d/* /etc/yum.repos.d/
yum remove perl-Net-LDAP perl-XML-SAX postgresql-libs.i386
yum install glite-SE_dcache lcg-CA
#/opt/glite/yaim/scripts/configure_node /root/yaim-conf/site-info.def SE_dcache
# (I think this step failed and I cancelled it)
/opt/glite/yaim/scripts/run_function /root/yaim-conf/site-info.def config_sedcache
vi /raid/data?/pool/setup
# (This was to set the "max diskspace" of the pool)
service dcache-pool restart
service iptables stop
chkconfig iptables off
I then added the head node into the bdii-update.conf file on my site bdii.
Once the top-level BDIIs had updated I was able to use the lcg-* commands against the test setup. I then had the system added to the RAL FTS server so I could use that to test the setup.
Using the filetransfer.py script I was able to transfer 100s of gigabytes of files into and out of the test setup without more unexplained errors than usual, achieving rates of up to 800Mb/s on some tests.
On 29/09/2006 I had to decommission two of the pool nodes (heplnx170-1) to put them into service elsewhere. To do this without losing the data on those pools I used the copy manager to copy the files from the pools to be removed to those remaining.
First I had to configure and start the copy manager. Following the instructions in the dCache book I created /opt/d-cache/config/copy.batch containing:
#
set printout default 3
set printout CellGlue none
onerror shutdown
#
check -strong setupFile
#
copy file:${setupFile} context:setupContext
#
# import the variables into our $context.
# don't overwrite already existing variables.
#
import context -c setupContext
#
# Make sure we got what we need.
#
check -strong serviceLocatorHost serviceLocatorPort
#
create dmg.cells.services.RoutingManager RoutingMgr
#
# The LocationManager Part
#
create dmg.cells.services.LocationManager lm \
    "${serviceLocatorHost} ${serviceLocatorPort}"
#
#
create diskCacheV111.replicaManager.CopyManager copy0 \
    "default -export"
#
Then I initialised and started the copy manager:
cd /opt/d-cache/jobs
./initPackage.sh
/opt/d-cache/jobs/copy start
Then for each destination pool I had to set:
pp set pnfs timeout 300
save
and in each source pool:
p2p set max active 10
Then from the poolManager I disabled each of the source pools with:
psu set disabled <poolname>
Then from the copyManager cell it was just a case of running:
copy <sourcePool> <destinationPool> -max=5
for each pair of pools. The info command lets you see the progress of the transfers and gives you an average speed, which for these transfers was about 110MB/s - very close to line speed.
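As a rough check on "close to line speed" (assuming a gigabit link, which the text doesn't state explicitly): 1000 Mb/s is 125 MB/s before protocol overhead, so 110 MB/s is about 88% of the raw link rate:

```shell
# Back-of-envelope: gigabit ethernet = 1000 Mb/s = 125 MB/s ceiling,
# ignoring TCP/ethernet overhead; observed transfer rate was ~110 MB/s.
line_mb=125
observed_mb=110
pct=$((observed_mb * 100 / line_mb))
echo "${pct}% of raw line rate"
```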
Finally I ran rep ls
in some source/destination pairs just to check that the copies seemed to have worked before shutting down the dCache services on the nodes to be removed.