= Glasgow full shut down procedure =

== Controlled shutdown of some nodes (caused by environmental limitations / scheduled upgrades etc) ==
Log in to svr016 and mark the worker nodes concerned as '''offline'''.

This can be simple (for one or two hosts)
 svr016 #> pbsnodes -o node008 node140
Or slightly more complex
 svr016 #> for i in 01 03 05 07 09 11 13 15 17 19 21 23 25 27 29 31 33 35 ; do echo -n " node0$i" ; done | xargs pbsnodes -o

As we run long jobs (up to 7 days), the worker nodes may take a long time to drain. Follow the [http://svr031.gla.scotgrid.ac.uk/ganglia/ Ganglia] plots to see when the nodes go idle.
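Alternatively, a minimal sketch for spotting from the command line which offline nodes have finished draining (an assumption on my part that torque's standard <tt>pbsnodes</tt> output is in use; adapt as needed):
<pre>
#!/bin/bash
# list offline worker nodes that no longer report any running jobs
for node in $(pbsnodes -l | awk '/offline/ {print $1}'); do
    if ! pbsnodes "$node" | grep -q 'jobs = '; then
        echo "$node is drained"
    fi
done
</pre>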
Once that's done you can shut the nodes down cleanly with the '''poweroff''' command on the nodes themselves.
 pdsh -w node008,node140 poweroff
Please note that to power a node back on from this state you'll either need to press the power button on the front of the box itself, or cycle the power (the nodes should be set to come on automatically after power loss, which means we can control them via the APC masterswitches).
<br />
== Urgent clean shutdown (minor panic) ==
Log into svr031. Power off all nodes (clean shutdown):
 pdsh -w node[001-140] poweroff
Once they're down you can shut off the power to them:
 powernode --host=node001-140 --off
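Before that <tt>powernode</tt> step you can double-check that everything has really halted; a rough sketch (this assumes the nodes keep answering ping until they have shut down):
<pre>
for i in $(seq -w 1 140); do
    ping -c1 -W1 node$i >/dev/null 2>&1 && echo "node$i still up"
done
</pre>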
Then take down the servers / DPMdisks / NFSdisk / headnode.
'''FIXME - Any preferred order?'''
 pdsh -w svr0[16-30] poweroff
 pdsh -w disk0[32-41] -x disk037 poweroff
 pdsh -w disk037 poweroff
 powernode --host=... --off
Finally, kill svr031 itself once you've checked that all the lights are off on the machines.
FIXME - What about the Nortel switch?
 poweroff
It should all now be nice and quiet.
<br />
== Very Urgent shutdown ==
The big red button is located by the aircon unit at the back of the room.

= Xen =

[[Category:Virtualisation]]
<br />
=About=<br />
<br />
Xen (http://www.cl.cam.ac.uk/Research/SRG/netos/xen/) is a virtual
machine monitor (VMM) for x86-compatible computers. Xen can
securely execute multiple virtual machines, each running its own
OS, on a single physical system with close-to-native performance.
The VMM itself runs in what is often referred to as Domain 0;
virtual machines (VMs) run in other domains.
See also [[Xen-strap]].
<br />
=Prerequisites=<br />
==Documentation==<br />
<tt>README</tt> file in<br />
<br />
<tt>xen-3.0.2-src.tgz</tt><br />
or<br />
<tt>xen-3.0.2-install-x86_32.tgz</tt><br />
<br />
Xen 3.0 User Manual<br />
http://www.cl.cam.ac.uk/Research/SRG/netos/xen/readmes/user.pdf<br />
<br />
==Hardware==<br />
A reasonably powerful box. I'm running Xen on:<br />
<br />
* 2 processor Pentium III (Katmai) 600MHz, 512MB RAM.<br />
* 1 processor Intel(R) Pentium(R) 4 CPU 1.80GHz, 512MB RAM.<br />
* 1 processor Intel(R) Pentium(R) 4 CPU 3.00GHz, 4GB RAM.<br />
* 2 processor Intel(R) Xeon(TM) CPU 2.80GHz, 4GB RAM.<br />
* 1 processor Intel(R) Pentium(R) M 1.4GHz, 2GB RAM (IBM Thinkpad X31 laptop).<br />
<br />
==Software==<br />
===OS===<br />
A Linux distribution. I chose Debian stable 3.1 as I was having<br />
difficulties with Xen domain 0 on Scientific Linux.<br />
<br />
[[User:Andrew_elwell]] also has Xen 3.1 running on Ubuntu 7.10.
<br />
===Xen===<br />
I chose to install Xen from a binary distribution<br />
(<tt>xen-3.0.2-install-x86_32.tgz</tt>), but compiled my Xen <br />
kernels from source (<tt>xen-3.0.2-src.tgz</tt>). Another possibility would be<br />
to use <tt>-xen</tt> modules from the binary distribution<br />
for both dom0 and domU. If you do that, don't forget to generate an initrd and add it to
your boot loader.
<br />
Get both of these packages at http://www.xensource.com/xen/downloads/<br />
<br />
=Installation=<br />
<br />
==Xen 3.0.2 Installation==<br />
# xr=<temporary_xen_installation_root><br />
# mkdir -p $xr<br />
# tar zxvf xen-3.0.2-install-x86_32.tgz -C $xr<br />
# tar zxvf xen-3.0.2-src.tgz -C $xr<br />
<br />
Read <tt>$xr/xen-3.0.2-2-install/README</tt>.<br />
In particular, you'll need to install <tt>bridge-utils</tt> and <tt>iproute</tt> packages.<br />
<br />
# $xr/xen-3.0.2-2-install/install.sh<br />
<br />
The script performs a few checks and installs Xen 3.0.2<br />
(unpackaged) on your Linux box. The install script can be<br />
run several times (if you find it fails for some reason). It must exit 0 (all done).<br />
<br />
==Xen 3.0.2 kernel compilation==<br />
Kernel configuration files are stored in <tt>$xr/xen-3.0.2-2/buildconfigs</tt><br />
<br />
I've customised these: <br />
<tt>$xr/xen-3.0.2-2/buildconfigs/{linux-defconfig_xen0_x86_32,linux-defconfig_xenU_x86_32}</tt><br />
<br />
In particular, I've added support for IP tables and my network card.<br />
<br />
# cd $xr/xen-3.0.2-2/ && KERNELS="linux-2.6-xen0 linux-2.6-xenU" make world<br />
<br />
After compilation copy Xen kernels<br />
<br />
# mkdir -p /boot/kernel/xen-3.0.2<br />
# cp -a $xr/xen-3.0.2-2/dist/install/boot/* /boot/kernel/xen-3.0.2<br />
<br />
and kernel modules<br />
# cp -ra $xr/xen-3.0.2-2/dist/install/lib/modules /lib<br />
<br />
on your Domain 0 box. You'll also need to copy the
2.6.16-xenU modules onto your Domain U (VM) boxes.
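For example, copying them to a running VM (<tt>domU-host</tt> here is just a hypothetical host name; you could equally copy the directory into the mounted VM image):
 # scp -r /lib/modules/2.6.16-xenU domU-host:/lib/modules/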
<br />
==Image managers==<br />
After some time playing with Xen, you'll realise you need an OS image manager of some sort to help
you install and archive VMs.
I'm sure you'll find many of them around, but you'll probably be better off writing your own to suit
your specific needs. I've written a simple dialog-based GUI Xen Image Manager (XIM).
See also https://www.gridpp.ac.uk/wiki/Xen-strap and http://www.gridpp.rl.ac.uk/pps/xen-strap/
<br />
=Configuration=<br />
<br />
==Domain 0==<br />
<br />
===Grub===<br />
title Xen 3.0 (DEB-31)<br />
kernel /boot/kernel/xen-3.0.2/xen-3.0.gz dom0_mem=131072 console=vga<br />
module /boot/kernel/xen-3.0.2/vmlinuz-2.6.16-xen0 root=/dev/sda1 ro console=tty0<br />
<br />
You may need to change <tt>/dev/sda1</tt>; it is my Xen Domain 0 root partition.
<br />
===System configuration===<br />
Add <tt>/etc/init.d/xend</tt> to your init scripts.<br />
<br />
You may also want to add a simple script to start all your virtual domains after your Domain 0 reboots.<br />
<br />
 for b in /etc/xen/*.rl.ac.uk
 do
     xm create $b
 done
<br />
===Sample configuration file===<br />
<br />
# cat /etc/xen/grumpy.esc.rl.ac.uk<br />
name="grumpy.esc.rl.ac.uk"<br />
memory=384<br />
kernel="/boot/kernel/xen-3.0.2/vmlinuz-2.6-xenU"<br />
disk=['phy:sda5,hda2,w','file:/mnt/sda13/swap0,hda13,w']<br />
root="/dev/hda2 ro"<br />
vif=['mac=aa:00:00:77:ca:8f']<br />
ip="130.246.76.119"<br />
netmask="255.255.255.0"<br />
gateway="130.246.76.254"<br />
hostname="grumpy.esc.rl.ac.uk"<br />
restart='onreboot'<br />
extra="4"<br />
vnc=0<br />
<br />
==Domain U==<br />
<br />
===Creating custom Xen images===<br />
Make sure that:<br />
* <tt>/dev/*</tt> contains device files, especially <tt>/dev/console</tt><br />
* you edit <tt>/etc/fstab</tt><br />
* you do <tt>mv /lib/tls /lib/tls.disabled</tt><br />
* you have <tt>/lib/modules/2.6.16-xenU</tt><br />
<br />
See also<br />
https://www.gridpp.ac.uk/wiki/Xen-strap<br />
http://www.gridpp.rl.ac.uk/pps/xen-strap/<br />
<br />
=Testing=<br />
<br />
==ttylinux==<br />
<br />
Start xend<br />
<br />
# xend start<br />
<br />
Make sure it worked:<br />
<br />
# xm list<br />
Name ID Mem(MiB) VCPUs State Time(s)<br />
Domain-0 0 128 1 r----- 734.5<br />
<br />
Get, unpack and modify ttylinux (or use https://www.gridpp.ac.uk/wiki/Xen-strap alternatively for other images)<br />
<br />
# wget http://www.minimalinux.org/ttylinux/packages/ttylinux-5.0.tar.gz<br />
# tar zxvf ./ttylinux-5.0.tar.gz -C /boot ttylinux-5.0/rootfs.gz<br />
# cd /boot/ttylinux-5.0<br />
# gzip -d rootfs.gz<br />
# mkdir -p mnt<br />
# mount -o loop rootfs mnt<br />
# mv mnt/etc/fstab mnt/etc/fstab.orig<br />
# sed 's|^/dev/ram0|/dev/sda1|' < mnt/etc/fstab.orig > mnt/etc/fstab<br />
# umount mnt<br />
# rmdir mnt<br />
<br />
Create a Xen Domain U configuration file <br />
<br />
# cat >/etc/xen/tty<<EOF<br />
kernel="/boot/kernel/xen-3.0.2/vmlinuz-2.6-xenU"<br />
memory=64<br />
name="ttylinux"<br />
disk=['file:/boot/ttylinux-5.0/rootfs,sda1,w']<br />
dhcp="dhcp"<br />
root="/dev/sda1 ro"<br />
extra="4"<br />
EOF<br />
<br />
Start tty Domain U (VM) with attached console<br />
<br />
# xm create tty -c<br />
<br />
Login as <tt>root</tt>, password <tt>root</tt>.<br />
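When you're finished, you can check the domain and shut it down again from Domain 0 with the standard <tt>xm</tt> subcommands (<tt>xm destroy</tt> is the hard power-off if a clean shutdown hangs):
 # xm list
 # xm shutdown ttylinux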
<br />
==OS tested and working on Xen 3.0.2==<br />
<br />
* CentOS 4.2<br />
* Debian Linux 3.1 (stable), testing<br />
* Fedora Core 4, 5<br />
* NetBSD 3.0, 3.0.1<br />
* OpenSolaris 5.11<br />
* Scientific Linux 3.0.5, 3.0.7<br />
* Scientific Linux 4.3<br />
* Scientific Linux CERN 3.0.6<br />
* Scientific Linux CERN 4.3<br />
* SuSE 9.3 Enterprise Server, Professional; 10.1 (open)
* Ubuntu Linux 5.10, 6.06<br />
* WhiteBox EL 4<br />
<br />
=Gotchas=<br />
As far as I know Xen 3.0.2 (in contrast to 2.0.7) works only<br />
with 2.6.x Linux kernels. If you want to use SL 3.0.x,<br />
you'll need modutils that support 2.6.x kernels, for example <br />
from SL 4.3.

= Glasgow New Cluster Installer =

==Overview==
<br />
The installer is based on ''kickstart'', which installs the base RPM set. After this install is done, a postbootinstaller forces ssh keys (and any other secrets) onto the host.<br />
<br />
At this point [[cfengine]] is started (from rc.local) and configures the node properly.<br />
<br />
After the node is installed [[cfengine]] continues to run (once an hour), so that any updates to the system are patched in quickly.<br />
<br />
===Summary===<br />
<br />
====Prolog====<br />
<br />
Make sure the node is known to [[:Category:YPF | YAIM People's Front]], by [[Glasgow_Cluster_New_Host | adding its MAC/IP addresses]], etc.
<br />
These steps only need to be done once.
<br />
====Main Sequence====<br />
# Use the <tt>setboot</tt> utility to set the correct PXE boot for the host, e.g., <tt>sl-4x-x86_64-eth0-ks</tt><br />
## Note the coded boot image here: Scientific Linux (sl), version 4x, architecture x86_64, install over eth0, with kickstart (ks).<br />
## For a list of boot images see the <tt>/usr/local/ypf/tftp/pxelinux.cfg</tt> directory.<br />
# Check the contents of <tt>/usr/local/ypf/www/classes.conf</tt> which determine the kickstart file.<br />
## Currently for workernodes this should be <tt>nodeXXX: sl-4x-x86_64 eth0 yumsl4x compat32 cfengine wn</tt><br />
## Note the correspondence with the boot image naming scheme above.<br />
## Unless a node is completely new this file should in fact be ok.<br />
# Allow the node to recover secrets<br />
## Use <tt>allowsecret --host=nodeXXX-nodeYYY</tt><br />
# Reboot the node(s), either by hand or with <tt>nodepower --reboot --host=...</tt>.
# Nodes will PXE boot and kickstart themselves
## As part of the kickstart process the nodes do a <tt>yum update</tt>, so they will reboot fully patched.<br />
# Upon reboot, a node will signal its first boot by writing a file into <tt>/usr/local/ypf/var/firstboot/NODENAME</tt><br />
# The <tt>firstbootwatcher</tt> script looks for hostnames in this file, and if they are authorised, pushes the ssh keys, cfengine keys and grid certificates to the host.<br />
# [[cfengine]] starts and configures the node for use.<br />
## cfengine will not start if the node doesn't have its correct ssh host keys (this is a proxy for having been granted secrets).
## If everything looks ok after YAIM has been run, then pbs_mom is started to join the batch system.<br />
<br />
Note that if WNs are taken out of the batch system then they are probably marked as <tt>offline</tt> in torque, so use <tt>pbsnodes -c NODE</tt> to clear the offline status.
<br />
===Example===<br />
<br />
Two worker nodes, <tt>node013</tt> and <tt>node088</tt> have been repaired and need to be rebuilt and brought back into service:<br />
<br />
<pre><br />
svr031# setboot --verbose --image=sl-30x-i386-eth1-ks --host=node013,node088<br />
svr031# allowsecret --verbose --host=node013,node088<br />
svr031# ppoweroff -n node013,node088<br />
svr031# ppoweron -n node013,node088<br />
[Drink coffee ~10 min]<br />
</pre><br />
<br />
If this works the nodes should now appear in ganglia and in torque. If pbs_mom was started on the nodes then YAIM ran correctly, so all is well. If the nodes were marked offline then on <tt>svr016</tt> do <tt>pbsnodes -c node013 node088</tt>.<br />
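A quick way to confirm this from the batch system side (standard torque <tt>pbsnodes</tt> output; the exact state strings are an assumption):
 svr016# pbsnodes node013 node088 | grep state
Nodes reporting <tt>state = free</tt> are back in service; <tt>offline</tt> means they still need clearing as above.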
<br />
[[Category: ScotGrid]]

= Glasgow Add New Local User =

Once a local user (N.B. local in this sense can mean any user from a ScotGrid partner site) has been granted access on the cluster (''important'': they must have agreed to the AUP), they need to send their DN and be identified with a local group (for their torque queue).
<br />
1) Log into svr031 and cd to <tt>/usr/local/ypf/private/users/</tt>
<br />
Then:<br />
<br />
# Append their '''DN''', next available '''username''' (format glaNNN), [optional physics UID - not yet implemented], '''VO name''' [, optional secondary VOs] to <tt>master-mapfile</tt>
# <tt>make</tt><br />
## This will rebuild the passwd, group, shadow and grid-mapfile-local files - but it does not yet install them. '''Check for sanity and investigate all errors!''':<br />
## Run <tt>make diff</tt> to print the differences between current live and new files<br />
# <tt>make install</tt><br />
## This will first take a backup of the current files into <tt>/usr/local/ypf/private/users/backup</tt>, then it copies in the new files.
## In the most pathological of cases the rollout across the cluster will take an hour<br />
<br />
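Putting those steps together, a typical session looks roughly like this (the new mapfile entry itself is not shown - copy the field layout from the existing lines in <tt>master-mapfile</tt>):
 cd /usr/local/ypf/private/users/
 vi master-mapfile    # append the new DN / username / VO entry
 make                 # regenerates passwd, group, shadow and grid-mapfile-local
 make diff            # compare the live files against the newly generated ones
 make install         # back up the live files, then copy in the new ones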
<br />
If the ./genaccts.pl script throws a warning
 WARNING - cannot find GID for group GROUPNAME - Check group.base
then you'll have to add the new group to the group.base file. For most users this will be the group of their corresponding VO; however, there are a few purely local groups which can be used for globus job submission. These start with "gl" plus an abbreviation corresponding to the research area, e.g. "glee" = "Glasgow Electrical Engineering".
<br />
It will take cfengine 60 minutes (worst case scenario) to roll these changes out across the cluster.<br />
<br />
Note that this will give the user access via gsissh to the local account. If they submit a job using edg-job-submit and a vanilla proxy they will also map to this account and group in the batch system. If they use a VOMS proxy then they will get mapped to a different VO pool account. VOMS is preferred in the long run, but if a user needs write access to their file area then they should submit the job using a vanilla proxy.<br />
<br />
N.B. If you are starting a new group then you will need to setup a torque queue and maui fairshare: [[Glasgow Cluster Adding Queues]].<br />
<br />
Durham users get durXXX accounts, but otherwise everything is as above. The naming is completely arbitrary, of course, but an institute code seems sensible.<br />
<br />
===Mailing List===<br />
<br />
Subscribe the user to the [http://www.physics.gla.ac.uk/mailman/listinfo/guscotgrid-users GUScotGrid Users] mailing list.<br />
<br />
[[Category: ScotGrid]] [[Category: Glasgow]]

= Transfer Test Python Script Development =

==Source Code==
<br />
===Host===<br />
<br />
The source code for the python script is held in subversion on <tt>grid01.ph.gla.ac.uk</tt>. You will need an account on this machine in order to work with the source code - there's no anonymous access presently. Mail [[User:Graeme stewart]] for an account.<br />
<br />
===Path===<br />
<br />
The code is in [[User:Graeme stewart|Graeme's]] subversion tree:<br />
<br />
svn+ssh://grid01.ph.gla.ac.uk/home/graeme/SVN/lcg/scripts/filetransfer<br />
<br />
You need to check out the code with <tt>svn co</tt>:<br />
<br />
$ svn co svn+ssh://grid01.ph.gla.ac.uk/home/graeme/SVN/lcg/scripts/filetransfer<br />
<br />
As of revision 296 this has been branched off for FTS 2 compatibility (branches/FTS2) and patched, thanks to Gavin.
<br />
==Building RPMS==<br />
<br />
The <tt>Makefile</tt> contains a target for building the package's RPM. However, it assumes that you have an RPM build tree in <tt>$HOME/rpm</tt>. If this is not the case then make a link to your real RPM build area.
<br />
If you do have an RPM build tree here, then you probably need a <tt>.rpmmacros</tt> file with a line in it like<br />
<br />
%_topdir /home/graeme/rpm<br />
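If you don't have a build tree at all yet, a minimal one (standard rpmbuild layout; adjust the path to taste) can be set up with:
 mkdir -p ~/rpm/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
 echo "%_topdir $HOME/rpm" > ~/.rpmmacros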
<br />
==Checking In==<br />
<br />
If you check code back in, please write a suitable log message. Thanks!<br />
<br />
[[Category: Transfer Test Script]]

= Category:Durham =

[[category:ScotGrid]]

= Durham =

The University of Durham's [http://www.ippp.dur.ac.uk Institute of Particle Physics Phenomenology] is a member of ScotGrid.
<br />
* [[Durham SC4]]<br />
<br />
[[Category:ScotGrid]]<br />
[[category:Durham]]

= Category:Edinburgh =

[[category:ScotGrid]]

= Edinburgh =

The Physics Department of the [http://www.ph.ed.ac.uk/particle/Exp/GRID/ University of Edinburgh] is a member of [[ScotGrid]].
<br />
== SRM setup ==<br />
<br />
* [[Edinburgh dCache troubleshooting]]<br />
<br />
* [[Edinburgh dCache Setup]]<br />
<br />
* [[Edinburgh DPM Setup]]<br />
<br />
* [[Site SRM Setup Template]]<br />
<br />
== SC4 FTS test ==<br />
<br />
* [[Edinburgh_SC4|Edinburgh's logbook for SC4]]<br />
* [[FTS_vs_srmcp|Comparison with srmcp]]<br />
<br />
== Filesystem Testing ==<br />
<br />
* [[Edinburgh_NFS_Tests|NFS]]<br />
* [[Ed_RAID_Tests|RAID]]<br />
<br />
[[Category:ScotGrid]]<br />
[[category:Edinburgh]]

= Template:Underrevision =

<div class="messagebox under revision" style="border:1px solid orange;background-color:#FFFACD;padding:7px;">This page is being revised by {{{1}}}.</div>

= Nagios jabber notification =

[[Nagios]] can be configured to send notifications by various means, including '''Jabber''', which these days includes Google Talk's XMPP service.
<br />
See the [http://scotgrid.blogspot.com/2007/05/jabber-dabba-do.html ScotGrid Blog entry] for an overview. Configuration for nagios was as follows:<br />
<br />
appended into ''commands.cfg'' (where I keep the notification methods)<br />
<pre>
# 'host-notify-by-jabber' command definition
define command{
        command_name    host-notify-by-jabber
        command_line    /usr/local/bin/notify_via_jabber $CONTACTPAGER$ "Host '$HOSTALIAS$' is $HOSTSTATE$ - Info: $HOSTOUTPUT$"
}

# 'notify-by-jabber' command definition
define command{
        command_name    notify-by-jabber
        command_line    /usr/local/bin/notify_via_jabber $CONTACTPAGER$ "$NOTIFICATIONTYPE$ $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$ $SERVICEOUTPUT$ $LONGDATETIME$"
}
</pre>
<br />
The '''notify_via_jabber''' script is as follows:<br />
<pre><br />
#!/usr/bin/perl -w<br />
#<br />
# script for nagios notify via Jabber / Google Talk Instant Messaging<br />
# using XMPP protocol and SASL PLAIN authentication.<br />
#<br />
# author: Andrew Elwell <A.Elwell@physics.gla.ac.uk><br />
# based on work by Thus0 <Thus0@free.fr> and David Cox<br />
#<br />
# released under the terms of the GNU General Public License v2<br />
# Copyright 2007 Andrew Elwell.<br />
<br />
use strict;<br />
use Net::XMPP;<br />
<br />
## Configuration<br />
my $username = "your.google.username";<br />
my $password = "your.google.password";<br />
my $resource = "nagios";<br />
## End of configuration<br />
<br />
<br />
my $len = scalar @ARGV;<br />
if ($len ne 2) {<br />
die "Usage...\n $0 [jabberid] [message]\n";<br />
}<br />
my @field=split(/,/,$ARGV[0]);<br />
#------------------------------------<br />
<br />
# Google Talk & Jabber parameters :<br />
<br />
my $hostname = 'talk.google.com';<br />
my $port = 5222;<br />
my $componentname = 'gmail.com';<br />
my $connectiontype = 'tcpip';<br />
my $tls = 1;<br />
<br />
#------------------------------------<br />
<br />
my $Connection = new Net::XMPP::Client();<br />
<br />
# Connect to talk.google.com<br />
my $status = $Connection->Connect(<br />
hostname => $hostname, port => $port,<br />
componentname => $componentname,<br />
connectiontype => $connectiontype, tls => $tls);<br />
<br />
if (!(defined($status))) {<br />
print "ERROR: XMPP connection failed.\n";<br />
print " ($!)\n";<br />
exit(0);<br />
}<br />
<br />
# Change hostname<br />
my $sid = $Connection->{SESSION}->{id};<br />
$Connection->{STREAM}->{SIDS}->{$sid}->{hostname} = $componentname;<br />
<br />
# Authenticate<br />
my @result = $Connection->AuthSend(<br />
username => $username, password => $password,<br />
resource => $resource);<br />
<br />
if ($result[0] ne "ok") {<br />
print "ERROR: Authorization failed: $result[0] - $result[1]\n";<br />
exit(0);<br />
}<br />
<br />
# Send messages<br />
foreach ( @field ) {<br />
$Connection->MessageSend(<br />
to => "$_\@$componentname", <br />
resource => $resource,<br />
subject => "Notification",<br />
type => "chat",<br />
body => $ARGV[1]);<br />
}<br />
</pre><br />
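To test the script outside Nagios, call it by hand: the first argument is the Google account name (the part before @gmail.com; several can be given comma-separated) and the second is the message, e.g.
 /usr/local/bin/notify_via_jabber some.username "test message from nagios"
Since both command definitions pass <tt>$CONTACTPAGER$</tt> as that first argument, the matching contact definition in Nagios should carry the Google username(s) in its <tt>pager</tt> field.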
<br />
[[category:Nagios]]

= Glasgow Cluster YPF Adding A New Host =

To add a new machine to the cluster:
<br />
* Ensure that the machine will netboot from the interface connected to the ''internal'' network. This is the only interface where DHCP and BOOTP will work.
* Add the new machine to, at minimum, the hosts database. At the moment this is done with the <tt>clusterdb SQL_QUERY</tt> command, e.g.,<br />
<br />
clusterdb 'insert into hosts values("svr024.beowulf.cluster","00:30:48:42:95:1A","10.141.255.24");'<br />
clusterdb 'insert into hosts values("svr024.gla.scotgrid.ac.uk","00:30:48:42:95:1A","130.209.239.24");'<br />
<br />
If the host is associated with APC ports or network ports, add entries in these db tables as well.<br />
<br />
Note the SQL query has to be a single argument and hostnames are FQDNs. ''This is terribly crude right now - should have much better (and less dangerous) utilities.''<br />
<br />
An alternative is to populate the file '''/home/alt/etc/extra-hosts''' with the IP, hostnames and MAC address of new hosts, i.e.:
10.141.255.26 svr026.beowulf.cluster 00:30:48:42:9C:3C<br />
130.209.239.26 svr026.gla.scotgrid.ac.uk 00:30:48:42:9C:3D<br />
10.141.255.27 svr027.beowulf.cluster 00:30:48:42:C8:24<br />
130.209.239.27 svr027.gla.scotgrid.ac.uk 00:30:48:42:C8:25<br />
and use the script '''extrahosts2clusterdb'''. When '''mkhosts''' runs it will munge the <tt>.beowulf.cluster</tt> and add a plain hostname alias.<br />
<br />
* Now regenerate the hosts tables for the cluster:<br />
<br />
# mkhosts > /var/cfengine/inputs/skel/common/etc/hosts<br />
# cp /var/cfengine/inputs/skel/common/etc/hosts /etc/hosts # N.B. Temporary until svr031 uses cfengine<br />
<br />
Send a HUP to dnsmasq on svr031 so the local DNS gets reloaded.<br />
<br />
''When svr031 uses cfengine, then do a <tt>cfagent -qv</tt>. This will HUP dnsmasq automatically.''<br />
<br />
* And also regenerate the dhcp configuration:<br />
<br />
# mkdhcpdconf > /etc/dhcpd.conf<br />
<br />
Restart dhcpd. <br />
<br />
''A better script could do this for you.''
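On svr031 that amounts to something like the following (assuming the usual SysV init service wrapper is in place):
 service dhcpd restart
 kill -HUP $(pidof dnsmasq)
The second line is the dnsmasq reload mentioned in the previous step.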
<br />
* Generate an appropriate ssh key for the new machine using the <tt>gensshkey</tt> command (N.B. here use the ''short'' hostname).
<br />
# gensshkey svr024<br />
<br />
''At the moment this is stored in a directory tree, but the in clusterdb would be much better.''<br />
<br />
* Regenerate the <tt>ssh_known_hosts</tt> and <tt>shosts.equiv</tt>.<br />
<br />
# cd /home/alt/private/key<br />
# genknownhosts *<br />
<br />
''This is rather crap - much better if ssh keys were properly in clusterdb and knownhosts generated directly from here. Also note the flakiness in working out if a host has a routed address.''<br />
<br />
NB: ''Until svr031 is managed by cfengine you need to copy /var/cfengine/inputs/skel/common/etc/ssh/ssh_known_hosts to /etc/ssh''<br />
<br />
* Generate a cfengine keypair for the new host.<br>''No easy way to do this right now - cfengine craply always tries to write to /var/cfengine/ppkeys/localhost.{pub,priv}. Doing it as a non-root user writes, more helpfully, to $HOME/.cfagent/ppkeys. These can then be copied to /home/alt/private/cfengine/HOST. Then copy the localhost.pub key to /var/cfengine/ppkeys/root-10.141.XXX.YYY.pub... As with ssh keys, the cfengine key pairs would be better stored in the clusterdb.''<br />
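A minimal sketch of that workaround (assuming cfengine 2's <tt>cfkey</tt> utility; HOST and the 10.141.XXX.YYY address are placeholders as above):
 cfkey                                  # run as a non-root user, so the keys land in $HOME/.cfagent/ppkeys
 mkdir -p /home/alt/private/cfengine/HOST
 cp ~/.cfagent/ppkeys/localhost.* /home/alt/private/cfengine/HOST/
 cp ~/.cfagent/ppkeys/localhost.pub /var/cfengine/ppkeys/root-10.141.XXX.YYY.pub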
<br />
* If the machine is a server, unpack the host certificate into /home/alt/private/cert/HOST/host{cert,key}.pem. The little script <tt>unpackcerts</tt> might be useful...<br />
<br />
* Now the host is ready for installation: move to [[Glasgow Cluster YPF Install]].
<br />
[[Category: ScotGrid]] [[Category: YPF]]

= Template:Trantest =

<p><small>{{{date}}} - {{{num}}} Files</small><br>[http://ppewww.physics.gla.ac.uk/~aelwell/TestLogs/{{{log}}}.log {{{status}}}] {{{speed}}}Mb/s [http://ppewww.physics.gla.ac.uk/~aelwell/TestLogs/{{{log}}}.plt P][http://ppewww.physics.gla.ac.uk/~aelwell/TestLogs/{{{log}}}.png G]</p>

= Template:Siteinfo =

{| class="infobox bordered" style="width: 25em; text-align: left; font-size: 95%;"
|+ style="font-size: larger;" | '''{{{name}}}'''<br />
|-<br />
| colspan="2" style="text-align:center;" | [[Image:{{{image}}}|300px| ]]<br>{{{caption}}}<br />
|- <br />
! Data 1:<br />
| {{{data1}}}<br />
|- <br />
! Data 2:<br />
| {{{data2|''this text displayed if data2 not defined''}}}<br />
|- <br />
! Data 3 (data hidden if data3 empty or not defined):<br />
| {{{data3|}}}<br />
|- <br />
| colspan="2" style="font-size: smaller;" | {{{footnotes|}}}<br />
|}

= Category:YAIM =

{{:YAIM}}

= Glite-3.0 dCache upgrade configuration =

 [root@wn4 examples]# /opt/glite/yaim/scripts/configure_node /opt/glite/yaim/examples/site-info.def SE_dcache
Configuring config_upgrade<br />
Configuring config_ldconf<br />
/sbin/ldconfig: /opt/glite/externals/lib/libswigpy.so.0 is not a symbolic link <br />
/sbin/ldconfig: /opt/glite/externals/lib/libswigpl.so.0 is not a symbolic link <br />
/sbin/ldconfig: /opt/glite/externals/lib/libswigtcl8.so.0 is not a symbolic link<br />
Configuring config_sysconfig_edg<br />
Configuring config_sysconfig_globus<br />
Configuring config_sysconfig_lcg<br />
Configuring config_crl<br />
Configuring config_rfio<br />
Configuring config_host_certs<br />
Configuring config_users<br />
Configuring config_edgusers<br />
Configuring config_mkgridmap<br />
Now creating the grid-mapfile - this may take a few minutes...<br />
Configuring config_java<br />
Configuring config_rgma_client<br />
<br />
Welcome to the R-GMA setup utility<br />
----------------------------------<br />
<br />
<br />
Configuration written to:<br />
<br />
/opt/glite/etc/rgma/rgma.conf<br />
<br />
To configure security, edit proxy/certificate settings in<br />
<br />
/opt/glite/etc/rgma/ClientAuthentication.props<br />
<br />
Configuring config_gip<br />
<br />
Setting up an R-GMA Gin...<br />
<br />
- Configuring a gip information provider<br />
- Not configuring an fmon information provider<br />
- Not configuring a glite-ce information provider<br />
<br />
Wrote configuration to: /opt/glite/etc/rgma-gin/gin.conf<br />
<br />
All done<br />
<br />
Stopping rgma-gin: [ OK ]<br />
Starting rgma-gin: [ OK ]<br />
Configuring config_globus<br />
creating globus-sh-tools-vars.sh<br />
creating globus-script-initializer<br />
creating Globus::Core::Paths<br />
checking globus-hostname<br />
Done<br />
<br />
Creating...<br />
/opt/globus/etc/grid-info.conf<br />
Done<br />
<br />
Creating...<br />
/opt/globus/sbin/SXXgris<br />
/opt/globus/libexec/grid-info-script-initializer<br />
/opt/globus/libexec/grid-info-mds-core<br />
/opt/globus/libexec/grid-info-common<br />
/opt/globus/libexec/grid-info-cpu*<br />
/opt/globus/libexec/grid-info-fs*<br />
/opt/globus/libexec/grid-info-mem*<br />
/opt/globus/libexec/grid-info-net*<br />
/opt/globus/libexec/grid-info-platform*<br />
/opt/globus/libexec/grid-info-os*<br />
/opt/globus/etc/grid-info-resource-ldif.conf<br />
/opt/globus/etc/grid-info-resource-register.conf<br />
/opt/globus/etc/grid-info-resource.schema<br />
/opt/globus/etc/grid.gridftpperf.schema<br />
/opt/globus/etc/gridftp-resource.conf<br />
/opt/globus/etc/gridftp-perf-info<br />
/opt/globus/etc/grid-info-slapd.conf<br />
/opt/globus/etc/grid-info-site-giis.conf<br />
/opt/globus/etc/grid-info-site-policy.conf<br />
/opt/globus/etc/grid-info-server-env.conf<br />
/opt/globus/etc/grid-info-deployment-comments.conf<br />
Done<br />
Creating gatekeeper configuration file...<br />
Done<br />
Creating state file directory.<br />
Done.<br />
Reading gatekeeper configuration file...<br />
Determining system information...<br />
Creating job manager configuration file...<br />
Done<br />
Setting up fork gram reporter in MDS<br />
-----------------------------------------<br />
Done<br />
<br />
Setting up pbs gram reporter in MDS<br />
----------------------------------------<br />
loading cache /dev/null<br />
checking for qstat... no<br />
Setting up condor gram reporter in MDS<br />
----------------------------------------<br />
loading cache /dev/null<br />
checking for condor_q... no<br />
Setting up lsf gram reporter in MDS<br />
----------------------------------------<br />
loading cache /dev/null<br />
checking for lsload... no<br />
loading cache ./config.cache<br />
checking for mpirun... (cached) no<br />
creating ./config.status<br />
creating fork.pm<br />
loading cache /dev/null<br />
checking for mpirun... no<br />
checking for qdel... no<br />
loading cache /dev/null<br />
checking for condor_submit... no<br />
loading cache /dev/null<br />
loading cache ./config.cache<br />
creating ./config.status<br />
creating grid-cert-request-config<br />
creating grid-security-config<br />
Stopping Globus MDS [FAILED]<br />
Starting Globus MDS (gcc32dbgpthr) [ OK ]<br />
Stopping globus-gridftp: [FAILED]<br />
Starting globus-gridftp:execvp: No such file or directory<br />
[FAILED]<br />
Shutting down lcg-mon-gridftp: [FAILED]<br />
Starting lcg-mon-gridftp [ OK ]<br />
Configuring config_pgsql<br />
Configuring config_sepnfs<br />
start yaim_query_conf_node_dcache_pnfs_root<br />
stop yaim_query_conf_node_dcache_pnfs_root<br />
Configuring config_sedcache<br />
start yaim_state_reset_postgresql<br />
psql: could not connect to server: No such file or directory<br />
Is the server running locally and accepting<br />
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?<br />
stop yaim_state_reset_postgresql<br />
start config_pnfs_stop<br />
Shutting down pnfs services (PostgreSQL version):<br />
umount: /pnfs/fs: not mounted<br />
Stopping Heartbeat .... Ready<br />
Sorry, can't find shm 1122<br />
<br />
stop config_pnfs_stop<br />
start config_pgsql_uninstall<br />
Stopping postgresql service: [FAILED]<br />
end config_pgsql_uninstall<br />
Done Resetting D-Caches Database<br />
start config_pgsql_install<br />
start config_pgsql_base_init<br />
Initializing database: Initializing database: [ OK ]<br />
Starting postgresql service: [ OK ]<br />
/opt/glite/yaim/scripts/configure_node: line 395: echo_success: command not found<br />
stop config_pgsql_base_init<br />
start config_pgsql_base_access<br />
Stopping postgresql service: [ OK ]<br />
Starting postgresql service: [ OK ]<br />
stop config_pgsql_base_access<br />
Starting postgresql service: [ OK ]<br />
end config_pgsql_install<br />
start config_pgsql_base_users<br />
Starting postgresql service: [ OK ]<br />
config_pgsql_base_user pnfsserver<br />
start config_pgsql_base_user pnfsserver<br />
CREATE ROLE<br />
stop config_pgsql_base_user<br />
config_pgsql_base_user srmdcache<br />
start config_pgsql_base_user srmdcache<br />
CREATE ROLE<br />
stop config_pgsql_base_user<br />
stop config_pgsql_base_users<br />
Checking for the RDBMS 'dcache'<br />
Could not find 'dcache' so installing<br />
createdb -U srmdcache dcache<br />
Checking for the RDBMS 'companion'<br />
Could not find 'companion' so installing<br />
createdb -U srmdcache companion<br />
psql:/opt/d-cache/etc/psql_install_companion.sql:6: NOTICE: CREATE TABLE / UNIQUE will create implicit index "cacheinfo_pnfsid_key" for table "cacheinfo"<br />
Checking for the RDBMS 'replicas'<br />
Could not find 'replicas' so installing<br />
createdb -U srmdcache replicas<br />
psql:/opt/d-cache/etc/psql_install_replicas.sql:99: NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "poolname" for table "pools"<br />
psql:/opt/d-cache/etc/psql_install_replicas.sql:108: NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "replica" for table "replicas"<br />
psql:/opt/d-cache/etc/psql_install_replicas.sql:117: NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "hbprocess" for table "heartbeat"<br />
Checking for the RDBMS 'billing'<br />
Could not find 'billing' so installing<br />
createdb -U srmdcache billing<br />
start yaim_query_conf_node_dcache_pnfs_root<br />
stop yaim_query_conf_node_dcache_pnfs_root<br />
start yaim_state_reset_pnfs<br />
stop yaim_state_reset_pnfs<br />
start config_pnfs_stop<br />
Shutting down pnfs services (PostgreSQL version):<br />
umount: /pnfs/fs: not mounted<br />
Stopping Heartbeat .... Ready<br />
Sorry, can't find shm 1122<br />
<br />
stop config_pnfs_stop<br />
start config_pnfs_uninstall<br />
start yaim_query_conf_node_dcache_pnfs_root<br />
stop yaim_query_conf_node_dcache_pnfs_root<br />
stop config_pnfs_uninstall<br />
start config_pnfs_install<br />
start yaim_query_conf_node_dcache_pnfs_root<br />
stop yaim_query_conf_node_dcache_pnfs_root<br />
start config_pnfs_install_script<br />
start yaim_query_conf_node_dcache_pnfs_root<br />
stop yaim_query_conf_node_dcache_pnfs_root<br />
/opt/pnfs /opt/glite/yaim/examples<br />
PNFS_PSQL_USER = pnfsserver<br />
PNFS is already installed and is not supposed to be overwritten - Exit<br />
stop config_pnfs_install_script<br />
stop config_pnfs_install<br />
Done Resetting D-Caches PNFS<br />
start config_pnfs_start<br />
Shutting down pnfs services (PostgreSQL version):<br />
umount: /pnfs/fs: not mounted<br />
Stopping Heartbeat .... Ready<br />
Sorry, can't find shm 1122<br />
<br />
Starting pnfs services (PostgreSQL version):<br />
Shmcom : Installed 8 Clients and 8 Servers<br />
Starting database server for admin (/opt/pnfsdb/pnfs/databases/admin) ... Failed<br />
Starting database server for data1 (/opt/pnfsdb/pnfs/databases/data1) ... Failed<br />
Starting database server for alice (/opt/pnfsdb/pnfs/databases/alice) ... Failed<br />
Starting database server for atlas (/opt/pnfsdb/pnfs/databases/atlas) ... Failed<br />
Starting database server for dteam (/opt/pnfsdb/pnfs/databases/dteam) ... Failed<br />
Starting database server for cms (/opt/pnfsdb/pnfs/databases/cms) ... Failed<br />
Starting database server for lhcb (/opt/pnfsdb/pnfs/databases/lhcb) ... Failed<br />
Starting database server for sixt (/opt/pnfsdb/pnfs/databases/sixt) ... Failed<br />
Waiting for dbservers to register ... Ready<br />
Starting Mountd : pmountd<br />
Starting nfsd : pnfsd<br />
mount: RPC: Unable to receive; errno = Connection refused<br />
<br />
stop config_pnfs_start<br />
Skipping D-Cache config Reset<br />
<br />
Shutting down dcache pool: Stopping wn4Domain (pid=4589) Done<br />
<br />
Shutting down dcache services:<br />
Stopping gridftp-wn4Domain (pid=6085) Done<br />
Stopping gsidcap-wn4Domain (pid=6216) Done<br />
Stopping srm-wn4Domain (pid=6341) Done<br />
Pid File (/opt/d-cache/config/lastPid.replica) doesn't contain valid PID<br />
Stopping utilityDomain (pid=5789) Done<br />
Stopping httpdDomain (pid=17608) Done<br />
Stopping infoProviderDomain (pid=15988) Done<br />
Stopping pnfsDomain (pid=5886) Done<br />
Stopping adminDoorDomain (pid=5600) Done<br />
Stopping doorDomain (pid=5515) Done<br />
Stopping dirDomain (pid=5434) Done<br />
Stopping dCacheDomain (pid=5342) Done<br />
Stopping lmDomain (pid=5270) Done<br />
<br />
Stopping postgresql service: [ OK ]<br />
Starting postgresql service: [ OK ]<br />
start config_pnfs_stop<br />
Shutting down pnfs services (PostgreSQL version):<br />
umount: /pnfs/fs: not mounted<br />
Stopping Heartbeat .... Ready<br />
Removing 8 Clients 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+<br />
Removing 8 Servers 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+<br />
Removing main switchboard ... O.K.<br />
<br />
stop config_pnfs_stop<br />
start config_pnfs_start<br />
Shutting down pnfs services (PostgreSQL version):<br />
umount: /pnfs/fs: not mounted<br />
Stopping Heartbeat .... Ready<br />
Sorry, can't find shm 1122<br />
<br />
Starting pnfs services (PostgreSQL version):<br />
Shmcom : Installed 8 Clients and 8 Servers<br />
Starting database server for admin (/opt/pnfsdb/pnfs/databases/admin) ... Failed<br />
Starting database server for data1 (/opt/pnfsdb/pnfs/databases/data1) ... Failed<br />
Starting database server for alice (/opt/pnfsdb/pnfs/databases/alice) ... Failed<br />
Starting database server for atlas (/opt/pnfsdb/pnfs/databases/atlas) ... Failed<br />
Starting database server for dteam (/opt/pnfsdb/pnfs/databases/dteam) ... Failed<br />
Starting database server for cms (/opt/pnfsdb/pnfs/databases/cms) ... Failed<br />
Starting database server for lhcb (/opt/pnfsdb/pnfs/databases/lhcb) ... Failed<br />
Starting database server for sixt (/opt/pnfsdb/pnfs/databases/sixt) ... Failed<br />
Waiting for dbservers to register ... Ready<br />
Starting Mountd : pmountd<br />
Starting nfsd : pnfsd<br />
mount: RPC: Unable to receive; errno = Connection refused<br />
<br />
stop config_pnfs_start<br />
[ERROR] /pnfs/fs mount point exists, but is not mounted.<br />
Make sure pnfs is running on this admin node. Exiting.<br />
<br />
Starting dcache pool: Starting wn4Domain 6 5 4 3 2 1 0 Done (pid=31767)<br />
<br />
start config_dcache_pnfs_databases<br />
start config_dcache_pnfs_database_get_id<br />
start yaim_query_conf_node_dcache_pnfs_root<br />
stop yaim_query_conf_node_dcache_pnfs_root<br />
end config_dcache_pnfs_database_get_id<br />
WARN: No action needed 'dteam' has its own pnfs DB already<br />
stop config_dcache_pnfs_databases<br />
start config_pnfs_stop<br />
Shutting down pnfs services (PostgreSQL version):<br />
umount: /pnfs/fs: not mounted<br />
Stopping Heartbeat .... Ready<br />
Removing 8 Clients 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+<br />
Removing 8 Servers 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+<br />
Removing main switchboard ... O.K.<br />
<br />
stop config_pnfs_stop<br />
start config_pnfs_start<br />
Shutting down pnfs services (PostgreSQL version):<br />
umount: /pnfs/fs: not mounted<br />
Stopping Heartbeat .... Ready<br />
Sorry, can't find shm 1122<br />
<br />
Starting pnfs services (PostgreSQL version):<br />
Shmcom : Installed 8 Clients and 8 Servers<br />
Starting database server for admin (/opt/pnfsdb/pnfs/databases/admin) ... Failed<br />
Starting database server for data1 (/opt/pnfsdb/pnfs/databases/data1) ... Failed<br />
Starting database server for alice (/opt/pnfsdb/pnfs/databases/alice) ... Failed<br />
Starting database server for atlas (/opt/pnfsdb/pnfs/databases/atlas) ... Failed<br />
Starting database server for dteam (/opt/pnfsdb/pnfs/databases/dteam) ... Failed<br />
Starting database server for cms (/opt/pnfsdb/pnfs/databases/cms) ... Failed<br />
Starting database server for lhcb (/opt/pnfsdb/pnfs/databases/lhcb) ... Failed<br />
Starting database server for sixt (/opt/pnfsdb/pnfs/databases/sixt) ... Failed<br />
Waiting for dbservers to register ... Ready<br />
Starting Mountd : pmountd<br />
Starting nfsd : pnfsd<br />
mount: RPC: Unable to receive; errno = Connection refused<br />
<br />
stop config_pnfs_start<br />
start yaim_query_conf_info_system_version_door<br />
stop yaim_query_conf_info_system_version_door<br />
Warning: The new information system is still experimental.<br />
start yaim_config_info_system<br />
start yaim_config_pool_manager_add_lines<br />
start yaim_config_file_update_prepare<br />
stop yaim_config_file_update_prepare<br />
start yaim_config_file_update_done<br />
stop yaim_config_file_update_done<br />
stop yaim_config_pool_manager_add_lines<br />
start yaim_state_info_setup<br />
start yaim_config_file_update_prepare<br />
stop yaim_config_file_update_prepare<br />
start yaim_config_file_set_value<br />
stop yaim_config_file_set_value<br />
start yaim_config_file_update_done<br />
stop yaim_config_file_update_done<br />
Shutting down dcache services:<br />
Stopping gridftp-wn4Domain (pid=6085) Done<br />
Stopping gsidcap-wn4Domain (pid=6216) Done<br />
Stopping srm-wn4Domain (pid=6341) Done<br />
Pid File (/opt/d-cache/config/lastPid.replica) doesn't contain valid PID<br />
Stopping utilityDomain (pid=5789) Done<br />
Stopping httpdDomain (pid=17608) Done<br />
Stopping infoProviderDomain (pid=15988) Done<br />
Stopping pnfsDomain (pid=5886) Done<br />
Stopping adminDoorDomain (pid=5600) Done<br />
Stopping doorDomain (pid=5515) Done<br />
Stopping dirDomain (pid=5434) Done<br />
Stopping dCacheDomain (pid=5342) Done<br />
Stopping lmDomain (pid=5270) Done<br />
<br />
[ERROR] /pnfs/fs mount point exists, but is not mounted.<br />
Make sure pnfs is running on this admin node. Exiting.<br />
Stopping Globus MDS [FAILED]<br />
Starting Globus MDS (gcc32dbgpthr) [ OK ]<br />
stop yaim_state_info_setup<br />
start yaim_config_info_link<br />
stop yaim_config_info_link<br />
start yaim_config_info_template_static_set<br />
stop yaim_config_info_template_static_set<br />
start yaim_config_info_restart<br />
Stopping Globus MDS [FAILED]<br />
Starting Globus MDS (gcc32dbgpthr) [ OK ]<br />
stop yaim_config_info_restart<br />
stop yaim_config_info_system<br />
D-cache takes some time to initialise<br />
Please wait 10-15 Min's before you conclude<br />
the system is wrongly configured<br />
Configuration Complete<br />
[root@wn4 examples]#<br />
<br />
<br />
[[category:dCache]]<br />
[[category:gLite]]<br />
[[category:YAIM]]

= YAIM =

'''YAIM''' (Yet Another Installation Manager) is used to configure and post-install the various components.
<br />
For more information see [https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide301 The LCG TWiki Page].

= IC-HEP/Gin =

The GIN activity is meant to do interoperability tests between different Grids.
The official Gin web page is at [http://wiki.nesc.ac.uk/read/gin-jobs?GinResources NeSC].
<br />
;General Info
:CE end point: ''gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs -q dteam''
:SRM end point: ''gfe02.hep.ph.ic.ac.uk:8443/pnfs/hep.ph.ic.ac.uk/data/gin''
:BDII end point: ''gw39.hep.ph.ic.ac.uk:2170''
<br />
==Gin VO setup==<br />
<br />
*Have enabled the GIN VO on the gw39.hep.ph.ic.ac.uk cluster. The VO setup uses the following yaim configuration:
<pre><nowiki><br />
VO_GIN_SW_DIR=$VO_SW_DIR/gin<br />
VO_GIN_DEFAULT_SE=$CLASSIC_HOST<br />
VO_GIN_STORAGE_DIR=$CLASSIC_STORAGE_DIR/gin<br />
VO_GIN_QUEUES="gin"<br />
VO_GIN_VOMS_SERVERS="vomss://kuiken.nikhef.nl:8443/voms/gin.ggf.org?/gin.ggf.org/"<br />
VO_GIN_VOMSES="'gin.ggf.org kuiken.nikhef.nl 15050 /O=dutchgrid/O=hosts/OU=nikhef.nl/CN=kuiken.nikhef.nl gin.ggf.org'"<br />
</nowiki></pre><br />
<br />
== TDDFT Ninf-g installation ==
*I have installed the ninf-g package in the shared experiment area. <br />
**VO_GIN_SW_DIR=/opt/exp_soft/gin/ is set for each gin user.<br />
**The software sits in VO_GIN_SW_DIR/SW<br />
<br />
*Followed [http://pragma-goc.rocksclusters.org/applications/multigrid-tddft/requirement.html this] page<br />
*I did not have the headers since globus was installed as a binary. To obtain the headers I did the following on gw05 as root:
<pre><br />
gpt-build gcc32dbg -nosrc<br />
gpt-build ====> Changing to /root/BUILD/globus_core-2.15/<br />
gpt-build ====> BUILDING FLAVOR gcc32dbg<br />
gpt-build ====> Changing to /root/BUILD<br />
gpt-build ====> REMOVING empty package globus_core-gcc32dbg-pgm_static<br />
gpt-build ====> REMOVING empty package globus_core-noflavor-doc<br />
</pre><br />
*On the same machine as ginsgm <br />
**followed the instructions at http://ninf.apgrid.org/packages/download-g2.shtml installed 2.4.0 with 2.4.1 patch<br />
** ./configure --prefix=/opt/exp_soft/gin/SW/; make; make install<br />
* The TDDFT software requires a Fortran compiler: installed the Intel Fortran compiler 9 [http://www.intel.com/cd/software/products/asmo-na/eng/compilers/219717.htm url]
** Had to remove the intel fortran compiler due to licence policy.<br />
<br />
== Others ==<br />
*06/06/02: Patched the Helper.pm to support the -s option of globus-job-submit.<br />
** Patch found [https://savannah.cern.ch/bugs/?func=detailitem&item_id=4400 here]<br />
** Checked: ''globus-job-submit gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs -q dteam -s test.sh test.sh'' ok
<br />
<br />
[[Category:London Tier2]]

= Lancaster (10 Questions) =

== Question 1 ==
<br />
''Provide the name and contact details of your local (Departmental) and Institutional network support staff.''<br />
<br />
* My Departmental network support contact is: <br />
**Brian Davies<br />
**Alex Finch/Robert Henderson
**John Windsor/Richard Ion<br />
* My Institutional network support contact is:<br />
**Lancaster University ISS Network Support Team<br />
<br />
== Question 2 ==<br />
<br />
''Provide details of the responsibilities, together with the demarcation of those responsibilities, of your local and Institutional network support staff.''<br />
<br />
* The departmental contact is responsible for:<br />
**Making sure network requests in the department are resolved by themselves or passed on to the Institutional/NRO support
* The institutional contact is responsible for:
** This gets quite complicated since our institutional support staff are not only institutional support but are also our NRO. They are however responsible for many fabric operations (http proxy, DNS, DHCP servers). They also keep our connections to both the JANET production network and some of the services for the UKLIGHT development network link we use in parallel to the production network.
<br />
== Question 3 ==<br />
<br />
''What is a Regional Network Operator (RNO), and why does this matter to you?''<br />
<br />
* An RNO is: <br />
** An RNO is responsible for maintaining a Metropolitan Area Network (MAN). Our MAN is CANLMAN.
* I care because:
** The RNO, through the MAN, provides access to other MANs and the rest of the world (ROW).
<br />
== Question 4 ==<br />
<br />
''What is SuperJANET4? And more importantly what is SuperJANET5?''<br />
<br />
* SuperJANET4 is:<br />
**The current 2.5Gb core of the Joint Academic NETwork (JANET) that supports FE, HE, schools and research councils.
* SuperJANET5 is:
**The sequel to SJ4. It has an improved core of 10Gb and peripherals. This should come online in 2006/7.
** Better info can be found at http://www.ja.net/about/topology/index.html<br />
<br />
== Questions 5, 6, 7 and 9 (part) ==<br />
<br />
5: ''Draw a simple diagram showing your local (Departmental) network and sufficient of your Institutional network such that you can trace a line from your end-system to the connection from your Institutes network into the RNO infrastructure.''<br />
<br />
6: ''On the diagram produced in answer to Question 5, show the capacity of each link in the network and provide a note against each link of its contention ratio.''<br />
<br />
7: ''On the diagram produced in answer to Question 5, colour and distinguish the switches and routers and for each device provide a note of its backplane capability.''<br />
<br />
9.x: ''On the diagram produced in answer to Question 5 colour in the firewall(s) (or other security devices).''<br />
<br />
(upload an image via http://wiki.gridpp.ac.uk/wiki/Special:Upload)<br />
== Question 8 ==<br />
* In addition to this information, LANCS also has a 1Gb UKLight link which connects HEPLancs to Rutherford Appleton Lab (RAL). The line has been rate tested so far up to ~900Mb and has had a peak of 640Mbps disk to disk transfers and an average usage of 45Mbps averaged over a month, in addition to the university production links.
''What is the average and peak traffic flow between your local (Departmental) network and the Institutional network?''<br />
<br />
* Average traffic:<br />
* Peak traffic:<br />
<br />
''What is the average and peak traffic flow between your Institutional network and the RNO?''<br />
<br />
* Average traffic:100Mbps inbound, 40Mbps outbound<br />
* Peak traffic: 850Mbps<br />
<br />
''What is the total capacity of your Institutional connection to the RNO?''<br />
<br />
* Our total capacity is: There is a 1Gb connection between our LANCS and CANLMAN. <br />
<br />
''What are the upgrade plans for your local (Departmental) network; your Institutional network and the network run by the RNO?''<br />
<br />
* Departmental plans: Connection of 500Mbps to the Manchester University HEP group via UKLight. Otherwise, to follow the Institute's recommendations.
* Institutional plans: Core network upgrade in progress to a fully gigabit backbone
* RNO plans: SJ5?<br />
<br />
== Question 9 ==<br />
<br />
''Do you believe in IS Security? Does your Institute believe in IS Security?''<br />
<br />
* I'm a believer: YES<br />
* We're collective believers: YES<br />
<br />
''Do you believe in firewalls? Does your Institute believe in firewalls?''<br />
<br />
* I'm a believer: YES<br />
* We're collective believers: YES<br />
<br />
''Provide information of how changes are made to the rule set of the firewall.''<br />
<br />
* Firewall rules are changed by: Request and negotiation with ISS for core services and internally approved for HEP Machine Firewalls.<br />
<br />
''Provide a note of the capacity of this device and what happens when that capacity is exceeded.''<br />
<br />
* The capacity is: Within the HEP machines the firewall is software-based and its capacity has not been calculated.
<br />
== Question 10 ==<br />
<br />
''What is the best performance you can achieve from your end-system to an equivalent system located in some geographically remote (and friendly!) Institute?''<br />
<br />
* Best performance is:<br />
<br />
For your end-system: 640Mbps disk to disk file transfer from LANCS to RAL over the UKLight link.
<br />
: ''Do you understand the kernel, the bus structure; the NIC; and the disk system?''<br />
<br />
* I understand: YES<br />
<br />
: Do you understand TCP tuning and what it can do for you? <br />
<br />
* I understand: YES - and recommend it!
<br />
: Do you understand your application and what it can do to your performance? <br />
<br />
* I understand: Partially<br />
<br />
<br />
[[Category:NorthGrid]]

= LT2 CMS =

= Activity Log =
* 2006/12/15: Made the rfio [https://uimon.cern.ch/twiki/bin/view/CMS/CMSSWandDPMrfio#New_recipe_for_CMSSW_0_8_3_and_C hack] for RHUL
* 2006/12/15: Storage problems at QMUL prevent CMS from running properly.
* 2006/12/15: Made the rfio [https://uimon.cern.ch/twiki/bin/view/CMS/CMSSWandDPMrfio#New_recipe_for_CMSSW_0_8_3_and_C hack] at Brunel on dgc-grid-44. Both dgc-grid-40 and dgc-grid-44 have been done.<br />
<br />
<br />
[[Category:London Tier2]]

= Giuliano castelli =

#REDIRECT [[User:Giuliano_castelli]]

= Giuliano Castelli =

#REDIRECT [[User:Giuliano_castelli]]

= Filesystem Tests With CERN Kernel =

{| border=1 style="text-align:center"
|+CERN Kernel 2.4 ext2<br />
|-<br />
|<br />
|<br />
| colspan=3 align="center"| Parallel Streams<br />
|-<br />
|<br />
! !! 3 !! 5 !! 10<br />
|-<br />
| rowspan=3| Files<br />
! 3<br />
| 254, 0 || 226, 0 || bgcolor=red|123, 13<br />
|-<br />
! 5<br />
| 254, 0 || 238, 0 || bgcolor=red|182, 17<br />
|-<br />
! 10<br />
| 243, 0 || 236, 0 || 201, 0<br />
|}<br />
<br />
{| border=1 style="text-align:center"<br />
|+CERN Kernel 2.4 ext3<br />
|-<br />
|<br />
|<br />
| colspan=3 align="center"| Parallel Streams<br />
|-<br />
|<br />
! !! 3 !! 5 !! 10<br />
|-<br />
| rowspan=3| Files<br />
! 3<br />
| 190, 0 || 213, 0 || 238, 0<br />
|-<br />
! 5<br />
| 199, 0 || 217, 0 || 231, 0<br />
|-<br />
! 10<br />
| 193, 0 || 213, 0 || 244, 0<br />
|}<br />
<br />
{| border=1 style="text-align:center"<br />
|+CERN Kernel 2.4 xfs<br />
|-<br />
|<br />
|<br />
| colspan=3 align="center"| Parallel Streams<br />
|-<br />
|<br />
! !! 3 !! 5 !! 10<br />
|-<br />
| rowspan=3| Files<br />
! 3<br />
| 287, 0 || 321, 0 || 335, 0<br />
|-<br />
! 5<br />
| 282, 0 || 330, 0 || 333, 0<br />
|-<br />
! 10<br />
| 294, 0 || 329, 0 || 331, 0<br />
|}<br />
<br />
<pre><br />
sdf1<br />
====<br />
f=3<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1075.80374408 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 223.089017231Mb/s<br />
<br />
f=5<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1207.00010681 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 198.840081824Mb/s<br />
<br />
f=10<br />
T=3<br />
<br />
<br />
f=3<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1000.90465903 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 239.783078073Mb/s<br />
<br />
f=5<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1096.37612605 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 218.902978911Mb/s<br />
<br />
f=10<br />
T=5<br />
<br />
<br />
f=3<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 987.092818022 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 243.138229372Mb/s<br />
<br />
<br />
f=5<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1016.71633601 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 236.054041328Mb/s<br />
<br />
f=10<br />
T=10<br />
<br />
<br />
<br />
sdg1<br />
====<br />
f=3<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 835.333688021 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 287.310332915Mb/s<br />
<br />
f=5<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 748.493108988 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 320.644234554Mb/s<br />
<br />
f=10<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 715.599011183 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 335.383358906Mb/s<br />
<br />
f=3<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 851.735743999 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 281.77753686Mb/s<br />
<br />
f=5<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 728.07062602 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 329.638350213Mb/s<br />
<br />
f=10<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 720.479511976 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 333.111484797Mb/s<br />
<br />
f=3<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 817.251503944 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 293.667247893Mb/s<br />
<br />
f=5<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 728.760859013 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 329.326139065Mb/s<br />
<br />
f=10<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 724.410189152 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 331.30400924Mb/s<br />
<br />
<br />
<br />
<br />
sde1<br />
====<br />
f=3<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1263.73733807 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 189.912882029Mb/s<br />
<br />
f=5<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1127.37044215 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 212.884772411Mb/s<br />
<br />
f=10<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1009.9858582 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 237.627089579Mb/s<br />
<br />
f=3<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1208.95005703 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 198.519366954Mb/s<br />
<br />
f=5<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1108.64772201 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 216.479946908Mb/s<br />
<br />
f=10<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1041.03944397 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 230.53881521Mb/s<br />
<br />
f=3<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1243.77486515 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 192.960966429Mb/s<br />
<br />
f=5<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1127.3604331 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 212.886662467Mb/s<br />
<br />
f=10<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 985.732695103 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 243.473713708Mb/s<br />
</pre><br />
<br />
<br />
<br />
[[Category:Optimisation]]<br />
[[Category:Storage]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Filesystem_Tests_Using_Vanilla_KernelFilesystem Tests Using Vanilla Kernel2007-03-01T12:08:46Z<p>Andrew elwell: </p>
<hr />
<div>{| border=1 style="text-align:center"<br />
|+Vanilla Kernel 2.4 ext2 (cell values: bandwidth in Mb/s, failures)<br />
|-<br />
|<br />
|<br />
| colspan=3 align="center"| Parallel Streams<br />
|-<br />
|<br />
! !! 3 !! 5 !! 10<br />
|-<br />
| rowspan=3| Files<br />
! 3<br />
| 233, 0 || 174, 0 || bgcolor=red|153, 4<br />
|-<br />
! 5<br />
| bgcolor=red|144, 2 || 183, 0 || bgcolor=red|158, 2<br />
|-<br />
! 10<br />
| bgcolor=red|160, 7 || bgcolor=red|160, 8 || bgcolor=red|161, 8<br />
|}<br />
<br />
{| border=1 style="text-align:center"<br />
|+Vanilla Kernel 2.4 ext3 (cell values: bandwidth in Mb/s, failures)<br />
|-<br />
|<br />
|<br />
| colspan=3 align="center"| Parallel Streams<br />
|-<br />
|<br />
! !! 3 !! 5 !! 10<br />
|-<br />
| rowspan=3| Files<br />
! 3<br />
| 166, 0 || 201, 0 || 115, 0<br />
|-<br />
! 5<br />
| 198, 0 || 228, 0 || 200, 0<br />
|-<br />
! 10<br />
| 254, 0 || 219, 0 || 213, 0<br />
|}<br />
<br />
{| border=1 style="text-align:center"<br />
|+Vanilla Kernel 2.4 jfs (cell values: bandwidth in Mb/s, failures)<br />
|-<br />
|<br />
|<br />
| colspan=3 align="center"| Parallel Streams<br />
|-<br />
|<br />
! !! 3 !! 5 !! 10<br />
|-<br />
| rowspan=3| Files<br />
! 3<br />
| 289, 0 || 292, 0 || 298, 0<br />
|-<br />
! 5<br />
| bgcolor=red|301, 3 || bgcolor=red|301, 3 || bgcolor=red|305, 2<br />
|-<br />
! 10<br />
| bgcolor=red|263, 10 || bgcolor=red|289, 4 || bgcolor=red|276, 9<br />
|}<br />
<br />
<pre><br />
sdf1<br />
====<br />
f=3<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1031.46830702 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 232.678016733Mb/s<br />
<br />
f=5<br />
T=3<br />
Transfer Bandwidth Report:<br />
28/30 transferred in 1551.80322504 seconds<br />
28000000000.0 bytes transferred.<br />
Bandwidth: 144.348198525Mb/s<br />
<br />
f=10<br />
T=3<br />
Transfer Bandwidth Report:<br />
23/30 transferred in 1153.44354892 seconds<br />
23000000000.0 bytes transferred.<br />
Bandwidth: 159.522327879Mb/s<br />
<br />
f=3<br />
T=5<br />
<br />
<br />
f=5<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1312.29309702 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 182.885973069Mb/s<br />
<br />
f=10<br />
T=5<br />
Transfer Bandwidth Report:<br />
22/30 transferred in 1102.857342 seconds<br />
22000000000.0 bytes transferred.<br />
Bandwidth: 159.585463411Mb/s<br />
<br />
f=3<br />
T=10<br />
Transfer Bandwidth Report:<br />
26/30 transferred in 1363.98007178 seconds<br />
26000000000.0 bytes transferred.<br />
Bandwidth: 152.494896592Mb/s<br />
<br />
f=5<br />
T=10<br />
Transfer Bandwidth Report:<br />
28/30 transferred in 1418.51655602 seconds<br />
28000000000.0 bytes transferred.<br />
Bandwidth: 157.911445622Mb/s<br />
<br />
f=10<br />
T=10<br />
Transfer Bandwidth Report:<br />
22/30 transferred in 1093.01589894 seconds<br />
22000000000.0 bytes transferred.<br />
Bandwidth: 161.022360398Mb/s<br />
<br />
<br />
<br />
sde1<br />
====<br />
f=3<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1443.75944805 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 166.23267839Mb/s<br />
<br />
f=5<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1211.4208231 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 198.114474693Mb/s<br />
<br />
f=10<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 944.127444983 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 254.202969393Mb/s<br />
<br />
f=3<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1197.24895501 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 200.459561059Mb/s<br />
<br />
f=5<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1052.02088594 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 228.13235289Mb/s<br />
<br />
f=10<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1097.88622117 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 218.60188731Mb/s<br />
<br />
f=3<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 2087.89984584 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 114.948042397Mb/s<br />
<br />
f=5<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1203.12005496 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 199.481339381Mb/s<br />
<br />
f=10<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1128.3005209 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 212.709287601Mb/s<br />
<br />
<br />
<br />
sdg1<br />
====<br />
f=3<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 810.411117792 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 296.145986563Mb/s<br />
<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 805.420502901 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 297.980991464Mb/s<br />
<br />
f=5<br />
T=3<br />
Transfer Bandwidth Report:<br />
27/30 transferred in 717.959347963 seconds<br />
27000000000.0 bytes transferred.<br />
Bandwidth: 300.852688405Mb/s<br />
<br />
f=10<br />
T=3<br />
Transfer Bandwidth Report:<br />
20/30 transferred in 609.095984936 seconds<br />
20000000000.0 bytes transferred.<br />
Bandwidth: 262.68437809Mb/s<br />
<br />
f=3<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 823.212517023 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 291.540756533Mb/s<br />
<br />
f=5<br />
T=5<br />
27/30 transferred in 717.309243917 seconds<br />
27000000000.0 bytes transferred.<br />
Bandwidth: 301.125353997Mb/s<br />
<br />
f=10<br />
T=5<br />
Transfer Bandwidth Report:<br />
26/30 transferred in 719.97972703 seconds<br />
26000000000.0 bytes transferred.<br />
Bandwidth: 288.897023334Mb/s<br />
<br />
f=3<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 806.550590992 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 297.563479192Mb/s<br />
<br />
f=5<br />
T=10<br />
Transfer Bandwidth Report:<br />
28/30 transferred in 735.701839924 seconds<br />
28000000000.0 bytes transferred.<br />
Bandwidth: 304.471170037Mb/s<br />
<br />
f=10<br />
T=10<br />
Transfer Bandwidth Report:<br />
21/30 transferred in 608.105660915 seconds<br />
21000000000.0 bytes transferred.<br />
Bandwidth: 276.267778443Mb/s<br />
</pre><br />
<br />
<br />
[[Category:Optimisation]]<br />
[[Category:Storage]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Durham_(10_Questions)Durham (10 Questions)2007-03-01T12:08:18Z<p>Andrew elwell: </p>
<hr />
<div>1. <br />
<br />
Dr. David Stockdale (Physics Network Manager) D.P.Stockdale@durham.ac.uk, +44 191 334 43644<br />
Mr. Tom Turnbull (Acting Head of Infrastructure) Tom.Turnbull@durham.ac.uk +44 191 334 42783<br />
<br />
2.<br />
<br />
Dr. Stockdale is responsible for all of the Physics network; Mr. Turnbull is responsible for the connection to Physics, the connection to JANET via NORMAN, and other networks on campus<br />
<br />
3.<br />
<br />
NORMAN; they provide the connection to JANET.<br />
<br />
4.<br />
<br />
SuperJANET 4 is the current version of Janet (1G connection). SuperJANET 5 is the next version; it will run at 2.5G.<br />
<br />
5. <br />
[[Image:/scratch/Mark.jpg]]<br />
<br />
6. <br />
<br />
We have <br />
<br />
7. <br />
<br />
Departmental Router (Black Diamond) 384G 96 million bps<br />
<br />
HP 2648 in rack 13.66 10 million pps<br />
<br />
8. <br />
<br />
Traffic flow between department and campus:<br />
<br />
110 Mbps down <br />
70 Mbps up<br />
<br />
Traffic between campus and Norman (from ITS)<br />
<br />
Average (in + out) traffic on the NorMAN link for week Nov 14 to Nov 21<br />
<br />
4.181% or just under 42Mbit<br />
<br />
Peak or maximum in the same time was<br />
<br />
10.934% just under 110 Mbit<br />
<br />
Connection from Durham to NORMAN 1G<br />
<br />
Upgrade plans<br />
<br />
Not sure.<br />
<br />
9. <br />
<br />
IS Security: not sure what is meant by this term.<br />
<br />
Yes we have a firewall <br />
<br />
Changes are done by the firewall team in ITS; the change requests are submitted via a password-protected web form.<br />
<br />
The device capacity is <br />
<br />
The firewall has gigabit ports on the internal and external interfaces<br />
<br />
However, I would not expect to get more than 650Mbit depending on the type of <br />
traffic and the load put on the firewall.<br />
<br />
<br />
<br />
10.<br />
<br />
I have a basic understanding; however, a course may be of some use.<br />
<br />
I have never done TCP tuning; again, a course would be of use.<br />
<br />
I don't have an application.<br />
<br />
<br />
[[Category:ScotGrid]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Filesystem_Tests_Using_CERN2.6_KernelFilesystem Tests Using CERN2.6 Kernel2007-03-01T12:07:34Z<p>Andrew elwell: </p>
<hr />
<div>{| border=1 style="text-align:center"<br />
|+CERN Kernel 2.6 ext2 (cell values: bandwidth in Mb/s, failures)<br />
|-<br />
|<br />
|<br />
| colspan=3 align="center"| Parallel Streams<br />
|-<br />
|<br />
! !! 3 !! 5 !! 10<br />
|-<br />
| rowspan=3| Files<br />
! 3<br />
| 229, 0 || bgcolor=red|153, 4 || 122, 0<br />
|-<br />
! 5<br />
| 206, 0 || bgcolor=red|152, 1 || bgcolor=red|126, 1<br />
|-<br />
! 10<br />
| 214, 0 || bgcolor=red|154, 1 || 122, 0<br />
|}<br />
<br />
{| border=1 style="text-align:center"<br />
|+CERN Kernel 2.6 ext3 (cell values: bandwidth in Mb/s, failures)<br />
|-<br />
|<br />
|<br />
| colspan=3 align="center"| Parallel Streams<br />
|-<br />
|<br />
! !! 3 !! 5 !! 10<br />
|-<br />
| rowspan=3| Files<br />
! 3<br />
| 177, 0 || 98, 0 || bgcolor=red|90, 1<br />
|-<br />
! 5<br />
| bgcolor=red|154, 1 || bgcolor=red|109, 3 || 111, 0<br />
|-<br />
! 10<br />
| bgcolor=red|152, 1 || bgcolor=red|125, 1 || 95, 0<br />
|}<br />
<br />
{| border=1 style="text-align:center"<br />
|+CERN Kernel 2.6 xfs (cell values: bandwidth in Mb/s, failures)<br />
|-<br />
|<br />
|<br />
| colspan=3 align="center"| Parallel Streams<br />
|-<br />
|<br />
! !! 3 !! 5 !! 10<br />
|-<br />
| rowspan=3| Files<br />
! 3<br />
| 333, 0 || 362, 0 || 398, 0<br />
|-<br />
! 5<br />
| 380, 0 || 356, 0 || 364, 0<br />
|-<br />
! 10<br />
| 325, 0 || 324, 0 || 319, 0<br />
|}<br />
<br />
{| border=1 style="text-align:center"<br />
|+CERN Kernel 2.6 jfs (cell values: bandwidth in Mb/s, failures)<br />
|-<br />
|<br />
|<br />
| colspan=3 align="center"| Parallel Streams<br />
|-<br />
|<br />
! !! 3 !! 5 !! 10<br />
|-<br />
| rowspan=3| Files<br />
! 3<br />
| 352, 0 || 355, 0 || 339, 0<br />
|-<br />
! 5<br />
| 294, 0 || 267, 0 || 282, 0<br />
|-<br />
! 10<br />
| 186, 0 || 247, 0 || 244, 0<br />
|}<br />
<br />
<pre><br />
sdg1 (jfs)<br />
====<br />
f=3<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 681.71413517 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 352.053724015Mb/s<br />
<br />
f=5<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 675.533220053 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 355.274904144Mb/s<br />
<br />
f=10<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 707.84757185 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 339.056047579Mb/s<br />
<br />
f=3<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 816.620950937 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 293.894002749Mb/s<br />
<br />
f=5<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 900.071187973 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 266.645575602Mb/s<br />
<br />
f=10<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 850.99517107 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 282.022751901Mb/s<br />
<br />
f=3<br />
T=10<br />
Transfer Bandwidth Report:<br />
26/30 transferred in 1116.2180872 seconds<br />
26000000000.0 bytes transferred.<br />
Bandwidth: 186.34351332Mb/s<br />
<br />
f=5<br />
T=10<br />
Transfer Bandwidth Report:<br />
28/30 transferred in 906.35223794 seconds<br />
28000000000.0 bytes transferred.<br />
Bandwidth: 247.144532361Mb/s<br />
<br />
f=10<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 985.121921062 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 243.624667027Mb/s<br />
<br />
<br />
<br />
sdg1 (xfs)<br />
====<br />
f=3<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 721.279257774 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 332.742134774Mb/s<br />
<br />
f=5<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 662.581736088 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 362.219462035Mb/s<br />
<br />
f=10<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 603.564337969 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 397.637807442Mb/s<br />
<br />
f=3<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 632.167878866 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 379.645989655Mb/s<br />
<br />
f=5<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 674.963133812 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 355.574975843Mb/s<br />
<br />
f=10<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 660.231583834 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 363.508813993Mb/s<br />
<br />
f=3<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 738.461091995 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 325.000196492Mb/s<br />
<br />
f=5<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 741.23122716 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 323.785603205Mb/s<br />
<br />
f=10<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 751.672570944 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 319.287957652Mb/s<br />
<br />
<br />
<br />
sde1 (ext3)<br />
====<br />
f=3<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1358.98828983 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 176.601963237Mb/s<br />
<br />
f=5<br />
T=3<br />
Transfer Bandwidth Report:<br />
29/30 transferred in 1507.10651708 seconds<br />
29000000000.0 bytes transferred.<br />
Bandwidth: 153.93736101Mb/s<br />
<br />
f=10<br />
T=3<br />
Transfer Bandwidth Report:<br />
29/30 transferred in 1531.7594192 seconds<br />
29000000000.0 bytes transferred.<br />
Bandwidth: 151.459816138Mb/s<br />
<br />
f=3<br />
T=5<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 2454.30401087 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 97.7873967272Mb/s<br />
<br />
f=5<br />
T=5<br />
Transfer Bandwidth Report:<br />
27/30 transferred in 1977.34470105 seconds<br />
27000000000.0 bytes transferred.<br />
Bandwidth: 109.237403011Mb/s<br />
<br />
f=10<br />
T=5<br />
Transfer Bandwidth Report:<br />
29/30 transferred in 1863.70127201 seconds<br />
29000000000.0 bytes transferred.<br />
Bandwidth: 124.483469258Mb/s<br />
<br />
f=3<br />
T=10<br />
Transfer Bandwidth Report:<br />
29/30 transferred in 2580.96023893 seconds<br />
29000000000.0 bytes transferred.<br />
Bandwidth: 89.8890252164Mb/s<br />
<br />
f=5<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 2170.59897304 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 110.568558716Mb/s<br />
<br />
f=10<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 2521.02329206 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 95.1994377662Mb/s<br />
<br />
sdf1 (ext2)<br />
====<br />
f=3<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1046.92429304 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 229.242937236Mb/s<br />
<br />
f=5<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1163.42442679 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 206.28757182Mb/s <br />
f=10<br />
T=3<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1124.09160089 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 213.505731925Mb/s<br />
<br />
f=3<br />
T=5<br />
Transfer Bandwidth Report:<br />
26/30 transferred in 1362.75401402 seconds<br />
26000000000.0 bytes transferred.<br />
Bandwidth: 152.632094905Mb/s<br />
<br />
f=5<br />
T=5<br />
Transfer Bandwidth Report:<br />
29/30 transferred in 1523.42930198 seconds<br />
29000000000.0 bytes transferred.<br />
Bandwidth: 152.287998989Mb/s<br />
<br />
f=10<br />
T=5<br />
Transfer Bandwidth Report:<br />
29/30 transferred in 1504.65775394 seconds<br />
29000000000.0 bytes transferred.<br />
Bandwidth: 154.187887174Mb/s<br />
<br />
f=3<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1961.45055199 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 122.358424869Mb/s<br />
<br />
f=5<br />
T=10<br />
Transfer Bandwidth Report:<br />
29/30 transferred in 1837.48913097 seconds<br />
29000000000.0 bytes transferred.<br />
Bandwidth: 126.25925024Mb/s<br />
<br />
f=10<br />
T=10<br />
Transfer Bandwidth Report:<br />
30/30 transferred in 1971.70147014 seconds<br />
30000000000.0 bytes transferred.<br />
Bandwidth: 121.722280799Mb/s<br />
</pre><br />
<br />
<br />
<br />
[[Category:Optimisation]]<br />
[[Category:Storage]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Filesystem_Tests_Using_BonnieFilesystem Tests Using Bonnie2007-03-01T12:06:42Z<p>Andrew elwell: </p>
<hr />
<div><pre><br />
# bonnie++ -d /disk1/tmp -s 16000 -r 4096 -u nobody -m pool3<br />
# bonnie++ -d /gridstorage/sde1/tmp -s 16000 -r 4096 -u nobody -m pool3<br />
# bonnie++ -d /gridstorage/sdf1/tmp -s 16000 -r 4096 -u nobody -m pool3<br />
# bonnie++ -d /gridstorage/sdg1/tmp -s 16000 -r 4096 -u nobody -m pool3<br />
<br />
<br />
=============<br />
reiser<br />
=============<br />
Using uid:99, gid:99.<br />
Writing with putc()...done<br />
Writing intelligently...done<br />
Rewriting...done<br />
Reading with getc()...done<br />
Reading intelligently...done<br />
start 'em...done...done...done...<br />
Create files in sequential order...done.<br />
Stat files in sequential order...done.<br />
Delete files in sequential order...done.<br />
Create files in random order...done.<br />
Stat files in random order...done.<br />
Delete files in random order...done.<br />
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-<br />
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--<br />
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP<br />
pool3 16000M 27835 95 36808 30 12930 7 13753 39 36567 13 275.2 1<br />
------Sequential Create------ --------Random Create--------<br />
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--<br />
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP<br />
16 17640 100 +++++ +++ 14922 92 17651 96 +++++ +++ 15059 100<br />
pool3,16000M,27835,95,36808,30,12930,7,13753,39,36567,13,275.2,1,16,17640,100,++ +++,+++,14922,92,17651,96,+++++,+++,15059,100<br />
<br />
<br />
=============<br />
x3<br />
=============<br />
Using uid:99, gid:99.<br />
Writing with putc()...done<br />
Writing intelligently...done<br />
Rewriting...done<br />
Reading with getc()...done<br />
Reading intelligently...done<br />
start 'em...done...done...done...<br />
Create files in sequential order...done.<br />
Stat files in sequential order...done.<br />
Delete files in sequential order...done.<br />
Create files in random order...done.<br />
Stat files in random order...done.<br />
Delete files in random order...done.<br />
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-<br />
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--<br />
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP<br />
pool3 16000M 27687 96 40910 33 20110 11 11250 31 73331 22 587.1 2<br />
------Sequential Create------ --------Random Create--------<br />
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--<br />
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP<br />
16 2108 89 +++++ +++ +++++ +++ 2226 99 +++++ +++ 5005 100<br />
pool3,16000M,27687,96,40910,33,20110,11,11250,31,73331,22,587.1,2,16,2108,89,+++++,+++,+++++,+++,2226,99,+++++,+++,5005,100<br />
<br />
=============<br />
x2<br />
=============<br />
Using uid:99, gid:99.<br />
Writing with putc()...done<br />
Writing intelligently...done<br />
Rewriting...done<br />
Reading with getc()...done<br />
Reading intelligently...done<br />
start 'em...done...done...done...<br />
Create files in sequential order...done.<br />
Stat files in sequential order...done.<br />
Delete files in sequential order...done.<br />
Create files in random order...done.<br />
Stat files in random order...done.<br />
Delete files in random order...done.<br />
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-<br />
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--<br />
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP<br />
pool3 16000M 33201 94 44385 15 19243 9 21596 62 65350 19 555.2 2<br />
------Sequential Create------ --------Random Create--------<br />
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--<br />
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP<br />
16 3977 99 +++++ +++ +++++ +++ 4097 100 +++++ +++ 10107 99<br />
pool3,16000M,33201,94,44385,15,19243,9,21596,62,65350,19,555.2,2,16,3977,99,+++++,+++,+++++,+++,4097,100,+++++,+++,10107,99<br />
<br />
=============<br />
xfs<br />
=============<br />
Using uid:99, gid:99.<br />
Writing with putc()...done<br />
Writing intelligently...done<br />
Rewriting...done<br />
Reading with getc()...done<br />
Reading intelligently...done<br />
start 'em...done...done...done...<br />
Create files in sequential order...done.<br />
Stat files in sequential order...done.<br />
Delete files in sequential order...done.<br />
Create files in random order...done.<br />
Stat files in random order...done.<br />
Delete files in random order...done.<br />
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-<br />
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--<br />
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP<br />
pool3 16000M 31129 90 43911 16 19118 9 21975 61 63994 20 400.7 1<br />
------Sequential Create------ --------Random Create--------<br />
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--<br />
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP<br />
16 3843 27 +++++ +++ 3083 23 3858 32 +++++ +++ 2557 18<br />
pool3,16000M,31129,90,43911,16,19118,9,21975,61,63994,20,400.7,1,16,3843,27,+++++,+++,3083,23,3858,32,+++++,+++,2557,18<br />
</pre><br />
<br />
<br />
[[Category:Optimisation]]<br />
[[Category:Storage]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Filesystem_Tests_SummaryFilesystem Tests Summary2007-03-01T12:06:03Z<p>Andrew elwell: </p>
<hr />
<div>{| border="1"<br />
|+'''CERN'''<br />
|-<br />
! file system !! f !! T !! time (sec) !! Bandwidth (Mb/sec) !! failures<br />
|-<br />
! sdf1<br />
| 3 || 3 || 947 || 254 ||<br />
|-<br />
! ext2<br />
| 5 || 3 || 1064 || 226 ||<br />
|-<br />
!<br />
| 10 || 3 || 1109 || 123 || 13<br />
|-<br />
!<br />
| 3 || 5 || 946 || 254 ||<br />
|-<br />
!<br />
| 5 || 5 || 1008 || 238 ||<br />
|-<br />
!<br />
| 10 || 5 || 571 || 182 || 17<br />
|-<br />
!<br />
| 3 || 10 || 987 || 243 ||<br />
|-<br />
!<br />
| 5 || 10 || 1017 || 236 ||<br />
|-<br />
!<br />
| 10 || 10 || 1190 || 201 ||<br />
|-<br />
! file system !! f !! T !! time (sec) !! Bandwidth (Mb/sec) !! failures<br />
|-<br />
! sde1<br />
| 3 || 3 || 1264 || 190 ||<br />
|-<br />
! ext3<br />
| 5 || 3 || 1127 || 213 ||<br />
|-<br />
!<br />
| 10 || 3 || 1010 || 238 ||<br />
|-<br />
!<br />
| 3 || 5 || 1209 || 199 ||<br />
|-<br />
!<br />
| 5 || 5 || 1109 || 217 ||<br />
|-<br />
!<br />
| 10 || 5 || 1041 || 231 ||<br />
|-<br />
!<br />
| 3 || 10 || 1244 || 193 ||<br />
|-<br />
!<br />
| 5 || 10 || 1127 || 213 ||<br />
|-<br />
!<br />
| 10 || 10 || 986 || 244 ||<br />
|-<br />
! file system !! f !! T !! time (sec) !! Bandwidth (Mb/sec) !! failures<br />
|-<br />
! sdg1<br />
| 3 || 3 || 835 || 287 ||<br />
|-<br />
! xfs<br />
| 5 || 3 || 748 || 321 ||<br />
|-<br />
!<br />
| 10 || 3 || 716 || 335 ||<br />
|-<br />
!<br />
| 3 || 5 || 852 || 282 ||<br />
|-<br />
!<br />
| 5 || 5 || 728 || 330 ||<br />
|-<br />
!<br />
| 10 || 5 || 721 || 333 ||<br />
|-<br />
!<br />
| 3 || 10 || 817 || 294 ||<br />
|-<br />
!<br />
| 5 || 10 || 729 || 329 ||<br />
|-<br />
!<br />
| 10 || 10 || 724 || 331 ||<br />
|}<br />
<br />
{| border="1"<br />
|+'''Vanilla'''<br />
|-<br />
! file system !! f !! T !! time (sec) !! Bandwidth (Mb/sec) !! failures<br />
|-<br />
! sdf1<br />
| 3 || 3 || 1032 || 233 ||<br />
|-<br />
! ext2<br />
| 5 || 3 || 1552 || 144 || 2<br />
|-<br />
!<br />
| 10 || 3 || 1153 || 160 || 7<br />
|-<br />
!<br />
| 3 || 5 || 1382 || 174 ||<br />
|-<br />
!<br />
| 5 || 5 || 1312 || 183 ||<br />
|-<br />
!<br />
| 10 || 5 || 1103 || 160 || 8<br />
|-<br />
!<br />
| 3 || 10 || 1364 || 153 || 4<br />
|-<br />
!<br />
| 5 || 10 || 1419 || 158 || 2<br />
|-<br />
!<br />
| 10 || 10 || 1093 || 161 || 8<br />
|-<br />
! file system !! f !! T !! time (sec) !! Bandwidth (Mb/sec) !! failures<br />
|-<br />
! sde1<br />
| 3 || 3 || 1444 || 166 ||<br />
|-<br />
! ext3<br />
| 5 || 3 || 1211 || 198 ||<br />
|-<br />
!<br />
| 10 || 3 || 944 || 254 ||<br />
|-<br />
!<br />
| 3 || 5 || 1197 || 201 ||<br />
|-<br />
!<br />
| 5 || 5 || 1052 || 228 ||<br />
|-<br />
!<br />
| 10 || 5 || 1098 || 219 ||<br />
|-<br />
!<br />
| 3 || 10 || 2088 || 115 ||<br />
|-<br />
!<br />
| 5 || 10 || 1203 || 200 ||<br />
|-<br />
!<br />
| 10 || 10 || 1128 || 213 ||<br />
|-<br />
! file system !! f !! T !! time (sec) !! Bandwidth (Mb/sec) !! failures<br />
|-<br />
! sdg1<br />
| 3 || 3 || 805 || 298 ||<br />
|-<br />
! jfs<br />
| 5 || 3 || 718 || 301 || 3<br />
|-<br />
!<br />
| 10 || 3 || 609 || 263 || 10<br />
|-<br />
!<br />
| 3 || 5 || 823 || 292 ||<br />
|-<br />
!<br />
| 5 || 5 || 717 || 301 || 3<br />
|-<br />
!<br />
| 10 || 5 || 720 || 289 || 4<br />
|-<br />
!<br />
| 3 || 10 || 807 || 298 ||<br />
|-<br />
!<br />
| 5 || 10 || 736 || 305 || 2<br />
|-<br />
!<br />
| 10 || 10 || 608 || 276 || 9<br />
|}<br />
<br />
<br />
<br />
[[Category:Storage]]<br />
[[Category:Optimisation]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Edinburgh_SC4Edinburgh SC42007-03-01T12:03:39Z<p>Andrew elwell: </p>
<hr />
<div>This page contains a log of the FTS tests that were carried out as part of Edinburgh's participation in SC4. In conjunction, these tests were used to understand the local dCache setup.<br />
<br />
[[Service_Challenge_Transfer_Tests]]<br />
<br />
== Outline of tests '''(draft)''' ==<br />
<br />
The SC4 requirement is for a sustained transfer of 1TB of data from the RAL Tier-1. As a warm-up for this test, I will transfer smaller amounts of data from the Tier-1 and at the same time modify our dCache setup to observe the effect that the following have on the data transfer rate:<br />
<br />
* only 1 NFS mounted pool<br />
* only NFS mounted pools<br />
* only 1 RAID volume pool<br />
* only RAID pools<br />
* all pools available for use<br />
<br />
It is expected that there will be a decrease in the transfer rate when only the NFS mounted pools are available to the dCache, but I would like to get quantitative results. This data will be used to modify our dCache setup in the future towards a more optimal configuration. In addition to modifying the dCache setup, it is also possible to use FTS to modify the configuration of the RAL-ED channel (in terms of the number of concurrent file transfers and parallel streams). It is hoped that there will be sufficient time to study these effects on the transfer rate.<br />
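<br />
For reference, the channel parameters are adjusted with the FTS channel management tools. A minimal sketch follows; the service URL, the channel name and the option names are assumptions and should be checked against <code>glite-transfer-channel-set --help</code> on the deployed FTS client:<br />
<br />
 # show the current settings for the channel (service URL and channel name are placeholders)<br />
 glite-transfer-channel-list -s https://<fts-server>:8443/<fts-path> RAL-EDINBURGH<br />
 # option names for concurrent files / parallel streams are an assumption -- verify with --help<br />
 glite-transfer-channel-set -f 10 -T 10 RAL-EDINBURGH<br />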
<br />
=== 12/12/05 ===<br />
<br />
Started trying to initiate FTS tests. FTS was accepting the jobs, but querying the transfer status produced a strange error message. Problem was eventually resolved when a new <code>myproxy -d</code> was issued.<br />
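<br />
As a record of the fix, a minimal sketch of the credential refresh and the status query that then worked (the VO name and the MyProxy server are assumptions, and the transfer ID is a placeholder):<br />
<br />
 # create a fresh VOMS proxy (VO name is an assumption) and push it to the MyProxy server used by FTS<br />
 voms-proxy-init --voms dteam<br />
 myproxy-init -d -s <myproxy-server><br />
 # re-query the transfer that previously returned the strange error<br />
 glite-transfer-status -l <transfer-id><br />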
<br />
=== 13/12/05 ===<br />
<br />
FTS tests were started properly. Initially just using Matt Hodges' test script to start some transfers in order to observe the performance before any tuning took place. This also gave a chance to study the FTS logs and ganglia monitoring pages. Submitted a batch transfer of files:<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams<br />
<br />
|-<br />
|50GB<br />
|5<br />
|5<br />
|}<br />
<br />
Initially transfers were successful, then started seeing error messages in the dCache pool node gridftp door logs:<br />
<br />
12/13 16:14:48 Cell(GFTP-dcache-Unknown-998@gridftpdoor-dcacheDomain) : CellAdapter: cought SocketTimeoutException: Do nothing, just allow looping to continue<br />
12/13 16:14:48 Cell(GFTP-dcache-Unknown-994@gridftpdoor-dcacheDomain) : CellAdapter: cought SocketTimeoutException: Do nothing, just allow looping to continue<br />
12/13 16:14:48 Cell(GFTP-dcache-Unknown-999@gridftpdoor-dcacheDomain) : CellAdapter: cought SocketTimeoutException: Do nothing, just allow looping to continue<br />
<br />
Not clear what is causing this. Had to cancel the transfer because of it. Even if I now just submit a single file for transfer, I get these error messages. Set up another transfer:<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams<br />
<br />
|-<br />
|25*1GB<br />
|5<br />
|1<br />
|}<br />
<br />
Only saw 11Mb/s. FTS log files reported a problem with pool dcache_24, confirmed by dCache monitoring. Not sure why; possibly excessive load. Also having problems with the gridftp door on the admin node not starting up. For the moment, I have disabled it and all traffic now goes through the pool node.<br />
<br />
=== 14/12/05 ===<br />
<br />
Set up another transfer, passing the option <code>-g "-p 10"</code> (10 parallel gridftp streams) to FTS. Now using Chris Brew's test script. A sketch of the underlying submit command is shown after the table below.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams<br />
<br />
|-<br />
|50*10MB=500MB<br />
|5<br />
|10<br />
|}<br />
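<br />
The submissions here are driven by the test scripts, but the underlying FTS submission can be done by hand with <code>glite-transfer-submit</code>. A minimal sketch, with the endpoint and SURLs as placeholders and the <code>-g</code> gridftp pass-through as an assumption to check against the installed client:<br />
<br />
 # copy one file from the RAL Tier-1 dCache to the Edinburgh dCache, asking gridftp for 10 parallel streams<br />
 glite-transfer-submit -s https://<fts-server>:8443/<fts-path> -g "-p 10" \<br />
    srm://<ral-dcache-host>/<path>/file0001 \<br />
    srm://<ed-dcache-host>/<path>/file0001<br />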
<br />
Transfer IDs are d4f03598-6c95-11da-a18f-e44be7748cb0<br />
Transfer Started - Wed Dec 14 11:36:20 GMT 2005<br />
Active Jobs: Done 4 files (1 active, 45 pending and 0 delayed) - Wed Dec 14 11:36:50 GMT 2005<br />
Active Jobs: Done 5 files (5 active, 40 pending and 0 delayed) - Wed Dec 14 11:36:57 GMT 2005<br />
...<br />
Active Jobs: Done 49 files (0 active, 0 pending and 1 delayed) - Wed Dec 14 11:40:36 GMT 2005<br />
Transfer Finished - Wed Dec 14 11:40:37 GMT 2005<br />
Transfered 49 files in 257 s (+- 10s)<br />
Approx rate = 1561 Mb/s<br />
<br />
Saw a rate of 1561Mb/s according to this! This number cannot be correct; there must be an error in the test script. Set up another transfer, passing the option <code>-g "-p 10"</code> to FTS.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams<br />
<br />
|-<br />
|10*1GB=10GB<br />
|5<br />
|10<br />
|}<br />
<br />
Transfer IDs are dc24f7dc-6c98-11da-a18f-e44be7748cb0<br />
Transfer Started - Wed Dec 14 11:58:00 GMT 2005<br />
Active Jobs: Done 1 files (5 active, 4 pending and 0 delayed) - Wed Dec 14 12:24:20 GMT 2005<br />
...<br />
Active Jobs: Done 8 files (0 active, 0 pending and 2 delayed) - Wed Dec 14 12:56:35 GMT 2005<br />
Transfer Finished - Wed Dec 14 12:56:36 GMT 2005<br />
Transfered 8 files in 3516 s (+- 10s)<br />
Approx rate = 18 Mb/s<br />
<br />
So now back to a low transfer rate of 18Mb/s. Try transferring smaller files (10*100MB = 1GB).<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams<br />
<br />
|-<br />
|10*100MB=1GB<br />
|5<br />
|10<br />
|}<br />
<br />
Transfer IDs are dc662920-6ca4-11da-a18f-e44be7748cb0<br />
Transfer Started - Wed Dec 14 13:24:23 GMT 2005<br />
Active Jobs: Done 1 files (5 active, 4 pending and 0 delayed) - Wed Dec 14 13:26:08 GMT 2005<br />
...<br />
Active Jobs: Done 10 files (0 active, 0 pending and 0 delayed) - Wed Dec 14 13:29:35 GMT 2005<br />
Transfer Finished - Wed Dec 14 13:29:36 GMT 2005<br />
Transfered 10 files in 313 s (+- 10s)<br />
Approx rate = 261 Mb/s<br />
<br />
So now up to a respectable 261Mb/s. Looks like dCache may be having problems with transferring large files, possibly timing out. Perform another test with 100*100MB = 10GB.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams<br />
<br />
|-<br />
|100*100MB=10GB<br />
|5<br />
|10<br />
|}<br />
<br />
Transfer IDs are 0364f5a2-6ca6-11da-a18f-e44be7748cb0<br />
Transfer Started - Wed Dec 14 13:32:16 GMT 2005<br />
Active Jobs: Done 2 files (5 active, 93 pending and 0 delayed) - Wed Dec 14 13:34:15 GMT 2005<br />
...<br />
Active Jobs: Done 100 files (0 active, 0 pending and 0 delayed) - Wed Dec 14 14:21:20 GMT 2005<br />
Transfer Finished - Wed Dec 14 14:21:21 GMT 2005<br />
Transfered 100 files in 2945 s (+- 10s)<br />
Approx rate = 278 Mb/s<br />
<br />
Decent transfer rate of 278Mb/s.<br />
Now observing problems with dCache pools going offline (as reported by the web interface). The offline pools are ones that are NFS mounted from the University SAN; 4 of the 10 NFS mounted pools remain online. FTS transfers were hanging when trying to transfer files into these pools. Seeing java processes in status D (uninterruptible sleep). <br />
<br />
# ps aux|grep " D "<br />
root 4353 0.0 2.1 622700 84668 pts/0 D 11:29 0:00 /usr/java/j2sdk1.<br />
root 4393 0.0 2.1 622700 84668 pts/0 D 11:29 0:00 /usr/java/j2sdk1.<br />
root 11186 0.0 1.5 506484 58760 pts/0 D 15:04 0:00 /usr/java/j2sdk1.<br />
root 11864 0.0 1.5 509448 62100 pts/0 D 15:20 0:00 /usr/java/j2sdk1.<br />
root 13029 0.0 1.6 584296 65912 pts/0 D 16:27 0:00 /usr/java/j2sdk1.<br />
root 13194 0.0 0.0 1740 584 pts/0 S 16:32 0:00 grep D<br />
<br />
Problem not resolved by restarting dcache-pool or NFS. Reboot required.<br />
<br />
<br />
=== 16/12/05 ===<br />
<br />
Noticed that if I try to make gridftp use > 10 parallel streams (dCache -> dCache), the transfer does not work and Graeme's python script has repeating output of:<br />
<br />
Child: /opt/glite/bin/glite-transfer-status -l 449b254e-6e2e-11da-a18f-e44be7748cb0<br />
Overall status: Active<br />
Matching for duration in fts query line number 6 failed.<br />
Found<br />
<br />
and the pool node gridftp log reports:<br />
<br />
12/16 12:20:05 Cell(GFTP-dcache-Unknown-207@gridftpdoor-dcacheDomain) : CellAdapter: SocketRedirector(Thread-632):Adapter: done, EOD received ? = false<br />
12/16 12:20:05 Cell(GFTP-dcache-Unknown-207@gridftpdoor-dcacheDomain) : CellAdapter: Closing data channel: 1 remaining: 9 eodc says there will be: -1<br />
12/16 12:20:05 Cell(GFTP-dcache-Unknown-207@gridftpdoor-dcacheDomain) : CellAdapter: SocketRedirector(Thread-633):Adapter: done, EOD received ? = false<br />
12/16 12:20:05 Cell(GFTP-dcache-Unknown-207@gridftpdoor-dcacheDomain) : CellAdapter: Closing data channel: 2 remaining: 8 eodc says there will be: -1<br />
...<br />
<br />
Now change the dCacheSetup file on the pool node to see if this has any effect on performance. First modify it so that it only uses 1 parallel stream.<br />
<br />
# Set number of parallel streams per GridFTP transfer<br />
parallelStreams=1<br />
<br />
(default was 10). Restart dcache-opt and see what sort of transfer rate we get - 8.37Mb/s (100MB file, 1 stream). Now just try submitting the same job, but with the "-p 2" option - 8.30Mb/s. This is strange. Why did I not get the error messages as above? Try again with "-p 11" - same response as above with the "Matching duration..." message. glite-transfer-status returns:<br />
<br />
State: Waiting<br />
Retries: 1<br />
Reason: Transfer failed. ERROR the server sent an error response: 426 426 Transfer aborted, closing connection :Unexpected Exception : java.net.SocketException: Connection reset<br />
<br />
Possibly I need to change parallelStreams in the admin node config file. Do this, then re-run the tests.<br />
<br />
1 stream - 9.37Mb/s<br />
2 streams - 7.52Mb/s<br />
10 streams- 9.43Mb/s<br />
11 streams- failed with same error as above.<br />
<br />
??<br />
<br />
What about the .srmconfig/config.xml file? I had been using:<br />
<br />
<!--nonnegative integer, 2048 by default--><br />
<buffer_size> 131072 </buffer_size><br />
<!--integer, 0 by default (which means do not set tcp_buffer_size at all)--><br />
<tcp_buffer_size> 0 </tcp_buffer_size><br />
<!--integer, 10 by default--><br />
<streams_num> 10 </streams_num><br />
<br />
Now try again with <code>streams_num</code> of 1, but pass the "-p 10" option to gridftp.<br />
<br />
"-p 1", <streams_num> 1 - 9.40Mb/s : FTS logs report that 1 stream was used<br />
"-p 10", <streams_num> 1 - 10.65Mb/s : FTS logs report that 10 streams were used, so this does not appear to have any influence.<br />
<br />
What about modifying the buffer sizes? There are also corresponding buffer sizes in dCacheSetup. Change config.xml to this:<br />
<br />
<buffer_size> 2048 </buffer_size><br />
<br />
"-p 1", <streams_num> 1 - 10.76Mb/s : FTS logs report that 1 stream was used<br />
"-p 10", <streams_num> 1 - 15.01Mb/s : 10 streams<br />
<br />
Faster in both cases. Try reducing the buffer size further to 1024 and run the tests again.<br />
<br />
"-p 10", <streams_num> 1 - 9.42Mb/s : <br />
<br />
Change to 4096:<br />
<br />
"-p 10", <streams_num> 1 - 10.77Mb/s<br />
<br />
The ~10Mb/s limit may be due to the NFS mounted disk pools that dCache is using. I will make these pools read-only for now to see what effect this has on the transfer rate. The setup is now:<br />
<br />
nfs-test<br />
linkList :<br />
nfs-test-link (pref=10/10/0;ugroups=2;pools=1)<br />
poolList :<br />
dcache_28 (enabled=true;active=22;links=0;pgroups=1)<br />
dcache_26 (enabled=true;active=24;links=0;pgroups=1)<br />
dcache_22 (enabled=true;active=4;links=0;pgroups=1)<br />
dcache_30 (enabled=true;active=24;links=0;pgroups=1)<br />
dcache_24 (enabled=true;active=0;links=0;pgroups=1)<br />
dcache_32 (enabled=true;active=22;links=0;pgroups=1)<br />
dcache_25 (enabled=true;active=26;links=0;pgroups=1)<br />
dcache_27 (enabled=true;active=23;links=0;pgroups=1)<br />
dcache_29 (enabled=true;active=8;links=0;pgroups=1)<br />
dcache_23 (enabled=true;active=1;links=0;pgroups=1)<br />
dcache_31 (enabled=true;active=25;links=0;pgroups=1)<br />
ResilientPools<br />
linkList :<br />
poolList :<br />
default<br />
linkList :<br />
default-link (pref=10/10/10;ugroups=2;pools=1)<br />
poolList :<br />
dcache_1 (enabled=true;active=1;links=0;pgroups=1)<br />
dcache_7 (enabled=true;active=25;links=0;pgroups=1)<br />
dcache_14 (enabled=true;active=13;links=0;pgroups=1)<br />
dcache_13 (enabled=true;active=16;links=0;pgroups=1)<br />
dcache_20 (enabled=true;active=6;links=0;pgroups=1)<br />
dcache_6 (enabled=true;active=22;links=0;pgroups=1)<br />
dcache_16 (enabled=true;active=13;links=0;pgroups=1)<br />
dcache_8 (enabled=true;active=23;links=0;pgroups=1)<br />
dcache_11 (enabled=true;active=19;links=0;pgroups=1)<br />
dcache_4 (enabled=true;active=29;links=0;pgroups=1)<br />
dcache_18 (enabled=true;active=7;links=0;pgroups=1)<br />
dcache_21 (enabled=true;active=5;links=0;pgroups=1)<br />
dcache_3 (enabled=true;active=0;links=0;pgroups=1)<br />
dcache_17 (enabled=true;active=11;links=0;pgroups=1)<br />
dcache_19 (enabled=true;active=6;links=0;pgroups=1)<br />
dcache_2 (enabled=true;active=1;links=0;pgroups=1)<br />
dcache_12 (enabled=true;active=19;links=0;pgroups=1)<br />
dcache_9 (enabled=true;active=22;links=0;pgroups=1)<br />
dcache_10 (enabled=true;active=19;links=0;pgroups=1)<br />
dcache_5 (enabled=true;active=25;links=0;pgroups=1)<br />
dcache_15 (enabled=true;active=12;links=0;pgroups=1)<br />
<br />
Notice the writepref value (0) for the nfs-test-link. This makes the NFS mounted pools read only.<br />
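<br />
For reference, this kind of change is normally made in the PoolManager configuration. A minimal sketch of the relevant line, assuming the usual dCache 1.x <code>psu</code> syntax (the option names should be checked against the local PoolManager.conf):<br />
<br />
 # in PoolManager.conf (or via the admin interface): keep reads allowed but stop new writes to the NFS pools<br />
 psu set link nfs-test-link -readpref=10 -cachepref=10 -writepref=0<br />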
<br />
==== Ed to RAL ====<br />
<br />
Just as a test, I tried transferring 100MB file from Ed dCache to RAL dCache.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Effective Rate (Mb/s)<br />
<br />
|-<br />
|100MB<br />
|1<br />
|50<br />
|24.75<br />
|}<br />
<br />
So there are definitely issues with writing to our dCache. This would imply that it is NFS causing the problem. If I perform another test, but this time take a file that is definitely on a non-NFS mounted pool, then I get:<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Effective Rate (Mb/s)<br />
<br />
|-<br />
|1*1GB<br />
|1<br />
|50<br />
|95.54<br />
<br />
|-<br />
|5*1GB<br />
|1<br />
|50<br />
|143.06<br />
<br />
|-<br />
|5*1GB<br />
|5<br />
|50<br />
|see below<br />
|}<br />
<br />
In the last test above, the transfers started and 3 files were successfully copied to RAL. Via ganglia I was seeing rates of ~30MB/s out of my pool node! However, the python script then started outputting <code>Matching for duration in fts query line number 6 failed.</code> Not clear what is causing this. Could the transfer rate be too high? Try lowering the number of streams:<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Effective Rate (Mb/s)<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|10<br />
|300.96<br />
<br />
|-<br />
|5*1GB<br />
|5<br />
|20<br />
|get same Matching error again<br />
<br />
|-<br />
|5*1GB<br />
|5<br />
|10<br />
|289.66<br />
<br />
|-<br />
|20*1GB<br />
|5<br />
|10<br />
|317.00<br />
<br />
|-<br />
|20*1GB<br />
|10<br />
|10<br />
|388.93<br />
<br />
|-<br />
|20*1GB<br />
|20<br />
|10<br />
|1 file Done, 19 went into Waiting state<br />
<br />
|-<br />
|15*1GB<br />
|20<br />
|10<br />
|422.08<br />
|}<br />
<br />
So we seem to be able to write to the Tier-1 at a decent rate when the data comes from a non-NFS mounted pool. Now try a transfer with identical parameters, but with the 1GB files coming from an NFS mounted pool (dcache_27 = scotgrid10).<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Effective Rate (Mb/s)<br />
<br />
|- <br />
|1*1GB<br />
|20<br />
|10<br />
|110.03<br />
<br />
|- <br />
|10*1GB<br />
|20<br />
|10<br />
|391.92<br />
<br />
|- <br />
|15*1GB<br />
|20<br />
|10<br />
|430.12<br />
|}<br />
<br />
This shows that reading from an NFS mounted pool gives a good transfer rate; writing performance appears to be terrible. Now test writing to pools that are connected via fibre channel.<br />
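<br />
As a cross-check on the NFS writing hypothesis, a minimal sketch of a raw write test directly onto one of the NFS mounted pool areas, independent of dCache (the mount point is a placeholder):<br />
<br />
 # write 1GB of zeros through the NFS mount and time it<br />
 time dd if=/dev/zero of=<nfs-pool-mount>/ddtest bs=1M count=1000<br />
 rm <nfs-pool-mount>/ddtest<br />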
<br />
=== 17/12/05 ===<br />
<br />
RAL-ED, writing to the pools that reside on the RAID disk.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Effective Rate (Mb/s)<br />
<br />
|- <br />
|10*1GB<br />
|20<br />
|10<br />
|125.43<br />
<br />
|-<br />
|15*1GB<br />
|20<br />
|10<br />
|156.37<br />
<br />
|-<br />
|15*1GB<br />
|20<br />
|20<br />
|Same error as before - Matching for duration in fts query line number 6 failed, plus entry in dCache gridftp logs.<br />
<br />
|-<br />
|20*1GB<br />
|20<br />
|10<br />
|143.58<br />
<br />
|-<br />
|20*1GB<br />
|5<br />
|11<br />
|Same error as before - Matching for duration in fts query line number 6 failed, plus entry in dCache gridftp logs.<br />
<br />
|-<br />
|20*1GB<br />
|1<br />
|11<br />
|Same error as before - Matching for duration in fts query line number 6 failed, plus entry in dCache gridftp logs.<br />
|}<br />
<br />
We seem to have reached a parallel stream limit of 10. Unsure what is imposing this limit. Try some GLA-ED transfers to see if the same limit exists with DPM.<br />
<br />
==== GLA-ED ====<br />
<br />
Writing data into the non-NFS pools.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Effective Rate (Mb/s)<br />
<br />
|-<br />
|10*1GB<br />
|5<br />
|10<br />
|56.08<br />
<br />
|-<br />
|10*1GB<br />
|5<br />
|20<br />
|57.0<br />
<br />
|-<br />
|20*1GB<br />
|20<br />
|20<br />
|files being transferred, then all 20 went into Waiting for some reason.<br />
<br />
|}<br />
<br />
==== ED-GLA ====<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Effective Rate (Mb/s)<br />
<br />
|-<br />
|20*1GB<br />
|5<br />
|10<br />
|Files going into Waiting state, timeouts appearing in fts logs.<br />
<br />
|-<br />
|20*1GB<br />
|20<br />
|10<br />
|ditto<br />
<br />
|-<br />
|5*1GB<br />
|20<br />
|10<br />
|181<br />
<br />
|-<br />
|10*1GB<br />
|20<br />
|10<br />
|122.76<br />
<br />
|-<br />
|10*1GB<br />
|20<br />
|30<br />
|122.74<br />
<br />
|-<br />
|50*1GB<br />
|20<br />
|10<br />
|transfers going into Waiting, SRM timeouts in FTS logs (30 min limit reached)<br />
<br />
|}<br />
<br />
<br />
<br />
<br />
=== 20/12/05 ===<br />
<br />
Want to perform some systematic testing of the Ed to RAL channel to see what effect changing the number of parallel streams and concurrent files has on transfer rate and file transfer success.<br />
<br />
==== Ed to RAL ====<br />
<br />
If there is no entry in the Notes column, then all file transfers were successful.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|1<br />
|5<br />
|238.53<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|2<br />
|10<br />
|201.34<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|4<br />
|20<br />
|228.53<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|8<br />
|40<br />
|264.89<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|10<br />
|50<br />
|122.17<br />
|2 done, 3 waiting<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|12<br />
|60<br />
|110.2<br />
|1 done, 4 waiting - FTS logs show <code>426 426 Transfer aborted</code>. All transfers talking to gftp0446.<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|12<br />
|60<br />
|224.16<br />
|3 done, 2 waiting - FTS logs show <code>426 426 Transfer aborted</code>. All transfers talking to gftp0444.<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|18<br />
|90<br />
|<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|20<br />
|100<br />
|<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|22<br />
|110<br />
|<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|24<br />
|120<br />
|<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|26<br />
|130<br />
|<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|28<br />
|140<br />
|<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|30<br />
|150<br />
|<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|32<br />
|160<br />
|<br />
|<br />
<br />
|-<br />
|<br />
|<br />
|<br />
|<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|1<br />
|10<br />
|148.65<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|2<br />
|20<br />
|256.89<br />
|8 Done, 2 Waiting. <code>FINAL:NETWORK: Transfer failed due to possible network problem - timed out</code><br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|10<br />
|100<br />
|373.92<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|12<br />
|120<br />
|288.69<br />
|5/10. 426 errors again.<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|20<br />
|200<br />
|103.97<br />
|4/10. 426 errors again.<br />
|}<br />
<br />
==== RAL to Ed ====<br />
<br />
If no mention is made in the Notes column, then all file transfers were successful.<br />
<br />
* Files being put into the non-NFS mounted pools.<br />
* dCacheSetup file on both admin and pool node using parallelStreams=1 (yes, dcache services have been restarted after changing file).<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|1<br />
|5<br />
|144.82<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|2<br />
|10<br />
|148.26<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|4<br />
|20<br />
|154.55<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|6<br />
|30<br />
|35.2<br />
|1 Done, 4 waiting. Strange since the files are all in the dCache if I do an ls -l in /pnfs/...<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|8<br />
|40<br />
|156.51<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|10<br />
|50<br />
|156.13<br />
|<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|12<br />
|60<br />
|20.95<br />
|3 done, 2 waiting, 426 error again. FTS log shows that it took ~20 mins between starting transfer and it finishing. Pool gridftpdoor logs show same messages as before <code>4 remaining: 6 eodc says there will be: -1</code> etc.<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|14<br />
|70<br />
|<br />
|5 waiting immediately after submission. 426 errors in FTS logs. Pool node logs show similar errors to above.<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|16<br />
|80<br />
|<br />
|Same as above.<br />
<br />
|-<br />
|<br />
|<br />
|<br />
|<br />
|<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|1<br />
|10<br />
|156.25<br />
|Pool node log repeatedly contains <code>CellAdapter: cought SocketTimeoutException: Do nothing, just allow looping to continue</code>, but files still transferred.<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|2<br />
|20<br />
|144.25<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|4<br />
|40<br />
|156.56<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|6<br />
|60<br />
|<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|8<br />
|80<br />
|146.02<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|10<br />
|100<br />
|151.88<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|12<br />
|120<br />
|142.90<br />
|3 done, 7 waiting. 426 errors again in FTS logs.<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|14<br />
|140<br />
|<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|16<br />
|160<br />
|<br />
|<br />
<br />
|}<br />
<br />
=== 21/12/05 ===<br />
<br />
==== ED-ED (dCache to DPM) ====<br />
<br />
Just want to look at some dCache to DPM transfers to see how bad the transfer rate is.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|- <br />
|5*1GB<br />
|5<br />
|10<br />
|50<br />
|0<br />
|Transfers were taking place very slowly (I could see file size increasing in the DPM filesystem), but then SRM timed out.<br />
<br />
|- <br />
|5*100MB<br />
|5<br />
|10<br />
|50<br />
|9.14<br />
|Writing to NFS mounted RAID disk.<br />
<br />
|- <br />
|5*100MB<br />
|5<br />
|20<br />
|100<br />
|9.13<br />
|Writing to NFS mounted RAID disk.<br />
<br />
|- <br />
|5*100MB<br />
|5<br />
|20<br />
|100<br />
|3.79<br />
|2/5 done, 3 files exist. Writing to NFS mounted SAN.<br />
<br />
|- <br />
|5*100MB<br />
|5<br />
|20<br />
|100<br />
|9.31<br />
|5/5 done. Writing to filesystem local to DPM admin node.<br />
<br />
|}<br />
<br />
==== ED-GLA (dCache to DPM) ====<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|-<br />
|5*100MB<br />
|5<br />
|5<br />
|25<br />
|8.90<br />
|<br />
<br />
|-<br />
|5*100MB<br />
|5<br />
|10<br />
|50<br />
|9.12<br />
|<br />
<br />
|-<br />
|5*100MB<br />
|5<br />
|20<br />
|100<br />
|9.34<br />
|<br />
<br />
|}<br />
<br />
<br />
So we are seeing the same consistently low transfer rate from dCache into DPM (even accounting for writing to different filesystems that are mounted in different ways).<br />
<br />
==== ED-ED (DPM to dCache) ====<br />
<br />
Copy files from the DPM pool that resides on the DPM admin node so that there are no conflicts with simultaneous reading and writing to the RAID array.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|- <br />
|5*100MB<br />
|5<br />
|10<br />
|50<br />
|34.27<br />
|<br />
<br />
|- <br />
|5*100MB<br />
|5<br />
|20<br />
|100<br />
|0<br />
|Same problem as before when using > 10 streams into the dCache. Files immediately go into Waiting state.<br />
<br />
|- <br />
|10*100MB<br />
|10<br />
|10<br />
|100<br />
|105.40<br />
|<br />
<br />
|- <br />
|50*100MB<br />
|50<br />
|10<br />
|500<br />
|88.61<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|1<br />
|10<br />
|171.57<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|1<br />
|10<br />
|170.01<br />
|Again, as a check. Not sure why it is slower than GLA-ED.<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|2<br />
|20<br />
|172.81<br />
|<br />
<br />
|- <br />
|10*1GB<br />
|10<br />
|5<br />
|50<br />
|172.41<br />
|<br />
<br />
|-<br />
|10*1GB<br />
|10<br />
|10<br />
|100<br />
|152.44<br />
|<br />
<br />
<br />
|}<br />
<br />
==== GLA-ED (DPM to dCache) ====<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|-<br />
|5*1GB<br />
|5<br />
|10<br />
|50<br />
|152.63<br />
|<br />
<br />
|-<br />
|10*1GB<br />
|10<br />
|1<br />
|10<br />
|228.93<br />
|<br />
<br />
|-<br />
|10*1GB<br />
|10<br />
|1<br />
|10<br />
|234.24<br />
|Did this as a check.<br />
<br />
|-<br />
|10*1GB<br />
|10<br />
|2<br />
|20<br />
|205.43<br />
|<br />
<br />
|-<br />
|10*1GB<br />
|10<br />
|5<br />
|50<br />
|186.49<br />
|<br />
<br />
<br />
|-<br />
|10*1GB<br />
|10<br />
|10<br />
|100<br />
|169.18<br />
|<br />
<br />
|-<br />
|10*1GB<br />
|10<br />
|20<br />
|200<br />
|0<br />
|See same problems when using > 10 streams. <code>426 426 Data connection. data_write() failed: Handle not in the proper state</code><br />
<br />
|}<br />
<br />
Why is the GLA-ED rate decreasing as the number of parallel streams increases? Could this be related to the value of parallelStreams on the dCache server?<br />
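<br />
One thing worth checking (just a guess at this point): the dCache setup file has its own parallelStreams setting for SRM copies, and if that overrides what FTS asks for it could explain the plateau. Assuming the standard install location, something like:<br />
<br />
 grep -i parallelStreams /opt/d-cache/config/dCacheSetup<br />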
<br />
==== ED-RAL (dCache to dCache) ====<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|1<br />
|15<br />
|423.05<br />
|<br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|5<br />
|75<br />
|457.31<br />
|<br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|10<br />
|150<br />
|398.23<br />
|<br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|10<br />
|150<br />
|453.14<br />
|<br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|15<br />
|225<br />
|<br />
|426 errors again.<br />
<br />
|}<br />
<br />
<br />
==== ED-GLA (DPM to DPM) ====<br />
<br />
These files are all coming from the NFS mounted pools.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|1<br />
|15<br />
|159.74<br />
|<br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|5<br />
|75<br />
|125.02<br />
|14/15. SRM timeout for one file. <code>Error in srm__setFileStatusSOAP-ENV:Client - Invalid state</code><br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|10<br />
|150<br />
|99.40<br />
|14/15. SRM timeout for one file.<br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|20<br />
|300<br />
|68.64<br />
|11/15. SRM timeout for four files.<br />
<br />
|}<br />
<br />
Now try with 1GB files coming from a pool that is local to the DPM head node.<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|1<br />
|15<br />
|161.13<br />
|<br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|5<br />
|75<br />
|77.36<br />
|11/15. <code>Error in srm__setFileStatusSOAP-ENV:Client - Invalid state</code><br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|10<br />
|150<br />
|80.04<br />
|12/15.<code>Error in srm__setFileStatusSOAP-ENV:Client - Invalid state</code><br />
<br />
|-<br />
|15*1GB<br />
|15<br />
|20<br />
|300<br />
|<br />
|<br />
<br />
|-<br />
|5*1GB<br />
|5<br />
|1<br />
|5<br />
|127.67<br />
<br />
|-<br />
|5*1GB<br />
|5<br />
|10<br />
|50<br />
|114.33<br />
<br />
|-<br />
|5*1GB<br />
|5<br />
|20<br />
|100<br />
|116.01<br />
<br />
|-<br />
|5*1GB<br />
|5<br />
|50<br />
|250<br />
|118.63<br />
|}<br />
<br />
The above transfer rates are not a true reflection of what happened. Performing a dpns-ls of the destination directory at GLA shows that the files that went into the waiting state were in fact transferred. However, due to the above SOAP error in FTS, the file status was never set to Done, so the files remained in a waiting state until the SRM eventually timed out. This meant that Graeme's script never got round to calling srm-adv-del. Looking at the ganglia plots of the DPM node shows peaks in the data output rate of <~30MB/s.<br />
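<br />
For anyone repeating this, a rough sketch of the manual check and clean-up described above (the hostname and paths here are placeholders, not the exact ones used):<br />
<br />
 # list what actually arrived at the destination (with DPNS_HOST pointing at the GLA DPM head node)<br />
 dpns-ls -l /dpm/gla.scotgrid.ac.uk/home/dteam/<br />
 # then remove the orphaned copies by hand - the srm-adv-del call the script would normally have made<br />
 srm-adv-del srm://gla-se-hostname:8443/dpm/gla.scotgrid.ac.uk/home/dteam/FILENAME<br />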
<br />
=== 12/01/06 ===<br />
<br />
==== ED-RAL ====<br />
<br />
[[Image:ED-RAL-fts-06-01-11.png|thumb|right]]<br />
1000*1GB files dCache to dCache. 10 concurrent files, 5 streams. You can see from the plots that the transfer took approximately 5 hours, giving a rate of about 440Mb/s. A few of the transfers failed. Now need to try and improve the transfer rate in the reverse direction (NFS and RAID5 issues I think).<br />
[[Image:ED-RAL-1TB-scotgrid-switch.png|thumb|left|ScotGrid switch, green is traffic in]][[Image:ED-RAL-network-06-01-11.png|thumb|center]]<br />
<br />
=== 16/01/06 ===<br />
<br />
==== ED-DUR ====<br />
<br />
First test of transferring files from Edinburgh DPM to Durham DPM (the dCache to DPM issue still exists).<br />
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|-<br />
|10*1GB<br />
|10<br />
|5<br />
|50<br />
|90.33<br />
|Files NFS mounted from the RAID'ed disk.<br />
<br />
<br />
|-<br />
|100*1GB<br />
|10<br />
|5<br />
|50<br />
|92.75<br />
|90/100 transferred. Files NFS mounted from the RAID'ed disk.<br />
<br />
|}<br />
<br />
<br />
==== ED-GLA ====<br />
<br />
Potential fix for the dCache to DPM problems that we have been seeing:<br />
<br />
export RFIO_TCP_NODELAY=yes<br />
<br />
in /etc/sysconfig/dpm-gsiftp and restart dpm-gsiftp. (From the web: TCP_NODELAY is for a specific purpose: to disable the Nagle buffering algorithm. It should only be set for applications that send frequent small bursts of information without getting an immediate response, where timely delivery of data is required; the canonical example is mouse movements.)<br />
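<br />
For completeness, a minimal way of applying this on each node running dpm-gsiftp (assuming the sysconfig file location given above):<br />
<br />
 echo "export RFIO_TCP_NODELAY=yes" >> /etc/sysconfig/dpm-gsiftp<br />
 service dpm-gsiftp restart<br />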
<br />
{|border="1",cellpadding="1"<br />
|+<br />
|-style="background:#7C8AAF;color:white"<br />
!Size<br />
!Concurrent Files<br />
!Parallel Streams (-p)<br />
!Con File * Paral Streams<br />
!Effective Rate (Mb/s)<br />
!Notes<br />
<br />
|-<br />
|10*1GB<br />
|10<br />
|5<br />
|50<br />
|96<br />
|9/10 successful. Files from the RAID'ed disk.<br />
<br />
|-<br />
|20*1GB<br />
|10<br />
|5<br />
|50<br />
|85<br />
|18/20 successful. Files from the RAID'ed disk. The failures were due to the files already existing.<br />
<br />
<br />
|}<br />
<br />
=== 27/01/06 ===<br />
<br />
==== RAL-ED ====<br />
<br />
Started 1TB transfer with 10 files, 10 streams. Seeing rates of ~130Mb/s into our RAID 5 disk (not over NFS) before the ScotGrid machines were powered down due to maintenance.<br />
<br />
Had to apply [[Scotgrid_LCG_2.7_Pre-Release_Testing#FTS_testing|change]] to dCache setup at Edinburgh to allow the FTS transfers to succeed.<br />
<br />
<br />
<br />
[[Category:ScotGrid]]<br />
[[Category:Service Challenge 4]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Edinburgh_NFS_TestsEdinburgh NFS Tests2007-03-01T12:02:08Z<p>Andrew elwell: </p>
<hr />
<div> $ ./tiobench.pl --identifier dpm-SAN-test1 --dir /sanstor/stor4/ --size 8192 --threads 1 --threads 4 --block 4096 --block 13107<br />
Run #1: ./tiotest -t 4 -f 2048 -r 1000 -b 13107 -d /sanstor/stor4/ -T<br />
<br />
Unit information<br />
================<br />
File size = megabytes<br />
Blk Size = bytes<br />
Rate = megabytes per second<br />
CPU% = percentage of CPU used during the test<br />
Latency = milliseconds<br />
Lat% = percent of requests that took longer than X seconds<br />
CPU Eff = Rate divided by CPU% - throughput per cpu load<br />
<br />
Sequential Reads<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test1 8192 4096 1 42.21 24.71% 0.090 108.58 0.00000 0.00000 171<br />
dpm-SAN-test1 8192 4096 4 49.71 77.96% 0.311 433.45 0.00000 0.00000 64<br />
dpm-SAN-test1 8192 13107 1 38.86 27.96% 0.319 528.98 0.00000 0.00000 139<br />
dpm-SAN-test1 8192 13107 4 49.32 81.45% 0.997 454.04 0.00000 0.00000 61<br />
<br />
Random Reads<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test1 8192 4096 1 1.53 1.665% 2.548 47.62 0.00000 0.00000 92<br />
dpm-SAN-test1 8192 4096 4 5.34 7.855% 2.859 30.05 0.00000 0.00000 68<br />
dpm-SAN-test1 8192 13107 1 4.05 3.074% 3.085 58.74 0.00000 0.00000 132<br />
dpm-SAN-test1 8192 13107 4 12.64 10.61% 3.826 40.77 0.00000 0.00000 119<br />
<br />
Sequential Writes<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test1 8192 4096 1 2.03 2.034% 1.925 2103.77 0.00005 0.00000 100<br />
dpm-SAN-test1 8192 4096 4 11.92 24.37% 1.274 1712.83 0.00000 0.00000 49<br />
dpm-SAN-test1 8192 13107 1 2.00 2.067% 6.260 1005.44 0.00000 0.00000 97<br />
dpm-SAN-test1 8192 13107 4 13.78 28.97% 3.499 1818.60 0.00000 0.00000 48<br />
<br />
Random Writes<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test1 8192 4096 1 2.23 4.141% 1.680 527.76 0.00000 0.00000 54<br />
dpm-SAN-test1 8192 4096 4 0.57 0.841% 24.238 3604.38 0.32500 0.00000 68<br />
dpm-SAN-test1 8192 13107 1 3.33 6.200% 3.740 365.43 0.00000 0.00000 54<br />
dpm-SAN-test1 8192 13107 4 0.70 1.636% 70.013 4555.28 0.45000 0.00000 43<br />
<br />
$ ./tiobench.pl --identifier dpm-SAN-test1 --dir /sanstor/stor4/ --size 8192 --threads 1 --threads 4 --block 1024 --block 2048<br />
Run #1: ./tiotest -t 4 -f 2048 -r 1000 -b 2048 -d /sanstor/stor4/ -T<br />
<br />
Sequential Reads<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test1 8192 1024 1 38.45 30.31% 0.023 108.28 0.00000 0.00000 127<br />
dpm-SAN-test1 8192 1024 4 49.05 80.28% 0.077 438.77 0.00000 0.00000 61<br />
dpm-SAN-test1 8192 2048 1 37.11 24.40% 0.050 773.00 0.00000 0.00000 152<br />
dpm-SAN-test1 8192 2048 4 49.05 68.06% 0.155 359.10 0.00000 0.00000 72<br />
<br />
Random Reads<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test1 8192 1024 1 0.56 2.288% 1.744 20.40 0.00000 0.00000 24<br />
dpm-SAN-test1 8192 1024 4 1.70 6.966% 2.035 34.30 0.00000 0.00000 24<br />
dpm-SAN-test1 8192 2048 1 0.38 0.816% 5.199 23.33 0.00000 0.00000 46<br />
dpm-SAN-test1 8192 2048 4 3.19 6.530% 2.260 28.12 0.00000 0.00000 49<br />
<br />
Sequential Writes<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test1 8192 1024 1 1.99 2.491% 0.490 1388.44 0.00000 0.00000 80<br />
dpm-SAN-test1 8192 1024 4 11.51 28.69% 0.333 3036.85 0.00001 0.00000 40<br />
dpm-SAN-test1 8192 2048 1 1.99 2.138% 0.979 1642.21 0.00000 0.00000 93<br />
dpm-SAN-test1 8192 2048 4 11.20 24.37% 0.677 2458.06 0.00005 0.00000 46<br />
<br />
Random Writes<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test1 8192 1024 1 0.30 0.838% 3.148 609.95 0.00000 0.00000 36<br />
dpm-SAN-test1 8192 1024 4 0.32 1.977% 10.465 2521.99 0.05000 0.00000 16<br />
dpm-SAN-test1 8192 2048 1 0.37 0.800% 5.034 1082.77 0.00000 0.00000 46<br />
dpm-SAN-test1 8192 2048 4 0.55 1.902% 12.016 2555.61 0.07500 0.00000 29<br />
<br />
<br />
$ ./tiobench.pl --identifier dpm-SAN-test3 --dir /sanstor/stor4/ --size 8192 --size 16384 --threads 1 --threads 4 --block <br />
8192 --block 16384<br />
<br />
Sequential Reads<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test3 8192 8192 1 41.92 23.24% 0.184 396.56 0.00000 0.00000 180<br />
dpm-SAN-test3 8192 8192 4 49.32 66.69% 0.630 368.86 0.00000 0.00000 74<br />
dpm-SAN-test3 8192 16384 1 10.72 7.051% 1.448 1153.29 0.00000 0.00000 152<br />
dpm-SAN-test3 8192 16384 4 31.53 41.49% 1.948 1019.61 0.00000 0.00000 76<br />
dpm-SAN-test3 16384 8192 1 41.69 25.69% 0.185 519.07 0.00000 0.00000 162<br />
dpm-SAN-test3 16384 8192 4 48.95 75.30% 0.635 359.61 0.00000 0.00000 65<br />
dpm-SAN-test3 16384 16384 1 21.57 12.84% 0.720 767.11 0.00000 0.00000 168<br />
dpm-SAN-test3 16384 16384 4 48.09 73.24% 1.292 492.60 0.00000 0.00000 66<br />
<br />
Random Reads<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test3 8192 8192 1 3.51 2.355% 2.224 53.00 0.00000 0.00000 149<br />
dpm-SAN-test3 8192 8192 4 11.44 13.90% 2.620 24.22 0.00000 0.00000 82<br />
dpm-SAN-test3 8192 16384 1 5.30 2.796% 2.945 392.64 0.00000 0.00000 189<br />
dpm-SAN-test3 8192 16384 4 16.80 11.82% 3.397 29.21 0.00000 0.00000 142<br />
dpm-SAN-test3 16384 8192 1 1.07 1.023% 7.321 55.76 0.00000 0.00000 104<br />
dpm-SAN-test3 16384 8192 4 3.29 3.159% 9.324 40.14 0.00000 0.00000 104<br />
dpm-SAN-test3 16384 16384 1 1.90 0.883% 8.202 45.34 0.00000 0.00000 216<br />
dpm-SAN-test3 16384 16384 4 5.29 2.711% 11.489 43.22 0.00000 0.00000 195<br />
<br />
Sequential Writes<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test3 8192 8192 1 1.77 1.684% 4.413 1705.95 0.00000 0.00000 105<br />
dpm-SAN-test3 8192 8192 4 11.44 22.98% 2.709 2787.09 0.00029 0.00000 50<br />
dpm-SAN-test3 8192 16384 1 1.76 1.722% 8.895 1701.50 0.00000 0.00000 102<br />
dpm-SAN-test3 8192 16384 4 11.59 21.78% 5.133 4839.43 0.00687 0.00000 53<br />
dpm-SAN-test3 16384 8192 1 1.70 1.653% 4.591 1854.74 0.00000 0.00000 103<br />
dpm-SAN-test3 16384 8192 4 10.46 20.90% 2.915 2791.92 0.00019 0.00000 50<br />
dpm-SAN-test3 16384 16384 1 1.62 1.568% 9.668 3018.56 0.00057 0.00000 103<br />
dpm-SAN-test3 16384 16384 4 10.97 21.02% 5.614 9987.98 0.00467 0.00000 52<br />
<br />
Random Writes<br />
File Blk Num Avg Maximum Lat% Lat% CPU<br />
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff<br />
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----<br />
dpm-SAN-test3 8192 8192 1 18.84 25.91% 0.400 341.75 0.00000 0.00000 73<br />
dpm-SAN-test3 8192 8192 4 0.46 1.061% 63.676 7123.23 1.02500 0.00000 43 <br />
dpm-SAN-test3 8192 16384 1 13.49 22.88% 1.130 212.50 0.00000 0.00000 59<br />
dpm-SAN-test3 8192 16384 4 0.62 1.138% 87.431 8695.31 1.25000 0.00000 55<br />
dpm-SAN-test3 16384 8192 1 21.66 40.88% 0.344 85.77 0.00000 0.00000 53<br />
dpm-SAN-test3 16384 8192 4 23.24 55.77% 1.221 130.75 0.00000 0.00000 42<br />
dpm-SAN-test3 16384 16384 1 8.72 12.70% 1.777 299.03 0.00000 0.00000 69<br />
dpm-SAN-test3 16384 16384 4 20.60 47.13% 2.697 407.79 0.00000 0.00000 44<br />
<br />
<br />
<br />
[[Category:ScotGrid]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Ed_RAID_TestsEd RAID Tests2007-03-01T12:01:11Z<p>Andrew elwell: </p>
<hr />
<div>Perform some benchmarking using [http://www.coker.com.au/bonnie++/ bonnie++]. Man page for bonnie++ can be found [http://linux.com.hk/penguin/man/8/bonnie++.html here]<br />
<br />
== dcache.epcc.ed.ac.uk ==<br />
<br />
The specs of the machine are:<br />
<br />
* Proc: 8 x Intel(R) XEON(TM) MP CPU 1.90GHz<br />
* Cache: 512KB<br />
* Mem: 16GB<br />
* Disk specs: ???? RAID 5 configuration<br />
<br />
=== bonnie++ ===<br />
<br />
-s file size<br />
-n number of files to use in the file creation test<br />
-m machine name<br />
<br />
# ./bonnie++-1.03a/bonnie++ -d /export/raid01/ -s 32180 -m dcache -u root<br />
Using uid:0, gid:0.<br />
Writing with putc()...done<br />
Writing intelligently...done<br />
Rewriting...done<br />
Reading with getc()...done<br />
Reading intelligently...done<br />
start 'em...done...done...done...<br />
Create files in sequential order...done.<br />
Stat files in sequential order...done.<br />
Delete files in sequential order...done.<br />
Create files in random order...done.<br />
Stat files in random order...done.<br />
Delete files in random order...done.<br />
Version 1.03 ------Sequential Output------ --Sequential Input- --Random- <br />
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--<br />
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP<br />
dcache 32180M 14431 99 39763 85 23434 36 17851 87 91911 46 804.9 9<br />
------Sequential Create------ --------Random Create--------<br />
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--<br />
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP<br />
16 1877 98 +++++ +++ +++++ +++ 1994 99 +++++ +++ 5632 99<br />
dcache,32180M,14431,99,39763,85,23434,36,17851,87,91911,46,804.9,9,16,1877,98,++ +++,+++,+++++,+++,1994,99,+++++,+++,5632,99<br />
<br />
<br />
Notice that in this test, I did not use multiple threads.<br />
<br />
# ./bonnie++-1.03a/bonnie++ -d /san-storage/scotgrid1/pool/data/ -s 32180 -m dcache -u root<br />
Using uid:0, gid:0.<br />
Writing with putc()...done<br />
Writing intelligently...done<br />
Rewriting...done<br />
Reading with getc()...done<br />
Reading intelligently...done<br />
start 'em...done...done...done...<br />
Create files in sequential order...done.<br />
Stat files in sequential order...done.<br />
Delete files in sequential order...done.<br />
Create files in random order...done.<br />
Stat files in random order...done.<br />
Delete files in random order...done.<br />
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-<br />
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--<br />
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP<br />
dcache 32180M 2276 20 3370 13 7950 37 19897 99 64732 47 353.6 4<br />
------Sequential Create------ --------Random Create--------<br />
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--<br />
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP<br />
16 193 4 2172 15 85 2 62 1 2190 17 154 2<br />
dcache,32180M,2276,20,3370,13,7950,37,19897,99,64732,47,353.6,4,16,193,4,2172,15,85,2,62,1,2190,17,154,2<br />
<br />
=== dd tests ===<br />
<br />
Write cache is enabled in the RAID controller, but only if there is a battery backup available. The battery in the controller that serves raid22 needs replacing, so the effect of having write caching on and off can be seen from the results of the following two tests (the battery is OK in the other controller, serving raid10):<br />
<br />
[root@dcache raid22]# time dd if=/dev/zero of=big-testfile bs=1024k count=5120 <br />
5120+0 records in<br />
5120+0 records out<br />
<br />
real 3m52.104s<br />
user 0m0.020s<br />
sys 1m8.950s<br />
<br />
This is about 22MB/s.<br />
<br />
[root@dcache raid10]# time dd if=/dev/zero of=big-testfile bs=1024k count=5120<br />
5120+0 records in<br />
5120+0 records out<br />
<br />
real 2m4.012s<br />
user 0m0.010s<br />
sys 1m6.300s<br />
<br />
This is about 41MB/s. So there is a performance boost with write caching on, but not as great as we are seeing on a test machine, where it only takes ~35 seconds to write a 5GB file to a RAID 5.<br />
<br />
Writing to the SAN over NFS, the same test gives the result:<br />
<br />
[root@dcache scotgrid1]# time dd if=/dev/zero of=big-testfile bs=1024k count=5120<br />
5120+0 records in<br />
5120+0 records out<br />
<br />
real 4m57.581s<br />
user 0m0.000s<br />
sys 1m23.900s<br />
<br />
This is about 17MB/s. We do not know the setup of the RAID 5 array in the SAN, which may or may not be using write caching.<br />
<br />
== dpm.epcc.ed.ac.uk ==<br />
<br />
* Proc: 2*Intel Pentium III (Coppermine) CPU 1GHz<br />
* Cache: 256KB<br />
* Mem: 2GB<br />
* We are currently using NFS to mount 1 volume from the RAID and 2 volumes from the SAN.<br />
<br />
Using [http://sourceforge.net/projects/tiobench/ tiobench] to run some benchmarking tests. I had to enable large filesizes in the Makefile.<br />
<br />
# ./tiotest -t 1 -f 8192 -r 4000 -b 4096 -d /storage2<br />
Tiotest results for 1 concurrent io threads:<br />
,----------------------------------------------------------------------.<br />
| Item | Time | Rate | Usr CPU | Sys CPU |<br />
+-----------------------+----------+--------------+----------+---------+<br />
| Write 8192 MBs | 958.8 s | 8.544 MB/s | 0.2 % | 8.1 % |<br />
| Random Write 16 MBs | 4.0 s | 3.877 MB/s | 0.2 % | 4.7 % |<br />
| Read 8192 MBs | 277.6 s | 29.512 MB/s | 0.8 % | 15.3 % |<br />
| Random Read 16 MBs | 2.1 s | 7.560 MB/s | 1.0 % | 7.3 % |<br />
`----------------------------------------------------------------------'<br />
Tiotest latency results:<br />
,-------------------------------------------------------------------------.<br />
| Item | Average latency | Maximum latency | % >2 sec | % >10 sec |<br />
+--------------+-----------------+-----------------+----------+-----------+<br />
| Write | 0.456 ms | 525.375 ms | 0.00000 | 0.00000 |<br />
| Random Write | 0.984 ms | 631.220 ms | 0.00000 | 0.00000 |<br />
| Read | 0.131 ms | 54.939 ms | 0.00000 | 0.00000 |<br />
| Random Read | 0.512 ms | 9.519 ms | 0.00000 | 0.00000 |<br />
|--------------+-----------------+-----------------+----------+-----------|<br />
| Total | 0.294 ms | 631.220 ms | 0.00000 | 0.00000 |<br />
`--------------+-----------------+-----------------+----------+-----------'<br />
<br />
It is clear from the above results that writing to the NFS mounted RAID volume is more than 3 times slower than reading from it.<br />
<br />
=== dd test ===<br />
<br />
Writing to the RAID 5 volume over NFS using UDP gives:<br />
<br />
[root@dpm dpmmgr]# time dd if=/dev/zero of=big-testfile bs=1024k count=5120<br />
5120+0 records in<br />
5120+0 records out<br />
<br />
real 20m20.782s<br />
user 0m0.060s<br />
sys 1m7.740s<br />
<br />
Corresponding to a rate of about 4.5MB/s. Presumably using TCP will result in even slower performance.<br />
<br />
=== iperf ===<br />
<br />
Running iperf between dcache and dpm shows that the network link between the two machines is not the limiting factor:<br />
<br />
[root@dcache iperf-dir]# ./iperf -c dpm.epcc.ed.ac.uk -p 52000<br />
------------------------------------------------------------<br />
Client connecting to dpm.epcc.ed.ac.uk, TCP port 52000<br />
TCP window size: 16.0 KByte (default)<br />
------------------------------------------------------------<br />
[ 5] local 129.215.175.24 port 43712 connected with 129.215.175.6 port 52000<br />
[ ID] Interval Transfer Bandwidth<br />
[ 5] 0.0-10.0 sec 987 MBytes 828 Mbits/sec<br />
<br />
Corresponding to about 100MB/s over TCP.<br />
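<br />
For reference, the server end of this test (not shown above) would have been started on the dpm box with something like:<br />
<br />
 [root@dpm iperf-dir]# ./iperf -s -p 52000<br />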
<br />
<br />
<br />
[[Category:ScotGrid]]<br />
[[Category:Storage]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/061019-dteam-minutes061019-dteam-minutes2007-03-01T11:44:40Z<p>Andrew elwell: </p>
<hr />
<div>GridPP dteam meeting 17/10/2006<br />
<br />
Olivier van der Aa<br />
Greig Cowan<br />
Jamie Ferguson<br />
Philippa Strange<br />
Jeremy Coles<br />
Graeme Stuart<br />
<br />
Team Member updates<br />
<br />
Greig:<br />
- Transfer tests between ed-man. tomcat dying at ral. Rates were high but no indication whether tomcat dying is due to the high rate. <br />
- dCache testing of the new version. Main changes are: voms, close_wait solved.<br />
JC: will it go to the pps? GC: dCache is separate from the lcg validation because they have their own release validation procedure. <br />
- JC: how are the other sites handling the dCache. There is a concern about the stability of SE.<br />
<br />
Olivier:<br />
- Worked on site availability. Computing delivered/potential. <br />
- Helping to solve Apel accounting problem at RHUL<br />
- Chase up lhcb running in London. QMUL currently out of the production mask. <br />
- Helping Duncan at Brunel. <br />
<br />
Jamie:<br />
- Problem with the filetransfer script due to tomcat restart. <br />
- RALPPD-RHUL 18 hours, will finish at 12h. Successful. Trying to understand why it did not work for man<br />
- Lancaster, is using srmcp. <br />
JC: UCL-HEP just has outbound. Olivier will chase that issue (ACTION: Olivier follow-up transfer tests)<br />
- JC: Jamie is there for 6.5 days. What is the priority: get Glasgow dpm up and running. Make sure the script is ok. <br />
- GS: After Jamie is gone I am ready to take more of a lead on the transfer tests.<br />
- JF: I have sent an email to each site asking for some explanation. JC: we will discuss this at the meeting tomorrow. <br />
<br />
Graeme:<br />
- Working on the cluster. Problem with the OS delivered now solved. <br />
- Tried to run jobs via the admin and it did not work<br />
- JC: did you try the disk test tool? GS: Have run it on the Hitachi disk and it seems ok. We ran the test for one week. Rate was very good, > 100MB/s<br />
<br />
Philippa:<br />
- TPM schedule appears to have changed. We are backup this week. Next shift is <br />
nov 18 T8. T7 11 Dec and 1 jan. <br />
<br />
Jeremy Coles:<br />
- Interviews for someone working on pps. Nobody who applied has knowledge <br />
of the Grid. <br />
- Another interview for the security post.<br />
<br />
Action Review:<br />
- D-060609 closed<br />
- D-060815 closed<br />
- D-060830 Started <br />
- D-061010-1 : about the transfer tests<br />
- D-061010-3: closed<br />
- D-061010-5: closed<br />
<br />
Actions<br />
<br />
Action: <br />
- Olivier check that the new cic portal gives less results and is easier to <br />
fill in. <br />
- Olivier follow-up transfer tests at UCL.<br />
<br />
[[Category:GridPP_Deployment]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Bham_Upgrade_to_LCG-2.7.0Bham Upgrade to LCG-2.7.02007-03-01T11:44:08Z<p>Andrew elwell: </p>
<hr />
<div>== '''Initial objectives and motivation''' ==<br />
<br />
* Early upgrade to 2.7.0 constituted a logical continuation to the UK pre-release testing.<br />
* I opted for an upgrade with no downtime (well hopefully!) rather than a fresh install for this release. <br />
<br />
== '''Yaim config files''' ==<br />
<br />
=== Yaim tool for VO management ===<br />
<br />
I tried out the online yaim tool for VO management [https://lcg-sft.cern.ch/yaimtool/yaimtool.py], which produced a configuration segment <br />
that I pasted into site-info.def. It's a great tool! The list of VOS it produced were all with capital letters:<br />
<br />
VOS="ALICE ATLAS BABAR BIOMED CMS DTEAM HONE ILC LHCB SIXT ZEUS"<br />
<br />
I didn't want this, and I expect yaim would complain about it. It is also a good idea to check the other values in the generated file against the information in the 2.6.0 site-info file.<br />
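<br />
A quick way of lowercasing the generated list before pasting it in (plain shell, nothing yaim-specific):<br />
<br />
 echo "ALICE ATLAS BABAR BIOMED CMS DTEAM HONE ILC LHCB SIXT ZEUS" | tr '[:upper:]' '[:lower:]'<br />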
<br />
=== Changes to watch out for ===<br />
<br />
Here are the changes in the yaim config file to watch out for.<br />
I initially unset the following variable as we don't have a classic SE:<br />
<br />
CLASSIC_HOST="classic SE host"<br />
<br />
Leaving it blank will cause yaim to crash; leaving it as it is will wrongly configure the bdii-update.conf file, inserting "classic SE<br />
host" in the LDAP contact string instead of the SE hostname. So I had to set this to my SE/DPM server hostname.<br />
I was unsure what value to give DPMDATA. I set it to:<br />
<br />
DPMDATA="/dpm/ph.bham.ac.uk/home"<br />
<br />
on my CE, as this is the value I want my site BDII to publish for GlueSARoot, but I had to set it to one of my DPM filesystems in our SE's<br />
site-info.def file. If you don't, this is the type of thing you're going to end up with:<br />
<br />
root@epgse1 install-scripts]# dpm-qryconf<br />
POOL dpmPart DEFSIZE 200.00M GC_START_THRESH 0 GC_STOP_THRESH 0 DEFPINTIME 0 PUT_RETENP 86400 FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY<br />
fifo GID 0 S_TYPE -<br />
CAPACITY 1.79T FREE 1.60T ( 89.4%)<br />
epgse1 /disk/f3a CAPACITY 601.08G FREE 561.87G ( 93.5%)<br />
epgse1 /disk/f3b CAPACITY 601.08G FREE 500.11G ( 83.2%)<br />
epgse1 /disk/f3c CAPACITY 601.08G FREE 555.95G ( 92.5%)<br />
epgse1.ph.bham.ac.uk /dpm/ph.bham.ac.uk/home CAPACITY 32.60G FREE 24.19G ( 74.2%<br />
<br />
As can be seen from the output, a new DPM filesystem (epgse1.ph.bham.ac.uk) has<br />
been created with the FQDN of my SE, but that's not what I wanted, so I<br />
deleted it with dpm-rmfs.<br />
<br />
The users.conf file is slightly different; I generated a new file with<br />
an increased number of pool accounts and the new prd accounts, based on my<br />
existing pool account configuration.<br />
<br />
There is a new groups.conf yaim file which will be needed for<br />
VOMSES. I think it's safe to leave it as it is, though I generated the<br />
info for my extra VOs based on the template. In hindsight, I would not<br />
recommend doing this. See also [http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0602&L=lcg-rollout&P=5023 ROLLOUT].<br />
<br />
== '''Upgrade plan''' ==<br />
<br />
As explained in the yaim release note, it is not possible to perform<br />
an upgrade with no downtime unless two MON boxes are running simultaneously with old and new versions of LCG.<br />
I was lucky to have a spare node with a certificate on which I<br />
installed a 2.7 MON. I pointed my newly upgraded 2.7 WNs to this new MON while<br />
still keeping the 2.6 MON node in operation; it was the last node I<br />
updated to 2.7. I then reran the yaim configuration scripts on all my<br />
nodes to point back to my old MON. <br />
<br />
== '''Signed rpms?''' ==<br />
<br />
Not yet, but on the way? In the pre-release, it was mentioned that packages were signed with <br />
<br />
http://glite.web.cern.ch/glite/packages/keys/EGEE.gLite.GPG.public.asc<br />
<br />
So, I thought I would give it a go, but ended with yum complaints:<br />
<br />
edg-mkgridmap-conf-2.6.0- 100% |=========================| 19 kB 00:00<br />
Error: Unsigned Package /var/cache/yum/sl-lcg/packages/edg-mkgridmap-conf-2.6.0-1_sl3.noarch.rpm<br />
Error: You may want to run yum clean or remove the file:<br />
/var/cache/yum/sl-lcg/packages/edg-mkgridmap-conf-2.6.0-1_sl3.noarch.rpm<br />
Error: You may need to disable gpg checking to install this package<br />
<br />
(Yes, edg-mkgridmap-conf-2.6 is in the 2.7 repository)<br />
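<br />
Until everything in the repository really is signed, the practical options are the ones yum suggests: either import the key and wait for the unsigned packages to be rebuilt, or turn gpg checking off for that repository. Roughly (the repository name matches the error above; the exact yum config layout may differ on your node):<br />
<br />
 rpm --import http://glite.web.cern.ch/glite/packages/keys/EGEE.gLite.GPG.public.asc<br />
 # workaround for the unsigned edg-mkgridmap-conf package: in the [sl-lcg]<br />
 # section of the yum configuration set<br />
 #   gpgcheck=0<br />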
<br />
== '''Temporary MON installation''' ==<br />
<br />
yaim suddenly stopped, complaining about the mktemp usage. I also saw this error message when upgrading<br />
LFC and DPM. A typical error message is:<br />
<br />
Configuring config_lfc_upgrade<br />
Usage: mktemp [-d] [-q] [-u] template<br />
<br />
mktemp needed an argument on my SL-3.0.4. The following functions were affected: <br />
config_DPM_upgrade, config_lfc_upgrade and config_rgma_server. This seems to be only an issue with mktemp in SL-3.0.4; I did not see this error in the<br />
pre-release installation on SL-3.0.5. See also [http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0602&L=lcg-rollout&P=8654|ROLLOUT]<br />
<br />
In the fresh install of my MON, the MySQL password was not set for localhost and I set it manually; see<br />
[http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0602&L=lcg-rollout&P=1600|ROLLOUT]<br />
<br />
== '''WN and CE upgrade''' ==<br />
<br />
Was smooth!<br />
<br />
== '''DPM backups and upgrade!''' ==<br />
<br />
I pushed the wrong installation scripts to my SE, and corrupted my<br />
DPM database as a result! Fortunately, I had done a back-up just before<br />
the upgrade; unfortunately, I initially had some problems restoring the<br />
database. The moral of this story is not only to back up your database but also<br />
to make sure you can recreate it. I did some tests using a spare<br />
machine (my laptop) with MySQL installed:<br />
<br />
[mysql@localhost mysql]# mysql -u root -p < /home/yrc/mysql-dump-2006-02-03T06-00.sql Enter password:<br />
ERROR 1005 (HY000) at line 268: Can't create table './dpm_db/dpm_fs.frm' (errno: 150)<br />
<br />
I traced this unhelpful error message to a problem with foreign key<br />
checks (the mechanism that keeps indices in different tables consistent<br />
with each other). I had to hack the SQL backup file and add:<br />
<br />
SET FOREIGN_KEY_CHECKS=0;<br />
all the SQL stuff<br />
SET FOREIGN_KEY_CHECKS=1;<br />
<br />
at the beginning and the end of the file.<br />
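<br />
In other words, test the whole cycle before you need it in anger. A sketch of the kind of backup/restore test described above (the database names are the ones DPM uses; the dump filename is just an example):<br />
<br />
 # take the backup on the DPM head node<br />
 mysqldump -u root -p --databases cns_db dpm_db > dpm-backup.sql<br />
 # restore it on a scratch MySQL instance, with foreign key checks disabled around the import<br />
 ( echo "SET FOREIGN_KEY_CHECKS=0;" ; cat dpm-backup.sql ; echo "SET FOREIGN_KEY_CHECKS=1;" ) | mysql -u root -p<br />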
<br />
Anyway, once this problem was solved, Graeme's [[DPM_Upgrades|function]] did marvels!<br />
Here is an extract of the output for info:<br />
<br />
Examining DPM for required upgrades to domain names and db schema<br />
MODIFIED FUNCTION!<br />
Examining DPM server hostname epgse1... looks unqualified.<br />
Found simple hostnames. Converting to FQDNs in DPM/DPNS databases.<br />
Sat Feb 4 18:11:17 2006 : Starting to add the domain name.<br />
Please wait...<br />
Sat Feb 4 18:11:17 2006 : 1000 entries migrated<br />
Sat Feb 4 18:11:18 2006 : 2000 entries migrated<br />
Sat Feb 4 18:11:19 2006 : 3000 entries migrated<br />
Sat Feb 4 18:11:20 2006 : 4000 entries migrated<br />
Sat Feb 4 18:11:21 2006 : 5000 entries migrated<br />
Sat Feb 4 18:11:21 2006 : 6000 entries migrated<br />
Sat Feb 4 18:11:22 2006 : 7000 entries migrated<br />
Sat Feb 4 18:11:23 2006 : 8000 entries migrated<br />
Sat Feb 4 18:11:24 2006 : 9000 entries migrated<br />
Sat Feb 4 18:11:25 2006 : 10000 entries migrated<br />
Sat Feb 4 18:11:26 2006 : 11000 entries migrated<br />
Sat Feb 4 18:11:26 2006 : 12000 entries migrated<br />
Sat Feb 4 18:11:27 2006 : The update of the DPNS database is over<br />
3 disk server names have been modified in the configuration.<br />
12311 entries have been migrated.<br />
domain name = ph.bham.ac.uk<br />
db vendor = MySQL<br />
db = epgse1.ph.bham.ac.uk<br />
DPNS database user = dpmmgr<br />
DPNS database password = IwasNotGoingToLeaveitHere!<br />
DPNS database name = cns_db<br />
DPM database name = dpm_db<br />
Mysql database version used: 2.1.0<br />
Found schema version 2.1.0. No need to upgrade the DPM database schema<br />
<br />
A dpm-qryconf revealed FQDN for all my dpm filesystems!<br />
<br />
P.S. Following my database problem, I maintained my site in production<br />
as no jobs, but dteam, landed on our CE due to the VO software publishing bug (now fixed) [http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0602&L=lcg-rollout&P=8430|ROLLOUT]<br />
<br />
== '''Tweaks and bugs''' ==<br />
<br />
Testing was made very difficult because the SFT result webpage got<br />
stuck at 2006-02-05 01:05:01. I could still run the SFT but<br />
couldn't get any info on why I failed a test. It is on occasions like these that<br />
we really realise the importance of the SFT test suite!<br />
<br />
=== SE information system ===<br />
<br />
I had problems starting globus-mds on my DPM SE: a <code>service globus-mds start</code><br />
will report that the daemon has started, but netstat will<br />
show that no process is listening on port 2135. After some debugging<br />
I found out that yaim had overwritten the globus-script-initializer file <br />
in /opt/globus/libexec with an empty file. I copied this file across<br />
from my CE, and I could at last properly start globus-mds. For more<br />
details on this, see [http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0602&L=lcg-rollout&D=0&P=20012|ROLLOUT]<br />
<br />
=== Maui === <br />
<br />
There is a new cool dynamic plug-in for Maui. It published the<br />
information correctly for the default Maui config in yaim, but it did not<br />
like my configuration based on Steve's Cookbook method, even after I<br />
added the edginfo and rgma users to the Maui admins.<br />
This is being looked at, see thread on [http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0602&L=lcg-rollout&P=11661|ROLLOUT]<br />
Actually, not so cool at all! <br />
<br />
'''I had disabled the vomaxjobs-maui plugin after the Bham CE got flooded by +1000 jobs!''' I'm now relying on the 2.6.0 lcg-info-dynamic-pbs plugin to report on the status of jobs until an official fix for vomaxjobs-maui is released.<br />
<br />
=== RGMA ===<br />
<br />
I failed the RGMA client test many times. I observed different types of<br />
errors when the rgma client test ran on the same WN:<br />
<br />
Checking C API: Failure - failed to query test tuple<br />
<br />
Checking CommandLine API: Failure - failed to query test tuple<br />
Checking Java API: Failure - failed to query test tuple<br />
<br />
Checking C++ API: Failure - failed to query test tuple<br />
<br />
Checking C API: Failed to create producer: Mangled HTTP response from servlet.<br />
Failure - failed to insert test tuple<br />
Checking C++ API: R-GMA application error in PrimaryProducer: No xml returned<br />
Failure - failed to insert test tuple<br />
Checking CommandLine API: ERROR: Could not contact R-GMA server at epgmo1.ph.bham.ac.uk:8443 - HTTP error 400 (No Host matches server name epgmo1.ph.bham.ac.uk)<br />
Failure - failed to insert test tuple<br />
Checking Java API: Failed to contact R-GMA server: Server returned HTTP response code: 400 for URL: https://epgmo1.ph.bham.ac.uk:8443/R-GMA/PrimaryProducerServlet/createPrimaryProducer?terminationIntervalSec=600&type=memory&isLatest=false&isHistory=false<br />
Failure - failed to insert test tuple<br />
Checking Python API: RGMA Error: Could not contact R-GMA server at epgmo1.ph.bham.ac.uk:8443 - HTTP error 400 (No Host matches server name epgmo1.ph.bham.ac.uk)<br />
Failure - failed to insert test tuple<br />
<br />
It seems that restarting RGMA fixes this; I'm looking into it at the moment.<br />
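<br />
For the record, "restarting RGMA" here just means bouncing the servlet container on the MON box before re-running the client test; on this SL3/gLite install that is (assuming the standard service name):<br />
<br />
 service tomcat5 restart<br />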
<br />
=== Information publishing ===<br />
<br />
We should all make sure we publish only correct information; there are<br />
various very interesting messages on ROLLOUT and GRIDPP-STORAGE.<br />
<br />
=== GridIce ===<br />
<br />
GridIce runs on my LFC/MON node. I had to add a blank line followed by<br />
<br />
[mds/gris/provider/gridice]<br />
<br />
after [mds/gris/provider/edg] in /etc/globus.conf.<br />
Unfortunately, I still do not publish anything and see the same<br />
problem which is described in [http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0602&L=lcg-rollout&P=11779|ROLLOUT]<br />
<br />
<br />
[[Category:SouthGrid]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/GLite_File_Transfer_ServiceGLite File Transfer Service2007-02-28T12:48:51Z<p>Andrew elwell: </p>
<hr />
<div>==Overview==<br />
<br />
The '''gLite File Transfer Service''', developed as part of the [[gLite]] middleware stack, aims to reliably copy one [[Storage URL]] to another. It uses a 3rd party copy (e.g. <tt>gsiftp</tt>) to achieve this, but will retry if this fails.<br />
<br />
It also schedules these copies along [[FTS Channels|network channels]] to ensure that bandwidth is properly used.<br />
<br />
State in the FTS is held in a database, which ensures that the service can be restarted reliably.<br />
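<br />
A minimal client-side example of driving the service (the endpoint URL and SURLs below are placeholders; UK Tier 2s should use the RAL endpoint linked further down this page):<br />
<br />
 glite-transfer-submit -s https://fts.example.ac.uk:8443/glite-data-transfer-fts/services/FileTransfer \<br />
   srm://source-se.example.ac.uk:8443/dpm/example.ac.uk/home/dteam/file \<br />
   srm://dest-se.example.ac.uk:8443/dpm/example.ac.uk/home/dteam/file<br />
 glite-transfer-status -s https://fts.example.ac.uk:8443/glite-data-transfer-fts/services/FileTransfer REQUEST-ID<br />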
<br />
==Resources and Documentation==<br />
<br />
===Presentations on the FTS (Overview and architecture)===<br />
<br />
* At [http://agenda.cern.ch/askArchive.php?base=agenda&categ=a051050&id=a051050s3t2/transparencies CERN SC3 Workshop].<br />
* At [https://edms.cern.ch/file/593997/1/EGEE-JRA1-PRE-593997-FTS-180505.pdf SC3 Workshop in London].<br />
* [http://agenda.cern.ch/askArchive.php?base=agenda&categ=a053393&id=a053393s1t1/transparencies Tutorial] from SC3 planning meeting.<br />
* [[User:Graeme stewart|Graeme]]'s talk at GridPP 13.<br />
<br />
===Installation and Use===<br />
<br />
* The most up-to-date information is in the LCG [https://twiki.cern.ch/twiki/bin/view/LCG/FtsRelease13 wiki] (this is for FTS from gLite 1.3).<br />
* UK Tier 2s should use a client configuration for the [[RAL Tier1 File Transfer Service | RAL FTS service]].<br />
<br />
===Client API===<br />
<br />
* gLite FTS [http://glite.web.cern.ch/glite/documentation/default.asp Command Line Tools]<br />
* gLite FTS [http://glite.web.cern.ch/glite/documentation/default.asp User guide]<br />
<br />
==See also==<br />
[[RAL Tier1 File Transfer Service]]<br />
<br />
[[Category:File_Transfer]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/FTSFTS2007-02-28T12:48:10Z<p>Andrew elwell: </p>
<hr />
<div>#REDIRECT [[GLite_File_Transfer_Service]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Building_SlonyBuilding Slony2007-02-28T12:31:56Z<p>Andrew elwell: </p>
<hr />
<div>Slony needs to be built against the version of postgresql that is in use; therefore there may not be rpms available.<br />
<br />
==To build slony on SL3==<br />
<br />
Download the slony tarball from the Slony website<br />
<br />
Install the following rpms if not already installed:<br />
<br />
rpm-build<br />
postgresql-devel<br />
flex<br />
bison<br />
openssl-devel<br />
autoconf<br />
gcc<br />
perl-DBI<br />
<br />
(Unfortunately the version of autoconf supplied with SL3 seems to be too old to build slony successfully, however [ftp://ftp.pbone.net/mirror/www.arklinux.org/2005.1-SR1/noarch/autoconf-2.59-1ark.noarch.rpm autoconf 2.59] works.)<br />
<br />
<br />
As the postgres user:<br />
<br />
Expand the tarball<br />
Change into the newly created directory<br />
Run ./configure<br />
Edit the Makefile.global and change the line :<br />
override CPPFLAGS := -I${pgincludedir} -I${pgincludeserverdir} $(CPPFLAGS)<br />
to<br />
override CPPFLAGS := -I${pgincludedir} -I${pgincludeserverdir} -I/usr/kerberos/include $(CPPFLAGS)<br />
Run make<br />
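<br />
Putting the postgres-user steps together, this is roughly what gets run (the tarball name here is assumed from the RPM version produced below):<br />
<br />
 tar -xzf slony1-1.1.5.tar.gz<br />
 cd slony1-1.1.5<br />
 ./configure<br />
 # edit Makefile.global: add -I/usr/kerberos/include to the override CPPFLAGS line<br />
 make<br />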
<br />
<br />
As root:<br />
<br />
Change into the directory<br />
Run make rpm<br />
<br />
If all has completed successfully then an rpm will exist under /usr/src/redhat/RPMS/i386 with a name like : postgresql-slony1-engine-1.1.5_RC3-1_PG8.1.0.i386.rpm<br />
<br />
==To build slony on SL4:==<br />
<br />
This is identical to above except that it is not necessary to edit Makefile.global<br />
<br />
<br />
[[Category: DCache]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DcacheDcache2007-02-16T14:11:56Z<p>Andrew elwell: </p>
<hr />
<div>#REDIRECT [[dCache]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DgangDgang2006-12-11T15:04:41Z<p>Andrew elwell: </p>
<hr />
<div>There is a tool that FermiLab use for cluster management. This may just be adding to a growing list of such tools and people may already know about it, but it caught my interest since a modified version exists that the FermiLab people use to manage their dCache systems. It basically allows you to execute the same command on a subset of the nodes in your cluster. This could prove useful for starting/stopping dCache services at sites that have a large number of nodes. This may be particularly relevant for sites running resilient dCache. Then again, such sites may already be using some other cluster management software like quattor. The benefit of the Fermilab one is that it appears to be pretty simple (written in python and uses ssh to communicate with the nodes). You can find out more about the tool (rgang) [http://fermitools.fnal.gov/abstracts/rgang/abstract.html here]. The dCache version is called '''dgang'''.<br />
<br />
<br />
[[category:dCache]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DPM_MySQL_databaseDPM MySQL database2006-12-11T15:03:27Z<p>Andrew elwell: </p>
<hr />
<div>This page describes the structure of the MySQL database tables that DPM (v1.5.7) uses. The interesting ones are <code>dpm_db</code> which deals with the get and put requests (and therefore does not need to be backed up) and <code>cns_db</code> which holds the state of the DPM namespace (which should be backed up). The '''c''' in <code>cns_db</code> comes from the fact that the DPNS shares code with CASTOR. The database user and password that you need to use is contained in the file <code>/opt/lcg/etc/DPMCONFIG</code> (remember to set the permissions so that only the root/dpmmgr can read this file).<br />
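<br />
For the permissions point, something like this on the head node is enough (the ownership details may differ between installs; the file just needs to be unreadable by other users):<br />
<br />
 chown dpmmgr:dpmmgr /opt/lcg/etc/DPMCONFIG<br />
 chmod 600 /opt/lcg/etc/DPMCONFIG<br />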
<br />
# mysql --user=<username> -p<br />
Enter password:<br />
Welcome to the MySQL monitor. Commands end with ; or \g.<br />
Your MySQL connection id is 48 to server version: 4.1.11-standard<br />
<br />
mysql> show databases;<br />
+----------+<br />
| Database |<br />
+----------+<br />
| cns_db |<br />
| dpm_db |<br />
| mysql |<br />
| test |<br />
+----------+<br />
4 rows in set (0.00 sec)<br />
<br />
mysql> use test;<br />
Database changed<br />
mysql> show tables;<br />
Empty set (0.00 sec)<br />
<br />
mysql> use mysql;<br />
Database changed<br />
mysql> show tables;<br />
+---------------------------+<br />
| Tables_in_mysql |<br />
+---------------------------+<br />
| columns_priv |<br />
| db |<br />
| func |<br />
| help_category |<br />
| help_keyword |<br />
| help_relation |<br />
| help_topic |<br />
| host |<br />
| tables_priv |<br />
| time_zone |<br />
| time_zone_leap_second |<br />
| time_zone_name |<br />
| time_zone_transition |<br />
| time_zone_transition_type |<br />
| user |<br />
+---------------------------+<br />
15 rows in set (0.00 sec)<br />
<br />
mysql> use dpm_db;<br />
Database changed<br />
mysql> show tables;<br />
+------------------+<br />
| Tables_in_dpm_db |<br />
+------------------+<br />
| dpm_copy_filereq |<br />
| dpm_fs |<br />
| dpm_get_filereq |<br />
| dpm_pending_req |<br />
| dpm_pool |<br />
| dpm_put_filereq |<br />
| dpm_req |<br />
| dpm_space_reserv |<br />
| dpm_unique_id |<br />
| schema_version |<br />
+------------------+<br />
10 rows in set (0.00 sec)<br />
<br />
mysql> use cns_db;<br />
Database changed<br />
mysql> show tables;<br />
+--------------------+<br />
| Tables_in_cns_db |<br />
+--------------------+<br />
| Cns_class_metadata |<br />
| Cns_file_metadata |<br />
| Cns_file_replica |<br />
| Cns_groupinfo |<br />
| Cns_symlinks |<br />
| Cns_unique_gid |<br />
| Cns_unique_id |<br />
| Cns_unique_uid |<br />
| Cns_user_metadata |<br />
| Cns_userinfo |<br />
| schema_version |<br />
+--------------------+<br />
11 rows in set (0.00 sec)<br />
<br />
== <code>cns_db</code> ==<br />
<br />
Probably the most interesting tables are <code>Cns_file_metadata</code> and <code>Cns_file_replica</code> as these contain information on the location (pool and filesystem) of the file, its size, ACLs, creation time etc. i.e.<br />
<br />
mysql> select f_type, poolname, fs, sfn from Cns_file_replica;<br />
+--------+-----------+-----------+----------------------------------------------------------------------+<br />
| f_type | poolname | fs | sfn |<br />
+--------+-----------+-----------+----------------------------------------------------------------------+<br />
| P | dpm-pool1 | /storage1 | wn4.epcc.ed.ac.uk:/storage1/dteam/2006-09-13/20060913_191110.txt.1.0 |<br />
+--------+-----------+-----------+----------------------------------------------------------------------+<br />
<br />
<code>Cns_groupinfo</code> and <code>Cns_userinfo</code> contain the mappings of the internal (i.e. non-Unix) DPM gids and uids to the VO groups and user DNs. i.e.<br />
<br />
mysql> select * from Cns_groupinfo;<br />
+-------+------+-----------+<br />
| rowid | gid | groupname |<br />
+-------+------+-----------+<br />
| 1 | 102 | atlas |<br />
| 2 | 103 | alice |<br />
| 3 | 104 | lhcb |<br />
| 4 | 105 | cms |<br />
| 5 | 106 | dteam |<br />
| 6 | 107 | biomed |<br />
+-------+------+-----------+<br />
<br />
mysql> select * from Cns_userinfo;<br />
+-------+--------+---------------------------------------------------------------------------------------+<br />
| rowid | userid | username |<br />
+-------+--------+---------------------------------------------------------------------------------------+<br />
| 1 | 101 | /C=UK/O=eScience/BlahBlah.......................... |<br />
+-------+--------+---------------------------------------------------------------------------------------+<br />
<br />
When you assign a [[DPM_VO_Specific_Pools | VO to a specific pool]], you must use either the VO name or the internal DPM gid, not the Unix gid for that group.<br />
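<br />
In other words, the gid to quote is the virtual one from Cns_groupinfo, which you can look up directly (username as in the connection example at the top of this page):<br />
<br />
 mysql -u USERNAME -p cns_db -e "select gid, groupname from Cns_groupinfo where groupname='atlas';"<br />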
<br />
<br />
[[category:Disk Pool Manager]] [[category:Databases]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/Building_DPMBuilding DPM2006-12-11T14:59:58Z<p>Andrew elwell: </p>
<hr />
<div>This page is out of date. I'll update it when I get a chance, but don't use it for now.<br />
<br />
Graeme<br />
<br />
==DPM Sources==<br />
<br />
===CVS Repositories===<br />
<br />
====DPM====<br />
<br />
DPM sources are available from CERN's anonymous CVS system. To check it out set:<br />
<br />
export CVSROOT=:pserver:anonymous@isscvs.cern.ch:/local/reps/lcgware<br />
export CVS_RSH=ssh <br />
cvs co -r lcgX_Y_Z LCG-DM<br />
<br />
Where <tt>X_Y_Z</tt> is the DPM version number to check out. (Unless you're doing bleeding edge DPM development, it's best to use the last stable release.)<br />
<br />
====DPM gsiftp====<br />
<br />
The sources for the DPM gridftp server are in a different repository:<br />
<br />
cvs co -r lcgX_Y_Z DPM-FTP<br />
<br />
N.B. The version numbers of the DPM gsiftp server don't necessarily correspond to those for the rest of the DPM source code.<br />
<br />
[[category:Disk Pool Manager]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/XFS_Filesystem_HowtoXFS Filesystem Howto2006-12-11T14:58:27Z<p>Andrew elwell: </p>
<hr />
<div>The following is the procedure to migrate from a non-XFS filesystem (e.g. ext2 or ext3) to XFS without losing data.<br />
The DPM nameserver uses the disk server and the filepath therein to source files. Hence the filesystem format can be altered without affecting the DPM nameserver's ability to locate files.<br />
DO NOT unmount the filesystem (umount /dev/...) from the dpm headnode and DEFINITELY DO NOT remove the dpm filesystems (e.g. with dpm-rmfs).<br />
<br />
* Shutdown the DPM daemons<br />
on the headnode<br />
service dpm-gsiftp stop<br />
service srmv2 stop<br />
service srmv1 stop<br />
service dpm stop<br />
service rfiod stop<br />
service dpnsdaemon stop<br />
on each disk server<br />
service dpm-gsiftp stop<br />
service rfiod stop<br />
<br />
and make sure there can be no more writing to the files<br />
on the headnode<br />
mount -o remount ro "each filesystem mountpoint"<br />
<br />
* Tar up all the data from each individual filesystem, e.g. if there are n filesystems then create n tarballs, and store the tarballs in a secure location.<br />
The following examples assume the DPM filesystem mountpoint is filepath/dpmdata.<br />
The tarballing and moving can be done in one command.<br />
from the source machine (dpm headnode)<br />
 tar -cvf - filepath/dpmdata | ssh user@destination "cat > dpmdata.tar"<br />
from the destination (storage machine)<br />
ssh user@source tar -cvf - filepath/dpmdata > dpmdata.tar<br />
<br />
* Verify the contents of each tarball.<br />
tar -tvf dpmdata.tar<br />
<br />
* Format each filesystem as XFS; see the [[XFS Kernel Howto]].<br />
<br />
* Unpack each tarball into the appropriate directory.<br />
The moving and unpacking can be done in one command. (GNU tar strips the leading "/" when the archive is created, so extract from the root directory to put the files back under filepath/dpmdata.)<br />
from the source (storage machine)<br />
 cat dpmdata.tar | ssh user@destination "cd / && tar -xpvf -"<br />
from the destination (dpm headnode)<br />
 ssh user@source "cat dpmdata.tar" | ( cd / && tar -xpvf - )<br />
<br />
[[category:XFS]][[category:Disk Pool Manager]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DCache_upgrade_1.6.6_to_1.7.0DCache upgrade 1.6.6 to 1.7.02006-12-11T14:56:31Z<p>Andrew elwell: </p>
<hr />
<div>[[category:dCache]]<br />
This page details the steps required to upgrade to the v1.7.0 of dCache from v1.6.6 that was distributed with gLite 3. <br />
<br />
== Pre-Upgrade ==<br />
<br />
At the time of writing, dCache v1.7.0 has not moved to the CERN gLite apt repository and can only be installed by using the dCache.org repository:<br />
<br />
cat /etc/apt/sources.list.d/dcache.list<br />
rpm http://www.dcache.org/apt/ sl stable <br />
<br />
This also contains an updated version of glite-yaim which has new configuration options that dCache can take advantage of. <br />
<br />
apt-get install glite-yaim <br />
<br />
Modify your site-info.def to take account of the new config options. '''Make sure that you set'''<br />
<br />
RESET_DCACHE_PNFS=no<br />
RESET_DCACHE_RDBMS=no<br />
<br />
and have these port ranges set (note that YAIM looks for these settings, so you must have them defined in your site-info.def)<br />
<br />
DCACHE_PORT_RANGE_PROTOCOLS_SERVER_GSIFTP=50000,52000<br />
DCACHE_PORT_RANGE_PROTOCOLS_SERVER_MISC=60000,62000<br />
DCACHE_PORT_RANGE_PROTOCOLS_CLIENT_GSIFTP=33115,33215<br />
<br />
I think DCACHE_PORT_RANGE is deprecated. These ranges correspond to the following:<br />
<br />
* GridFTP range for the dCache server acting in passive mode.<br />
* dcap/xrootd port range for server acting in passive mode (now default for these protocols).<br />
* GridFTP port range when the dCache is acting as an active client.<br />
<br />
Depending on your site policy, you will need to carefully choose the port ranges for the gridftp traffic. Typically sites only have 20000:25000 open for gridftp transfers, so anything that tries to use the default 33115:33215 for active client transfers will be blocked. If you want, you could split up the assigned range across the server and client range variables above.<br />
<br />
There are now two dCache admin node metapackages available. <br />
<br />
* glite-SE_dcache_admin_postgres - for sites running the postgreSQL version of PNFS<br />
* glite-SE_dcache_admin_gdbm - for sites still running with the GDBM backend.<br />
<br />
Information for migrating to the postgres version can be found in the dCache book [http://www.dcache.org/manuals/Book/cb-pnfs-postgres.shtml here]. Sites are recommended to move to the postgres version as soon as possible, preferably before performing this upgrade.<br />
<br />
For sites already running the postgres version of dCache 1.6.6-5, you need to drop some of the postgres tables and recreate them ''before'' upgrading to dCache 1.7.0. Stop all dCache and PNFS services, then run:<br />
<br />
dropdb -U srmdcache billing<br />
dropdb -U srmdcache dcache<br />
dropdb -U srmdcache replicas<br />
createdb -U srmdcache billing<br />
createdb -U srmdcache dcache<br />
createdb -U srmdcache replicas<br />
psql -U srmdcache replicas -f /opt/d-cache/etc/psql_install_replicas.sql<br />
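<br />
A quick sanity check that the freshly recreated (empty) databases are in place (assumes you can connect to postgres as the srmdcache user):<br />
<br />
psql -U srmdcache -l | grep -E 'billing|dcache|replicas'<br />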
<br />
== Upgrade ==<br />
<br />
Make sure that postgres is running before you run the upgrade.<br />
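<br />
One way to check on a stock Scientific Linux install (the init script name may differ on your system) is:<br />
<br />
service postgresql status<br />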
<br />
Installation and configuration then proceed as normal for YAIM.<br />
<br />
/opt/glite/yaim/scripts/install_node /opt/glite/yaim/etc/site-info.def \<br />
glite-SE_dcache_admin_postgres 2>&1 | tee /root/dcache_admin_upgrade.txt<br />
<br />
/opt/glite/yaim/scripts/configure_node /opt/glite/yaim/etc/site-info.def \<br />
glite-SE_dcache_admin_postgres 2>&1 | tee /root/dcache_admin_upgrade-config.txt<br />
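<br />
Once YAIM has finished, it is worth confirming that the expected dCache packages were actually pulled in (exact package names depend on the metapackage installed):<br />
<br />
rpm -qa | grep -i dcache<br />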
<br />
== Post-Upgrade ==<br />
<br />
If you have set<br />
<br />
RESET_DCACHE_CONFIGURATION=yes<br />
<br />
in site-info.def then after the upgrade you should also check that the dCache billing database is still enabled:<br />
<br />
grep billingToDb /opt/d-cache/config/dCacheSetup<br />
billingToDb=yes<br />
# EXPERT: First is default if billingToDb=no, second for billingToDb=yes<br />
<br />
and that the GridFTP performance markers are set to a sensible value like<br />
10 (you may experience problems with FTS transfers if you use the default value of 180).<br />
<br />
grep performanceMarkerPeriod /opt/d-cache/config/dCacheSetup<br />
# Set performanceMarkerPeriod to 180 to get performanceMarkers<br />
performanceMarkerPeriod=10<br />
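<br />
If it is still at the default, a minimal way to change it (a sketch; assumes the setting is present and uncommented in dCacheSetup, and that you take a backup of the file first) is:<br />
<br />
sed -i 's/^performanceMarkerPeriod=.*/performanceMarkerPeriod=10/' /opt/d-cache/config/dCacheSetup<br />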
<br />
If you have made any changes to the /opt/d-cache/config/*.batch files then<br />
these changes will also have to be re-applied, since the upgrade overwrites the batch files.<br />
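<br />
A simple way to see what needs to be re-applied, assuming you took a copy of /opt/d-cache/config before upgrading (the backup path here is only an example), is:<br />
<br />
diff -ru /root/d-cache-config.pre-upgrade /opt/d-cache/config<br />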
<br />
== Checks ==<br />
<br />
Simple check to see what processes are listening:<br />
<br />
# netstat -tlp|grep java<br />
tcp 0 0 localhost.localdomain:8005 *:* LISTEN 14248/java<br />
tcp 0 0 *:8009 *:* LISTEN 14248/java<br />
tcp 0 0 *:5001 *:* LISTEN 14248/java<br />
tcp 0 0 *:22223 *:* LISTEN 13659/java<br />
tcp 0 0 *:webcache *:* LISTEN 14248/java<br />
tcp 0 0 *:22128 *:* LISTEN 14197/java<br />
tcp 0 0 *:2288 *:* LISTEN 13738/java<br />
tcp 0 0 *:57559 *:* LISTEN 13412/java<br />
tcp 0 0 *:8443 *:* LISTEN 14248/java<br />
tcp 0 0 *:2811 *:* LISTEN 14089/java<br />
tcp 0 0 *:22111 *:* LISTEN 13989/java<br />
<br />
* 5001 is used by the SOAPMonitor service.<br />
* 8005 is used locally by a shutdown script. Tomcat binds it to the localhost interface only.<br />
* 8009 is used for AJPv13 (the Apache JServ Protocol, used for communication between the web server and the servlet container).<br />
* 8080 is HTTP access to Tomcat. Since the SRM web service uses GSI authentication and verifies the user's credential before executing any request, having it open is not a security risk in itself, provided you trust Tomcat to be secure (it could still allow attackers to exploit known Tomcat/Axis vulnerabilities).<br />
* 8443 is the SRM.<br />
* 2811 is the GridFTP door.<br />
* 2288 is the web interface.<br />
* 22223 is the ssh admin interface.<br />
* 22128 is the GSIDcap door.<br />
* 22111 is the dCache information publisher.<br />
<br />
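A quick follow-up check that the doors you care about are actually listening (a sketch; trim the port list to the services you run):<br />
<br />
for p in 2811 2288 8443 22111 22128 22223 ; do netstat -tln | grep -q ":$p " && echo "port $p OK" || echo "port $p NOT listening" ; done<br />
<br />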
The SRM developer has stated that future versions of the code will have modified installation scripts which will disable the services on 5001, 8009 and 8080. In the case of shutdown, the service will be protected by a dynamically generated password (see [http://marc.theaimsgroup.com/?l=tomcat-user&m=103133645416097&w=2 here]).<br />
<br />
Make sure that you can srmcp into and out of the dCache after the upgrade and that PNFS is still mounted on the door nodes.</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DCache_Utility_LogDCache Utility Log2006-12-11T14:55:37Z<p>Andrew elwell: </p>
<hr />
<div>=== srm -> local ===<br />
<br />
11/29 17:50:56 Cell(PinManager@utilityDomain) : \<br />
stopTimer(): timer not found for requestId=-9223372036854775775<br />
<br />
=== srmCopy from the dCache ===<br />
<br />
11/29 17:51:16 Cell(PinManager@utilityDomain) : \<br />
stopTimer(): timer not found for requestId=-9223372036854775774<br />
<br />
<br />
Other SRM transfers do not lead to entries in the log file.<br />
<br />
[[category:dCache]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DCache_SRM_LogDCache SRM Log2006-12-11T14:55:23Z<p>Andrew elwell: </p>
<hr />
<div>[[category:dCache]]<br />
== srmPut ==<br />
<br />
11/29 17:13:09 Cell(SRM-wn4@srm-wn4Domain) : PutRequestHandler error: copy request state changed to Done<br />
11/29 17:13:09 Cell(SRM-wn4@srm-wn4Domain) : PutRequestHandler error: changing fr#-2147483470 to Done<br />
<br />
== srmGet ==<br />
<br />
11/29 17:14:20 Cell(SRM-wn4@srm-wn4Domain) : Request id=-2147483469: copy request state changed to Done<br />
11/29 17:14:20 Cell(SRM-wn4@srm-wn4Domain) : Request id=-2147483469: changing fr#-2147483468 to Done<br />
<br />
== srmCopy from the dCache ==<br />
<br />
11/29 17:15:36 Cell(SRM-wn4@srm-wn4Domain) : Request id=-2147483467: copy request state changed to Done<br />
11/29 17:15:36 Cell(SRM-wn4@srm-wn4Domain) : Request id=-2147483467: changing fr#-2147483466 to Done<br />
<br />
== srmCopy into the dCache ==<br />
<br />
11/29 17:16:12 Cell(SRM-wn4@srm-wn4Domain) : CopyRequest reqId # -2147483465Request.createCopyRequest : \<br />
created new request succesfully \<br />
user credentials are: /C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=greig cowan<br />
11/29 17:16:12 Cell(SRM-wn4@srm-wn4Domain) : SRMClientV1 : \<br />
connecting to srm at httpg://srm.epcc.ed.ac.uk:8443/srm/managerv1<br />
11/29 17:16:18 Cell(SRM-wn4@srm-wn4Domain) : remoing TransferInfo for callerId=20005<br />
user credentials are: /C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=greig cowan<br />
11/29 17:16:18 Cell(SRM-wn4@srm-wn4Domain) : SRMClientV1 : \<br />
connecting to srm at httpg://srm.epcc.ed.ac.uk:8443/srm/managerv1<br />
11/29 17:16:19 Cell(SRM-wn4@srm-wn4Domain) : CopyRequest reqId \<br />
# -2147483465copyRequest getter_putter is non null, stopping<br />
11/29 17:16:19 Cell(SRM-wn4@srm-wn4Domain) : CopyRequest reqId \<br />
# -2147483465changing fr#-2147483464 to Done<br />
<br />
== srm-advisory-delete ==<br />
<br />
No entry.<br />
<br />
<br />
== SRM frozen ==<br />
<br />
05/18 16:15:55 Cell(SRM-dc001@srm-dc001Domain) : SRMClientV1 : get : try<br />
# 0 failed with error<br />
05/18 16:15:55 Cell(SRM-dc001@srm-dc001Domain) : SRMClientV1 :<br />
java.net.ConnectException: Connection timed out<br />
05/18 16:15:55 Cell(SRM-dc001@srm-dc001Domain) : SRMClientV1 : get : try<br />
again <br />
<br />
== Certificates problem ==<br />
<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : SslGsiSocketFactory :<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : Authentication failed.<br />
Caused by Failure unspecified at GSS-API level. Caused by<br />
COM.claymoresystems.ptls.SSLThrewAlertException: Unknown CA<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : at<br />
COM.claymoresystems.ptls.SSLConn.alert(SSLConn.java:235)<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : at<br />
COM.claymoresystems.ptls.SSLHandshake.recvCertificate(SSLHandshake.java:304)<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : at<br />
COM.claymoresystems.ptls.SSLHandshakeServer.processTokens(SSLHandshakeServer.java:217)<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : at<br />
COM.claymoresystems.ptls.SSLHandshake.processHandshake(SSLHandshake.java:135)<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : at<br />
org.globus.gsi.gssapi.GlobusGSSContextImpl.acceptSecContext(GlobusGSSContextImpl.java:276)<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : at<br />
org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:119)<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : at<br />
org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:137)<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : at<br />
org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:161)<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : at<br />
org.dcache.srm.security.SslGsiSocketFactory$GsiClientSocket.getInputStream(SslGsiSocketFactory.java:808)<br />
03/23 12:23:11 Cell(SRM-dc001@srm-dc001Domain) : at<br />
org.dcache.srm.security.SslGsiSocketFactory$SocketInputStreamWrapper.retrieveInputIfNeeded(SslGsiSocketFactory.java:503)</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DCache_Pnfsd_LogDCache Pnfsd Log2006-12-11T14:54:53Z<p>Andrew elwell: </p>
<hr />
<div> 11/29/05 17:55:34 127.0.0.1-0-0(0,1,2,3,4,6,10,) - setattr 000400000000000000001968-0000000500000002 uid=-1;gid=-1;size=0;mode=37777777777;a=ffffffff;m=ffffffff (0) -> 0<br />
11/29/05 17:55:34 127.0.0.1-0-0(0,1,2,3,4,6,10,) - write 000400000000000000001968-0000000500000002 0 18 (0) -> 0<br />
11/29/05 17:55:34 127.0.0.1-0-0(0,1,2,3,4,6,10,) - create dir 000000000000000000001040-0000000000000000 name .(pset)(000400000000000000001968)(attr)(0)(100644:18118:2688:438c9616:438c9616:438c9616) uid=0;gid=-1;size=-1;mode=100644;a=ffffffff;m=ffffffff;id=000400000000000000001968;;level=0;;line=100644:18118:2688:438c9616:438c9616:438c9616; : 000400000000000000001969-0000001B00000000 (0) -> 0<br />
11/29/05 17:55:34 127.0.0.1-0-0(0,1,2,3,4,6,10,) - create dir 000000000000000000001040-0000000000000000 name .(pset)(000400000000000000001968)(attr)(1)(100644:18118:2688:438c9616:438c9616:438c9616) uid=0;gid=-1;size=-1;mode=100644;a=ffffffff;m=ffffffff;id=000400000000000000001968;;level=1;;line=100644:18118:2688:438c9616:438c9616:438c9616; : 000400000000000000001969-0000001B00000001 (0) -> 0<br />
11/29/05 17:55:34 127.0.0.1-0-0(0,1,2,3,4,6,10,) - create dir 000000000000000000001040-0000000000000000 name .(pset)(000400000000000000001968)(size)(17371) uid=0;gid=-1;size=-1;mode=100644;a=ffffffff;m=ffffffff;id=000400000000000000001968 : 000400000000000000001969-0000001B00000000 (0) -> 0<br />
11/29/05 17:55:34 127.0.0.1-0-0(0,1,2,3,4,6,10,) - setattr 000400000000000000001968-0000000500000002 uid=-1;gid=-1;size=0;mode=37777777777;a=ffffffff;m=ffffffff (0) -> 0<br />
11/29/05 17:55:34 127.0.0.1-0-0(0,1,2,3,4,6,10,) - write 000400000000000000001968-0000000500000002 0 1E (0) -> 0<br />
11/29/05 17:55:38 127.0.0.1-0-0(0,1,2,3,4,6,10,) - setattr 000400000000000000001950-0000000500000002 uid=-1;gid=-1;size=0;mode=37777777777;a=ffffffff;m=ffffffff (0) -> 0<br />
11/29/05 17:55:38 127.0.0.1-0-0(0,1,2,3,4,6,10,) - write 000400000000000000001950-0000000500000002 0 25 (0) -> 0<br />
11/29/05 17:55:38 127.0.0.1-0-0(0,1,2,3,4,6,10,) - remove dir 000400000000000000001060-0000000000000000 name srm1-175435.txt : 000400000000000000001950 ;mdmRmFile=0; (0) -> 0<br />
<br />
<br />
[[category:dCache]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DCache_Pnfs_LogDCache Pnfs Log2006-12-11T14:53:59Z<p>Andrew elwell: </p>
<hr />
<div>[[category:dCache]]<br />
=== srmPut ===<br />
<br />
11/29 17:30:13 Cell(PnfsManager@pnfsDomain) : \<br />
Failed : CacheException(rc=666;msg=can't get pnfsId (not a pnfsfile))<br />
11/29 17:30:13 Cell(PnfsManager@pnfsDomain) : \<br />
CacheException(rc=666;msg=can't get pnfsId (not a pnfsfile))<br />
11/29 17:30:13 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.namespace.PnfsManagerV3.getStorageInfo(PnfsManagerV3.java:678)<br />
11/29 17:30:13 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.namespace.PnfsManagerV3.processPnfsMessage(PnfsManagerV3.java:993)<br />
11/29 17:30:13 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.namespace.PnfsManagerV3$ProcessThread.run(PnfsManagerV3.java:912)<br />
11/29 17:30:13 Cell(PnfsManager@pnfsDomain) : \<br />
at java.lang.Thread.run(Thread.java:534)<br />
<br />
=== srmGet ===<br />
<br />
No entry.<br />
<br />
=== srmCopy into dCache ===<br />
<br />
11/29 17:32:53 Cell(PnfsManager@pnfsDomain) : \<br />
Failed : CacheException(rc=666;msg=can't get pnfsId (not a pnfsfile))<br />
11/29 17:32:53 Cell(PnfsManager@pnfsDomain) : \ <br />
CacheException(rc=666;msg=can't get pnfsId (not a pnfsfile))<br />
11/29 17:32:53 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.namespace.PnfsManagerV3.getStorageInfo(PnfsManagerV3.java:678)<br />
11/29 17:32:53 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.namespace.PnfsManagerV3.processPnfsMessage(PnfsManagerV3.java:993)<br />
11/29 17:32:53 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.namespace.PnfsManagerV3$ProcessThread.run(PnfsManagerV3.java:912)<br />
11/29 17:32:53 Cell(PnfsManager@pnfsDomain) : \<br />
at java.lang.Thread.run(Thread.java:534)<br />
<br />
=== srmCopy out of dCache ===<br />
<br />
No entry.<br />
<br />
=== srm-advisory-delete ===<br />
<br />
No entry initially. After a few minutes:<br />
<br />
11/29 17:35:45 Cell(PnfsManager@pnfsDomain) : \<br />
Exception in getCacheLocations java.lang.NullPointerException<br />
11/29 17:35:45 Cell(PnfsManager@pnfsDomain) : \<br />
java.lang.NullPointerException<br />
11/29 17:35:45 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.vehicles.CacheInfo.<init>(CacheInfo.java:151)<br />
11/29 17:35:45 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.namespace.provider.BasicNameSpaceProvider.getCacheLocation(BasicNameSpaceProvider.java:242)<br />
11/29 17:35:45 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.namespace.PnfsManagerV3.getCacheLocations(PnfsManagerV3.java:537)<br />
11/29 17:35:45 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.namespace.PnfsManagerV3.processPnfsMessage(PnfsManagerV3.java:977)<br />
11/29 17:35:45 Cell(PnfsManager@pnfsDomain) : \<br />
at diskCacheV111.namespace.PnfsManagerV3$ProcessThread.run(PnfsManagerV3.java:912)<br />
11/29 17:35:45 Cell(PnfsManager@pnfsDomain) : \<br />
at java.lang.Thread.run(Thread.java:534)<br />
11/29 17:35:45 Cell(cleaner@pnfsDomain) : \<br />
Got error from PnfsManager for 000400000000000000001898 [4] Pnfs lookup failed<br />
<br />
where 000400000000000000001898 is the pnfs ID of the file that was deleted. This can be seen from:<br />
<br />
# grep 000400000000000000001898 ../billing/2005/11/billing-2005.11.29<br />
1.29 17:32:54 [pool:wn4_1@wn4Domain:transfer] [000400000000000000001898,17371] \<br />
dteam:dteam@osm 0 4948 true {RemoteGsiftpTransfer-1.1,dcache.epcc.ed.ac.uk:0} {0:""}<br />
11.29 17:34:21 [pool:wn4_1:transfer] [000400000000000000001898,0] <unknown> true {null} {0:""}<br />
11.29 17:34:21 [pool:wn4_1@wn4Domain:transfer] [000400000000000000001898,17371] \<br />
dteam:dteam@osm 17371 6 false {GFtp-1.0 129.215.175.24 50000} {0:""}<br />
11.29 17:34:21 [door:GFTP-wn4-Unknown-138@gridftp-wn4Domain:request] \<br />
["/C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=greig cowan":18118:2688:dcache.epcc.ed.ac.uk] \<br />
[000400000000000000001898,0] <unknown> 1133285661607 0 {0:""}<br />
11.29 17:35:45 Pool=broadcast;RemoveFiles=,000400000000000000001898</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DCache_HSMDCache HSM2006-12-11T14:53:23Z<p>Andrew elwell: </p>
<hr />
<div>dCache can be used with an HSM backend through a set of customised scripts or binaries that link the dCache functionality to the specifics of the HSM at each site.<br />
<br />
* See the [http://www.dcache.org/manuals/Book/cf-hsm.shtml dCache book] for information on configuring such a system.<br />
<br />
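As a rough illustration only: the pool calls a configured executable with an action (put or get), the pnfs ID and a local file path, and expects exit code 0 on success; the exact argument layout and any required output must be checked against the dCache book page above before relying on it. A minimal skeleton under those assumptions might look like:<br />
<br />
#!/bin/sh<br />
# Illustrative skeleton only -- the real calling convention is defined by dCache;<br />
# verify it against the dCache book before use.<br />
# Assumed call: scriptname put|get pnfsId localFile -si=storageInfo [...]<br />
ACTION=$1 ; PNFSID=$2 ; FILE=$3<br />
HSMDIR=/san/dcache-hsm   # example path, entirely site specific<br />
case "$ACTION" in<br />
  put) cp "$FILE" "$HSMDIR/$PNFSID" || exit 1 ;;<br />
  get) cp "$HSMDIR/$PNFSID" "$FILE" || exit 1 ;;<br />
  *)   echo "unknown action: $ACTION" >&2 ; exit 1 ;;<br />
esac<br />
exit 0<br />
<br />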
The Edinburgh Tier-2 plans to set up such an HSM interface in order to utilise storage from the SAN over NFS. This is currently configured to be used as dCache read pools.<br />
<br />
<br />
[[category:dCache]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DCache_GFTP_LogDCache GFTP Log2006-12-11T14:52:59Z<p>Andrew elwell: </p>
<hr />
<div>== srmPut ==<br />
<br />
11/29 17:23:55 Cell(GFTP-wn4-Unknown-132@gridftp-wn4Domain) : \<br />
SocketAdapter: SocketRedirector(Thread-81):Starting a SocketRedirector<br />
<br />
== srmGet ==<br />
<br />
11/29 17:24:59 Cell(GFTP-wn4-Unknown-133@gridftp-wn4Domain) : \<br />
SocketAdapter: SocketRedirector(Thread-84):Starting a SocketRedirector<br />
<br />
== srmCopy from the dCache ==<br />
<br />
No entry.<br />
<br />
== srmCopy into the dCache ==<br />
<br />
No entry.<br />
<br />
== srm-advisory-delete ==<br />
<br />
No entry.<br />
<br />
== OSM error ==<br />
<br />
gridftp-dc002Domain.log:06/22 14:08:37<br />
Cell(GFTP-dc002-Unknown-45942@gridftp-dc002Domain) : PnfsHandler :<br />
CacheException (35) : Pnfs error : OSM info not found in<br />
/pnfs/fs/.(access)(000000000000000000001080)(type=--I--d-----)<br />
<br />
What does this mean?<br />
<br />
== Pool Manager ==<br />
<br />
06/14 16:05:40 Cell(GFTP-dc003-Unknown-152@gridftp-dc003Domain) : Pool<br />
manager timeout<br />
[[category:dCache]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DCache_Domain_LogDCache Domain Log2006-12-11T14:52:32Z<p>Andrew elwell: </p>
<hr />
<div>=== local -> dCache ===<br />
<br />
11/29 17:54:46 Cell(wn4_1@wn4Domain) : getChecksumFromPnfs : \<br />
No crc available for 000400000000000000001950<br />
<br />
=== srmCopy into the dCache ===<br />
<br />
11/29 17:55:34 Cell(wn4_2@wn4Domain) : getChecksumFromPnfs : \<br />
No crc available for 000400000000000000001968<br />
<br />
<br />
No entries for other SRM transfers.<br />
<br />
<br />
[[category:dCache]]</div>Andrew elwellhttps://www.gridpp.ac.uk/wiki/DCache_DB_LogDCache DB Log2006-12-11T14:52:09Z<p>Andrew elwell: </p>
<hr />
<div>=== srm-advisory-delete ===<br />
<br />
11/29/05 17:55:38 0.0.0.0-0-0 dteam - remove \<br />
000400000000000000001060 srm1-175435.txt 000400000000000000001950 -> 0(0)<br />
<br />
No other entries for other SRM transfers.<br />
<br />
<br />
[[category:dCache]]</div>Andrew elwell