UCL-CENTRAL
Upgrade Log
gLite 3.0
Using the tarball install for the WN here
As indicated here.
I'm doing the TAR_WN install as a regular user onto an NFS-mounted filesystem.
Problems
- A trailing '/' on INSTALL_ROOT causes problems in config_glite when trying to compare variables
- The gLite parts don't seem happy with the "install on one box, then export over NFS" model. The installer keeps trying to create files and directories outside INSTALL_ROOT. So far:
1. /etc/profile.d/glite_setenv.sh
2. /opt//glite/etc/rgma/rgma.conf
3. /tmp/glite
4. /var/glite
5. /var/log/glite
- 1 and 2: I fixed up.
- 3: Ignored; anything that depends on a directory in /tmp existing has other problems.
- 4 and 5: Were detected by the installer and alternates created under INSTALL_ROOT. I'm not sure that sharing directories intended for writing over NFS is a good idea.
- Further, the gLite parameters mentioned in section 6.4.5 of this don't seem to be set by YAIM, which causes config_glite to abort. It seems fairly easy to just replace the "changeme" text in the template, but I thought YAIM was supposed to handle this.
- YAIM failed on the LFC because root access was only allowed from localhost, not from the public host name. Changing LFC_DB_HOST to localhost fixed this.
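The trailing-slash and "outside INSTALL_ROOT" problems above can be illustrated with a small sketch. This is not how YAIM or config_glite work internally; the helper names (normalise_root, is_under_root) and the INSTALL_ROOT value are made up for illustration. It only shows that normalising a path before comparing it avoids the trailing '/' and '//' mismatches, and how one might list the paths that fall outside the install root.

```python
import posixpath


def normalise_root(path):
    """Collapse repeated slashes and drop any trailing '/'."""
    return posixpath.normpath(path)


def is_under_root(path, root):
    """Return True if path is root itself or lies inside it (textual check only)."""
    root = normalise_root(root)
    path = normalise_root(path)
    return path == root or path.startswith(root.rstrip("/") + "/")


# Hypothetical install root, written with the troublesome trailing slash.
INSTALL_ROOT = "/nfs/glite_wn/"

# Paths the installer tried to create, as listed above.
touched = [
    "/etc/profile.d/glite_setenv.sh",
    "/opt//glite/etc/rgma/rgma.conf",
    "/tmp/glite",
    "/var/glite",
    "/var/log/glite",
]

outside = [p for p in touched if not is_under_root(p, INSTALL_ROOT)]

for p in outside:
    print("outside INSTALL_ROOT: " + normalise_root(p))
```

A check like this is purely textual: it does not resolve symlinks or NFS mount points, so it only catches the obvious cases.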
GGUS tickets submitted
- 9155: problems with the glite 3.0 tar distribution
Transfer tests
MRTG graphs from the switch ports
Network traffic to/from UCL-CENTRAL's SE during GLA->UCL transfer test on 26/04/2006:
Our network group swear this is the network traffic to/from UCL-CENTRAL's filestore during GLA->UCL transfer test on 26/04/2006:
Network traffic to/from UCL-CENTRAL's SE during UCL->GLA transfer test on 27/04/2006:
Network traffic to/from UCL-CENTRAL's filestore during UCL->GLA transfer test on 27/04/2006:
File:GridPP-admin-3-27Apr06.JPG
We've since moved the pool from being NFS mounted to being attached directly to the DPM SE. In addition it now uses XFS. Hopefully new tests should be quicker.
Ganglia monitoring of the filestore
GLA->UCL 26/04/2006:
File:Ganglia-admin-3-load-overview-26042006.png
UCL->GLA 27/04/2006:
File:Ganglia-admin-3-load-overview-27Apr06.png
Transfer test 06/06/2006
- Preparation: Had a problem with the Imperial UI. srmcp against any SRM was reporting the following error:
srmcp file://///`pwd`/test100M srm://gfe02.hep.ph.ic.ac.uk:8443/pnfs/hep.ph.ic.ac.uk/data/dteam/ovda/t1
SRMClientV1 : put: try # 0 failed with error
SRMClientV1 : no service found with address /managerv1.wsdl
srm copy of at least one file failed or not completed
- We were using the SRM client dcache-client-1.6.7-3.i386.rpm. After downgrading to dcache-client-1.6.6-5.i386.rpm things started to work. (srm-get-metadata needs to work for the filetransfer.py scripts.)
- First test using: 2 files, 1 stream, 1 file at a time:
- filetransfer.py --ftp-options="-p 1" --number=2 --delete srm://dcache.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/data/dteam/tfr2tier2/canned1G srm://gw-3.ccc.ucl.ac.uk:8443/dpm/ccc.ucl.ac.uk/home/dteam/ovda/060605/t1
- result: 2/2 transferred in 234.832408905 seconds 2000000000.0 bytes transferred. Bandwidth: 68.1336961734Mb/s
- changed to transfer 2 files in parallel: glite-transfer-channel-set -f 2 RALLCG2-UKILT2UCLCENTRAL
- result:
- 2/2 transferred in 132.158806086 seconds
- 2000000000.0 bytes transferred.
- Bandwidth: 121.066469Mb/s
- try with 4 files: glite-transfer-channel-set -f 4 RALLCG2-UKILT2UCLCENTRAL
- result:
- 4/4 transferred in 161.839152098 seconds
- 4000000000.0 bytes transferred.
- Bandwidth: 197.727185204Mb/s
- try with 8 files: glite-transfer-channel-set -f 8 RALLCG2-UKILT2UCLCENTRAL
- result:
- 8/8 transferred in 239.701545954 seconds
- 8000000000.0 bytes transferred.
- Bandwidth: 266.99869517Mb/s
- try with 8 files but two streams per file: --ftp-options="-p 2"
- result:
- 8/8 transferred in 268.182547092 seconds
- 8000000000.0 bytes transferred.
- Bandwidth: 238.643419171Mb/s
- try with 12 files, 1 stream
- result:
- 7 files transferred but 5 in waiting state with message: Transfer failed. ERROR the server sent an error response: 425 425 Can't open data connection. timed out() failed.
- I have set the channel back to 8 files.
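The bandwidth figures reported above are consistent with bytes x 8 / seconds, expressed in units of 10^6 bits per second. The sketch below recomputes them from the byte counts and timings listed in the results. It assumes that this is the formula behind the reported numbers (the page does not say how filetransfer.py calculates them), and the function name bandwidth_mbps and the run labels are illustrative only.

```python
def bandwidth_mbps(bytes_transferred, seconds):
    """Aggregate bandwidth in Mb/s, taking 1 Mb as 10**6 bits."""
    return bytes_transferred * 8 / seconds / 1e6


# (label, bytes transferred, elapsed seconds), copied from the results above.
runs = [
    ("2 files, 1 at a time, 1 stream", 2000000000.0, 234.832408905),
    ("2 files in parallel, 1 stream", 2000000000.0, 132.158806086),
    ("4 files in parallel, 1 stream", 4000000000.0, 161.839152098),
    ("8 files in parallel, 1 stream", 8000000000.0, 239.701545954),
    ("8 files in parallel, 2 streams", 8000000000.0, 268.182547092),
]

for label, nbytes, secs in runs:
    print("%-32s %8.2f Mb/s" % (label, bandwidth_mbps(nbytes, secs)))
```

Read this way, the aggregate rate rises from about 68 Mb/s to about 267 Mb/s as the number of concurrent files goes from 1 to 8, while adding a second stream per file at 8 files lowers it to about 239 Mb/s.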
Useful Links
UCL-CENTRAL's Wiki Page (this page!)
Monitoring and Admin
Site Functional Test Reports - UCL-CENTRAL History
GGUS - Global Grid User Support
GridPP
GridPP - The UK Grid for Particle Physics
Grid Acronym Soup (with links)
GridPP Front-end nodes warranty details
LCG Technical Reference
LCG - LHC Computing Grid Project
LCG Deployment - Release Information
Move to Authenticated Connectors - GOC Wiki
LCG Other Reference
UCL-HEP e-Science and Grid Links
LCG ServiceChallenges LCG TWiki
LCG Deployment - EIS docs and presentations
EGEE Reference
EGEE Sheets Information Sheets
Related software
CERN Scientific Linux
Scientific Linux CERN 3 (SLC3) pages
VOs
e-Science
Grid Operations Support Centre (GOSC)
UK e-Science Certification Authority
UCL
Information Systems Department
Other
Sun GridEngine Project Home Page
GRIDS Center -- Part of the NSF Middleware Initiative
OMII Open Middleware Infrastructure Institute
High-performance Linux clustering, Part 1 Clustering fundamentals
Grid computing Conceptual flyover for developers
GridLock - Grids, Webs, Security and Stuff
Wiki-Spotter