RHUL site log
This is the site log for the RHUL site.
SC4 Transfer Tests
This morning we did some 2GB transfers from Glasgow to RHUL. We used the Imperial College UI (thanks Olivier & colleagues). The RHUL network manager monitored the bandwidth usage on the site boundary, and we monitored it on a switch on the HEP network.
We did an initial test of 2 x 1MB to check that the transfer worked OK. We then transferred 2 x 1GB, repeating the test with 2, 4 and 6 parallel streams (the -p option) to see the effect on bandwidth, using the following command:
filetransfer.py --ftp-options="-p 2" --number=2 --delete -s https://fts0344.gridpp.rl.ac.uk:8443/sc3ral/glite-data-transfer-fts/services/FileTransfer srm://se2-gla.scotgrid.ac.uk:8443/dpm/scotgrid.ac.uk/home/dteam/tfr2tier2/canned1G srm://se1.pp.rhul.ac.uk:8443/dpm/pp.rhul.ac.uk/home/dteam/20060119-dtrtest
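The three runs differ only in the value passed through --ftp-options. Below is a minimal Python sketch of how the sweep could be scripted. It only builds the argument lists and does not run anything. It reuses the flags and URLs from the command above, and the helper name build_command is hypothetical.

```python
import shlex

# FTS service endpoint and SRM source/destination, copied from the command above.
FTS_ENDPOINT = ("https://fts0344.gridpp.rl.ac.uk:8443/sc3ral/"
                "glite-data-transfer-fts/services/FileTransfer")
SOURCE = ("srm://se2-gla.scotgrid.ac.uk:8443/dpm/scotgrid.ac.uk/"
          "home/dteam/tfr2tier2/canned1G")
DEST = ("srm://se1.pp.rhul.ac.uk:8443/dpm/pp.rhul.ac.uk/"
        "home/dteam/20060119-dtrtest")


def build_command(streams, number=2):
    """Return the filetransfer.py argument list for one run (hypothetical helper)."""
    return [
        "filetransfer.py",
        f"--ftp-options=-p {streams}",  # number of parallel streams
        f"--number={number}",           # how many copies of the canned file to move
        "--delete",
        "-s", FTS_ENDPOINT,
        SOURCE,
        DEST,
    ]


if __name__ == "__main__":
    for streams in (2, 4, 6):
        # shlex.join quotes the --ftp-options value so the line can be pasted into a shell
        print(shlex.join(build_command(streams)))
```

Keeping the arguments as a list means each run could also be passed straight to subprocess.run without any shell quoting issues.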
2 parallel streams, 2x 1GB: transfer rate 35Mb/s, saw ~3-4% of 1Gb/s on local network switch. Main university network at 88 Mb/s, 10% CPU on DPM pool node (gridraid4).
4 parallel streams, 2x 1GB: transfer rate 35Mb/s, saw ~4-5% of 1Gb/s on local network switch. Main university network again at 88 Mb/s, 13% CPU on DPM pool node. No negative effect on general use of network.
6 parallel streams, 2x 1GB: transfer rate 34Mb/s, saw ~4% of 1Gb/s on local network switch. Main university network slightly slower at start then hit same maximum, 13% CPU on DPM pool node.
The baseline network load from the whole of RHUL at the time was about 50 Mb/s. Our uplink to the London MAN is 100 Mb/s, so the observed maximum of 88 Mb/s is a little short of what should ultimately be possible, but close. We also noted that the test did not seem to have any detrimental effect on other network use, either on campus or on the internet. The ganglia-fts graph showed a transfer rate of 16 MBytes/s.
In summary, the bandwidth of our transfers, as measured by filetransfer.py and corroborated by monitoring on our switch and the university network, was ~35 Mbit/s.
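As a rough consistency check, the ~50 Mb/s campus baseline plus our ~35 Mb/s transfer roughly accounts for the 88 Mb/s peak seen on the 100 Mb/s uplink. The same 35 Mb/s also sits within the 3-5% of 1Gb/s seen on the local switch. Below is a short Python sketch of that arithmetic, using only the numbers logged above. The function name is illustrative.

```python
def uplink_budget(baseline_mbps, transfer_mbps, uplink_mbps):
    """Sum background and transfer load (Mb/s) and express it as a percentage of the uplink."""
    combined = baseline_mbps + transfer_mbps
    return combined, 100.0 * combined / uplink_mbps


# Figures from this log: ~50 Mb/s campus baseline, ~35 Mb/s transfer, 100 Mb/s MAN uplink.
combined, pct = uplink_budget(baseline_mbps=50, transfer_mbps=35, uplink_mbps=100)
print(f"expected peak ~{combined} Mb/s ({pct:.0f}% of the uplink); observed 88 Mb/s")

# The same transfer rate as a share of a 1 Gb/s (1000 Mb/s) switch port.
switch_pct = 100.0 * 35 / 1000
print(f"{switch_pct:.1f}% of 1 Gb/s on the local switch")
```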
Overnight, further tests were run to measure performance when the background load is lower, and over a longer period.
First test, 2 x 1GB with 4 parallel streams: bandwidth 38.8 Mb/s.
Second test, 180 x 1GB with 4 parallel streams (expected to take about 11 hours).
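The 11-hour estimate follows from the file count and the rates measured so far. Below is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes, 1 Mb = 10^6 bits) and the 35 and 38.8 Mb/s rates recorded above. The function name is illustrative.

```python
def transfer_hours(n_files, file_gb, rate_mbps):
    """Hours needed to move n_files of file_gb gigabytes at rate_mbps megabits/s."""
    bits = n_files * file_gb * 1e9 * 8
    return bits / (rate_mbps * 1e6) / 3600


for rate in (35.0, 38.8):
    print(f"{rate} Mb/s -> {transfer_hours(180, 1, rate):.1f} h")
```

This gives roughly 10.3 to 11.4 hours, bracketing the estimate.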
During the long 180 GB test, the network was monitored on the HEP network switch and with Ganglia-FTS. The HEP switch showed a load of about 7-8 MByte/s, and Ganglia-FTS showed similar results.
Graphs: HEP switch bandwidth, Ganglia jobs, Ganglia bandwidth.
Transfers ceased around 23:30 due to an RHUL site NIS problem which, although unrelated to the grid tests, caused the transfers to fail. They did not seem able to recover once the problem was fixed. After another half an hour or so, we tried restarting dpm-gsiftp on gridraid4, with no change.
Transfer Bandwidth Report: 53/180 transferred in 8681.80825901 seconds 53000000000.0 bytes transferred. Bandwidth: 48.8377521538Mb/s
This is ~5.8 MByte/s (where 1 MB = 2^20 bytes), i.e. lower than the 7-8 MByte/s load observed on the switch during the transfer. This is probably because the average includes the tail, as the transfers gradually came to an end.
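The figures in the report can be reproduced from the byte count and elapsed time. Below is a minimal sketch, assuming the script's Mb/s means 10^6 bits/s (this reproduces the reported 48.84 value). It uses 1 MB = 2^20 bytes for the MByte/s conversion, as above.

```python
BYTES = 53_000_000_000       # "53000000000.0 bytes transferred"
SECONDS = 8681.80825901      # elapsed time from the report

mbit_per_s = BYTES * 8 / SECONDS / 1e6      # decimal megabits per second
mibyte_per_s = BYTES / SECONDS / 2**20      # 1 MB = 2^20 bytes, as in the text

print(f"{mbit_per_s:.2f} Mb/s = {mibyte_per_s:.2f} MByte/s")
```

The switch's MByte could mean either 10^6 or 2^20 bytes. In both cases, 5.82 MByte/s is below the 7-8 MByte/s seen there.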
The long test was repeated on Friday night/Saturday morning with the following command:
filetransfer.py --ftp-options="-p 2" --number=200 --delete -s https://fts0344.gridpp.rl.ac.uk:8443/sc3ral/glite-data-transfer-fts/services/FileTransfer srm://se2-gla.scotgrid.ac.uk:8443/dpm/scotgrid.ac.uk/home/dteam/tfr2tier2/canned1G srm://se1.pp.rhul.ac.uk:8443/dpm/pp.rhul.ac.uk/home/atlas/ftstest-2006-01-20
After starting up there was a steady stream of FTS errors like this:
FTS status query for 2336dd5a-89f2-11da-a18f-e44be7748cb0 failed: FTS Error: status: getFileStatus: requestID <2336dd5a-89f2-11da-a18f-e44be7748cb0> was not found
However, Ganglia-fts showed 5 active transfers and the switch port monitoring showed 7739.9 kB/s (6.2%), so I left it running. I also recalled Olivier saying that something similar had happened with the test from QMUL; he left it running and it started working again after a while.
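The 6.2% figure is consistent with a 1 Gb/s switch port if kB is read as 1000 bytes. Below is a minimal sketch of the conversion, assuming decimal prefixes. The 1 Gb/s port speed is also an assumption, based on the "% of 1Gb/s" readings earlier in this log.

```python
def port_utilisation(kbytes_per_s, port_gbps=1.0):
    """Convert a switch-port reading in kB/s (1 kB = 1000 bytes) to Mb/s and % of port speed."""
    mbit_per_s = kbytes_per_s * 1000 * 8 / 1e6
    percent = 100.0 * mbit_per_s / (port_gbps * 1000)
    return mbit_per_s, percent


mbps, pct = port_utilisation(7739.9)
print(f"{mbps:.1f} Mb/s, {pct:.1f}% of a 1 Gb/s port")
```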
The transfer appears to have completed OK, but filetransfer.py never finished for some reason, so no bandwidth measurement could be obtained from it. Graphs: File:Rhul-sc4-gangliafts-all-20060120.gif, File:Rhul-sc4-gangliafts-active-20060120.gif, File:Rhul-sc4-gangliafts-bw-20060120.gif, File:Rhul-sc4-mtrg-20060120.png
From careful scrutiny of the Ganglia graphs, it looks like the transfer began around 20:20 and ended at 03:50, i.e. 7 hrs 30 mins, or 27000 secs. This includes a small tail.
For 200 x 1 GB, that gives an average transfer bandwidth of 59 +/- 1 Mbit/s (using Graeme's preferred SI prefix definitions, i.e. G = 10^3 M).
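The elapsed time and average rate can be recomputed directly, remembering that the transfer ran past midnight. Below is a minimal sketch using the 20:20 and 03:50 times read off the graphs and the SI definitions above (1 GB = 10^9 bytes). The helper names are illustrative.

```python
def elapsed_seconds(start_hhmm, end_hhmm):
    """Seconds between two HH:MM clock times, assuming the end is less than 24 h after the start."""
    def to_minutes(hhmm):
        hours, minutes = map(int, hhmm.split(":"))
        return hours * 60 + minutes

    # The modulo wraps the difference past midnight.
    delta = (to_minutes(end_hhmm) - to_minutes(start_hhmm)) % (24 * 60)
    return delta * 60


def average_mbit_per_s(total_gb, seconds):
    """Average rate in Mbit/s with SI prefixes: 1 GB = 10^9 bytes, 1 Mbit = 10^6 bits."""
    return total_gb * 1e9 * 8 / seconds / 1e6


secs = elapsed_seconds("20:20", "03:50")
print(secs, f"{average_mbit_per_s(200, secs):.1f} Mbit/s")
```

Applied to the outbound RHUL->GLA run later in this log (250 GB, 20:18 to 05:52, i.e. 34440 s), the same functions give about 58.1 Mbit/s. This matches the 58 Mbit/s reported there.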
A transfer test of 250 GB from GLA->RHUL started at 19:36. 128 GB were transferred at a rate of 35 Mbit/s before the grid proxy expired, at which point the filetransfer script, unable to poll FTS, gave up. The transfer itself carried on, reaching a total of 152 GB, until it was manually cancelled. A smaller test of 10 GB, started at 06:16 the next morning, was transferred at a rate of 54 Mbit/s.
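The script stops polling FTS once the grid proxy expires, so before a long run it may be worth checking that the proxy will outlive the expected transfer time. Below is a minimal sketch of such a check. The 12-hour proxy lifetime and the 1-hour margin used in the example are hypothetical values, not ones recorded in this log.

```python
def hours_needed(total_gb, rate_mbit_per_s):
    """Expected duration in hours, with 1 GB = 10^9 bytes and 1 Mbit = 10^6 bits."""
    return total_gb * 1e9 * 8 / (rate_mbit_per_s * 1e6) / 3600


def proxy_is_long_enough(total_gb, rate_mbit_per_s, proxy_hours, margin_hours=1.0):
    """True if the proxy lifetime covers the expected duration plus a safety margin."""
    return proxy_hours >= hours_needed(total_gb, rate_mbit_per_s) + margin_hours


# Hypothetical example: 250 GB at the 35 Mbit/s seen in this test, with a 12-hour proxy.
print(f"{hours_needed(250, 35):.1f} h needed")
print(proxy_is_long_enough(250, 35, proxy_hours=12))
```

At 35 Mbit/s, 250 GB needs roughly 15.9 hours.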
Outbound test transferring 250 GB from RHUL->GLA: started at 20:18, completed at 05:52, transfer rate 58 Mbit/s; nothing unusual noticed.