RAL Tier1 CASTOR SRM tests T1toT2
There are 100GB of files (100 * 1GB) stored on Castor which are to be used for the tests.
Their SURLs are as follows.
... and so on until ...
A full list is available here.
Ensure that version 0.5.2-1 of the script is installed on the machine from which the FTS transfers will be submitted. Earlier versions cannot take a file containing source SURLs as an argument, and the 0.5.0 release has a nasty bug which cancels transfers after an hour, so use 0.5.2-1.
The options/arguments that should be used are...
- --duration => time in minutes that the test should run for.
- --delete => the files are to be deleted from the destination after each transfer has taken place.
- --uniform-source-size => the metadata (size) of just one file is checked rather than that of all the source SURLs. Checking all 100 would take a long time and is unnecessary when the source files are all the same size.
- castorSURLs.txt is the file containing the list of SURLs.
- srm://se2-gla.scotgrid.ac.uk:8443/dpm/scotgrid.ac.uk/home/dteam/castorTest is the destination endpoint. It is formed from an SRM hostname and a directory which exists (or will be created) in the SRM namespace.
A typical example would be...
filetransfer.py --duration=120 --delete --uniform-source-size srm://ralsrma.rl.ac.uk:8443//castor/ads.rl.ac.uk/prod/grid/hep/disk1tape1/dteam/j/jkf/castorTest/1GBcanned[000:099] srm://se2-gla.scotgrid.ac.uk:8443/dpm/scotgrid.ac.uk/home/dteam/castorTest
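The castorSURLs.txt file passed to the script can be produced by expanding the [000:099] bracket range into one SURL per line. A minimal sketch in Python, assuming the source base path from the example command above (adjust it if your files live elsewhere):

```python
# Sketch: expand the [000:099] range into one SURL per line and write
# them to castorSURLs.txt for use with filetransfer.py.
base = ("srm://ralsrma.rl.ac.uk:8443//castor/ads.rl.ac.uk/prod/grid/hep/"
        "disk1tape1/dteam/j/jkf/castorTest/1GBcanned")

# Zero-padded three-digit suffixes: 1GBcanned000 .. 1GBcanned099
surls = ["%s%03d" % (base, i) for i in range(100)]

with open("castorSURLs.txt", "w") as f:
    f.write("\n".join(surls) + "\n")
```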
- RAL Castor disk servers
- Glasgow: here
- Edinburgh: here and choose pool1 from drop down menu
- Imperial College: here
- Bristol: here
So far the best rates, according to ganglia, to each site have been as follows,
RALPPD => 400Mbps
Lancaster => 100Mbps
Edinburgh => 200Mbps
Glasgow => 30 - 50Mbps but only with No. of concurrent files = 1
When No. of concurrent files > 1, the rate is barely detectable - of the order of a few Kbps.
Birmingham => 400Mbps peak but this rate could not be sustained
The Castor->RALPPD test was submitted using the command below (using version 0.5.0-1 of the script) and ran for an hour before being cancelled by the script.
filetransfer.py --number=500 --delete -u -g"-p 1" srm://castor-srmv1.ads.rl.ac.uk:8443//castor/ads.rl.ac.uk/test/grid/hep/disk/dteam/j/jkf/1GBcanned[001:025] srm://heplnx204.pp.rl.ac.uk:8443/pnfs/pp.rl.ac.uk/data/dteam/TestFiles
The Castor->Lancaster test was submitted using the command below (using version 0.5.0-1 of the script) and was allowed to run for approximately an hour before being manually cancelled.
filetransfer.py --number=500 --delete -u -g"-p 1" srm://castor-srmv1.ads.rl.ac.uk:8443//castor/ads.rl.ac.uk/test/grid/hep/disk/dteam/j/jkf/1GBcanned[005:025] srm://fal-pygrid-20.lancs.ac.uk:8443/pnfs/lancs.ac.uk/data/dteam/TestFiles
The figures below come from the filetransfer.py script (version 0.5.0-1).
- channel-set -f 5, using different source files => rate barely detectable
- channel-set -f 5, using the same source file => rate barely detectable
- channel-set -f 1, using different source files => rate 32Mbps
- channel-set -f 1, using the same source file => rate 37Mbps
Transfer Bandwidth Report Summary
=================================
transfer 0 (a45345d3-5eaf-11db-b949-e6cd043d6f48) 61/100 (61000000000.0) transferred. Started at 14:51:45, Canceled at 15:51:22, Duration = 0:59:37, Bandwidth = 136.404571674Mb/s
transfer 1 (69d7df85-5eb8-11db-b949-e6cd043d6f48) 64/100 (64000000000.0) transferred. Started at 15:54:33, Canceled at 16:54:35, Duration = 1:0:2, Bandwidth = 142.119974639Mb/s
transfer 2 (302e4871-5ec1-11db-9cda-d8e9ca8bb8b4) 66/100 (66000000000.0) transferred. Started at 16:57:22, Canceled at 17:57:19, Duration = 0:59:56, Bandwidth = 146.795180623Mb/s
transfer 3 (f5cd6e0d-5ec9-11db-9cda-d8e9ca8bb8b4) 72/100 (72000000000.0) transferred. Started at 18:0:9, Canceled at 19:0:31, Duration = 1:0:21, Bandwidth = 159.029164846Mb/s
transfer 4 (943843fb-5ed2-11db-983f-bee6fd519f4e) 73/100 (73000000000.0) transferred. Started at 19:2:21, Canceled at 20:1:40, Duration = 0:59:18, Bandwidth = 164.104040354Mb/s
transfer 5 (f178aa90-5eda-11db-983f-bee6fd519f4e) 82/100 (82000000000.0) transferred. Started at 20:3:28, Canceled at 21:3:51, Duration = 1:0:23, Bandwidth = 181.059464779Mb/s
transfer 6 (75cd7302-5ee2-11db-9aeb-884ba7711b8a) 81/100 (81000000000.0) transferred. Started at 21:5:32, Canceled at 22:5:34, Duration = 1:0:1, Bandwidth = 179.933476843Mb/s
transfer 7 (696035f5-5eeb-11db-9aeb-884ba7711b8a) 83/100 (83000000000.0) transferred. Started at 22:7:14, Canceled at 23:7:33, Duration = 1:0:18, Bandwidth = 183.495147493Mb/s
transfer 8 (97dc90cf-5ef3-11db-bb97-9f39275fa11b) 88/100 (88000000000.0) transferred. Started at 23:9:30, Canceled at 0:8:58, Duration = 0:59:27, Bandwidth = 197.312074118Mb/s
transfer 9 (2656be09-5efc-11db-bb97-9f39275fa11b) 90/100 (90000000000.0) transferred. Started at 0:10:52, Canceled at 1:11:24, Duration = 1:0:32, Bandwidth = 198.221413217Mb/s
transfer 10 (e1e5e7a7-5f04-11db-8fa8-bed50c69e441) 80/100 (80000000000.0) transferred. Started at 1:13:18, Canceled at 2:3:42, Duration = 0:50:24, Bandwidth = 211.628663176Mb/s
transfer 11 (03714569-5f0d-11db-8fa8-bed50c69e441) 80/100 (80000000000.0) transferred. Started at 2:1:22, Canceled at 2:50:27, Duration = 0:49:4, Bandwidth = 217.319135586Mb/s
transfer 12 (b96c6ed9-5f13-11db-be92-a9c20fdf7840) 97/100 (97000000000.0) transferred. Started at 2:48:10, Canceled at 3:48:12, Duration = 1:0:1, Bandwidth = 215.453996401Mb/s
transfer 13 (3e0ae03f-5f1a-11db-be92-a9c20fdf7840) 96/100 (96000000000.0) transferred. Started at 3:47:1, Canceled at 4:46:54, Duration = 0:59:53, Bandwidth = 213.734181929Mb/s
transfer 14 (71cbf3dc-5f22-11db-bcbd-b6d19d9d7877) 92/100 (92000000000.0) transferred. Started at 4:45:48, Canceled at 5:46:24, Duration = 1:0:36, Bandwidth = 202.37585275Mb/s
transfer 15 (f0ffc265-5f2a-11db-bcbd-b6d19d9d7877) 87/100 (87000000000.0) transferred. Started at 5:46:27, Canceled at 6:46:57, Duration = 1:0:30, Bandwidth = 191.722507401Mb/s
transfer 16 (b9b37c5e-5f33-11db-873e-aafc1f6dd63c) 78/100 (78000000000.0) transferred. Started at 6:48:34, Canceled at 7:37:48, Duration = 0:49:13, Bandwidth = 211.271038287Mb/s
transfer 17 (b0369de0-5f3b-11db-873e-aafc1f6dd63c) 89/100 (89000000000.0) transferred. Started at 7:34:14, Canceled at 8:33:47, Duration = 0:59:32, Bandwidth = 199.316927729Mb/s
transfer 18 (a3e278c9-5f42-11db-87c2-b87c63e6ef06) 79/100 (79000000000.0) transferred. Started at 8:35:36, Canceled at 9:35:32, Duration = 0:59:55, Bandwidth = 175.763215737Mb/s
transfer 19 (f83dd84a-5f4b-11db-87c2-b87c63e6ef06) 61/100 (61000000000.0) transferred. Started at 9:37:18, Canceled at 10:36:42, Duration = 0:59:23, Bandwidth = 136.9423075Mb/s
transfer 20 (9038c539-5f55-11db-bfd2-ca92950a4d5c) 24/100 (24000000000.0) transferred. Started at 10:39:28, Canceled at 11:8:22, Duration = 0:28:54, Bandwidth = 110.710029247Mb/s
transfer 21 (f7e157dd-5f59-11db-bfd2-ca92950a4d5c) 40/100 (40000000000.0) transferred. Started at 11:11:0, Active at 11:51:25, Duration = 0:40:25, Bandwidth = 131.912676408Mb/s
Date of Submission was 19/10/2006
Total number of FTS submissions = 22
1663/2200 transferred in 75580.4656711 seconds
1.663e+12 bytes transferred.
Average Bandwidth: 176.024319007Mb/s
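The average bandwidth quoted in the summary above can be sanity-checked from the totals it reports (1663 x 1GB files in 75580.4656711 seconds); a small Python check, using only figures from the report:

```python
# Cross-check the report's average bandwidth from its own totals.
bytes_transferred = 1663 * 1000000000.0   # 1663 files of 1GB = 1.663e+12 bytes
duration_s = 75580.4656711                # total elapsed seconds from the report

# Convert bytes over elapsed time to megabits per second.
avg_mbps = bytes_transferred * 8 / duration_s / 1e6
print(round(avg_mbps, 3))  # ~176.024 Mb/s, matching the reported average
```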
The number of files on the RAL-Ed channel was initially 1, then increased to 5 (approx 4pm) and then to 10 (approx 4.15pm).
27-Aug-06: I configured the channel to use 1 file (and 5 streams) to start with, until I had transferred 10GB. I got an average rate (from Ganglia) of about 150Mb/s. I then increased the number of concurrent files from 1 to 5; the rate rapidly increased beyond 400Mb/s before falling to under 200Mb/s. A few transfers were in the Waiting state:
Destination: srm://epgse1.ph.bham.ac.uk:8443/srm/managerv1?SFN=/dpm/ph.bham.ac.uk/home/dteam/castortest/tfr-file015
State: Waiting
Retries: 2
Reason: Transfer failed. ERROR the server sent an error response: 425 425 Can't open data connection. timed out() failed
After 15 mins or so, the transfer stalled. I left it in this state for just under 20 mins, then gave up hope and decided to give FTS a kick by setting the number of concurrent transfers to 10. The transfer resumed until stalling again. I then had several failed transfers with the same FTP 425 error as above when the 3rd attempt failed. I cancelled the FTS transfer at this point. 36GB had been transferred at an average rate of 86Mb/s.
I had another go, with poor rates and similar problems as well; however, the rate was a bit better:
35/50 transferred in 1579.58328986 seconds 35000000000.0 bytes transferred. Bandwidth: 177.261941043Mb/s
filetransfer.py --number=5 --delete -u -g"-p 1" srm://castor-srmv1.ads.rl.ac.uk:8443//castor/ads.rl.ac.uk/test/grid/hep/disk/dteam/j/jkf/1GBcanned[001:005] srm://gfe02.hep.ph.ic.ac.uk:8443/pnfs/hep.ph.ic.ac.uk/data/dteam/castortest/sep04
Reason: Failed SRM copy context. put on httpg://gfe02.hep.ph.ic.ac.uk:8443/srm/managerv1 ; id=-2147344621 Error is RequestFileStatus#-2147344620 failed with error:[ retrieval of "from" TURL failed with error rs.state = Failed rs.error = null]
Test 2: successful with a changed SURL path:
filetransfer.py --number=1 --delete --uniform-source-size -g"-p 1" srm://castor-srmv1.ads.rl.ac.uk:8443//castor/ads.rl.ac.uk/test/grid/hep/disk/dteam/j/jkf/castorTest/1GBcanned001 srm://gfe02.hep.ph.ic.ac.uk:8443/pnfs/hep.ph.ic.ac.uk/data/dteam/castortest/sep04
Transfer Bandwidth Report Summary
=================================
transfer 0 (afef7179-3c1b-11db-88b9-d346ee9ad713) 1/1 (1000000000.0) transferred. Started at 14:46:57, Done at 14:59:13, Duration = 0:12:15, Bandwidth = 10.8730536802Mb/s
Total number of FTS submissions = 1
1/1 transferred in 735.763864994 seconds
1000000000.0 bytes transferred.
Average Bandwidth: 10.8730536802Mb/s
filetransfer.py --number=5 --delete --uniform-source-size -g"-p 1" srm://castor-srmv1.ads.rl.ac.uk:8443//castor/ads.rl.ac.uk/test/grid/hep/disk/dteam/j/jkf/castorTest/1GBcanned[000:005] srm://gfe02.hep.ph.ic.ac.uk:8443/pnfs/hep.ph.ic.ac.uk/data/dteam/castortest/sep04
Transfer Bandwidth Report Summary
=================================
transfer 0 (31597dd7-3c20-11db-88b9-d346ee9ad713) 5/5 (5000000000.0) transferred. Started at 15:25:20, Done at 15:53:14, Duration = 0:27:53, Bandwidth = 23.8968979369Mb/s
Total number of FTS submissions = 1
5/5 transferred in 1673.85742307 seconds
5000000000.0 bytes transferred.
Average Bandwidth: 23.8968979369Mb/s
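The duration and bandwidth figures in these summaries can be cross-checked in the same way; a sketch, using the 5-file transfer's reported numbers:

```python
def hms_to_seconds(hms):
    """Convert an 'H:M:S' duration string, as printed by the script, to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

# The printed Duration (0:27:53 = 1673 s) is consistent with the more precise
# elapsed time of 1673.85742307 s in the totals line.
duration = hms_to_seconds("0:27:53")

# 5 x 1GB files over the reported elapsed time, in megabits per second.
mbps = 5 * 1000000000.0 * 8 / 1673.85742307 / 1e6
print(round(mbps, 4))  # ~23.8969 Mb/s, matching the reported bandwidth
```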