Sheffield Transfer Test Log


12/04/06

Carried out a 1TB transfer from Glasgow into the dCache using FTS. Ran with 7 concurrent files and a single parallel stream. Result:

Transfer Bandwidth Report:
  542/1000 transferred in 43066.8521481 seconds
  542000000000.0 bytes transferred.
Bandwidth: 100.680680935Mb/s
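
(Sanity check: the quoted bandwidth is just bytes transferred * 8 / elapsed seconds, e.g. with bc:)

$ echo "542*10^9*8 / 43066.8521481 / 10^6" | bc -l
100.68068...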

According to the FTS logs, things were going OK (i.e. only 9 failures) until about 0420 in the morning. After that there were quite a few failures each hour, mostly due to SRM timeouts. These were probably caused by high load on the dCache node. The rate at which the Glasgow disk server was pumping out data was fairly constant:

File:GLA-SHEF-1TB.gif
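
For reference, the transfers were driven through FTS; an individual job submission with the gLite client looks roughly like the sketch below. The FTS endpoint and SURLs are placeholders rather than the real hosts used here, and the 7-concurrent-files/single-stream settings are channel parameters configured on the FTS server (with glite-transfer-channel-set) rather than options to the submit command.

# Sketch only: endpoint and SURLs are placeholders
$ glite-transfer-submit -s https://<fts-endpoint> \
    srm://<glasgow-se>:8443/pnfs/<path>/sourcefile \
    srm://lcgse1.shef.ac.uk:8443/pnfs/shef.ac.uk/data/dteam/<path>/destfile
# returns a job ID whose progress can be polled with
$ glite-transfer-status -s https://<fts-endpoint> <job-id>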

Want to repeat the test, but this time using a more realistic disk setup in the dCache where more than a single spindle is available for writing.

13/04/06

Repeated the test, but this time the dteam files were written into the atlas pool, corresponding to the disk array and not just the single disk used yesterday (see the PoolManager sketch below). Saw a performance improvement:

Transfer Bandwidth Report:
  252/1000 transferred in 13974.5861292 seconds
  252000000000.0 bytes transferred.
Bandwidth: 144.26187519Mb/s

Only 11 files failed, mostly due to SRM timeouts (these occurred while the number of concurrent files was being changed during the transfer).

File:GLA-SHEF-0.25TB.gif
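
For reference, directing the dteam writes at the disk-array pool is just the usual dCache PoolManager link setup; a rough sketch is below, with made-up pool, group and link names (the real Sheffield configuration may differ):

# Sketch only: names are illustrative
psu create pool atlas_pool1
psu create pgroup array-pools
psu addto pgroup array-pools atlas_pool1
psu create unit -store dteam:dteam@osm
psu create ugroup dteam-units
psu addto ugroup dteam-units dteam:dteam@osm
psu create link dteam-write-link dteam-units
psu set link dteam-write-link -readpref=10 -writepref=10 -cachepref=10
psu add link dteam-write-link array-pools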

Now want to run an outbound test to Glasgow. If the rate is still low, it may indicate an issue with the local Sheffield network.


File:SHEF-GLA-gla-load-1TB.gif

Test successful and saw a very good rate, so the network cannot be limiting the GLA-SHEF transfers. It must be something to do with the dCache setup. It would be good if SHEF could get another machine that could be used as an additional GridFTP door.

Transfer Bandwidth Report:
  1000/1000 transferred in 19328.976089 seconds
  1e+12 bytes transferred.
Bandwidth: 413.886382971Mb/s

Interesting to note that the GLA disk servers that were the destinations of the data transfers were only running at about 20 MB/s during the whole transfer. This is the same rate at which they pumped data out when the transfer was in the opposite direction. Is this a limit we have found for the GLA disk servers? Maybe we could try pushing them further next time.

14/04/06

srmcp testing

$ time for i in 0 1 2 3 4 5 6 7 8 9; do srmcp -debug=false srm://srm.epcc.ed.ac.uk:8443/pnfs/epcc.ed.ac.uk/data/dteam/canned/canned1G  srm://lcgse1.shef.ac.uk:8443/pnfs/shef.ac.uk/data/dteam/srmcp_test/file$i; done 
user credentials are: /C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=greig cowan
SRMClientV1 : connecting to srm at httpg://lcgse1.shef.ac.uk:8443/srm/managerv1
user credentials are: /C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=greig cowan
SRMClientV1 : connecting to srm at httpg://lcgse1.shef.ac.uk:8443/srm/managerv1
...
real    8m6.649s
user    0m34.120s
sys     0m0.880s

Then ran the same set of copies again, but as a single srmcp invocation in batch mode using a copyjob file:

$ time srmcp -debug=false -copyjobfile=srm.batch
user credentials are: /C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=greig cowan
SRMClientV1 : connecting to srm at httpg://lcgse1.shef.ac.uk:8443/srm/managerv1
real    7m35.372s
user    0m8.820s
sys     0m0.130s
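
The srm.batch copyjob file is just a plain text list of "source destination" SURL pairs, one pair per line; it could be generated with something along these lines (destination filenames here are illustrative):

# Sketch: destination names are illustrative
$ for i in 0 1 2 3 4 5 6 7 8 9; do
>   echo "srm://srm.epcc.ed.ac.uk:8443/pnfs/epcc.ed.ac.uk/data/dteam/canned/canned1G srm://lcgse1.shef.ac.uk:8443/pnfs/shef.ac.uk/data/dteam/srmcp_test/batchfile$i"
> done > srm.batch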

So it was certainly faster in batch mode, presumably due to reduced SRM negotiation overhead. Also, this transfer corresponded to a rate of 175 Mb/s, which is faster than that observed when using FTS. It would be good to carry out a transfer over a sustained length of time to study this further.
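
(That rate is just the 10 x 1GB files over the 7m35s of wall-clock time, assuming the canned1G files are 10^9 bytes:)

$ echo "10*10^9*8 / (7*60 + 35.372) / 10^6" | bc -l
175.68...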