Ed SC4 Dcache Tests

This page contains a log of the FTS tests that were carried out as part of Edinburgh's participation in SC4. These tests were also used to understand the local dCache setup.

Service_Challenge_Transfer_Tests

Outline of tests (draft)

The SC4 requirement is for a sustained transfer of 1TB of data from the RAL Tier-1. As a warm-up for this test, I will transfer smaller amounts of data from the Tier-1 and at the same time modify our dCache setup to observe the effect that the following configurations have on the data transfer rate:

  • only 1 NFS mounted pool
  • only NFS mounted pools
  • only 1 RAID volume pool
  • only RAID pools
  • all pools available for use

It is expected that there will be a decrease in the transfer rate when only the NFS mounted pools are available to the dCache, but I would like to get quantitative results. These data will be used to move our dCache setup towards a more optimal configuration in the future. In addition to modifying the dCache setup, it is also possible to use FTS to modify the configuration of the RAL-ED channel (in terms of the number of concurrent file transfers and parallel streams). It is hoped that there will be sufficient time to study these effects on the transfer rate.
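
For reference, the individual transfers behind the test scripts used throughout this page boil down to the FTS command-line tools; a rough sketch (the FTS endpoint and SURLs below are placeholders, not our real values):

 # submit a transfer job on the RAL-ED channel; -s selects the FTS service endpoint
 JOBID=$(glite-transfer-submit -s "$FTS_ENDPOINT" \
           "srm://source.example.org/pnfs/example.org/data/dteam/file001" \
           "srm://dcache.epcc.ed.ac.uk/pnfs/epcc.ed.ac.uk/data/dteam/file001")
 # poll the job; -l also lists the state of each individual file
 glite-transfer-status -l "$JOBID"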

12/12/05

Started trying to initiate FTS tests. FTS was accepting the jobs, but querying the transfer status produced a strange error message. The problem was eventually resolved by issuing a new myproxy -d.
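
(For the record, the fix amounted to re-registering the delegated credential; a sketch, with the MyProxy server name as a placeholder:)

 # create a fresh grid proxy, then re-register it with the MyProxy server
 grid-proxy-init
 myproxy-init -d -s myproxy.example.org   # -d uses the proxy certificate subject as the username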

13/12/05

FTS tests were started properly. Initially just using Matt Hodges' test script to start some transfers, in order to observe the performance before any tuning took place. This also gave a chance to study the FTS logs and Ganglia monitoring pages. Submitted a batch transfer of files:

Size | Concurrent Files | Parallel Streams
50GB | 5                | 5

Initially transfers were successful, then started seeing error messages in the dCache pool node gridftp door logs:

 12/13 16:14:48 Cell(GFTP-dcache-Unknown-998@gridftpdoor-dcacheDomain) : CellAdapter: cought SocketTimeoutException: Do nothing, just allow looping to continue
 12/13 16:14:48 Cell(GFTP-dcache-Unknown-994@gridftpdoor-dcacheDomain) : CellAdapter: cought SocketTimeoutException: Do nothing, just allow looping to continue
 12/13 16:14:48 Cell(GFTP-dcache-Unknown-999@gridftpdoor-dcacheDomain) : CellAdapter: cought SocketTimeoutException: Do nothing, just allow looping to continue

Not clear what is causing this; I had to cancel the transfer because of it. Even if I now submit just a single file for transfer, I get these error messages. Set up another transfer:

Size   | Concurrent Files | Parallel Streams
25*1GB | 5                | 1

Only saw 11Mb/s. The FTS log files reported a problem with pool dcache_24, confirmed by the dCache monitoring. Not sure why; possibly excessive load. Also having a problem with the gridftp door on the admin node not starting up. For the moment I have disabled it, and all traffic is now going through the pool node.

14/12/05

Set up another transfer, passing the option -g "-p 10" to FTS. Now using Chris Brew's test script.

Size          | Concurrent Files | Parallel Streams
50*10MB=500MB | 5                | 10
Transfer IDs are d4f03598-6c95-11da-a18f-e44be7748cb0
Transfer Started - Wed Dec 14 11:36:20 GMT 2005
Active Jobs: Done 4 files (1 active, 45 pending and 0 delayed) - Wed Dec 14 11:36:50 GMT 2005
Active Jobs: Done 5 files (5 active, 40 pending and 0 delayed) - Wed Dec 14 11:36:57 GMT 2005
...
Active Jobs: Done 49 files (0 active, 0 pending and 1 delayed) - Wed Dec 14 11:40:36 GMT 2005
Transfer Finished - Wed Dec 14 11:40:37 GMT 2005
Transfered 49 files in 257 s (+- 10s)
Approx rate = 1561 Mb/s

A rate of 1561Mb/s according to this! That number cannot be correct; there must be an error in the test script.
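
A back-of-the-envelope check suggests where the error lies. Assuming (my guess) that the script hard-codes a file size of 1GB (1024MB):

 # actual rate for 49 x 10MB files in 257s:
 echo "scale=1; 49*10*8/257" | bc      # = 15.2 Mb/s
 # rate if each file is assumed to be 1024MB:
 echo "scale=1; 49*1024*8/257" | bc    # = 1561.8 Mb/s, matching the bogus figure

Set up another transfer, again passing the option -g "-p 10" to FTS.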

Size        | Concurrent Files | Parallel Streams
10*1GB=10GB | 5                | 10
Transfer IDs are dc24f7dc-6c98-11da-a18f-e44be7748cb0
Transfer Started - Wed Dec 14 11:58:00 GMT 2005
Active Jobs: Done 1 files (5 active, 4 pending and 0 delayed) - Wed Dec 14 12:24:20 GMT 2005
...
Active Jobs: Done 8 files (0 active, 0 pending and 2 delayed) - Wed Dec 14 12:56:35 GMT 2005
Transfer Finished - Wed Dec 14 12:56:36 GMT 2005
Transfered 8 files in 3516 s (+- 10s)
Approx rate = 18 Mb/s

So now back to a low transfer rate of 18Mb/s. Try transferring smaller files (10*100MB = 1GB).

Size         | Concurrent Files | Parallel Streams
10*100MB=1GB | 5                | 10
Transfer IDs are dc662920-6ca4-11da-a18f-e44be7748cb0
Transfer Started - Wed Dec 14 13:24:23 GMT 2005
Active Jobs: Done 1 files (5 active, 4 pending and 0 delayed) - Wed Dec 14 13:26:08 GMT 2005
...
Active Jobs: Done 10 files (0 active, 0 pending and 0 delayed) - Wed Dec 14 13:29:35 GMT 2005
Transfer Finished - Wed Dec 14 13:29:36 GMT 2005
Transfered 10 files in 313 s (+- 10s)
Approx rate = 261 Mb/s

So now up to a respectable 261Mb/s. It looks like dCache may be having problems with transferring large files, possibly timing out. Perform another test with 100*100MB = 10GB.

Size           | Concurrent Files | Parallel Streams
100*100MB=10GB | 5                | 10
Transfer IDs are 0364f5a2-6ca6-11da-a18f-e44be7748cb0
Transfer Started - Wed Dec 14 13:32:16 GMT 2005
Active Jobs: Done 2 files (5 active, 93 pending and 0 delayed) - Wed Dec 14 13:34:15 GMT 2005
...
Active Jobs: Done 100 files (0 active, 0 pending and 0 delayed) - Wed Dec 14 14:21:20 GMT 2005
Transfer Finished - Wed Dec 14 14:21:21 GMT 2005
Transfered 100 files in 2945 s (+- 10s)
Approx rate = 278 Mb/s

Decent transfer rate of 278Mb/s. Now observing problems with dCache pools going offline (as reported by the web interface). The offline pools are ones that are NFS mounted from the University SAN; only 4 of the 10 NFS mounted pools remain online. FTS transfers were hanging when trying to write files into these pools. Seeing Java processes in uninterruptible sleep (state D):

# ps aux|grep " D "
root      4353  0.0  2.1 622700 84668 pts/0  D    11:29   0:00 /usr/java/j2sdk1.
root      4393  0.0  2.1 622700 84668 pts/0  D    11:29   0:00 /usr/java/j2sdk1.
root     11186  0.0  1.5 506484 58760 pts/0  D    15:04   0:00 /usr/java/j2sdk1.
root     11864  0.0  1.5 509448 62100 pts/0  D    15:20   0:00 /usr/java/j2sdk1.
root     13029  0.0  1.6 584296 65912 pts/0  D    16:27   0:00 /usr/java/j2sdk1.
root     13194  0.0  0.0  1740  584 pts/0    S    16:32   0:00 grep  D

Problem not resolved by restarting dcache-pool or NFS. A reboot was required.
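
(For future reference, a quicker way to spot processes stuck in uninterruptible sleep, and the kernel function they are blocked in, using standard procps options:)

 # list D-state processes with their wait channel
 ps -eo pid,user,stat,wchan:30,args | awk 'NR==1 || $3 ~ /^D/'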


16/12/05

Noticed that if I try to make gridftp use > 10 parallel streams (dCache -> dCache), the transfer does not work and Graeme's python script repeatedly outputs:

Child:  /opt/glite/bin/glite-transfer-status -l 449b254e-6e2e-11da-a18f-e44be7748cb0
Overall status:  Active
Matching for duration in fts query line number 6 failed.
Found

and the pool node gridftp log reports:

12/16 12:20:05 Cell(GFTP-dcache-Unknown-207@gridftpdoor-dcacheDomain) : CellAdapter: SocketRedirector(Thread-632):Adapter: done, EOD received ? = false
12/16 12:20:05 Cell(GFTP-dcache-Unknown-207@gridftpdoor-dcacheDomain) : CellAdapter: Closing data channel: 1 remaining: 9 eodc says there will be: -1
12/16 12:20:05 Cell(GFTP-dcache-Unknown-207@gridftpdoor-dcacheDomain) : CellAdapter: SocketRedirector(Thread-633):Adapter: done, EOD received ? = false
12/16 12:20:05 Cell(GFTP-dcache-Unknown-207@gridftpdoor-dcacheDomain) : CellAdapter: Closing data channel: 2 remaining: 8 eodc says there will be: -1
...

Now change the dCacheSetup file on the pool node to see if this has any effect on performance. First modify it so that only 1 parallel stream is used:

# Set number of parallel streams per GridFTP transfer
parallelStreams=1

(default was 10). Restart dcache-opt and see what sort of transfer rate we get - 8.37Mb/s (100MB file, 1 stream). Now try submitting the same job but with the "-p 2" option - 8.30Mb/s. This is strange: why did I not get the error messages seen above? Try again with "-p 11" - same response as before, with the "Matching duration..." output. glite-transfer-status returns:

State:       Waiting
Retries:     1
Reason:      Transfer failed. ERROR the server sent an error response: 426 426 Transfer aborted, closing connection :Unexpected Exception : java.net.SocketException: Connection reset

Possibly I need to change parallelStreams in the admin node config file as well. Do this, then re-run the tests.
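
(The change itself is a one-liner in each dCacheSetup; a sketch, assuming the standard layout and init scripts under /opt/d-cache:)

 # on both the admin node and the pool node:
 grep parallelStreams /opt/d-cache/config/dCacheSetup
 #   parallelStreams=1
 # then restart the dCache services so the doors pick up the change:
 /opt/d-cache/bin/dcache-opt stop && /opt/d-cache/bin/dcache-opt start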

 1 stream   - 9.37Mb/s
 2 streams  - 7.52Mb/s
 10 streams - 9.43Mb/s
 11 streams - failed with the same error as above

??

What about the .srmconfig/config.xml file? I had been using:

<buffer_size> 131072 </buffer_size>
<tcp_buffer_size> 0 </tcp_buffer_size>
<streams_num> 10 </streams_num>

Now try again with streams_num set to 1, but passing the "-p 10" option to gridftp.

  "-p 1",  <streams_num> 1 - 9.40Mb/s  : FTS logs report that 1 stream was used
  "-p 10", <streams_num> 1 - 10.65Mb/s : FTS logs report that 10 streams were used, so this does not appear to have any influence.

What about modifying the buffer sizes? There are also corresponding buffer sizes in dCacheSetup. Change config.xml to this:

 <buffer_size> 2048 </buffer_size>
 "-p 1",  <streams_num> 1 - 10.76Mb/s  : FTS logs report that 1 stream was used
 "-p 10", <streams_num> 1 - 15.01Mb/s  : 10 streams

Faster in both cases. Try reducing the buffer size further to 1024 and run the tests again.

 "-p 10", <streams_num> 1 - 9.42Mb/s

Change to 4096:

 "-p 10", <streams_num> 1 - 10.77Mb/s

The ~10Mb/s limit may be due to the NFS mounted disk pools that dCache is using. I will make these pools read-only for now to see what effect this has on the transfer rate. The setup is now:

nfs-test
linkList :
  nfs-test-link  (pref=10/10/0;ugroups=2;pools=1)
poolList :
  dcache_28  (enabled=true;active=22;links=0;pgroups=1)
  dcache_26  (enabled=true;active=24;links=0;pgroups=1)
  dcache_22  (enabled=true;active=4;links=0;pgroups=1)
  dcache_30  (enabled=true;active=24;links=0;pgroups=1)
  dcache_24  (enabled=true;active=0;links=0;pgroups=1)
  dcache_32  (enabled=true;active=22;links=0;pgroups=1)
  dcache_25  (enabled=true;active=26;links=0;pgroups=1)
  dcache_27  (enabled=true;active=23;links=0;pgroups=1)
  dcache_29  (enabled=true;active=8;links=0;pgroups=1)
  dcache_23  (enabled=true;active=1;links=0;pgroups=1)
  dcache_31  (enabled=true;active=25;links=0;pgroups=1)
ResilientPools
 linkList :
 poolList :
default
 linkList :
  default-link  (pref=10/10/10;ugroups=2;pools=1)
 poolList :
  dcache_1  (enabled=true;active=1;links=0;pgroups=1)
  dcache_7  (enabled=true;active=25;links=0;pgroups=1)
  dcache_14  (enabled=true;active=13;links=0;pgroups=1)
  dcache_13  (enabled=true;active=16;links=0;pgroups=1)
  dcache_20  (enabled=true;active=6;links=0;pgroups=1)
  dcache_6  (enabled=true;active=22;links=0;pgroups=1)
  dcache_16  (enabled=true;active=13;links=0;pgroups=1)
  dcache_8  (enabled=true;active=23;links=0;pgroups=1)
  dcache_11  (enabled=true;active=19;links=0;pgroups=1)
  dcache_4  (enabled=true;active=29;links=0;pgroups=1)
  dcache_18  (enabled=true;active=7;links=0;pgroups=1)
  dcache_21  (enabled=true;active=5;links=0;pgroups=1)
  dcache_3  (enabled=true;active=0;links=0;pgroups=1)
  dcache_17  (enabled=true;active=11;links=0;pgroups=1)
  dcache_19  (enabled=true;active=6;links=0;pgroups=1)
  dcache_2  (enabled=true;active=1;links=0;pgroups=1)
  dcache_12  (enabled=true;active=19;links=0;pgroups=1)
  dcache_9  (enabled=true;active=22;links=0;pgroups=1)
  dcache_10  (enabled=true;active=19;links=0;pgroups=1)
  dcache_5  (enabled=true;active=25;links=0;pgroups=1)
  dcache_15  (enabled=true;active=12;links=0;pgroups=1)

Notice the writepref value (0) for nfs-test-link; this makes the NFS mounted pools read-only.
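
(For the record, the writepref is set on the link in the PoolManager cell of the dCache admin interface; a sketch using the standard psu syntax, with the link name taken from the listing above:)

 # in the admin interface:
 cd PoolManager
 # keep read/cache preference, but refuse new writes to the pools behind this link
 psu set link nfs-test-link -readpref=10 -cachepref=10 -writepref=0
 save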

Ed to RAL

Just as a test, I tried transferring a 100MB file from the Ed dCache to the RAL dCache.

Size  | Concurrent Files | Parallel Streams (-p) | Effective Rate (Mb/s)
100MB | 1                | 50                    | 24.75

So there are definitely issues with writing to our dCache, which would imply that NFS is causing the problem. If I perform another test, but this time take a file that is definitely on a non-NFS mounted pool, then I get:

Size  | Concurrent Files | Parallel Streams (-p) | Effective Rate (Mb/s)
1*1GB | 1                | 50                    | 95.54
5*1GB | 1                | 50                    | 143.06
5*1GB | 5                | 50                    | see below

In the last test above, the transfers started and 3 files were successfully copied to RAL. Via Ganglia I was seeing rates of ~30MB/s out of my pool node! However, the python script then started outputting "Matching for duration in fts query line number 6 failed". Not clear what is causing this. Could the transfer rate be too high? Try lowering the number of streams:

Size   | Concurrent Files | Parallel Streams (-p) | Effective Rate (Mb/s)
5*1GB  | 5                | 10                    | 300.96
5*1GB  | 5                | 20                    | got the same Matching error again
5*1GB  | 5                | 10                    | 289.66
20*1GB | 5                | 10                    | 317.00
20*1GB | 10               | 10                    | 388.93
20*1GB | 20               | 10                    | 1 file Done, 19 went into Waiting state
15*1GB | 20               | 10                    | 422.08

So we seem to be able to write to the Tier-1 at a decent rate when coming from a non-NFS mounted pool. Now try a transfer with identical parameters, but with a 1GB file coming from an NFS mounted pool (from dcache_27 = scotgrid10).

Size   | Concurrent Files | Parallel Streams (-p) | Effective Rate (Mb/s)
1*1GB  | 20               | 10                    | 110.03
10*1GB | 20               | 10                    | 391.92
15*1GB | 20               | 10                    | 430.12

This shows that reading from an NFS mounted pool gives a good transfer rate; it is the writing performance that appears to be terrible. Now test writing to the pools that are connected via fibre channel.

17/12/05

RAL-ED, writing to the pools that reside on the RAID disk.

Size   | Concurrent Files | Parallel Streams (-p) | Effective Rate (Mb/s)
10*1GB | 20               | 10                    | 125.43
15*1GB | 20               | 10                    | 156.37
15*1GB | 20               | 20                    | Same error as before ("Matching for duration in fts query line number 6 failed"), plus an entry in the dCache gridftp logs.
20*1GB | 20               | 10                    | 143.58
20*1GB | 5                | 11                    | Same error as before, plus an entry in the dCache gridftp logs.
20*1GB | 1                | 11                    | Same error as before, plus an entry in the dCache gridftp logs.

We seem to have reached a parallel stream limit of 10, though it is unclear what imposes this limit. Try some GLA-ED transfers to see if the same limit exists with DPM.

GLA-ED

Writing data into the non-NFS pools.

Size   | Concurrent Files | Parallel Streams (-p) | Effective Rate (Mb/s)
10*1GB | 5                | 10                    | 56.08
10*1GB | 5                | 20                    | 57.0
20*1GB | 20               | 20                    | files being transferred, then all 20 went into Waiting for some reason

ED-GLA

Size   | Concurrent Files | Parallel Streams (-p) | Effective Rate (Mb/s)
20*1GB | 5                | 10                    | Files going into Waiting state, timeouts appearing in FTS logs.
20*1GB | 20               | 10                    | ditto
5*1GB  | 20               | 10                    | 181
10*1GB | 20               | 10                    | 122.76
10*1GB | 20               | 30                    | 122.74
50*1GB | 20               | 10                    | transfers going into Waiting, SRM timeouts in FTS logs (30 min limit reached)



20/12/05

Want to perform some systematic testing of the Ed to RAL channel to see what effect changing the number of parallel streams and concurrent files has on transfer rate and file transfer success.

Ed to RAL

If there is no entry in the Notes column, then all file transfers were successful.

Size   | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
5*1GB  | 5  | 1  | 5   | 238.53 |
5*1GB  | 5  | 2  | 10  | 201.34 |
5*1GB  | 5  | 4  | 20  | 228.53 |
5*1GB  | 5  | 8  | 40  | 264.89 |
5*1GB  | 5  | 10 | 50  | 122.17 | 2 done, 3 waiting
5*1GB  | 5  | 12 | 60  | 110.2  | 1 done, 4 waiting - FTS logs show 426 426 Transfer aborted. All transfers talking to gftp0446.
5*1GB  | 5  | 12 | 60  | 224.16 | 3 done, 2 waiting - FTS logs show 426 426 Transfer aborted. All transfers talking to gftp0444.
5*1GB  | 5  | 18 | 90  | -      |
5*1GB  | 5  | 20 | 100 | -      |
5*1GB  | 5  | 22 | 110 | -      |
5*1GB  | 5  | 24 | 120 | -      |
5*1GB  | 5  | 26 | 130 | -      |
5*1GB  | 5  | 28 | 140 | -      |
5*1GB  | 5  | 30 | 150 | -      |
5*1GB  | 5  | 32 | 160 | -      |
10*1GB | 10 | 1  | 10  | 148.65 |
10*1GB | 10 | 2  | 20  | 256.89 | 8 Done, 2 Waiting. FINAL:NETWORK: Transfer failed due to possible network problem - timed out
10*1GB | 10 | 10 | 100 | 373.92 |
10*1GB | 10 | 12 | 120 | 288.69 | 5/10. 426 errors again.
10*1GB | 10 | 20 | 200 | 103.97 | 4/10. 426 errors again.

RAL to Ed

If there is no entry in the Notes column, then all file transfers were successful.

  • Files being put into the non-NFS mounted pools.
  • dCacheSetup file on both the admin and pool nodes using parallelStreams=1 (yes, the dCache services were restarted after changing the file).
Size   | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
5*1GB  | 5  | 1  | 5   | 144.82 |
5*1GB  | 5  | 2  | 10  | 148.26 |
5*1GB  | 5  | 4  | 20  | 154.55 |
5*1GB  | 5  | 6  | 30  | 35.2   | 1 Done, 4 waiting. Strange, since the files are all in the dCache if I do an ls -l in /pnfs/...
5*1GB  | 5  | 8  | 40  | 156.51 |
5*1GB  | 5  | 10 | 50  | 156.13 |
5*1GB  | 5  | 12 | 60  | 20.95  | 3 done, 2 waiting, 426 error again. FTS log shows that it took ~20 mins between starting the transfer and it finishing. Pool gridftpdoor logs show the same messages as before: 4 remaining: 6 eodc says there will be: -1 etc.
5*1GB  | 5  | 14 | 70  | -      | 5 waiting immediately after submission. 426 errors in FTS logs. Pool node logs show similar errors to above.
5*1GB  | 5  | 16 | 80  | -      | Same as above.
10*1GB | 10 | 1  | 10  | 156.25 | Pool node log repeatedly contains "CellAdapter: cought SocketTimeoutException: Do nothing, just allow looping to continue", but files still transferred.
10*1GB | 10 | 2  | 20  | 144.25 |
10*1GB | 10 | 4  | 40  | 156.56 |
10*1GB | 10 | 6  | 60  | -      |
10*1GB | 10 | 8  | 80  | 146.02 |
10*1GB | 10 | 10 | 100 | 151.88 |
10*1GB | 10 | 12 | 120 | 142.90 | 3 done, 7 waiting. 426 errors again in FTS logs.
10*1GB | 10 | 14 | 140 | -      |
10*1GB | 10 | 16 | 160 | -      |

21/12/05

ED-ED (dCache to DPM)

Just want to look at some dCache to DPM transfers to see how bad the transfer rate is.

Size    | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
5*1GB   | 5 | 10 | 50  | 0    | Transfers were taking place very slowly (I could see the file size increasing in the DPM filesystem), but then the SRM timed out.
5*100MB | 5 | 10 | 50  | 9.14 | Writing to NFS mounted RAID disk.
5*100MB | 5 | 20 | 100 | 9.13 | Writing to NFS mounted RAID disk.
5*100MB | 5 | 20 | 100 | 3.79 | 2/5 done, 3 files exist. Writing to NFS mounted SAN.
5*100MB | 5 | 20 | 100 | 9.31 | 5/5 done. Writing to filesystem local to the DPM admin node.

ED-GLA (dCache to DPM)

Size    | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
5*100MB | 5 | 5  | 25  | 8.90 |
5*100MB | 5 | 10 | 50  | 9.12 |
5*100MB | 5 | 20 | 100 | 9.34 |


So we are seeing the same consistently low transfer rate from dCache into DPM (even accounting for writing to different filesystems that are mounted in different ways).

ED-ED (DPM to dCache)

Copy files from the DPM pool that resides on the DPM admin node so that there are no conflicts with simultaneous reading and writing to the RAID array.

Size     | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
5*100MB  | 5  | 10 | 50  | 34.27  |
5*100MB  | 5  | 20 | 100 | 0      | Same problem as before when using > 10 streams into the dCache. Files immediately go into Waiting state.
10*100MB | 10 | 10 | 100 | 105.40 |
50*100MB | 50 | 10 | 500 | 88.61  |
10*1GB   | 10 | 1  | 10  | 171.57 |
10*1GB   | 10 | 1  | 10  | 170.01 | Again, to check. Not sure why it is slower than GLA-ED.
10*1GB   | 10 | 2  | 20  | 172.81 |
10*1GB   | 10 | 5  | 50  | 172.41 |
10*1GB   | 10 | 10 | 100 | 152.44 |


GLA-ED (DPM to dCache)

Size   | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
5*1GB  | 5  | 10 | 50  | 152.63 |
10*1GB | 10 | 1  | 10  | 228.93 |
10*1GB | 10 | 1  | 10  | 234.24 | Did this as a check.
10*1GB | 10 | 2  | 20  | 205.43 |
10*1GB | 10 | 5  | 50  | 186.49 |
10*1GB | 10 | 10 | 100 | 169.18 |
10*1GB | 10 | 20 | 200 | 0      | See the same problems when using > 10 streams: 426 426 Data connection. data_write() failed: Handle not in the proper state

Why is the GLA-ED rate decreasing as the number of parallel streams increases? Could this be related to the value of parallelStreams on the dCache server?

ED-RAL (dCache to dCache)

Size   | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
15*1GB | 15 | 1  | 15  | 423.05 |
15*1GB | 15 | 5  | 75  | 457.31 |
15*1GB | 15 | 10 | 150 | 398.23 |
15*1GB | 15 | 10 | 150 | 453.14 |
15*1GB | 15 | 15 | 225 | -      | 426 errors again.


ED-GLA (DPM to DPM)

These files are all coming from the NFS mounted pools.

Size   | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
15*1GB | 15 | 1  | 15  | 159.74 |
15*1GB | 15 | 5  | 75  | 125.02 | 14/15. SRM timeout for one file. Error in srm__setFileStatusSOAP-ENV:Client - Invalid state
15*1GB | 15 | 10 | 150 | 99.40  | 14/15. SRM timeout for one file.
15*1GB | 15 | 20 | 300 | 68.64  | 11/15. SRM timeout for four files.

Now try with a 1GB file coming from a pool that is local to the DPM head node.

Size   | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
15*1GB | 15 | 1  | 15  | 161.13 |
15*1GB | 15 | 5  | 75  | 77.36  | 11/15. Error in srm__setFileStatusSOAP-ENV:Client - Invalid state
15*1GB | 15 | 10 | 150 | 80.04  | 12/15. Error in srm__setFileStatusSOAP-ENV:Client - Invalid state
15*1GB | 15 | 20 | 300 | -      |
5*1GB  | 5  | 1  | 5   | 127.67 |
5*1GB  | 5  | 10 | 50  | 114.33 |
5*1GB  | 5  | 20 | 100 | 116.01 |
5*1GB  | 5  | 50 | 250 | 118.63 |

The above transfer rates are not a true reflection of what happened. Performing a dpns-ls of the destination directory at GLA shows that the files that went into the Waiting state were in fact transferred. However, due to the above SOAP error in FTS, the file status was never set to Done, and so the files remained in the Waiting state until the SRM eventually timed out. This meant that Graeme's script never got round to calling srm-adv-del. Looking at the Ganglia plots of the DPM node shows that there were peaks in the data output rate of up to ~30MB/s.

12/01/06

ED-RAL

1000*1GB files, dCache to dCache, with 10 concurrent files and 5 streams. You can see from the plots that the transfer took approximately 5 hours, giving a rate of about 440Mb/s. A few of the transfers failed. Now need to try to improve the transfer rate in the reverse direction (NFS and RAID5 issues, I think).
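
(A quick sanity check of that figure:)

 # 1000 x 1GB files in ~5 hours:
 echo "1000*8*1000/(5*3600)" | bc    # = 444 Mb/s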

File:ED-RAL-1TB-scotgrid-switch.png
ScotGrid switch, green is traffic in

16/01/06

ED-DUR

First test of transferring files from the Edinburgh DPM to the Durham DPM (the dCache to DPM issue still exists).

Size    | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
10*1GB  | 10 | 5 | 50 | 90.33 | File NFS mounted from the RAID'ed disk.
100*1GB | 10 | 5 | 50 | 92.75 | 90/100 transferred. File NFS mounted from the RAID'ed disk.


ED-GLA

Potential fix for the dCache to DPM problems that we have been seeing:

export RFIO_TCP_NODELAY=yes

in /etc/sysconfig/dpm-gsiftp and restart dpm-gsiftp. (From the web: TCP_NODELAY has one specific purpose: to disable the Nagle buffering algorithm. It should only be set for applications that send frequent small bursts of information without getting an immediate response, where timely delivery of data is required; the canonical example is mouse movements.)
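
(Applied roughly as follows; the exact restart mechanism depends on the init scripts the DPM install provides:)

 echo 'export RFIO_TCP_NODELAY=yes' >> /etc/sysconfig/dpm-gsiftp
 service dpm-gsiftp restart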

Size   | Concurrent Files | Parallel Streams (-p) | Files * Streams | Effective Rate (Mb/s) | Notes
10*1GB | 10 | 5 | 50 | 96 | 9/10 successful. File from the RAID'ed disk.
20*1GB | 10 | 5 | 50 | 85 | 18/20 successful. File from the RAID'ed disk. Failed due to the file already existing.


06/02/06

ScotGrid-Edinburgh back online after power upgrade.

RAL-ED

1TB transfer into the Edinburgh dCache, writing to a single non-NFS mounted pool. GridFTP doors exist on both the pool node and the head node. Noticed that all transfers were going through the head node gridftp door; I think this must be related to the max number of movers that I have set on this door. Started transfers with 5 files and 5 streams, then increased to 10 files and started seeing SRM timeouts (again) that were not present with just 5 files. It looks like increasing the number of files causes the load on the pool node to increase by a significant amount. Need to test again with just a gridftp door on the pool node.

File:RAL-ED-1TB-FileNumChange.png File:Dcache-load-1TB.png

After changing back to 5 files, the transfer rate appeared to pick up briefly, as can be seen in the plot above, but then dropped to a very low rate. Not clear why. As a test, I stopped the gridftp door on the admin node, thinking that the pool node door would then start being used automatically. This did not happen. Instead, all of the FTS transfers started failing, claiming that the SRM did not support the transfer protocol. I guess this is not too surprising, since the SRM probably still thinks that the admin node gridftp door is available (I did think it might be able to pick this up though...). So, stopped the transfer and resubmitted, this time with only the pool door open. It turns out that the pool node gridftp door wasn't open, which would explain the failures. Now resubmit the 1TB transfer with only the pool door open.

Update: still having problems with only the pool door open. It works if I try an srmPut using the dCache SRM client, but as soon as I submit an FTS job the transfers fail and I get FTS log messages like:


TURL dest   = gsiftp://dcache.epcc.ed.ac.uk:2811//pnfs/epcc.ed.ac.uk/data/dteam/fts_test/2006/02/06/tfr000-file00004
2006-02-06 17:55:43,692 [DEBUG] - Calling gsiftp getfilesize
FILE SIZE = 1000000000
2006-02-06 17:55:44,309 [DEBUG] - Calling gsiftp transfer
2006-02-06 17:55:45,238 [INFO ] - GSIFTP: source: set up FTP mode. DCAU disabled. Streams = 5, TCP buffersize = 0
 GSIFTP: dest: set up FTP mode. DCAU disabled. Streams = 5, TCP buffersize = 0;
2006-02-06 17:55:45,238 [INFO ] - STATUS:END fail:TRANSFER - gridftp
2006-02-06 17:55:48,698 [INFO ] - STATUS:BEGIN:SRM_GETDONE
2006-02-06 17:55:48,699 [DEBUG] - Entered SrmUtil::setFileStatus
2006-02-06 17:55:58,489 [DEBUG] - Exiting SrmUtil::srm__setFileStatus
2006-02-06 17:55:58,489 [INFO ] - STATUS:END:SRM_GETDONE
2006-02-06 17:55:58,489 [INFO ] - STATUS:BEGIN:SRM_PUTDONE
2006-02-06 17:55:58,489 [DEBUG] - Entered SrmUtil::setFileStatus
2006-02-06 17:55:58,777 [DEBUG] - Exiting SrmUtil::srm__setFileStatus
2006-02-06 17:55:58,777 [INFO ] - STATUS:END:SRM_PUTDONE
2006-02-06 17:55:58,777 [DEBUG] - Entered SrmUtil::deleteSurl
2006-02-06 17:55:59,163 [ERROR] - Failed To Delete Surl. Error in srm__advisoryDelete: SOAP-ENV:Server -  advisoryDelete(User  [name=dteam001, uid=18118, gid=2688, root=/],/pnfs/epcc.ed.ac.uk/data/dteam/fts_test/2006/02/06/tfr000-file00004) Error file does not exist, cannot delete
2006-02-06 17:55:59,163 [ERROR] - SRM 'Delete' failed!; Failed To Delete Surl. Error in srm__advisoryDelete: SOAP-ENV:Server -    advisoryDelete(User [name=dteam001, uid=18118, gid=2688,  root=/],/pnfs/epcc.ed.ac.uk/data/dteam/fts_test/2006/02/06/tfr000-file00004) Error file does not exist, cannot delete
2006-02-06 17:55:59,164 [INFO ] - STATUS:FAILED
2006-02-06 17:55:59,164 [DEBUG] - exiting listener thread which still seems active
2006-02-06 17:55:59,164 [ERROR] - FINAL:TRANSPORT:Transfer failed. ERROR the server sent an error response: 553 553  /pnfs/epcc.ed.ac.uk/data/dteam/fts_test/2006/02/06/tfr000-file00004: Cannot create file: CacheException(rc=666;msg=Path do not exist)

So as soon as the TURL is returned, the transfer seems to fail. It is then no surprise that the file cannot be deleted.

So, close the pool node door, reopen the admin door and restart the transfer. Result of Graeme's script:

Transfer Bandwidth Report:
  59/1000 transferred in 26247.4153671 seconds
  59000000000.0 bytes transferred.
Bandwidth: 17.9827229995Mb/s

This was with 2 concurrent files and 5 streams.

07/02/06 Update

Ganglia plots show fairly low bandwidth, until this morning. Load was also pretty high on dcache.


There appear to be lots of different things going on here. Although the FTS script only reports 59 successful transfers, inspection of the PNFS directory shows 239 1GB files present (i.e. they were not deleted), with timestamps showing that 2 files were transferred every two minutes (this ties in with the files=2 setting). So it appears that the script ended but did not manage to run the srm-adv-del command. Due to the presence of these files, the dCache pool that I was writing to (dcache_12) filled up. Since this was the only writable pool, the files had nowhere else to go, halting any further transfers from the time the last file went in (about 0540, which matches the above network plot). In fact, the transfers are still running, but always failing (getting No valid credentials provided in the FTS logs). I have now cancelled the transfer. Inspection of other FTS logs during the transfer shows lots of SRM timeouts.

File:Dcache-load-less-than-1TB.png File:Srm-network-less-than-1TB.png

Notice also that the load on dcache (the pool node) wasn't excessively high during the overnight transfers, probably because they were running at such a low rate.


Resubmit the 1TB transfer (only the admin door open), with 2 concurrent files and 5 streams to start with. Seeing intermittent bursts of network activity and a few SRM timeouts. Strange.

File:Srm-intermittant-network-1TB.png

Seeing this FTS error sometimes:

ERROR the server sent an error response: 425 425 Cannot open port: java.lang.Exception: Pool request timed out : dcache_12

This could explain what I'm seeing - too much load on a single pool? I've increased the number of pools available (dcache_12 to dcache_19, all RAID) to see what effect this has on the rate.


08/02/06 Update

The FTS script completed, saying that only ~200 files out of 1000 were transferred. However, inspection of the Ganglia plots and the contents of PNFS shows that this is nonsense and files were being transferred all night. This was a good test of a sustained transfer rate over a long period of time (approx 14 hours). The ScotGrid network monitoring image below shows that we sustained a rate of ~175Mb/s, and FTS reports that 993 files were transferred.


File:Srm-network-1TB-2files.png File:Dcache-load-1TB-2files.png File:SCOTGRID-NETWORK-1tb-2FILES.png

This test also shows the effect of restricting dCache to a single write pool versus allowing multiple write pools. With only one, there were lots of SRM timeouts and the transfer rate was intermittent and low compared to the case when multiple (8) write pools were enabled. What is strange is that all of these pools are on the same host, yet the load on this host was higher when only one write pool was available than during the sustained multi-pool transfer. Possibly this could be solved by altering the number of mover queues allowed per pool. This is something that I could use to load balance the dCache, with a higher number of mover queues for the RAID disk pools relative to the NFS mounted ones.

Conclusion:

  • Have a look at altering the number of movers per pool and re-run the single pool transfer.
  • Re-run the same test with multiple pools after increasing the number of movers, but with > 2 files, to see if a higher rate is achievable.

New dCache Setup

18/04/06

Have now split the dCache into separate read and write pools. The RAID'ed disk pools are in the write group and the NFS pools are in the read group. Hopefully this should optimise performance, since the NFS write bottleneck will be removed. However, there are still issues with the flushing of the write pools to the read pools to be resolved; waiting on a response from the user-forum about this. Also enabled the second GridFTP door on dcache, so the load should be spread a bit more, allowing the transfer rate to increase. In addition, during these FTS tests I have used a single gridftp parallel stream, since the optimisation tests showed this to give the highest rate.
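
(A sketch of the split in PoolManager.conf terms; the group and link names here are illustrative, not the exact production values:)

 # RAID pools take new writes, NFS pools serve reads
 psu create pgroup write-pools
 psu addto pgroup write-pools dcache_12
 psu create pgroup read-pools
 psu addto pgroup read-pools dcache_22
 # preferences: writes land only on the RAID group
 psu set link write-link -readpref=10 -cachepref=10 -writepref=10
 psu set link read-link  -readpref=20 -cachepref=20 -writepref=0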

Transfer Bandwidth Report:
  250/250 transferred in 9811.36120009 seconds
  250000000000.0 bytes transferred.
Bandwidth: 203.845313531Mb/s


The image shows the network usage during the 250GB FTS test and also the high load experienced by the dCache pool (and door) node. The load on the SRM node (and second door) was low.

File:SHEF-ED-network.png File:SHEF-ED-dcache-load.png

With 5 parallel streams:

Transfer Bandwidth Report:
  250/250 transferred in 10004.0994718 seconds
  250000000000.0 bytes transferred.
Bandwidth: 199.918044161Mb/s

01/05/06

New month, new test. Using the new dCache configuration detailed above (two doors, read/write pools), and with 16GB of memory now available on the disk server, I ran another test from Glasgow to give some indication of performance. The result of a 100GB transfer with a single parallel stream and 7 concurrent files:

Transfer Bandwidth Report:
  100/100 transferred in 2896.62316608 seconds
  100000000000.0 bytes transferred.
Bandwidth: 276.183664264Mb/s

File:GLA-ED-100GB-test.pngFile:GLA-ED-100GB-test-load.png

The ScotGrid switch reported higher bandwidth than the script did, showing that we peaked at 338Mb/s. Clearly the changes have had a positive impact on the write performance of the production dCache ;-).

The equivalent test in the opposite direction gives:

Transfer Bandwidth Report:
  100/100 transferred in 2055.87620091 seconds
  100000000000.0 bytes transferred.
Bandwidth: 389.128489179Mb/s

So our dCache can pump data out at a sufficiently high rate, but writing remains a problem, although there has been an improvement since the last time it was measured. Now need to look at simultaneous reading and writing.

02/05/06

Simultaneous (short) read/write test. Not entirely realistic, since the transfer script only pulls one file from the SRM, so this file will get cached.

GLA->ED

 Transfer Bandwidth Report:
  98/100 transferred in 3622.77725482 seconds
  98000000000.0 bytes transferred.
Bandwidth: 216.408557539Mb/s

ED->GLA

Transfer Bandwidth Report:
  100/100 transferred in 2172.01412892 seconds
  100000000000.0 bytes transferred.
Bandwidth: 368.321729287Mb/s

File:GLA-ED-simul-transfer-network.gif

So the rates were lower than in the individual tests done yesterday, which is to be expected. There was a larger drop in the GLA->ED transfer rate than in ED->GLA, and also a couple of failed file transfers due to SRM timeouts. I think this behaviour can probably be traced back to the use of only a single disk server at Edinburgh, which causes bottlenecks. Remember that I have set up the dCache at Edinburgh to use read and write pools, with the read pools being the NFS mounted partitions from the SAN. This should not be an issue in this test (due to the caching), but when we use multiple read files we may observe a further drop in performance due to an NFS bottleneck.

FTS in srmCopy mode

17/05/06

The STAR-ED FTS channel is now using srmCopy instead of 3rd party GridFTP (urlcopy). The Ed dCache configuration is as before, with two GridFTP doors and read/write pools. Since srmCopy is now used, data will go straight to the disk pool and not be routed via a GridFTP door first. Since we only have a single disk server but two GridFTP doors, this mode of operation could potentially lower our writing rate.

GLA->ED

25 concurrent files, single stream. (A netstat on the disk server during the transfer reveals 50 connections opened up between the Ed and Gla disk servers: 25 are control channels, connected to port 2811, and 25 are data channels, connected to ports in the 50000-52000 range.)
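
(The connection count came from something like this on the disk server; column positions may differ between netstat versions:)

 # classify established TCP connections by remote port
 netstat -tn | awk '{split($5,a,":"); p=a[2]}
                    p==2811 {ctl++}
                    p>=50000 && p<=52000 {data++}
                    END {print ctl+0 " control, " data+0 " data"}'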

Transfer Bandwidth Report:
  50/50 transferred in 1253.38757515 seconds
  50000000000.0 bytes transferred.
Bandwidth: 319.135124626Mb/s

As it turns out, we see an even faster write rate into our dCache (previously 276Mb/s from Glasgow). As can be seen from the Ganglia plots, there was little network traffic between the dCache nodes (Out/blue line), since all of the data was being transferred directly to the disk server (dcache.epcc). The load on dcache was slightly higher than in the previous case where urlcopy was being used, but this is to be expected since it is doing all of the work now. The head node (srm.epcc) had essentially no load on it.

File:GLA-ED-50TB-FTS-srmCopy-network.pngFile:GLA-ED-50TB-FTS-srmCopy-load.png

Another point to note is that there were no error messages in the dCache SRM logs, unlike the case with the IC-HEP dCache yesterday. I will need to retest, but those errors were possibly the result of a problem we had with the Edinburgh dCache, which was being used as the source during that test.

Now repeat the test but with 7 concurrent files, just to get an idea of how it influences the rate.

Transfer Bandwidth Report:
  50/50 transferred in 1222.87713408 seconds
  50000000000.0 bytes transferred.
Bandwidth: 327.09745636Mb/s

Remarkably, it is even faster, and the load on our disk server isn't as large. Do another test:

Transfer Bandwidth Report:
  50/50 transferred in 1332.07108617 seconds
  50000000000.0 bytes transferred.
Bandwidth: 300.284274731Mb/s

OK, the rate dropped a little, but it is still better than in the urlcopy case.

Currently the number of parallel GridFTP streams is 1, since this is set by the parallelStreams parameter in the destination dCacheSetup file and cannot be overridden by passing the -p=N_s option to FTS; I think the parallelStreams option takes precedence in all cases.

RAL-PP -> ED

Run a test from another dCache source SRM, just to compare with the DPM case above.

Transfer Bandwidth Report:
  100/100 transferred in 4156.05011821 seconds
  100000000000.0 bytes transferred.
Bandwidth: 192.490460232Mb/s

Hmm, clearly not as fast. The maximum outbound rate from RAL-PP -> RAL has been recorded as 380Mb/s, so why do we not see that here? Networking issues?

18/05/06

RAL-PP -> GLA

Run an identical test into the Glasgow DPM, just for comparison.

Transfer Bandwidth Report:
  54/55 transferred in 2389.1915729 seconds
  54000000000.0 bytes transferred.
Bandwidth: 180.814299238Mb/s

So maybe RAL-PP do have some networking issues when sending data to other Tier-2s.

1TB test GLA -> ED

Transfer Bandwidth Report:
  1000/1000 transferred in 22697.0156629 seconds
  1e+12 bytes transferred.
Bandwidth: 352.469246125Mb/s

A very decent transfer rate. The (1-minute) load on the dcache pool node (4 CPUs) was about 20 for the duration of the transfer. This is slightly higher than the load during a urlcopy transfer, but not significantly so. The load on the head node (dual core) was ~1. Additionally, there is no inter-node traffic, since srmCopy transfers directly to the pool node when dCache is the destination SRM.

19/05/06

Changed the number of parallel GridFTP streams that dCache uses during a transfer by editing /opt/d-cache/config/dCacheSetup on the head node and restarting the dcache-core services. parallelStreams was increased from 1 to 10 and a 50GB transfer run:
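
(I.e., something like the following on the head node; the init-script invocation is an assumption, the file path is as above:)

 grep parallelStreams /opt/d-cache/config/dCacheSetup
 #   parallelStreams=10
 # restart the core services so the new value takes effect
 /opt/d-cache/bin/dcache-core stop && /opt/d-cache/bin/dcache-core start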

Transfer Bandwidth Report:
  50/50 transferred in 1957.17592192 seconds
  50000000000.0 bytes transferred. 
Bandwidth: 204.37610923Mb/s

Running netstat -tap on the pool node shows that there are now 11 open connections (1 control + 10 data) to the source disk server per file, but this has had a detrimental effect on the final transfer rate. The load on the machine is also higher than when only a single stream is used.

File:GLA-ED-50TB-FTS-srmCopy-network-Ns10.pngFile:GLA-ED-50TB-FTS-srmCopy-load-Ns10.png