Glasgow/Edinburgh dCache Performance Tests

From GridPP Wiki

User:Greig cowan tested different kernels and filesystems on a RAID 5 box using a Dell MegaRAID controller. In the tables below, each cell gives the transfer rate in Mb/s followed by the number of failed file transfers.

SL3.0.5 with vanilla 2.4 kernel

Vanilla Kernel 2.4 ext2
                    Parallel Streams
                    3         5         10
Files    3          165, 0    166, 1    176, 0
         5          188, 0    182, 0    184, 0
         10         184, 0    185, 0    177, 0

Vanilla Kernel 2.4 ext3
                    Parallel Streams
                    3         5         10
Files    3          186, 0    155, 0    145, 0
         5          159, 0    163, 0    160, 0
         10         163, 0    140, 0    165, 0

Vanilla Kernel 2.4 jfs
                    Parallel Streams
                    3         5         10
Files    3          206, 0    203, 0    193, 0
         5          192, 0    199, 0    199, 0
         10         186, 0    187, 0    186, 0

File:Jfs-test-10-files.gif

SL3.0.5 with CERN 2.4 XFS kernel

CERN Kernel 2.4 ext2
                    Parallel Streams
                    3         5         10
Files    3          210, 0    199, 0    194, 0
         5          202, 0    206, 0    198, 0
         10         192, 0    182, 0    203, 0

CERN Kernel 2.4 ext3
                    Parallel Streams
                    3         5         10
Files    3          157, 0    152, 0    155, 0
         5          183, 0    172, 0    167, 0
         10         168, 0    159, 0    180, 2

CERN Kernel 2.4 xfs
                    Parallel Streams
                    3         5         10
Files    3          214, 0    222, 0    209, 0
         5          225, 0    222, 0    222, 0
         10         225, 0    217, 0    214, 0

File:Xfs-test-3files.gif
File:Xfs-test-3files-load.gif

These tests have been running with performanceMarker = 10, which seems to have drastically reduced the number of SRM timeouts that FTS reports.
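For reference, the setting as it might appear in the configuration; the text above does not name the file containing it, so the dCacheSetup location is an assumption:

```
#  assumed to live in the dCacheSetup file; parameter name taken from the text above
performanceMarker=10
```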

Kernel Tunings

I decided to apply the SC3 kernel tunings to see what effect they would have on the transfer rate.
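The tunings themselves are not recorded on this page. The SC3-era tweaks were TCP buffer enlargements of the general form below; the specific values are illustrative of that period, not copied from this machine:

```
# /etc/sysctl.conf -- illustrative SC3-style TCP buffer tuning
# (values are typical of the time, not the exact ones used in these tests)
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 65536 4194304
```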

CERN Kernel 2.4 ext2
                    Parallel Streams
                    3         5         10
Files    3          221, 0    218, 0    216, 0
         5          213, 0    233, 0    221, 0
         10         222, 0    222, 1    220, 0

CERN Kernel 2.4 ext3
                    Parallel Streams
                    3         5         10
Files    3          178, 0    169, 0    191, 0
         5          212, 0    208, 0    184, 0
         10         171, 0    161, 0    197, 0

CERN Kernel 2.4 xfs
                    Parallel Streams
                    3         5         10
Files    3          229, 0    217, 0    216, 1
         5          249, 0    248, 0    248, 0
         10         242, 0    249, 0    245, 0

Notice that the transfer rate is higher after application of the SC3 tweaks, but the load on the machine is not any larger than before.

SL3.0.6 with CERN 2.6 kernel

CERN 2.6 ext2
                    Parallel Streams
                    3         5         10
Files    3          188, 0    197, 0    191, 0
         5          158, 0    136, 0    117, 0
         10         0, 30     0, 30     0, 30

CERN 2.6 ext3
                    Parallel Streams
                    3         5         10
Files    3          171, 0    191, 0    186, 0
         5          186, 0    187, 0    181, 0
         10         213, 0    213, 0    210, 0

CERN 2.6 xfs
                    Parallel Streams
                    3         5         10
Files    3          243, 0    244, 0    226, 0
         5          237, 0    235, 0    227, 0
         10         224, 0    222, 0    221, 0

CERN 2.6 jfs
                    Parallel Streams
                    3         5         10
Files    3          203, 0    208, 0    197, 0
         5          205, 0    196, 0    187, 0
         10         201, 0    194, 0    173, 0

File:Ext2-test-2.6-load.gif
File:Ext2-test-2.6-network.gif

With SC3 kernel tuning parameters

CERN 2.6 ext2
                    Parallel Streams
                    3         5         10
Files    3          207, 0    143, 0    190, 0
         5          66, 19    0, 30     0, 30
         10         0, 30     0, 30     0, 30

CERN 2.6 ext3
                    Parallel Streams
                    3         5         10
Files    3          249, 0    259, 0    233, 0
         5          253, 0    178, 25   92, 14
         10         201, 20   200, 11   ?, 0

CERN 2.6 xfs
                    Parallel Streams
                    3         5         10
Files    3          175, 0    256, 0    276, 0
         5          276, 0    268, 0    284, 0
         10         283, 0    272, 0    280, 0

CERN 2.6 jfs
                    Parallel Streams
                    3         5         10
Files    3          264, 0    254, 0    245, 0
         5          254, 0    212, 0    179, 0
         10         126, 20   0, 30     0, 30

File:Ex3-test-2.6-load-problems.gif
File:Ex3-test-2.6-network-problems.gif

Altering dCache configuration

Changing the value of the parallelStreams parameter in the dCacheSetup file. All tests use the CERN 2.6 kernel with no kernel tweaks applied, on the jfs filesystem.

parallelStreams = 1
                    Parallel Streams
                    1         3         5         10
Files    1          170, 0    133, 0    132, 0    130, 0
         3          226, 0    193, 0    190, 0    203, 0
         5          236, 0    200, 0    201, 0    193, 0
         10         250, 0    209, 0    194, 0    180, 0

parallelStreams = 3
                    Parallel Streams
                    1         3         5         10
Files    1          167, 0    132, 0    132, 0    131, 0
         3          220, 0    197, 0    197, 0    214, 0
         5          229, 0    196, 0    155, 0    177, 0
         10         250, 0    198, 0    191, 0    159, 0

--- Aside

It turns out that changing the dCacheSetup parameter parallelStreams has no effect on the transfer rates observed with the current version of FTS (1.4). Even though the dCacheSetup file does have the entry:

#  ---- Number of parallel streams per GridFTP transfer
parallelStreams=1

the parameter parallelStreams only appears in the /opt/d-cache/config/srm.batch file. It only comes into play when you use srmcp, where it decides how many parallel streams are used. As a further aside, setting stream_num in the .srmconfig/config.xml file on the srmcp client has no influence on the number of streams that the resulting transfer uses; this appears to be controlled exclusively by the parallelStreams parameter on the server.
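As a sketch of where the server-side setting lives: the "set context" form below is the usual dCache batch-file syntax, assumed here rather than copied from this installation:

```
#  /opt/d-cache/config/srm.batch -- only consulted for srmcp transfers
#  (batch-file syntax assumed; -c sets the value only if not already defined)
set context -c parallelStreams 1
```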

---

Transfers with only 1 concurrent file show quite a peaky network load; this smooths out with a larger number of files. This makes sense given the SRM negotiation overhead that is required: with multiple files, data can be transferring for one file while another file is going through the negotiation step.

Changed the /opt/d-cache/config/gridftpdoor.batch parameter

-maxStreamsPerClient=10 \

from 10 to 11, since this limits the number of streams that a transfer can use for each file. A 30GB transfer with 3 concurrent files and 11 streams gave 209Mb/s, which is in the same region as all the multi-stream transfers above. From the above tables, it is clear that no higher transfer rates are observed in our dCache from using multiple streams. The highest rates have all been obtained using 1 stream and multiple concurrent files. One explanation is that the RAID 5 configuration of the disk favours only 1 stream being used; otherwise the disk I/O load is high when trying to write lots of streams from one file that arrive simultaneously (this is related to the problem that Imperial have been experiencing recently).
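The edit described above, as it would appear in the batch file (the option line is reproduced from the snippet earlier, with the value raised):

```
#  /opt/d-cache/config/gridftpdoor.batch
   -maxStreamsPerClient=11 \
```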

Then tried increasing the number of concurrent files to 15 and running with just one stream. Rate: 142Mb/s, 15/30 transfers successful.

This led to a high rate, but also a very high load on the dCache node. This must be caused at the end of the transfers when the SRM PutDone is being completed. Since all 15 transfers end simultaneously, there is a lot for the CPU to do, so the load increases while the transfer rate drops, since no new transfers can start until the first set has completed. You can see this clearly in the network monitoring and load plots below.

File:Jfs-test-2.6-kernel-15-files-1-stream-load.gif
File:Jfs-test-2.6-kernel-15-files-1-stream-network.gif
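The effect of starting everything at once can be sketched with a toy model (all numbers hypothetical): if every file takes roughly the same time to transfer, simultaneous starts mean simultaneous PutDones, whereas ramping the starts spreads them out:

```python
from collections import Counter

def peak_simultaneous_putdone(start_times, transfer_secs):
    """Toy model: each transfer takes transfer_secs, and its SRM PutDone
    fires the moment it finishes; return the largest number of PutDones
    landing at the same instant."""
    finish_times = [s + transfer_secs for s in start_times]
    return max(Counter(finish_times).values())

# 15 files all started at t=0: every PutDone fires at once.
burst = peak_simultaneous_putdone([0] * 15, 600)
# Same 15 files ramped up 3 at a time, 60 s apart.
ramped = peak_simultaneous_putdone([60 * (i // 3) for i in range(15)], 600)
```

Under this model `burst` is 15 while `ramped` is only 3, which matches the observation that the load spike (and the stall before new transfers start) disappears once the start times are staggered.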

It is better to start off the transfers with only a couple of concurrent files and then ramp up to the final level slowly. This staggers the final SRM PutDone completions, leading to a smoother file transfer profile. Rate: 249Mb/s, 30/30 successful. This can be seen here:

File:Jfs-test-2.6-kernel-15-files-1-stream-load-stagger.gif
File:Jfs-test-2.6-kernel-15-files-1-stream-network-stagger.gif

Then tried increasing the number of concurrent files further to 30, again running with just one stream and staggering the number of files from 1 up to 30. Rate: 274Mb/s. But you need to be careful about how you increase the number of concurrent files: if you increase too quickly, you get into the same situation as the first '15 concurrent file' transfer above.

With the SC3 kernel tuning parameters applied, the rates increased further, as expected, up to 307Mb/s for the 1-stream, 5-file case. Instability was again observed when moving to 10 files, but I believe this is again caused by FTS not staggering the start times of the file transfers, leading to a very high load on the dCache and consequently to the jobs failing.

Additional single stream testing of dCache with different pool filesystems

CERN 2.6 ext3
                    Parallel Streams
                    1
Files    1          183, 0
         3          260, 0
         5          242, 0
         10         151, 10

CERN 2.6 ext3 with tunings
                    Parallel Streams
                    1
Files    1          171, 0
         3          277, 0
         5          320, 0
         10         0, 30

CERN 2.6 ext2
                    Parallel Streams
                    1
Files    1          187, 0
         3          172, 0
         5          158, 0
         10         78, 30

Notice in the ext2 case above that the 10 concurrent file transfer succeeded with 1 stream, unlike the cases where multiple streams were used. The rate was still low, though, presumably due to the high load on the machine generated by it having to deal with a large number of simultaneous SRM PutDone requests. It still appears that using 1 parallel stream gives the best performance when writing into a dCache.

File:Ext2-test-2.6-kernel-10-files-1-stream-load.gif
File:Ext2-test-2.6-kernel-10-files-1-stream-network.gif