RAL Tier1 CASTOR CMS Testing
Latest revision as of 09:02, 13 July 2007
Details of load testing on the CMS CASTOR instance.
Test Details
Long 200MB/s Write to cmsWanIn
Date: 11-07-2007
Time: 23:15 -
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 40 * 5MB/s
Castor parameters: Same as below
Observations:
13-07-2007: Write finished after ~24 hours; tape migration running at full bandwidth:
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/finishwrite-all.png
12-07-2007, 13:50: Back to initial steady state:
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/all-2.png
12-07-2007, 12:40: Write rate significantly reduced; migration at ~200MB/s:
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/WriteHit.png
Write rate just below 200MB/s; migration at ~100MB/s. Each disk server appears able to stream in bursts at the write rate, as the per-server plots below show:
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss93.png
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss94.png
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss95.png
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss96.png
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss97.png
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss98.png
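As a rough check on scale, the volume written in the long test can be estimated as rate × duration. The minimal Python sketch below assumes a constant ~200MB/s for the full ~24 hours and decimal units (1 TB = 10^6 MB). Because the write rate was just below 200MB/s and was cut for part of 12-07-2007, the result should be read as an upper bound. The helper name is illustrative only.

```python
def written_volume_tb(rate_mb_per_s: float, hours: float) -> float:
    """Volume written at a constant rate, in decimal TB (1 TB = 1e6 MB)."""
    seconds = hours * 3600
    return rate_mb_per_s * seconds / 1e6

# Upper bound for the long test: assumes ~200MB/s sustained for ~24 hours.
print(f"{written_volume_tb(200, 24):.2f} TB")
```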
Configuration change:
Date: 11-07-2007
Nick W. changed the six Viglin servers currently in cmsWanIn to match the earlier Areca tunings.
Parameter | Viglin | Areca
BlockDev | 256 | 512
Num. data disks | 13 (RAID 5) | 22 (RAID 6)
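The table does not give units for BlockDev. If it refers to the block-device read-ahead set with `blockdev --setra` (an assumption; the page does not say so), the values are counts of 512-byte sectors. The hypothetical helper below converts them to KiB under that assumption:

```python
# Assumption: BlockDev is the read-ahead value, expressed in 512-byte sectors.
SECTOR_BYTES = 512

def readahead_kib(sectors: int) -> float:
    """Convert a read-ahead value in 512-byte sectors to KiB."""
    return sectors * SECTOR_BYTES / 1024

for name, sectors in {"Viglin": 256, "Areca": 512}.items():
    print(f"{name}: {sectors} sectors = {readahead_kib(sectors):.0f} KiB")
```

If the assumption holds, the change doubles the read-ahead on the Viglin servers.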
204MB/s Write to cmsWanIn
Date: 10-07-2007
Time: 23:35 - 00:05
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 34 * 6MB/s write
Castor parameters: Same as below (no tuning)
Observations:
Similar behaviour to the 40 * 5MB/s test.
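The headline rate of each test is the number of jobs times the per-job rate. The short Python sketch below recomputes it from the Jobs lines on this page. It shows that the 34 * 6MB/s and 40 * 5MB/s tests differ by only 4MB/s in aggregate, even though they use different job counts. The dictionary labels and function name are illustrative.

```python
# (jobs, MB/s per job) as listed on the "Jobs:" line of each test on this page
tests = {
    "09-07 128MB/s": (32, 4),
    "09-07 256MB/s": (64, 4),
    "10-07 128MB/s": (32, 4),
    "10-07 160MB/s": (32, 5),
    "10-07 200MB/s": (40, 5),
    "10-07 204MB/s": (34, 6),
    "11-07 long 200MB/s": (40, 5),
}

def aggregate_rate(jobs: int, per_job_mb_s: float) -> float:
    """Total requested write rate across all jobs, in MB/s."""
    return jobs * per_job_mb_s

for name, (jobs, rate) in tests.items():
    print(f"{name}: {jobs} * {rate}MB/s = {aggregate_rate(jobs, rate):g}MB/s")
```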
200MB/s Write to cmsWanIn
Date: 10-07-2007
Time: 22:25 - 23:30
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 40 * 5MB/s write
Castor parameters: Same as below (no tuning)
Observations:
Wrote at just under 200MB/s. Tape migration at ~75 - 100MB/s during the write. After the write finished, tape migration rose to ~150 - 200MB/s.
http://www.onlineclienttest.co.uk/CastorPlots/120.png
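The pattern in this test fits a simple backlog model. Migration is slower than the write while writing, then faster once the write stops. Data waiting for tape grows at (write rate minus migration rate) during the write, then drains at the post-write migration rate. The rough Python sketch below assumes constant rates over the 65-minute window (22:25 - 23:30) and treats the quoted ranges as best and worst cases. Because the write actually ran just under 200MB/s, the sketch slightly overestimates the backlog. The helper names are hypothetical.

```python
def backlog_mb(write_mb_s: float, migrate_mb_s: float, duration_s: float) -> float:
    """Data left waiting for tape at the end of the write, in MB (assumes constant rates)."""
    return max(write_mb_s - migrate_mb_s, 0.0) * duration_s

def drain_minutes(backlog: float, catchup_mb_s: float) -> float:
    """Minutes needed to migrate the backlog once writing has stopped."""
    return backlog / catchup_mb_s / 60

duration = 65 * 60  # 22:25 - 23:30, in seconds
# Best case: migration at ~100MB/s during the write, catch-up at ~200MB/s.
best = drain_minutes(backlog_mb(200, 100, duration), 200)
# Worst case: migration at ~75MB/s during the write, catch-up at ~150MB/s.
worst = drain_minutes(backlog_mb(200, 75, duration), 150)
print(f"estimated catch-up: {best:.0f} to {worst:.0f} minutes")
```

The same two functions can be applied to the 160MB/s and 256MB/s tests by substituting their rates and durations.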
160MB/s Write to cmsWanIn
Date: 10-07-2007
Time: 20:30 - 21:30
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 32 * 5MB/s write
Castor parameters: Believed same as below (no tuning yet?)
Observations:
Wrote at just under 160MB/s. Tape migration ran at 100MB/s during the write, followed by a catch-up at ~200MB/s.
128MB/s Write to cmsWanIn
Date: 10-07-2007
Time: 19:20 - 20:20
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 32 * 4MB/s write
Castor parameters: Believed same as below (no tuning yet?)
Observations:
Run to confirm that the original 128MB/s test is reproducible before re-running at higher rates to find the failure point. Tape migration kept up with the write, with a small catch-up at the end of the write (peak ~200MB/s).
http://www.onlineclienttest.co.uk/CastorPlots/128.png http://www.onlineclienttest.co.uk/CastorPlots/128-2.png
256MB/s Write to cmsWanIn
Date: 09-07-2007
Time: 16:30 - 18:30
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 64 * 4MB/s write
Castor parameters: Same as below
Observations:
Write rate observed constant at 256MB/s
Tape migration at ~60MB/s during the rfio write, then at ~200MB/s after the write finished
Total running jobs (LSF slots in use) stayed at about 32 => 5-6 jobs per server
128MB/s Write to cmsWanIn
Date: 09-07-2007
Time: 14:00 - 16:00
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 32 * 4MB/s write
Castor parameters: cmsWanIn includes gdss93-98 with default network tuning, 4 tape drives, 20 LSF slots/server, default policies, CASTOR version 2.1.3-15.
Observations:
Write rate observed constant at 128MB/s
Tape migration kept up at 128MB/s
Total running jobs (LSF slots in use) stayed at about 64 => 10-11 jobs per server
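The jobs-per-server figures in this test and in the 256MB/s test come from dividing the running-job count across the six disk servers gdss93-98. With 20 LSF slots per server, the pool has 6 * 20 = 120 slots in total. The small Python sketch below shows that arithmetic, assuming jobs are spread evenly across servers. The function names are illustrative.

```python
import math

SERVERS = 6            # gdss93-98
SLOTS_PER_SERVER = 20  # from the Castor parameters line

def per_server_range(running_jobs: int, servers: int = SERVERS) -> tuple[int, int]:
    """Lowest and highest per-server job counts, assuming an even spread."""
    return running_jobs // servers, math.ceil(running_jobs / servers)

def pool_utilisation(running_jobs: int) -> float:
    """Fraction of all LSF slots in the pool that are occupied."""
    return running_jobs / (SERVERS * SLOTS_PER_SERVER)

for running in (32, 64):
    low, high = per_server_range(running)
    print(f"{running} running jobs: {low}-{high} per server, "
          f"{pool_utilisation(running):.0%} of {SERVERS * SLOTS_PER_SERVER} slots")
```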