RAL Tier1 CASTOR CMS Testing

From GridPP Wiki
Revision as of 09:02, 13 July 2007 by James jackson

Details of load testing on the CMS CASTOR instance.

Test Details

Long 200MB/s Write to cmsWanIn

Date: 11-07-2007
Time: 23:15 -
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 40 * 5MB/s write
Castor parameters: Same as below
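Each test's target aggregate rate is the number of concurrent rfio write jobs multiplied by the per-job rate on its Jobs line. A minimal Python sketch, assuming the `N * RMB/s` notation used on this page and that every job writes at the same fixed rate; the helper name `aggregate_rate_mb_s` is hypothetical:

```python
import re

# Matches the "N * RMB/s" notation used in the Jobs lines of this page.
JOBS_PATTERN = re.compile(r"(\d+)\s*\*\s*(\d+(?:\.\d+)?)\s*MB/s")


def aggregate_rate_mb_s(jobs_line: str) -> float:
    """Return the target aggregate write rate in MB/s for a Jobs line.

    Hypothetical helper: assumes every job writes at the same fixed rate,
    so the aggregate is simply (number of jobs) x (per-job rate).
    """
    match = JOBS_PATTERN.search(jobs_line)
    if match is None:
        raise ValueError(f"unrecognised Jobs line: {jobs_line!r}")
    n_jobs, per_job = int(match.group(1)), float(match.group(2))
    return n_jobs * per_job


if __name__ == "__main__":
    for line in ["40 * 5MB/s write", "34 * 6MB/s write", "32 * 5MB/s write",
                 "32 * 4MB/s write", "64 * 4MB/s write"]:
        print(f"{line:<18} -> {aggregate_rate_mb_s(line):.0f} MB/s")
```

The same arithmetic gives the rates in every test heading on this page (200, 204, 160, 128 and 256MB/s).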

Observations:

13-07-2007: Write finished after ~24 hours; tape migration running at full bandwidth:

http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/finishwrite-all.png

12-07-2007, 13:50: Back to initial steady state:

http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/all-2.png

12-07-2007, 12:40: Write rate cut significantly; migration at ~200MB/s:

http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/WriteHit.png

Write rate just below 200MB/s; migration at ~100MB/s. As the per-server plots below show, each disk server appears able to stream in bursts at the write rate:

http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss93.png
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss94.png
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss95.png
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss96.png
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss97.png
http://www.onlineclienttest.co.uk/CastorPlots/LongTest1/gdss98.png
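As a rough ceiling on the volume written in this run, a ~200MB/s write held for the full ~24 hours gives about 17TB. The rate was not actually constant (it was cut significantly at 12:40 on 12-07-2007), so this is an upper-bound estimate under stated assumptions, not a measured total; decimal units (1TB = 10^6 MB) are assumed:

```python
SECONDS_PER_HOUR = 3600


def volume_tb(rate_mb_s: float, hours: float) -> float:
    """Data volume in decimal TB (1 TB = 1e6 MB) for a constant write rate."""
    return rate_mb_s * hours * SECONDS_PER_HOUR / 1e6


if __name__ == "__main__":
    # Assumption: ~200 MB/s sustained for the whole ~24 h (an upper bound).
    print(f"~{volume_tb(200, 24):.2f} TB")
```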


Configuration change:
Date: 11-07-2007

Nick W. retuned the six current Viglin servers in cmsWanIn to match the earlier CSA06 Areca tuning.

Current Viglin vs. CSA06 Areca parameters

Parameter          Viglin         Areca
BlockDev           256            512
Num. data disks    13 (RAID 5)    22 (RAID 6)
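The disk counts in the table are data disks, so the physical drive count per array also includes parity: one parity disk for RAID 5, two for RAID 6. A small sketch, assuming the table counts are per array and ignoring any hot spares (neither is stated on this page):

```python
# Parity disks per array for the RAID levels in the table above.
PARITY_DISKS = {5: 1, 6: 2}


def array_layout(data_disks: int, raid_level: int) -> tuple[int, float]:
    """Return (total drives, parity overhead as a fraction of all drives)."""
    parity = PARITY_DISKS[raid_level]
    total = data_disks + parity
    return total, parity / total


if __name__ == "__main__":
    for name, data, level in [("Viglin", 13, 5), ("Areca", 22, 6)]:
        total, overhead = array_layout(data, level)
        print(f"{name}: {total} drives in RAID {level}, "
              f"{overhead:.1%} parity overhead")
```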

204MB/s Write to cmsWanIn

Date: 10-07-2007
Time: 23:35 - 00:05
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 34 * 6MB/s write
Castor parameters: Same as below (no tuning)
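The Time lines are wall-clock windows, and some cross midnight (23:35 - 00:05 above). A minimal sketch for turning them into durations, assuming each window is shorter than 24 hours and that an end time earlier than the start falls on the following day; the open-ended window of the long test (23:15 -) has no end time and is left out:

```python
from datetime import datetime, timedelta


def window_minutes(window: str) -> int:
    """Duration in minutes of an "HH:MM - HH:MM" window.

    Assumes the window is under 24 h; an end before the start is taken
    to fall on the following day (e.g. 23:35 - 00:05).
    """
    start_s, end_s = (part.strip() for part in window.split("-"))
    start = datetime.strptime(start_s, "%H:%M")
    end = datetime.strptime(end_s, "%H:%M")
    if end < start:
        end += timedelta(days=1)  # window crosses midnight
    return int((end - start).total_seconds() // 60)


if __name__ == "__main__":
    for window in ["23:35 - 00:05", "22:25 - 23:30", "20:30 - 21:30",
                   "19:20 - 20:20", "16:30 - 18:30", "14:00 - 16:00"]:
        print(f"{window}: {window_minutes(window)} min")
```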

Observations:
Similar behaviour to the 40 * 5MB/s (200MB/s) test run earlier the same evening.

200MB/s Write to cmsWanIn

Date: 10-07-2007
Time: 22:25 - 23:30
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 40 * 5MB/s write
Castor parameters: Same as below (no tuning)

Observations:
Wrote at just under 200MB/s. Tape migration at ~75-100MB/s during the write; after the write finished, migration at ~150-200MB/s.

http://www.onlineclienttest.co.uk/CastorPlots/120.png

160MB/s Write to cmsWanIn

Date: 10-07-2007
Time: 20:30 - 21:30
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 32 * 5MB/s write
Castor parameters: Believed same as below (no tuning yet?)

Observations:
Wrote at just under 160MB/s. Tape migration at 100MB/s, followed by catch-up at ~200MB/s

128MB/s Write to cmsWanIn

Date: 10-07-2007
Time: 19:20 - 20:20
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 32 * 4MB/s write
Castor parameters: Believed same as below (no tuning yet?)

Observations:
Run to confirm that the original 128MB/s test (09-07-2007) is reproducible before moving to higher rates to find the failure point. Write and tape migration kept up, with a small tape catch-up at the end of the write (peak ~200MB/s).

http://www.onlineclienttest.co.uk/CastorPlots/128.png
http://www.onlineclienttest.co.uk/CastorPlots/128-2.png

256MB/s Write to cmsWanIn

Date: 09-07-2007
Time: 16:30 - 18:30
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 64 * 4MB/s write
Castor parameters: Same as below

Observations:
Write observed constant at 256MB/s
Tape migration at ~60MB/s during rfio write, then at ~200MB/s after write finished
Total running LSF jobs (occupied slots) stayed at about 32 => 5-6 jobs per server
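The gap between the write and migration rates sets how much data queues on the disk servers awaiting tape. A back-of-envelope sketch for this test, assuming the write held 256MB/s and migration ~60MB/s for the whole 16:30 - 18:30 window, and that post-write migration stayed at ~200MB/s until the queue emptied. These are estimates derived from the figures above, not measurements:

```python
def backlog_mb(write_mb_s: float, migrate_mb_s: float, seconds: float) -> float:
    """Data (MB) left on disk awaiting migration after a constant-rate write."""
    return max(write_mb_s - migrate_mb_s, 0.0) * seconds


def drain_seconds(backlog: float, migrate_mb_s: float) -> float:
    """Time (s) to migrate a backlog once writing has stopped."""
    return backlog / migrate_mb_s


if __name__ == "__main__":
    # Assumed figures for the 09-07-2007 256MB/s test.
    queued = backlog_mb(256, 60, 2 * 3600)  # 2 h write window
    catch_up = drain_seconds(queued, 200)   # ~200 MB/s after the write
    print(f"backlog ~{queued / 1e6:.2f} TB, catch-up ~{catch_up / 3600:.1f} h")
```

With the 128MB/s test's figures (write and migration both at 128MB/s) the same model gives no backlog, consistent with migration keeping up in that run.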

128MB/s Write to cmsWanIn

Date: 09-07-2007
Time: 14:00 - 16:00
Type: Rfio write from T1 batch farm to cmsWanIn
Jobs: 32 * 4MB/s write
Castor parameters: cmsWanIn comprises disk servers gdss93-98 with default network tuning; 4 tape drives; 20 LSF slots/server; default policies; CASTOR version 2.1.3-15.

Observations:
Write observed constant at 128MB/s
Tape migration kept up at 128MB/s
Total running LSF jobs (occupied slots) stayed at about 64 => 10-11 jobs per server
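The per-server figures in these tests divide the total running LSF jobs across the six disk servers gdss93-98, and can be set against the configured 20 LSF slots per server (120 in total). A minimal sketch, assuming jobs are spread as evenly as possible across the servers:

```python
import math

SERVERS = 6           # gdss93-98 in cmsWanIn
SLOTS_PER_SERVER = 20


def jobs_per_server(running_jobs: int, servers: int = SERVERS) -> tuple[int, int]:
    """Range (floor, ceil) of jobs per server if spread as evenly as possible."""
    return running_jobs // servers, math.ceil(running_jobs / servers)


def slot_utilisation(running_jobs: int) -> float:
    """Fraction of the pool's total LSF slots in use."""
    return running_jobs / (SERVERS * SLOTS_PER_SERVER)


if __name__ == "__main__":
    for running in (32, 64):
        low, high = jobs_per_server(running)
        print(f"{running} jobs: {low}-{high} per server, "
              f"{slot_utilisation(running):.0%} of slots")
```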