RAL Tier1 SC3 Log

This page refers to events during WLCG Service Challenge 3 from the RAL perspective, which took place in January 2006; as such, it is largely of historical interest.


Log of RAL's participation in Service Challenge 3. This log was previously maintained elsewhere. The main service at RAL involved in SC3 was the RAL dCache service.


24/01/2006

[Plot (File:CERN-RAL-24012006.jpg): CERN to RAL, 24th January 2006]
  • 10:10 - Uploaded plot showing 12 hours receiving data at >= 150MB/s
  • 09:30 - csfnfs60 has stopped receiving data, nothing in log file, dcache-pool restarted
  • 00:20 - csfnfs61 has stopped receiving data, nothing in log file, dcache-pool restarted

23/01/2006

[Plot (File:CERN-RAL-20060123.jpg): CERN to RAL, 23rd January 2006]
  • 15:56 - Uploaded plot of our record hour so far.
  • 14:06 - forced clear out of duplicate CMS files on csfnfs39_1 and csfnfs39_2 pools to increase free space
  • 13:44 - did manual clear out of sc3 dir, plus misc files in dteam directory; set clearout cron to hourly (a cron sketch follows this list)
  • 13:25 - restarted dcache-core on gftp0446 & gftp0447 - transfers stuck not doing much
  • 13:17 - no traffic going through gftp0445 - restarted dcache-core
  • 11:50 - csfnfs62: "Too many open files" - restarted dcache-pool; the init.d script had been updated, but only after the restart on the 13th
  • 11:45 - gftp0444 showed 43 gftp transfers active, but very little traffic, restarted dcache-core
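
The hourly clear-out referred to above was driven by cron. As a minimal sketch only - the actual script name and location are not recorded in this log and are assumed here - an hourly entry might look like:

 # /etc/cron.d/sc3-clearout (hypothetical file and script path)
 # Run the SC3 clear-out script at the top of every hour as root
 0 * * * *  root  /opt/dcache-admin/sc3-clearout.sh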

21/01/2006

  • 20:44 - csfnfs42 and 54 show "Too many open files", updated startup scripts, restarted dcache-pool

20/01/2006

  • 23:25 - noticed that CERN-RAL channel is now at 40 transfers
  • 10:06 - csfnfs51 giving too many open files errors, added ulimit, restarted

19/01/2006

  • 16:25 - confirmed as FTS problems at CERN
  • 16:15 - transfers stopped at 15:40, srm seems fine, but FTS has no active transfers for CERN-RAL channel, mailed CERN
  • 09:55 - increased frequency of cleaning script to 3 hours

18/01/2006

  • 15:55 - transfers had stopped to csfnfs51 - all pools filled - did manual run of deleter cron
  • 13:17 - gmond on pnfs.gridpp hung, restarted
  • 13:02 - Increased number of FTS files from 20 to 30 (see FTS Interaction below)
  • 11:10 - gftp0447 had hung about 10:00, rebooted
  • 10:53 - copied updated sysctl.conf onto gftp0445 to get picked up at next reboot - rate to gftp0444 not any worse and it hasn't crashed yet
  • 10:50 - "pool restarted" messages in PoolManager from csfnfs63's pools have stopped; it looks like two dcache-pool instances were running on csfnfs63
  • 09:50 - more transfers queued on csfnfs63_4 pool, restarting dcache-pool on csfnfs63
  • 09:25 - gftp0445,446 locked up around 05:00

17/01/2006

  • 23:23 - All is well.
  • 17:16 - 57 transfers queued on csfnfs63_4 pool, increased max movers from 50 to 100 (see the admin-shell sketch after this list)
  • 09:30 - All gftp systems locked up at 05:00, have now been rebooted; gftp0444 has picked up the new sysctl settings
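
Changing a pool's mover limit is done from the dCache admin shell. A hedged sketch, assuming the standard admin door on port 22223 and a made-up admin host name; the pool name is the one from the log entry above:

 # Connect to the dCache admin interface (host name is an assumption)
 $ ssh -c blowfish -p 22223 admin@dcache-admin.gridpp.rl.ac.uk
 (local) admin > cd csfnfs63_4
 (csfnfs63_4) admin > mover set max active 100
 (csfnfs63_4) admin > save
 (csfnfs63_4) admin > ..
 (local) admin > logoff

The save command persists the new limit to the pool's setup file so it survives a dcache-pool restart.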

16/01/2006

  • 19:30 - No data has been flowing through gftp0445 for the last couple of hours but I don't know why.
  • 09:29 - csfnfs39 - Too many open files - dcache-pool restarted
  • 01:07 - Transfer balance has improved - restarted gftp0447's dcache-core

15/01/2006

  • 23:10 - Restarted dcache-core on gftp0444 to see if it sorts out transfer imbalance
[Plot (File:CERN-RAL-FTP-SERVER-20050115.jpg): CERN to RAL, 15th Jan 2006]
  • 15:48 - Uploaded picture showing that two dead GridFTP servers do not make a large difference.
  • 13:43 - Noticed gftp0445 & gftp0446 had hung - rebooted remotely
  • 13:40 - csfnfs53 showing "Too many open files", copied startup script from nfs60, restarted dcache-pool
  • 10:38 - csfnfs60 showing "Too many open files" messages, dcache-pool restarted

14/01/2006

  • 18:50 - CASTOR running and data arriving at RAL again.
  • 03:45 - Transfers have completely stopped - CERN CASTOR problem?
  • 03:45 - Maarten Litmaath reported failures to two pools on csfnfs63, on investigation noticed frequent PoolRestarted messages for csfnfs63's pools in PoolManager pinboard, decided to restart dcache-pool service on csfnfs63.

13/01/2006

  • 10:56 - Added ulimit -n 16384 to csfnfs63:/etc/init.d/dcache-pool to match the other disk servers; this will take effect at the next restart. I had missed this one out. (A sketch of the change follows this list.)
  • 09:50 - csfnfs62's pools were showing "Too many open files" errors - restarted dcache-pool service
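
The "Too many open files" fix referred to throughout this log is a raised file-descriptor limit in the pool start-up script. A minimal sketch of the change, assuming a typical init script layout (the surrounding lines will differ on the real servers):

 # /etc/init.d/dcache-pool (excerpt)
 start() {
     # Raise the per-process file-descriptor limit before the pool JVM starts,
     # so the pools stop hitting "Too many open files"
     ulimit -n 16384
     # ... existing dcache-pool start-up commands unchanged ...
 }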

12/01/2006

[Plot (File:CERN-RAL-20050112.jpg): CERN to RAL, 12 January 2006]

Start of the Service Challenge CERN-disk to RAL-disk transfers.

  • A good rate overnight, up to 1.3 Gbit/s.
  • 17:50 - Added all dCache pools to dteam pool group on UKLight visible servers, except for 4 tape buffer pools.
  • 17:00 - Updated gftp0444's /etc/sysctl.conf with new values, which will take effect on the next reboot (an illustrative example follows this list).
  • 15:00 - Two of the GridFTP doors crashed, shortly after Martin L raised us to 30 concurrent files.
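
The actual sysctl.conf values are not reproduced in this log; the snippet below is only illustrative of the kind of TCP buffer tuning applied to GridFTP doors for high bandwidth-delay-product links such as UKLight, not the settings that were used:

 # /etc/sysctl.conf (illustrative values only)
 # Larger socket buffers for wide-area, high-bandwidth transfers
 net.core.rmem_max = 8388608
 net.core.wmem_max = 8388608
 net.ipv4.tcp_rmem = 4096 87380 8388608
 net.ipv4.tcp_wmem = 4096 65536 8388608

As the log notes, the servers only pick the new values up at the next reboot; running sysctl -p would apply them immediately.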

FTS Interaction

The state of the FTS channel can be queried with:

 $ glite-transfer-channel-list \ 
     -s https://sc3-fts-external.cern.ch:8443/site-fts/glite-data-transfer-fts/services/ChannelManagement  \
     CERN-RAL
 Channel: CERN-RAL
 Between: CERN-SC and RAL
 State: Active
 Contact: lcg-support@gridpp.rl.ac.uk
 Bandwidth: 0
 Nominal throughput: 0
 Number of files: 30, streams: 5
 Number of VO shares: 5
 VO 'dteam' share is: 20
 VO 'alice' share is: 20
 VO 'atlas' share is: 20
 VO 'cms' share is: 20
 VO 'lhcb' share is: 20
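
Channel parameters such as the number of concurrent files (raised from 20 to 30 on 18/01) were changed with the companion glite-transfer-channel-set command. A hedged sketch only; the -f option for the number of files is an assumption and should be checked against the command's --help output:

 $ glite-transfer-channel-set \
     -s https://sc3-fts-external.cern.ch:8443/site-fts/glite-data-transfer-fts/services/ChannelManagement \
     -f 30 CERN-RAL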


Links