RAL Tier1 weekly operations CASTOR 02/08/2010

Work this week

Matthew:
- A/L
Shaun:
- Operations: Two disk servers in LHCbUser showed heavy load because the other two are full.
- Operations: Database problem on Neptune; resolved with Ian and Keir
- Operations: Need to schedule firmware update for SL08 disk servers
- PreProd: Some problems with the grid-map file. Resolved by changing the order in which the info is fetched.
- PreProd: Problems on VULCAN DB
- PreProd: Problems with gdss154 as source for disk-2-disk copy
- PreProd: CMS and ATLAS started testing. No response yet from ALICE and LHCb
Chris:
- A/L
Richard:
- ..
Brian:
- ..
Jens:
- ..

gdss419 (AtlasSimStrip-d1t0) - has had 2 drive failures. One of the drives was replaced on 29/07/2010; the second should be replaced next week. The machine could be out of production until 06/08/2010.
gdss187 (AtlasFarm) - h/w has been fixed; checksums need to be verified because of fsprobe errors.
CMS has reported some files not migrating to tape after a couple of weeks. They were all in "tapecopy_failed" status. Manually resetting the status to "tapecopy_tobemigrated" moved them to tape. The cause of this problem is still unknown.
The ATLAS jobManager crashed silently without producing any errors. Restarting it fixed the problem (30/07/2010).
The ATLAS stager DB was corrupted on Sunday due to a known bug. It was recovered the same day.
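
For the checksum verification pending on gdss187, a minimal sketch of a per-file check is given below. It assumes the stored checksums are Adler-32 values in hex and that the expected value for each file is obtained separately (for example from the name server). The function names adler32_of_file and verify are hypothetical and are not part of any CASTOR tool.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit lowercase hex string."""
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in so large files are read in chunks.
            checksum = zlib.adler32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08x")


def verify(path, expected_hex):
    """Compare the computed checksum with the expected hex value (assumed input)."""
    return adler32_of_file(path) == expected_hex.lower().zfill(8)
```

Files for which verify returns False would be candidates for re-copying from another replica or from tape.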

PreProd

Entries in/planned to go to GOCDB: None