RAL Tier1 weekly operations castor 28/09/2009

Summary of Previous Week

SRM 2.8 upgrade on ATLAS (Shaun, DB Team)
Finalizing testing for CIP 2.0 (Jens)
Investigating cause of D2D Transfer incident (Chris)
Finalized disk server deploymentation documentation (Chris)
Deployed 5 DS for atlasHotDisk and 14 for AtlasSimStrip (Chris)
Working on a problem with kernel clashing with FC card which prevents us to upgrade tape servers to the latest kernel (Chris)
Distributing Raid5/6 servers across service classes using draining (Brian)
Diagnose and fix network cable problem on Vulcan test database (Cheney)
Fix sendmail problem DLF database single (Cheney)
Started build of a new failover tape robot controller (Cheney)
Fixed SLS (out of inodes due to logrotate failure) (Cheney)
Fixed controller crash on database hardware (twice) (Cheney)
Applied changes to nagios config for new diskservers (Cheney)
Applied Oracle ASM Patch on Production RACs (DB Team)
Installing and acceptance testing new CASTOR servers (Richard, Cheney)
Coordinating bringing CASTOR down for UPS test (Matt)
Writing post mortem of NS upgrade D2D transfer incident (Matt)
Working with GOCDB developers to suggest including 'DEGRADED' status (Matt)

Carry on working on kernel problem for tape servers (Chris)
Black and White list tests (Chris)
Carry on LSF investigation (Chris)
Working on puppet manifest for polymorphic central servers (Chris)
2.8-1 deployment and testing (Shaun)
Install and Configure Database Agent for Oracle Enterprise Manager at CERN (DB Team)
Installing SLC 64 bit on new preprod machines (Richard)
Finish off patching including non-castor (Cheney)
Write next Techwatch newsletter (Cheney)
Distributing Raid5/6 servers across service classes using draining (Brian)
Chasing up strategic objectives (Matt)
Disaster recovery documentation (Matt)

Ongoing

The ORACLE ASM failed again on night of 24/9/09. However, the ORACLE patch worked and ORACLE was able to recover without any adverse service impact.

Problems with ganglia check on GEN instance delaying work on monitoring (in hand)

Entries in/planned to go to GOCDB

Description	Start	End	Type	Affected VO(s)
CIP 2.0 upgrade	29/9/09 1200	29/9/09 1400	At risk	All instances